Add health verdicts and acceptance tests
This commit is contained in:
@@ -29,6 +29,7 @@ local-fs.target
|
||||
Reason: the modules are shipped in the ISO overlay under `/usr/local/lib/nvidia/`, not in the host module tree.
|
||||
- `bee-audit.service` does not wait for `network-online.target`; audit is local and must run even if DHCP is broken.
|
||||
- `bee-audit.service` logs audit failures but does not turn partial collector problems into a boot blocker.
|
||||
- Audit JSON now includes a `hardware.summary` block with overall verdict and warning/failure counts.
|
||||
|
||||
## Console and login flow
|
||||
|
||||
@@ -59,7 +60,7 @@ build.sh [--authorized-keys /path/to/keys]
|
||||
3. inject authorized_keys into staged `root/.ssh/` (or set password fallback marker)
|
||||
4. copy `bee` binary → staged `/usr/local/bin/bee`
|
||||
5. copy vendor binaries from `iso/vendor/` → staged `/usr/local/bin/`
|
||||
(`storcli64`, `sas2ircu`, `sas3ircu`, `mstflint` — each optional)
|
||||
(`storcli64`, `sas2ircu`, `sas3ircu`, `arcconf`, `ssacli` — optional; `mstflint` comes from the Debian package set)
|
||||
6. `build-nvidia-module.sh`:
|
||||
a. install Debian kernel headers if missing
|
||||
b. download NVIDIA `.run` installer (sha256 verified, cached in `dist/`)
|
||||
@@ -119,10 +120,15 @@ Current validation state:
|
||||
3. memory collector (dmidecode -t 17)
|
||||
4. storage collector (lsblk -J, smartctl -j, nvme id-ctrl, nvme smart-log)
|
||||
5. pcie collector (lspci -vmm -D, /sys/bus/pci/devices/)
|
||||
6. psu collector (ipmitool fru — silent if no /dev/ipmi0)
|
||||
6. psu collector (ipmitool fru + sdr — silent if no /dev/ipmi0)
|
||||
7. nvidia enrichment (nvidia-smi — skipped if binary absent or driver not loaded)
|
||||
8. output JSON → /var/log/bee-audit.json
|
||||
9. QR summary to stdout (qrencode if available)
|
||||
```
|
||||
|
||||
Every collector returns `nil, nil` on tool-not-found. Errors are logged, never fatal.
|
||||
|
||||
Acceptance flows:
|
||||
- `bee sat nvidia` → diagnostic archive with `nvidia-smi -q` + `nvidia-bug-report` + lightweight `bee-gpu-stress`
|
||||
- `bee sat memory` → `memtester` archive
|
||||
- `bee sat storage` → SMART/NVMe diagnostic archive and short self-test trigger where supported
|
||||
|
||||
@@ -19,6 +19,9 @@ Fills gaps where Redfish/logpile is blind:
|
||||
## In scope
|
||||
|
||||
- Read-only hardware inventory: board, CPU, memory, storage, PCIe, PSU, GPU, NIC, RAID
|
||||
- Machine-readable health summary derived from collector verdicts
|
||||
- Operator-triggered acceptance tests for NVIDIA, memory, and storage
|
||||
- NVIDIA SAT includes both diagnostic collection and lightweight GPU stress via `bee-gpu-stress`
|
||||
- Automatic boot audit with operator-facing local console and SSH access
|
||||
- NVIDIA proprietary driver loaded at boot for GPU enrichment via `nvidia-smi`
|
||||
- SSH access (OpenSSH) always available for inspection and debugging
|
||||
@@ -81,7 +84,7 @@ Fills gaps where Redfish/logpile is blind:
|
||||
| `audit/internal/schema/` | HardwareIngestRequest types |
|
||||
| `iso/builder/` | ISO build scripts and `live-build` profile |
|
||||
| `iso/overlay/` | Source overlay copied into a staged build overlay |
|
||||
| `iso/vendor/` | Optional pre-built vendor binaries (storcli64, sas2ircu, sas3ircu, mstflint, …) |
|
||||
| `iso/vendor/` | Optional pre-built vendor binaries (storcli64, sas2ircu, sas3ircu, arcconf, ssacli, …) |
|
||||
| `iso/builder/VERSIONS` | Pinned versions: Debian, Go, NVIDIA driver, kernel ABI |
|
||||
| `iso/builder/smoketest.sh` | Post-boot smoke test — run via SSH to verify live ISO |
|
||||
| `iso/overlay/etc/profile.d/bee.sh` | `menu` helper + tty1 auto-start policy |
|
||||
|
||||
Reference in New Issue
Block a user