Add health verdicts and acceptance tests
This commit is contained in:
@@ -29,6 +29,7 @@ local-fs.target
|
||||
Reason: the modules are shipped in the ISO overlay under `/usr/local/lib/nvidia/`, not in the host module tree.
|
||||
- `bee-audit.service` does not wait for `network-online.target`; audit is local and must run even if DHCP is broken.
|
||||
- `bee-audit.service` logs audit failures but does not turn partial collector problems into a boot blocker.
|
||||
- Audit JSON now includes a `hardware.summary` block with overall verdict and warning/failure counts.
|
||||
|
||||
## Console and login flow
|
||||
|
||||
@@ -59,7 +60,7 @@ build.sh [--authorized-keys /path/to/keys]
|
||||
3. inject authorized_keys into staged `root/.ssh/` (or set password fallback marker)
|
||||
4. copy `bee` binary → staged `/usr/local/bin/bee`
|
||||
5. copy vendor binaries from `iso/vendor/` → staged `/usr/local/bin/`
|
||||
(`storcli64`, `sas2ircu`, `sas3ircu`, `mstflint` — each optional)
|
||||
(`storcli64`, `sas2ircu`, `sas3ircu`, `arcconf`, `ssacli` — optional; `mstflint` comes from the Debian package set)
|
||||
6. `build-nvidia-module.sh`:
|
||||
a. install Debian kernel headers if missing
|
||||
b. download NVIDIA `.run` installer (sha256 verified, cached in `dist/`)
|
||||
@@ -119,10 +120,15 @@ Current validation state:
|
||||
3. memory collector (dmidecode -t 17)
|
||||
4. storage collector (lsblk -J, smartctl -j, nvme id-ctrl, nvme smart-log)
|
||||
5. pcie collector (lspci -vmm -D, /sys/bus/pci/devices/)
|
||||
6. psu collector (ipmitool fru — silent if no /dev/ipmi0)
|
||||
6. psu collector (ipmitool fru + sdr — silent if no /dev/ipmi0)
|
||||
7. nvidia enrichment (nvidia-smi — skipped if binary absent or driver not loaded)
|
||||
8. output JSON → /var/log/bee-audit.json
|
||||
9. QR summary to stdout (qrencode if available)
|
||||
```
|
||||
|
||||
Every collector returns `nil, nil` on tool-not-found. Errors are logged, never fatal.
|
||||
|
||||
Acceptance flows:
|
||||
- `bee sat nvidia` → diagnostic archive with `nvidia-smi -q` + `nvidia-bug-report` + lightweight `bee-gpu-stress`
|
||||
- `bee sat memory` → `memtester` archive
|
||||
- `bee sat storage` → SMART/NVMe diagnostic archive and short self-test trigger where supported
|
||||
|
||||
Reference in New Issue
Block a user