iso: improve burn-in, export, and live boot

This commit is contained in:
Mikhail Chusavitin
2026-03-26 18:56:19 +03:00
parent 67a215c66f
commit fc5c2019aa
23 changed files with 1706 additions and 168 deletions

View File

@@ -21,7 +21,8 @@ Fills gaps where Redfish/logpile is blind:
- Read-only hardware inventory: board, CPU, memory, storage, PCIe, PSU, GPU, NIC, RAID
- Machine-readable health summary derived from collector verdicts
- Operator-triggered acceptance tests for NVIDIA, memory, and storage
- NVIDIA SAT includes both diagnostic collection and lightweight GPU stress via `bee-gpu-stress`
- NVIDIA SAT includes both diagnostic collection and mixed-precision GPU stress via `bee-gpu-stress`
- `bee-gpu-stress` should exercise tensor/inference paths (`fp16`, `fp32`/TF32, `fp8`, `fp4` when supported by the GPU/userspace stack) and fall back to Driver API PTX burn only if cuBLASLt is unavailable
- Automatic boot audit with operator-facing local console and SSH access
- NVIDIA proprietary driver loaded at boot for GPU enrichment via `nvidia-smi`
- SSH access (OpenSSH) always available for inspection and debugging
@@ -69,6 +70,7 @@ Fills gaps where Redfish/logpile is blind:
| SSH | OpenSSH server |
| NVIDIA driver | Proprietary `.run` installer, built against Debian kernel headers |
| NVIDIA modules | Loaded via `insmod` from `/usr/local/lib/nvidia/` |
| GPU stress backend | `bee-gpu-stress` + cuBLASLt/cuBLAS/cudart mixed-precision GEMM, with Driver API PTX fallback |
| Builder | Debian 12 host/VM or Debian 12 container image |
## Operator UX
@@ -78,6 +80,7 @@ Fills gaps where Redfish/logpile is blind:
- The TUI itself executes privileged actions as `root` via `sudo -n`
- SSH remains available independently of the local console path
- VM-oriented builds also include `qemu-guest-agent` and serial console support for debugging
- The ISO boots with `toram`, so loss of the original USB/BMC virtual media after boot should not break already-installed runtime binaries
## Runtime split
@@ -85,6 +88,7 @@ Fills gaps where Redfish/logpile is blind:
- Live-ISO-only responsibilities stay in `iso/` integration code
- Live ISO launches the Go CLI with `--runtime livecd`
- Local/manual runs use `--runtime auto` or `--runtime local`
- Live ISO targets must have enough RAM for the full compressed live medium plus runtime working set because the boot medium is copied into memory at startup
## Key paths