- Remove audit/internal/tui/ (~3000 LOC, bubbletea/lipgloss/reanimator deps)
- Add /api/* REST+SSE endpoints: audit, SAT (nvidia/memory/storage/cpu), services, network, export, tools, live metrics stream
- Add async job manager with SSE streaming for long-running operations
- Add platform.SampleLiveMetrics() for live fan/temp/power/GPU polling
- Add multi-page web UI (vanilla JS): Dashboard, Metrics charts, Tests, Burn-in, Network, Services, Export, Tools
- Add bee-desktop.service: openbox + Xorg + Chromium opening http://localhost/
- Add openbox/tint2/xorg/xinit/xterm/chromium to ISO package list
- Update .profile, bee.sh, and bible-local docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# System Overview — bee

## What it does

Hardware audit LiveCD. Boots on a server via BMC virtual media or USB.
Collects hardware inventory at OS level (not through BMC/Redfish).
Produces `HardwareIngestRequest` JSON compatible with the contract in `bible-local/docs/hardware-ingest-contract.md`.

## Why it exists

Fills gaps where Redfish/logpile is blind:
- NVMe serials and SMART data
- DIMM serials and slot layout
- GPU serials and VBIOS versions
- Physical disks behind RAID controllers
- Full SMART wear telemetry
- NIC firmware versions

## In scope

- Read-only hardware inventory: board, CPU, memory, storage, PCIe, PSU, GPU, NIC, RAID
- Machine-readable health summary derived from collector verdicts
- Operator-triggered acceptance tests for NVIDIA, memory, and storage
- NVIDIA SAT includes both diagnostic collection and mixed-precision GPU stress via `bee-gpu-stress`
- `bee-gpu-stress` should exercise tensor/inference paths (`fp16`, `fp32`/TF32, `fp8`, `fp4` when supported by the GPU/userspace stack) and fall back to Driver API PTX burn only if cuBLASLt is unavailable
- Automatic boot audit with operator-facing local console and SSH access
- NVIDIA proprietary driver loaded at boot for GPU enrichment via `nvidia-smi`
- SSH access (OpenSSH) always available for inspection and debugging
- Full web UI via `bee web` on port 80: interactive control panel with live metrics, SAT tests, network config, service management, export, and tools
- Local operator desktop: openbox + Xorg + Chromium auto-opening `http://localhost/`
- Local `tty1` operator UX: `bee` autologin, openbox desktop auto-starts with Chromium on `http://localhost/`

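One way to read "health summary derived from collector verdicts" is a worst-of reduction over per-collector results. The sketch below illustrates that idea only; the `Verdict` type, its three levels, and the `Summarize` function are hypothetical names and are not taken from the `bee` codebase.

```go
package main

import "fmt"

// Verdict is a hypothetical per-collector outcome, ordered by severity.
type Verdict int

const (
	OK Verdict = iota
	Warning
	Critical
)

// String makes verdicts print as readable labels.
func (v Verdict) String() string {
	switch v {
	case OK:
		return "OK"
	case Warning:
		return "Warning"
	case Critical:
		return "Critical"
	}
	return "Unknown"
}

// Summarize returns the most severe verdict reported by any collector.
// An empty or nil map is treated as OK (an assumption of this sketch).
func Summarize(verdicts map[string]Verdict) Verdict {
	worst := OK
	for _, v := range verdicts {
		if v > worst {
			worst = v
		}
	}
	return worst
}

func main() {
	verdicts := map[string]Verdict{
		"cpu":     OK,
		"memory":  Warning,
		"storage": OK,
	}
	fmt.Println("overall:", Summarize(verdicts))
}
```

A real summary would probably also carry per-subsystem detail; the reduction above only shows how a single top-level status could be derived.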
## Network isolation — CRITICAL

**The live CD runs in an isolated network segment with no internet access.**

- All tools, drivers, and binaries MUST be pre-baked into the ISO at build time
- No package installation at boot: packages are installed during ISO creation, not at runtime
- No downloads at boot: NVIDIA modules, vendor tools, and all binaries come from the ISO overlay
- DHCP is used only for LAN access (SSH from operator laptop); internet is NOT assumed
- Any feature requiring network downloads cannot be added to the live CD

## Out of scope

- Any writes to the server being audited
- Network configuration changes (persistent)
- BMC/IPMI configuration
- Anything requiring persistent storage on the audited machine
- Windows support
- Any functionality requiring internet access at boot
- Component lifecycle/history across multiple snapshots
- Status transition history (`status_history`, `status_changed_at`) derived from previous exports
- Replacement detection between two or more audit runs

## Contract boundary

- `bee` is responsible for the current hardware snapshot only.
- `bee` should populate current component state, hardware inventory, telemetry, and `status_checked_at`.
- Historical status transitions and component replacement logic belong to the centralized ingest/lifecycle system, not to `bee`.
- Contract fields that have no honest local source on a generic Linux host may remain empty.

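The "may remain empty" rule maps naturally onto Go's `omitempty` JSON tag. The sketch below assumes a trimmed-down, hypothetical `Component` struct; only `status_checked_at` is a field name taken from this document, and the real `HardwareIngestRequest` types in `audit/internal/schema/` may look quite different.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Component is a hypothetical snapshot entry, not the real schema type.
type Component struct {
	Slot string `json:"slot"`
	// SerialNumber is dropped from the output when no local source exists.
	SerialNumber    string `json:"serial_number,omitempty"`
	Status          string `json:"status"`
	StatusCheckedAt string `json:"status_checked_at"`
}

// snapshotJSON serializes one component of the current snapshot.
func snapshotJSON(c Component) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	c := Component{
		Slot:            "DIMM_A1",
		Status:          "OK",
		StatusCheckedAt: time.Now().UTC().Format(time.RFC3339),
	}
	out, err := snapshotJSON(c)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(out)
}
```

Note that nothing here tracks previous values: history fields such as `status_history` are deliberately absent, in line with the boundary described above.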
## Tech stack

| Component | Technology |
|---|---|
| Audit binary | Go, static, `CGO_ENABLED=0` |
| Live ISO | Debian 12 (bookworm), amd64 live-build image |
| ISO build | Debian `live-build` + overlay sync into `config/includes.chroot/` |
| Init system | `systemd` |
| SSH | OpenSSH server |
| NVIDIA driver | Proprietary `.run` installer, built against Debian kernel headers |
| NVIDIA modules | Loaded via `insmod` from `/usr/local/lib/nvidia/` |
| GPU stress backend | `bee-gpu-stress` + cuBLASLt/cuBLAS/cudart mixed-precision GEMM, with Driver API PTX fallback |
| Builder | Debian 12 host/VM or Debian 12 container image |

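Because `insmod` does not resolve module dependencies the way `modprobe` does, load order matters when modules come from a plain directory. The sketch below only builds and prints the commands; the module file list and the `insmodCommands` helper are assumptions for illustration, and the set actually shipped in the ISO may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// moduleOrder lists modules in dependency order: nvidia.ko must be loaded
// before the modules that depend on it. The file names are an assumption.
var moduleOrder = []string{"nvidia.ko", "nvidia-modeset.ko", "nvidia-uvm.ko"}

// insmodCommands returns one argv slice per module, in load order.
// It does not execute anything.
func insmodCommands(dir string) [][]string {
	cmds := make([][]string, 0, len(moduleOrder))
	for _, m := range moduleOrder {
		cmds = append(cmds, []string{"insmod", filepath.Join(dir, m)})
	}
	return cmds
}

func main() {
	for _, c := range insmodCommands("/usr/local/lib/nvidia") {
		fmt.Println(c[0], c[1])
	}
}
```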
## Operator UX

- On the live ISO, `tty1` autologins as `bee`
- `bee-desktop.service` starts X11 + openbox + Chromium on display `:0`
- Chromium opens `http://localhost/`, the full web UI
- SSH remains available independently of the local console path
- Remote operators can open `http://<ip>/` in any browser on the same LAN
- VM-oriented builds also include `qemu-guest-agent` and serial console support for debugging
- The ISO boots with `toram`, so loss of the original USB/BMC virtual media after boot should not break already-installed runtime binaries

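To make the `http://<ip>/` point concrete, here is a minimal sketch of how the reachable web UI URLs could be enumerated from the host's interface addresses. The `webURLs` and `localAddrs` helpers are hypothetical; the actual welcome message is produced by `bee.sh` and may work differently.

```go
package main

import (
	"fmt"
	"net"
)

// webURLs converts textual IP addresses into port-80 URLs.
// Loopback, link-local, and unparsable entries are skipped;
// IPv6 literals are wrapped in brackets as URLs require.
func webURLs(addrs []string) []string {
	var urls []string
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip == nil || ip.IsLoopback() || ip.IsLinkLocalUnicast() {
			continue
		}
		host := ip.String()
		if ip.To4() == nil {
			host = "[" + host + "]"
		}
		urls = append(urls, "http://"+host+"/")
	}
	return urls
}

// localAddrs collects the IP addresses configured on this host.
func localAddrs() ([]string, error) {
	ifaddrs, err := net.InterfaceAddrs()
	if err != nil {
		return nil, err
	}
	var out []string
	for _, a := range ifaddrs {
		if ipnet, ok := a.(*net.IPNet); ok {
			out = append(out, ipnet.IP.String())
		}
	}
	return out, nil
}

func main() {
	addrs, err := localAddrs()
	if err != nil {
		fmt.Println("cannot list addresses:", err)
		return
	}
	fmt.Println("local:  http://localhost/")
	for _, u := range webURLs(addrs) {
		fmt.Println("remote:", u)
	}
}
```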
## Runtime split

- The main Go application must run both on a normal Linux host and inside the live ISO
- Live-ISO-only responsibilities stay in `iso/` integration code
- Live ISO launches the Go CLI with `--runtime livecd`
- Local/manual runs use `--runtime auto` or `--runtime local`
- Live ISO targets must have enough RAM for the full compressed live medium plus runtime working set because the boot medium is copied into memory at startup

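The `--runtime` values above could be handled roughly as follows. This is a sketch under stated assumptions: the `resolveRuntime` helper, the `auto` default, and the use of `/run/live` as a live-system marker are all illustrative guesses, not the detection logic `bee` actually uses.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// resolveRuntime maps a --runtime value to a concrete mode.
// liveMarker reports whether a live-system marker was found; how "auto"
// really detects the live ISO is an assumption of this sketch.
func resolveRuntime(mode string, liveMarker bool) (string, error) {
	switch mode {
	case "livecd", "local":
		return mode, nil
	case "auto":
		if liveMarker {
			return "livecd", nil
		}
		return "local", nil
	default:
		return "", fmt.Errorf("unknown --runtime value %q", mode)
	}
}

func main() {
	mode := flag.String("runtime", "auto", "runtime mode: auto, local, or livecd")
	flag.Parse()

	// Assumption: a Debian live system exposes its boot medium under /run/live.
	_, statErr := os.Stat("/run/live")
	resolved, err := resolveRuntime(*mode, statErr == nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("runtime:", resolved)
}
```

Keeping the resolution in a pure function makes it easy to test without booting the ISO, which fits the requirement that the same binary runs on a normal Linux host.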
## Key paths

| Path | Purpose |
|---|---|
| `audit/cmd/bee/` | Main CLI entry point |
| `audit/internal/collector/` | Per-subsystem collectors |
| `audit/internal/schema/` | `HardwareIngestRequest` types |
| `iso/builder/` | ISO build scripts and `live-build` profile |
| `iso/overlay/` | Source overlay copied into a staged build overlay |
| `iso/vendor/` | Optional pre-built vendor binaries (storcli64, sas2ircu, sas3ircu, arcconf, ssacli, …) |
| `internal/chart/` | Git submodule with `reanimator/chart`, embedded into `bee web` |
| `iso/builder/VERSIONS` | Pinned versions: Debian, Go, NVIDIA driver, kernel ABI |
| `iso/builder/smoketest.sh` | Post-boot smoke test, run via SSH to verify the live ISO |
| `iso/overlay/etc/profile.d/bee.sh` | `tty1` welcome message with web UI URLs |
| `iso/overlay/home/bee/.profile` | `bee` shell profile (PATH only) |
| `iso/overlay/etc/systemd/system/bee-desktop.service` | Starts X11 + openbox + Chromium |
| `iso/overlay/usr/local/bin/bee-desktop` | `startx` wrapper for `bee-desktop.service` |
| `iso/overlay/usr/local/bin/bee-openbox-session` | xinitrc: tint2 + Chromium + openbox |
| `dist/` | Build outputs (gitignored) |
| `iso/out/` | Downloaded ISO files (gitignored) |