bee/bible-local/architecture/system-overview.md
Mikhail Chusavitin 1768bb58dd Merge debug/prod into single ISO build, fix NVIDIA module loading
## ISO build consolidation
- Remove separate debug/prod split: overlay-debug/, build-debug.sh,
  mkimg.bee_debug.sh, genapkovl-bee_debug.sh all deleted
- Single overlay: iso/overlay/ (was overlay-debug content)
- Single build script: build.sh (SSH, TUI, NVIDIA, vendor tools, bee-release)
- Single mkimage profile: bee (with dropbear, dialog, strace, gcompat, etc.)

## NVIDIA fixes
- Modules now stored at /usr/local/lib/nvidia/ instead of
  /lib/modules/<kver>/extra/nvidia/ — modloop squashfs mounts over that
  path at boot making overlay content there inaccessible
- bee-nvidia init: load via insmod (absolute path), not modprobe
- bee-nvidia init: create libnvidia-ml.so.1/libcuda.so.1 symlinks in /usr/lib/
- build-nvidia-module.sh: always install linux-lts-dev (not conditional) —
  stale 6.6.x headers caused wrong-kernel modules that never loaded at runtime
- build-nvidia-module.sh: create soname symlinks in cache
- KERNEL_VERSION in VERSIONS updated 6.6 → 6.12
- gcompat added to ISO packages (nvidia-smi is a glibc binary on musl Alpine)

## Service ordering
- bee-audit: add `after bee-nvidia` so NVIDIA enrichment always succeeds

## New tooling
- iso/builder/smoketest.sh: SSH smoke test for post-boot ISO validation
- iso/builder/build-gpu-burn.sh: builds gpu_burn vendor binary (CUDA 12.8+)
- vendor/gpu_burn included automatically if placed in iso/vendor/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 20:14:18 +03:00

# System Overview — bee
## What it does
Hardware audit LiveCD. Boots on a server via BMC virtual media or USB.
Collects hardware inventory at OS level (not through BMC/Redfish).
Produces `HardwareIngestRequest` JSON compatible with core/reanimator.
## Why it exists
Fills gaps where Redfish/logpile is blind:
- NVMe serials and SMART data
- DIMM serials and slot layout
- GPU serials and VBIOS versions
- Physical disks behind RAID controllers
- Full SMART wear telemetry
- NIC firmware versions
## In scope
- Read-only hardware inventory: board, CPU, memory, storage, PCIe, PSU, GPU, NIC, RAID
- Unattended operation — no user interaction required
- NVIDIA proprietary driver loaded at boot for GPU enrichment via `nvidia-smi`
- SSH access (dropbear) always available for inspection and debugging
- Interactive TUI (`bee-tui`) for network setup, service management, GPU tests
- GPU stress testing via `gpu_burn` (vendor binary, optional)
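
GPU enrichment depends on `nvidia-smi`, so the audit service should not start before the driver is loaded. Under OpenRC this ordering is declared in the service script's `depend()` function. The fragment below is a minimal sketch, assuming the services are named `bee-audit` and `bee-nvidia` as in the commit notes; the rest of the init script is omitted, and the real script may declare further dependencies.

```shell
# Fragment of an OpenRC service script for bee-audit (sketch only).
# Only the `after bee-nvidia` ordering comes from the project notes;
# the remainder of the real script is not reproduced here.

depend() {
    # Start after bee-nvidia when both services are in the runlevel.
    # `after` only orders startup; unlike `need`, it does not pull
    # bee-nvidia in and does not fail if that service is absent.
    after bee-nvidia
}
```

Using `after` rather than `need` keeps the audit usable on machines without NVIDIA hardware, at the cost of not guaranteeing that the driver actually loaded.
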
## Out of scope
- Any writes to the server being audited
- Persistent network configuration changes
- BMC/IPMI configuration
- Anything requiring persistent storage on the audited machine
- Windows support
## Tech stack
| Component | Technology |
|---|---|
| Audit binary | Go, static, `CGO_ENABLED=0` |
| LiveCD | Alpine Linux 3.21, linux-lts 6.12.x |
| ISO build | Alpine mkimage + apkovl overlay (`iso/overlay/`) |
| Init system | OpenRC |
| SSH | Dropbear (always included) |
| NVIDIA driver | Proprietary `.run` installer, built against linux-lts headers |
| NVIDIA modules | Loaded via `insmod` from `/usr/local/lib/nvidia/` (not modloop path) |
| glibc compat | `gcompat` — required for `nvidia-smi` (glibc binary on musl Alpine) |
| Builder VM | Alpine 3.21 |
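
The NVIDIA rows above describe loading modules with `insmod` from `/usr/local/lib/nvidia/` instead of through `modprobe`. A rough sketch of that approach follows. The module list, the function names, and the `NVIDIA_MODULE_DIR` variable are assumptions made for illustration; they are not taken from the `bee-nvidia` init script.

```shell
# Sketch only: module names and load order are assumed, not copied from bee-nvidia.
NVIDIA_MODULE_DIR="${NVIDIA_MODULE_DIR:-/usr/local/lib/nvidia}"

# Print the absolute path of each module file that exists, base module first.
nvidia_module_paths() {
    for mod_name in nvidia nvidia-modeset nvidia-uvm; do
        mod_file="$NVIDIA_MODULE_DIR/$mod_name.ko"
        if [ -f "$mod_file" ]; then
            printf '%s\n' "$mod_file"
        fi
    done
}

# Load the modules in that order. insmod takes a file path and does no
# dependency resolution, so nvidia.ko has to be inserted before the others.
load_nvidia_modules() {
    for mod_file in $(nvidia_module_paths); do
        insmod "$mod_file" || return 1
    done
}
```

Calling `load_nvidia_modules` requires root. According to the commit notes, `modprobe` is avoided because it resolves modules under `/lib/modules/<kver>/`, and the modloop squashfs mounted there at boot hides anything the overlay placed in that tree.
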
## Key paths
| Path | Purpose |
|---|---|
| `audit/cmd/audit/` | CLI entry point |
| `audit/internal/collector/` | Per-subsystem collectors |
| `audit/internal/schema/` | HardwareIngestRequest types |
| `iso/builder/` | ISO build scripts and mkimage profile |
| `iso/overlay/` | Single overlay: files injected into ISO via apkovl |
| `iso/vendor/` | Optional pre-built vendor binaries (storcli64, gpu_burn, …) |
| `iso/builder/VERSIONS` | Pinned versions: Alpine, Go, NVIDIA driver, kernel |
| `iso/builder/smoketest.sh` | Post-boot smoke test — run via SSH to verify live ISO |
| `dist/` | Build outputs (gitignored) |
| `iso/out/` | Downloaded ISO files (gitignored) |
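
A post-boot check over SSH, in the spirit of `iso/builder/smoketest.sh`, might look like the sketch below. The specific checks, the `root` login, and the `SSH` override variable are all assumptions; the real script's contents are not shown in this document.

```shell
# Hypothetical smoke test; the real smoketest.sh may check different things.
SSH="${SSH:-ssh}"

# Run one command on the booted ISO and report PASS or FAIL.
# $1 = host, $2 = label, $3 = remote command (a single string).
smoke_check() {
    if $SSH "root@$1" "$3" >/dev/null 2>&1; then
        printf 'PASS %s\n' "$2"
    else
        printf 'FAIL %s\n' "$2"
        return 1
    fi
}

# Run all checks against one host; the exit status is non-zero if any failed.
smoke_test() {
    failed=0
    smoke_check "$1" "nvidia module loaded" "grep -q '^nvidia ' /proc/modules" || failed=1
    smoke_check "$1" "nvidia-smi lists GPUs" "nvidia-smi -L" || failed=1
    return "$failed"
}
```

With this sketch, `smoke_test <host>` prints one line per check and can gate a build pipeline on its exit status. Each remote command is passed as a single string so that its quoting survives the remote shell.
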