bee/bible-local/architecture/system-overview.md
Mikhail Chusavitin 1768bb58dd Merge debug/prod into single ISO build, fix NVIDIA module loading
## ISO build consolidation
- Remove separate debug/prod split: overlay-debug/, build-debug.sh,
  mkimg.bee_debug.sh, genapkovl-bee_debug.sh all deleted
- Single overlay: iso/overlay/ (was overlay-debug content)
- Single build script: build.sh (SSH, TUI, NVIDIA, vendor tools, bee-release)
- Single mkimage profile: bee (with dropbear, dialog, strace, gcompat, etc.)

## NVIDIA fixes
- Modules now stored at /usr/local/lib/nvidia/ instead of
  /lib/modules/<kver>/extra/nvidia/ — modloop squashfs mounts over that
  path at boot making overlay content there inaccessible
- bee-nvidia init: load via insmod (absolute path), not modprobe
- bee-nvidia init: create libnvidia-ml.so.1/libcuda.so.1 symlinks in /usr/lib/
- build-nvidia-module.sh: always install linux-lts-dev (not conditional) —
  stale 6.6.x headers caused wrong-kernel modules that never loaded at runtime
- build-nvidia-module.sh: create soname symlinks in cache
- KERNEL_VERSION in VERSIONS updated 6.6 → 6.12
- gcompat added to ISO packages (nvidia-smi is a glibc binary on musl Alpine)

## Service ordering
- bee-audit: add `after bee-nvidia` so NVIDIA enrichment always succeeds

## New tooling
- iso/builder/smoketest.sh: SSH smoke test for post-boot ISO validation
- iso/builder/build-gpu-burn.sh: builds gpu_burn vendor binary (CUDA 12.8+)
- vendor/gpu_burn included automatically if placed in iso/vendor/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 20:14:18 +03:00


System Overview — bee

What it does

Hardware audit LiveCD. Boots on a server via BMC virtual media or USB. Collects hardware inventory at OS level (not through BMC/Redfish). Produces HardwareIngestRequest JSON compatible with core/reanimator.

Why it exists

Fills gaps where Redfish/logpile is blind:

  • NVMe serials and SMART data
  • DIMM serials and slot layout
  • GPU serials and VBIOS versions
  • Physical disks behind RAID controllers
  • Full SMART wear telemetry
  • NIC firmware versions

In scope

  • Read-only hardware inventory: board, CPU, memory, storage, PCIe, PSU, GPU, NIC, RAID
  • Unattended operation — no user interaction required
  • NVIDIA proprietary driver loaded at boot for GPU enrichment via nvidia-smi
  • SSH access (dropbear) always available for inspection and debugging
  • Interactive TUI (bee-tui) for network setup, service management, GPU tests
  • GPU stress testing via gpu_burn (vendor binary, optional)
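
GPU enrichment via nvidia-smi only works if the driver is loaded before the audit runs. The commit notes say bee-audit declares `after bee-nvidia` for this reason. A minimal sketch of what that ordering could look like in an OpenRC service file; only the `after bee-nvidia` line comes from the source, and the description text is an assumption:

```shell
#!/sbin/openrc-run
# Hypothetical sketch of /etc/init.d/bee-audit. Only the ordering line
# below is taken from the source; everything else is illustrative.

description="Hardware audit (illustrative description)"

depend() {
    # Ordering hint: if bee-nvidia is scheduled in the same runlevel,
    # OpenRC starts it before this service. Unlike `need`, `after` does
    # not pull bee-nvidia in, and does not fail this service if
    # bee-nvidia fails.
    after bee-nvidia
}
```

Using `after` rather than `need` would fit machines without an NVIDIA GPU, where the audit should presumably still run even if the driver service does nothing useful.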

Out of scope

  • Any writes to the server being audited
  • Persistent network configuration changes
  • BMC/IPMI configuration
  • Anything requiring persistent storage on the audited machine
  • Windows support

Tech stack

| Component | Technology |
|---|---|
| Audit binary | Go, static, CGO_ENABLED=0 |
| LiveCD | Alpine Linux 3.21, linux-lts 6.12.x |
| ISO build | Alpine mkimage + apkovl overlay (iso/overlay/) |
| Init system | OpenRC |
| SSH | Dropbear (always included) |
| NVIDIA driver | Proprietary .run installer, built against linux-lts headers |
| NVIDIA modules | Loaded via insmod from /usr/local/lib/nvidia/ (not modloop path) |
| glibc compat | gcompat, required for nvidia-smi (glibc binary on musl Alpine) |
| Builder VM | Alpine 3.21 |
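
The NVIDIA modules entry above reflects a boot-time constraint: per the commit notes, the modloop squashfs mounts over /lib/modules/<kver>/extra/nvidia/, so the modules are kept in /usr/local/lib/nvidia/ and loaded with insmod by absolute path. Because insmod does no dependency resolution, the load order has to be explicit. The following is a minimal sketch of that idea, not the actual bee-nvidia init script. It assumes the module file names nvidia.ko, nvidia-modeset.ko, and nvidia-uvm.ko, and it introduces two hypothetical override variables (NVIDIA_MODULE_DIR, INSMOD) purely so the function can be dry-run:

```shell
#!/bin/sh
# Sketch: load NVIDIA kernel modules by absolute path with insmod.
# Assumptions (not confirmed by the source): the module file names and
# the set of modules loaded. NVIDIA_MODULE_DIR and INSMOD are
# hypothetical overrides for testing.

NVIDIA_MODULE_DIR="${NVIDIA_MODULE_DIR:-/usr/local/lib/nvidia}"
INSMOD="${INSMOD:-insmod}"

load_nvidia_modules() {
    # nvidia.ko goes first: the other two modules depend on it, and
    # insmod will not resolve that dependency for us.
    for mod in nvidia nvidia-modeset nvidia-uvm; do
        ko="$NVIDIA_MODULE_DIR/$mod.ko"
        if [ ! -f "$ko" ]; then
            echo "missing: $ko" >&2
            return 1
        fi
        # INSMOD is intentionally unquoted so it can carry arguments.
        $INSMOD "$ko" || { echo "failed: $ko" >&2; return 1; }
    done
}
```

An init script would call `load_nvidia_modules` from its start function. Setting `INSMOD="echo insmod"` prints the commands instead of running them, which is handy for checking paths on a build host.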

Key paths

| Path | Purpose |
|---|---|
| audit/cmd/audit/ | CLI entry point |
| audit/internal/collector/ | Per-subsystem collectors |
| audit/internal/schema/ | HardwareIngestRequest types |
| iso/builder/ | ISO build scripts and mkimage profile |
| iso/overlay/ | Single overlay: files injected into ISO via apkovl |
| iso/vendor/ | Optional pre-built vendor binaries (storcli64, gpu_burn, …) |
| iso/builder/VERSIONS | Pinned versions: Alpine, Go, NVIDIA driver, kernel |
| iso/builder/smoketest.sh | Post-boot smoke test, run via SSH to verify the live ISO |
| dist/ | Build outputs (gitignored) |
| iso/out/ | Downloaded ISO files (gitignored) |
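
The contents of smoketest.sh are not shown here. As an illustration of what a post-boot check over SSH might cover, the sketch below tests the things the commit notes say were fixed: the NVIDIA module being loaded, nvidia-smi running under gcompat, and the soname symlinks in /usr/lib/. The helper names `check` and `run_checks` and the choice of checks are assumptions, not the real script:

```shell
#!/bin/sh
# Hypothetical post-boot checks; the real smoketest.sh may differ.

failures=0

check() {
    # check <description> <command...>
    # Runs the command quietly and prints PASS or FAIL with the description.
    desc="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $desc"
    else
        echo "FAIL $desc"
        failures=$((failures + 1))
    fi
}

run_checks() {
    check "nvidia module loaded"      grep -q '^nvidia ' /proc/modules
    check "nvidia-smi lists GPUs"     nvidia-smi -L
    check "libnvidia-ml.so.1 present" test -e /usr/lib/libnvidia-ml.so.1
    check "libcuda.so.1 present"      test -e /usr/lib/libcuda.so.1
    # Return non-zero if any check failed.
    [ "$failures" -eq 0 ]
}
```

With `run_checks` appended as the last line, the file could be piped to a booted ISO with something like `ssh <user>@<host> sh -s < checks.sh`, where the user, host, and file name are placeholders.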