bee/bible-local/architecture/system-overview.md
2026-03-26 18:56:19 +03:00

System Overview — bee

What it does

Hardware audit LiveCD. Boots on a server via BMC virtual media or USB. Collects hardware inventory at OS level (not through BMC/Redfish). Produces HardwareIngestRequest JSON compatible with the contract in bible-local/docs/hardware-ingest-contract.md.

Why it exists

Fills gaps where Redfish/logpile is blind:

  • NVMe serials and SMART data
  • DIMM serials and slot layout
  • GPU serials and VBIOS versions
  • Physical disks behind RAID controllers
  • Full SMART wear telemetry
  • NIC firmware versions

In scope

  • Read-only hardware inventory: board, CPU, memory, storage, PCIe, PSU, GPU, NIC, RAID
  • Machine-readable health summary derived from collector verdicts
  • Operator-triggered acceptance tests for NVIDIA, memory, and storage
  • NVIDIA SAT includes both diagnostic collection and mixed-precision GPU stress via bee-gpu-stress
  • bee-gpu-stress should exercise tensor/inference paths (fp16, fp32/TF32, fp8, fp4 when supported by the GPU/userspace stack) and fall back to Driver API PTX burn only if cuBLASLt is unavailable
  • Automatic boot audit with operator-facing local console and SSH access
  • NVIDIA proprietary driver loaded at boot for GPU enrichment via nvidia-smi
  • SSH access (OpenSSH) always available for inspection and debugging
  • Interactive Go TUI via bee tui for network setup, service management, and acceptance tests
  • Read-only web viewer via bee web, rendering the latest audit snapshot through the embedded Reanimator Chart
  • Local tty1 operator UX: bee autologin, menu auto-start, privileged actions via sudo -n
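One way the "health summary derived from collector verdicts" item could work is a worst-verdict roll-up, sketched below. The `Verdict` type, its three levels, and the `HealthSummary` fields are hypothetical names chosen for illustration; the real schema in audit/internal/schema/ may look different.

```go
package main

// Verdict is a hypothetical per-collector outcome. The names and ordering
// are illustrative and are not taken from the bee schema.
type Verdict int

const (
	VerdictOK Verdict = iota
	VerdictWarning
	VerdictCritical
)

// String returns the lowercase label used in the summary.
func (v Verdict) String() string {
	switch v {
	case VerdictOK:
		return "ok"
	case VerdictWarning:
		return "warning"
	case VerdictCritical:
		return "critical"
	default:
		return "unknown"
	}
}

// HealthSummary is a hypothetical machine-readable roll-up.
type HealthSummary struct {
	Overall string            `json:"overall"`
	Issues  map[string]string `json:"issues,omitempty"`
}

// Summarize reports the worst verdict as the overall status and lists every
// subsystem whose verdict is not ok. An empty input yields "ok".
func Summarize(verdicts map[string]Verdict) HealthSummary {
	worst := VerdictOK
	issues := map[string]string{}
	for subsystem, v := range verdicts {
		if v > worst {
			worst = v
		}
		if v != VerdictOK {
			issues[subsystem] = v.String()
		}
	}
	return HealthSummary{Overall: worst.String(), Issues: issues}
}
```

Calling `Summarize` with a map such as `{"memory": VerdictOK, "gpu": VerdictCritical}` gives an overall status of `critical` and a single entry in `Issues`. Taking the maximum keeps the summary independent of collector order.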

Network isolation — CRITICAL

The live CD runs in an isolated network segment with no internet access.

  • All tools, drivers, and binaries MUST be pre-baked into the ISO at build time
  • No package installation at boot — packages are installed during ISO creation, not at runtime
  • No downloads at boot — NVIDIA modules, vendor tools, and all binaries come from the ISO overlay
  • DHCP is used only for LAN access (SSH from operator laptop); internet is NOT assumed
  • Any feature requiring network downloads cannot be added to the live CD
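Because nothing can be fetched at boot, a collector can only report a missing tool, never install it. The sketch below shows one possible preflight check built on `exec.LookPath`; the function name `MissingTools` is hypothetical and is not taken from the bee code base.

```go
package main

import "os/exec"

// MissingTools returns the entries of required that cannot be resolved on
// PATH. It never attempts to install or download anything: on the live ISO
// a missing tool means it was not baked into the image at build time.
func MissingTools(required []string) []string {
	var missing []string
	for _, name := range required {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}
```

A caller might pass names such as `nvidia-smi` or `storcli64` and record the result in the audit output, so that a gap in the ISO overlay shows up as a reported absence rather than as a silent collector failure.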

Out of scope

  • Any writes to the server being audited
  • Persistent network configuration changes
  • BMC/IPMI configuration
  • Anything requiring persistent storage on the audited machine
  • Windows support
  • Any functionality requiring internet access at boot
  • Component lifecycle/history across multiple snapshots
  • Status transition history (status_history, status_changed_at) derived from previous exports
  • Replacement detection between two or more audit runs

Contract boundary

  • bee is responsible for the current hardware snapshot only.
  • bee should populate current component state, hardware inventory, telemetry, and status_checked_at.
  • Historical status transitions and component replacement logic belong to the centralized ingest/lifecycle system, not to bee.
  • Contract fields that have no honest local source on a generic Linux host may remain empty.
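The boundary above can be illustrated with a simplified component record. Only `status_checked_at` is a field name taken from the text; `ComponentSnapshot`, `kind`, `serial`, and `status` are hypothetical stand-ins for whatever the contract actually defines. The sketch assumes that "may remain empty" can be expressed by omitting the field from the JSON.

```go
package main

import (
	"encoding/json"
	"time"
)

// ComponentSnapshot is a hypothetical, simplified component record.
// It carries current state only: there is deliberately no status_history
// or status_changed_at, since those belong to the ingest/lifecycle system.
type ComponentSnapshot struct {
	Kind            string    `json:"kind"`
	Serial          string    `json:"serial,omitempty"` // omitted when no honest local source exists
	Status          string    `json:"status"`
	StatusCheckedAt time.Time `json:"status_checked_at"`
}

// EncodeSnapshot serializes one component to JSON.
func EncodeSnapshot(c ComponentSnapshot) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

With `omitempty`, a component whose serial cannot be read locally is exported without a `serial` key instead of with a guessed or placeholder value.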

Tech stack

| Component | Technology |
|---|---|
| Audit binary | Go, static, CGO_ENABLED=0 |
| Live ISO | Debian 12 (bookworm), amd64 live-build image |
| ISO build | Debian live-build + overlay sync into config/includes.chroot/ |
| Init system | systemd |
| SSH | OpenSSH server |
| NVIDIA driver | Proprietary .run installer, built against Debian kernel headers |
| NVIDIA modules | Loaded via insmod from /usr/local/lib/nvidia/ |
| GPU stress backend | bee-gpu-stress + cuBLASLt/cuBLAS/cudart mixed-precision GEMM, with Driver API PTX fallback |
| Builder | Debian 12 host/VM or Debian 12 container image |

Operator UX

  • On the live ISO, tty1 autologins as bee
  • The login profile auto-runs menu, which enters the Go TUI
  • The TUI itself executes privileged actions as root via sudo -n
  • SSH remains available independently of the local console path
  • VM-oriented builds also include qemu-guest-agent and serial console support for debugging
  • The ISO boots with toram, so losing the original USB/BMC virtual media after boot should not break the runtime: the binaries are already held in RAM
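The "privileged actions via sudo -n" path can be sketched as a small wrapper around `exec.Command`. `PrivilegedCommand` is a hypothetical helper name, and the `--` separator is an addition of this sketch, not something the text specifies.

```go
package main

import "os/exec"

// PrivilegedCommand builds (but does not run) `sudo -n -- name args...`.
// With -n, sudo fails immediately instead of prompting for a password,
// so a TUI never blocks on a hidden prompt. The "--" stops sudo from
// interpreting any later argument as one of its own options.
func PrivilegedCommand(name string, args ...string) *exec.Cmd {
	full := append([]string{"-n", "--", name}, args...)
	return exec.Command("sudo", full...)
}
```

A caller would typically run the result with `CombinedOutput()` and show a non-zero exit to the operator, since a `sudo -n` failure usually indicates a missing sudoers rule rather than a hardware problem.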

Runtime split

  • The main Go application must run both on a normal Linux host and inside the live ISO
  • Live-ISO-only responsibilities stay in iso/ integration code
  • Live ISO launches the Go CLI with --runtime livecd
  • Local/manual runs use --runtime auto or --runtime local
  • Live ISO targets need enough RAM to hold the full compressed live medium plus the runtime working set, because the boot medium is copied into memory at startup
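The three `--runtime` values could be resolved along the following lines. How bee actually implements `auto` is not stated here; this sketch assumes detection via the `boot=live` kernel parameter that Debian live images normally boot with, and `ResolveRuntime` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// Runtime names the execution environment selected by --runtime.
type Runtime string

const (
	RuntimeAuto   Runtime = "auto"
	RuntimeLocal  Runtime = "local"
	RuntimeLiveCD Runtime = "livecd"
)

// ResolveRuntime maps a --runtime flag value to a concrete mode.
// kernelCmdline is expected to hold the contents of /proc/cmdline; it is
// passed in as a string so the function stays testable on any host.
// Assumption: "auto" treats a boot=live kernel parameter as the live ISO.
func ResolveRuntime(flagValue, kernelCmdline string) (Runtime, error) {
	switch Runtime(flagValue) {
	case RuntimeLocal, RuntimeLiveCD:
		return Runtime(flagValue), nil
	case RuntimeAuto:
		for _, field := range strings.Fields(kernelCmdline) {
			if field == "boot=live" {
				return RuntimeLiveCD, nil
			}
		}
		return RuntimeLocal, nil
	default:
		return "", fmt.Errorf("unknown runtime %q (want auto, local, or livecd)", flagValue)
	}
}
```

Keeping the explicit `livecd` value separate from `auto` lets the ISO integration code state its environment directly, so live-ISO behavior never depends on a heuristic.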

Key paths

| Path | Purpose |
|---|---|
| audit/cmd/bee/ | Main CLI entry point |
| audit/internal/collector/ | Per-subsystem collectors |
| audit/internal/schema/ | HardwareIngestRequest types |
| iso/builder/ | ISO build scripts and live-build profile |
| iso/overlay/ | Source overlay copied into a staged build overlay |
| iso/vendor/ | Optional pre-built vendor binaries (storcli64, sas2ircu, sas3ircu, arcconf, ssacli, …) |
| internal/chart/ | Git submodule with reanimator/chart, embedded into bee web |
| iso/builder/VERSIONS | Pinned versions: Debian, Go, NVIDIA driver, kernel ABI |
| iso/builder/smoketest.sh | Post-boot smoke test; run via SSH to verify the live ISO |
| iso/overlay/etc/profile.d/bee.sh | menu helper + tty1 auto-start policy |
| iso/overlay/home/bee/.profile | bee shell profile for local console startup |
| dist/ | Build outputs (gitignored) |
| iso/out/ | Downloaded ISO files (gitignored) |