# BEE — Build Plan Hardware audit LiveCD for offline server inventory. Produces `HardwareIngestRequest` JSON compatible with core/reanimator. **Principle:** OS-level collection — reads hardware directly, not through BMC. Fully unattended — no user interaction required at any stage. Boot → update → audit → output → done. All errors are logged, never presented interactively. Every failure path has a silent fallback. Fills the gaps where logpile/Redfish is blind: NVMe, DIMM serials, GPU serials, physical disks behind RAID, full SMART, NIC firmware. --- ## Phase 1 — Go Audit Binary Self-contained static binary. Runs on any Linux (including Alpine LiveCD). Calls system utilities, parses their output, produces `HardwareIngestRequest` JSON. ### 1.1 — Project scaffold - `audit/go.mod` — module `bee/audit` - `audit/cmd/audit/main.go` — CLI entry point: flags, orchestration, JSON output - `audit/internal/schema/` — copy of `HardwareIngestRequest` types from core (no import dependency) - `audit/internal/collector/` — empty package stubs for all collectors - `const Version = "1.0"` in main - Output modes: stdout (default), file path flag `--output /path/to/file.json` - Tests: none yet (stubs only) ### 1.2 — Board collector Source: `dmidecode -t 0` (BIOS), `-t 1` (System), `-t 2` (Baseboard) Collects: - `board.serial_number` — from System Information - `board.manufacturer`, `board.product_name` — from System Information - `board.part_number` — from Baseboard - `board.uuid` — from System Information - `firmware[BIOS]` — vendor, version, release date from BIOS Information Tests: table tests with `testdata/dmidecode_*.txt` fixtures ### 1.3 — CPU collector Source: `dmidecode -t 4` Collects: - socket index, model, manufacturer, status - cores, threads, current/max frequency - firmware: microcode version from `/sys/devices/system/cpu/cpu0/microcode/version` - serial: not available on Intel Xeon → fallback `-CPU-` (matches core logic) Tests: table tests with dmidecode fixtures ### 1.4 — Memory collector Source: `dmidecode -t 17` Collects: - slot, location, present flag - size_mb, type (DDR4/DDR5), max_speed_mhz, current_speed_mhz - manufacturer, serial_number, part_number - status from "Data Width" / "No Module Installed" detection Tests: table tests with dmidecode fixtures (populated + empty slots) ### 1.5 — Storage collector Sources: - `lsblk -J -o NAME,TYPE,SIZE,SERIAL,MODEL,TRAN,MOUNTPOINT` — device enumeration - `smartctl -j -i /dev/X` — serial, model, firmware, interface per device - `nvme id-ctrl /dev/nvmeX -o json` — NVMe: serial (sn), firmware (fr), model (mn), size Collects per device: - type: SSD/HDD/NVMe - model, serial_number, manufacturer, firmware - size_gb, interface (SATA/SAS/NVMe) - slot: from lsblk HCTL where available Tests: table tests with `smartctl -j` JSON fixtures and `nvme id-ctrl` JSON fixtures ### 1.6 — PCIe collector Sources: - `lspci -vmm -D` — slot, vendor, device, class - `lspci -vvv -D` — link width/speed (LnkSta, LnkCap) - embedded pci.ids (same submodule as logpile: `third_party/pciids`) — model name lookup - `/sys/bus/pci/devices//` — actual negotiated link state from kernel Collects per device: - bdf, vendor_id, device_id, device_class - manufacturer, model (via pciids lookup if empty) - link_width, link_speed, max_link_width, max_link_speed - serial_number: device-specific (see per-type enrichment in 1.8, 1.9) Dedup: by serial → bdf (mirrors logpile canonical device repository logic) Tests: table tests with lspci fixtures ### 1.7 — PSU collector Source: `ipmitool fru` — primary (only source for PSU data from OS) Fallback: `dmidecode -t 39` (System Power Supply, limited availability) Collects: - slot, present, model, vendor, serial_number, part_number - wattage_w from FRU or dmidecode - firmware from ipmitool FRU `Board Extra` fields - input_power_w, output_power_w, input_voltage from `ipmitool sdr` Tests: table tests with ipmitool fru text fixtures ### 1.8 — NVIDIA GPU enrichment Prerequisite: NVIDIA driver loaded (checked via `nvidia-smi -L` exit code) Sources: - `nvidia-smi --query-gpu=index,name,serial,vbios_version,temperature.gpu,power.draw,ecc.errors.uncorrected.aggregate.total --format=csv,noheader,nounits` - BDF correlation: `nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader` → match to PCIe collector records Enriches PCIe records for NVIDIA devices: - `serial_number` — real GPU serial from nvidia-smi - `firmware` — VBIOS version - `status` — derived from ECC uncorrected errors (0 = OK, >0 = WARNING) - telemetry: temperature_c, power_w added to PCIe record attributes Fallback (no driver): PCIe record stays as-is with serial fallback `-PCIE-` Tests: table tests with nvidia-smi CSV fixtures ### 1.8b — Component wear / age telemetry Every component that stores its own usage history must have that data collected and placed in the `attributes` / `telemetry` map of the respective record. This is a cross-cutting concern applied on top of the per-collector steps. **Storage (SATA/SAS) — smartctl:** - `Power_On_Hours` (attr 9) — total hours powered on - `Power_Cycle_Count` (attr 12) - `Reallocated_Sector_Ct` (attr 5) — wear indicator - `Wear_Leveling_Count` (attr 177, SSD) — NAND wear - `Total_LBAs_Written` (attr 241) — bytes written lifetime - `SSD_Life_Left` (attr 231) — % remaining if reported - Collected as `telemetry` map keys: `power_on_hours`, `power_cycles`, `reallocated_sectors`, `wear_leveling_pct`, `total_lba_written`, `life_remaining_pct` **NVMe — nvme smart-log:** - `power_on_hours` — lifetime hours - `power_cycles` - `unsafe_shutdowns` — abnormal power loss count - `percentage_used` — % of rated lifetime consumed (0–100) - `data_units_written` — 512KB units written lifetime - `controller_busy_time` — hours controller was busy - Collected via `nvme smart-log /dev/nvmeX -o json` **NVIDIA GPU — nvidia-smi:** - `ecc.errors.uncorrected.aggregate.total` — lifetime uncorrected ECC errors - `ecc.errors.corrected.aggregate.total` — lifetime corrected ECC errors - `clocks_throttle_reasons.hw_slowdown` — thermal/power throttle state - Stored in PCIe device `telemetry` **NIC SFP/QSFP transceivers — ethtool:** - `ethtool -m ` — DOM (Digital Optical Monitoring) if supported - Extracts: TX power, RX power, temperature, voltage, bias current - Also: `ethtool -i ` → firmware version - `ip -s link show ` → tx_packets, rx_packets, tx_errors, rx_errors (uptime proxy) - Stored in PCIe device `telemetry`: `sfp_temperature_c`, `sfp_tx_power_dbm`, `sfp_rx_power_dbm` **PSU — ipmitool sdr:** - Input/output power readings over time not stored by BMC (point-in-time only) - `ipmitool fru` may include manufacture date for age estimation - Stored: `input_power_w`, `output_power_w`, `input_voltage` (already in PSU schema) **All wear telemetry placement rules:** - Numeric wear indicators go into `telemetry` map (machine-readable, importable by core) - Boolean flags (throttle_active, ecc_errors_present) go into `attributes` map - Never flatten into named top-level fields not in the schema — use maps Tests: table tests for each SMART parser (JSON fixtures from smartctl/nvme smart-log) ### 1.9 — Mellanox/NVIDIA NIC enrichment Source: `mstflint -d q` — if mstflint present and device is Mellanox (vendor_id 0x15b3) Fallback: `ethtool -i ` — firmware-version field Enriches PCIe/NIC records: - `firmware` — from mstflint `FW Version` or ethtool `firmware-version` - `serial_number` — from mstflint `Board Serial Number` if available Detection: by PCI vendor_id (0x15b3 = Mellanox/NVIDIA Networking) from PCIe collector Tests: table tests with mstflint output fixtures ### 1.10 — RAID controller enrichment Source: tool selected by PCI vendor_id: | PCI vendor_id | Tool | Controller | |---|---|---| | 0x1000 | `storcli64 /c show all J` | Broadcom MegaRAID | | 0x1000 (SAS) | `sas2ircu display` / `sas3ircu display` | LSI SAS 2.x/3.x | | 0x9005 | `arcconf getconfig ` | Adaptec | | 0x103c | `ssacli ctrl slot= pd all show detail` | HPE Smart Array | Collects physical drives behind controller (not visible to OS as block devices): - serial_number, model, manufacturer, firmware - size_gb, interface (SAS/SATA), slot/bay - status (Online/Failed/Rebuilding → OK/CRITICAL/WARNING) No hardcoded vendor names in detection logic — pure PCI vendor_id map. Tests: table tests with storcli/sas2ircu text fixtures ### 1.11 — Output and USB write `--output stdout` (default): pretty-printed JSON to stdout `--output file:`: write JSON to explicit path `--output usb`: auto-detect first removable block device, mount it, write `audit--.json` USB detection: scan `/sys/block/*/removable`, pick first `1`, mount to `/tmp/bee-usb` QR summary to stdout (always): board serial + model + component counts — fits in one QR code Uses `qrencode` if present, else skips silently ### 1.12 — Integration test (local) `scripts/test-local.sh` — runs audit binary on developer machine (Linux), captures JSON, validates required fields are present (board.serial_number non-empty, cpus non-empty, etc.) Not a unit test — requires real hardware access. Documents how to run for verification. --- ## Phase 2 — Alpine LiveCD ISO image bootable via BMC virtual media. Runs audit binary automatically on boot. ### 2.1 — Builder environment `iso/builder/Dockerfile` — Alpine 3.21 build environment with: - `alpine-sdk`, `abuild`, `squashfs-tools`, `xorriso` - Go toolchain (for binary compilation inside builder) - NVIDIA driver `.run` pre-fetched during image build `iso/builder/build.sh` — orchestrates full ISO build: 1. Compile Go binary (static, `CGO_ENABLED=0`) 2. Compile NVIDIA kernel module against Alpine 3.21 LTS kernel headers 3. Run `mkimage.sh` with bee profile 4. Output: `dist/bee-.iso` ### 2.2 — NVIDIA driver build Alpine 3.21, LTS kernel 6.6 — fixed versions in builder. `iso/builder/build-nvidia.sh`: - Download `NVIDIA-Linux-x86_64-.run` (version pinned in `iso/builder/VERSIONS`) - Extract kernel module sources - Compile against `linux-lts-dev` headers - Strip and package as `nvidia--k6.6.ko.tar.gz` for inclusion in overlay `iso/overlay/usr/local/bin/load-nvidia.sh`: - `insmod` sequence: nvidia.ko → nvidia-modeset.ko → nvidia-uvm.ko - Verify: `nvidia-smi -L` → log result - On failure: log warning, continue (audit runs without GPU enrichment) ### 2.3 — Alpine mkimage profile `iso/builder/mkimg.bee.sh` — Alpine mkimage profile: - Base: `alpine-base` - Kernel: `linux-lts` - Packages: `dmidecode smartmontools nvme-cli pciutils ipmitool util-linux e2fsprogs qrencode` - Overlay: `iso/overlay/` included as apkovl ### 2.4 — Network bring-up on boot `iso/overlay/usr/local/bin/bee-network.sh`: - Enumerate all network interfaces: `ip link show` → filter out loopback and virtual (docker/bridge) - For each physical interface: `ip link set up` + `udhcpc -i -t 5 -T 3 -n` - Log each interface result (got IP / timeout / no carrier) - Continue regardless — network is best-effort for auto-update `iso/overlay/etc/init.d/bee-network`: - runlevel: default, before: bee-update - Calls bee-network.sh - Does not block boot if DHCP fails on all interfaces ### 2.5 — OpenRC boot service (bee-audit) `iso/overlay/etc/init.d/bee-audit`: - runlevel: default, after: bee-update - start(): load-nvidia.sh → /usr/local/bin/audit --output usb - on completion: print QR summary to /dev/tty1 (always, even if USB write failed) - log everything to /var/log/bee-audit.log - exits 0 regardless of partial failures — unattended, no prompts, no waits Unattended invariants: - No TTY prompts ever. All decisions are automatic. - Missing USB: output goes to /tmp/bee-audit--.json, QR shown on screen. - Missing NVIDIA driver: GPU records have status UNKNOWN, audit continues. - Missing ipmitool/storcli/any tool: that collector is skipped, rest continue. - Timeout on any external command: 30s hard limit via `timeout` wrapper, then skip. - Boot never hangs waiting for user input. `iso/overlay/etc/runlevels/default/bee-audit` symlink ### 2.6 — Vendor utilities in overlay `iso/overlay/usr/local/bin/` includes pre-fetched proprietary tools: - `storcli64` (Broadcom) - `sas2ircu`, `sas3ircu` (Broadcom/LSI) - `mstflint` (NVIDIA Networking / Mellanox) `scripts/fetch-vendor.sh` — downloads and places these before ISO build. Checksums verified. Tools not committed to git — fetched at build time. `iso/vendor/.gitkeep` — placeholder, directory gitignored except .gitkeep ### 2.7 — Auto-update of audit binary (USB + network) Two update paths, tried in order on every boot: **Path A — USB (no network required, higher priority):** `bee-update.sh` scans mounted removable media for an update package before checking network. Looks for: `/bee-update/bee-audit-linux-amd64` + `/bee-update/bee-audit-linux-amd64.sha256` Steps: 1. Find USB mount point (same detection as audit output: `/sys/block/*/removable`) 2. Check for `bee-update/bee-audit-linux-amd64` on the USB root 3. Read version from `bee-update/VERSION` file (plain text, e.g. `1.3`) 4. Compare with running binary version (`/usr/local/bin/audit --version`) 5. If USB version > running: verify SHA256 checksum, replace binary, log update 6. Re-run audit if updated **Authenticity verification — Ed25519 multi-key trust (stdlib only, no external tools):** Problem: SHA256 alone does not prevent a crafted attack — an attacker places their binary and a matching SHA256 next to it. The LiveCD would accept it. Solution: Ed25519 asymmetric signatures via Go stdlib `crypto/ed25519`. Multiple developer public keys are supported. A binary update is accepted if its signature verifies against ANY of the embedded trusted public keys. This mirrors the SSH authorized_keys model: add a developer → add their public key. Remove a developer → rebuild without their key. **Key management — centralized across all projects:** Public keys live in a dedicated repo at git.mchus.pro/mchus/keys (or similar): ``` keys/ developers/ mchusavitin.pub ← Ed25519 public key, base64, one line developer2.pub README.md ← how to generate a key pair ``` Public keys are safe to commit — they are not secret. Private keys stay on each developer's machine, never committed anywhere. Key generation (one-time per developer, run locally): ```sh # scripts/keygen.sh — also lives in the keys repo openssl genpkey -algorithm ed25519 -out ~/.bee-release.key openssl pkey -in ~/.bee-release.key -pubout -outform DER \ | tail -c 32 | base64 > mchusavitin.pub ``` **Embedding public keys at release time (not compile time):** Public keys are injected via `-ldflags` at build time from the keys repo. The binary does not hardcode keys — they are provided by the release script. ```go // audit/internal/updater/trust.go // trustedKeysRaw is injected at build time via -ldflags // format: base64(key1):base64(key2):... var trustedKeysRaw string func trustedKeys() ([]ed25519.PublicKey, error) { if trustedKeysRaw == "" { return nil, fmt.Errorf("binary built without trusted keys — updates disabled") } var keys []ed25519.PublicKey for _, enc := range strings.Split(trustedKeysRaw, ":") { b, err := base64.StdEncoding.DecodeString(strings.TrimSpace(enc)) if err != nil || len(b) != ed25519.PublicKeySize { return nil, fmt.Errorf("invalid trusted key: %w", err) } keys = append(keys, ed25519.PublicKey(b)) } return keys, nil } func verifySignature(binaryPath, sigPath string) error { keys, err := trustedKeys() if err != nil { return err } data, _ := os.ReadFile(binaryPath) sig, _ := os.ReadFile(sigPath) // 64 bytes raw Ed25519 signature for _, key := range keys { if ed25519.Verify(key, data, sig) { return nil // any trusted key accepts → pass } } return fmt.Errorf("signature verification failed: no trusted key matched") } ``` Release build injects keys: ```sh # scripts/build-release.sh KEYS=$(paste -sd: keys/developers/*.pub) go build -ldflags "-X bee/audit/internal/updater/trust.trustedKeysRaw=${KEYS}" \ -o dist/bee-audit-linux-amd64 ./cmd/audit ``` Signing (release engineer signs with their private key): ```sh # scripts/sign-release.sh openssl pkeyutl -sign -inkey ~/.bee-release.key \ -rawin -in "$1" -out "$1.sig" ``` Binary built without `-ldflags` injection (e.g. local dev build) has `trustedKeysRaw=""` → updates are disabled, logged as INFO, audit continues normally. Update rejected silently (logged as WARNING, audit continues with current binary) if: - `.sig` file missing - Signature does not match any trusted key - `trustedKeysRaw` empty (dev build) Update package layout on USB: ``` /bee-update/ bee-audit-linux-amd64 ← new binary (also signed with embedded keys) bee-audit-linux-amd64.sig ← Ed25519 signature (64 bytes raw) VERSION ← plain version string e.g. "1.3" ``` Admin workflow: download `bee-audit-linux-amd64` + `bee-audit-linux-amd64.sig` from Gitea release assets, place in `bee-update/` on USB. **Path B — Network (requires DHCP on at least one interface):** 1. Check network: ping git.mchus.pro -c 1 -W 3 || skip 2. Fetch: `GET https://git.mchus.pro/api/v1/repos//bee/releases/latest` 3. Parse tag_name, asset URLs for `bee-audit-linux-amd64` + `bee-audit-linux-amd64.sig` 4. Compare tag with running version 5. If newer: download both files to /tmp, verify Ed25519 signature against all trusted keys 6. Replace binary on pass, log and skip on fail 7. Re-run audit if updated **Ordering:** USB update checked first, network checked second. If USB update applied and verified, network check is skipped. `iso/overlay/etc/init.d/bee-update`: - runlevel: default - after: bee-network (network path needs interfaces up) - before: bee-audit (audit runs with latest binary) - Calls bee-update.sh Triggered after bee-audit completes, only if network is available. `iso/overlay/usr/local/bin/bee-update.sh`: ``` 1. Check network: ping git.mchus.pro -c 1 -W 3 || exit 0 2. Fetch latest release metadata: GET https://git.mchus.pro/api/v1/repos//bee/releases/latest 3. Parse: extract tag_name, asset URL for bee-audit-linux-amd64 4. Compare tag_name with /usr/local/bin/audit --version output 5. If newer: download to /tmp/bee-audit-new, verify SHA256 checksum from release assets 6. Replace /usr/local/bin/audit (tmpfs — survives until reboot) 7. Log: updated from vX.Y to vX.Z 8. Re-run audit if update happened: /usr/local/bin/audit --output usb ``` `iso/overlay/etc/init.d/bee-update`: - runlevel: default - after: bee-audit, network - Calls bee-update.sh Release naming convention: binary asset named `bee-audit-linux-amd64` per release tag. ### 2.8 — Release workflow `iso/builder/VERSIONS` — pinned versions: ``` AUDIT_VERSION=1.0 ALPINE_VERSION=3.21 KERNEL_VERSION=6.12 NVIDIA_DRIVER_VERSION=590.48.01 ``` LiveCD release = full ISO rebuild. Binary-only patch = new Gitea release with binary asset. On boot with network: ISO auto-patches its binary without full rebuild. ISO version embedded in `/etc/bee-release`: ``` BEE_ISO_VERSION=1.0 BEE_AUDIT_VERSION=1.0 BUILD_DATE=2026-03-05 ``` --- ## Eating order Builder environment is set up early (after 1.3) so every subsequent collector is developed and tested directly on real hardware in the actual Alpine environment. No "works on my Mac" drift. ``` 1.0 keys repo setup → git.mchus.pro/mchus/keys, keygen.sh, developer pubkeys 1.1 scaffold + schema types → binary runs, outputs empty JSON 1.2 board collector → first real data 1.3 CPU collector → +CPUs --- BUILDER + DEBUG ISO (unblock real-hardware testing) --- 2.1 builder VM setup → Alpine VM with build deps + Go toolchain 2.2 debug ISO profile → minimal Alpine ISO: audit binary + dropbear SSH + all packages 2.3 boot on real server → SSH in, verify packages present, run audit manually --- CONTINUE COLLECTORS (tested on real hardware from here) --- 1.4 memory collector → +DIMMs 1.5 storage collector → +disks (SATA/SAS/NVMe) 1.6 PCIe collector → +all PCIe devices 1.7 PSU collector → +power supplies 1.8 NVIDIA GPU enrichment → +GPU serial/VBIOS 1.8b wear/age telemetry → +SMART hours, NVMe % used, SFP DOM, ECC 1.9 Mellanox NIC enrichment → +NIC firmware/serial 1.10 RAID enrichment → +physical disks behind RAID 1.11 output + USB write → production-ready output --- PRODUCTION ISO --- 2.4 NVIDIA driver build → driver compiled into overlay 2.5 network bring-up on boot → DHCP on all interfaces 2.6 OpenRC boot service → audit runs on boot automatically 2.7 vendor utilities → storcli/sas2ircu/mstflint in image 2.8 auto-update → binary self-patches from Gitea 2.9 release workflow → versioning + release notes ```