docs: add bible-local with architecture and decisions, fix PLAN.md versions

- bible-local/architecture/system-overview.md: scope, tech stack, key paths
- bible-local/architecture/runtime-flows.md: boot sequence, ISO build, collector flow
- bible-local/decisions/2026-03-05-nvidia-proprietary-driver.md
- PLAN.md: update KERNEL_VERSION 6.6→6.12, NVIDIA 550.54.15→590.48.01

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

PLAN.md
@@ -485,8 +485,8 @@ Release naming convention: binary asset named `bee-audit-linux-amd64` per releas
 ```
 AUDIT_VERSION=1.0
 ALPINE_VERSION=3.21
-KERNEL_VERSION=6.6
-NVIDIA_DRIVER_VERSION=550.54.15
+KERNEL_VERSION=6.12
+NVIDIA_DRIVER_VERSION=590.48.01
 ```

 LiveCD release = full ISO rebuild. Binary-only patch = new Gitea release with binary asset.

bible-local/README.md
Normal file
@@ -0,0 +1,12 @@
# bee — Project Bible

Project-specific architecture, decisions, and runtime contracts.
Generic engineering rules live in `bible/rules/patterns/`.

## Files

| File | Contents |
|---|---|
| `architecture/system-overview.md` | What bee does, scope, tech stack |
| `architecture/runtime-flows.md` | Boot sequence, audit flow, service order |
| `decisions/` | Architectural decision log |

bible-local/architecture/runtime-flows.md
Normal file
@@ -0,0 +1,78 @@
# Runtime Flows — bee

## Boot sequence (debug ISO)

OpenRC default runlevel, service start order:

```
localmount
└── bee-sshsetup (creates bee user, sets password fallback)
└── dropbear (SSH on port 22 — starts regardless of network)
└── bee-network (udhcpc -b on all physical interfaces, non-blocking)
└── bee-nvidia (depmod -a, modprobe nvidia nvidia-modeset nvidia-uvm)
└── bee-audit-debug (runs audit binary, logs to /var/log/bee-audit.json)
```

**Critical invariants:**
- Dropbear MUST start without network. Custom init in overlay has `need localmount` only — NOT `need net`.
- bee-network uses `udhcpc -b` (background daemon) so it keeps retrying indefinitely and picks up a lease when a cable is connected later.
- bee-audit-debug uses `eend 0` always — never fails boot even if audit errors.

## ISO build sequence

```
build-debug.sh
  1. compile audit binary (skip if .go files older than binary)
  2. build-nvidia-module.sh:
     a. download NVIDIA .run installer (sha256 verified, cached)
     b. extract installer
     c. build kernel modules against linux-lts-dev headers
     d. extract nvidia-smi + libnvidia-ml from installer
     e. cache in dist/nvidia-<version>-<kver>/
  3. inject authorized_keys into overlay
  4. inject audit binary → overlay/usr/local/bin/audit
  5. inject NVIDIA .ko → overlay/lib/modules/<kver>/extra/nvidia/
  6. inject nvidia-smi → overlay/usr/local/bin/nvidia-smi
  7. copy mkimg profile + genapkovl to ~/.mkimage/ AND /var/tmp/
  8. mkimage.sh (from /var/tmp, TMPDIR=/var/tmp):
       kernel_* section — cached (linux-lts modloop, lz4 compressed)
       apks_* section — cached (downloaded packages)
       syslinux_* / grub_* — cached
       apkovl — always regenerated (genapkovl-bee_debug.sh)
       final ISO — always assembled
```

**Critical invariants:**
- `genapkovl-bee_debug.sh` must be in `/var/tmp/` (CWD when mkimage runs), not only `~/.mkimage/`.
- `TMPDIR=/var/tmp` required — tmpfs /tmp is only ~1GB, too small for kernel firmware.
- Workdir cleanup preserves `apks_*`, `kernel_*`, `syslinux_*`, `grub_*` — only clears apkovl and final image.
- `run-builder.sh` runs the build in a `screen` session so it survives SSH disconnects during long NVIDIA downloads.

## apkovl mechanism

The apkovl is a `.tar.gz` injected into the ISO at `/boot/`. Alpine's initramfs extracts it at boot, overlaying `/etc`, `/usr`, `/root` on the tmpfs root.

`genapkovl-bee_debug.sh` generates the tarball containing:
- `/etc/apk/world` — package list (apk installs these on first boot)
- `/etc/runlevels/*/` — OpenRC service symlinks
- `/etc/conf.d/dropbear` — DROPBEAR_OPTS="-R -B"
- `/etc/network/interfaces` — lo only (bee-network handles DHCP)
- `/etc/hostname`
- Everything from `iso/overlay-debug/` (init scripts, binaries, ssh keys)

## Collector flow

```
audit binary start
1. board collector (dmidecode -t 0,1,2)
2. cpu collector (dmidecode -t 4)
3. memory collector (dmidecode -t 17)
4. storage collector (lsblk -J, smartctl -j, nvme id-ctrl, nvme smart-log)
5. pcie collector (lspci -vmm -D, /sys/bus/pci/devices/)
6. psu collector (ipmitool fru — silent if no /dev/ipmi0)
7. nvidia enrichment (nvidia-smi — skipped if driver not loaded)
8. output JSON to stdout / file / usb
9. QR summary to stdout (qrencode if available)
```

Every collector returns `nil, nil` on tool-not-found. Errors are logged, never fatal.

bible-local/architecture/system-overview.md
Normal file
@@ -0,0 +1,58 @@
# System Overview — bee

## What it does

Hardware audit LiveCD. Boots on a server via BMC virtual media or USB.
Collects hardware inventory at OS level (not through BMC/Redfish).
Produces `HardwareIngestRequest` JSON compatible with core/reanimator.

## Why it exists

Fills gaps where Redfish/logpile is blind:
- NVMe serials and SMART data
- DIMM serials and slot layout
- GPU serials and VBIOS versions
- Physical disks behind RAID controllers
- Full SMART wear telemetry
- NIC firmware versions

## In scope

- Read-only hardware inventory: board, CPU, memory, storage, PCIe, PSU, GPU, NIC, RAID
- Unattended operation — no user interaction at any stage
- NVIDIA proprietary driver loaded at boot for GPU enrichment
- SSH access in debug ISO for development and testing
- Auto-update of audit binary from Gitea releases (production ISO)

## Out of scope

- Any writes to the server being audited
- Network configuration changes
- BMC/IPMI configuration
- Anything requiring persistent storage on the audited machine
- Windows support

## Tech stack

| Component | Technology |
|---|---|
| Audit binary | Go, static, `CGO_ENABLED=0` |
| LiveCD | Alpine Linux 3.21, linux-lts 6.12.x |
| ISO build | Alpine mkimage + apkovl overlay |
| Init system | OpenRC |
| SSH (debug) | Dropbear |
| NVIDIA driver | Proprietary `.run` installer, built against linux-lts headers |
| Builder VM | Alpine 3.21, 172.27.0.4 |

## Key paths

| Path | Purpose |
|---|---|
| `audit/cmd/audit/` | CLI entry point |
| `audit/internal/collector/` | Per-subsystem collectors |
| `audit/internal/schema/` | HardwareIngestRequest types |
| `iso/builder/` | ISO build scripts and mkimage profile |
| `iso/overlay-debug/` | Files injected into debug ISO via apkovl |
| `iso/builder/VERSIONS` | Pinned versions: Alpine, Go, NVIDIA driver |
| `dist/` | Build outputs (gitignored) |
| `iso/out/` | Downloaded ISO files (gitignored) |

bible-local/decisions/2026-03-05-nvidia-proprietary-driver.md
Normal file
@@ -0,0 +1,23 @@
# Decision: Use NVIDIA proprietary driver, not open kernel modules

**Date:** 2026-03-05
**Status:** active

## Context

bee needs to collect GPU serial numbers, VBIOS versions, and ECC telemetry via `nvidia-smi`.
Two options exist: NVIDIA open-gpu-kernel-modules (MIT/GPLv2, GitHub) or the official
proprietary `.run` installer.

## Decision

Use the official proprietary NVIDIA `.run` installer for both kernel modules and `nvidia-smi`.

## Consequences

- Kernel modules and nvidia-smi come from a single verified source.
- NVIDIA publishes `.sha256sum` alongside each installer — download and verify before use.
- Driver version pinned in `iso/builder/VERSIONS` as `NVIDIA_DRIVER_VERSION`.
- Build process: download `.run`, extract, compile `kernel/` sources against `linux-lts-dev`.
- Modules cached in `dist/nvidia-<version>-<kver>/` — rebuild only on version or kernel change.
- ISO size increases by ~50MB for .ko files + nvidia-smi.

bible-local/decisions/README.md
Normal file
@@ -0,0 +1,7 @@
# Architectural Decision Log

One file per decision, named `YYYY-MM-DD-short-topic.md`.

| Date | Decision | Status |
|---|---|---|
| 2026-03-05 | Use NVIDIA proprietary driver | active |
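
If the naming convention were ever checked automatically, a small validator along these lines might do. It is a sketch, not something the repository is known to contain: `validDecisionName` is a hypothetical name, and restricting the topic to lowercase letters, digits, and hyphens is an assumption the README does not state.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// decisionName matches YYYY-MM-DD-short-topic.md with a kebab-case topic.
// The lowercase kebab-case restriction is an assumption.
var decisionName = regexp.MustCompile(`^(\d{4}-\d{2}-\d{2})-[a-z0-9]+(?:-[a-z0-9]+)*\.md$`)

// validDecisionName checks both the overall shape and that the date prefix
// is a real calendar date.
func validDecisionName(name string) bool {
	m := decisionName.FindStringSubmatch(name)
	if m == nil {
		return false
	}
	_, err := time.Parse("2006-01-02", m[1])
	return err == nil
}

func main() {
	names := []string{
		"2026-03-05-nvidia-proprietary-driver.md",
		"README.md",
		"2026-13-05-bad-month.md",
	}
	for _, n := range names {
		fmt.Printf("%-45s %v\n", n, validDecisionName(n))
	}
}
```

The index file `README.md` intentionally fails the check, so a caller would need to exclude it before validating the directory.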