## ISO build consolidation

- Remove separate debug/prod split: overlay-debug/, build-debug.sh, mkimg.bee_debug.sh, genapkovl-bee_debug.sh all deleted
- Single overlay: iso/overlay/ (was overlay-debug content)
- Single build script: build.sh (SSH, TUI, NVIDIA, vendor tools, bee-release)
- Single mkimage profile: bee (with dropbear, dialog, strace, gcompat, etc.)

## NVIDIA fixes

- Modules now stored at /usr/local/lib/nvidia/ instead of /lib/modules/<kver>/extra/nvidia/: modloop squashfs mounts over that path at boot, making overlay content there inaccessible
- bee-nvidia init: load via insmod (absolute path), not modprobe
- bee-nvidia init: create libnvidia-ml.so.1/libcuda.so.1 symlinks in /usr/lib/
- build-nvidia-module.sh: always install linux-lts-dev (not conditional); stale 6.6.x headers caused wrong-kernel modules that never loaded at runtime
- build-nvidia-module.sh: create soname symlinks in cache
- KERNEL_VERSION in VERSIONS updated 6.6 → 6.12
- gcompat added to ISO packages (nvidia-smi is a glibc binary on musl Alpine)

## Service ordering

- bee-audit: add `after bee-nvidia` so NVIDIA enrichment always succeeds

## New tooling

- iso/builder/smoketest.sh: SSH smoke test for post-boot ISO validation
- iso/builder/build-gpu-burn.sh: builds gpu_burn vendor binary (CUDA 12.8+)
- vendor/gpu_burn included automatically if placed in iso/vendor/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Runtime Flows — bee
Boot sequence (single ISO)
OpenRC default runlevel, service start order:
localmount
├── bee-sshsetup (creates bee user, sets password; runs before dropbear)
│   └── dropbear (SSH on port 22, starts without network)
├── bee-network (udhcpc -b on all physical interfaces, non-blocking)
│   └── bee-nvidia (insmod nvidia*.ko from /usr/local/lib/nvidia/,
│       creates libnvidia-ml.so.1 symlinks in /usr/lib/)
│       └── bee-audit (runs audit binary → /var/log/bee-audit.json)
Critical invariants:
- Dropbear MUST start without network. `bee-sshsetup` has `need localmount` only.
- `bee-network` uses `udhcpc -b` (background), which retries indefinitely if no cable is connected.
- `bee-nvidia` loads modules via `insmod` with absolute paths, NOT `modprobe`. Reason: modloop squashfs mounts over `/lib/modules/<kver>/` at boot, making it read-only. The overlay's modules at that path are inaccessible. Modules are stored at `/usr/local/lib/nvidia/` (overlay path, always writable).
- `bee-nvidia` creates `libnvidia-ml.so.1` symlinks in `/usr/lib/`. This is required because `nvidia-smi` is a glibc binary that looks for the soname symlink, not the versioned file.
- `gcompat` package provides `/lib64/ld-linux-x86-64.so.2` for glibc compat on Alpine musl.
- `bee-audit` uses `after bee-nvidia`, which ensures NVIDIA enrichment succeeds.
- `bee-audit` uses `eend 0` always, so it never fails boot even if the audit errors.
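The insmod invariant above can be sketched as a small shell function. This is a minimal illustration, not the actual bee-nvidia init script: the module list and its order are assumptions (insmod does not resolve dependencies, so `nvidia.ko` has to go first), and `INSMOD` is a hypothetical override that allows a dry run with `INSMOD=echo`.

```shell
#!/bin/sh
# Sketch only: load NVIDIA modules by absolute path, in a fixed order.
# NVIDIA_MOD_DIR is the overlay path named in the invariants above.
# INSMOD is a hypothetical knob so the function can be dry-run.
NVIDIA_MOD_DIR="${NVIDIA_MOD_DIR:-/usr/local/lib/nvidia}"
INSMOD="${INSMOD:-insmod}"

load_nvidia_modules() {
    # Assumed module set; nvidia.ko first because the others depend on it.
    for mod in nvidia nvidia-modeset nvidia-uvm; do
        ko="$NVIDIA_MOD_DIR/$mod.ko"
        # Skip modules that were not built into the overlay.
        [ -f "$ko" ] || continue
        # Absolute path: never goes through /lib/modules/<kver>/.
        $INSMOD "$ko" || return 1
    done
    return 0
}
```

Running `INSMOD=echo load_nvidia_modules` prints the paths that would be loaded without touching the kernel.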
ISO build sequence
build.sh [--authorized-keys /path/to/keys]
1. compile audit binary (skip if .go files older than binary)
2. inject authorized_keys into overlay/root/.ssh/ (or set password fallback)
3. copy audit binary → overlay/usr/local/bin/audit
4. copy vendor binaries from iso/vendor/ → overlay/usr/local/bin/
(storcli64, sas2ircu, sas3ircu, mstflint, gpu_burn — each optional)
5. build-nvidia-module.sh:
a. apk add linux-lts-dev (always, to get current Alpine 3.21 kernel headers)
b. detect KVER from /usr/src/linux-headers-*
c. download NVIDIA .run installer (sha256 verified, cached in dist/)
d. extract installer
e. build kernel modules against linux-lts headers
f. create libnvidia-ml.so.1 / libcuda.so.1 symlinks in cache
g. cache in dist/nvidia-<version>-<kver>/
6. inject NVIDIA .ko → overlay/usr/local/lib/nvidia/
7. inject nvidia-smi → overlay/usr/local/bin/nvidia-smi
8. inject libnvidia-ml + libcuda → overlay/usr/lib/
9. write overlay/etc/bee-release (versions + git commit)
10. export BEE_BUILD_INFO for motd substitution
11. mkimage.sh (from /var/tmp, TMPDIR=/var/tmp):
kernel_* section — cached (linux-lts modloop)
apks_* section — cached (downloaded packages)
syslinux_* / grub_* — cached
apkovl — always regenerated (genapkovl-bee.sh)
final ISO — always assembled
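Step 5f (soname symlinks in the cache) could look roughly like the function below. It is a sketch under the assumption that the extracted libraries are named `libnvidia-ml.so.<version>` and `libcuda.so.<version>` with a dotted driver version; `link_sonames` is a hypothetical name, not a function from build-nvidia-module.sh.

```shell
#!/bin/sh
# Sketch only: create libnvidia-ml.so.1 and libcuda.so.1 next to the
# versioned libraries in DIR. The function name is hypothetical.
link_sonames() {
    dir="$1"
    for base in libnvidia-ml libcuda; do
        # Matches e.g. <base>.so.<major>.<minor>..., but not <base>.so.1
        for lib in "$dir/$base".so.*.*; do
            [ -f "$lib" ] || continue   # glob did not match anything
            # Relative link, so it stays valid after copying into the overlay.
            ln -sf "$(basename "$lib")" "$dir/$base.so.1"
        done
    done
}
```

A relative link target is used so the same cache directory can be copied to `overlay/usr/lib/` without rewriting the links.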
Critical invariants:
- `linux-lts-dev` is always installed (not conditional): stale 6.6.x headers on the builder would cause modules to be built for the wrong kernel and never load at runtime.
- NVIDIA modules go to `overlay/usr/local/lib/nvidia/`, NOT `lib/modules/<kver>/extra/`.
- `genapkovl-bee.sh` must be copied to `/var/tmp/` (CWD when mkimage runs).
- `TMPDIR=/var/tmp` required: tmpfs `/tmp` is only ~1GB, too small for kernel firmware.
- Workdir cleanup preserves `apks_*`, `kernel_*`, `syslinux_*`, `grub_*` cache dirs.
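The cache-preserving cleanup in the last invariant can be illustrated as follows. This is a sketch assuming the mkimage workdir is a single directory whose top-level entries are the section directories; `clean_workdir` is a hypothetical name and the real build.sh may do this differently.

```shell
#!/bin/sh
# Sketch only: remove everything in the workdir except the cached sections.
clean_workdir() {
    workdir="$1"
    [ -d "$workdir" ] || return 0
    for entry in "$workdir"/*; do
        # Handle an empty directory (unexpanded glob) and dangling symlinks.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        case "$(basename "$entry")" in
            apks_*|kernel_*|syslinux_*|grub_*)
                ;;                      # cached section: keep
            *)
                rm -rf "$entry"         # apkovl, ISO staging, etc.: rebuild
                ;;
        esac
    done
}
```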
gpu_burn vendor binary
gpu_burn requires CUDA nvcc to build. It is NOT built as part of the main ISO build.
Build separately on the builder VM and place in iso/vendor/gpu_burn:
sh iso/builder/build-gpu-burn.sh dist/
cp dist/gpu_burn iso/vendor/gpu_burn
cp dist/compare.ptx iso/vendor/compare.ptx
Requires: CUDA 12.8+ (supports GCC 14, Alpine 3.21), libxml2, g++, make, git.
build.sh includes it automatically if iso/vendor/gpu_burn exists.
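The "include if present" behaviour for vendor binaries might be implemented along these lines. This is a hedged sketch: the tool names come from the build sequence above, while `copy_vendor_tools` and its log messages are hypothetical, and handling of `compare.ptx` is left out because its destination in the overlay is not stated here.

```shell
#!/bin/sh
# Sketch only: copy each optional vendor binary if it exists.
# Usage (paths as in the build sequence):
#   copy_vendor_tools iso/vendor iso/overlay/usr/local/bin
copy_vendor_tools() {
    src="$1"
    dst="$2"
    mkdir -p "$dst" || return 1
    for tool in storcli64 sas2ircu sas3ircu mstflint gpu_burn; do
        if [ -f "$src/$tool" ]; then
            cp "$src/$tool" "$dst/$tool" || return 1
            chmod +x "$dst/$tool"
            echo "vendor: included $tool"
        else
            # A missing tool is not an error; every vendor binary is optional.
            echo "vendor: $tool not found, skipping"
        fi
    done
}
```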
Post-boot smoke test
After booting a live ISO, run to verify all critical components:
ssh root@<ip> 'sh -s' < iso/builder/smoketest.sh
Exit code 0 = all required checks pass. The output must contain zero FAIL lines before shipping.
Key checks: NVIDIA modules loaded, nvidia-smi sees all GPUs, lib symlinks present, gcompat installed, services running, audit completed with NVIDIA enrichment, internet.
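A PASS/FAIL check harness of the kind described could be structured as below. This is an illustrative sketch, not the contents of smoketest.sh: the helper names `check` and `smoke_result` and the output format are assumptions.

```shell
#!/bin/sh
# Sketch only: run named checks, count results, derive the exit status.
PASS=0
FAIL=0

check() {
    desc="$1"
    shift
    # Run the remaining arguments as a command; only its status matters.
    if "$@" >/dev/null 2>&1; then
        PASS=$((PASS + 1))
        echo "PASS  $desc"
    else
        FAIL=$((FAIL + 1))
        echo "FAIL  $desc"
    fi
}

smoke_result() {
    echo "passed=$PASS failed=$FAIL"
    # Status 0 only when no check failed.
    [ "$FAIL" -eq 0 ]
}

# Example checks (assumed, run on the booted ISO):
#   check "nvidia module loaded"  grep -q '^nvidia ' /proc/modules
#   check "nvidia-smi lists GPUs" nvidia-smi -L
#   check "soname symlink"        test -L /usr/lib/libnvidia-ml.so.1
#   smoke_result || exit 1
```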
apkovl mechanism
The apkovl is a .tar.gz injected into the ISO at /boot/. Alpine initramfs extracts
it at boot, overlaying /etc, /usr, /root, /lib on the tmpfs root.
genapkovl-bee.sh generates the tarball containing:
- `/etc/apk/world`: package list (apk installs on first boot)
- `/etc/runlevels/*/`: OpenRC service symlinks
- `/etc/conf.d/dropbear`: `DROPBEAR_OPTS="-R -B"`
- `/etc/network/interfaces`: lo only (bee-network handles DHCP)
- `/etc/hostname`
- Everything from `iso/overlay/` (init scripts, binaries, ssh keys, tui)
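To make the tarball layout concrete, here is a minimal sketch of how such an apkovl could be assembled. It is not genapkovl-bee.sh: the function name, the package subset written to `world`, the single dropbear runlevel link, and the hostname `bee` are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch only: stage the files listed above and pack them as a .tar.gz.
make_apkovl() {
    overlay="$1"   # directory to merge in, e.g. iso/overlay
    out="$2"       # output tarball path
    stage=$(mktemp -d) || return 1

    mkdir -p "$stage/etc/apk" "$stage/etc/conf.d" \
             "$stage/etc/network" "$stage/etc/runlevels/default"

    # Assumed package subset; the real world file is longer.
    printf '%s\n' alpine-base dropbear dialog strace gcompat \
        > "$stage/etc/apk/world"
    # OpenRC enables a service via a symlink in the runlevel directory.
    ln -s /etc/init.d/dropbear "$stage/etc/runlevels/default/dropbear"
    echo 'DROPBEAR_OPTS="-R -B"' > "$stage/etc/conf.d/dropbear"
    # Loopback only; DHCP is handled by bee-network.
    printf 'auto lo\niface lo inet loopback\n' > "$stage/etc/network/interfaces"
    echo "bee" > "$stage/etc/hostname"   # hostname is an assumption

    # Merge the overlay tree on top of the generated files.
    cp -a "$overlay/." "$stage/" || { rm -rf "$stage"; return 1; }

    # The redirection is opened before the subshell changes directory,
    # so a relative $out still resolves against the caller's directory.
    (cd "$stage" && tar -czf - .) > "$out"
    status=$?
    rm -rf "$stage"
    return "$status"
}
```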
Collector flow
audit binary start
1. board collector (dmidecode -t 0,1,2)
2. cpu collector (dmidecode -t 4)
3. memory collector (dmidecode -t 17)
4. storage collector (lsblk -J, smartctl -j, nvme id-ctrl, nvme smart-log)
5. pcie collector (lspci -vmm -D, /sys/bus/pci/devices/)
6. psu collector (ipmitool fru — silent if no /dev/ipmi0)
7. nvidia enrichment (nvidia-smi — skipped if binary absent or driver not loaded)
8. output JSON → /var/log/bee-audit.json
9. QR summary to stdout (qrencode if available)
Every collector returns nil, nil on tool-not-found. Errors are logged, never fatal.