Runtime Flows — bee
Network isolation — CRITICAL
The live CD runs in an isolated network segment with no internet access.
All binaries, kernel modules, and tools must be baked into the ISO at build time.
No apk add, no downloads, no package manager calls are allowed at boot.
DHCP is used only for LAN (operator SSH access). Internet is NOT available.
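One way to guard the no-network rule at build time is a small lint over the scripts that run at boot. The sketch below is hypothetical: the function name find_network_calls and the pattern list are assumptions, not part of the bee repo. It prints every line that looks like a package-manager call or a download, including matches inside comments, so treat hits as candidates for review.

```sh
# Hypothetical build-time lint: flag boot scripts that would need internet.
# The pattern list is an assumption; extend it to match your own tooling.
find_network_calls() {
    dir="$1"
    # -r: recurse, -n: show line numbers, -E: extended regex.
    # Exit status 0 (with matches printed) means a suspicious call was found.
    grep -rnE '(^|[^[:alnum:]_-])(apk[[:space:]]+(add|update|upgrade)|wget|curl)([[:space:]]|$)' "$dir"
}
```

Point it at the directory that holds the init scripts and fail the build when it returns 0.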
Boot sequence (single ISO)
OpenRC default runlevel, service start order:
localmount
├── bee-sshsetup (creates bee user, sets password; runs before dropbear)
│   └── dropbear (SSH on port 22, starts without network)
├── bee-network (udhcpc -b on all physical interfaces, non-blocking)
│   └── bee-nvidia (insmod nvidia*.ko from /usr/local/lib/nvidia/,
│                   creates libnvidia-ml.so.1 symlinks in /usr/lib/)
│       └── bee-audit (runs audit binary → /var/log/bee-audit.json)
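As a rough illustration of the bee-network step, the sketch below enumerates physical interfaces and starts a background DHCP client on each. It assumes that a hardware-backed interface exposes a device link under /sys/class/net/<iface>/, which lo and most virtual devices lack. The function names and the DHCP_CLIENT override are hypothetical and exist only to make the sketch testable; the real init script may be structured differently.

```sh
# list_physical_ifaces [sysfs-net-dir]: print interfaces that have a
# "device" entry, i.e. are backed by real hardware.
list_physical_ifaces() {
    sysnet="${1:-/sys/class/net}"   # overridable for testing
    for path in "$sysnet"/*; do
        [ -e "$path/device" ] || continue
        basename "$path"
    done
}

# start_dhcp_all [sysfs-net-dir]: start one DHCP client per physical NIC.
start_dhcp_all() {
    # DHCP_CLIENT is a hypothetical hook; it defaults to BusyBox udhcpc.
    for iface in $(list_physical_ifaces "$1"); do
        # -b: move to the background if no lease is obtained right away,
        # so a missing cable never blocks the boot.
        ${DHCP_CLIENT:-udhcpc} -b -i "$iface"
    done
}
```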
Critical invariants:
- Dropbear MUST start without network. bee-sshsetup has need localmount only.
- bee-network uses udhcpc -b (background): retries indefinitely if no cable.
- bee-nvidia loads modules via insmod with absolute paths, NOT modprobe. Reason: modloop squashfs mounts over /lib/modules/<kver>/ at boot, making it read-only. The overlay's modules at that path are inaccessible. Modules are stored at /usr/local/lib/nvidia/ (overlay path, always writable).
- bee-nvidia creates libnvidia-ml.so.1 symlinks in /usr/lib/, required because nvidia-smi is a glibc binary that looks for the soname symlink, not the versioned file.
- gcompat package provides /lib64/ld-linux-x86-64.so.2 for glibc compat on Alpine musl.
- bee-audit uses after bee-nvidia, which ensures NVIDIA enrichment succeeds.
- bee-audit uses eend 0 always: never fails boot even if audit errors.
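The two bee-nvidia invariants can be sketched as plain shell functions. This is a minimal illustration, not the actual init script: the function names, the INSMOD override, and the explicit load order are assumptions. The order reflects that nvidia.ko has to be loaded before the modules that depend on it.

```sh
# Sketch of the bee-nvidia steps; names and load order are assumptions.
NVIDIA_DIR="${NVIDIA_DIR:-/usr/local/lib/nvidia}"

load_nvidia_modules() {
    # nvidia.ko goes first: the other modules depend on its symbols.
    for mod in nvidia nvidia-modeset nvidia-uvm nvidia-drm; do
        ko="$NVIDIA_DIR/$mod.ko"
        [ -f "$ko" ] || continue
        # insmod takes a file path, so it does not need anything under
        # /lib/modules/<kver>/, which the read-only modloop covers.
        ${INSMOD:-insmod} "$ko" || return 1
    done
}

# link_soname <libdir> <base>: point <base>.so.1 at the versioned library,
# e.g. libnvidia-ml.so.1 -> libnvidia-ml.so.<driver version>.
link_soname() {
    libdir="$1"; base="$2"
    for f in "$libdir/$base".so.*.*; do
        [ -f "$f" ] || continue
        ln -sf "$(basename "$f")" "$libdir/$base.so.1"
        return 0
    done
    return 1   # no versioned library found
}
```

A call such as link_soname /usr/lib libnvidia-ml would then create the soname symlink that nvidia-smi looks for.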
ISO build sequence
build.sh [--authorized-keys /path/to/keys]
  1. compile audit binary (skip if .go files older than binary)
  2. inject authorized_keys into overlay/root/.ssh/ (or set password fallback)
  3. copy audit binary → overlay/usr/local/bin/audit
  4. copy vendor binaries from iso/vendor/ → overlay/usr/local/bin/
     (storcli64, sas2ircu, sas3ircu, mstflint, gpu_burn; each optional)
  5. build-nvidia-module.sh:
     a. apk add linux-lts-dev (always, to get current Alpine 3.21 kernel headers)
     b. detect KVER from /usr/src/linux-headers-*
     c. download NVIDIA .run installer (sha256 verified, cached in dist/)
     d. extract installer
     e. build kernel modules against linux-lts headers
     f. create libnvidia-ml.so.1 / libcuda.so.1 symlinks in cache
     g. cache in dist/nvidia-<version>-<kver>/
  6. inject NVIDIA .ko → overlay/usr/local/lib/nvidia/
  7. inject nvidia-smi → overlay/usr/local/bin/nvidia-smi
  8. inject libnvidia-ml + libcuda → overlay/usr/lib/
  9. write overlay/etc/bee-release (versions + git commit)
  10. export BEE_BUILD_INFO for motd substitution
  11. mkimage.sh (from /var/tmp, TMPDIR=/var/tmp):
      kernel_* section: cached (linux-lts modloop)
      apks_* section: cached (downloaded packages)
      syslinux_* / grub_*: cached
      apkovl: always regenerated (genapkovl-bee.sh)
      final ISO: always assembled
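Two of the steps above lend themselves to small helpers: the staleness check in step 1 and the checksum verification in step 5c. The sketch below shows one possible shape for each. The function names are hypothetical, and the real build.sh and build-nvidia-module.sh may implement these checks differently.

```sh
# needs_rebuild <binary> <srcdir>: succeed (exit 0) when the binary is
# missing or any .go file under <srcdir> is newer than it.
needs_rebuild() {
    bin="$1"; src="$2"
    [ -f "$bin" ] || return 0
    [ -n "$(find "$src" -name '*.go' -newer "$bin" | head -n 1)" ]
}

# verify_sha256 <file> <expected-hex>: succeed only if the file's SHA-256
# digest equals the pinned value.
verify_sha256() {
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}
```

A caller would typically write needs_rebuild "$bin" "$src" && rebuild, and delete a cached installer that fails verify_sha256 before downloading it again.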
Critical invariants:
- KERNEL_PKG_VERSION in iso/builder/VERSIONS pins the exact Alpine package version (e.g. 6.12.76-r0). This version is used in THREE places that MUST stay in sync:
  - build-nvidia-module.sh: apk add linux-lts-dev=${KERNEL_PKG_VERSION} (compile headers)
  - mkimg.bee.sh: linux-lts=${KERNEL_PKG_VERSION} in apks list (ISO kernel)
  - build.sh: build-time verification that headers match pin (fails loudly if not)

  When Alpine releases a new linux-lts patch (e.g. r0 → r1), update KERNEL_PKG_VERSION in VERSIONS; that's the only place to change. The build will fail loudly if the pin doesn't match the installed headers, so stale pins are caught immediately.
- All three must use the same APK mirror: dl-cdn.alpinelinux.org. Both build-nvidia-module.sh (apk add) and mkimage.sh (--repository) explicitly use https://dl-cdn.alpinelinux.org/alpine/v${ALPINE_VERSION}/main|community. Never use the builder's local /etc/apk/repositories: its mirror may serve a different package state, causing "unable to select package" failures.
- linux-lts-dev is always installed (not conditional). Stale 6.6.x headers on the builder would cause modules to be built for the wrong kernel and never load at runtime.
- NVIDIA modules go to overlay/usr/local/lib/nvidia/, NOT lib/modules/<kver>/extra/.
- genapkovl-bee.sh must be copied to /var/tmp/ (CWD when mkimage runs).
- TMPDIR=/var/tmp required: tmpfs /tmp is only ~1GB, too small for kernel firmware.
- Workdir cleanup preserves apks_*, kernel_*, syslinux_*, grub_* cache dirs.
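The mirror and pin invariants above could be expressed with two small helpers. This is a sketch under stated assumptions: repo_args and check_kernel_pin are hypothetical names, and how the installed header version is obtained is left to the caller.

```sh
ALPINE_MIRROR="https://dl-cdn.alpinelinux.org/alpine"

# repo_args <alpine-version>: print one --repository flag per line for
# the main and community repositories on the pinned mirror.
repo_args() {
    for repo in main community; do
        printf '%s %s/v%s/%s\n' --repository "$ALPINE_MIRROR" "$1" "$repo"
    done
}

# check_kernel_pin <pinned> <installed>: fail loudly when the installed
# linux-lts-dev version differs from the pin in VERSIONS.
check_kernel_pin() {
    if [ "$1" != "$2" ]; then
        echo "ERROR: linux-lts-dev is $2 but VERSIONS pins $1" >&2
        return 1
    fi
}
```

Because both the apk add call and the mkimage.sh call would take their flags from the same function, the two cannot drift onto different mirrors.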
gpu_burn vendor binary
gpu_burn requires CUDA nvcc to build. It is NOT built as part of the main ISO build.
Build separately on the builder VM and place in iso/vendor/gpu_burn:
sh iso/builder/build-gpu-burn.sh dist/
cp dist/gpu_burn iso/vendor/gpu_burn
cp dist/compare.ptx iso/vendor/compare.ptx
Requires: CUDA 12.8+ (supports GCC 14, Alpine 3.21), libxml2, g++, make, git.
build.sh includes it automatically if iso/vendor/gpu_burn exists.
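The "each optional" behaviour for vendor binaries might look like the following sketch. The function name copy_vendor_bins and the log messages are assumptions; only the list of binary names comes from the build sequence.

```sh
# copy_vendor_bins <vendor-dir> <dest-dir>: copy whichever optional vendor
# binaries are present. A missing binary is reported and skipped.
copy_vendor_bins() {
    src="$1"; dst="$2"
    mkdir -p "$dst"
    for bin in storcli64 sas2ircu sas3ircu mstflint gpu_burn; do
        if [ -f "$src/$bin" ]; then
            cp "$src/$bin" "$dst/$bin"
            chmod +x "$dst/$bin"
            echo "vendor: included $bin"
        else
            echo "vendor: $bin not found, skipping"
        fi
    done
}
```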
Post-boot smoke test
After booting a live ISO, run the smoke test to verify all critical components:
ssh root@<ip> 'sh -s' < iso/builder/smoketest.sh
Exit code 0 = all required checks pass. The output must contain zero FAIL lines before shipping.
Key checks: NVIDIA modules loaded, nvidia-smi sees all GPUs, lib symlinks present, gcompat installed, services running, audit completed with NVIDIA enrichment, internet.
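The PASS/FAIL convention can be illustrated with a tiny check helper. This is not the contents of smoketest.sh, which is not reproduced here. The helper name, the output format, and the example checks in the comments are assumptions.

```sh
FAILS=0

# check <label> <command...>: run the command quietly, print PASS or FAIL,
# and count failures so the script can exit non-zero at the end.
check() {
    label="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $label"
    else
        echo "FAIL  $label"
        FAILS=$((FAILS + 1))
    fi
}

# Example checks, commented out so the sketch has no side effects:
# check "nvidia module loaded" grep -q '^nvidia ' /proc/modules
# check "nvidia-smi lists GPUs" nvidia-smi -L
# check "soname symlink"        test -L /usr/lib/libnvidia-ml.so.1
# [ "$FAILS" -eq 0 ]   # as the last command, this sets the exit code
```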
apkovl mechanism
The apkovl is a .tar.gz injected into the ISO at /boot/. Alpine initramfs extracts
it at boot, overlaying /etc, /usr, /root, /lib on the tmpfs root.
genapkovl-bee.sh generates the tarball containing:
- /etc/apk/world: package list (apk installs on first boot)
- /etc/runlevels/*/: OpenRC service symlinks
- /etc/conf.d/dropbear: DROPBEAR_OPTS="-R -B"
- /etc/network/interfaces: lo only (bee-network handles DHCP)
- /etc/hostname
- Everything from iso/overlay/ (init scripts, binaries, ssh keys, tui)
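To make the tarball layout concrete, here is a minimal sketch of an apkovl generator. It is hypothetical and much smaller than a real genapkovl script: the function name make_apkovl, the two-package world list, and the single enabled service are placeholders. It relies on Alpine's convention of naming the file <hostname>.apkovl.tar.gz.

```sh
# make_apkovl <hostname> <overlay-dir> <out-dir>
# Hypothetical helper; the real genapkovl-bee.sh may differ in every detail.
make_apkovl() {
    host="$1"; overlay="$2"; out="$3"
    tmp=$(mktemp -d) || return 1

    mkdir -p "$tmp/etc/apk" "$tmp/etc/conf.d" "$tmp/etc/network" \
             "$tmp/etc/runlevels/default"
    echo "$host" > "$tmp/etc/hostname"
    echo 'DROPBEAR_OPTS="-R -B"' > "$tmp/etc/conf.d/dropbear"
    printf 'auto lo\niface lo inet loopback\n' > "$tmp/etc/network/interfaces"
    # Placeholder package list (assumption); the real world file is longer.
    printf 'alpine-base\ndropbear\n' > "$tmp/etc/apk/world"
    # An OpenRC service is enabled by a symlink in the runlevel directory.
    ln -s /etc/init.d/dropbear "$tmp/etc/runlevels/default/dropbear"

    # Layer the static overlay tree on top, if one was given.
    if [ -d "$overlay" ]; then
        cp -a "$overlay/." "$tmp/" || { rm -rf "$tmp"; return 1; }
    fi

    # Pack everything relative to the staging root and compress it.
    tar -cf - -C "$tmp" . | gzip -9 > "$out/$host.apkovl.tar.gz"
    rc=$?
    rm -rf "$tmp"
    return "$rc"
}
```

Running make_apkovl bee iso/overlay /var/tmp would leave /var/tmp/bee.apkovl.tar.gz, which tar -tzf can list for inspection.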
Collector flow
audit binary start
1. board collector (dmidecode -t 0,1,2)
2. cpu collector (dmidecode -t 4)
3. memory collector (dmidecode -t 17)
4. storage collector (lsblk -J, smartctl -j, nvme id-ctrl, nvme smart-log)
5. pcie collector (lspci -vmm -D, /sys/bus/pci/devices/)
6. psu collector (ipmitool fru — silent if no /dev/ipmi0)
7. nvidia enrichment (nvidia-smi — skipped if binary absent or driver not loaded)
8. output JSON → /var/log/bee-audit.json
9. QR summary to stdout (qrencode if available)
Every collector returns nil, nil on tool-not-found. Errors are logged, never fatal.
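The audit binary itself is written in Go, so the following is only a shell analogue of the same contract, written to show the behaviour rather than the implementation. A missing tool yields an empty result without an error, and a failing tool is logged but never aborts the run. The name run_collector is hypothetical.

```sh
# run_collector <name> <tool> [args...]: print the tool's output, or nothing
# if the tool is not installed. Always returns 0.
run_collector() {
    name="$1"; shift
    if ! command -v "$1" >/dev/null 2>&1; then
        return 0    # tool not found: empty result, no error
    fi
    # A failing tool is logged to stderr; the caller carries on regardless.
    "$@" 2>/dev/null || echo "collector $name: $1 failed" >&2
    return 0
}
```

For example, run_collector psu ipmitool fru prints nothing and succeeds on a machine without ipmitool.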