bee/bible-local/architecture/runtime-flows.md
Mikhail Chusavitin 1feb956e30 Fix: use dl-cdn.alpinelinux.org everywhere for consistent package resolution
Both build-nvidia-module.sh (apk add) and mkimage.sh (--repository) now
explicitly use dl-cdn. Local builder mirror config is ignored.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 11:28:14 +03:00


Runtime Flows — bee

Network isolation — CRITICAL

The live CD runs in an isolated network segment with no internet access. All binaries, kernel modules, and tools must be baked into the ISO at build time. No apk add, no downloads, no package manager calls are allowed at boot. DHCP is used only for LAN (operator SSH access). Internet is NOT available.
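Every tool must already be on the ISO, so a build-time guard can catch a stray download before it ships. The sketch below is minimal and assumes the init scripts live under iso/overlay/etc/init.d/. The function name and the pattern list are hypothetical. It greps the scripts for package-manager or download calls and fails if any match.

```sh
#!/bin/sh
# Hypothetical build-time guard: fail if any boot-time script would reach
# for the network or a package manager.
# Usage: check_no_boot_downloads <directory of init scripts>
check_no_boot_downloads() {
    dir=$1
    [ -d "$dir" ] || { echo "ERROR: $dir is not a directory" >&2; return 2; }
    # Assumed pattern list; extend it with any other fetch tools in use.
    # -r: recurse, -n: print line numbers, -E: extended regex.
    if grep -rnE 'apk (add|update|upgrade)|wget|curl' "$dir"; then
        echo "ERROR: package-manager or download call found under $dir" >&2
        return 1
    fi
    return 0
}
```

For example, build.sh could run check_no_boot_downloads iso/overlay/etc/init.d before assembling the overlay. That placement is hypothetical.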

Boot sequence (single ISO)

OpenRC default runlevel, service start order:

localmount
  ├── bee-sshsetup   (creates bee user, sets password; runs before dropbear)
  │     └── dropbear  (SSH on port 22 — starts without network)
  ├── bee-network    (udhcpc -b on all physical interfaces, non-blocking)
  │     └── bee-nvidia  (insmod nvidia*.ko from /usr/local/lib/nvidia/,
  │                      creates libnvidia-ml.so.1 symlinks in /usr/lib/)
  │           └── bee-audit  (runs audit binary → /var/log/bee-audit.json)
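The bee-network step can be sketched as follows. This is not the shipped init script. Several parts are assumptions:

- the function names
- the SYSFS_NET and UDHCPC overrides, added so the logic can be exercised off-target
- the rule that an interface is physical when /sys/class/net/<if>/device exists

The -b flag is what keeps boot non-blocking.

```sh
#!/bin/sh
# Sketch of bee-network's start logic (assumed, not the shipped script):
# run udhcpc in background mode on every physical interface.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}
UDHCPC=${UDHCPC:-udhcpc}

# Physical NICs have a "device" link in sysfs; lo, bridges and other
# virtual interfaces do not.
list_physical_ifaces() {
    for path in "$SYSFS_NET"/*; do
        [ -e "$path/device" ] || continue
        echo "${path##*/}"
    done
}

start_dhcp_all() {
    for iface in $(list_physical_ifaces); do
        # -b: background if no lease is obtained yet, so boot never blocks
        #     and the client keeps retrying until a cable is plugged in.
        # -i: interface to configure.
        $UDHCPC -b -i "$iface"
    done
}
```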

Critical invariants:

  • Dropbear MUST start without network. bee-sshsetup declares only need localmount, with no network dependency.
  • bee-network uses udhcpc -b (background) — retries indefinitely if no cable.
  • bee-nvidia loads modules via insmod with absolute paths — NOT modprobe. Reason: modloop squashfs mounts over /lib/modules/<kver>/ at boot, making it read-only. The overlay's modules at that path are inaccessible. Modules are stored at /usr/local/lib/nvidia/ (overlay path, always writable).
  • bee-nvidia creates libnvidia-ml.so.1 symlinks in /usr/lib/ — required because nvidia-smi is a glibc binary that looks for the soname symlink, not the versioned file.
  • gcompat package provides /lib64/ld-linux-x86-64.so.2 for glibc compat on Alpine musl.
  • bee-audit declares after bee-nvidia, so the NVIDIA modules and library symlinks are in place before the audit's NVIDIA enrichment runs.
  • bee-audit always ends with eend 0, so an audit error never fails the boot.
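Here is a sketch of the bee-nvidia start logic under these invariants. The function names are hypothetical, and INSMOD can be overridden for dry runs. The module list and its order (nvidia, then nvidia-modeset, nvidia-drm, nvidia-uvm) are an assumption based on the usual NVIDIA module dependencies. Unlike modprobe, insmod does not load dependencies, so the order matters.

```sh
#!/bin/sh
# Sketch of bee-nvidia's start logic (assumed, not the shipped script).
NVIDIA_DIR=${NVIDIA_DIR:-/usr/local/lib/nvidia}
LIB_DIR=${LIB_DIR:-/usr/lib}
INSMOD=${INSMOD:-insmod}

load_nvidia_modules() {
    # insmod does not resolve dependencies: load nvidia first,
    # then the modules that depend on it.
    for mod in nvidia nvidia-modeset nvidia-drm nvidia-uvm; do
        ko="$NVIDIA_DIR/$mod.ko"
        [ -f "$ko" ] || continue      # module not shipped: skip it
        $INSMOD "$ko" || return 1     # absolute path, never modprobe
    done
    return 0
}

link_sonames() {
    for lib in libnvidia-ml libcuda; do
        # Match the versioned file, e.g. libnvidia-ml.so.<driver version>;
        # the .so.*.* pattern does not match an existing .so.1 link.
        for real in "$LIB_DIR/$lib".so.*.*; do
            [ -f "$real" ] || continue
            # Relative target, so the link resolves inside LIB_DIR.
            ln -sf "${real##*/}" "$LIB_DIR/$lib.so.1"
            break
        done
    done
}
```

In an OpenRC start() these would typically be wrapped as ebegin "Loading NVIDIA modules", then load_nvidia_modules && link_sonames, then eend $?.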

ISO build sequence

build.sh [--authorized-keys /path/to/keys]
  1. compile audit binary (skip if .go files older than binary)
  2. inject authorized_keys into overlay/root/.ssh/ (or set password fallback)
  3. copy audit binary → overlay/usr/local/bin/audit
  4. copy vendor binaries from iso/vendor/ → overlay/usr/local/bin/
     (storcli64, sas2ircu, sas3ircu, mstflint, gpu_burn — each optional)
  5. build-nvidia-module.sh:
       a. apk add linux-lts-dev=${KERNEL_PKG_VERSION} from dl-cdn (always, to get the pinned Alpine 3.21 kernel headers)
       b. detect KVER from /usr/src/linux-headers-*
       c. download NVIDIA .run installer (sha256 verified, cached in dist/)
       d. extract installer
       e. build kernel modules against linux-lts headers
       f. create libnvidia-ml.so.1 / libcuda.so.1 symlinks in cache
       g. cache in dist/nvidia-<version>-<kver>/
  6. inject NVIDIA .ko → overlay/usr/local/lib/nvidia/
  7. inject nvidia-smi → overlay/usr/local/bin/nvidia-smi
  8. inject libnvidia-ml + libcuda → overlay/usr/lib/
  9. write overlay/etc/bee-release (versions + git commit)
  10. export BEE_BUILD_INFO for motd substitution
  11. mkimage.sh (from /var/tmp, TMPDIR=/var/tmp):
        kernel_* section  — cached (linux-lts modloop)
        apks_* section    — cached (downloaded packages)
        syslinux_* / grub_* — cached
        apkovl            — always regenerated (genapkovl-bee.sh)
        final ISO         — always assembled

Critical invariants:

  • KERNEL_PKG_VERSION in iso/builder/VERSIONS pins the exact Alpine package version (e.g. 6.12.76-r0). Three places consume this value and MUST stay in sync with it:
    1. build-nvidia-module.sh: apk add linux-lts-dev=${KERNEL_PKG_VERSION} (compile headers)
    2. mkimg.bee.sh: linux-lts=${KERNEL_PKG_VERSION} in the apks list (ISO kernel)
    3. build.sh: build-time check that the installed headers match the pin (fails loudly if not)
    When Alpine releases a new linux-lts patch (e.g. r0 → r1), update KERNEL_PKG_VERSION in VERSIONS. That is the only place to change. The build fails loudly when the pin doesn't match the installed headers, so a stale pin is caught immediately.
  • Both package fetches must use the same APK mirror, dl-cdn.alpinelinux.org. build-nvidia-module.sh (apk add) and mkimage.sh (--repository) both explicitly use the main and community repositories under https://dl-cdn.alpinelinux.org/alpine/v${ALPINE_VERSION}/. Never rely on the builder's local /etc/apk/repositories. Its mirror may serve a different package state, causing "unable to select package" failures.
  • linux-lts-dev is always installed (not conditional) — stale 6.6.x headers on the builder would cause modules to be built for the wrong kernel and never load at runtime.
  • NVIDIA modules go to overlay/usr/local/lib/nvidia/ — NOT lib/modules/<kver>/extra/.
  • genapkovl-bee.sh must be copied to /var/tmp/ (CWD when mkimage runs).
  • TMPDIR=/var/tmp required — tmpfs /tmp is only ~1GB, too small for kernel firmware.
  • Workdir cleanup preserves apks_*, kernel_*, syslinux_*, grub_* cache dirs.
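The pin check and mirror selection can be sketched as shown below. The function names are hypothetical. check_kernel_pin takes the installed linux-lts-dev version as an argument because how build.sh reads that version is not described here.

```sh
#!/bin/sh
# Sketch of the pin check and mirror selection (hypothetical names).
# ALPINE_VERSION and KERNEL_PKG_VERSION would come from iso/builder/VERSIONS.
ALPINE_MIRROR=https://dl-cdn.alpinelinux.org/alpine

# Print --repository flags pointing explicitly at dl-cdn main and community.
repo_flags() {
    v=$1
    printf '%s\n' "--repository $ALPINE_MIRROR/v$v/main --repository $ALPINE_MIRROR/v$v/community"
}

# Fail loudly when the installed headers package differs from the pin.
check_kernel_pin() {
    pin=$1 installed=$2
    if [ "$pin" != "$installed" ]; then
        echo "KERNEL_PKG_VERSION=$pin but linux-lts-dev $installed is installed;" \
            "update iso/builder/VERSIONS" >&2
        return 1
    fi
}
```

For example, apk add $(repo_flags "$ALPINE_VERSION") "linux-lts-dev=$KERNEL_PKG_VERSION". The command substitution is left unquoted on purpose so the flags split into separate words. The URLs contain no spaces.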

gpu_burn vendor binary

gpu_burn requires CUDA nvcc to build. It is NOT built as part of the main ISO build. Build it separately on the builder VM and place the outputs in iso/vendor/:

sh iso/builder/build-gpu-burn.sh dist/
cp dist/gpu_burn iso/vendor/gpu_burn
cp dist/compare.ptx iso/vendor/compare.ptx

Requires: CUDA 12.8+ (supports GCC 14, Alpine 3.21), libxml2, g++, make, git. build.sh includes gpu_burn automatically if iso/vendor/gpu_burn exists.
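The "include it if it exists" behavior can be sketched with a hypothetical helper. The same behavior applies to the step 4 vendor tools. The paths in the usage line come from the build sequence. The doc does not say where compare.ptx lands in the overlay, so the helper leaves it out.

```sh
#!/bin/sh
# Hypothetical helper: copy each named vendor tool into the overlay only if
# it exists, so every vendor binary stays optional.
# Usage: copy_optional_vendor <src dir> <dst dir> <name...>
copy_optional_vendor() {
    src=$1 dst=$2
    shift 2
    mkdir -p "$dst"
    for name in "$@"; do
        if [ -f "$src/$name" ]; then
            cp "$src/$name" "$dst/$name"
            chmod 755 "$dst/$name"
            echo "included $name"
        else
            echo "skipped $name (not in $src)"
        fi
    done
}
```

For example: copy_optional_vendor iso/vendor overlay/usr/local/bin storcli64 sas2ircu sas3ircu mstflint gpu_burn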

Post-boot smoke test

After booting a live ISO, run the smoke test to verify all critical components:

ssh root@<ip> 'sh -s' < iso/builder/smoketest.sh

Exit code 0 means all required checks passed. The output must contain zero FAIL lines before shipping.

Key checks: NVIDIA modules loaded, nvidia-smi sees all GPUs, lib symlinks present, gcompat installed, services running, audit completed with NVIDIA enrichment, internet.
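One way to structure PASS/FAIL checks with a meaningful exit code is shown below. It is a hypothetical sketch, not the contents of smoketest.sh.

```sh
#!/bin/sh
# Hypothetical PASS/FAIL harness; the real iso/builder/smoketest.sh may
# be structured differently.
FAILS=0

# check <name> <command...>: run the command quietly, report PASS or FAIL.
check() {
    name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $name"
    else
        echo "FAIL $name"
        FAILS=$((FAILS + 1))
    fi
}

# Succeeds only when no check failed.
summary() {
    echo "$FAILS failure(s)"
    [ "$FAILS" -eq 0 ]
}
```

Example checks in this style:

- check nvidia-loaded sh -c 'lsmod | grep -q "^nvidia "'
- check gcompat test -e /lib64/ld-linux-x86-64.so.2

End the script with summary so that its status becomes the script's exit code.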

apkovl mechanism

The apkovl is a .tar.gz injected into the ISO at /boot/. Alpine initramfs extracts it at boot, overlaying /etc, /usr, /root, /lib on the tmpfs root.

genapkovl-bee.sh generates the tarball containing:

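  • /etc/apk/world: package list (apk installs these at boot from the package repository bundled on the ISO, never from the network)
  • /etc/runlevels/*/: OpenRC service symlinks
  • /etc/conf.d/dropbear: DROPBEAR_OPTS="-R -B" (-R creates host keys as required, -B allows blank-password logins)
  • /etc/network/interfaces: lo only (bee-network handles DHCP)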
  • /etc/hostname
  • Everything from iso/overlay/ (init scripts, binaries, ssh keys, tui)
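The following is a reduced sketch of what an apkovl generator does with the items above. It is not genapkovl-bee.sh. The function name and arguments are hypothetical. For brevity it omits /etc/network/interfaces and the iso/overlay/ contents. OpenRC enables a service in a runlevel through a symlink to its /etc/init.d script, and the loop creates those symlinks.

```sh
#!/bin/sh
# Hypothetical apkovl builder: stage /etc files, enable services, tar up.
# Usage: make_apkovl <out.tar.gz> <hostname> "<services>" <packages...>
make_apkovl() {
    out=$1 host=$2 services=$3
    shift 3
    stage=$(mktemp -d) || return 1
    mkdir -p "$stage/etc/apk" "$stage/etc/runlevels/default" "$stage/etc/conf.d"
    echo "$host" > "$stage/etc/hostname"
    # One package name per line, read by apk at boot.
    printf '%s\n' "$@" > "$stage/etc/apk/world"
    # Enable each service in the default runlevel via a symlink.
    for svc in $services; do
        ln -s "/etc/init.d/$svc" "$stage/etc/runlevels/default/$svc"
    done
    echo 'DROPBEAR_OPTS="-R -B"' > "$stage/etc/conf.d/dropbear"
    tar -czf "$out" -C "$stage" etc
    rc=$?
    rm -rf "$stage"
    return $rc
}
```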

Collector flow

audit binary start
  1. board collector   (dmidecode -t 0,1,2)
  2. cpu collector     (dmidecode -t 4)
  3. memory collector  (dmidecode -t 17)
  4. storage collector (lsblk -J, smartctl -j, nvme id-ctrl, nvme smart-log)
  5. pcie collector    (lspci -vmm -D, /sys/bus/pci/devices/)
  6. psu collector     (ipmitool fru — silent if no /dev/ipmi0)
  7. nvidia enrichment (nvidia-smi — skipped if binary absent or driver not loaded)
  8. output JSON → /var/log/bee-audit.json
  9. QR summary to stdout (qrencode if available)

Every collector returns nil, nil when its tool is not found. Other errors are logged and never fatal.
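The audit binary is written in Go, but the same rule can be shown as a shell analogue with a hypothetical function name. A missing tool yields no output and no error. A failing tool is reported on stderr without stopping the run.

```sh
#!/bin/sh
# Shell analogue (not the audit binary's Go code) of the collector rule.
# Usage: run_collector <name> <tool> [args...]
run_collector() {
    name=$1 tool=$2
    shift 2
    if ! command -v "$tool" >/dev/null 2>&1; then
        return 0    # tool not found: contribute nothing, like returning nil, nil
    fi
    if ! "$tool" "$@"; then
        echo "collector $name: $tool exited non-zero (ignored)" >&2
    fi
    return 0        # never fatal
}
```

For example: run_collector board dmidecode -t 0,1,2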