# Runtime Flows — bee

## Network isolation — CRITICAL

**The live CD runs in an isolated network segment with no internet access.**
All binaries, kernel modules, and tools must be baked into the ISO at build time.
No package installation, no downloads, and no package manager calls are allowed at boot.
DHCP is used only for LAN (operator SSH access). Internet is NOT available.

## Boot sequence (single ISO)

`systemd` boot order:

```
local-fs.target
├── bee-sshsetup.service   (enables SSH key auth; password fallback only if marker exists)
│   └── ssh.service        (OpenSSH on port 22 — starts without network)
├── bee-network.service    (starts `dhclient -nw` on all physical interfaces, non-blocking)
├── bee-nvidia.service     (insmod nvidia*.ko from /usr/local/lib/nvidia/,
│                           creates /dev/nvidia* nodes)
└── bee-audit.service      (runs `bee audit` → /var/log/bee-audit.json,
                            never blocks boot on partial collector failures)
```

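
The `bee-network.service` step in the tree above could look roughly like the following. This is a minimal sketch, not the actual service script: the function names, the `SYSFS_NET` and `DHCLIENT` overrides, and the rule "an interface is physical if it has a `device` entry in sysfs" are all assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical sketch of a non-blocking DHCP bring-up; not the real service script.

SYSFS_NET="${SYSFS_NET:-/sys/class/net}"   # overridable for dry runs (assumption)
DHCLIENT="${DHCLIENT:-dhclient}"           # overridable for dry runs (assumption)

# Print the names of interfaces backed by real hardware.
# Assumption: a physical NIC exposes a "device" entry in sysfs,
# while lo, bridges and other virtual interfaces do not.
list_physical_ifaces() {
    for path in "$SYSFS_NET"/*; do
        [ -e "$path/device" ] || continue
        basename "$path"
    done
}

# Start a DHCP client on every physical interface without waiting for a lease.
start_dhcp_all() {
    for iface in $(list_physical_ifaces); do
        # -nw: return immediately instead of blocking until an address is acquired
        "$DHCLIENT" -nw "$iface" || echo "warn: dhclient failed on $iface" >&2
    done
    return 0   # best effort: a DHCP failure must not fail the unit
}
```

Calling `start_dhcp_all` from the unit's start command would keep boot moving even when no DHCP server answers, which is the behaviour the tree describes as non-blocking.
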
**Critical invariants:**
- OpenSSH MUST start without network. `bee-sshsetup.service` runs before `ssh.service`.
- `bee-network.service` uses `dhclient -nw` (background); network bring-up is best-effort and non-blocking.
- `bee-nvidia.service` loads modules via `insmod` with absolute paths, NOT `modprobe`.
  Reason: the modules are shipped in the ISO overlay under `/usr/local/lib/nvidia/`, not in the standard module tree under `/lib/modules/<kver>/` that `modprobe` searches.
- `bee-audit.service` does not wait for `network-online.target`; audit is local and must run even if DHCP is broken.
- `bee-audit.service` logs audit failures but does not turn partial collector problems into a boot blocker.

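
The `insmod`-by-absolute-path invariant can be sketched as below. This is an illustration under stated assumptions, not the shipped service: the module list and its order (`nvidia.ko` first, because the other modules depend on it), the function name, and the `MODULE_DIR` and `INSMOD` overrides are hypothetical. Creating the `/dev/nvidia*` nodes is not covered here.

```sh
#!/bin/sh
# Hypothetical sketch: load NVIDIA modules by absolute path, never via modprobe.

MODULE_DIR="${MODULE_DIR:-/usr/local/lib/nvidia}"
INSMOD="${INSMOD:-insmod}"   # overridable so the logic can be dry-run (assumption)

load_nvidia_modules() {
    # Assumed order: the core module first, then the modules that depend on it.
    for name in nvidia nvidia-modeset nvidia-uvm; do
        ko="$MODULE_DIR/$name.ko"
        if [ ! -f "$ko" ]; then
            echo "skip: $ko not found" >&2
            continue
        fi
        # insmod takes a file path and does no lookup in /lib/modules/<kver>/
        "$INSMOD" "$ko" || echo "warn: failed to load $ko" >&2
    done
    return 0
}
```

Because `insmod` does not resolve dependencies the way `modprobe` does, the caller has to supply the load order explicitly, which is why the loop is ordered rather than a plain glob over `nvidia*.ko`.
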
## ISO build sequence

```
build.sh [--authorized-keys /path/to/keys]
  1. compile `bee` binary (skip if .go files older than binary)
  2. create a temporary overlay staging dir under `dist/`
  3. inject authorized_keys into staged `root/.ssh/` (or set password fallback marker)
  4. copy `bee` binary → staged `/usr/local/bin/bee`
  5. copy vendor binaries from `iso/vendor/` → staged `/usr/local/bin/`
     (`storcli64`, `sas2ircu`, `sas3ircu`, `mstflint` — each optional)
  6. `build-nvidia-module.sh`:
       a. install Debian kernel headers if missing
       b. download NVIDIA `.run` installer (sha256 verified, cached in `dist/`)
       c. extract installer
       d. build kernel modules against Debian headers
       e. create `libnvidia-ml.so.1` / `libcuda.so.1` symlinks in cache
       f. cache in `dist/nvidia-<version>-<kver>/`
  7. inject NVIDIA `.ko` → staged `/usr/local/lib/nvidia/`
  8. inject `nvidia-smi` → staged `/usr/local/bin/nvidia-smi`
  9. inject `libnvidia-ml` + `libcuda` → staged `/usr/lib/`
 10. write staged `/etc/bee-release` (versions + git commit)
 11. patch staged `motd` with build metadata
 12. copy `iso/builder/` into a temporary live-build workdir under `dist/`
 13. sync staged overlay into workdir `config/includes.chroot/`
 14. run `lb config && lb build` inside the temporary workdir
     (either on a Debian host/VM or inside the privileged builder container)
```

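
Step 1 skips compilation when no `.go` file is newer than the binary. One way to express that check is shown below; the function name and its arguments are hypothetical, and the real `build.sh` may implement the test differently.

```sh
#!/bin/sh
# Hypothetical sketch of the "skip if .go files older than binary" check.

# needs_rebuild BINARY SRC_DIR
# Succeeds (exit 0) when the binary is missing or any .go file under SRC_DIR
# is newer than it; fails (exit 1) when the binary is up to date.
needs_rebuild() {
    binary="$1"
    src_dir="$2"
    [ -f "$binary" ] || return 0
    # find prints a path for every .go file modified after the binary
    newer=$(find "$src_dir" -name '*.go' -newer "$binary" | head -n 1)
    [ -n "$newer" ]
}

# Example call site (paths are placeholders):
#   if needs_rebuild "$BIN" "$SRC"; then go build -o "$BIN" "$SRC"; fi
```
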
**Critical invariants:**
- `DEBIAN_KERNEL_ABI` in `iso/builder/VERSIONS` pins the exact kernel ABI used in BOTH places:
  1. `setup-builder.sh` / `build-in-container.sh` / `build-nvidia-module.sh`: Debian kernel headers for the module build
  2. `auto/config`: `linux-image-${DEBIAN_KERNEL_ABI}` in the ISO
- NVIDIA modules go to staged `usr/local/lib/nvidia/`, NOT to `/lib/modules/<kver>/extra/`.
- The source overlay in `iso/overlay/` is treated as immutable source. Build-time files are injected only into the staged overlay.
- The live-build workdir under `dist/` is disposable; source files under `iso/builder/` stay clean.
- Container build requires `--privileged` because `live-build` uses mounts/chroots/loop devices during ISO assembly.

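
To illustrate why a single `DEBIAN_KERNEL_ABI` value keeps the headers and the shipped kernel in lockstep, the sketch below derives both package names from one file. It assumes `VERSIONS` uses plain `KEY=value` lines and that the headers package follows the same `linux-headers-<abi>` pattern as the image; neither detail is confirmed by this document, and the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch: derive both kernel package names from one pinned value.

# kernel_packages VERSIONS_FILE
# Prints the headers package, then the image package, one per line.
kernel_packages() {
    versions_file="$1"
    # Assumption: the file contains a line like DEBIAN_KERNEL_ABI=<value>
    abi=$(sed -n 's/^DEBIAN_KERNEL_ABI=//p' "$versions_file" | tr -d '"' | head -n 1)
    if [ -z "$abi" ]; then
        echo "DEBIAN_KERNEL_ABI not set in $versions_file" >&2
        return 1
    fi
    echo "linux-headers-$abi"   # assumed name, used for the module build
    echo "linux-image-$abi"     # name given in auto/config
}
```

If the two names were ever computed from different values, the modules built in step 6 would not load on the kernel that ends up in the ISO.
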
## Post-boot smoke test

After booting a live ISO, run the smoke test to verify all critical components:

```sh
ssh root@<ip> 'sh -s' < iso/builder/smoketest.sh
```

Exit code 0 means all required checks passed. The output must contain zero `FAIL` lines before shipping.

Key checks: NVIDIA modules loaded, `nvidia-smi` sees all GPUs, lib symlinks present,
systemd services running, audit completed with NVIDIA enrichment, LAN reachability.

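
If the smoke-test output is saved to a file, the "zero `FAIL` lines" rule can be checked mechanically. The helper below is a sketch: the function names are hypothetical, and it assumes every failing check prints a line containing the literal word `FAIL`.

```sh
#!/bin/sh
# Hypothetical sketch: gate shipping on the number of FAIL lines in a saved log.

# count_fail_lines: read smoke-test output on stdin, print how many lines contain FAIL.
count_fail_lines() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that status.
    grep -c 'FAIL' || true
}

# ship_gate LOGFILE: succeed only when the log is readable and has no FAIL lines.
ship_gate() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    fails=$(count_fail_lines < "$1")
    if [ "$fails" -ne 0 ]; then
        echo "$fails FAIL line(s) found, do not ship" >&2
        return 1
    fi
    echo "no FAIL lines"
}
```

The log would come from redirecting the `ssh` command above into a file; the exit code of the smoke test itself should still be checked separately.
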
## Overlay mechanism

`live-build` copies files from `config/includes.chroot/` into the ISO filesystem.
`build.sh` prepares a staged overlay, then syncs it into a temporary workdir's
`config/includes.chroot/` before running `lb build`.

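
The sync step can be pictured as a plain recursive copy from the staging directory into the workdir. The sketch below uses `cp -Rp` for portability; the actual `build.sh` may use a different tool, and the function name and arguments are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch of syncing a staged overlay into a live-build workdir.

# sync_overlay STAGE_DIR WORKDIR
sync_overlay() {
    stage="$1"
    dest="$2/config/includes.chroot"
    [ -d "$stage" ] || { echo "staging dir $stage missing" >&2; return 1; }
    mkdir -p "$dest"
    # "stage/." copies the directory contents (dotfiles included), not the directory itself;
    # -R recurses, -p preserves modes and timestamps.
    cp -Rp "$stage/." "$dest/"
}
```

A file staged as `usr/local/bin/bee` then lands at `config/includes.chroot/usr/local/bin/bee` in the workdir, which `live-build` maps to `/usr/local/bin/bee` in the ISO filesystem.
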
## Collector flow

```
`bee audit` start
  1. board collector      (dmidecode -t 0,1,2)
  2. cpu collector        (dmidecode -t 4)
  3. memory collector     (dmidecode -t 17)
  4. storage collector    (lsblk -J, smartctl -j, nvme id-ctrl, nvme smart-log)
  5. pcie collector       (lspci -vmm -D, /sys/bus/pci/devices/)
  6. psu collector        (ipmitool fru — silent if no /dev/ipmi0)
  7. nvidia enrichment    (nvidia-smi — skipped if binary absent or driver not loaded)
  8. output JSON → /var/log/bee-audit.json
  9. QR summary to stdout (qrencode if available)
```

Every collector returns `nil, nil` on tool-not-found. Errors are logged, never fatal.
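
The `nil, nil` convention refers to the collectors' own code inside `bee`. The shell sketch below only mirrors the same control flow for illustration: a missing tool yields no data and no error, and a failing tool is logged without aborting the run. The function name `run_collector` is hypothetical.

```sh
#!/bin/sh
# Hypothetical shell analogue of the collector error policy; not bee's implementation.

# run_collector NAME TOOL [ARGS...]
# Prints the tool's output. Always returns 0 so one collector cannot stop the audit.
run_collector() {
    name="$1"
    tool="$2"
    shift 2
    if ! command -v "$tool" >/dev/null 2>&1; then
        return 0   # tool not found: no data, no error
    fi
    if ! "$tool" "$@"; then
        echo "collector $name: $tool failed" >&2   # logged, never fatal
    fi
    return 0
}
```

With this shape, a call such as `run_collector psu ipmitool fru` contributes nothing on a machine without `ipmitool`, and the remaining collectors still run.
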