build-nvidia-module.sh:
- Replace silent glob cp for libnvidia-ml/libcuda with find + explicit error
if library not found in extract dir (catches installer layout changes)
- Fix circular symlink bug: don't create .so.1 -> .so.1 if versioned file
is already named .so.1
- Verify .ko count > 0 after build, fail loudly if none produced
- Show lib cache in final summary
bee-nvidia:
- mknod failures are now logged with ewarn instead of silently suppressed
- If nvidia is absent from /proc/devices (no GPU hardware), log that clearly and exit cleanly
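A minimal sketch of the find-with-error lookup and the self-symlink guard described above. The function names `find_lib` and `link_soname` are hypothetical and are not taken from build-nvidia-module.sh; the real script may structure this differently.

```shell
#!/bin/sh
# Hypothetical helper: locate a versioned library under an extract directory
# and fail loudly, instead of letting a glob cp silently match nothing.
find_lib() {
    dir=$1 pattern=$2
    found=$(find "$dir" -type f -name "$pattern" | head -n 1)
    if [ -z "$found" ]; then
        echo "ERROR: $pattern not found under $dir (installer layout changed?)" >&2
        return 1
    fi
    printf '%s\n' "$found"
}

# Hypothetical helper: create a soname symlink next to the versioned file.
# If the versioned file already carries the soname, do nothing; linking it
# would produce a link that points at itself.
link_soname() {
    versioned=$1 soname=$2
    base=$(basename "$versioned")
    if [ "$base" = "$soname" ]; then
        return 0
    fi
    ln -sf "$base" "$(dirname "$versioned")/$soname"
}
```

Typical use would be `lib=$(find_lib "$EXTRACT_DIR" 'libnvidia-ml.so.*') || exit 1` followed by `link_soname "$lib" libnvidia-ml.so.1`.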
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gcompat alone provides only the ELF interpreter entry point (/lib64/ld-linux-x86-64.so.2).
It does NOT provide libpthread.so.0, libm.so.6, libdl.so.2, libc.so.6 stubs.
libnvidia-ml.so.590 has NEEDED: libpthread.so.0 etc. When nvidia-smi calls
dlopen("libnvidia-ml.so.1"), musl's linker fails to satisfy these deps
→ NVML_ERROR_LIBRARY_NOT_FOUND (exit 12), "couldn't find libnvidia-ml.so".
libc6-compat provides the missing stubs (libpthread.so.0, libm.so.6, libdl.so.2,
libc.so.6, librt.so.1) as musl redirects, enabling dlopen of glibc shared objects.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Alpine uses mdev, which has no rules for NVIDIA devices. Without /dev/nvidiactl
and /dev/nvidia{0-7}, nvidia-smi returns NVML_ERROR_LIBRARY_NOT_FOUND (exit 12)
even though kernel modules are loaded and libraries are present.
Fix: after insmod, read major numbers from /proc/devices and mknod the required
character devices (/dev/nvidiactl, /dev/nvidia{0-7}, /dev/nvidia-uvm).
Add /dev/nvidia* node checks to smoketest for earlier failure detection.
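A sketch of the major-number lookup, assuming the usual /proc/devices layout (a "Character devices:" section followed by "Block devices:"). `chardev_major` and `make_nvidia_nodes` are hypothetical names, and the file argument exists only so the parser can be tried against a sample file. Minor 255 for /dev/nvidiactl follows NVIDIA's documented convention; treat the exact mknod calls as an assumption about what bee-nvidia does.

```shell
#!/bin/sh
# Print the character-device major number registered under a given name.
# Stops at the "Block devices:" header so block majors are never matched.
chardev_major() {
    name=$1 file=${2:-/proc/devices}
    awk -v n="$name" '
        /^Block devices:/ { exit }
        $2 == n { print $1; found = 1; exit }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Create the control node and per-GPU nodes, warning on failure rather than
# suppressing it. Not invoked here; it needs root and loaded modules.
make_nvidia_nodes() {
    major=$(chardev_major nvidia) || {
        echo "nvidia not in /proc/devices (no GPU hardware?)" >&2
        return 0
    }
    [ -e /dev/nvidiactl ] || mknod -m 666 /dev/nvidiactl c "$major" 255 \
        || echo "WARN: mknod /dev/nvidiactl failed" >&2
    i=0
    while [ "$i" -le 7 ]; do
        [ -e "/dev/nvidia$i" ] || mknod -m 666 "/dev/nvidia$i" c "$major" "$i" \
            || echo "WARN: mknod /dev/nvidia$i failed" >&2
        i=$((i + 1))
    done
}
```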
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The static KERNEL_PKG_VERSION pin was the root cause of nvidia-smi never
working: modules were compiled for the pinned version (e.g. 6.12.76-r0), but
the ISO kernel was unpinned (latest from the repo at build time). When Alpine
updated linux-lts, the two diverged silently.
Fix: both steps now use whatever linux-lts is current in Alpine 3.21 main
at build time. build-nvidia-module.sh uses `apk add --update linux-lts-dev`
(no version pin), mkimage gets the same package from the same mirror.
Module cache is still keyed by detected KVER so rebuilds remain fast.
Removed: KERNEL_VERSION, KERNEL_PKG_VERSION from VERSIONS, all pin references
from build.sh and build-nvidia-module.sh.
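One way the detected KVER could feed the cache key, as a sketch only. It assumes that installing linux-lts-dev leaves exactly one /lib/modules/<kver>/build entry on the builder; that layout, the names `detect_kver` and `cache_dir_for`, and the cache path format are all assumptions, not taken from the scripts.

```shell
#!/bin/sh
# Print the single kernel version that has headers installed under the given
# modules root. More than one (or none) is treated as an error, since the
# build should not guess which kernel the ISO will ship.
detect_kver() {
    root=${1:-/lib/modules}
    count=0 kver=
    for d in "$root"/*; do
        if [ -e "$d/build" ]; then
            kver=$(basename "$d")
            count=$((count + 1))
        fi
    done
    if [ "$count" -ne 1 ]; then
        echo "ERROR: expected one kernel with headers under $root, found $count" >&2
        return 1
    fi
    printf '%s\n' "$kver"
}

# Build a cache directory name keyed by driver version and kernel version.
cache_dir_for() {
    printf '%s/nvidia-%s-%s\n' "$1" "$2" "$3"
}
```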
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- bee-audit init.d: use --output file: so "audit output written" is logged
(stdout mode silently redirects, never emits the slog confirmation)
- build-nvidia-module.sh: use $KERNEL_SRC in find for .ko collection
(was hardcoded $EXTRACT_DIR/kernel, silent failure if path differs)
- smoketest: add bee-audit to required services (was never checked)
- smoketest: remove legacy bee-audit-debug from service list
- smoketest: internet ping → warn (live CD runs in isolated network, no internet)
- build.sh: auto-copy smoketest.sh → overlay/usr/local/bin/bee-smoketest
(removes manual sync hazard; smoketest.sh is now single source of truth)
- remove static overlay/usr/local/bin/bee-smoketest (generated by build.sh now)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stale apks_* dirs (from old mirror or previous version pin) cause
"unable to select package" failures. Nuke them on every build.
kernel_*, syslinux_*, grub_* are still preserved — they're large,
stable, and only need to change when KERNEL_PKG_VERSION changes.
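The cleanup policy can be sketched as a preserve-list: every workdir section is removed except the three prefixes named above. `clean_workdir` is a hypothetical name, and removing all non-preserved sections (not only apks_*) is an assumption based on the earlier cleanup behaviour.

```shell
#!/bin/sh
# Remove every section directory in the mkimage workdir except the large,
# stable ones. Refuses to run on an empty or missing path so a bad variable
# can never expand to "/*".
clean_workdir() {
    workdir=$1
    if [ -z "$workdir" ] || [ ! -d "$workdir" ]; then
        echo "ERROR: workdir '$workdir' is not a directory" >&2
        return 1
    fi
    for d in "$workdir"/*; do
        [ -d "$d" ] || continue
        case $(basename "$d") in
            kernel_*|syslinux_*|grub_*) ;;   # preserved between builds
            *) rm -rf "$d" ;;                # apks_* and everything else
        esac
    done
}
```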
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mkimage checks CWD (/var/tmp) before ~/.mkimage/ for genapkovl scripts.
Old genapkovl-bee.sh left in /var/tmp from previous builds was overriding
the updated version, causing bee-audit-debug to persist in runlevel.
Also add gcompat to apk world so it's installed at boot (was in apks cache
but missing from world file, so nvidia-smi failed with missing ld-linux).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
linux-lts in apks conflicted with mkimage's own kernel download via
kernel_flavors="lts". The kernel is embedded in the ISO via modloop,
not via apks. Pinning it in apks caused "unable to select package".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both build-nvidia-module.sh (apk add) and mkimage.sh (--repository) now
explicitly use dl-cdn. Local builder mirror config is ignored.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause of linux-lts pin failure: mkimage was using dl-cdn.alpinelinux.org
while the builder uses mirrors.hosterion.ro — different mirrors can have different
package availability at any given moment.
Now mkimage reads repositories directly from /etc/apk/repositories on the builder,
ensuring both module build and ISO package install use the same mirror.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mkimage.sh calls git internally. Running it from inside /root/bee causes
"outside repository" fatal errors. /var/tmp is outside the git repo.
genapkovl is found via ~/.mkimage/ so no copy to /var/tmp needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Pin linux-lts to exact KERNEL_PKG_VERSION=6.12.76-r0 in build and ISO package list
- Add build-time verification that compiled kernel version matches pin (fails loudly)
- Fix bee-audit-debug → bee-audit in genapkovl OpenRC registration (service was never starting)
- Add AUDIT_VERSION=0.1.0 to VERSIONS (was undefined, bee-release had empty fields)
- Pin linux-lts-dev version in second apk add in build-nvidia-module.sh
- Add /root/.profile to overlay so /usr/local/bin is in PATH for SSH sessions
- Remove "DEBUG MODE" from motd
- Fix smoketest: grep for slog "audit output written" instead of non-existent "audit completed"
- Document no-internet constraint in system-overview and runtime-flows
- Remove redundant genapkovl copy to /var/tmp (now found via ~/.mkimage/)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without this, an old mkimg.bee_debug.sh left over from previous builds
causes mkimage to build both the bee and bee_debug profiles.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gpu_burn requires CUDA toolkit (~4GB) to build and the resulting binary
would significantly inflate the ISO. Removed from vendor tool list and
smoketest. build-gpu-burn.sh dropped as well.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## ISO build consolidation
- Remove separate debug/prod split: overlay-debug/, build-debug.sh,
mkimg.bee_debug.sh, genapkovl-bee_debug.sh all deleted
- Single overlay: iso/overlay/ (was overlay-debug content)
- Single build script: build.sh (SSH, TUI, NVIDIA, vendor tools, bee-release)
- Single mkimage profile: bee (with dropbear, dialog, strace, gcompat, etc.)
## NVIDIA fixes
- Modules now stored at /usr/local/lib/nvidia/ instead of
/lib/modules/<kver>/extra/nvidia/ — modloop squashfs mounts over that
path at boot making overlay content there inaccessible
- bee-nvidia init: load via insmod (absolute path), not modprobe
- bee-nvidia init: create libnvidia-ml.so.1/libcuda.so.1 symlinks in /usr/lib/
- build-nvidia-module.sh: always install linux-lts-dev (not conditional) —
stale 6.6.x headers caused wrong-kernel modules that never loaded at runtime
- build-nvidia-module.sh: create soname symlinks in cache
- KERNEL_VERSION in VERSIONS updated 6.6 → 6.12
- gcompat added to ISO packages (nvidia-smi is a glibc binary on musl Alpine)
## Service ordering
- bee-audit: add `after bee-nvidia` so NVIDIA enrichment always succeeds
## New tooling
- iso/builder/smoketest.sh: SSH smoke test for post-boot ISO validation
- iso/builder/build-gpu-burn.sh: builds gpu_burn vendor binary (CUDA 12.8+)
- vendor/gpu_burn included automatically if placed in iso/vendor/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sed -i on overlay/etc/motd caused git pull conflict on next build.
Now BEE_BUILD_INFO is exported and substituted in $tmp copy only.
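A sketch of substituting into a temporary copy so the tracked file is never modified. The `@BUILD_INFO@` placeholder and the `render_motd` name are hypothetical; the actual token in overlay/etc/motd is not shown in this log. The build info is assumed to be a single line.

```shell
#!/bin/sh
# Write a rendered copy of the motd to dst, leaving src untouched.
render_motd() {
    src=$1 dst=$2 info=$3
    # Escape characters that are special in a sed replacement:
    # backslash, ampersand, and the | used as delimiter below.
    esc=$(printf '%s' "$info" | sed 's/[\\&|]/\\&/g')
    sed "s|@BUILD_INFO@|$esc|" "$src" > "$dst"
}
```

Inside genapkovl this might be called as `render_motd overlay/etc/motd "$tmp/etc/motd" "$BEE_BUILD_INFO"`, so git never sees a change in the overlay.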
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- genapkovl now explicitly chmod +x init.d/* and usr/local/bin/* after cp
- add bee-net-restart command (short name, no .sh) and /etc/profile.d/bee.sh for PATH
- udhcpc: add & to ensure non-blocking even when DHCP responds immediately
- motd: short commands without paths
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- bee-* init.d scripts had mode 644 in git — OpenRC silently skipped them,
causing bee-network/bee-nvidia/bee-audit to never start at boot
- bee-network.sh also lacked executable bit
- Remove -q from udhcpc (was quitting after first lease, no renewal)
- Add autologin root on tty1 via /etc/inittab
- Inject build date + git commit + versions into motd at build time
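Because a missing executable bit produces no error at boot, a build-time check is one way to catch it early. This is a sketch with hypothetical function names; it is not part of the scripts listed above.

```shell
#!/bin/sh
# List regular files under a directory that lack the owner-execute bit.
list_nonexec() {
    find "$1" -type f ! -perm -100
}

# Fail with a readable message if any script in the directory is not executable.
check_exec() {
    bad=$(list_nonexec "$1")
    if [ -z "$bad" ]; then
        return 0
    fi
    echo "ERROR: not executable (OpenRC will skip these):" >&2
    printf '%s\n' "$bad" >&2
    return 1
}
```

A build script could run `check_exec overlay/etc/init.d || exit 1` before packing the overlay.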
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
modloop was not mounting for two reasons:
1. modloop=/boot/modloop-lts was missing from the kernel cmdline
2. lz4-compressed squashfs may not be supported by the Alpine initramfs
Either issue leaves /lib/modules missing, so every modprobe fails.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- 550.54.15 did not exist on NVIDIA CDN (404)
- updated to 590.48.01 (latest stable, 396MB)
- download sha256sum file first, verify installer before extracting
- re-download if file is missing, empty, or sha256 mismatch
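The verify-then-redownload logic can be sketched as below. `verify_sha256` and `fetch_installer` are hypothetical names, and the URL and expected hash are passed in by the caller; nothing here reflects the actual NVIDIA CDN paths.

```shell
#!/bin/sh
# Succeed only if the file exists, is non-empty, and matches the expected hash.
verify_sha256() {
    file=$1 expected=$2
    [ -s "$file" ] || return 1
    actual=$(sha256sum "$file" | awk '{ print $1 }')
    [ "$actual" = "$expected" ]
}

# Reuse a cached installer when it verifies; otherwise download it again and
# verify before anything is extracted.
fetch_installer() {
    url=$1 file=$2 expected=$3
    if verify_sha256 "$file" "$expected"; then
        return 0
    fi
    rm -f "$file"
    wget -O "$file" "$url" || return 1
    verify_sha256 "$file" "$expected"
}
```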
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Long builds (NVIDIA driver download+compile) would abort on SSH timeout.
The build now runs in a detached screen session on the VM, and run-builder.sh
streams the log and waits for completion safely.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Builds kernel modules from the official NVIDIA installer source tree,
same as a standard NVIDIA driver install. No open-gpu-kernel-modules.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- build-nvidia-module.sh: downloads nvidia open-gpu-kernel-modules source,
builds against linux-lts headers, extracts nvidia-smi from .run installer
- modules cached by driver version + kernel version (rebuild only on update)
- .ko files injected into ISO overlay at /lib/modules/<kver>/extra/nvidia/
- bee-nvidia init script loads nvidia/nvidia-modeset/nvidia-uvm at boot
- NVIDIA_DRIVER_VERSION=550.54.15 (Turing+, H100/A100 supported)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
xz → lz4 for mksquashfs: kernel modloop rebuild is ~10x faster.
Size increase is acceptable since modloop is loaded into RAM.
Applied in both setup-builder.sh and build-debug.sh.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
kernel_* workdir sections were being deleted alongside other non-apks dirs.
Now both apks_* and kernel_* are preserved — kernel modloop squashfs won't
be rebuilt unless the kernel version changes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- audit binary is only rebuilt when .go files are newer than the binary
- rsync replaces scp for ISO download (delta transfer on repeat builds)
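The "rebuild only when sources are newer" check can be expressed with find's -newer test. A sketch, with `needs_rebuild` as a hypothetical name and the source and binary paths supplied by the caller:

```shell
#!/bin/sh
# Return success (0) when the binary is missing or any .go file under src
# has a newer modification time than the binary.
needs_rebuild() {
    src=$1 bin=$2
    [ -f "$bin" ] || return 0
    [ -n "$(find "$src" -name '*.go' -newer "$bin" | head -n 1)" ]
}
```

This only looks at .go files, so a change to go.mod or to an embedded asset would not trigger a rebuild.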
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep apks_* workdir sections so packages aren't re-downloaded on each build.
Only non-apks sections (kernel, apkovl, final image) are cleaned to pick up changes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- add .DS_Store to .gitignore and remove tracked files
- copy genapkovl-bee_debug.sh to /var/tmp before mkimage (was causing "no such file" error)
- switch udhcpc to background mode (-b -t 0) so network comes up when cable connected after boot
- add -B to DROPBEAR_OPTS to allow password fallback (bee/eeb)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>