113 Commits

Author SHA1 Message Date
Mikhail Chusavitin
699c8d2473 docs: document kernel pin and mirror invariants in runtime-flows
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 11:23:04 +03:00
Mikhail Chusavitin
ac6aeefa1a Fix: use builder's own mirror for mkimage, not dl-cdn
Root cause of linux-lts pin failure: mkimage was using dl-cdn.alpinelinux.org
while the builder uses mirrors.hosterion.ro — different mirrors can have different
package availability at any given moment.

Now mkimage reads repositories directly from /etc/apk/repositories on the builder,
ensuring both module build and ISO package install use the same mirror.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 11:22:31 +03:00
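As an illustration of the approach described in ac6aeefa1a, the builder's repository list can be read line by line and handed to mkimage. This is a minimal sketch, not the code from the commit: `list_repos` is a hypothetical helper, and skipping blank lines, comments, and tagged (`@name`) entries is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not from the commit): print each repository URL
# found in an apk repositories file, one per line.
list_repos() {
    repo_file="${1:-/etc/apk/repositories}"
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            # Assumption: blank lines, comments, and tagged repos are skipped.
            ''|'#'*|'@'*) continue ;;
        esac
        printf '%s\n' "$line"
    done < "$repo_file"
}
```

A caller could then build the argument list with something like `for r in $(list_repos); do set -- "$@" --repository "$r"; done` before invoking mkimage.sh, so the module build and the ISO package install resolve packages from the same mirror.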
Mikhail Chusavitin
cdc2996cd3 Fix mkimage git conflict: cd /var/tmp before running mkimage.sh
mkimage.sh calls git internally. Running it from inside /root/bee causes
"outside repository" fatal errors. /var/tmp is outside the git repo.
genapkovl is found via ~/.mkimage/ so no copy to /var/tmp needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 11:16:56 +03:00
Mikhail Chusavitin
5bc6d3da42 Add bee-smoketest to ISO overlay
Run directly on live CD: bee-smoketest

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 10:54:25 +03:00
Mikhail Chusavitin
ffc7e5c71a Fix critical ISO build bugs: kernel pinning, service registration, PATH, audit checks
- Pin linux-lts to exact KERNEL_PKG_VERSION=6.12.76-r0 in build and ISO package list
- Add build-time verification that compiled kernel version matches pin (fails loudly)
- Fix bee-audit-debug → bee-audit in genapkovl OpenRC registration (service was never starting)
- Add AUDIT_VERSION=0.1.0 to VERSIONS (was undefined, bee-release had empty fields)
- Pin linux-lts-dev version in second apk add in build-nvidia-module.sh
- Add /root/.profile to overlay so /usr/local/bin is in PATH for SSH sessions
- Remove "DEBUG MODE" from motd
- Fix smoketest: grep for slog "audit output written" instead of non-existent "audit completed"
- Document no-internet constraint in system-overview and runtime-flows
- Remove redundant genapkovl copy to /var/tmp (now found via ~/.mkimage/)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 10:52:54 +03:00
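The build-time verification mentioned in ffc7e5c71a could look roughly like the following. This is a sketch under stated assumptions: `verify_kernel_pin` is a hypothetical function name, and how the installed version is obtained is left to the caller because the commit message does not say.

```shell
#!/bin/sh
# Hypothetical check (not the build script's actual code): fail loudly
# when the installed kernel package version differs from the pin.
verify_kernel_pin() {
    pinned="$1"   # e.g. the KERNEL_PKG_VERSION value from VERSIONS
    actual="$2"   # version reported for the installed linux-lts package
    if [ "$pinned" != "$actual" ]; then
        echo "ERROR: kernel pin mismatch: expected $pinned, got $actual" >&2
        return 1
    fi
    echo "kernel pin OK: $actual"
}
```

Called as `verify_kernel_pin "$KERNEL_PKG_VERSION" "$installed" || exit 1`, a mismatch stops the build before modules are compiled against the wrong headers.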
Mikhail Chusavitin
493ccea415 Clear ~/.mkimage before build to prevent stale profiles
Without this, a stale mkimg.bee_debug.sh left over from a previous build
causes mkimage to build both the bee and bee_debug profiles.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 10:05:06 +03:00
Mikhail Chusavitin
0a13463e94 Fix misleading password fallback message in build.sh
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 09:51:16 +03:00
Mikhail Chusavitin
a2b2cb23bc Fix run-builder.sh: update overlay and build script paths
overlay-debug → overlay, build-debug.sh → build.sh

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 09:50:18 +03:00
Mikhail Chusavitin
b8135a19df Remove leftover debug/prod split files from tracking
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 09:49:11 +03:00
Mikhail Chusavitin
1e98428be8 Add nvidia-bug-report.sh to ISO and fix GPU diagnostic pack in bee-tui
- build-nvidia-module.sh: extract nvidia-bug-report.sh from .run installer
- build.sh: copy nvidia-bug-report.sh into overlay/usr/local/bin/
- bee-tui: pass --output directly to nvidia-bug-report.sh so log goes
  into the run_dir archive instead of CWD; remove redundant cp step

GPU diagnostic pack in TUI (System acceptance tests → GPU NVIDIA → Run command pack):
  nvidia-smi -q, dmidecode -t baseboard, dmidecode -t system, nvidia-bug-report.sh
All logs archived to /var/log/bee-sat/gpu-nvidia-<ts>.tar.gz

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 09:48:27 +03:00
Mikhail Chusavitin
240c33f6a1 Add backlog with GPU stress test task
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 09:45:51 +03:00
Mikhail Chusavitin
1eeee46a34 Remove gpu_burn from ISO build — binary too large
gpu_burn requires CUDA toolkit (~4GB) to build and the resulting binary
would significantly inflate the ISO. Removed from vendor tool list and
smoketest. build-gpu-burn.sh dropped as well.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 20:17:30 +03:00
Mikhail Chusavitin
1768bb58dd Merge debug/prod into single ISO build, fix NVIDIA module loading
## ISO build consolidation
- Remove separate debug/prod split: overlay-debug/, build-debug.sh,
  mkimg.bee_debug.sh, genapkovl-bee_debug.sh all deleted
- Single overlay: iso/overlay/ (was overlay-debug content)
- Single build script: build.sh (SSH, TUI, NVIDIA, vendor tools, bee-release)
- Single mkimage profile: bee (with dropbear, dialog, strace, gcompat, etc.)

## NVIDIA fixes
- Modules now stored at /usr/local/lib/nvidia/ instead of
  /lib/modules/<kver>/extra/nvidia/ — modloop squashfs mounts over that
  path at boot making overlay content there inaccessible
- bee-nvidia init: load via insmod (absolute path), not modprobe
- bee-nvidia init: create libnvidia-ml.so.1/libcuda.so.1 symlinks in /usr/lib/
- build-nvidia-module.sh: always install linux-lts-dev (not conditional) —
  stale 6.6.x headers caused wrong-kernel modules that never loaded at runtime
- build-nvidia-module.sh: create soname symlinks in cache
- KERNEL_VERSION in VERSIONS updated 6.6 → 6.12
- gcompat added to ISO packages (nvidia-smi is a glibc binary on musl Alpine)

## Service ordering
- bee-audit: add `after bee-nvidia` so NVIDIA enrichment always succeeds

## New tooling
- iso/builder/smoketest.sh: SSH smoke test for post-boot ISO validation
- iso/builder/build-gpu-burn.sh: builds gpu_burn vendor binary (CUDA 12.8+)
- vendor/gpu_burn included automatically if placed in iso/vendor/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 20:14:18 +03:00
Mikhail Chusavitin
0907ba07c3 debug iso: add menu command to relaunch tui 2026-03-06 19:49:35 +03:00
Mikhail Chusavitin
94b305f166 Switch debug TUI menus to dialog and include dialog package 2026-03-06 17:57:40 +03:00
Mikhail Chusavitin
f84ec9320c Fix NVIDIA module version selection and add load diagnostics 2026-03-06 17:30:41 +03:00
Mikhail Chusavitin
a55b4108d5 Add wget/curl fallback for vendor and update downloads 2026-03-06 14:45:50 +03:00
Mikhail Chusavitin
18b8c69bc5 Implement audit enrichments, TUI workflows, and production ISO scaffold 2026-03-06 11:56:26 +03:00
bdfb6a0a79 fix: reset VM working tree before pull to clear stale build artifacts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 23:11:18 +03:00
867565cbf8 fix: inject motd build info in genapkovl tmp, not overlay on disk
Running sed -i on overlay/etc/motd caused a git pull conflict on the next build.
BEE_BUILD_INFO is now exported and substituted only in the $tmp copy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 23:11:07 +03:00
b72688cf2c fix: chmod +x overlay scripts on builder VM after git pull
macOS does not reliably apply git file mode changes on disk.
Run chmod explicitly on the VM where it matters.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 23:03:35 +03:00
e8e09e9063 fix: chmod +x in genapkovl to fix permissions regardless of git filemode on VM
- genapkovl now explicitly chmod +x init.d/* and usr/local/bin/* after cp
- add bee-net-restart command (short name, no .sh) and /etc/profile.d/bee.sh for PATH
- udhcpc: add & to ensure non-blocking even when DHCP responds immediately
- motd: short commands without paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 22:59:28 +03:00
63c608711d fix: use agetty --autologin instead of busybox getty -a (unsupported flag)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 22:48:13 +03:00
eecd0799a0 fix: check local/remote sync before building to prevent building stale code
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 22:44:22 +03:00
fd071e28db fix: include build-debug.sh and motd changes missed from previous commit
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 22:43:23 +03:00
c908809991 fix: init scripts not executable, add autologin and build version in motd
- bee-* init.d scripts had mode 644 in git — OpenRC silently skipped them,
  causing bee-network/bee-nvidia/bee-audit to never start at boot
- bee-network.sh also lacked executable bit
- Remove -q from udhcpc (was quitting after first lease, no renewal)
- Add autologin root on tty1 via /etc/inittab
- Inject build date + git commit + versions into motd at build time

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 22:33:45 +03:00
2235a89364 fix: add modloop= to cmdline, revert lz4 compression
modloop was not mounting because:
1. modloop=/boot/modloop-lts was missing from kernel cmdline
2. lz4-compressed squashfs may not be supported by Alpine initramfs

Either issue leaves /lib/modules missing, so every modprobe call fails.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 18:23:26 +03:00
871c766194 docs: add bible-local with architecture and decisions, fix PLAN.md versions
- bible-local/architecture/system-overview.md: scope, tech stack, key paths
- bible-local/architecture/runtime-flows.md: boot sequence, ISO build, collector flow
- bible-local/decisions/2026-03-05-nvidia-proprietary-driver.md
- PLAN.md: update KERNEL_VERSION 6.6→6.12, NVIDIA 550.54.15→590.48.01

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 18:15:07 +03:00
559fc2961d fix: update NVIDIA to 590.48.01, add sha256 verification for installer
- 550.54.15 did not exist on NVIDIA CDN (404)
- updated to 590.48.01 (latest stable, 396MB)
- download sha256sum file first, verify installer before extracting
- re-download if file is missing, empty, or sha256 mismatch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 18:10:31 +03:00
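The re-download rule from 559fc2961d (missing, empty, or sha256 mismatch) can be expressed as a single predicate. This is a minimal sketch, not the commit's code: `needs_download` is a hypothetical name, and it assumes `sha256sum` is available on the builder.

```shell
#!/bin/sh
# Hypothetical predicate (not from the commit): succeeds (exit 0) when the
# file must be downloaded again, fails (exit 1) when it can be reused.
needs_download() {
    file="$1"
    expected="$2"   # hex digest taken from the published sha256sum file
    # A missing or empty file always needs a download.
    [ -s "$file" ] || return 0
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    # A digest mismatch also needs a download.
    [ "$actual" != "$expected" ]
}
```

A build script could guard the fetch with `if needs_download "$installer" "$digest"; then ...; fi`, so a verified installer is reused while a truncated or corrupted one is replaced.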
e5c1ef2c33 fix: run build in screen session to survive SSH disconnects
Long builds (NVIDIA driver download+compile) would abort on SSH timeout.
Now build runs in a detached screen session on the VM, run-builder.sh
streams the log and waits for completion safely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 18:07:17 +03:00
d4a2d7fa55 fix: use proprietary NVIDIA .run installer instead of open kernel modules
Builds kernel modules from the official NVIDIA installer source tree,
same as a standard NVIDIA driver install. No open-gpu-kernel-modules.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 18:05:57 +03:00
ec9c65e20e feat: build NVIDIA open kernel modules during ISO build
- build-nvidia-module.sh: downloads nvidia open-gpu-kernel-modules source,
  builds against linux-lts headers, extracts nvidia-smi from .run installer
- modules cached by driver version + kernel version (rebuild only on update)
- .ko files injected into ISO overlay at /lib/modules/<kver>/extra/nvidia/
- bee-nvidia init script loads nvidia/nvidia-modeset/nvidia-uvm at boot
- NVIDIA_DRIVER_VERSION=550.54.15 (Turing+, H100/A100 supported)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 18:01:11 +03:00
5475a0aa77 fix: fall back to scp if rsync not available on builder VM
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 17:44:42 +03:00
fdbf533e6c fix: replace linux-firmware-nfp with linux-firmware-netronome (correct package name)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 17:41:19 +03:00
47d717955c fix: add NIC firmware packages 2026-03-05 17:40:09 +03:00
bd9279f96d perf: use lz4 compression for modloop squashfs
xz → lz4 for mksquashfs: kernel modloop rebuild is ~10x faster.
Size increase is acceptable since modloop is loaded into RAM.
Applied in both setup-builder.sh and build-debug.sh.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 16:23:55 +03:00
34faddb9d5 perf: cache syslinux and grub sections between builds
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 16:22:23 +03:00
836c098044 perf: also cache kernel modloop between builds
kernel_* workdir sections were being deleted alongside other non-apks dirs.
Now both apks_* and kernel_* are preserved — kernel modloop squashfs won't
be rebuilt unless the kernel version changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 16:21:43 +03:00
413f188278 perf: skip go rebuild if sources unchanged, use rsync for ISO download
- audit binary is only rebuilt when .go files are newer than the binary
- rsync replaces scp for ISO download (delta transfer on repeat builds)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 16:21:14 +03:00
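The "rebuild only when .go files are newer than the binary" rule from 413f188278 can be sketched with `find -newer`. The function name `needs_rebuild` and its arguments are hypothetical; the commit message does not show how the check is actually written.

```shell
#!/bin/sh
# Hypothetical predicate (not from the commit): succeeds (exit 0) when the
# binary is missing or any .go file under the source dir is newer than it.
needs_rebuild() {
    bin="$1"
    src_dir="$2"
    [ -f "$bin" ] || return 0
    # find prints every .go file modified after the binary; one is enough.
    [ -n "$(find "$src_dir" -name '*.go' -newer "$bin" -print | head -n 1)" ]
}
```

This compares modification times only, so changes to non-.go inputs (go.mod, for example) would not trigger a rebuild under this sketch.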
bb4ceab452 perf: cache apk packages between ISO builds
Keep apks_* workdir sections so packages aren't re-downloaded on each build.
Only non-apks sections (kernel, apkovl, final image) are cleaned to pick up changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 16:20:07 +03:00
ec1a96976b chore: ignore .DS_Store, remove from tracking, fix genapkovl path in build, udhcpc background mode
- add .DS_Store to .gitignore and remove tracked files
- copy genapkovl-bee_debug.sh to /var/tmp before mkimage (was causing "no such file" error)
- switch udhcpc to background mode (-b -t 0) so network comes up when cable connected after boot
- add -B to DROPBEAR_OPTS to allow password fallback (bee/eeb)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 16:18:26 +03:00
279ef318e1 fix: genapkovl copy to /var/tmp, udhcpc background mode 2026-03-05 16:17:52 +03:00
40815161fe fix: clean workdir before build so apkovl changes are always applied 2026-03-05 15:05:42 +03:00
8c0e66c3ef fix: copy genapkovl-bee_debug.sh to ~/.mkimage in build-debug.sh 2026-03-05 15:01:20 +03:00
8502100074 fix: dropbear/network boot ordering — dropbear starts without network
- dropbear: custom init removes 'need net', only needs localmount + bee-sshsetup
- bee-network: removed 'before dropbear' dependency
- bee-network.sh: removed set -e so single iface failure does not abort script
2026-03-05 14:59:23 +03:00
ab22e3ad74 add: NVMe wear telemetry via nvme smart-log (1.8b) 2026-03-05 14:55:53 +03:00
e79f972fb5 add: PSU collector (1.7) via ipmitool fru, skips gracefully without IPMI 2026-03-05 14:54:12 +03:00
55f6098a17 add: memory, storage, pcie collectors (1.4-1.6) — tested on real hardware 2026-03-05 14:50:34 +03:00
569bbf8909 fix: add interfaces file so networking starts, enable dropbear default 2026-03-05 14:47:21 +03:00
aa051266bb fix: replace build_bee_debug with proper apkovl mechanism for Alpine LiveCD
- genapkovl-bee_debug.sh: creates apkovl tarball with overlay files,
  /etc/apk/world package list, runlevel symlinks, dropbear config
- mkimg.bee_debug.sh: set hostname/apkovl, remove invalid build_bee_debug
2026-03-05 14:21:45 +03:00