Commit Graph

16 Commits

Author SHA1 Message Date
Mikhail Chusavitin
b7c888edb1 fix: getty autologin root, inject GSP firmware for H100, bump 0.1.1 2026-03-08 22:12:02 +03:00
Mikhail Chusavitin
7c62d100d4 fix: use SYSSRC=common SYSOUT=amd64 for NVIDIA build on Debian split headers
Debian 12 splits kernel headers into two packages:
  linux-headers-<kver>        (arch-specific: generated/, config/)
  linux-headers-<kver>-common (source headers: linux/, asm-generic/, etc.)

NVIDIA conftest.sh builds include paths as HEADERS=$SOURCES/include.
When SYSSRC=amd64, HEADERS=amd64/include/ which is nearly empty —
conftest can't compile any kernel header tests, all compile-tests fail
silently, and NVIDIA assumes all kernel APIs are present. This causes
link errors for APIs added in kernel 6.3+ (vm_flags_set, vm_flags_clear)
and removed APIs (phys_to_dma, dma_is_direct, get_dma_ops).

Fix: pass SYSSRC=common (real headers) and SYSOUT=amd64 (generated headers).
NVIDIA Makefile maps SYSSRC→NV_KERNEL_SOURCES, SYSOUT→NV_KERNEL_OUTPUT,
and runs 'make -C common KBUILD_OUTPUT=amd64'. Conftest then correctly
detects which APIs are present in kernel 6.1 and uses proper wrappers.
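
Sketch of the SYSSRC/SYSOUT split described above. It assumes Debian installs both header packages under /usr/src/linux-headers-<kver>...; the helper name nvidia_make_args is illustrative and not taken from build-nvidia-module.sh. It only prints the arguments and does not run make.

```shell
#!/bin/sh
# Hypothetical helper: derive SYSSRC/SYSOUT from a Debian kernel version
# such as 6.1.0-43-amd64. The /usr/src layout is an assumption.
nvidia_make_args() {
    kver=$1
    base=${kver%-amd64}    # 6.1.0-43-amd64 -> 6.1.0-43
    printf 'SYSSRC=/usr/src/linux-headers-%s-common SYSOUT=/usr/src/linux-headers-%s\n' \
        "$base" "$kver"
}

# Print the arguments instead of invoking make, so the sketch is safe to run anywhere.
nvidia_make_args "6.1.0-43-amd64"
```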

Tested: 5 .ko files built successfully on Debian 12 kernel 6.1.0-43-amd64.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 19:23:47 +03:00
Mikhail Chusavitin
c843ff95a2 fix: add -Wno-error to CFLAGS_MODULE for NVIDIA kernel 6.1 compat
get_dma_ops() return type changed in kernel 6.1 — GCC treats int-conversion
warning as error. Suppress with -Wno-error to allow build to complete.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 18:55:25 +03:00
Mikhail Chusavitin
0057686769 fix: pass GCC include dir to NVIDIA make to resolve stdarg.h not found
Debian kernel build uses -nostdinc which strips GCC's own includes.
NVIDIA's nv_stdarg.h needs <stdarg.h> from GCC.
Pass -I$(gcc --print-file-name=include) via CFLAGS_MODULE.
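
Sketch of how the flag above could be assembled. The helper name gcc_include_flag and its optional compiler argument are hypothetical; the actual script may build CFLAGS_MODULE differently.

```shell
#!/bin/sh
# Hypothetical helper: ask the compiler for its own header directory and
# turn it into an -I flag. The compiler defaults to gcc.
gcc_include_flag() {
    cc=${1:-gcc}
    dir=$("$cc" -print-file-name=include) || return 1
    printf '%s%s\n' "-I" "$dir"
}

# Only show the resulting variable when gcc is actually installed.
if command -v gcc >/dev/null 2>&1; then
    echo "CFLAGS_MODULE=$(gcc_include_flag)"
fi
```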

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 18:53:37 +03:00
Mikhail Chusavitin
345a93512a migrate ISO build from Alpine to Debian 12 (Bookworm)
Replace the entire live CD build pipeline:
- Alpine SDK + mkimage + genapkovl → Debian live-build (lb config/build)
- OpenRC init scripts → systemd service units
- dropbear → openssh-server (native to Debian live)
- udhcpc → dhclient for DHCP
- apk → apt-get in setup-builder.sh and build-nvidia-module.sh
- Add auto/config (lb config options) and auto/build wrapper
- Add config/package-lists/bee.list.chroot replacing Alpine apks
- Add config/hooks/normal/9000-bee-setup.hook.chroot to enable services
- Add bee-nvidia-load and bee-sshsetup helper scripts
- Keep NVIDIA pre-compile pipeline (Option B): compile on builder VM against
  pinned Debian kernel headers (DEBIAN_KERNEL_ABI), inject .ko into includes.chroot
- Fixes: native glibc (no gcompat shims), proper udev, writable /lib/modules,
  no Alpine modloop read-only constraint, no stale apk cache issues

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 18:01:38 +03:00
Mikhail Chusavitin
d952e10dbb fix: fail loudly on missing NVIDIA libs and .ko, improve mknod logging
build-nvidia-module.sh:
- Replace silent glob cp for libnvidia-ml/libcuda with find + explicit error
  if library not found in extract dir (catches installer layout changes)
- Fix circular symlink bug: don't create .so.1 -> .so.1 if versioned file
  is already named .so.1
- Verify .ko count > 0 after build, fail loudly if none produced
- Show lib cache in final summary
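
Sketch of the ".ko count > 0" check from the list above. The helper name require_ko_files and the message wording are illustrative, not copied from build-nvidia-module.sh.

```shell
#!/bin/sh
# Hypothetical helper: count .ko files under a build directory and
# return non-zero with a visible error if none were produced.
require_ko_files() {
    dir=$1
    count=$(find "$dir" -type f -name '*.ko' 2>/dev/null | wc -l | tr -d ' ')
    if [ "$count" -eq 0 ]; then
        echo "ERROR: no .ko files produced in $dir" >&2
        return 1
    fi
    echo "built $count kernel module(s) in $dir"
}

# Demo against a throwaway directory holding one fake module.
demo=$(mktemp -d)
touch "$demo/nvidia.ko"
require_ko_files "$demo"
rm -rf "$demo"
```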

bee-nvidia:
- mknod failures are now logged with ewarn instead of silently suppressed
- If nvidia not in /proc/devices (no GPU hardware), log clearly and exit clean

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 17:07:47 +03:00
Mikhail Chusavitin
98f14b21c1 fix: remove kernel version pin — dynamic detection prevents KVER mismatch
The static KERNEL_PKG_VERSION pin was the root cause of nvidia-smi never
working: modules were compiled for pinned version (e.g. 6.12.76-r0) but
the ISO kernel was unpinned (latest from repo at build time). When Alpine
updated linux-lts, the two diverged silently.

Fix: both steps now use whatever linux-lts is current in Alpine 3.21 main
at build time. build-nvidia-module.sh uses `apk add --update linux-lts-dev`
(no version pin), mkimage gets the same package from the same mirror.
Module cache is still keyed by detected KVER so rebuilds remain fast.

Removed: KERNEL_VERSION, KERNEL_PKG_VERSION from VERSIONS, all pin references
from build.sh and build-nvidia-module.sh.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 12:11:05 +03:00
Mikhail Chusavitin
18f377987f fix: audit pipeline correctness after full review
- bee-audit init.d: use --output file: so "audit output written" is logged
  (stdout mode silently redirects, never emits the slog confirmation)
- build-nvidia-module.sh: use $KERNEL_SRC in find for .ko collection
  (was hardcoded $EXTRACT_DIR/kernel, silent failure if path differs)
- smoketest: add bee-audit to required services (was never checked)
- smoketest: remove legacy bee-audit-debug from service list
- smoketest: internet ping → warn (live CD runs in isolated network, no internet)
- build.sh: auto-copy smoketest.sh → overlay/usr/local/bin/bee-smoketest
  (removes manual sync hazard; smoketest.sh is now single source of truth)
- remove static overlay/usr/local/bin/bee-smoketest (generated by build.sh now)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 12:06:25 +03:00
Mikhail Chusavitin
1feb956e30 Fix: use dl-cdn.alpinelinux.org everywhere for consistent package resolution
Both build-nvidia-module.sh (apk add) and mkimage.sh (--repository) now
explicitly use dl-cdn. Local builder mirror config is ignored.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 11:28:14 +03:00
Mikhail Chusavitin
ffc7e5c71a Fix critical ISO build bugs: kernel pinning, service registration, PATH, audit checks
- Pin linux-lts to exact KERNEL_PKG_VERSION=6.12.76-r0 in build and ISO package list
- Add build-time verification that compiled kernel version matches pin (fails loudly)
- Fix bee-audit-debug → bee-audit in genapkovl OpenRC registration (service was never starting)
- Add AUDIT_VERSION=0.1.0 to VERSIONS (was undefined, bee-release had empty fields)
- Pin linux-lts-dev version in second apk add in build-nvidia-module.sh
- Add /root/.profile to overlay so /usr/local/bin is in PATH for SSH sessions
- Remove "DEBUG MODE" from motd
- Fix smoketest: grep for slog "audit output written" instead of non-existent "audit completed"
- Document no-internet constraint in system-overview and runtime-flows
- Remove redundant genapkovl copy to /var/tmp (now found via ~/.mkimage/)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 10:52:54 +03:00
Mikhail Chusavitin
1e98428be8 Add nvidia-bug-report.sh to ISO and fix GPU diagnostic pack in bee-tui
- build-nvidia-module.sh: extract nvidia-bug-report.sh from .run installer
- build.sh: copy nvidia-bug-report.sh into overlay/usr/local/bin/
- bee-tui: pass --output directly to nvidia-bug-report.sh so log goes
  into the run_dir archive instead of CWD; remove redundant cp step

GPU diagnostic pack in TUI (System acceptance tests → GPU NVIDIA → Run command pack):
  nvidia-smi -q, dmidecode -t baseboard, dmidecode -t system, nvidia-bug-report.sh
All logs archived to /var/log/bee-sat/gpu-nvidia-<ts>.tar.gz

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 09:48:27 +03:00
Mikhail Chusavitin
1768bb58dd Merge debug/prod into single ISO build, fix NVIDIA module loading
## ISO build consolidation
- Remove separate debug/prod split: overlay-debug/, build-debug.sh,
  mkimg.bee_debug.sh, genapkovl-bee_debug.sh all deleted
- Single overlay: iso/overlay/ (was overlay-debug content)
- Single build script: build.sh (SSH, TUI, NVIDIA, vendor tools, bee-release)
- Single mkimage profile: bee (with dropbear, dialog, strace, gcompat, etc.)

## NVIDIA fixes
- Modules now stored at /usr/local/lib/nvidia/ instead of
  /lib/modules/<kver>/extra/nvidia/ — modloop squashfs mounts over that
  path at boot making overlay content there inaccessible
- bee-nvidia init: load via insmod (absolute path), not modprobe
- bee-nvidia init: create libnvidia-ml.so.1/libcuda.so.1 symlinks in /usr/lib/
- build-nvidia-module.sh: always install linux-lts-dev (not conditional) —
  stale 6.6.x headers caused wrong-kernel modules that never loaded at runtime
- build-nvidia-module.sh: create soname symlinks in cache
- KERNEL_VERSION in VERSIONS updated 6.6 → 6.12
- gcompat added to ISO packages (nvidia-smi is a glibc binary on musl Alpine)

## Service ordering
- bee-audit: add `after bee-nvidia` so NVIDIA enrichment always succeeds

## New tooling
- iso/builder/smoketest.sh: SSH smoke test for post-boot ISO validation
- iso/builder/build-gpu-burn.sh: builds gpu_burn vendor binary (CUDA 12.8+)
- vendor/gpu_burn included automatically if placed in iso/vendor/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 20:14:18 +03:00
Mikhail Chusavitin
f84ec9320c Fix NVIDIA module version selection and add load diagnostics 2026-03-06 17:30:41 +03:00
559fc2961d fix: update NVIDIA to 590.48.01, add sha256 verification for installer
- 550.54.15 did not exist on NVIDIA CDN (404)
- updated to 590.48.01 (latest stable, 396MB)
- download sha256sum file first, verify installer before extracting
- re-download if file is missing, empty, or sha256 mismatch
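
Sketch of the re-download decision listed above (missing, empty, or sha256 mismatch). The helper name installer_needs_download is hypothetical, and the expected hash is assumed to have been read from the downloaded sha256sum file already.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 when the installer must be fetched again,
# exit 1 when the file on disk matches the expected sha256.
installer_needs_download() {
    file=$1
    expected=$2
    [ -s "$file" ] || return 0                    # missing or empty
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    [ "$actual" != "$expected" ]                  # mismatch -> 0, match -> 1
}

# Demo with a throwaway file standing in for the .run installer.
demo=$(mktemp)
echo "payload" > "$demo"
good=$(sha256sum "$demo" | cut -d' ' -f1)
if installer_needs_download "$demo" "$good"; then
    echo "would re-download"
else
    echo "installer verified"
fi
rm -f "$demo"
```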

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 18:10:31 +03:00
d4a2d7fa55 fix: use proprietary NVIDIA .run installer instead of open kernel modules
Builds kernel modules from the official NVIDIA installer source tree,
same as a standard NVIDIA driver install. No open-gpu-kernel-modules.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 18:05:57 +03:00
ec9c65e20e feat: build NVIDIA open kernel modules during ISO build
- build-nvidia-module.sh: downloads nvidia open-gpu-kernel-modules source,
  builds against linux-lts headers, extracts nvidia-smi from .run installer
- modules cached by driver version + kernel version (rebuild only on update)
- .ko files injected into ISO overlay at /lib/modules/<kver>/extra/nvidia/
- bee-nvidia init script loads nvidia/nvidia-modeset/nvidia-uvm at boot
- NVIDIA_DRIVER_VERSION=550.54.15 (Turing+, H100/A100 supported)
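
Sketch of a cache keyed by driver version + kernel version, as described in the list above. The directory naming scheme, the cache root, and both helper names are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: one cache directory per (driver version, kernel version) pair.
nvidia_cache_dir() {
    root=$1 driver=$2 kver=$3
    printf '%s/nvidia-%s-%s\n' "$root" "$driver" "$kver"
}

# Hypothetical helper: exit 0 when no cached .ko exists for this pair,
# i.e. a rebuild is needed after a driver or kernel update.
needs_rebuild() {
    dir=$(nvidia_cache_dir "$1" "$2" "$3")
    ! ls "$dir"/*.ko >/dev/null 2>&1
}

# Example values; the cache root and kernel version string are placeholders.
nvidia_cache_dir "/var/cache/bee" "550.54.15" "6.6.0-lts"
```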

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 18:01:11 +03:00