Commit Graph

6 Commits

Author SHA1 Message Date
Mikhail Chusavitin
1e98428be8 Add nvidia-bug-report.sh to ISO and fix GPU diagnostic pack in bee-tui
- build-nvidia-module.sh: extract nvidia-bug-report.sh from .run installer
- build.sh: copy nvidia-bug-report.sh into overlay/usr/local/bin/
- bee-tui: pass --output directly to nvidia-bug-report.sh so log goes
  into the run_dir archive instead of CWD; remove redundant cp step

GPU diagnostic pack in TUI (System acceptance tests → GPU NVIDIA → Run command pack):
  nvidia-smi -q, dmidecode -t baseboard, dmidecode -t system, nvidia-bug-report.sh
All logs archived to /var/log/bee-sat/gpu-nvidia-<ts>.tar.gz

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 09:48:27 +03:00
Mikhail Chusavitin
1768bb58dd Merge debug/prod into single ISO build, fix NVIDIA module loading
## ISO build consolidation
- Remove separate debug/prod split: overlay-debug/, build-debug.sh,
  mkimg.bee_debug.sh, genapkovl-bee_debug.sh all deleted
- Single overlay: iso/overlay/ (was overlay-debug content)
- Single build script: build.sh (SSH, TUI, NVIDIA, vendor tools, bee-release)
- Single mkimage profile: bee (with dropbear, dialog, strace, gcompat, etc.)

## NVIDIA fixes
- Modules now stored at /usr/local/lib/nvidia/ instead of
  /lib/modules/<kver>/extra/nvidia/ — modloop squashfs mounts over that
  path at boot making overlay content there inaccessible
- bee-nvidia init: load via insmod (absolute path), not modprobe
- bee-nvidia init: create libnvidia-ml.so.1/libcuda.so.1 symlinks in /usr/lib/
- build-nvidia-module.sh: always install linux-lts-dev (not conditional) —
  stale 6.6.x headers caused wrong-kernel modules that never loaded at runtime
- build-nvidia-module.sh: create soname symlinks in cache
- KERNEL_VERSION in VERSIONS updated 6.6 → 6.12
- gcompat added to ISO packages (nvidia-smi is a glibc binary on musl Alpine)

## Service ordering
- bee-audit: add `after bee-nvidia` so NVIDIA enrichment always succeeds

## New tooling
- iso/builder/smoketest.sh: SSH smoke test for post-boot ISO validation
- iso/builder/build-gpu-burn.sh: builds gpu_burn vendor binary (CUDA 12.8+)
- vendor/gpu_burn included automatically if placed in iso/vendor/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 20:14:18 +03:00
Mikhail Chusavitin
f84ec9320c Fix NVIDIA module version selection and add load diagnostics 2026-03-06 17:30:41 +03:00
559fc2961d fix: update NVIDIA to 590.48.01, add sha256 verification for installer
- 550.54.15 did not exist on NVIDIA CDN (404)
- updated to 590.48.01 (latest stable, 396MB)
- download sha256sum file first, verify installer before extracting
- re-download if file is missing, empty, or sha256 mismatch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 18:10:31 +03:00
d4a2d7fa55 fix: use proprietary NVIDIA .run installer instead of open kernel modules
Builds kernel modules from the official NVIDIA installer source tree,
same as a standard NVIDIA driver install. No open-gpu-kernel-modules.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 18:05:57 +03:00
ec9c65e20e feat: build NVIDIA open kernel modules during ISO build
- build-nvidia-module.sh: downloads nvidia open-gpu-kernel-modules source,
  builds against linux-lts headers, extracts nvidia-smi from .run installer
- modules cached by driver version + kernel version (rebuild only on update)
- .ko files injected into ISO overlay at /lib/modules/<kver>/extra/nvidia/
- bee-nvidia init script loads nvidia/nvidia-modeset/nvidia-uvm at boot
- NVIDIA_DRIVER_VERSION=550.54.15 (Turing+, H100/A100 supported)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 18:01:11 +03:00