Commit Graph

111 Commits

Author SHA1 Message Date
Mikhail Chusavitin
3d2ae4cdcb fix(iso): use Ubuntu jammy codename for AMD ROCm repo — Debian not supported
AMD does not publish Debian Bookworm packages at all (only focal/jammy/noble).
Switch ROCM_UBUNTU_DIST to "jammy"; jammy packages install cleanly on
Debian 12 due to compatible glibc. Also expand candidate list to include
point-releases (6.3.4, 6.3.3, …) so we pick the latest actually-published one.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 10:08:58 +03:00
Mikhail Chusavitin
58510207fa fix(iso): fall back through ROCm 6.4→6.3→6.2 if repo Release file missing
ROCm 6.4 does not yet publish a Release file for Debian Bookworm, causing
the live-build chroot hook to fail with "does not have a Release file".

Try each version in ROCM_CANDIDATES order; skip to the next if apt-get update
fails (repo unavailable). Exit gracefully if none are available.
Also rename inner 'candidate' variable to 'smi_path' to avoid collision.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 09:52:17 +03:00
Mikhail Chusavitin
9a1df9b1ba Tighten support bundles and fix AMD runtime checks 2026-03-25 19:35:25 +03:00
Mikhail Chusavitin
30cf014d58 Rename NVIDIA bootloader modes 2026-03-25 19:12:26 +03:00
Mikhail Chusavitin
27d478aed6 Add bootloader choice for safe vs full NVIDIA boot 2026-03-25 19:11:15 +03:00
Mikhail Chusavitin
d36e8442a9 Stabilize live ISO consoles and NVIDIA boot path 2026-03-25 19:05:18 +03:00
Mikhail Chusavitin
b345b0d14d Derive ISO version from git tags 2026-03-25 18:40:48 +03:00
Mikhail Chusavitin
0a1ac2ab9f Bootstrap ROCm hook prerequisites in ISO build 2026-03-25 18:38:19 +03:00
Mikhail Chusavitin
adcc147b32 feat(iso): add AMD Instinct MI250X/MI250 driver support
- firmware-amd-graphics: Aldebaran firmware blobs (fixes amdgpu IB ring
  test errors on MI250/MI250X at boot)
- 9001-amd-rocm.hook.chroot: adds AMD ROCm 6.4 apt repo and installs
  rocm-smi-lib for GPU monitoring (analogous to nvidia-smi)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 15:42:10 +03:00
Mikhail Chusavitin
03c36f6cb2 fix(iso): add stress-ng to package list for CPU SAT
stress-ng was missing from the LiveCD — CPU acceptance test exited
immediately with rc=1 because the binary was not found.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 13:50:30 +03:00
Mikhail Chusavitin
a221814797 fix(tui): fix GPU panel row showing AMD chipset devices, clear screen before TUI
isGPUDevice matched all AMD vendor PCIe devices (SATA, crypto coprocessors,
PCIe dummies) because of a broad strings.Contains(vendor,"amd") check.
Remove it — AMD Instinct/Radeon GPUs are caught by ProcessingAccelerator /
DisplayController class. Also exclude ASPEED (BMC VGA adapter).

Add clear before bee-tui to avoid dirty terminal output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 13:49:09 +03:00
Mikhail Chusavitin
b6619d5ccc fix(iso): skip NVIDIA module load when no NVIDIA GPU present
Check PCI vendor 10de before attempting insmod — avoids spurious
nvidia_uvm symbol errors on systems without NVIDIA hardware (e.g. AMD MI350).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 13:38:31 +03:00
Mikhail Chusavitin
450193b063 feat(iso): remove splash.png, show EASY-BEE ASCII art in GRUB text mode
The graphical splash had "BEE / HARDWARE AUDIT" baked into the PNG,
overriding the echo ASCII art. Replace with a plain black background
so the EASY-BEE block-char banner from grub.cfg echo commands is visible.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 13:32:23 +03:00
Mikhail Chusavitin
ee8931f171 fix(iso): pin ISO kernel to same ABI as compiled NVIDIA modules
Export detected DEBIAN_KERNEL_ABI as BEE_KERNEL_ABI from build.sh so
auto/config can pin linux-packages to the exact versioned package
(e.g. linux-image-6.1.0-31 + flavour amd64 = linux-image-6.1.0-31-amd64).
This prevents nvidia.ko vermagic mismatch if the linux-image-amd64
meta-package is updated between build start and lb build chroot step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 12:26:59 +03:00
Mikhail Chusavitin
b771d95894 fix(iso): fix linux-packages to "linux-image" so lb appends flavour correctly
live-build constructs the kernel package as <linux-packages>-<linux-flavours>,
so "linux-image-amd64" + "amd64" = "linux-image-amd64-amd64" (not found).
The correct value is "linux-image" + "amd64" = "linux-image-amd64".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:45:41 +03:00
Mikhail Chusavitin
8e60e474dc feat(iso): rebrand to EASY-BEE with ASCII art banner
Replace "Bee Hardware Audit" branding with EASY-BEE across bootloader
and LiveCD: grub.cfg menu entries, echo ASCII art before menu,
motd banner, iso-volume and iso-application metadata.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:45:12 +03:00
Mikhail Chusavitin
2f4ec2acda fix(iso): auto-detect and install kernel headers at build time
- Dockerfile: linux-headers-amd64 meta-package instead of pinned ABI;
  remove DEBIAN_KERNEL_ABI build-arg (no longer needed at image build time)
- build-in-container.sh: drop --build-arg DEBIAN_KERNEL_ABI
- build.sh: apt-get update + detect ABI from apt-cache at build time;
  auto-install linux-headers-<ABI> if kernel changed since image build

Image rebuild is now needed only when changing Go version or lb tools,
not on every Debian kernel point release.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:25:29 +03:00
Mikhail Chusavitin
7ed5cb0306 fix(iso): auto-detect kernel ABI at build time instead of pinning
DEBIAN_KERNEL_ABI=auto in VERSIONS — build.sh queries
apt-cache depends linux-image-amd64 to find the current ABI.
lb config now uses linux-image-amd64 meta-package.

This prevents build failures when Debian drops old kernel packages
from the repo (happens with every point release).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:17:29 +03:00
Mikhail Chusavitin
6df7ac68f5 fix(iso): bump kernel ABI to 6.1.0-44 (6.1.164-1 in bookworm)
6.1.0-43 is no longer available in Debian repos.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:16:09 +03:00
Mikhail Chusavitin
0ce23aea4f feat(iso): add exfatprogs and ntfs-3g for USB export support
exFAT is the default filesystem on USB drives >32GB sold today.
Without exfatprogs, mount fails silently and export to such drives is broken.
ntfs-3g covers Windows-formatted drives.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:12:51 +03:00
Mikhail Chusavitin
2abe2ce3aa fix(iso): fix NCCL version to 2.28.9+cuda13.0, add sha256 verification
NVIDIA's CUDA repo for Debian 12 only has NCCL packages for cuda13.x,
not cuda12.x. Update to the latest available: 2.28.9-1+cuda13.0.
Also pass sha256 from VERSIONS into build-nccl.sh for integrity check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 12:04:03 +03:00
Mikhail Chusavitin
8233c9ee85 feat(iso): add NCCL 2.26.2 to LiveCD
Download libnccl2 .deb from NVIDIA's CUDA apt repo (Debian 12) during ISO
build, extract libnccl.so.* into the overlay at /usr/lib/ alongside
libnvidia-ml and libcuda. Version pinned in VERSIONS, reflected in
/etc/bee-release.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 09:51:28 +03:00
Mikhail Chusavitin
13189e2683 fix(iso): pet hardware watchdog via systemd RuntimeWatchdogSec=30s
Without a keepalive the kernel watchdog timer expires and reboots
the host mid-audit. Configuring RuntimeWatchdogSec lets systemd PID 1
reset /dev/watchdog every 30 s — well within the typical 60 s timeout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 23:56:42 +03:00
Mikhail Chusavitin
76a17937f3 feat(tui): NVIDIA SAT with nvtop, GPU selection, metrics and chart — v1.0.0
- TUI: duration presets (10m/1h/8h/24h), GPU multi-select checkboxes
- nvtop launched concurrently with SAT via tea.ExecProcess; can reopen or abort
- GPU metrics collected per-second during bee-gpu-stress (temp/usage/power/clock)
- Outputs: gpu-metrics.csv, gpu-metrics.html (offline SVG), gpu-metrics-term.txt
- Terminal chart: asciigraph-style line chart with box-drawing chars and ANSI colours
- AUDIT_VERSION bumped 0.1.1 → 1.0.0; nvtop added to ISO package list
- runtime-flows.md updated with full NVIDIA SAT TUI flow documentation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 15:18:57 +03:00
Mikhail Chusavitin
b25a2f6d30 feat: add support bundle and raw audit export 2026-03-16 18:20:26 +03:00
Mikhail Chusavitin
d18cde19c1 Drop legacy non-container builders 2026-03-16 00:23:55 +03:00
Mikhail Chusavitin
72cf482ad3 Embed Reanimator Chart web viewer 2026-03-15 22:07:42 +03:00
Mikhail Chusavitin
ab5a4be7ac Align hardware export with ingest contract 2026-03-15 21:04:53 +03:00
Mikhail Chusavitin
b483e2ce35 Add health verdicts and acceptance tests 2026-03-14 17:53:58 +03:00
Mikhail Chusavitin
591164a251 Rename ISO volume to BEE 2026-03-14 14:58:49 +03:00
Mikhail Chusavitin
ef4ec5695d Remove broken TUI log redirection 2026-03-14 14:57:31 +03:00
Mikhail Chusavitin
f1e096cabe Keep live TUI logs off the console 2026-03-14 14:51:25 +03:00
Mikhail Chusavitin
6082c7953e Add console tools and bee menu startup 2026-03-14 08:36:38 +03:00
Mikhail Chusavitin
f37ef0d844 Run live TUI as root via sudo 2026-03-14 08:34:23 +03:00
Mikhail Chusavitin
e32fa6e477 Use live-config autologin for bee user 2026-03-14 08:33:36 +03:00
Mikhail Chusavitin
20118bb400 Fix tty1 autologin override order 2026-03-14 08:17:23 +03:00
Mikhail Chusavitin
55d6876297 Avoid tty1 black screen on live boot 2026-03-14 08:14:49 +03:00
Mikhail Chusavitin
e8e176ab7f Add zstd to live image packages 2026-03-14 08:04:18 +03:00
Mikhail Chusavitin
caeafa836b Improve VM boot diagnostics and guest support 2026-03-14 07:51:16 +03:00
Mikhail Chusavitin
e8a52562e7 Persist builder caches outside container 2026-03-14 07:40:32 +03:00
Mikhail Chusavitin
6aca1682b9 Refactor bee CLI and LiveCD integration 2026-03-13 16:52:16 +03:00
Mikhail Chusavitin
b7c888edb1 fix: getty autologin root, inject GSP firmware for H100, bump 0.1.1 2026-03-08 22:12:02 +03:00
Mikhail Chusavitin
17d5d74a8d fix: nomodeset + remove splash (framebuffer hangs on headless H100 server) 2026-03-08 21:39:31 +03:00
Mikhail Chusavitin
441ab3adbd fix: blacklist nouveau driver (hangs on H100 unknown chipset) 2026-03-08 20:51:49 +03:00
Mikhail Chusavitin
c91c8d8cf9 feat: bee-themed grub splash (amber/black honeycomb) with progress bar 2026-03-08 20:44:19 +03:00
Mikhail Chusavitin
83e1910281 feat: custom grub bootloader - bee branding, 5s auto-boot, no splash 2026-03-08 20:35:23 +03:00
Mikhail Chusavitin
2252c5af56 fix: use isc-dhcp-client for dhclient, remove standalone lsblk (in util-linux) 2026-03-08 19:43:59 +03:00
Mikhail Chusavitin
7a4d75c143 fix: remove unsupported --hostname/--username from lb config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 19:28:01 +03:00
Mikhail Chusavitin
7c62d100d4 fix: use SYSSRC=common SYSOUT=amd64 for NVIDIA build on Debian split headers
Debian 12 splits kernel headers into two packages:
  linux-headers-<kver>        (arch-specific: generated/, config/)
  linux-headers-<kver>-common (source headers: linux/, asm-generic/, etc.)

NVIDIA conftest.sh builds include paths as HEADERS=$SOURCES/include.
When SYSSRC=amd64, HEADERS=amd64/include/ which is nearly empty —
conftest can't compile any kernel header tests, all compile-tests fail
silently, and NVIDIA assumes all kernel APIs are present. This causes
link errors for APIs added in kernel 6.3+ (vm_flags_set, vm_flags_clear)
and removed APIs (phys_to_dma, dma_is_direct, get_dma_ops).

Fix: pass SYSSRC=common (real headers) and SYSOUT=amd64 (generated headers).
NVIDIA Makefile maps SYSSRC→NV_KERNEL_SOURCES, SYSOUT→NV_KERNEL_OUTPUT,
and runs 'make -C common KBUILD_OUTPUT=amd64'. Conftest then correctly
detects which APIs are present in kernel 6.1 and uses proper wrappers.

Tested: 5 .ko files built successfully on Debian 12 kernel 6.1.0-43-amd64.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 19:23:47 +03:00
Mikhail Chusavitin
c843ff95a2 fix: add -Wno-error to CFLAGS_MODULE for NVIDIA kernel 6.1 compat
get_dma_ops() return type changed in kernel 6.1 — GCC treats int-conversion
warning as error. Suppress with -Wno-error to allow build to complete.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 18:55:25 +03:00