The NVIDIA CUDA HTTPS apt source (developer.download.nvidia.com) may be
unreachable from inside the live-build container chroot, causing
'E: Unable to locate package datacenter-gpu-manager-4-cuda13'.
Add build-dcgm.sh that downloads DCGM and nvidia-fabricmanager .deb
packages on the build host (verifying SHA256 against Packages.gz) and
caches them in BEE_CACHE_DIR. build.sh (step 25-dcgm, nvidia only)
copies them into LB_DIR/config/packages.chroot/ before lb build, so
live-build creates a local apt repo from them. The chroot installs the
packages from the local repo without ever contacting the NVIDIA CUDA
HTTPS source.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
lb binary_grub-efi and lb binary_syslinux create these files from templates
that already have memtest entries hardcoded. The hook should not fail when
the files don't exist yet — validate_iso_memtest() checks the final ISO.
Only the binary files (x64.bin, x64.efi) are required here.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ver_arg was set to "=memtest86+=VERSION" making the command
"apt-get download memtest86+=memtest86+=VERSION" (invalid).
Fixed to build pkg_spec directly as "memtest86+=VERSION".
Also add apt-get update retry if initial download fails.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Disabling --security broke the build because linux-image-6.1.0-44-amd64
is a security update not present in the base bookworm repo.
Main packages already come from mirror.mephi.ru.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Switch all lb mirrors to mirror.mephi.ru/debian/ for faster/reliable downloads
- Disable security repo (--security false) — not needed for LiveCD
- Pin MEMTEST_VERSION=6.10-4 in VERSIONS, export to hook environment
- Set BEE_REQUIRE_MEMTEST=1 in build-in-container.sh — missing memtest is now fatal
- Fix 9100-memtest.hook.binary: add apt-get download fallback when lb
binary_memtest has already purged the package cache; handle both 5.x
(memtest86+x64.bin) and 6.x (memtest86+.bin) BIOS binary naming
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mkdir -p LOG_DIR before writing the optional step log so that a race
with cleanup_build_log (EXIT trap archiving the log dir) does not cause
a "Directory nonexistent" error during lb binary_checksums / lb binary_iso.
Also downgrade apt-get update failure to a warning so a transient mirror
outage does not block kernel ABI auto-detection when the apt cache is warm.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- grub.cfg + isolinux/live.cfg.in: add pcie_aspm=off,
intel_idle.max_cstate=1 and processor.max_cstate=1 to all
non-failsafe boot entries
- bee-hpc-tuning: new script that sets all CPU cores to performance
governor via sysfs and logs THP state at boot
- bee-hpc-tuning.service: runs before bee-nvidia and bee-audit
- 9000-bee-setup.hook.chroot: enable service and mark script executable
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Bootloader: GRUB fallback text colors → yellow/brown (amber tone)
- CLI charts: all GPU metric series use single amber color (xterm-256 #214)
- Wallpaper: logo width scaled to 400 px dynamically, shadow scales with font size
- Support bundle: renamed to YYYY-MM-DD (BEE-SP vX.X) SRV_MODEL SRV_SN ToD.tar.gz
using dmidecode for server model (spaces→underscores) and serial number
- Remove display resolution feature (UI card, API routes, handlers, tests)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
HPL 2.3 from netlib compiled against OpenBLAS with a minimal
single-process MPI stub — no MPI package required in the ISO.
Matrix size is auto-sized to 80% of total RAM at runtime.
Build:
- VERSIONS: HPL_VERSION=2.3, HPL_SHA256=32c5c17d…
- build-hpl.sh: downloads HPL + OpenBLAS from Debian 12 repo,
compiles xhpl with a self-contained mpi_stub.c
- build.sh: step 80-hpl, injects xhpl + libopenblas into overlay
Runtime:
- bee-hpl: generates HPL.dat (N auto from /proc/meminfo, NB=256,
P=1 Q=1), runs xhpl, prints standard WR... Gflops output
- platform/hpl.go: RunHPL(), parses WR line → GFlops + PASSED/FAILED
- tasks.go: target "hpl"
- pages.go: LINPACK (HPL) card in validate/stress grid (stress-only)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- bee-gpu-stress.c: remove per-wave cuCtxSynchronize barrier in both
cuBLASLt and PTX hot loops; sync at most once/sec so the GPU queue
stays continuously full — eliminates the CPU↔GPU ping-pong that
prevented reaching full TDP
- sat_fan_stress.go: default SizeMB 0 (auto = 95% VRAM) instead of
hardcoded 64 MB; tiny matrices caused <0.1 ms kernels where CPU
re-queue overhead dominated
- pages.go: move nvidia-targeted-power and nvidia-pulse from Burn →
Validate stress section alongside nvidia-targeted-stress; these are
DCGM pass/fail diagnostics, not sustained burn loads; remove the
Power Delivery / Power Budget card from Burn entirely
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Sample server power (IPMI dcmi) during baseline+steady phases in parallel;
compute delta vs GPU-reported sum; flag ratio < 0.75 as unreliable reporting
- Collect base_graphics_clock_mhz, multiprocessor_count, default_power_limit_w
from nvidia-smi alongside existing GPU info
- Add tops_per_sm_per_ghz efficiency metric (model-agnostic silicon quality signal)
- Flag when enforced power limit is below default TDP by >5%
- Add fp64 profile to bee-gpu-burn worker (CUDA_R_64F, CUBLAS_COMPUTE_64F, min cc 8.0)
- Improve Executive Summary: overall pass count, FAILED GPU finding
- Throttle counters now shown as % of steady window instead of raw microseconds
- bible-local: clock calibration research, H100/H200 spec, real-world GEMM baselines
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
losetup --replace --direct-io=on fails with EINVAL when the target file
is on tmpfs (/dev/shm), because tmpfs does not support O_DIRECT.
Strip the --direct-io flag from the replace call and downgrade the
verification failure to a warning so boot continues.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Servers with NVIDIA compute GPUs (H100 etc.) have no display output,
so KMS blanks the console. nomodeset disables kernel modesetting and
lets the NVIDIA proprietary driver handle display via Xorg.
KMS variant moved to advanced submenu for cases where it is needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- bee-nvidia-load: run insmod in background, poll /proc/devices for
nvidiactl; if GSP init doesn't complete in 90s, kill insmod and retry
with NVreg_EnableGpuFirmware=0. Handles EBUSY case with clear error.
- Write /run/bee-nvidia-mode (gsp-on/gsp-off/gsp-stuck) for audit layer
- Show GSP mode badge in sidebar: yellow for gsp-off, red for gsp-stuck
- Report NvidiaGSPMode in RuntimeHealth with issue entries
- Simplify GRUB menu: default (KMS+GSP), advanced submenu (GSP=off,
nomodeset, fail-safe), remove load-to-RAM entry
- Add pcmanfm, ristretto, mupdf, mousepad to desktop packages
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add feh and python3-pil to package list
- Add chroot hook that generates /usr/share/bee/wallpaper.png using PIL:
black background, EASY-BEE box-drawing logo in amber (#f6c90e),
"Hardware Audit LiveCD" subtitle in dim amber — matches motd exactly
- bee-openbox-session: set wallpaper with feh --bg-fill, fall back to
xsetroot -solid black if wallpaper not found
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Set xsetroot solid background (#12100a, dark amber) so openbox
doesn't show bare black before Chromium opens
- Re-add healthz wait loop before launching Chromium: without it
Chromium opens localhost/loading before bee-web is up and gets
connection-refused which renders as a blank white page
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add TTYReset=yes and TTYVHangup=yes so systemd restores the terminal
to a clean state before handing tty1 to getty. Without this the screen
went black with no cursor after the status display finished.
Also remove DefaultDependencies=no which was too aggressive.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Command substitution in sh strips trailing newlines, so accumulating
output in a variable via $(...) lost all line breaks. Reverted to
direct printf calls which work correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Refresh every 3s instead of 1s to reduce flicker
- Show ssh, bee-sshsetup in service list
- Show failure reason for failed services
- Show last journal line for activating services
- Show IP addresses and web UI URL when network is up
- Render frame to variable before printing to reduce flicker
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
loglevel=6 floods the screen with mpt3sas/scsi/sd informational
messages, hiding systemd service status and bee-boot-status display.
loglevel=3 shows only kernel errors; all messages still go to serial.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add bee-boot-status service: shows live service status on tty1 with
ASCII logo before getty, exits when all bee services settle
- Remove lightdm dependency on bee-preflight so GUI starts immediately
without waiting for NVIDIA driver load
- Replace Chromium blank-page problem with /loading spinner page that
polls /api/services and auto-redirects when services are ready; add
"Open app now" override button; use fresh --user-data-dir=/tmp/bee-chrome
- Unify branding: add "Hardware Audit LiveCD" subtitle to GRUB menu,
bee-boot-status (with yellow ASCII logo), and web spinner
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>