Metrics:
- Replace canvas JS charts with server-side SVG via go-analyze/charts
- Add ring buffers (120 samples) for CPU temp and power
- /api/metrics/chart/{name}.svg endpoint serves live SVG, polled every 2s
Dashboard:
- Replace custom renderViewerPage with viewer.RenderHTML() from reanimator/chart submodule
- Mount chart static assets at /chart/static/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove all emojis from sidebar nav and logo (broken on server console fonts)
- Fix canvas chart: use parentElement.getBoundingClientRect() for width,
set explicit H=120px — fixes empty charts when offsetWidth/Height is 0
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace raw systemctl output in table cell with:
- state badge (active/failed/inactive) — click to expand
- full systemctl status in collapsible pre block (max 200px scroll)
Fixes layout explosion from multi-line status text in table.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
startx from user shell has /dev/fb0 permission issues and is fragile.
LightDM starts Xorg as root — standard LiveCD approach that works
on server hardware / IPMI KVM with nomodeset + fbdev/vesa.
- Add lightdm package, configure autologin as bee/openbox session
- Add /usr/share/xsessions/openbox.desktop
- Remove startx from .profile (LightDM manages X lifecycle)
- Remove Xwrapper.config needs_root_rights workaround (no longer needed)
- Enable lightdm.service in setup hook
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If X fails to start, the user gets a working shell prompt instead
of a dead session or autologin loop.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Interactive script: lists interfaces, DHCP or static IP config.
Shown as hint in tty1 welcome message.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Explicit xorg.conf.d config prevents Xorg from trying KMS/DRM
drivers that fail on server hardware with nomodeset.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
startx from autologin shell targets VT1 directly — KVM sees the
graphical UI without VT switching. Remove bee-desktop.service
(systemd-launched X defaults to VT7, invisible on KVM).
Add xserver-xorg-video-fbdev for server AST/VGA framebuffer.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
grub-pc and grub-efi-amd64 conflict with each other in Debian 12.
The -bin packages provide the same grub-install binaries without conflict.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copies the live system to a local disk via unsquashfs — no debootstrap,
no network required. Supports UEFI (GPT+EFI) and BIOS (MBR) layouts.
ISO:
- Add squashfs-tools, parted, grub-pc, grub-efi-amd64 to package list
- New overlay script bee-install: partitions, formats, unsquashfs,
writes fstab, runs grub-install+update-grub in chroot
Go TUI:
- Settings → Tools submenu (Install to disk, Check tools)
- Disk picker screen: lists non-USB, non-boot disks via lsblk
- Confirm screen warns about data loss
- Runs with live progress tail of /tmp/bee-install.log
- platform/install.go: ListInstallDisks, InstallToDisk, findLiveBootDevice
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Default boot no longer loads ISO to RAM (slow on BMC virtual media).
Separate menu entry added for toram in both GRUB and isolinux.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nv/target has no .h suffix; use -type f instead of -name '*.h' to
detect non-empty include directories.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cuda_fp16.h (included by cublas_api.h) requires <nv/target> from
the CUDA C++ Core Libraries (cuda-cccl-13-0).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ls *.h missed headers in subdirectories like crt/host_defines.h;
use find -maxdepth 2 instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cuda-crt-13-0 may not share the same version string as cuda-cudart-13-0;
pass empty version to lookup_pkg to match the first available version.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cublasLt.h -> cublas_api.h -> driver_types.h -> crt/host_defines.h
which lives in the cuda-crt-13-0 package, not cudart-dev.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
NVIDIA CUDA .deb packages install headers under
/usr/local/cuda-X.Y/targets/x86_64-linux/include/ not /usr/include/,
causing copy_headers() to silently skip them.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
awk exit in the blank-line block jumps to END, which printed the
result again causing repo_sha to contain the hash twice with a newline,
breaking the sha256 string comparison.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Debian Packages.gz uses CRLF line endings; \r in the captured SHA256
field caused string comparison to fail even when hashes were identical.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Echo messages captured in stdout polluted the return value of
download_verified_pkg(), causing extract_deb() to receive a
multi-line string instead of a file path and silently exit via set -e.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
console=tty0 sent kernel messages to the active VT (tty1), overwriting
the TUI. Changed to console=tty2 so kernel logs land on a dedicated
console. tty1 is now clean; operator can press Alt+F2 to inspect kernel
messages and Alt+F3 for an extra shell.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- GPU Platform Stress Test now shows a live in-TUI chart instead of nvtop.
nvidia-smi is polled every second; up to 60 data points per GPU kept.
All three metrics (Usage %, Temp °C, Power W) drawn on a single plot,
each normalised to its own range and rendered in a different colour.
- Memory allocation changed from MemoryMB/16 to MemoryMB-512 (full VRAM
minus 512 MB driver overhead) so bee-gpu-stress actually stresses memory.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
loglevel=3 was hiding all kernel messages on tty0/ttyS0 except errors.
Machine crashes (panics, driver oops, module failures) were silent on VGA.
Restored loglevel=7 so kernel messages up to debug are printed to both
tty0 (VGA) and ttyS0 (SOL). Journald MaxLevelConsole reduced to info
(was debug) to reduce noise on SOL while keeping it useful.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the ISO file was named after git describe --match 'audit/v*',
so a new iso/ tag produced names like v1.0.9-1-gXXXXXXX instead of v1.0.17.
Now build.sh has resolve_iso_version() that looks at iso/v* tags separately.
The bee binary inside the ISO still uses AUDIT_VERSION_EFFECTIVE.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- build-nvidia-module.sh: copy libnvidia-ptxjitcompiler.so.* alongside
libcuda/libnvidia-ml — required by cuModuleLoadDataEx for PTX JIT.
Without it: CUDA_ERROR_JIT_COMPILER_NOT_FOUND at runtime.
Cache check updated to force rebuild when ptxjitcompiler is missing.
- bee-nvidia-load: run ldconfig after module load so that NVIDIA/NCCL
libs injected into /usr/lib/ are visible to dlopen() callers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nvtop is no longer shown during NVIDIA SAT runs.
[o] Open nvtop shortcut also removed from the running screen.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Commit d36e844 dropped console=tty0 and added dual-serial + debug logging.
Without console=tty0 the kernel never initialises the VGA console,
leaving the physical screen permanently blank.
- Restore console=tty0 (VGA) as primary, keep console=ttyS0 for SOL
- Drop console=ttyS1 (redundant second serial port)
- Replace loglevel=7 + journald debug flood with loglevel=3 (errors only)
so kernel messages don't overwrite the TUI on the local screen
- Remove systemd.log_target/forward_to_console debug params
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AMD does not publish Debian Bookworm packages at all (only focal/jammy/noble).
Switch ROCM_UBUNTU_DIST to "jammy"; jammy packages install cleanly on
Debian 12 due to compatible glibc. Also expand candidate list to include
point-releases (6.3.4, 6.3.3, …) so we pick the latest actually-published one.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Apply the same pattern as NVIDIA SAT: launch nvtop via tea.ExecProcess
so it occupies the full terminal as a live GPU chart (temp, power, fan,
utilisation lines) while the stress test runs in the background.
- Add screenGPUStressRunning screen + dedicated running/render handlers
- startGPUStressTest: tea.Batch(stress goroutine, tea.ExecProcess(nvtop))
- [o] reopen nvtop at any time; [a] abort (cancels context)
- Graceful degradation: test still runs if nvtop is not on PATH
- gpuStressDoneMsg routes result to screenOutput on completion
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ROCm 6.4 does not yet publish a Release file for Debian Bookworm, causing
the live-build chroot hook to fail with "does not have a Release file".
Try each version in ROCM_CANDIDATES order; skip to the next if apt-get update
fails (repo unavailable). Exit gracefully if none are available.
Also rename inner 'candidate' variable to 'smi_path' to avoid collision.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two-phase GPU thermal cycling test with per-second telemetry:
- Phases: baseline → load1 → pause (no cooldown) → load2 → cooldown
- Monitors: fan RPM (ipmitool sdr), CPU/server temps (ipmitool/sensors),
system power (ipmitool dcmi), GPU temp/power/usage/clock/throttle (nvidia-smi)
- Detects throttling via clocks_throttle_reasons.active bitmask
- Measures fan response lag from load start (validates case-04 ~2s lag)
- Exports metrics.csv (wide format, one row/sec) and fan-sensors.csv (long format)
- TUI: adds [F] Fan Stress Test to Health Check screen with Quick/Standard/Express modes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>