Compare commits

...

88 Commits

Author SHA1 Message Date
911745e4da refactor(iso): replace chroot hooks for DCGM/ROCm with live-build apt sources
Move datacenter-gpu-manager and rocm-smi-lib from dynamic chroot hooks
into live-build's config/archives mechanism so lb caches the .deb files
in cache/packages.chroot/ between builds, eliminating repeated 900+ MB
downloads. Versions pinned via VERSIONS and substituted into package
lists at build time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 13:01:10 +03:00
acfd2010d7 fix(iso): remove firmware-chelsio-t4 (not in Debian bookworm)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 12:43:29 +03:00
e904c13790 fix(iso): remove --no-sandbox from chromium (runs as bee user, not root)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 12:40:42 +03:00
24c5c72cee feat(iso): add NIC firmware packages for broad hardware support
Adds firmware-misc-nonfree (Intel ice/i40e/igc), firmware-bnx2/bnx2x
(Broadcom), firmware-cavium (Marvell/QLogic), firmware-qlogic,
firmware-chelsio-t4, firmware-realtek to fix missing network on
physical servers with modern NICs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 12:38:22 +03:00
6ff0bcad56 feat(iso): show kernel logs on graphical console (remove quiet, loglevel=7)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 11:23:57 +03:00
4fef26000c fix(iso): replace invalid --compression with --chroot-squashfs-compression-type
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:23:00 +03:00
a393dcb731 feat(webui): add POST /api/sat/abort + update bible-local runtime-flows
- jobState now has optional cancel func; abort() calls it if job is running
- handleAPISATRun passes cancellable context to RunNvidiaAcceptancePackWithOptions
- POST /api/sat/abort?job_id=... cancels the running SAT job
- bible-local/runtime-flows.md: replace TUI SAT flow with Web UI flow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:23:00 +03:00
9e55728053 feat(iso): replace --clean-cache with --clean-build (cleans + rebuilds)
--clean-build clears all caches (Go, NVIDIA, lb packages, work dir)
and rebuilds the Docker image, then proceeds with a full clean build.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:12:21 +03:00
4b8023c1cb feat(iso): add --clean-cache option to build-in-container.sh
Removes all cached build artifacts: Go cache, NVIDIA/NCCL/cuBLAS
downloads, lb package cache, and live-build work dir. Use before
a clean rebuild or when switching Debian/kernel versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:11:31 +03:00
4c8417d20a feat(webui): add Install to Disk page
Expose the existing bee-install script through the web UI:
- platform/install.go: remove USB exclusion, add SizeBytes/MountedParts
  fields, add MinInstallBytes()/DiskWarnings() safety checks (size,
  mounted partitions, toram+low-RAM warning)
- webui: add GET /api/install/disks, POST /api/install/run,
  GET /api/install/stream endpoints
- webui: add Install to Disk page with disk table, warning badges,
  device-name confirmation gate, SSE progress terminal, reboot button

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:11:16 +03:00
0755374dd2 perf(iso): speed up builds — zstd squashfs + preserve lb chroot cache
- Switch squashfs compression from xz to zstd (3-5x faster compression,
  ~10-15% larger but decompresses faster at boot)
- Stop rm -rf BUILD_WORK_DIR on each build; rsync only config changes
  so lb can reuse its chroot across builds (skips apt install step)
- Keep lb-packages cache in CACHE_ROOT as fallback if work dir is wiped

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:10:29 +03:00
c70ae274fa revert(iso): remove apt-cacher-ng support, use lb package cache instead
apt-cacher-ng requires a separate container; lb's own package cache
persisted in --cache-dir is simpler and sufficient.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:02:34 +03:00
23ad7ff534 feat(iso): persist lb package cache across builds in cache dir
Saves cache/packages.chroot before wiping BUILD_WORK_DIR and
restores it after, so apt packages are not re-downloaded on every
build. Cache lives in --cache-dir (same place as Go/NVIDIA cache).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 09:59:55 +03:00
de130966f7 feat(iso): add APT_PROXY support to speed up builds via apt-cacher-ng
Pass APT_PROXY=http://host:3142 to build-in-container.sh to route
all apt traffic through a local cache. Also supports --apt-proxy flag.
Mirrors in auto/config are set from BEE_APT_PROXY env when present.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 09:57:54 +03:00
c6fbfc8306 fix(boot): restore toram as menu option only, not default boot param
toram was incorrectly added to the default bootappend-live causing
every boot to copy the full ISO to RAM (slow on BMC virtual media).
Default boot reads squashfs from media; toram is available as a
separate menu entry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 09:52:25 +03:00
35ad1c74d9 feat(iso): add slim hook to strip locales/man pages/apt cache from squashfs
Removes ~100-300MB from the squashfs: man pages, non-en locales,
python cache, apt lists and package cache, temp files and logs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 08:44:02 +03:00
4a02e74b17 fix(iso): add git safe.directory so git describe sees v* tags inside container
Without this, git refuses to read the bind-mounted repo (UID mismatch)
and describe returns empty, causing the version to fall back to iso/v1.0.20.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 08:23:37 +03:00
cd2853ad99 fix(webui): fix viewer static path so Reanimator Chart CSS loads correctly
Mount chart submodule static assets at /static/ (matching the template's
hardcoded href), fix nav to include Audit Snapshot tab, remove dead
renderViewerPage code and iframe from Dashboard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 08:19:17 +03:00
6caf771d6e fix(boot): restore toram kernel parameter
Without toram the squashfs is read from the physical medium at runtime.
Disconnecting the USB/CD after boot causes SQUASHFS I/O errors on any
uncached block, making all X11 apps crash with SIGBUS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 08:04:37 +03:00
14fa87b7d7 feat(netconf): add input validation, 'b' to go back, 'a' to abort
- All prompts accept 'a' = abort, 'b' = back to previous step
- Interface input: validate numeric range and name existence, re-prompt on bad input
- IP address: regex check x.x.x.x/prefix format
- Gateway: regex check x.x.x.x format
- Main loop: 'b' at mode selection goes back to interface list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 07:31:23 +03:00
600ece911b fix(desktop): remove forced 1920x1080 modeline, limit LightDM restarts
On real server hardware (IPMI/BMC AST chip + nomodeset) the VESA
framebuffer is set by BIOS at whatever resolution it chooses (often
1024x768 or 1280x1024). The hardcoded 1920x1080 Modeline caused X to
fail → LightDM crash-loop → SOL console flooded with systemd messages.

- Remove Monitor section / Modeline from xorg.conf — fbdev now uses
  whatever framebuffer resolution the kernel provides
- Add lightdm.service.d/bee-limits.conf: RestartSec=10,
  max 3 restarts per 60s so headless hardware doesn't spam the console

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 07:30:51 +03:00
2d424c63cb fix(netconf): accept interface number as input, not just name
User sees a numbered list but could only type the name.
Now numeric input is resolved to the interface name via awk NR==N.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 07:27:49 +03:00
50f28d1ee6 chore: drop legacy TUI/dead code
- Delete audit/internal/app/panel.go (388 lines, zero callers — TUI panel remnant)
- Delete RenderGPULiveChart() from platform/gpu_metrics.go (~155 lines, never called)
- Move formatSATDetail/cleanSummaryKey helpers to app.go (still used)
- Update motd: replace bee-tui with Web UI hint
- Update journald.conf.d comment: remove bee-tui reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 07:27:30 +03:00
3579747ae3 fix(iso): prioritise v[0-9]* tags over iso/v* for ISO filename
Plain v2.x tags are now the active tagging scheme; iso/v1.0.x tags
are legacy. Swap priority in resolve_iso_version so the ISO is named
bee-debian12-v2.x-amd64.iso instead of v1.0.x-N-gHASH.
Also tighten the v* pattern to v[0-9]* to avoid accidentally matching
other prefixed tags in both resolve functions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 23:34:09 +03:00
09dc7d2613 feat(webui): apply light theme from chart submodule CSS
Replace dark #0f1117 theme with clean white/Semantic-UI-inspired
design matching the updated internal/chart submodule: white surface,
dark sidebar (#1b1c1d), Lato font, blue accent (#2185d0), subtle
borders. Also update submodule pointer to latest commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 23:31:29 +03:00
ec0b7f7ff9 feat(metrics): single chart engine + full-width stacked layout
- One engine: go-analyze/charts (grafana theme) for all live metrics
- Server chart: CPU temp, CPU load%, mem load%, power W, fan RPMs
- GPU charts: temp, load%, mem%, power W — one card per GPU, added dynamically
- Charts 1400x280px SVG, rendered at width:100% in single-column layout
- Add CPU load (from /proc/stat) and mem load (from /proc/meminfo) to LiveMetricSample
- Add GPU mem utilization to GPUMetricRow (nvidia-smi utilization.memory)
- Document charting architecture in bible-local/architecture/charting.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 23:26:13 +03:00
e7a7ff54b9 chore: add Makefile with run/build/test targets
make run                          — starts web UI on :8080
make run LISTEN=:9090             — custom port
make run AUDIT_PATH=/tmp/bee.json — with audit data
make build / make test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 23:14:53 +03:00
b4371e291e fix(build): resolve ISO version from plain v* tags (e.g. v2.6)
resolve_iso_version only matched iso/v* pattern; GUI release tags
(v2, v2.1 ... v2.6) were ignored, falling back to the old v1.0.20
annotated tag via resolve_audit_version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 23:11:33 +03:00
c22b53a406 feat(boot): set 1920x1080 resolution for framebuffer and GRUB
- Add video=1920x1080 to kernel cmdline (sets fbdev to Full HD)
- Update GRUB gfxmode to 1920x1080 (fallback to 1280x1024,auto)
- Add Xorg Monitor section with 1920x1080 Modeline and preferred mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 23:10:18 +03:00
ff0acc3698 feat(webui): server-side SVG charts + reanimator-chart viewer
Metrics:
- Replace canvas JS charts with server-side SVG via go-analyze/charts
- Add ring buffers (120 samples) for CPU temp and power
- /api/metrics/chart/{name}.svg endpoint serves live SVG, polled every 2s

Dashboard:
- Replace custom renderViewerPage with viewer.RenderHTML() from reanimator/chart submodule
- Mount chart static assets at /chart/static/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 23:07:47 +03:00
d50760e7c6 fix(webui): remove emojis from nav, fix metrics chart sizing
- Remove all emojis from sidebar nav and logo (broken on server console fonts)
- Fix canvas chart: use parentElement.getBoundingClientRect() for width,
  set explicit H=120px — fixes empty charts when offsetWidth/Height is 0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 22:49:09 +03:00
ed4f8be019 fix(webui): services table — show state badge, full status on click
Replace raw systemctl output in table cell with:
- state badge (active/failed/inactive) — click to expand
- full systemctl status in collapsible pre block (max 200px scroll)
Fixes layout explosion from multi-line status text in table.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 22:47:59 +03:00
883592d029 feat(desktop): switch to LightDM for X startup (matches Ubuntu LiveCD)
startx from user shell has /dev/fb0 permission issues and is fragile.
LightDM starts Xorg as root — standard LiveCD approach that works
on server hardware / IPMI KVM with nomodeset + fbdev/vesa.

- Add lightdm package, configure autologin as bee/openbox session
- Add /usr/share/xsessions/openbox.desktop
- Remove startx from .profile (LightDM manages X lifecycle)
- Remove Xwrapper.config needs_root_rights workaround (no longer needed)
- Enable lightdm.service in setup hook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 22:17:59 +03:00
a6dcaf1c7e fix(desktop): fix X permissions for server hardware (IPMI KVM)
- Add bee user to video,input groups (fixes /dev/fb0 permission denied)
- Add Xwrapper.config: needs_root_rights=yes (X gets hw access)
- Add xserver-xorg-video-vesa as fallback driver
- Remove dead bee-tui chmod from setup hook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 22:07:25 +03:00
88727fb590 fix(desktop): don't exec startx — fall back to shell on X failure
If X fails to start, the user gets a working shell prompt instead
of a dead session or autologin loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 21:48:26 +03:00
c9f5224c42 feat(console): add netconf command for quick network setup
Interactive script: lists interfaces, DHCP or static IP config.
Shown as hint in tty1 welcome message.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 21:07:14 +03:00
7cb5c02a9b fix(desktop): force fbdev Xorg driver for server framebuffer
Explicit xorg.conf.d config prevents Xorg from trying KMS/DRM
drivers that fail on server hardware with nomodeset.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 21:05:42 +03:00
c1aa3cf491 fix(desktop): start X on vt1 from .profile for IPMI KVM compatibility
startx from autologin shell targets VT1 directly — KVM sees the
graphical UI without VT switching. Remove bee-desktop.service
(systemd-launched X defaults to VT7, invisible on KVM).
Add xserver-xorg-video-fbdev for server AST/VGA framebuffer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 21:03:59 +03:00
f7eb75c57c fix(iso): replace grub-pc/grub-efi-amd64 with -bin variants to fix package conflict
grub-pc and grub-efi-amd64 conflict with each other in Debian 12.
The -bin packages provide the same grub-install binaries without conflict.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:12:18 +03:00
004cc4910d feat(webui): replace TUI with full web UI + local openbox desktop
- Remove audit/internal/tui/ (~3000 LOC, bubbletea/lipgloss/reanimator deps)
- Add /api/* REST+SSE endpoints: audit, SAT (nvidia/memory/storage/cpu),
  services, network, export, tools, live metrics stream
- Add async job manager with SSE streaming for long-running operations
- Add platform.SampleLiveMetrics() for live fan/temp/power/GPU polling
- Add multi-page web UI (vanilla JS): Dashboard, Metrics charts, Tests,
  Burn-in, Network, Services, Export, Tools
- Add bee-desktop.service: openbox + Xorg + Chromium opening http://localhost/
- Add openbox/tint2/xorg/xinit/xterm/chromium to ISO package list
- Update .profile, bee.sh, and bible-local docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 19:21:14 +03:00
ed1cceed8c fix(boot): add nomodeset to fix black screen on server VGA/IPMI KVM (AST chip KMS) 2026-03-27 00:13:36 +03:00
9fe9f061f8 fix(nccl-tests): set LIBRARY_PATH so ld finds libnccl.so in nccl cache 2026-03-26 23:59:06 +03:00
837a1fb981 fix(nccl-tests): pin /usr/local/cuda→12.8 symlink, auto-detect gencode by nvcc version 2026-03-26 23:54:07 +03:00
1f43b4e050 fix(nccl-tests): pass NCCL_LIB from nccl cache to fix -lnccl link error 2026-03-26 23:52:25 +03:00
83bbc8a1bc fix(nccl-tests): upgrade to cuda-nvcc-12-8, add sm_100 (Blackwell B100/B200) 2026-03-26 23:51:26 +03:00
896bdb6ee8 fix(nccl-tests): use cuda-nvcc-12-6 to support Ampere/Volta (sm_70..sm_90) 2026-03-26 23:50:36 +03:00
5407c26e25 fix(nccl-tests): CUDA 13.0 supports only sm_90+ (Hopper/H100) 2026-03-26 23:49:45 +03:00
4fddaba9c5 fix(nccl-tests): limit CUDA gencode to sm_70+ (CUDA 13 dropped Pascal) 2026-03-26 23:48:40 +03:00
d2f384b6eb fix(nccl-tests): use plain make instead of non-existent all_reduce_perf target 2026-03-26 23:47:49 +03:00
25f0f30aaf fix(boot): fix black screen on monitor, stop log spam on console
- Add console=tty0 so VGA display gets kernel output (was serial-only)
- Change loglevel=7→3 (debug→errors only)
- Add quiet to suppress verbose kernel boot messages
- journald: ForwardToConsole=no so service logs don't flood tty1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 23:45:09 +03:00
a57b037a91 feat(installer): add 'Install to disk' in Tools submenu
Copies the live system to a local disk via unsquashfs — no debootstrap,
no network required. Supports UEFI (GPT+EFI) and BIOS (MBR) layouts.

ISO:
- Add squashfs-tools, parted, grub-pc, grub-efi-amd64 to package list
- New overlay script bee-install: partitions, formats, unsquashfs,
  writes fstab, runs grub-install+update-grub in chroot

Go TUI:
- Settings → Tools submenu (Install to disk, Check tools)
- Disk picker screen: lists non-USB, non-boot disks via lsblk
- Confirm screen warns about data loss
- Runs with live progress tail of /tmp/bee-install.log
- platform/install.go: ListInstallDisks, InstallToDisk, findLiveBootDevice

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 23:35:01 +03:00
5644231f9a feat(nccl): add nccl-tests all_reduce_perf for GPU bandwidth testing
- Dockerfile: install cuda-nvcc-13-0 from NVIDIA repo for compilation
- build-nccl-tests.sh: downloads libnccl-dev for nccl.h, builds all_reduce_perf
- build.sh: runs nccl-tests build, injects binary into /usr/local/bin/
- platform: RunNCCLTests() auto-detects GPU count, runs all_reduce_perf
- TUI: NCCL bandwidth test entry in Burn-in Tests screen [N] hotkey

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 23:22:19 +03:00
eea98e6d76 feat(dcgm): add NVIDIA DCGM diagnostics, fix KVM console
- Add 9002-nvidia-dcgm.hook.chroot: installs datacenter-gpu-manager
  from NVIDIA apt repo during live-build
- Enable nvidia-dcgm.service in chroot setup hook
- Replace bee-gpu-stress with dcgmi diag (levels 1-4) in NVIDIA SAT
- TUI: replace GPU checkbox + duration UI with DCGM level selection
- Remove console=tty2 from boot params: KVM/VGA now shows tty1
  where bee-tui runs, fixing unresponsive console

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 23:08:12 +03:00
967455194c feat(iso): make toram optional, add 'load to RAM' boot menu entry
Default boot no longer loads ISO to RAM (slow on BMC virtual media).
Separate menu entry added for toram in both GRUB and isolinux.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 21:45:04 +03:00
79dabf3efb fix(build): link bee-gpu-stress with -lm for sqrt()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 20:55:14 +03:00
1336f5b95c fix(cublas): copy include dirs containing files without .h extension
nv/target has no .h suffix; use -type f instead of -name '*.h' to
detect non-empty include directories.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 20:53:08 +03:00
31486a31c1 fix(cublas): add cuda-cccl package for nv/target header
cuda_fp16.h (included by cublas_api.h) requires <nv/target> from
the CUDA C++ Core Libraries (cuda-cccl-13-0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 20:49:46 +03:00
aa3fc332ba fix(cublas): check for .h in subdirs when copying non-standard include dirs
ls *.h missed headers in subdirectories like crt/host_defines.h;
use find -maxdepth 2 instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 20:47:39 +03:00
62c57b87f2 fix(cublas): allow version-free lookup for cuda-crt package
cuda-crt-13-0 may not share the same version string as cuda-cudart-13-0;
pass empty version to lookup_pkg to match the first available version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 20:46:45 +03:00
f600261546 fix(cublas): add cuda-crt package for crt/host_defines.h
cublasLt.h -> cublas_api.h -> driver_types.h -> crt/host_defines.h
which lives in the cuda-crt-13-0 package, not cudart-dev.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 20:42:40 +03:00
d7ca04bdfb fix(cublas): search all include/ dirs in deb for CUDA headers
NVIDIA CUDA .deb packages install headers under
/usr/local/cuda-X.Y/targets/x86_64-linux/include/ not /usr/include/,
causing copy_headers() to silently skip them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 20:35:21 +03:00
5433652c70 fix(cublas): prevent double-print in lookup_pkg awk END block
awk exit in the blank-line block jumps to END, which printed the
result again causing repo_sha to contain the hash twice with a newline,
breaking the sha256 string comparison.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 20:29:10 +03:00
b25f014dbd fix(cublas): strip CR from Packages.gz fields to fix sha256 comparison
Debian Packages.gz uses CRLF line endings; \r in the captured SHA256
field caused string comparison to fail even when hashes were identical.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 20:24:58 +03:00
d69a46f211 fix(cublas): redirect diagnostic echo to stderr in download_verified_pkg
Echo messages captured in stdout polluted the return value of
download_verified_pkg(), causing extract_deb() to receive a
multi-line string instead of a file path and silently exit via set -e.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 20:22:39 +03:00
Mikhail Chusavitin
fc5c2019aa iso: improve burn-in, export, and live boot 2026-03-26 18:56:19 +03:00
Mikhail Chusavitin
67a215c66f fix(iso): route kernel logs to tty2, keep tty1 clean for TUI
console=tty0 sent kernel messages to the active VT (tty1), overwriting
the TUI. Changed to console=tty2 so kernel logs land on a dedicated
console. tty1 is now clean; operator can press Alt+F2 to inspect kernel
messages and Alt+F3 for an extra shell.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 17:40:44 +03:00
Mikhail Chusavitin
8b4bfdf5ad feat(tui): live GPU chart during stress test, full VRAM allocation
- GPU Platform Stress Test now shows a live in-TUI chart instead of nvtop.
  nvidia-smi is polled every second; up to 60 data points per GPU kept.
  All three metrics (Usage %, Temp °C, Power W) drawn on a single plot,
  each normalised to its own range and rendered in a different colour.
- Memory allocation changed from MemoryMB/16 to MemoryMB-512 (full VRAM
  minus 512 MB driver overhead) so bee-gpu-stress actually stresses memory.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 17:37:20 +03:00
Mikhail Chusavitin
0a52a4f3ba fix(iso): restore loglevel=7 on VGA console for crash visibility
loglevel=3 was hiding all kernel messages on tty0/ttyS0 except errors.
Machine crashes (panics, driver oops, module failures) were silent on VGA.

Restored loglevel=7 so kernel messages up to debug are printed to both
tty0 (VGA) and ttyS0 (SOL). Journald MaxLevelConsole reduced to info
(was debug) to reduce noise on SOL while keeping it useful.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 11:19:07 +03:00
Mikhail Chusavitin
b132f7973a fix(iso): derive ISO filename from iso/v* tags, not audit/v*
Previously the ISO file was named after git describe --match 'audit/v*',
so a new iso/ tag produced names like v1.0.9-1-gXXXXXXX instead of v1.0.17.
Now build.sh has resolve_iso_version() that looks at iso/v* tags separately.
The bee binary inside the ISO still uses AUDIT_VERSION_EFFECTIVE.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 11:05:51 +03:00
Mikhail Chusavitin
bd94b6c792 fix(iso): add libnvidia-ptxjitcompiler + ldconfig for PTX JIT and NCCL
- build-nvidia-module.sh: copy libnvidia-ptxjitcompiler.so.* alongside
  libcuda/libnvidia-ml — required by cuModuleLoadDataEx for PTX JIT.
  Without it: CUDA_ERROR_JIT_COMPILER_NOT_FOUND at runtime.
  Cache check updated to force rebuild when ptxjitcompiler is missing.
- bee-nvidia-load: run ldconfig after module load so that NVIDIA/NCCL
  libs injected into /usr/lib/ are visible to dlopen() callers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 10:37:27 +03:00
Mikhail Chusavitin
06017eddfd feat(tui): remove nvtop auto-launch from NVIDIA SAT
nvtop is no longer shown during NVIDIA SAT runs.
[o] Open nvtop shortcut also removed from the running screen.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 10:29:05 +03:00
Mikhail Chusavitin
0ac7b6a963 fix(iso): restore console=tty0 — VGA screen was black without it
Commit d36e844 dropped console=tty0 and added dual-serial + debug logging.
Without console=tty0 the kernel never initialises the VGA console,
leaving the physical screen permanently blank.

- Restore console=tty0 (VGA) as primary, keep console=ttyS0 for SOL
- Drop console=ttyS1 (redundant second serial port)
- Replace loglevel=7 + journald debug flood with loglevel=3 (errors only)
  so kernel messages don't overwrite the TUI on the local screen
- Remove systemd.log_target/forward_to_console debug params

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 10:23:53 +03:00
Mikhail Chusavitin
3d2ae4cdcb fix(iso): use Ubuntu jammy codename for AMD ROCm repo — Debian not supported
AMD does not publish Debian Bookworm packages at all (only focal/jammy/noble).
Switch ROCM_UBUNTU_DIST to "jammy"; jammy packages install cleanly on
Debian 12 due to compatible glibc. Also expand candidate list to include
point-releases (6.3.4, 6.3.3, …) so we pick the latest actually-published one.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 10:08:58 +03:00
Mikhail Chusavitin
4669f14f4f feat(tui): GPU Platform Stress Test — live nvtop chart during test
Apply the same pattern as NVIDIA SAT: launch nvtop via tea.ExecProcess
so it occupies the full terminal as a live GPU chart (temp, power, fan,
utilisation lines) while the stress test runs in the background.

- Add screenGPUStressRunning screen + dedicated running/render handlers
- startGPUStressTest: tea.Batch(stress goroutine, tea.ExecProcess(nvtop))
- [o] reopen nvtop at any time; [a] abort (cancels context)
- Graceful degradation: test still runs if nvtop is not on PATH
- gpuStressDoneMsg routes result to screenOutput on completion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 10:01:31 +03:00
Mikhail Chusavitin
540a9e39b8 refactor(audit): rename Fan Stress Test → GPU Platform Stress Test
Update all user-facing strings in TUI and ActionResult title.
Internal identifiers (types, functions, file name) unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 09:56:25 +03:00
Mikhail Chusavitin
58510207fa fix(iso): fall back through ROCm 6.4→6.3→6.2 if repo Release file missing
ROCm 6.4 does not yet publish a Release file for Debian Bookworm, causing
the live-build chroot hook to fail with "does not have a Release file".

Try each version in ROCM_CANDIDATES order; skip to the next if apt-get update
fails (repo unavailable). Exit gracefully if none are available.
Also rename inner 'candidate' variable to 'smi_path' to avoid collision.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 09:52:17 +03:00
Mikhail Chusavitin
4cd7c9ab4e feat(audit): fan-stress SAT for MSI case-04 fan lag & thermal throttle detection
Two-phase GPU thermal cycling test with per-second telemetry:
- Phases: baseline → load1 → pause (no cooldown) → load2 → cooldown
- Monitors: fan RPM (ipmitool sdr), CPU/server temps (ipmitool/sensors),
  system power (ipmitool dcmi), GPU temp/power/usage/clock/throttle (nvidia-smi)
- Detects throttling via clocks_throttle_reasons.active bitmask
- Measures fan response lag from load start (validates case-04 ~2s lag)
- Exports metrics.csv (wide format, one row/sec) and fan-sensors.csv (long format)
- TUI: adds [F] Fan Stress Test to Health Check screen with Quick/Standard/Express modes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 09:51:03 +03:00
Mikhail Chusavitin
cfe255f6e4 Release audit/v1.0.5 2026-03-26 09:41:19 +03:00
Mikhail Chusavitin
8b9d3447d7 Overlay SAT results into audit JSON 2026-03-25 20:11:03 +03:00
Mikhail Chusavitin
614b7cad61 Improve PCIe inventory and hardware identity collection 2026-03-25 20:00:38 +03:00
Mikhail Chusavitin
9a1df9b1ba Tighten support bundles and fix AMD runtime checks 2026-03-25 19:35:25 +03:00
Mikhail Chusavitin
30cf014d58 Rename NVIDIA bootloader modes 2026-03-25 19:12:26 +03:00
Mikhail Chusavitin
27d478aed6 Add bootloader choice for safe vs full NVIDIA boot 2026-03-25 19:11:15 +03:00
Mikhail Chusavitin
d36e8442a9 Stabilize live ISO consoles and NVIDIA boot path 2026-03-25 19:05:18 +03:00
Mikhail Chusavitin
b345b0d14d Derive ISO version from git tags 2026-03-25 18:40:48 +03:00
Mikhail Chusavitin
0a1ac2ab9f Bootstrap ROCm hook prerequisites in ISO build 2026-03-25 18:38:19 +03:00
Mikhail Chusavitin
1e62f828c6 Embed MOTD banner into TUI 2026-03-25 18:11:17 +03:00
Mikhail Chusavitin
f8c997d272 Add missing SAT progress TUI helpers 2026-03-25 18:03:45 +03:00
101 changed files with 7248 additions and 3359 deletions

18
audit/Makefile Normal file
View File

@@ -0,0 +1,18 @@
LISTEN ?= :8080
AUDIT_PATH ?=
RUN_ARGS := web --listen $(LISTEN)
ifneq ($(AUDIT_PATH),)
RUN_ARGS += --audit-path $(AUDIT_PATH)
endif
.PHONY: run build test
run:
go run ./cmd/bee $(RUN_ARGS)
build:
go build -o bee ./cmd/bee
test:
go test ./...

BIN
audit/bee Executable file

Binary file not shown.

View File

@@ -11,7 +11,6 @@ import (
"bee/audit/internal/app"
"bee/audit/internal/platform"
"bee/audit/internal/runtimeenv"
"bee/audit/internal/tui"
"bee/audit/internal/webui"
)
@@ -40,8 +39,6 @@ func run(args []string, stdout, stderr io.Writer) int {
return 0
case "audit":
return runAudit(args[1:], stdout, stderr)
case "tui":
return runTUI(args[1:], stdout, stderr)
case "export":
return runExport(args[1:], stdout, stderr)
case "preflight":
@@ -66,7 +63,6 @@ func printRootUsage(w io.Writer) {
fmt.Fprintln(w, `bee commands:
bee audit --runtime auto|local|livecd --output stdout|file:<path>
bee preflight --output stdout|file:<path>
bee tui --runtime auto|local|livecd
bee export --target <device>
bee support-bundle --output stdout|file:<path>
bee web --listen :80 --audit-path `+app.DefaultAuditJSONPath+`
@@ -79,8 +75,6 @@ func runHelp(args []string, stdout, stderr io.Writer) int {
switch args[0] {
case "audit":
return runAudit([]string{"--help"}, stdout, stdout)
case "tui":
return runTUI([]string{"--help"}, stdout, stdout)
case "export":
return runExport([]string{"--help"}, stdout, stdout)
case "preflight":
@@ -145,42 +139,6 @@ func runAudit(args []string, stdout, stderr io.Writer) int {
return 0
}
func runTUI(args []string, stdout, stderr io.Writer) int {
fs := flag.NewFlagSet("tui", flag.ContinueOnError)
fs.SetOutput(stderr)
runtimeFlag := fs.String("runtime", "auto", "runtime environment: auto, local, livecd")
fs.Usage = func() {
fmt.Fprintln(stderr, "usage: bee tui [--runtime auto|local|livecd]")
fs.PrintDefaults()
}
if err := fs.Parse(args); err != nil {
if err == flag.ErrHelp {
return 0
}
return 2
}
if fs.NArg() != 0 {
fs.Usage()
return 2
}
runtimeInfo, err := runtimeenv.Detect(*runtimeFlag)
if err != nil {
slog.Error("resolve runtime", "err", err)
return 1
}
slog.SetDefault(slog.New(slog.NewTextHandler(io.Discard, &slog.HandlerOptions{
Level: slog.LevelInfo,
})))
application := app.New(platform.New())
if err := tui.Run(application, runtimeInfo.Mode); err != nil {
slog.Error("run tui", "err", err)
return 1
}
return 0
}
func runExport(args []string, stdout, stderr io.Writer) int {
fs := flag.NewFlagSet("export", flag.ContinueOnError)
@@ -333,10 +291,18 @@ func runWeb(args []string, stdout, stderr io.Writer) int {
}
slog.Info("starting bee web", "listen", *listenAddr, "audit_path", *auditPath)
runtimeInfo, err := runtimeenv.Detect("auto")
if err != nil {
slog.Warn("resolve runtime for web", "err", err)
}
if err := webui.ListenAndServe(*listenAddr, webui.HandlerOptions{
Title: *title,
AuditPath: *auditPath,
ExportDir: *exportDir,
Title: *title,
AuditPath: *auditPath,
ExportDir: *exportDir,
App: app.New(platform.New()),
RuntimeMode: runtimeInfo.Mode,
}); err != nil {
slog.Error("run web", "err", err)
return 1

View File

@@ -4,25 +4,14 @@ go 1.24.0
replace reanimator/chart => ../internal/chart
require github.com/charmbracelet/bubbletea v1.3.4
require github.com/charmbracelet/lipgloss v1.0.0
require reanimator/chart v0.0.0
require (
github.com/go-analyze/charts v0.5.26
reanimator/chart v0.0.0-00010101000000-000000000000
)
require (
github.com/aymanbagabas/go-osc52/v2 v2.0.1 // indirect
github.com/charmbracelet/lipgloss v1.0.0 // promoted to direct used for TUI colors
github.com/charmbracelet/x/ansi v0.8.0 // indirect
github.com/charmbracelet/x/term v0.2.1 // indirect
github.com/erikgeiser/coninput v0.0.0-20211004153227-1c3628e74d0f // indirect
github.com/lucasb-eyer/go-colorful v1.2.0 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/mattn/go-localereader v0.0.1 // indirect
github.com/mattn/go-runewidth v0.0.16 // indirect
github.com/muesli/ansi v0.0.0-20230316100256-276c6243b2f6 // indirect
github.com/muesli/cancelreader v0.2.2 // indirect
github.com/muesli/termenv v0.15.2 // indirect
github.com/rivo/uniseg v0.4.7 // indirect
golang.org/x/sync v0.11.0 // indirect
golang.org/x/sys v0.30.0 // indirect
golang.org/x/text v0.3.8 // indirect
github.com/dustin/go-humanize v1.0.1 // indirect
github.com/go-analyze/bulk v0.1.3 // indirect
github.com/golang/freetype v0.0.0-20170609003504-e2365dfdc4a0 // indirect
golang.org/x/image v0.24.0 // indirect
)

View File

@@ -1,37 +1,18 @@
github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
github.com/charmbracelet/bubbletea v1.3.4 h1:kCg7B+jSCFPLYRA52SDZjr51kG/fMUEoPoZrkaDHyoI=
github.com/charmbracelet/bubbletea v1.3.4/go.mod h1:dtcUCyCGEX3g9tosuYiut3MXgY/Jsv9nKVdibKKRRXo=
github.com/charmbracelet/lipgloss v1.0.0 h1:O7VkGDvqEdGi93X+DeqsQ7PKHDgtQfF8j8/O2qFMQNg=
github.com/charmbracelet/lipgloss v1.0.0/go.mod h1:U5fy9Z+C38obMs+T+tJqst9VGzlOYGj4ri9reL3qUlo=
github.com/charmbracelet/x/ansi v0.8.0 h1:9GTq3xq9caJW8ZrBTe0LIe2fvfLR/bYXKTx2llXn7xE=
github.com/charmbracelet/x/ansi v0.8.0/go.mod h1:wdYl/ONOLHLIVmQaxbIYEC/cRKOQyjTkowiI4blgS9Q=
github.com/charmbracelet/x/term v0.2.1 h1:AQeHeLZ1OqSXhrAWpYUtZyX1T3zVxfpZuEQMIQaGIAQ=
github.com/charmbracelet/x/term v0.2.1/go.mod h1:oQ4enTYFV7QN4m0i9mzHrViD7TQKvNEEkHUMCmsxdUg=
github.com/erikgeiser/coninput v0.0.0-20211004153227-1c3628e74d0f h1:Y/CXytFA4m6baUTXGLOoWe4PQhGxaX0KpnayAqC48p4=
github.com/erikgeiser/coninput v0.0.0-20211004153227-1c3628e74d0f/go.mod h1:vw97MGsxSvLiUE2X8qFplwetxpGLQrlU1Q9AUEIzCaM=
github.com/lucasb-eyer/go-colorful v1.2.0 h1:1nnpGOrhyZZuNyfu1QjKiUICQ74+3FNCN69Aj6K7nkY=
github.com/lucasb-eyer/go-colorful v1.2.0/go.mod h1:R4dSotOR9KMtayYi1e77YzuveK+i7ruzyGqttikkLy0=
github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
github.com/mattn/go-localereader v0.0.1 h1:ygSAOl7ZXTx4RdPYinUpg6W99U8jWvWi9Ye2JC/oIi4=
github.com/mattn/go-localereader v0.0.1/go.mod h1:8fBrzywKY7BI3czFoHkuzRoWE9C+EiG4R1k4Cjx5p88=
github.com/mattn/go-runewidth v0.0.16 h1:E5ScNMtiwvlvB5paMFdw9p4kSQzbXFikJ5SQO6TULQc=
github.com/mattn/go-runewidth v0.0.16/go.mod h1:Jdepj2loyihRzMpdS35Xk/zdY8IAYHsh153qUoGf23w=
github.com/muesli/ansi v0.0.0-20230316100256-276c6243b2f6 h1:ZK8zHtRHOkbHy6Mmr5D264iyp3TiX5OmNcI5cIARiQI=
github.com/muesli/ansi v0.0.0-20230316100256-276c6243b2f6/go.mod h1:CJlz5H+gyd6CUWT45Oy4q24RdLyn7Md9Vj2/ldJBSIo=
github.com/muesli/cancelreader v0.2.2 h1:3I4Kt4BQjOR54NavqnDogx/MIoWBFa0StPA8ELUXHmA=
github.com/muesli/cancelreader v0.2.2/go.mod h1:3XuTXfFS2VjM+HTLZY9Ak0l6eUKfijIfMUZ4EgX0QYo=
github.com/muesli/termenv v0.15.2 h1:GohcuySI0QmI3wN8Ok9PtKGkgkFIk7y6Vpb5PvrY+Wo=
github.com/muesli/termenv v0.15.2/go.mod h1:Epx+iuz8sNs7mNKhxzH4fWXGNpZwUaJKRS1noLXviQ8=
github.com/rivo/uniseg v0.2.0/go.mod h1:J6wj4VEh+S6ZtnVlnTBMWIodfgj8LQOQFoIToxlJtxc=
github.com/rivo/uniseg v0.4.7 h1:WUdvkW8uEhrYfLC4ZzdpI2ztxP1I582+49Oc5Mq64VQ=
github.com/rivo/uniseg v0.4.7/go.mod h1:FN3SvrM+Zdj16jyLfmOkMNblXMcoc8DfTHruCPUcx88=
golang.org/x/sync v0.11.0 h1:GGz8+XQP4FvTTrjZPzNKTMFtSXH80RAzG+5ghFPgK9w=
golang.org/x/sync v0.11.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
golang.org/x/sys v0.0.0-20210809222454-d867a43fc93e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.30.0 h1:QjkSwP/36a20jFYWkSue1YwXzLmsV5Gfq7Eiy72C1uc=
golang.org/x/sys v0.30.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/text v0.3.8 h1:nAL+RVCQ9uMn3vJZbV+MRnydTJFPf8qqY42YiA6MrqY=
golang.org/x/text v0.3.8/go.mod h1:E6s5w1FMmriuDzIBO73fBruAKo1PCIq6d2Q6DHfQ8WQ=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/go-analyze/bulk v0.1.3 h1:pzRdBqzHDAT9PyROt0SlWE0YqPtdmTcEpIJY0C3vF0c=
github.com/go-analyze/bulk v0.1.3/go.mod h1:afon/KtFJYnekIyN20H/+XUvcLFjE8sKR1CfpqfClgM=
github.com/go-analyze/charts v0.5.26 h1:rSwZikLQuFX6cJzwI8OAgaWZneG1kDYxD857ms00ZxY=
github.com/go-analyze/charts v0.5.26/go.mod h1:s1YvQhjiSwtLx1f2dOKfiV9x2TT49nVSL6v2rlRpTbY=
github.com/golang/freetype v0.0.0-20170609003504-e2365dfdc4a0 h1:DACJavvAHhabrF08vX0COfcOBJRhZ8lUbR+ZWIs0Y5g=
github.com/golang/freetype v0.0.0-20170609003504-e2365dfdc4a0/go.mod h1:E/TSTwGwJL78qG/PmXZO1EjYhfJinVAhrmmHX6Z8B9k=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U=
github.com/stretchr/testify v1.11.1/go.mod h1:wZwfW3scLgRK+23gO65QZefKpKQRnfz6sD981Nm4B6U=
golang.org/x/image v0.24.0 h1:AN7zRgVsbvmTfNyqIbbOraYL8mSwcKncEj8ofjgzcMQ=
golang.org/x/image v0.24.0/go.mod h1:4b/ITuLfqYq1hqZcjofwctIhi7sZh2WaCjvsBNjjya8=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

View File

@@ -33,12 +33,13 @@ var (
)
type App struct {
network networkManager
services serviceManager
exports exportManager
tools toolManager
sat satRunner
runtime runtimeChecker
network networkManager
services serviceManager
exports exportManager
tools toolManager
sat satRunner
runtime runtimeChecker
installer installer
}
type ActionResult struct {
@@ -56,6 +57,7 @@ type networkManager interface {
type serviceManager interface {
ListBeeServices() ([]string, error)
ServiceState(name string) string
ServiceStatus(name string) (string, error)
ServiceDo(name string, action platform.ServiceAction) (string, error)
}
@@ -70,9 +72,14 @@ type toolManager interface {
CheckTools(names []string) []platform.ToolStatus
}
type installer interface {
ListInstallDisks() ([]platform.InstallDisk, error)
InstallToDisk(ctx context.Context, device string, logFile string) error
}
type satRunner interface {
RunNvidiaAcceptancePack(baseDir string) (string, error)
RunNvidiaAcceptancePackWithOptions(ctx context.Context, baseDir string, durationSec int, sizeMB int, gpuIndices []int) (string, error)
RunNvidiaAcceptancePackWithOptions(ctx context.Context, baseDir string, diagLevel int, gpuIndices []int) (string, error)
RunMemoryAcceptancePack(baseDir string) (string, error)
RunStorageAcceptancePack(baseDir string) (string, error)
RunCPUAcceptancePack(baseDir string, durationSec int) (string, error)
@@ -80,6 +87,8 @@ type satRunner interface {
DetectGPUVendor() string
ListAMDGPUs() ([]platform.AMDGPUInfo, error)
RunAMDAcceptancePack(baseDir string) (string, error)
RunFanStressTest(ctx context.Context, baseDir string, opts platform.FanStressOptions) (string, error)
RunNCCLTests(ctx context.Context, baseDir string) (string, error)
}
type runtimeChecker interface {
@@ -89,12 +98,13 @@ type runtimeChecker interface {
func New(platform *platform.System) *App {
return &App{
network: platform,
services: platform,
exports: platform,
tools: platform,
sat: platform,
runtime: platform,
network: platform,
services: platform,
exports: platform,
tools: platform,
sat: platform,
runtime: platform,
installer: platform,
}
}
@@ -105,6 +115,7 @@ func (a *App) RunAudit(runtimeMode runtimeenv.Mode, output string) (string, erro
}
}
result := collector.Run(runtimeMode)
applyLatestSATStatuses(&result.Hardware, DefaultSATBaseDir)
if health, err := ReadRuntimeHealth(DefaultRuntimeJSONPath); err == nil {
result.Runtime = &health
}
@@ -173,11 +184,20 @@ func (a *App) RuntimeHealthResult() ActionResult {
if err != nil {
return ActionResult{Title: "Runtime issues", Body: "No runtime health found."}
}
driverLabel := "Driver ready"
accelLabel := "CUDA ready"
switch a.sat.DetectGPUVendor() {
case "amd":
driverLabel = "AMDGPU ready"
accelLabel = "ROCm SMI ready"
case "nvidia":
driverLabel = "NVIDIA ready"
}
var body strings.Builder
fmt.Fprintf(&body, "Status: %s\n", firstNonEmpty(health.Status, "UNKNOWN"))
fmt.Fprintf(&body, "Export dir: %s\n", firstNonEmpty(health.ExportDir, DefaultExportDir))
fmt.Fprintf(&body, "Driver ready: %t\n", health.DriverReady)
fmt.Fprintf(&body, "CUDA ready: %t\n", health.CUDAReady)
fmt.Fprintf(&body, "%s: %t\n", driverLabel, health.DriverReady)
fmt.Fprintf(&body, "%s: %t\n", accelLabel, health.CUDAReady)
fmt.Fprintf(&body, "Network: %s", firstNonEmpty(health.NetworkStatus, "UNKNOWN"))
if len(health.Issues) > 0 {
body.WriteString("\n\nIssues:\n")
@@ -220,8 +240,11 @@ func (a *App) ExportLatestAudit(target platform.RemovableTarget) (string, error)
func (a *App) ExportLatestAuditResult(target platform.RemovableTarget) (ActionResult, error) {
path, err := a.ExportLatestAudit(target)
body := "Audit exported."
if path != "" {
body := "Audit export failed."
if err == nil {
body = "Audit exported."
}
if err == nil && path != "" {
body = "Audit exported to " + path
}
return ActionResult{Title: "Export audit", Body: body}, err
@@ -238,9 +261,12 @@ func (a *App) ExportSupportBundle(target platform.RemovableTarget) (string, erro
func (a *App) ExportSupportBundleResult(target platform.RemovableTarget) (ActionResult, error) {
path, err := a.ExportSupportBundle(target)
body := "Support bundle exported."
if path != "" {
body = "Support bundle exported to " + path
body := "Support bundle export failed."
if err == nil {
body = "Support bundle exported. USB target unmounted and safe to remove."
}
if err == nil && path != "" {
body = "Support bundle exported to " + path + ".\n\nUSB target unmounted and safe to remove."
}
return ActionResult{Title: "Export support bundle", Body: body}, err
}
@@ -331,6 +357,10 @@ func (a *App) ListBeeServices() ([]string, error) {
return a.services.ListBeeServices()
}
func (a *App) ServiceState(name string) string {
return a.services.ServiceState(name)
}
func (a *App) ServiceStatus(name string) (string, error) {
return a.services.ServiceStatus(name)
}
@@ -406,23 +436,16 @@ func (a *App) ListNvidiaGPUs() ([]platform.NvidiaGPU, error) {
return a.sat.ListNvidiaGPUs()
}
func (a *App) RunNvidiaAcceptancePackWithOptions(ctx context.Context, baseDir string, durationSec int, sizeMB int, gpuIndices []int) (ActionResult, error) {
func (a *App) RunNvidiaAcceptancePackWithOptions(ctx context.Context, baseDir string, diagLevel int, gpuIndices []int) (ActionResult, error) {
if strings.TrimSpace(baseDir) == "" {
baseDir = DefaultSATBaseDir
}
path, err := a.sat.RunNvidiaAcceptancePackWithOptions(ctx, baseDir, durationSec, sizeMB, gpuIndices)
path, err := a.sat.RunNvidiaAcceptancePackWithOptions(ctx, baseDir, diagLevel, gpuIndices)
body := "Archive written."
if path != "" {
body = "Archive written to " + path
}
// Include terminal chart if available (runDir = archive path without .tar.gz).
if path != "" {
termPath := filepath.Join(strings.TrimSuffix(path, ".tar.gz"), "gpu-metrics-term.txt")
if chart, readErr := os.ReadFile(termPath); readErr == nil && len(chart) > 0 {
body += "\n\n" + string(chart)
}
}
return ActionResult{Title: "NVIDIA SAT", Body: body}, err
return ActionResult{Title: "NVIDIA DCGM", Body: body}, err
}
func (a *App) RunMemoryAcceptancePack(baseDir string) (string, error) {
@@ -481,6 +504,76 @@ func (a *App) RunAMDAcceptancePackResult(baseDir string) (ActionResult, error) {
return ActionResult{Title: "AMD GPU SAT", Body: satResultBody(path)}, err
}
func (a *App) RunFanStressTest(ctx context.Context, baseDir string, opts platform.FanStressOptions) (string, error) {
if strings.TrimSpace(baseDir) == "" {
baseDir = DefaultSATBaseDir
}
return a.sat.RunFanStressTest(ctx, baseDir, opts)
}
func (a *App) RunNCCLTestsResult(ctx context.Context) (ActionResult, error) {
path, err := a.sat.RunNCCLTests(ctx, DefaultSATBaseDir)
body := "Results: " + path
if err != nil && err != context.Canceled {
body += "\nERROR: " + err.Error()
}
return ActionResult{Title: "NCCL bandwidth test", Body: body}, err
}
func (a *App) RunFanStressTestResult(ctx context.Context, opts platform.FanStressOptions) (ActionResult, error) {
path, err := a.RunFanStressTest(ctx, "", opts)
body := formatFanStressResult(path)
if err != nil && err != context.Canceled {
body += "\nERROR: " + err.Error()
}
return ActionResult{Title: "GPU Platform Stress Test", Body: body}, err
}
// formatFanStressResult formats the summary.txt from a fan-stress run, including
// the per-step pass/fail display and the analysis section (throttling, max temps, fan response).
func formatFanStressResult(archivePath string) string {
if archivePath == "" {
return "No output produced."
}
runDir := strings.TrimSuffix(archivePath, ".tar.gz")
raw, err := os.ReadFile(filepath.Join(runDir, "summary.txt"))
if err != nil {
return "Archive written to " + archivePath
}
content := strings.TrimSpace(string(raw))
kv := parseKeyValueSummary(content)
var b strings.Builder
b.WriteString(formatSATDetail(content))
// Append analysis section.
var analysis []string
if v, ok := kv["throttling_detected"]; ok {
label := "NO"
if v == "true" {
label = "YES ← throttling detected during load"
}
analysis = append(analysis, "Throttling: "+label)
}
if v, ok := kv["max_gpu_temp_c"]; ok && v != "0.0" {
analysis = append(analysis, "Max GPU temp: "+v+"°C")
}
if v, ok := kv["max_cpu_temp_c"]; ok && v != "0.0" {
analysis = append(analysis, "Max CPU temp: "+v+"°C")
}
if v, ok := kv["fan_response_sec"]; ok && v != "N/A" && v != "-1.0" {
analysis = append(analysis, "Fan response: "+v+"s")
}
if len(analysis) > 0 {
b.WriteString("\n\n=== Analysis ===\n")
for _, line := range analysis {
b.WriteString(line + "\n")
}
}
return strings.TrimSpace(b.String())
}
// satResultBody reads summary.txt from the SAT run directory (archive path without .tar.gz)
// and returns a formatted human-readable result. Falls back to a plain message if unreadable.
func satResultBody(archivePath string) string {
@@ -922,3 +1015,70 @@ func firstNonEmpty(values ...string) string {
}
return ""
}
func (a *App) ListInstallDisks() ([]platform.InstallDisk, error) {
return a.installer.ListInstallDisks()
}
func (a *App) InstallToDisk(ctx context.Context, device string, logFile string) error {
return a.installer.InstallToDisk(ctx, device, logFile)
}
func formatSATDetail(raw string) string {
var b strings.Builder
kv := parseKeyValueSummary(raw)
if t, ok := kv["run_at_utc"]; ok {
fmt.Fprintf(&b, "Run: %s\n\n", t)
}
lines := strings.Split(raw, "\n")
var stepKeys []string
seenStep := map[string]bool{}
for _, line := range lines {
if idx := strings.Index(line, "_status="); idx >= 0 {
key := line[:idx]
if !seenStep[key] && key != "overall" {
seenStep[key] = true
stepKeys = append(stepKeys, key)
}
}
}
for _, key := range stepKeys {
status := kv[key+"_status"]
display := cleanSummaryKey(key)
switch status {
case "OK":
fmt.Fprintf(&b, "PASS %s\n", display)
case "FAILED":
fmt.Fprintf(&b, "FAIL %s\n", display)
case "UNSUPPORTED":
fmt.Fprintf(&b, "SKIP %s\n", display)
default:
fmt.Fprintf(&b, "? %s\n", display)
}
}
if overall, ok := kv["overall_status"]; ok {
ok2 := kv["job_ok"]
failed := kv["job_failed"]
fmt.Fprintf(&b, "\nOverall: %s (ok=%s failed=%s)", overall, ok2, failed)
}
return strings.TrimSpace(b.String())
}
func cleanSummaryKey(key string) string {
idx := strings.Index(key, "-")
if idx <= 0 {
return key
}
prefix := key[:idx]
for _, c := range prefix {
if c < '0' || c > '9' {
return key
}
}
return key[idx+1:]
}

View File

@@ -1,9 +1,12 @@
package app
import (
"archive/tar"
"compress/gzip"
"context"
"encoding/json"
"errors"
"io"
"os"
"path/filepath"
"testing"
@@ -49,6 +52,10 @@ func (f fakeServices) ListBeeServices() ([]string, error) {
return nil, nil
}
func (f fakeServices) ServiceState(name string) string {
return "active"
}
func (f fakeServices) ServiceStatus(name string) (string, error) {
return f.serviceStatusFn(name)
}
@@ -57,13 +64,22 @@ func (f fakeServices) ServiceDo(name string, action platform.ServiceAction) (str
return f.serviceDoFn(name, action)
}
type fakeExports struct{}
type fakeExports struct {
listTargetsFn func() ([]platform.RemovableTarget, error)
exportToTargetFn func(string, platform.RemovableTarget) (string, error)
}
func (f fakeExports) ListRemovableTargets() ([]platform.RemovableTarget, error) {
if f.listTargetsFn != nil {
return f.listTargetsFn()
}
return nil, nil
}
func (f fakeExports) ExportFileToTarget(src string, target platform.RemovableTarget) (string, error) {
if f.exportToTargetFn != nil {
return f.exportToTargetFn(src, target)
}
return "", nil
}
@@ -97,21 +113,28 @@ func (f fakeTools) CheckTools(names []string) []platform.ToolStatus {
}
type fakeSAT struct {
runNvidiaFn func(string) (string, error)
runMemoryFn func(string) (string, error)
runStorageFn func(string) (string, error)
runCPUFn func(string, int) (string, error)
runNvidiaFn func(string) (string, error)
runMemoryFn func(string) (string, error)
runStorageFn func(string) (string, error)
runCPUFn func(string, int) (string, error)
detectVendorFn func() string
listAMDGPUsFn func() ([]platform.AMDGPUInfo, error)
runAMDPackFn func(string) (string, error)
listNvidiaGPUsFn func() ([]platform.NvidiaGPU, error)
}
func (f fakeSAT) RunNvidiaAcceptancePack(baseDir string) (string, error) {
return f.runNvidiaFn(baseDir)
}
func (f fakeSAT) RunNvidiaAcceptancePackWithOptions(_ context.Context, baseDir string, _ int, _ int, _ []int) (string, error) {
func (f fakeSAT) RunNvidiaAcceptancePackWithOptions(_ context.Context, baseDir string, _ int, _ []int) (string, error) {
return f.runNvidiaFn(baseDir)
}
func (f fakeSAT) ListNvidiaGPUs() ([]platform.NvidiaGPU, error) {
if f.listNvidiaGPUsFn != nil {
return f.listNvidiaGPUsFn()
}
return nil, nil
}
@@ -130,11 +153,34 @@ func (f fakeSAT) RunCPUAcceptancePack(baseDir string, durationSec int) (string,
return "", nil
}
func (f fakeSAT) DetectGPUVendor() string { return "" }
func (f fakeSAT) DetectGPUVendor() string {
if f.detectVendorFn != nil {
return f.detectVendorFn()
}
return ""
}
func (f fakeSAT) ListAMDGPUs() ([]platform.AMDGPUInfo, error) { return nil, nil }
func (f fakeSAT) ListAMDGPUs() ([]platform.AMDGPUInfo, error) {
if f.listAMDGPUsFn != nil {
return f.listAMDGPUsFn()
}
return nil, nil
}
func (f fakeSAT) RunAMDAcceptancePack(baseDir string) (string, error) { return "", nil }
func (f fakeSAT) RunAMDAcceptancePack(baseDir string) (string, error) {
if f.runAMDPackFn != nil {
return f.runAMDPackFn(baseDir)
}
return "", nil
}
func (f fakeSAT) RunFanStressTest(_ context.Context, _ string, _ platform.FanStressOptions) (string, error) {
return "", nil
}
func (f fakeSAT) RunNCCLTests(_ context.Context, _ string) (string, error) {
return "", nil
}
func TestNetworkStatusFormatsInterfacesAndRoute(t *testing.T) {
t.Parallel()
@@ -394,6 +440,79 @@ func TestActionResultsUseFallbackBody(t *testing.T) {
}
}
func TestExportSupportBundleResultMentionsUnmountedUSB(t *testing.T) {
t.Parallel()
tmp := t.TempDir()
oldExportDir := DefaultExportDir
DefaultExportDir = tmp
t.Cleanup(func() { DefaultExportDir = oldExportDir })
if err := os.WriteFile(filepath.Join(tmp, "bee-audit.json"), []byte("{}\n"), 0644); err != nil {
t.Fatalf("write bee-audit.json: %v", err)
}
if err := os.WriteFile(filepath.Join(tmp, "bee-audit.log"), []byte("audit ok\n"), 0644); err != nil {
t.Fatalf("write bee-audit.log: %v", err)
}
a := &App{
exports: fakeExports{
exportToTargetFn: func(src string, target platform.RemovableTarget) (string, error) {
if filepath.Base(src) == "" {
t.Fatalf("expected non-empty source path")
}
return "/media/bee/" + filepath.Base(src), nil
},
},
}
result, err := a.ExportSupportBundleResult(platform.RemovableTarget{Device: "/dev/sdb1"})
if err != nil {
t.Fatalf("ExportSupportBundleResult error: %v", err)
}
if result.Title != "Export support bundle" {
t.Fatalf("title=%q want %q", result.Title, "Export support bundle")
}
if want := "USB target unmounted and safe to remove."; !contains(result.Body, want) {
t.Fatalf("body missing %q\nbody=%s", want, result.Body)
}
}
func TestExportSupportBundleResultDoesNotPretendSuccessOnError(t *testing.T) {
t.Parallel()
tmp := t.TempDir()
oldExportDir := DefaultExportDir
DefaultExportDir = tmp
t.Cleanup(func() { DefaultExportDir = oldExportDir })
if err := os.WriteFile(filepath.Join(tmp, "bee-audit.json"), []byte("{}\n"), 0644); err != nil {
t.Fatalf("write bee-audit.json: %v", err)
}
if err := os.WriteFile(filepath.Join(tmp, "bee-audit.log"), []byte("audit ok\n"), 0644); err != nil {
t.Fatalf("write bee-audit.log: %v", err)
}
a := &App{
exports: fakeExports{
exportToTargetFn: func(string, platform.RemovableTarget) (string, error) {
return "", errors.New("mount /dev/sda1: exFAT support is missing in this ISO build")
},
},
}
result, err := a.ExportSupportBundleResult(platform.RemovableTarget{Device: "/dev/sda1", FSType: "exfat"})
if err == nil {
t.Fatal("expected export error")
}
if contains(result.Body, "exported to") {
t.Fatalf("body should not claim success:\n%s", result.Body)
}
if result.Body != "Support bundle export failed." {
t.Fatalf("body=%q want %q", result.Body, "Support bundle export failed.")
}
}
func TestRunNvidiaAcceptancePackResult(t *testing.T) {
t.Parallel()
@@ -516,6 +635,9 @@ func TestBuildSupportBundleIncludesExportDirContents(t *testing.T) {
if err := os.WriteFile(filepath.Join(exportDir, "bee-sat", "memory-run", "verbose.log"), []byte("sat verbose"), 0644); err != nil {
t.Fatal(err)
}
if err := os.WriteFile(filepath.Join(exportDir, "bee-sat", "memory-run.tar.gz"), []byte("nested sat archive"), 0644); err != nil {
t.Fatal(err)
}
archive, err := BuildSupportBundle(exportDir)
if err != nil {
@@ -524,6 +646,44 @@ func TestBuildSupportBundleIncludesExportDirContents(t *testing.T) {
if _, err := os.Stat(archive); err != nil {
t.Fatalf("archive stat: %v", err)
}
file, err := os.Open(archive)
if err != nil {
t.Fatalf("open archive: %v", err)
}
defer file.Close()
gzr, err := gzip.NewReader(file)
if err != nil {
t.Fatalf("gzip reader: %v", err)
}
defer gzr.Close()
tr := tar.NewReader(gzr)
var names []string
for {
hdr, err := tr.Next()
if errors.Is(err, io.EOF) {
break
}
if err != nil {
t.Fatalf("read tar entry: %v", err)
}
names = append(names, hdr.Name)
}
var foundRaw bool
for _, name := range names {
if contains(name, "/export/bee-sat/memory-run/verbose.log") {
foundRaw = true
}
if contains(name, "/export/bee-sat/memory-run.tar.gz") {
t.Fatalf("support bundle should not contain nested SAT archive: %s", name)
}
}
if !foundRaw {
t.Fatalf("support bundle missing raw SAT log, names=%v", names)
}
}
func TestMainBanner(t *testing.T) {
@@ -600,6 +760,44 @@ func TestMainBanner(t *testing.T) {
}
}
func TestRuntimeHealthResultUsesAMDLabels(t *testing.T) {
tmp := t.TempDir()
oldRuntimePath := DefaultRuntimeJSONPath
DefaultRuntimeJSONPath = filepath.Join(tmp, "runtime-health.json")
t.Cleanup(func() { DefaultRuntimeJSONPath = oldRuntimePath })
raw, err := json.Marshal(schema.RuntimeHealth{
Status: "OK",
ExportDir: "/appdata/bee/export",
DriverReady: true,
CUDAReady: true,
NetworkStatus: "OK",
})
if err != nil {
t.Fatalf("marshal runtime health: %v", err)
}
if err := os.WriteFile(DefaultRuntimeJSONPath, raw, 0644); err != nil {
t.Fatalf("write runtime health: %v", err)
}
a := &App{
sat: fakeSAT{
detectVendorFn: func() string { return "amd" },
},
}
result := a.RuntimeHealthResult()
if !contains(result.Body, "AMDGPU ready: true") {
t.Fatalf("body missing AMD driver label:\n%s", result.Body)
}
if !contains(result.Body, "ROCm SMI ready: true") {
t.Fatalf("body missing ROCm label:\n%s", result.Body)
}
if contains(result.Body, "CUDA ready") {
t.Fatalf("body should not mention CUDA on AMD:\n%s", result.Body)
}
}
func intPtr(v int) *int { return &v }
func contains(haystack, needle string) bool {

View File

@@ -1,387 +0,0 @@
package app
import (
"encoding/json"
"fmt"
"os"
"path/filepath"
"sort"
"strings"
"bee/audit/internal/schema"
)
// ComponentRow is one line in the hardware panel.
type ComponentRow struct {
Key string // "CPU", "MEM", "GPU", "DISK", "PSU"
Status string // "PASS", "FAIL", "CANCEL", "N/A"
Detail string // compact one-liner
}
// HardwarePanelData holds everything the TUI right panel needs.
type HardwarePanelData struct {
Header []string
Rows []ComponentRow
}
// LoadHardwarePanel reads the latest audit JSON and SAT summaries.
// Returns empty panel if no audit data exists yet.
func (a *App) LoadHardwarePanel() HardwarePanelData {
raw, err := os.ReadFile(DefaultAuditJSONPath)
if err != nil {
return HardwarePanelData{Header: []string{"No audit data — run audit first."}}
}
var snap schema.HardwareIngestRequest
if err := json.Unmarshal(raw, &snap); err != nil {
return HardwarePanelData{Header: []string{"Audit data unreadable."}}
}
statuses := satStatuses()
var header []string
if sys := formatSystemLine(snap.Hardware.Board); sys != "" {
header = append(header, sys)
}
for _, fw := range snap.Hardware.Firmware {
if fw.DeviceName == "BIOS" && fw.Version != "" {
header = append(header, "BIOS: "+fw.Version)
}
if fw.DeviceName == "BMC" && fw.Version != "" {
header = append(header, "BMC: "+fw.Version)
}
}
if ip := formatIPLine(a.network.ListInterfaces); ip != "" {
header = append(header, ip)
}
var rows []ComponentRow
if cpu := formatCPULine(snap.Hardware.CPUs); cpu != "" {
rows = append(rows, ComponentRow{
Key: "CPU",
Status: statuses["cpu"],
Detail: strings.TrimPrefix(cpu, "CPU: "),
})
}
if mem := formatMemoryLine(snap.Hardware.Memory); mem != "" {
rows = append(rows, ComponentRow{
Key: "MEM",
Status: statuses["memory"],
Detail: strings.TrimPrefix(mem, "Memory: "),
})
}
if gpu := formatGPULine(snap.Hardware.PCIeDevices); gpu != "" {
rows = append(rows, ComponentRow{
Key: "GPU",
Status: statuses["gpu"],
Detail: strings.TrimPrefix(gpu, "GPU: "),
})
}
if disk := formatStorageLine(snap.Hardware.Storage); disk != "" {
rows = append(rows, ComponentRow{
Key: "DISK",
Status: statuses["storage"],
Detail: strings.TrimPrefix(disk, "Storage: "),
})
}
if psu := formatPSULine(snap.Hardware.PowerSupplies); psu != "" {
rows = append(rows, ComponentRow{
Key: "PSU",
Status: "N/A",
Detail: psu,
})
}
return HardwarePanelData{Header: header, Rows: rows}
}
// ComponentDetailResult returns detail text for a component shown in the panel.
func (a *App) ComponentDetailResult(key string) ActionResult {
switch key {
case "CPU":
return a.cpuDetailResult(false)
case "MEM":
return a.satDetailResult("memory", "memory-", "MEM detail")
case "GPU":
// Prefer whichever GPU SAT was run most recently.
nv, _ := filepath.Glob(filepath.Join(DefaultSATBaseDir, "gpu-nvidia-*/summary.txt"))
am, _ := filepath.Glob(filepath.Join(DefaultSATBaseDir, "gpu-amd-*/summary.txt"))
sort.Strings(nv)
sort.Strings(am)
latestNV := ""
if len(nv) > 0 {
latestNV = nv[len(nv)-1]
}
latestAM := ""
if len(am) > 0 {
latestAM = am[len(am)-1]
}
if latestAM > latestNV {
return a.satDetailResult("gpu", "gpu-amd-", "GPU detail")
}
return a.satDetailResult("gpu", "gpu-nvidia-", "GPU detail")
case "DISK":
return a.satDetailResult("storage", "storage-", "DISK detail")
case "PSU":
return a.psuDetailResult()
default:
return ActionResult{Title: key, Body: "No detail available."}
}
}
func (a *App) cpuDetailResult(satOnly bool) ActionResult {
var b strings.Builder
// Show latest SAT summary if available.
satResult := a.satDetailResult("cpu", "cpu-", "CPU SAT")
if satResult.Body != "No test results found. Run a test first." {
fmt.Fprintln(&b, "=== Last SAT ===")
fmt.Fprintln(&b, satResult.Body)
fmt.Fprintln(&b)
}
if satOnly {
body := strings.TrimSpace(b.String())
if body == "" {
body = "No CPU SAT results found. Run a test first."
}
return ActionResult{Title: "CPU SAT", Body: body}
}
raw, err := os.ReadFile(DefaultAuditJSONPath)
if err != nil {
return ActionResult{Title: "CPU", Body: strings.TrimSpace(b.String())}
}
var snap schema.HardwareIngestRequest
if err := json.Unmarshal(raw, &snap); err != nil {
return ActionResult{Title: "CPU", Body: strings.TrimSpace(b.String())}
}
if len(snap.Hardware.CPUs) == 0 {
return ActionResult{Title: "CPU", Body: strings.TrimSpace(b.String())}
}
fmt.Fprintln(&b, "=== Audit ===")
for i, cpu := range snap.Hardware.CPUs {
fmt.Fprintf(&b, "CPU %d\n", i)
if cpu.Model != nil {
fmt.Fprintf(&b, " Model: %s\n", *cpu.Model)
}
if cpu.Manufacturer != nil {
fmt.Fprintf(&b, " Vendor: %s\n", *cpu.Manufacturer)
}
if cpu.Cores != nil {
fmt.Fprintf(&b, " Cores: %d\n", *cpu.Cores)
}
if cpu.Threads != nil {
fmt.Fprintf(&b, " Threads: %d\n", *cpu.Threads)
}
if cpu.MaxFrequencyMHz != nil {
fmt.Fprintf(&b, " Max freq: %d MHz\n", *cpu.MaxFrequencyMHz)
}
if cpu.TemperatureC != nil {
fmt.Fprintf(&b, " Temp: %.1f°C\n", *cpu.TemperatureC)
}
if cpu.Throttled != nil {
fmt.Fprintf(&b, " Throttled: %v\n", *cpu.Throttled)
}
if cpu.CorrectableErrorCount != nil && *cpu.CorrectableErrorCount > 0 {
fmt.Fprintf(&b, " ECC correctable: %d\n", *cpu.CorrectableErrorCount)
}
if cpu.UncorrectableErrorCount != nil && *cpu.UncorrectableErrorCount > 0 {
fmt.Fprintf(&b, " ECC uncorrectable: %d\n", *cpu.UncorrectableErrorCount)
}
if i < len(snap.Hardware.CPUs)-1 {
fmt.Fprintln(&b)
}
}
return ActionResult{Title: "CPU", Body: strings.TrimSpace(b.String())}
}
func (a *App) satDetailResult(statusKey, prefix, title string) ActionResult {
matches, err := filepath.Glob(filepath.Join(DefaultSATBaseDir, prefix+"*/summary.txt"))
if err != nil || len(matches) == 0 {
return ActionResult{Title: title, Body: "No test results found. Run a test first."}
}
sort.Strings(matches)
raw, err := os.ReadFile(matches[len(matches)-1])
if err != nil {
return ActionResult{Title: title, Body: "Could not read test results."}
}
return ActionResult{Title: title, Body: formatSATDetail(strings.TrimSpace(string(raw)))}
}
// formatSATDetail converts raw summary.txt key=value content to a human-readable per-step display.
func formatSATDetail(raw string) string {
var b strings.Builder
kv := parseKeyValueSummary(raw)
if t, ok := kv["run_at_utc"]; ok {
fmt.Fprintf(&b, "Run: %s\n\n", t)
}
// Collect step names in order they appear in the file
lines := strings.Split(raw, "\n")
var stepKeys []string
seenStep := map[string]bool{}
for _, line := range lines {
if idx := strings.Index(line, "_status="); idx >= 0 {
key := line[:idx]
if !seenStep[key] && key != "overall" {
seenStep[key] = true
stepKeys = append(stepKeys, key)
}
}
}
for _, key := range stepKeys {
status := kv[key+"_status"]
display := cleanSummaryKey(key)
switch status {
case "OK":
fmt.Fprintf(&b, "PASS %s\n", display)
case "FAILED":
fmt.Fprintf(&b, "FAIL %s\n", display)
case "UNSUPPORTED":
fmt.Fprintf(&b, "SKIP %s\n", display)
default:
fmt.Fprintf(&b, "? %s\n", display)
}
}
if overall, ok := kv["overall_status"]; ok {
ok2 := kv["job_ok"]
failed := kv["job_failed"]
fmt.Fprintf(&b, "\nOverall: %s (ok=%s failed=%s)", overall, ok2, failed)
}
return strings.TrimSpace(b.String())
}
// cleanSummaryKey strips the leading numeric prefix from a SAT step key.
// "1-lscpu" → "lscpu", "3-stress-ng" → "stress-ng"
func cleanSummaryKey(key string) string {
idx := strings.Index(key, "-")
if idx <= 0 {
return key
}
prefix := key[:idx]
for _, c := range prefix {
if c < '0' || c > '9' {
return key
}
}
return key[idx+1:]
}
func (a *App) psuDetailResult() ActionResult {
raw, err := os.ReadFile(DefaultAuditJSONPath)
if err != nil {
return ActionResult{Title: "PSU", Body: "No audit data."}
}
var snap schema.HardwareIngestRequest
if err := json.Unmarshal(raw, &snap); err != nil {
return ActionResult{Title: "PSU", Body: "Audit data unreadable."}
}
if len(snap.Hardware.PowerSupplies) == 0 {
return ActionResult{Title: "PSU", Body: "No PSU data in last audit."}
}
var b strings.Builder
for i, psu := range snap.Hardware.PowerSupplies {
fmt.Fprintf(&b, "PSU %d\n", i)
if psu.Model != nil {
fmt.Fprintf(&b, " Model: %s\n", *psu.Model)
}
if psu.Vendor != nil {
fmt.Fprintf(&b, " Vendor: %s\n", *psu.Vendor)
}
if psu.WattageW != nil {
fmt.Fprintf(&b, " Rated: %d W\n", *psu.WattageW)
}
if psu.InputPowerW != nil {
fmt.Fprintf(&b, " Input: %.1f W\n", *psu.InputPowerW)
}
if psu.OutputPowerW != nil {
fmt.Fprintf(&b, " Output: %.1f W\n", *psu.OutputPowerW)
}
if psu.TemperatureC != nil {
fmt.Fprintf(&b, " Temp: %.1f°C\n", *psu.TemperatureC)
}
if i < len(snap.Hardware.PowerSupplies)-1 {
fmt.Fprintln(&b)
}
}
return ActionResult{Title: "PSU", Body: strings.TrimSpace(b.String())}
}
// satStatuses reads the latest summary.txt for each SAT type and returns
// a map of component key ("gpu","memory","storage") → status ("PASS","FAIL","CANCEL","N/A").
func satStatuses() map[string]string {
result := map[string]string{
"gpu": "N/A",
"memory": "N/A",
"storage": "N/A",
"cpu": "N/A",
}
patterns := []struct {
key string
prefix string
}{
{"gpu", "gpu-nvidia-"},
{"gpu", "gpu-amd-"},
{"memory", "memory-"},
{"storage", "storage-"},
{"cpu", "cpu-"},
}
for _, item := range patterns {
matches, err := filepath.Glob(filepath.Join(DefaultSATBaseDir, item.prefix+"*/summary.txt"))
if err != nil || len(matches) == 0 {
continue
}
sort.Strings(matches)
raw, err := os.ReadFile(matches[len(matches)-1])
if err != nil {
continue
}
values := parseKeyValueSummary(string(raw))
switch strings.ToUpper(strings.TrimSpace(values["overall_status"])) {
case "OK":
result[item.key] = "PASS"
case "FAILED":
result[item.key] = "FAIL"
case "CANCELED", "CANCELLED":
result[item.key] = "CANCEL"
}
}
return result
}
func formatPSULine(psus []schema.HardwarePowerSupply) string {
var present []schema.HardwarePowerSupply
for _, psu := range psus {
if psu.Present != nil && !*psu.Present {
continue
}
present = append(present, psu)
}
if len(present) == 0 {
return ""
}
firstW := 0
if present[0].WattageW != nil {
firstW = *present[0].WattageW
}
allSame := firstW > 0
for _, p := range present[1:] {
w := 0
if p.WattageW != nil {
w = *p.WattageW
}
if w != firstW {
allSame = false
break
}
}
if allSame && firstW > 0 {
return fmt.Sprintf("%dx %dW", len(present), firstW)
}
return fmt.Sprintf("%d PSU", len(present))
}

View File

@@ -0,0 +1,214 @@
package app
import (
"os"
"path/filepath"
"sort"
"strings"
"bee/audit/internal/schema"
)
func applyLatestSATStatuses(snap *schema.HardwareSnapshot, baseDir string) {
if snap == nil || strings.TrimSpace(baseDir) == "" {
return
}
if summary, ok := loadLatestSATSummary(baseDir, "gpu-amd-"); ok {
applyGPUVendorSAT(snap.PCIeDevices, "amd", summary)
}
if summary, ok := loadLatestSATSummary(baseDir, "gpu-nvidia-"); ok {
applyGPUVendorSAT(snap.PCIeDevices, "nvidia", summary)
}
if summary, ok := loadLatestSATSummary(baseDir, "memory-"); ok {
applyMemorySAT(snap.Memory, summary)
}
if summary, ok := loadLatestSATSummary(baseDir, "cpu-"); ok {
applyCPUSAT(snap.CPUs, summary)
}
if summary, ok := loadLatestSATSummary(baseDir, "storage-"); ok {
applyStorageSAT(snap.Storage, summary)
}
}
type satSummary struct {
runAtUTC string
overall string
kv map[string]string
}
func loadLatestSATSummary(baseDir, prefix string) (satSummary, bool) {
matches, err := filepath.Glob(filepath.Join(baseDir, prefix+"*/summary.txt"))
if err != nil || len(matches) == 0 {
return satSummary{}, false
}
sort.Strings(matches)
raw, err := os.ReadFile(matches[len(matches)-1])
if err != nil {
return satSummary{}, false
}
kv := parseKeyValueSummary(string(raw))
return satSummary{
runAtUTC: strings.TrimSpace(kv["run_at_utc"]),
overall: strings.ToUpper(strings.TrimSpace(kv["overall_status"])),
kv: kv,
}, true
}
func applyGPUVendorSAT(devs []schema.HardwarePCIeDevice, vendor string, summary satSummary) {
status, description, ok := satSummaryStatus(summary, vendor+" GPU SAT")
if !ok {
return
}
for i := range devs {
if !matchesGPUVendor(devs[i], vendor) {
continue
}
mergeComponentStatus(&devs[i].HardwareComponentStatus, summary.runAtUTC, status, description)
}
}
func applyMemorySAT(dimms []schema.HardwareMemory, summary satSummary) {
status, description, ok := satSummaryStatus(summary, "memory SAT")
if !ok {
return
}
for i := range dimms {
mergeComponentStatus(&dimms[i].HardwareComponentStatus, summary.runAtUTC, status, description)
}
}
func applyCPUSAT(cpus []schema.HardwareCPU, summary satSummary) {
status, description, ok := satSummaryStatus(summary, "CPU SAT")
if !ok {
return
}
for i := range cpus {
mergeComponentStatus(&cpus[i].HardwareComponentStatus, summary.runAtUTC, status, description)
}
}
func applyStorageSAT(disks []schema.HardwareStorage, summary satSummary) {
byDevice := parseStorageSATStatus(summary)
for i := range disks {
devPath, _ := disks[i].Telemetry["linux_device"].(string)
devName := filepath.Base(strings.TrimSpace(devPath))
if devName == "" {
continue
}
result, ok := byDevice[devName]
if !ok {
continue
}
mergeComponentStatus(&disks[i].HardwareComponentStatus, summary.runAtUTC, result.status, result.description)
}
}
type satStatusResult struct {
status string
description string
ok bool
}
func parseStorageSATStatus(summary satSummary) map[string]satStatusResult {
result := map[string]satStatusResult{}
for key, value := range summary.kv {
if !strings.HasSuffix(key, "_status") || key == "overall_status" {
continue
}
base := strings.TrimSuffix(key, "_status")
idx := strings.Index(base, "_")
if idx <= 0 {
continue
}
devName := base[:idx]
step := strings.ReplaceAll(base[idx+1:], "_", "-")
stepStatus, desc, ok := satKeyStatus(strings.ToUpper(strings.TrimSpace(value)), "storage "+step)
if !ok {
continue
}
current := result[devName]
if !current.ok || statusSeverity(stepStatus) > statusSeverity(current.status) {
result[devName] = satStatusResult{status: stepStatus, description: desc, ok: true}
}
}
return result
}
func satSummaryStatus(summary satSummary, label string) (string, string, bool) {
return satKeyStatus(summary.overall, label)
}
func satKeyStatus(rawStatus, label string) (string, string, bool) {
switch strings.ToUpper(strings.TrimSpace(rawStatus)) {
case "OK":
return "OK", label + " passed", true
case "PARTIAL", "UNSUPPORTED", "CANCELED", "CANCELLED":
return "Warning", label + " incomplete", true
case "FAILED":
return "Critical", label + " failed", true
default:
return "", "", false
}
}
func mergeComponentStatus(component *schema.HardwareComponentStatus, changedAt, satStatus, description string) {
if component == nil || satStatus == "" {
return
}
current := strings.TrimSpace(ptrString(component.Status))
if current == "" || current == "Unknown" || statusSeverity(satStatus) > statusSeverity(current) {
component.Status = appStringPtr(satStatus)
if strings.TrimSpace(description) != "" {
component.ErrorDescription = appStringPtr(description)
}
if strings.TrimSpace(changedAt) != "" {
component.StatusChangedAt = appStringPtr(changedAt)
component.StatusHistory = append(component.StatusHistory, schema.HardwareStatusHistory{
Status: satStatus,
ChangedAt: changedAt,
Details: appStringPtr(description),
})
}
}
}
func statusSeverity(status string) int {
switch strings.TrimSpace(status) {
case "Critical":
return 3
case "Warning":
return 2
case "OK":
return 1
default:
return 0
}
}
func matchesGPUVendor(dev schema.HardwarePCIeDevice, vendor string) bool {
if dev.DeviceClass == nil || !strings.Contains(strings.TrimSpace(*dev.DeviceClass), "Controller") && !strings.Contains(strings.TrimSpace(*dev.DeviceClass), "Accelerator") {
if dev.DeviceClass == nil || !strings.Contains(strings.TrimSpace(*dev.DeviceClass), "Display") && !strings.Contains(strings.TrimSpace(*dev.DeviceClass), "Video") {
return false
}
}
manufacturer := strings.ToLower(strings.TrimSpace(ptrString(dev.Manufacturer)))
switch vendor {
case "amd":
return strings.Contains(manufacturer, "advanced micro devices") || strings.Contains(manufacturer, "amd/ati")
case "nvidia":
return strings.Contains(manufacturer, "nvidia")
default:
return false
}
}
func ptrString(v *string) string {
if v == nil {
return ""
}
return *v
}
func appStringPtr(value string) *string {
return &value
}

View File

@@ -0,0 +1,61 @@
package app
import (
"os"
"path/filepath"
"testing"
"bee/audit/internal/schema"
)
func TestApplyLatestSATStatusesMarksStorageByDevice(t *testing.T) {
baseDir := t.TempDir()
runDir := filepath.Join(baseDir, "storage-20260325-161151")
if err := os.MkdirAll(runDir, 0755); err != nil {
t.Fatal(err)
}
raw := "run_at_utc=2026-03-25T16:11:51Z\nnvme0n1_nvme_smart_log_status=OK\nsda_smartctl_health_status=FAILED\noverall_status=FAILED\n"
if err := os.WriteFile(filepath.Join(runDir, "summary.txt"), []byte(raw), 0644); err != nil {
t.Fatal(err)
}
nvme := schema.HardwareStorage{Telemetry: map[string]any{"linux_device": "/dev/nvme0n1"}}
usb := schema.HardwareStorage{Telemetry: map[string]any{"linux_device": "/dev/sda"}}
snap := schema.HardwareSnapshot{Storage: []schema.HardwareStorage{nvme, usb}}
applyLatestSATStatuses(&snap, baseDir)
if snap.Storage[0].Status == nil || *snap.Storage[0].Status != "OK" {
t.Fatalf("nvme status=%v want OK", snap.Storage[0].Status)
}
if snap.Storage[1].Status == nil || *snap.Storage[1].Status != "Critical" {
t.Fatalf("sda status=%v want Critical", snap.Storage[1].Status)
}
}
func TestApplyLatestSATStatusesMarksAMDGPUs(t *testing.T) {
baseDir := t.TempDir()
runDir := filepath.Join(baseDir, "gpu-amd-20260325-161436")
if err := os.MkdirAll(runDir, 0755); err != nil {
t.Fatal(err)
}
raw := "run_at_utc=2026-03-25T16:14:36Z\noverall_status=FAILED\n"
if err := os.WriteFile(filepath.Join(runDir, "summary.txt"), []byte(raw), 0644); err != nil {
t.Fatal(err)
}
class := "DisplayController"
manufacturer := "Advanced Micro Devices, Inc. [AMD/ATI]"
snap := schema.HardwareSnapshot{
PCIeDevices: []schema.HardwarePCIeDevice{{
DeviceClass: &class,
Manufacturer: &manufacturer,
}},
}
applyLatestSATStatuses(&snap, baseDir)
if snap.PCIeDevices[0].Status == nil || *snap.PCIeDevices[0].Status != "Critical" {
t.Fatalf("gpu status=%v want Critical", snap.PCIeDevices[0].Status)
}
}

View File

@@ -56,7 +56,7 @@ func BuildSupportBundle(exportDir string) (string, error) {
}
defer os.RemoveAll(stageRoot)
if err := copyDirContents(exportDir, filepath.Join(stageRoot, "export")); err != nil {
if err := copyExportDirForSupportBundle(exportDir, filepath.Join(stageRoot, "export")); err != nil {
return "", err
}
if err := writeJournalDump(filepath.Join(stageRoot, "systemd", "combined.journal.log")); err != nil {
@@ -214,6 +214,40 @@ func copyDirContents(srcDir, dstDir string) error {
return nil
}
func copyExportDirForSupportBundle(srcDir, dstDir string) error {
return copyDirContentsFiltered(srcDir, dstDir, func(rel string, info os.FileInfo) bool {
cleanRel := filepath.ToSlash(strings.TrimPrefix(filepath.Clean(rel), "./"))
if cleanRel == "" {
return true
}
if strings.HasPrefix(cleanRel, "bee-sat/") && strings.HasSuffix(cleanRel, ".tar.gz") {
return false
}
if strings.HasPrefix(filepath.Base(cleanRel), "bee-support-") && strings.HasSuffix(cleanRel, ".tar.gz") {
return false
}
return true
})
}
func copyDirContentsFiltered(srcDir, dstDir string, keep func(rel string, info os.FileInfo) bool) error {
entries, err := os.ReadDir(srcDir)
if err != nil {
if os.IsNotExist(err) {
return nil
}
return err
}
for _, entry := range entries {
src := filepath.Join(srcDir, entry.Name())
dst := filepath.Join(dstDir, entry.Name())
if err := copyPathFiltered(srcDir, src, dst, keep); err != nil {
return err
}
}
return nil
}
func copyPath(src, dst string) error {
info, err := os.Stat(src)
if err != nil {
@@ -254,6 +288,36 @@ func copyPath(src, dst string) error {
return err
}
func copyPathFiltered(rootSrc, src, dst string, keep func(rel string, info os.FileInfo) bool) error {
info, err := os.Stat(src)
if err != nil {
return err
}
rel, err := filepath.Rel(rootSrc, src)
if err != nil {
return err
}
if keep != nil && !keep(rel, info) {
return nil
}
if info.IsDir() {
if err := os.MkdirAll(dst, info.Mode().Perm()); err != nil {
return err
}
entries, err := os.ReadDir(src)
if err != nil {
return err
}
for _, entry := range entries {
if err := copyPathFiltered(rootSrc, filepath.Join(src, entry.Name()), filepath.Join(dst, entry.Name()), keep); err != nil {
return err
}
}
return nil
}
return copyPath(src, dst)
}
func createSupportTarGz(dst, srcDir string) error {
file, err := os.Create(dst)
if err != nil {

View File

@@ -0,0 +1,252 @@
package collector
import (
"encoding/csv"
"log/slog"
"os/exec"
"path/filepath"
"sort"
"strconv"
"strings"
"bee/audit/internal/schema"
)
var (
amdSMIExecCommand = exec.Command
amdSMILookPath = exec.LookPath
amdSMIGlob = filepath.Glob
)
var amdSMIExecutableGlobs = []string{
"/opt/rocm/bin/rocm-smi",
"/opt/rocm-*/bin/rocm-smi",
"/usr/local/bin/rocm-smi",
}
type amdGPUInfo struct {
BDF string
Serial string
Product string
Firmware string
PowerW *float64
TempC *float64
}
func enrichPCIeWithAMD(devs []schema.HardwarePCIeDevice) []schema.HardwarePCIeDevice {
if !hasAMDGPUDevices(devs) {
return devs
}
infoByBDF, err := queryAMDGPUs()
if err != nil {
slog.Info("amdgpu: enrichment skipped", "err", err)
return devs
}
enriched := 0
for i := range devs {
if !isAMDGPUDevice(devs[i]) || devs[i].BDF == nil {
continue
}
info, ok := infoByBDF[normalizePCIeBDF(*devs[i].BDF)]
if !ok {
continue
}
if strings.TrimSpace(info.Serial) != "" {
devs[i].SerialNumber = &info.Serial
}
if strings.TrimSpace(info.Firmware) != "" {
devs[i].Firmware = &info.Firmware
}
if strings.TrimSpace(info.Product) != "" && devs[i].Model == nil {
devs[i].Model = &info.Product
}
if info.PowerW != nil {
devs[i].PowerW = info.PowerW
}
if info.TempC != nil {
devs[i].TemperatureC = info.TempC
}
enriched++
}
if enriched > 0 {
slog.Info("amdgpu: enriched", "count", enriched)
}
return devs
}
func hasAMDGPUDevices(devs []schema.HardwarePCIeDevice) bool {
for _, dev := range devs {
if isAMDGPUDevice(dev) {
return true
}
}
return false
}
func isAMDGPUDevice(dev schema.HardwarePCIeDevice) bool {
if dev.Manufacturer == nil || dev.DeviceClass == nil {
return false
}
manufacturer := strings.ToLower(strings.TrimSpace(*dev.Manufacturer))
return strings.Contains(manufacturer, "advanced micro devices") && isGPUClass(strings.TrimSpace(*dev.DeviceClass))
}
func queryAMDGPUs() (map[string]amdGPUInfo, error) {
busByCard, err := queryAMDField("--showbus")
if err != nil {
return nil, err
}
infoByCard := map[string]amdGPUInfo{}
for card, bus := range busByCard {
bdf := normalizePCIeBDF(bus)
if bdf == "" {
continue
}
infoByCard[card] = amdGPUInfo{BDF: bdf}
}
if len(infoByCard) == 0 {
return map[string]amdGPUInfo{}, nil
}
mergeAMDField(infoByCard, "--showserial", func(info *amdGPUInfo, value string) { info.Serial = value })
mergeAMDField(infoByCard, "--showproductname", func(info *amdGPUInfo, value string) { info.Product = value })
mergeAMDField(infoByCard, "--showvbios", func(info *amdGPUInfo, value string) { info.Firmware = value })
mergeAMDNumericField(infoByCard, "--showpower", func(info *amdGPUInfo, value float64) { info.PowerW = &value })
mergeAMDNumericField(infoByCard, "--showtemp", func(info *amdGPUInfo, value float64) { info.TempC = &value })
result := make(map[string]amdGPUInfo, len(infoByCard))
for _, info := range infoByCard {
if info.BDF == "" {
continue
}
result[info.BDF] = info
}
return result, nil
}
func mergeAMDField(infoByCard map[string]amdGPUInfo, flag string, apply func(*amdGPUInfo, string)) {
values, err := queryAMDField(flag)
if err != nil {
return
}
for card, value := range values {
info, ok := infoByCard[card]
if !ok {
continue
}
value = strings.TrimSpace(value)
if value == "" {
continue
}
apply(&info, value)
infoByCard[card] = info
}
}
func mergeAMDNumericField(infoByCard map[string]amdGPUInfo, flag string, apply func(*amdGPUInfo, float64)) {
values, err := queryAMDNumericField(flag)
if err != nil {
return
}
for card, value := range values {
info, ok := infoByCard[card]
if !ok {
continue
}
apply(&info, value)
infoByCard[card] = info
}
}
func queryAMDField(flag string) (map[string]string, error) {
cmd, err := resolveAMDSMICmd(flag, "--csv")
if err != nil {
return nil, err
}
out, err := amdSMIExecCommand(cmd[0], cmd[1:]...).CombinedOutput()
if err != nil {
return nil, err
}
return parseROCmSingleValueCSV(string(out)), nil
}
func queryAMDNumericField(flag string) (map[string]float64, error) {
values, err := queryAMDField(flag)
if err != nil {
return nil, err
}
out := map[string]float64{}
for card, raw := range values {
if value, ok := firstFloat(raw); ok {
out[card] = value
}
}
return out, nil
}
func resolveAMDSMICmd(args ...string) ([]string, error) {
if path, err := amdSMILookPath("rocm-smi"); err == nil {
return append([]string{path}, args...), nil
}
for _, pattern := range amdSMIExecutableGlobs {
matches, err := amdSMIGlob(pattern)
if err != nil {
continue
}
sort.Strings(matches)
for _, match := range matches {
return append([]string{match}, args...), nil
}
}
return nil, exec.ErrNotFound
}
func parseROCmSingleValueCSV(raw string) map[string]string {
rows := map[string]string{}
reader := csv.NewReader(strings.NewReader(raw))
reader.FieldsPerRecord = -1
records, err := reader.ReadAll()
if err != nil {
return rows
}
for _, rec := range records {
if len(rec) < 2 {
continue
}
card := normalizeROCmCardKey(rec[0])
if card == "" {
continue
}
value := strings.TrimSpace(strings.Join(rec[1:], ","))
if value == "" || looksLikeCSVHeaderValue(value) {
continue
}
rows[card] = value
}
return rows
}
func normalizeROCmCardKey(raw string) string {
raw = strings.ToLower(strings.TrimSpace(raw))
raw = strings.Trim(raw, "\"")
if raw == "" {
return ""
}
if raw == "device" || raw == "gpu" || raw == "card" {
return ""
}
if strings.HasPrefix(raw, "card") {
return raw
}
if _, err := strconv.Atoi(raw); err == nil {
return "card" + raw
}
return ""
}
func looksLikeCSVHeaderValue(value string) bool {
value = strings.ToLower(strings.TrimSpace(value))
return strings.Contains(value, "product") ||
strings.Contains(value, "serial") ||
strings.Contains(value, "vbios") ||
strings.Contains(value, "bus")
}

View File

@@ -0,0 +1,56 @@
package collector
import (
"os/exec"
"testing"
)
func TestParseROCmSingleValueCSV(t *testing.T) {
raw := "device,Serial Number\ncard0,ABC123\ncard1,XYZ789\n"
got := parseROCmSingleValueCSV(raw)
if got["card0"] != "ABC123" {
t.Fatalf("card0=%q want ABC123", got["card0"])
}
if got["card1"] != "XYZ789" {
t.Fatalf("card1=%q want XYZ789", got["card1"])
}
}
func TestQueryAMDNumericFieldParsesUnits(t *testing.T) {
origExec := amdSMIExecCommand
origLookPath := amdSMILookPath
t.Cleanup(func() {
amdSMIExecCommand = origExec
amdSMILookPath = origLookPath
})
amdSMILookPath = func(string) (string, error) { return "/usr/bin/rocm-smi", nil }
amdSMIExecCommand = func(name string, args ...string) *exec.Cmd {
return exec.Command("sh", "-c", "printf 'device,Temperature\\ncard0,45.5c\\ncard1,67.0c\\n'")
}
got, err := queryAMDNumericField("--showtemp")
if err != nil {
t.Fatalf("queryAMDNumericField: %v", err)
}
if got["card0"] != 45.5 {
t.Fatalf("card0=%v want 45.5", got["card0"])
}
if got["card1"] != 67.0 {
t.Fatalf("card1=%v want 67.0", got["card1"])
}
}
func TestNormalizeROCmCardKey(t *testing.T) {
tests := map[string]string{
"0": "card0",
"card1": "card1",
"Device": "",
"": "",
}
for input, want := range tests {
if got := normalizeROCmCardKey(input); got != want {
t.Fatalf("normalizeROCmCardKey(%q)=%q want %q", input, got, want)
}
}
}

View File

@@ -36,6 +36,8 @@ func Run(_ runtimeenv.Mode) schema.HardwareIngestRequest {
snap.Memory = enrichMemoryWithTelemetry(snap.Memory, sensorDoc)
snap.Storage = collectStorage()
snap.PCIeDevices = collectPCIe()
snap.PCIeDevices = enrichPCIeWithAMD(snap.PCIeDevices)
snap.PCIeDevices = enrichPCIeWithPCISerials(snap.PCIeDevices)
snap.PCIeDevices = enrichPCIeWithNVIDIA(snap.PCIeDevices)
snap.PCIeDevices = enrichPCIeWithMellanox(snap.PCIeDevices)
snap.PCIeDevices = enrichPCIeWithNICTelemetry(snap.PCIeDevices)

View File

@@ -41,7 +41,18 @@ func filterStorage(disks []schema.HardwareStorage) []schema.HardwareStorage {
func filterPSUs(psus []schema.HardwarePowerSupply) []schema.HardwarePowerSupply {
out := make([]schema.HardwarePowerSupply, 0, len(psus))
for _, psu := range psus {
if psu.SerialNumber == nil || *psu.SerialNumber == "" {
hasIdentity := false
switch {
case psu.SerialNumber != nil && *psu.SerialNumber != "":
hasIdentity = true
case psu.Slot != nil && *psu.Slot != "":
hasIdentity = true
case psu.Model != nil && *psu.Model != "":
hasIdentity = true
case psu.Vendor != nil && *psu.Vendor != "":
hasIdentity = true
}
if !hasIdentity {
continue
}
out = append(out, psu)

View File

@@ -61,3 +61,20 @@ func TestFinalizeSnapshotPreservesDuplicateSerials(t *testing.T) {
t.Fatalf("duplicate serial should stay unchanged: %q", got)
}
}
func TestFilterPSUsKeepsSlotOnlyEntries(t *testing.T) {
slot := "0"
status := statusOK
got := filterPSUs([]schema.HardwarePowerSupply{
{Slot: &slot, HardwareComponentStatus: schema.HardwareComponentStatus{Status: &status}},
{HardwareComponentStatus: schema.HardwareComponentStatus{Status: &status}},
})
if len(got) != 1 {
t.Fatalf("len(got)=%d want 1", len(got))
}
if got[0].Slot == nil || *got[0].Slot != "0" {
t.Fatalf("unexpected kept PSU: %+v", got[0])
}
}

View File

@@ -44,6 +44,11 @@ func enrichPCIeWithNICTelemetry(devs []schema.HardwarePCIeDevice) []schema.Hardw
}
iface := ifaces[0]
devs[i].MacAddresses = collectInterfaceMACs(ifaces)
if devs[i].SerialNumber == nil {
if serial := queryPCIDeviceSerial(bdf); serial != "" {
devs[i].SerialNumber = &serial
}
}
if devs[i].Firmware == nil {
if out, err := ethtoolInfoQuery(iface); err == nil {

View File

@@ -1,6 +1,10 @@
package collector
import "testing"
import (
"bee/audit/internal/schema"
"fmt"
"testing"
)
func TestParseSFPDOM(t *testing.T) {
raw := `
@@ -29,6 +33,74 @@ func TestParseSFPDOM(t *testing.T) {
}
}
func TestParseLSPCIDetailSerial(t *testing.T) {
raw := `
05:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
Serial number: NIC-SN-12345
`
if got := parseLSPCIDetailSerial(raw); got != "NIC-SN-12345" {
t.Fatalf("serial=%q want %q", got, "NIC-SN-12345")
}
}
func TestParsePCIVPDSerial(t *testing.T) {
raw := []byte{0x82, 0x05, 0x00, 'M', 'L', 'X', '5', 0x90, 0x08, 0x00, 'S', 'N', 0x08, 'M', 'T', '1', '2', '3', '4', '5', '6'}
if got := parsePCIVPDSerial(raw); got != "MT123456" {
t.Fatalf("serial=%q want %q", got, "MT123456")
}
}
func TestEnrichPCIeWithNICTelemetryAddsSerialFallback(t *testing.T) {
origDetail := queryPCILSPCIDetail
origVPD := readPCIVPDFile
origIfaces := netIfacesByBDF
origReadMAC := readNetAddressFile
origEth := ethtoolInfoQuery
origModule := ethtoolModuleQuery
t.Cleanup(func() {
queryPCILSPCIDetail = origDetail
readPCIVPDFile = origVPD
netIfacesByBDF = origIfaces
readNetAddressFile = origReadMAC
ethtoolInfoQuery = origEth
ethtoolModuleQuery = origModule
})
queryPCILSPCIDetail = func(bdf string) (string, error) {
if bdf != "0000:18:00.0" {
t.Fatalf("unexpected bdf: %s", bdf)
}
return "Serial number: NIC-SN-98765\n", nil
}
readPCIVPDFile = func(string) ([]byte, error) {
return nil, fmt.Errorf("no vpd needed")
}
netIfacesByBDF = func(string) []string { return []string{"eth0"} }
readNetAddressFile = func(iface string) (string, error) {
if iface != "eth0" {
t.Fatalf("unexpected iface: %s", iface)
}
return "aa:bb:cc:dd:ee:ff", nil
}
ethtoolInfoQuery = func(string) (string, error) { return "", fmt.Errorf("skip firmware") }
ethtoolModuleQuery = func(string) (string, error) { return "", fmt.Errorf("skip optics") }
class := "EthernetController"
bdf := "0000:18:00.0"
devs := []schema.HardwarePCIeDevice{{
DeviceClass: &class,
BDF: &bdf,
}}
out := enrichPCIeWithNICTelemetry(devs)
if out[0].SerialNumber == nil || *out[0].SerialNumber != "NIC-SN-98765" {
t.Fatalf("serial=%v want NIC-SN-98765", out[0].SerialNumber)
}
if len(out[0].MacAddresses) != 1 || out[0].MacAddresses[0] != "aa:bb:cc:dd:ee:ff" {
t.Fatalf("mac_addresses=%v", out[0].MacAddresses)
}
}
func TestDBMValue(t *testing.T) {
tests := []struct {
in string

View File

@@ -37,7 +37,7 @@ func parseLspci(output string) []schema.HardwarePCIeDevice {
val := strings.TrimSpace(line[idx+2:])
fields[key] = val
}
if !shouldIncludePCIeDevice(fields["Class"]) {
if !shouldIncludePCIeDevice(fields["Class"], fields["Vendor"], fields["Device"]) {
continue
}
dev := parseLspciDevice(fields)
@@ -46,8 +46,10 @@ func parseLspci(output string) []schema.HardwarePCIeDevice {
return devs
}
func shouldIncludePCIeDevice(class string) bool {
func shouldIncludePCIeDevice(class, vendor, device string) bool {
c := strings.ToLower(strings.TrimSpace(class))
v := strings.ToLower(strings.TrimSpace(vendor))
d := strings.ToLower(strings.TrimSpace(device))
if c == "" {
return true
}
@@ -68,12 +70,28 @@ func shouldIncludePCIeDevice(class string) bool {
"audio device",
"serial bus controller",
"unassigned class",
"non-essential instrumentation",
}
for _, bad := range excluded {
if strings.Contains(c, bad) {
return false
}
}
if strings.Contains(v, "advanced micro devices") || strings.Contains(v, "[amd]") {
internalAMDPatterns := []string{
"dummy function",
"reserved spp",
"ptdma",
"cryptographic coprocessor pspcpp",
"pspcpp",
}
for _, bad := range internalAMDPatterns {
if strings.Contains(d, bad) {
return false
}
}
}
return true
}
@@ -98,6 +116,8 @@ func parseLspciDevice(fields map[string]string) schema.HardwarePCIeDevice {
}
if numaNode, ok := readPCINumaNode(bdf); ok {
dev.NUMANode = &numaNode
} else if numaNode, ok := parsePCINumaNode(fields["NUMANode"]); ok {
dev.NUMANode = &numaNode
}
if width, ok := readPCIIntAttribute(bdf, "current_link_width"); ok {
dev.LinkWidth = &width
@@ -165,6 +185,18 @@ func readPCINumaNode(bdf string) (int, bool) {
return value, true
}
func parsePCINumaNode(raw string) (int, bool) {
raw = strings.TrimSpace(raw)
if raw == "" {
return 0, false
}
value, err := strconv.Atoi(raw)
if err != nil || value < 0 {
return 0, false
}
return value, true
}
func readPCIIntAttribute(bdf, attribute string) (int, bool) {
out, err := exec.Command("cat", "/sys/bus/pci/devices/"+bdf+"/"+attribute).Output()
if err != nil {

View File

@@ -8,32 +8,42 @@ import (
func TestShouldIncludePCIeDevice(t *testing.T) {
tests := []struct {
class string
want bool
name string
class string
vendor string
device string
want bool
}{
{"USB controller", false},
{"System peripheral", false},
{"Audio device", false},
{"Host bridge", false},
{"PCI bridge", false},
{"SMBus", false},
{"Performance counters", false},
{"Ethernet controller", true},
{"RAID bus controller", true},
{"Non-Volatile memory controller", true},
{"VGA compatible controller", true},
{name: "usb", class: "USB controller", want: false},
{name: "system peripheral", class: "System peripheral", want: false},
{name: "audio", class: "Audio device", want: false},
{name: "host bridge", class: "Host bridge", want: false},
{name: "pci bridge", class: "PCI bridge", want: false},
{name: "smbus", class: "SMBus", want: false},
{name: "perf", class: "Performance counters", want: false},
{name: "non essential instrumentation", class: "Non-Essential Instrumentation", want: false},
{name: "amd dummy function", class: "Encryption controller", vendor: "Advanced Micro Devices, Inc. [AMD]", device: "Starship/Matisse PTDMA", want: false},
{name: "amd pspcpp", class: "Encryption controller", vendor: "Advanced Micro Devices, Inc. [AMD]", device: "Starship/Matisse Cryptographic Coprocessor PSPCPP", want: false},
{name: "ethernet", class: "Ethernet controller", want: true},
{name: "raid", class: "RAID bus controller", want: true},
{name: "nvme", class: "Non-Volatile memory controller", want: true},
{name: "vga", class: "VGA compatible controller", want: true},
{name: "other encryption controller", class: "Encryption controller", vendor: "Intel Corporation", device: "QuickAssist", want: true},
}
for _, tt := range tests {
got := shouldIncludePCIeDevice(tt.class)
if got != tt.want {
t.Fatalf("class %q include=%v want %v", tt.class, got, tt.want)
}
t.Run(tt.name, func(t *testing.T) {
got := shouldIncludePCIeDevice(tt.class, tt.vendor, tt.device)
if got != tt.want {
t.Fatalf("class=%q vendor=%q device=%q include=%v want %v", tt.class, tt.vendor, tt.device, got, tt.want)
}
})
}
}
func TestParseLspci_filtersExcludedClasses(t *testing.T) {
input := "Slot:\t0000:00:14.0\nClass:\tUSB controller\nVendor:\tIntel Corporation\nDevice:\tUSB 3.0\n\n" +
"Slot:\t0000:00:18.0\nClass:\tNon-Essential Instrumentation\nVendor:\tAdvanced Micro Devices, Inc. [AMD]\nDevice:\tStarship/Matisse PCIe Dummy Function\n\n" +
"Slot:\t0000:65:00.0\nClass:\tVGA compatible controller\nVendor:\tNVIDIA Corporation\nDevice:\tH100\n\n"
devs := parseLspci(input)
@@ -51,6 +61,21 @@ func TestParseLspci_filtersExcludedClasses(t *testing.T) {
}
}
func TestParseLspci_filtersAMDChipsetNoise(t *testing.T) {
input := "" +
"Slot:\t0000:1a:00.0\nClass:\tNon-Essential Instrumentation\nVendor:\tAdvanced Micro Devices, Inc. [AMD]\nDevice:\tStarship/Matisse PCIe Dummy Function\n\n" +
"Slot:\t0000:1a:00.2\nClass:\tEncryption controller\nVendor:\tAdvanced Micro Devices, Inc. [AMD]\nDevice:\tStarship/Matisse PTDMA\n\n" +
"Slot:\t0000:05:00.0\nClass:\tEthernet controller\nVendor:\tMellanox Technologies\nDevice:\tMT28908 Family [ConnectX-6]\n\n"
devs := parseLspci(input)
if len(devs) != 1 {
t.Fatalf("expected 1 remaining device, got %d", len(devs))
}
if devs[0].Model == nil || *devs[0].Model != "MT28908 Family [ConnectX-6]" {
t.Fatalf("unexpected remaining device: %+v", devs[0])
}
}
func TestPCIeJSONUsesSlotNotBDF(t *testing.T) {
input := "Slot:\t0000:65:00.0\nClass:\tVGA compatible controller\nVendor:\tNVIDIA Corporation\nDevice:\tH100\n\n"
@@ -68,6 +93,18 @@ func TestPCIeJSONUsesSlotNotBDF(t *testing.T) {
}
}
func TestParseLspciUsesNUMANodeFieldWhenSysfsUnavailable(t *testing.T) {
input := "Slot:\t0000:65:00.0\nClass:\tEthernet controller\nVendor:\tIntel Corporation\nDevice:\tX710\nNUMANode:\t1\n\n"
devs := parseLspci(input)
if len(devs) != 1 {
t.Fatalf("expected 1 device, got %d", len(devs))
}
if devs[0].NUMANode == nil || *devs[0].NUMANode != 1 {
t.Fatalf("numa_node=%v want 1", devs[0].NUMANode)
}
}
func TestNormalizePCILinkSpeed(t *testing.T) {
tests := []struct {
raw string

View File

@@ -0,0 +1,123 @@
package collector
import (
"bee/audit/internal/schema"
"log/slog"
"os"
"os/exec"
"path/filepath"
"strings"
)
var (
queryPCILSPCIDetail = func(bdf string) (string, error) {
out, err := exec.Command("lspci", "-vv", "-s", bdf).Output()
if err != nil {
return "", err
}
return string(out), nil
}
readPCIVPDFile = func(bdf string) ([]byte, error) {
return os.ReadFile(filepath.Join("/sys/bus/pci/devices", bdf, "vpd"))
}
)
func enrichPCIeWithPCISerials(devs []schema.HardwarePCIeDevice) []schema.HardwarePCIeDevice {
enriched := 0
for i := range devs {
if !shouldProbePCIeSerial(devs[i]) {
continue
}
bdf := normalizePCIeBDF(*devs[i].BDF)
if bdf == "" {
continue
}
if serial := queryPCIDeviceSerial(bdf); serial != "" {
devs[i].SerialNumber = &serial
enriched++
}
}
if enriched > 0 {
slog.Info("pcie: serials enriched", "count", enriched)
}
return devs
}
func shouldProbePCIeSerial(dev schema.HardwarePCIeDevice) bool {
if dev.BDF == nil || dev.SerialNumber != nil {
return false
}
if dev.DeviceClass == nil {
return false
}
class := strings.TrimSpace(*dev.DeviceClass)
return isNICClass(class) || isGPUClass(class)
}
func queryPCIDeviceSerial(bdf string) string {
if out, err := queryPCILSPCIDetail(bdf); err == nil {
if serial := parseLSPCIDetailSerial(out); serial != "" {
return serial
}
}
if raw, err := readPCIVPDFile(bdf); err == nil {
return parsePCIVPDSerial(raw)
}
return ""
}
func parseLSPCIDetailSerial(raw string) string {
for _, line := range strings.Split(raw, "\n") {
line = strings.TrimSpace(line)
if line == "" {
continue
}
lower := strings.ToLower(line)
if !strings.Contains(lower, "serial number:") {
continue
}
idx := strings.Index(line, ":")
if idx < 0 {
continue
}
if serial := strings.TrimSpace(line[idx+1:]); serial != "" {
return serial
}
}
return ""
}
func parsePCIVPDSerial(raw []byte) string {
for i := 0; i+3 < len(raw); i++ {
if raw[i] != 'S' || raw[i+1] != 'N' {
continue
}
length := int(raw[i+2])
if length <= 0 || length > 64 || i+3+length > len(raw) {
continue
}
value := strings.TrimSpace(strings.Trim(string(raw[i+3:i+3+length]), "\x00"))
if !looksLikeSerial(value) {
continue
}
return value
}
return ""
}
func looksLikeSerial(value string) bool {
if len(value) < 4 {
return false
}
hasAlphaNum := false
for _, r := range value {
switch {
case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
hasAlphaNum = true
case strings.ContainsRune(" -_./:", r):
default:
return false
}
}
return hasAlphaNum
}

View File

@@ -0,0 +1,47 @@
package collector
import (
"bee/audit/internal/schema"
"fmt"
"testing"
)
func TestEnrichPCIeWithPCISerialsAddsGPUFallback(t *testing.T) {
origDetail := queryPCILSPCIDetail
origVPD := readPCIVPDFile
t.Cleanup(func() {
queryPCILSPCIDetail = origDetail
readPCIVPDFile = origVPD
})
queryPCILSPCIDetail = func(bdf string) (string, error) {
if bdf != "0000:11:00.0" {
t.Fatalf("unexpected bdf: %s", bdf)
}
return "Serial number: GPU-SN-12345\n", nil
}
readPCIVPDFile = func(string) ([]byte, error) {
return nil, fmt.Errorf("no vpd needed")
}
class := "DisplayController"
bdf := "0000:11:00.0"
devs := []schema.HardwarePCIeDevice{{
DeviceClass: &class,
BDF: &bdf,
}}
out := enrichPCIeWithPCISerials(devs)
if out[0].SerialNumber == nil || *out[0].SerialNumber != "GPU-SN-12345" {
t.Fatalf("serial=%v want GPU-SN-12345", out[0].SerialNumber)
}
}
func TestShouldProbePCIeSerialSkipsNonGPUOrNIC(t *testing.T) {
class := "StorageController"
bdf := "0000:19:00.0"
dev := schema.HardwarePCIeDevice{DeviceClass: &class, BDF: &bdf}
if shouldProbePCIeSerial(dev) {
t.Fatal("unexpected probe for storage controller")
}
}

View File

@@ -190,6 +190,7 @@ type smartctlInfo struct {
func enrichWithSmartctl(dev lsblkDevice) schema.HardwareStorage {
present := true
s := schema.HardwareStorage{Present: &present}
s.Telemetry = map[string]any{"linux_device": "/dev/" + dev.Name}
tran := strings.ToLower(dev.Tran)
devPath := "/dev/" + dev.Name
@@ -348,6 +349,7 @@ func enrichWithNVMe(dev lsblkDevice) schema.HardwareStorage {
Present: &present,
Type: &devType,
Interface: &iface,
Telemetry: map[string]any{"linux_device": "/dev/" + dev.Name},
}
devPath := "/dev/" + dev.Name

View File

@@ -9,8 +9,50 @@ import (
"strings"
)
var exportExecCommand = exec.Command
func formatMountTargetError(target RemovableTarget, raw string, err error) error {
msg := strings.TrimSpace(raw)
fstype := strings.ToLower(strings.TrimSpace(target.FSType))
if fstype == "exfat" && strings.Contains(strings.ToLower(msg), "unknown filesystem type 'exfat'") {
return fmt.Errorf("mount %s: exFAT support is missing in this ISO build: %w", target.Device, err)
}
if msg == "" {
return err
}
return fmt.Errorf("%s: %w", msg, err)
}
func removableTargetReadOnly(fields map[string]string) bool {
if fields["RO"] == "1" {
return true
}
switch strings.ToLower(strings.TrimSpace(fields["FSTYPE"])) {
case "iso9660", "squashfs":
return true
default:
return false
}
}
func ensureWritableMountpoint(mountpoint string) error {
probe, err := os.CreateTemp(mountpoint, ".bee-write-test-*")
if err != nil {
return fmt.Errorf("target filesystem is not writable: %w", err)
}
name := probe.Name()
if closeErr := probe.Close(); closeErr != nil {
_ = os.Remove(name)
return closeErr
}
if err := os.Remove(name); err != nil {
return err
}
return nil
}
func (s *System) ListRemovableTargets() ([]RemovableTarget, error) {
raw, err := exec.Command("lsblk", "-P", "-o", "NAME,TYPE,PKNAME,RM,FSTYPE,MOUNTPOINT,SIZE,LABEL,MODEL").Output()
raw, err := exportExecCommand("lsblk", "-P", "-o", "NAME,TYPE,PKNAME,RM,RO,FSTYPE,MOUNTPOINT,SIZE,LABEL,MODEL").Output()
if err != nil {
return nil, err
}
@@ -34,7 +76,7 @@ func (s *System) ListRemovableTargets() ([]RemovableTarget, error) {
}
}
}
if !removable || fields["FSTYPE"] == "" {
if !removable || fields["FSTYPE"] == "" || removableTargetReadOnly(fields) {
continue
}
@@ -52,7 +94,7 @@ func (s *System) ListRemovableTargets() ([]RemovableTarget, error) {
return out, nil
}
func (s *System) ExportFileToTarget(src string, target RemovableTarget) (string, error) {
func (s *System) ExportFileToTarget(src string, target RemovableTarget) (dst string, retErr error) {
if src == "" || target.Device == "" {
return "", fmt.Errorf("source and target are required")
}
@@ -62,20 +104,43 @@ func (s *System) ExportFileToTarget(src string, target RemovableTarget) (string,
mountpoint := strings.TrimSpace(target.Mountpoint)
mountedHere := false
mounted := mountpoint != ""
if mountpoint == "" {
mountpoint = filepath.Join("/tmp", "bee-export-"+filepath.Base(target.Device))
if err := os.MkdirAll(mountpoint, 0755); err != nil {
return "", err
}
if raw, err := exec.Command("mount", target.Device, mountpoint).CombinedOutput(); err != nil {
if raw, err := exportExecCommand("mount", target.Device, mountpoint).CombinedOutput(); err != nil {
_ = os.Remove(mountpoint)
return string(raw), err
return "", formatMountTargetError(target, string(raw), err)
}
mountedHere = true
mounted = true
}
defer func() {
if !mounted {
return
}
_ = exportExecCommand("sync").Run()
if raw, err := exportExecCommand("umount", mountpoint).CombinedOutput(); err != nil && retErr == nil {
msg := strings.TrimSpace(string(raw))
if msg == "" {
retErr = err
} else {
retErr = fmt.Errorf("%s: %w", msg, err)
}
}
if mountedHere {
_ = os.Remove(mountpoint)
}
}()
if err := ensureWritableMountpoint(mountpoint); err != nil {
return "", err
}
filename := filepath.Base(src)
dst := filepath.Join(mountpoint, filename)
dst = filepath.Join(mountpoint, filename)
data, err := os.ReadFile(src)
if err != nil {
return "", err
@@ -83,12 +148,6 @@ func (s *System) ExportFileToTarget(src string, target RemovableTarget) (string,
if err := os.WriteFile(dst, data, 0644); err != nil {
return "", err
}
_ = exec.Command("sync").Run()
if mountedHere {
_ = exec.Command("umount", mountpoint).Run()
_ = os.Remove(mountpoint)
}
return dst, nil
}

View File

@@ -0,0 +1,112 @@
package platform
import (
"os"
"os/exec"
"path/filepath"
"strings"
"testing"
)
func TestExportFileToTargetUnmountsExistingMountpoint(t *testing.T) {
tmp := t.TempDir()
src := filepath.Join(tmp, "bundle.tar.gz")
mountpoint := filepath.Join(tmp, "mnt")
if err := os.MkdirAll(mountpoint, 0755); err != nil {
t.Fatalf("mkdir mountpoint: %v", err)
}
if err := os.WriteFile(src, []byte("bundle"), 0644); err != nil {
t.Fatalf("write src: %v", err)
}
var calls [][]string
oldExec := exportExecCommand
exportExecCommand = func(name string, args ...string) *exec.Cmd {
calls = append(calls, append([]string{name}, args...))
return exec.Command("sh", "-c", "exit 0")
}
t.Cleanup(func() { exportExecCommand = oldExec })
s := &System{}
dst, err := s.ExportFileToTarget(src, RemovableTarget{
Device: "/dev/sdb1",
Mountpoint: mountpoint,
})
if err != nil {
t.Fatalf("ExportFileToTarget error: %v", err)
}
if got, want := dst, filepath.Join(mountpoint, "bundle.tar.gz"); got != want {
t.Fatalf("dst=%q want %q", got, want)
}
if _, err := os.Stat(filepath.Join(mountpoint, "bundle.tar.gz")); err != nil {
t.Fatalf("exported file missing: %v", err)
}
foundUmount := false
for _, call := range calls {
if len(call) == 2 && call[0] == "umount" && call[1] == mountpoint {
foundUmount = true
break
}
}
if !foundUmount {
t.Fatalf("expected umount %q call, got %#v", mountpoint, calls)
}
}
func TestExportFileToTargetRejectsNonWritableMountpoint(t *testing.T) {
tmp := t.TempDir()
src := filepath.Join(tmp, "bundle.tar.gz")
mountpoint := filepath.Join(tmp, "mnt")
if err := os.MkdirAll(mountpoint, 0755); err != nil {
t.Fatalf("mkdir mountpoint: %v", err)
}
if err := os.WriteFile(src, []byte("bundle"), 0644); err != nil {
t.Fatalf("write src: %v", err)
}
if err := os.Chmod(mountpoint, 0555); err != nil {
t.Fatalf("chmod mountpoint: %v", err)
}
oldExec := exportExecCommand
exportExecCommand = func(name string, args ...string) *exec.Cmd {
return exec.Command("sh", "-c", "exit 0")
}
t.Cleanup(func() { exportExecCommand = oldExec })
s := &System{}
_, err := s.ExportFileToTarget(src, RemovableTarget{
Device: "/dev/sdb1",
Mountpoint: mountpoint,
})
if err == nil {
t.Fatal("expected error for non-writable mountpoint")
}
if !strings.Contains(err.Error(), "target filesystem is not writable") {
t.Fatalf("err=%q want writable message", err)
}
}
func TestListRemovableTargetsSkipsReadOnlyMedia(t *testing.T) {
oldExec := exportExecCommand
lsblkOut := `NAME="sda1" TYPE="part" PKNAME="sda" RM="1" RO="1" FSTYPE="iso9660" MOUNTPOINT="/run/live/medium" SIZE="3.7G" LABEL="BEE" MODEL=""
NAME="sdb1" TYPE="part" PKNAME="sdb" RM="1" RO="0" FSTYPE="vfat" MOUNTPOINT="/media/bee/USB" SIZE="29.8G" LABEL="USB" MODEL=""`
exportExecCommand = func(name string, args ...string) *exec.Cmd {
cmd := exec.Command("sh", "-c", "printf '%s\n' \"$LSBLK_OUT\"")
cmd.Env = append(os.Environ(), "LSBLK_OUT="+lsblkOut)
return cmd
}
t.Cleanup(func() { exportExecCommand = oldExec })
s := &System{}
targets, err := s.ListRemovableTargets()
if err != nil {
t.Fatalf("ListRemovableTargets error: %v", err)
}
if len(targets) != 1 {
t.Fatalf("len(targets)=%d want 1 (%+v)", len(targets), targets)
}
if got := targets[0].Device; got != "/dev/sdb1" {
t.Fatalf("device=%q want /dev/sdb1", got)
}
}

View File

@@ -13,18 +13,19 @@ import (
// GPUMetricRow is one telemetry sample from nvidia-smi during a stress test.
type GPUMetricRow struct {
ElapsedSec float64
GPUIndex int
TempC float64
UsagePct float64
PowerW float64
ClockMHz float64
ElapsedSec float64 `json:"elapsed_sec"`
GPUIndex int `json:"index"`
TempC float64 `json:"temp_c"`
UsagePct float64 `json:"usage_pct"`
MemUsagePct float64 `json:"mem_usage_pct"`
PowerW float64 `json:"power_w"`
ClockMHz float64 `json:"clock_mhz"`
}
// sampleGPUMetrics runs nvidia-smi once and returns current metrics for each GPU.
func sampleGPUMetrics(gpuIndices []int) ([]GPUMetricRow, error) {
args := []string{
"--query-gpu=index,temperature.gpu,utilization.gpu,power.draw,clocks.current.graphics",
"--query-gpu=index,temperature.gpu,utilization.gpu,utilization.memory,power.draw,clocks.current.graphics",
"--format=csv,noheader,nounits",
}
if len(gpuIndices) > 0 {
@@ -45,16 +46,17 @@ func sampleGPUMetrics(gpuIndices []int) ([]GPUMetricRow, error) {
continue
}
parts := strings.Split(line, ", ")
if len(parts) < 5 {
if len(parts) < 6 {
continue
}
idx, _ := strconv.Atoi(strings.TrimSpace(parts[0]))
rows = append(rows, GPUMetricRow{
GPUIndex: idx,
TempC: parseGPUFloat(parts[1]),
UsagePct: parseGPUFloat(parts[2]),
PowerW: parseGPUFloat(parts[3]),
ClockMHz: parseGPUFloat(parts[4]),
GPUIndex: idx,
TempC: parseGPUFloat(parts[1]),
UsagePct: parseGPUFloat(parts[2]),
MemUsagePct: parseGPUFloat(parts[3]),
PowerW: parseGPUFloat(parts[4]),
ClockMHz: parseGPUFloat(parts[5]),
})
}
return rows, nil
@@ -69,6 +71,11 @@ func parseGPUFloat(s string) float64 {
return v
}
// SampleGPUMetrics runs nvidia-smi once and returns current metrics for each GPU.
func SampleGPUMetrics(gpuIndices []int) ([]GPUMetricRow, error) {
return sampleGPUMetrics(gpuIndices)
}
// WriteGPUMetricsCSV writes collected rows as a CSV file.
func WriteGPUMetricsCSV(path string, rows []GPUMetricRow) error {
var b bytes.Buffer
@@ -327,7 +334,7 @@ const (
)
// RenderGPUTerminalChart returns ANSI line charts (asciigraph-style) per GPU.
// Suitable for display in the TUI screenOutput.
// Used in SAT stress-test logs.
func RenderGPUTerminalChart(rows []GPUMetricRow) string {
seen := make(map[int]bool)
var order []int

View File

@@ -0,0 +1,214 @@
package platform
import (
"context"
"fmt"
"os"
"os/exec"
"strconv"
"strings"
)
// InstallDisk describes a candidate disk for installation.
type InstallDisk struct {
Device string // e.g. /dev/sda
Model string
Size string // human-readable, e.g. "500G"
SizeBytes int64 // raw byte count from lsblk
MountedParts []string // partition mount points currently active
}
const squashfsPath = "/run/live/medium/live/filesystem.squashfs"
// ListInstallDisks returns block devices suitable for installation.
// Excludes the current live boot medium but includes USB drives.
func (s *System) ListInstallDisks() ([]InstallDisk, error) {
out, err := exec.Command("lsblk", "-dn", "-o", "NAME,MODEL,SIZE,TYPE,TRAN").Output()
if err != nil {
return nil, fmt.Errorf("lsblk: %w", err)
}
bootDev := findLiveBootDevice()
var disks []InstallDisk
for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
fields := strings.Fields(line)
// NAME MODEL SIZE TYPE TRAN — model may have spaces so we parse from end
if len(fields) < 4 {
continue
}
// Last field: TRAN, second-to-last: TYPE, third-to-last: SIZE
typ := fields[len(fields)-2]
size := fields[len(fields)-3]
name := fields[0]
model := strings.Join(fields[1:len(fields)-3], " ")
if typ != "disk" {
continue
}
device := "/dev/" + name
if device == bootDev {
continue
}
sizeBytes := diskSizeBytes(device)
mounted := mountedParts(device)
disks = append(disks, InstallDisk{
Device: device,
Model: strings.TrimSpace(model),
Size: size,
SizeBytes: sizeBytes,
MountedParts: mounted,
})
}
return disks, nil
}
// diskSizeBytes returns the byte size of a block device using lsblk.
func diskSizeBytes(device string) int64 {
out, err := exec.Command("lsblk", "-bdn", "-o", "SIZE", device).Output()
if err != nil {
return 0
}
n, _ := strconv.ParseInt(strings.TrimSpace(string(out)), 10, 64)
return n
}
// mountedParts returns a list of "<part> at <mountpoint>" strings for any
// mounted partitions on the given device.
func mountedParts(device string) []string {
out, err := exec.Command("lsblk", "-n", "-o", "NAME,MOUNTPOINT", device).Output()
if err != nil {
return nil
}
var result []string
for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
fields := strings.Fields(line)
if len(fields) < 2 {
continue
}
mp := fields[1]
if mp == "" || mp == "[SWAP]" {
continue
}
result = append(result, "/dev/"+strings.TrimLeft(fields[0], "└─├─")+" at "+mp)
}
return result
}
// findLiveBootDevice returns the block device backing /run/live/medium (if any).
func findLiveBootDevice() string {
out, err := exec.Command("findmnt", "-n", "-o", "SOURCE", "/run/live/medium").Output()
if err != nil {
return ""
}
src := strings.TrimSpace(string(out))
if src == "" {
return ""
}
// Strip partition suffix to get the whole disk device.
// e.g. /dev/sdb1 → /dev/sdb, /dev/nvme0n1p1 → /dev/nvme0n1
out2, err := exec.Command("lsblk", "-no", "PKNAME", src).Output()
if err != nil || strings.TrimSpace(string(out2)) == "" {
return src
}
return "/dev/" + strings.TrimSpace(string(out2))
}
// MinInstallBytes returns the minimum recommended disk size for installation:
// squashfs size × 1.5 to allow for extracted filesystem and bootloader.
// Returns 0 if the squashfs is not available (non-live environment).
func MinInstallBytes() int64 {
fi, err := os.Stat(squashfsPath)
if err != nil {
return 0
}
return fi.Size() * 3 / 2
}
// toramActive returns true when the live system was booted with toram.
func toramActive() bool {
data, err := os.ReadFile("/proc/cmdline")
if err != nil {
return false
}
return strings.Contains(string(data), "toram")
}
// freeMemBytes returns MemAvailable from /proc/meminfo.
func freeMemBytes() int64 {
data, err := os.ReadFile("/proc/meminfo")
if err != nil {
return 0
}
for _, line := range strings.Split(string(data), "\n") {
if strings.HasPrefix(line, "MemAvailable:") {
fields := strings.Fields(line)
if len(fields) >= 2 {
n, _ := strconv.ParseInt(fields[1], 10, 64)
return n * 1024 // kB → bytes
}
}
}
return 0
}
// DiskWarnings returns advisory warning strings for a disk candidate.
func DiskWarnings(d InstallDisk) []string {
var w []string
if len(d.MountedParts) > 0 {
w = append(w, "has mounted partitions: "+strings.Join(d.MountedParts, ", "))
}
min := MinInstallBytes()
if min > 0 && d.SizeBytes > 0 && d.SizeBytes < min {
w = append(w, fmt.Sprintf("disk may be too small (need ≥ %s, have %s)",
humanBytes(min), humanBytes(d.SizeBytes)))
}
if toramActive() {
sqFi, err := os.Stat(squashfsPath)
if err == nil {
free := freeMemBytes()
if free > 0 && free < sqFi.Size()*2 {
w = append(w, "toram mode — low RAM, extraction may be slow or fail")
}
}
}
return w
}
func humanBytes(b int64) string {
const unit = 1024
if b < unit {
return fmt.Sprintf("%d B", b)
}
div, exp := int64(unit), 0
for n := b / unit; n >= unit; n /= unit {
div *= unit
exp++
}
return fmt.Sprintf("%.1f %cB", float64(b)/float64(div), "KMGTPE"[exp])
}
// InstallToDisk runs bee-install <device> <logfile> and streams output to logFile.
// The context can be used to cancel.
func (s *System) InstallToDisk(ctx context.Context, device string, logFile string) error {
cmd := exec.CommandContext(ctx, "bee-install", device, logFile)
return cmd.Run()
}
// InstallLogPath returns the default install log path for a given device.
func InstallLogPath(device string) string {
safe := strings.NewReplacer("/", "_", " ", "_").Replace(device)
return "/tmp/bee-install" + safe + ".log"
}
// Label returns a display label for a disk.
func (d InstallDisk) Label() string {
model := d.Model
if model == "" {
model = "Unknown"
}
return fmt.Sprintf("%s %s %s", d.Device, d.Size, model)
}

View File

@@ -0,0 +1,139 @@
package platform
import (
"bufio"
"os"
"strconv"
"strings"
"time"
)
// LiveMetricSample is a single point-in-time snapshot of server metrics
// collected for the web UI metrics page.
type LiveMetricSample struct {
Timestamp time.Time `json:"ts"`
Fans []FanReading `json:"fans"`
Temps []TempReading `json:"temps"`
PowerW float64 `json:"power_w"`
CPULoadPct float64 `json:"cpu_load_pct"`
MemLoadPct float64 `json:"mem_load_pct"`
GPUs []GPUMetricRow `json:"gpus"`
}
// TempReading is a named temperature sensor value.
type TempReading struct {
Name string `json:"name"`
Celsius float64 `json:"celsius"`
}
// SampleLiveMetrics collects a single metrics snapshot from all available
// sources: GPU (via nvidia-smi), fans and temperatures (via ipmitool/sensors),
// and system power (via ipmitool dcmi). Missing sources are silently skipped.
func SampleLiveMetrics() LiveMetricSample {
s := LiveMetricSample{Timestamp: time.Now().UTC()}
// GPU metrics — skipped silently if nvidia-smi unavailable
gpus, _ := SampleGPUMetrics(nil)
s.GPUs = gpus
// Fan speeds — skipped silently if ipmitool unavailable
fans, _ := sampleFanSpeeds()
s.Fans = fans
// CPU/system temperature — returns 0 if unavailable
cpuTemp := sampleCPUMaxTemp()
if cpuTemp > 0 {
s.Temps = append(s.Temps, TempReading{Name: "CPU", Celsius: cpuTemp})
}
// System power — returns 0 if unavailable
s.PowerW = sampleSystemPower()
// CPU load — from /proc/stat
s.CPULoadPct = sampleCPULoadPct()
// Memory load — from /proc/meminfo
s.MemLoadPct = sampleMemLoadPct()
return s
}
// sampleCPULoadPct reads two /proc/stat snapshots 200ms apart and returns
// the overall CPU utilisation percentage.
var cpuStatPrev [2]uint64 // [total, idle]
func sampleCPULoadPct() float64 {
total, idle := readCPUStat()
if total == 0 {
return 0
}
prevTotal, prevIdle := cpuStatPrev[0], cpuStatPrev[1]
cpuStatPrev = [2]uint64{total, idle}
if prevTotal == 0 {
return 0
}
dt := float64(total - prevTotal)
di := float64(idle - prevIdle)
if dt <= 0 {
return 0
}
pct := (1 - di/dt) * 100
if pct < 0 {
return 0
}
if pct > 100 {
return 100
}
return pct
}
func readCPUStat() (total, idle uint64) {
f, err := os.Open("/proc/stat")
if err != nil {
return 0, 0
}
defer f.Close()
sc := bufio.NewScanner(f)
for sc.Scan() {
line := sc.Text()
if !strings.HasPrefix(line, "cpu ") {
continue
}
fields := strings.Fields(line)[1:] // skip "cpu"
var vals [10]uint64
for i := 0; i < len(fields) && i < 10; i++ {
vals[i], _ = strconv.ParseUint(fields[i], 10, 64)
}
// idle = idle + iowait
idle = vals[3] + vals[4]
for _, v := range vals {
total += v
}
return total, idle
}
return 0, 0
}
func sampleMemLoadPct() float64 {
f, err := os.Open("/proc/meminfo")
if err != nil {
return 0
}
defer f.Close()
vals := map[string]uint64{}
sc := bufio.NewScanner(f)
for sc.Scan() {
fields := strings.Fields(sc.Text())
if len(fields) >= 2 {
v, _ := strconv.ParseUint(fields[1], 10, 64)
vals[strings.TrimSuffix(fields[0], ":")] = v
}
}
total := vals["MemTotal"]
avail := vals["MemAvailable"]
if total == 0 {
return 0
}
used := total - avail
return float64(used) / float64(total) * 100
}

View File

@@ -16,9 +16,6 @@ var runtimeRequiredTools = []string{
"smartctl",
"nvme",
"ipmitool",
"nvidia-smi",
"nvidia-bug-report.sh",
"bee-gpu-stress",
"dhclient",
"mount",
}
@@ -93,7 +90,8 @@ func (s *System) CollectRuntimeHealth(exportDir string) (schema.RuntimeHealth, e
}
}
for _, tool := range s.CheckTools(runtimeRequiredTools) {
vendor := s.DetectGPUVendor()
for _, tool := range s.runtimeToolStatuses(vendor) {
health.Tools = append(health.Tools, schema.RuntimeToolStatus{
Name: tool.Name,
Path: tool.Path,
@@ -115,39 +113,7 @@ func (s *System) CollectRuntimeHealth(exportDir string) (schema.RuntimeHealth, e
})
}
lsmodText := commandText("lsmod")
health.DriverReady = strings.Contains(lsmodText, "nvidia ")
if !health.DriverReady {
health.Issues = append(health.Issues, schema.RuntimeIssue{
Code: "nvidia_kernel_module_missing",
Severity: "warning",
Description: "NVIDIA kernel module is not loaded.",
})
}
if health.DriverReady && !strings.Contains(lsmodText, "nvidia_modeset") {
health.Issues = append(health.Issues, schema.RuntimeIssue{
Code: "nvidia_modeset_failed",
Severity: "warning",
Description: "nvidia-modeset is not loaded; display/CUDA stack may be partial.",
})
}
if out, err := exec.Command("nvidia-smi", "-L").CombinedOutput(); err == nil && strings.TrimSpace(string(out)) != "" {
health.DriverReady = true
}
health.CUDAReady = false
if lookErr := exec.Command("sh", "-c", "command -v bee-gpu-stress >/dev/null 2>&1").Run(); lookErr == nil {
out, err := exec.Command("bee-gpu-stress", "--seconds", "1", "--size-mb", "1").CombinedOutput()
if err == nil {
health.CUDAReady = true
} else if strings.Contains(strings.ToLower(string(out)), "cuda_error_system_not_ready") {
health.Issues = append(health.Issues, schema.RuntimeIssue{
Code: "cuda_runtime_not_ready",
Severity: "warning",
Description: "CUDA runtime is not ready for GPU SAT.",
})
}
}
s.collectGPURuntimeHealth(vendor, &health)
if health.Status != "FAILED" && len(health.Issues) > 0 {
health.Status = "PARTIAL"
@@ -162,3 +128,87 @@ func commandText(name string, args ...string) string {
}
return string(raw)
}
func (s *System) runtimeToolStatuses(vendor string) []ToolStatus {
tools := s.CheckTools(runtimeRequiredTools)
switch vendor {
case "nvidia":
tools = append(tools, s.CheckTools([]string{
"nvidia-smi",
"nvidia-bug-report.sh",
"bee-gpu-stress",
})...)
case "amd":
tool := ToolStatus{Name: "rocm-smi"}
if cmd, err := resolveROCmSMICommand(); err == nil && len(cmd) > 0 {
tool.Path = cmd[0]
if len(cmd) > 1 && strings.HasSuffix(cmd[1], "rocm_smi.py") {
tool.Path = cmd[1]
}
tool.OK = true
}
tools = append(tools, tool)
}
return tools
}
func (s *System) collectGPURuntimeHealth(vendor string, health *schema.RuntimeHealth) {
lsmodText := commandText("lsmod")
switch vendor {
case "nvidia":
health.DriverReady = strings.Contains(lsmodText, "nvidia ")
if !health.DriverReady {
health.Issues = append(health.Issues, schema.RuntimeIssue{
Code: "nvidia_kernel_module_missing",
Severity: "warning",
Description: "NVIDIA kernel module is not loaded.",
})
}
if health.DriverReady && !strings.Contains(lsmodText, "nvidia_modeset") {
health.Issues = append(health.Issues, schema.RuntimeIssue{
Code: "nvidia_modeset_failed",
Severity: "warning",
Description: "nvidia-modeset is not loaded; display/CUDA stack may be partial.",
})
}
if out, err := exec.Command("nvidia-smi", "-L").CombinedOutput(); err == nil && strings.TrimSpace(string(out)) != "" {
health.DriverReady = true
}
if lookErr := exec.Command("sh", "-c", "command -v bee-gpu-stress >/dev/null 2>&1").Run(); lookErr == nil {
out, err := exec.Command("bee-gpu-stress", "--seconds", "1", "--size-mb", "1").CombinedOutput()
if err == nil {
health.CUDAReady = true
} else if strings.Contains(strings.ToLower(string(out)), "cuda_error_system_not_ready") {
health.Issues = append(health.Issues, schema.RuntimeIssue{
Code: "cuda_runtime_not_ready",
Severity: "warning",
Description: "CUDA runtime is not ready for GPU SAT.",
})
}
}
case "amd":
health.DriverReady = strings.Contains(lsmodText, "amdgpu ") || strings.Contains(lsmodText, "amdkfd")
if !health.DriverReady {
health.Issues = append(health.Issues, schema.RuntimeIssue{
Code: "amdgpu_kernel_module_missing",
Severity: "warning",
Description: "AMD GPU driver is not loaded.",
})
}
out, err := runROCmSMI("--showproductname", "--csv")
if err == nil && strings.TrimSpace(string(out)) != "" {
health.CUDAReady = true
health.DriverReady = true
return
}
health.Issues = append(health.Issues, schema.RuntimeIssue{
Code: "rocm_smi_unavailable",
Severity: "warning",
Description: "ROCm SMI is not available for AMD GPU SAT.",
})
}
}

View File

@@ -4,6 +4,7 @@ import (
"archive/tar"
"compress/gzip"
"context"
"errors"
"fmt"
"io"
"os"
@@ -15,6 +16,22 @@ import (
"time"
)
var (
satExecCommand = exec.Command
satLookPath = exec.LookPath
satGlob = filepath.Glob
satStat = os.Stat
rocmSMIExecutableGlobs = []string{
"/opt/rocm/bin/rocm-smi",
"/opt/rocm-*/bin/rocm-smi",
}
rocmSMIScriptGlobs = []string{
"/opt/rocm/libexec/rocm_smi/rocm_smi.py",
"/opt/rocm-*/libexec/rocm_smi/rocm_smi.py",
}
)
// NvidiaGPU holds basic GPU info from nvidia-smi.
type NvidiaGPU struct {
Index int
@@ -41,7 +58,7 @@ func (s *System) DetectGPUVendor() string {
// ListAMDGPUs returns AMD GPUs visible to rocm-smi.
func (s *System) ListAMDGPUs() ([]AMDGPUInfo, error) {
out, err := exec.Command("rocm-smi", "--showproductname", "--csv").Output()
out, err := runROCmSMI("--showproductname", "--csv")
if err != nil {
return nil, fmt.Errorf("rocm-smi: %w", err)
}
@@ -104,14 +121,34 @@ func (s *System) ListNvidiaGPUs() ([]NvidiaGPU, error) {
return gpus, nil
}
// RunNCCLTests runs nccl-tests all_reduce_perf across all NVIDIA GPUs.
// Measures collective communication bandwidth over NVLink/PCIe.
func (s *System) RunNCCLTests(ctx context.Context, baseDir string) (string, error) {
// detect GPU count
out, _ := exec.Command("nvidia-smi", "--query-gpu=index", "--format=csv,noheader").Output()
gpuCount := len(strings.Split(strings.TrimSpace(string(out)), "\n"))
if gpuCount < 1 {
gpuCount = 1
}
return runAcceptancePackCtx(ctx, baseDir, "nccl-tests", []satJob{
{name: "01-nvidia-smi-q.log", cmd: []string{"nvidia-smi", "-q"}},
{name: "02-all-reduce-perf.log", cmd: []string{
"all_reduce_perf", "-b", "512M", "-e", "4G", "-f", "2",
"-g", strconv.Itoa(gpuCount), "--iters", "20",
}},
})
}
func (s *System) RunNvidiaAcceptancePack(baseDir string) (string, error) {
return runAcceptancePack(baseDir, "gpu-nvidia", nvidiaSATJobs())
}
// RunNvidiaAcceptancePackWithOptions runs the NVIDIA SAT with explicit duration,
// GPU memory size, and GPU index selection. ctx cancellation kills the running job.
func (s *System) RunNvidiaAcceptancePackWithOptions(ctx context.Context, baseDir string, durationSec int, sizeMB int, gpuIndices []int) (string, error) {
return runAcceptancePackCtx(ctx, baseDir, "gpu-nvidia", nvidiaSATJobsWithOptions(durationSec, sizeMB, gpuIndices))
// RunNvidiaAcceptancePackWithOptions runs the NVIDIA diagnostics via DCGM.
// diagLevel: 1=quick, 2=medium, 3=targeted stress, 4=extended stress.
// gpuIndices: specific GPU indices to test (empty = all GPUs).
// ctx cancellation kills the running job.
func (s *System) RunNvidiaAcceptancePackWithOptions(ctx context.Context, baseDir string, diagLevel int, gpuIndices []int) (string, error) {
return runAcceptancePackCtx(ctx, baseDir, "gpu-nvidia", nvidiaDCGMJobs(diagLevel, gpuIndices))
}
func (s *System) RunMemoryAcceptancePack(baseDir string) (string, error) {
@@ -258,27 +295,23 @@ func runAcceptancePack(baseDir, prefix string, jobs []satJob) (string, error) {
return archive, nil
}
func nvidiaSATJobsWithOptions(durationSec, sizeMB int, gpuIndices []int) []satJob {
var env []string
func nvidiaDCGMJobs(diagLevel int, gpuIndices []int) []satJob {
if diagLevel < 1 || diagLevel > 4 {
diagLevel = 3
}
diagArgs := []string{"dcgmi", "diag", "-r", strconv.Itoa(diagLevel)}
if len(gpuIndices) > 0 {
ids := make([]string, len(gpuIndices))
for i, idx := range gpuIndices {
ids[i] = strconv.Itoa(idx)
}
env = []string{"CUDA_VISIBLE_DEVICES=" + strings.Join(ids, ",")}
diagArgs = append(diagArgs, "-i", strings.Join(ids, ","))
}
return []satJob{
{name: "01-nvidia-smi-q.log", cmd: []string{"nvidia-smi", "-q"}},
{name: "02-dmidecode-baseboard.log", cmd: []string{"dmidecode", "-t", "baseboard"}},
{name: "03-dmidecode-system.log", cmd: []string{"dmidecode", "-t", "system"}},
{name: "04-nvidia-bug-report.log", cmd: []string{"nvidia-bug-report.sh", "--output-file", "{{run_dir}}/nvidia-bug-report.log"}},
{
name: "05-bee-gpu-stress.log",
cmd: []string{"bee-gpu-stress", "--seconds", strconv.Itoa(durationSec), "--size-mb", strconv.Itoa(sizeMB)},
env: env,
collectGPU: true,
gpuIndices: gpuIndices,
},
{name: "04-dcgmi-diag.log", cmd: diagArgs},
}
}
@@ -337,12 +370,22 @@ func runAcceptancePackCtx(ctx context.Context, baseDir, prefix string, jobs []sa
func runSATCommandCtx(ctx context.Context, verboseLog, name string, cmd []string, env []string) ([]byte, error) {
start := time.Now().UTC()
resolvedCmd, err := resolveSATCommand(cmd)
appendSATVerboseLog(verboseLog,
fmt.Sprintf("[%s] start %s", start.Format(time.RFC3339), name),
"cmd: "+strings.Join(cmd, " "),
"cmd: "+strings.Join(resolvedCmd, " "),
)
if err != nil {
appendSATVerboseLog(verboseLog,
fmt.Sprintf("[%s] finish %s", time.Now().UTC().Format(time.RFC3339), name),
"rc: 1",
fmt.Sprintf("duration_ms: %d", time.Since(start).Milliseconds()),
"",
)
return []byte(err.Error() + "\n"), err
}
c := exec.CommandContext(ctx, cmd[0], cmd[1:]...)
c := exec.CommandContext(ctx, resolvedCmd[0], resolvedCmd[1:]...)
if len(env) > 0 {
c.Env = append(os.Environ(), env...)
}
@@ -362,19 +405,11 @@ func runSATCommandCtx(ctx context.Context, verboseLog, name string, cmd []string
}
func listStorageDevices() ([]string, error) {
out, err := exec.Command("lsblk", "-dn", "-o", "NAME,TYPE").Output()
out, err := satExecCommand("lsblk", "-dn", "-o", "NAME,TYPE,TRAN").Output()
if err != nil {
return nil, err
}
var devices []string
for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
fields := strings.Fields(strings.TrimSpace(line))
if len(fields) != 2 || fields[1] != "disk" {
continue
}
devices = append(devices, "/dev/"+fields[0])
}
return devices, nil
return parseStorageDevices(string(out)), nil
}
func storageSATCommands(devPath string) []satJob {
@@ -445,12 +480,22 @@ func classifySATResult(name string, out []byte, err error) (string, int) {
func runSATCommand(verboseLog, name string, cmd []string) ([]byte, error) {
start := time.Now().UTC()
resolvedCmd, err := resolveSATCommand(cmd)
appendSATVerboseLog(verboseLog,
fmt.Sprintf("[%s] start %s", start.Format(time.RFC3339), name),
"cmd: "+strings.Join(cmd, " "),
"cmd: "+strings.Join(resolvedCmd, " "),
)
if err != nil {
appendSATVerboseLog(verboseLog,
fmt.Sprintf("[%s] finish %s", time.Now().UTC().Format(time.RFC3339), name),
"rc: 1",
fmt.Sprintf("duration_ms: %d", time.Since(start).Milliseconds()),
"",
)
return []byte(err.Error() + "\n"), err
}
out, err := exec.Command(cmd[0], cmd[1:]...).CombinedOutput()
out, err := satExecCommand(resolvedCmd[0], resolvedCmd[1:]...).CombinedOutput()
rc := 0
if err != nil {
@@ -465,6 +510,91 @@ func runSATCommand(verboseLog, name string, cmd []string) ([]byte, error) {
return out, err
}
func runROCmSMI(args ...string) ([]byte, error) {
cmd, err := resolveROCmSMICommand(args...)
if err != nil {
return nil, err
}
return satExecCommand(cmd[0], cmd[1:]...).CombinedOutput()
}
func resolveSATCommand(cmd []string) ([]string, error) {
if len(cmd) == 0 {
return nil, errors.New("empty SAT command")
}
if cmd[0] != "rocm-smi" {
return cmd, nil
}
return resolveROCmSMICommand(cmd[1:]...)
}
func resolveROCmSMICommand(args ...string) ([]string, error) {
if path, err := satLookPath("rocm-smi"); err == nil {
return append([]string{path}, args...), nil
}
for _, path := range rocmSMIExecutableCandidates() {
return append([]string{path}, args...), nil
}
pythonPath, pyErr := satLookPath("python3")
if pyErr == nil {
for _, script := range rocmSMIScriptCandidates() {
cmd := []string{pythonPath, script}
cmd = append(cmd, args...)
return cmd, nil
}
}
return nil, errors.New("rocm-smi not found in PATH or under /opt/rocm")
}
func rocmSMIExecutableCandidates() []string {
return expandExistingPaths(rocmSMIExecutableGlobs)
}
func rocmSMIScriptCandidates() []string {
return expandExistingPaths(rocmSMIScriptGlobs)
}
func expandExistingPaths(patterns []string) []string {
seen := make(map[string]struct{})
var paths []string
for _, pattern := range patterns {
matches, err := satGlob(pattern)
if err != nil {
continue
}
sort.Strings(matches)
for _, match := range matches {
if _, err := satStat(match); err != nil {
continue
}
if _, ok := seen[match]; ok {
continue
}
seen[match] = struct{}{}
paths = append(paths, match)
}
}
return paths
}
func parseStorageDevices(raw string) []string {
var devices []string
for _, line := range strings.Split(strings.TrimSpace(raw), "\n") {
fields := strings.Fields(strings.TrimSpace(line))
if len(fields) < 2 || fields[1] != "disk" {
continue
}
if len(fields) >= 3 && strings.EqualFold(fields[2], "usb") {
continue
}
devices = append(devices, "/dev/"+fields[0])
}
return devices
}
// runSATCommandWithMetrics runs a command while collecting GPU metrics in the background.
// On completion it writes gpu-metrics.csv and gpu-metrics.html into runDir.
func runSATCommandWithMetrics(ctx context.Context, verboseLog, name string, cmd []string, env []string, gpuIndices []int, runDir string) ([]byte, error) {

View File

@@ -0,0 +1,587 @@
package platform
import (
"context"
"fmt"
"os"
"os/exec"
"path/filepath"
"strconv"
"strings"
"sync"
"time"
)
// FanStressOptions configures the fan-stress / thermal cycling test.
type FanStressOptions struct {
BaselineSec int // idle monitoring before and after load (default 30)
Phase1DurSec int // first load phase duration in seconds (default 300)
PauseSec int // pause between the two load phases (default 60)
Phase2DurSec int // second load phase duration in seconds (default 300)
SizeMB int // GPU memory to allocate per GPU during stress (default 64)
GPUIndices []int // which GPU indices to stress (empty = all detected)
}
// FanReading holds one fan sensor reading.
type FanReading struct {
Name string
RPM float64
}
// GPUStressMetric holds per-GPU metrics during the stress test.
type GPUStressMetric struct {
Index int
TempC float64
UsagePct float64
PowerW float64
ClockMHz float64
Throttled bool // true if any throttle reason is active
}
// FanStressRow is one second-interval telemetry sample covering all monitored dimensions.
type FanStressRow struct {
TimestampUTC string
ElapsedSec float64
Phase string // "baseline", "load1", "pause", "load2", "cooldown"
GPUs []GPUStressMetric
Fans []FanReading
CPUMaxTempC float64 // highest CPU temperature from ipmitool / sensors
SysPowerW float64 // DCMI system power reading
}
// RunFanStressTest runs a two-phase GPU stress test while monitoring fan speeds,
// temperatures, and power draw every second. Exports metrics.csv and fan-sensors.csv.
// Designed to reproduce case-04 fan-speed lag and detect GPU thermal throttling.
func (s *System) RunFanStressTest(ctx context.Context, baseDir string, opts FanStressOptions) (string, error) {
if baseDir == "" {
baseDir = "/var/log/bee-sat"
}
applyFanStressDefaults(&opts)
ts := time.Now().UTC().Format("20060102-150405")
runDir := filepath.Join(baseDir, "fan-stress-"+ts)
if err := os.MkdirAll(runDir, 0755); err != nil {
return "", err
}
verboseLog := filepath.Join(runDir, "verbose.log")
// Phase name shared between sampler goroutine and main goroutine.
var phaseMu sync.Mutex
currentPhase := "init"
setPhase := func(name string) {
phaseMu.Lock()
currentPhase = name
phaseMu.Unlock()
}
getPhase := func() string {
phaseMu.Lock()
defer phaseMu.Unlock()
return currentPhase
}
start := time.Now()
var rowsMu sync.Mutex
var allRows []FanStressRow
// Start background sampler (every second).
stopCh := make(chan struct{})
doneCh := make(chan struct{})
go func() {
defer close(doneCh)
ticker := time.NewTicker(time.Second)
defer ticker.Stop()
for {
select {
case <-stopCh:
return
case <-ticker.C:
row := sampleFanStressRow(opts.GPUIndices, getPhase(), time.Since(start).Seconds())
rowsMu.Lock()
allRows = append(allRows, row)
rowsMu.Unlock()
}
}
}()
var summary strings.Builder
fmt.Fprintf(&summary, "run_at_utc=%s\n", time.Now().UTC().Format(time.RFC3339))
stats := satStats{}
// idlePhase sleeps for durSec while the sampler stamps phaseName on each row.
idlePhase := func(phaseName, stepName string, durSec int) {
if ctx.Err() != nil {
return
}
setPhase(phaseName)
appendSATVerboseLog(verboseLog,
fmt.Sprintf("[%s] start %s (idle %ds)", time.Now().UTC().Format(time.RFC3339), stepName, durSec),
)
select {
case <-ctx.Done():
case <-time.After(time.Duration(durSec) * time.Second):
}
appendSATVerboseLog(verboseLog,
fmt.Sprintf("[%s] finish %s", time.Now().UTC().Format(time.RFC3339), stepName),
)
fmt.Fprintf(&summary, "%s_status=OK\n", stepName)
stats.OK++
}
// loadPhase runs bee-gpu-stress for durSec; sampler stamps phaseName on each row.
loadPhase := func(phaseName, stepName string, durSec int) {
if ctx.Err() != nil {
return
}
setPhase(phaseName)
var env []string
if len(opts.GPUIndices) > 0 {
ids := make([]string, len(opts.GPUIndices))
for i, idx := range opts.GPUIndices {
ids[i] = strconv.Itoa(idx)
}
env = []string{"CUDA_VISIBLE_DEVICES=" + strings.Join(ids, ",")}
}
cmd := []string{
"bee-gpu-stress",
"--seconds", strconv.Itoa(durSec),
"--size-mb", strconv.Itoa(opts.SizeMB),
}
out, err := runSATCommandCtx(ctx, verboseLog, stepName, cmd, env)
_ = os.WriteFile(filepath.Join(runDir, stepName+".log"), out, 0644)
if err != nil && err != context.Canceled && err.Error() != "signal: killed" {
fmt.Fprintf(&summary, "%s_status=FAILED\n", stepName)
stats.Failed++
} else {
fmt.Fprintf(&summary, "%s_status=OK\n", stepName)
stats.OK++
}
}
// Execute test phases.
idlePhase("baseline", "01-baseline", opts.BaselineSec)
loadPhase("load1", "02-load1", opts.Phase1DurSec)
idlePhase("pause", "03-pause", opts.PauseSec)
loadPhase("load2", "04-load2", opts.Phase2DurSec)
idlePhase("cooldown", "05-cooldown", opts.BaselineSec)
// Stop sampler and collect rows.
close(stopCh)
<-doneCh
rowsMu.Lock()
rows := allRows
rowsMu.Unlock()
// Analysis.
throttled := analyzeThrottling(rows)
maxGPUTemp := analyzeMaxTemp(rows, func(r FanStressRow) float64 {
var m float64
for _, g := range r.GPUs {
if g.TempC > m {
m = g.TempC
}
}
return m
})
maxCPUTemp := analyzeMaxTemp(rows, func(r FanStressRow) float64 {
return r.CPUMaxTempC
})
fanResponseSec := analyzeFanResponse(rows)
fmt.Fprintf(&summary, "throttling_detected=%v\n", throttled)
fmt.Fprintf(&summary, "max_gpu_temp_c=%.1f\n", maxGPUTemp)
fmt.Fprintf(&summary, "max_cpu_temp_c=%.1f\n", maxCPUTemp)
if fanResponseSec >= 0 {
fmt.Fprintf(&summary, "fan_response_sec=%.1f\n", fanResponseSec)
} else {
fmt.Fprintf(&summary, "fan_response_sec=N/A\n")
}
// Throttling failure counts against overall result.
if throttled {
stats.Failed++
}
writeSATStats(&summary, stats)
// Write CSV outputs.
if err := WriteFanStressCSV(filepath.Join(runDir, "metrics.csv"), rows, opts.GPUIndices); err != nil {
return "", err
}
_ = WriteFanSensorsCSV(filepath.Join(runDir, "fan-sensors.csv"), rows)
if err := os.WriteFile(filepath.Join(runDir, "summary.txt"), []byte(summary.String()), 0644); err != nil {
return "", err
}
archive := filepath.Join(baseDir, "fan-stress-"+ts+".tar.gz")
if err := createTarGz(archive, runDir); err != nil {
return "", err
}
return archive, nil
}
func applyFanStressDefaults(opts *FanStressOptions) {
if opts.BaselineSec <= 0 {
opts.BaselineSec = 30
}
if opts.Phase1DurSec <= 0 {
opts.Phase1DurSec = 300
}
if opts.PauseSec <= 0 {
opts.PauseSec = 60
}
if opts.Phase2DurSec <= 0 {
opts.Phase2DurSec = 300
}
if opts.SizeMB <= 0 {
opts.SizeMB = 64
}
}
// sampleFanStressRow collects all metrics for one telemetry sample.
func sampleFanStressRow(gpuIndices []int, phase string, elapsed float64) FanStressRow {
row := FanStressRow{
TimestampUTC: time.Now().UTC().Format(time.RFC3339),
ElapsedSec: elapsed,
Phase: phase,
}
row.GPUs = sampleGPUStressMetrics(gpuIndices)
row.Fans, _ = sampleFanSpeeds()
row.CPUMaxTempC = sampleCPUMaxTemp()
row.SysPowerW = sampleSystemPower()
return row
}
// sampleGPUStressMetrics queries nvidia-smi for temperature, utilization, power,
// clock frequency, and active throttle reasons for each GPU.
func sampleGPUStressMetrics(gpuIndices []int) []GPUStressMetric {
args := []string{
"--query-gpu=index,temperature.gpu,utilization.gpu,power.draw,clocks.current.graphics,clocks_throttle_reasons.active",
"--format=csv,noheader,nounits",
}
if len(gpuIndices) > 0 {
ids := make([]string, len(gpuIndices))
for i, idx := range gpuIndices {
ids[i] = strconv.Itoa(idx)
}
args = append([]string{"--id=" + strings.Join(ids, ",")}, args...)
}
out, err := exec.Command("nvidia-smi", args...).Output()
if err != nil {
return nil
}
var metrics []GPUStressMetric
for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
line = strings.TrimSpace(line)
if line == "" {
continue
}
parts := strings.Split(line, ", ")
if len(parts) < 6 {
continue
}
idx, _ := strconv.Atoi(strings.TrimSpace(parts[0]))
throttleVal := strings.TrimSpace(parts[5])
// Throttled if active reasons bitmask is non-zero.
throttled := throttleVal != "0x0000000000000000" &&
throttleVal != "0x0" &&
throttleVal != "0" &&
throttleVal != "" &&
throttleVal != "N/A"
metrics = append(metrics, GPUStressMetric{
Index: idx,
TempC: parseGPUFloat(parts[1]),
UsagePct: parseGPUFloat(parts[2]),
PowerW: parseGPUFloat(parts[3]),
ClockMHz: parseGPUFloat(parts[4]),
Throttled: throttled,
})
}
return metrics
}
// sampleFanSpeeds reads fan RPM values from ipmitool sdr.
func sampleFanSpeeds() ([]FanReading, error) {
out, err := exec.Command("ipmitool", "sdr", "type", "Fan").Output()
if err != nil {
return nil, err
}
return parseFanSpeeds(string(out)), nil
}
// parseFanSpeeds parses "ipmitool sdr type Fan" output.
// Line format: "FAN1 | 2400.000 | RPM | ok"
func parseFanSpeeds(raw string) []FanReading {
var fans []FanReading
for _, line := range strings.Split(strings.TrimSpace(raw), "\n") {
parts := strings.Split(line, "|")
if len(parts) < 3 {
continue
}
unit := strings.TrimSpace(parts[2])
if !strings.EqualFold(unit, "RPM") {
continue
}
valStr := strings.TrimSpace(parts[1])
if strings.EqualFold(valStr, "na") || strings.EqualFold(valStr, "disabled") || valStr == "" {
continue
}
val, err := strconv.ParseFloat(valStr, 64)
if err != nil {
continue
}
fans = append(fans, FanReading{
Name: strings.TrimSpace(parts[0]),
RPM: val,
})
}
return fans
}
// sampleCPUMaxTemp returns the highest CPU/inlet temperature from ipmitool or sensors.
func sampleCPUMaxTemp() float64 {
out, err := exec.Command("ipmitool", "sdr", "type", "Temperature").Output()
if err != nil {
return sampleCPUTempViaSensors()
}
return parseIPMIMaxTemp(string(out))
}
// parseIPMIMaxTemp extracts the maximum temperature from "ipmitool sdr type Temperature".
func parseIPMIMaxTemp(raw string) float64 {
var max float64
for _, line := range strings.Split(strings.TrimSpace(raw), "\n") {
parts := strings.Split(line, "|")
if len(parts) < 3 {
continue
}
unit := strings.TrimSpace(parts[2])
if !strings.Contains(strings.ToLower(unit), "degrees") {
continue
}
valStr := strings.TrimSpace(parts[1])
if strings.EqualFold(valStr, "na") || valStr == "" {
continue
}
val, err := strconv.ParseFloat(valStr, 64)
if err != nil {
continue
}
if val > max {
max = val
}
}
return max
}
// sampleCPUTempViaSensors falls back to lm-sensors when ipmitool is unavailable.
func sampleCPUTempViaSensors() float64 {
out, err := exec.Command("sensors", "-u").Output()
if err != nil {
return 0
}
var max float64
for _, line := range strings.Split(string(out), "\n") {
line = strings.TrimSpace(line)
fields := strings.Fields(line)
if len(fields) < 2 {
continue
}
if !strings.HasSuffix(fields[0], "_input:") {
continue
}
val, err := strconv.ParseFloat(fields[1], 64)
if err != nil {
continue
}
if val > 0 && val < 150 && val > max {
max = val
}
}
return max
}
// sampleSystemPower reads system power draw via DCMI.
func sampleSystemPower() float64 {
out, err := exec.Command("ipmitool", "dcmi", "power", "reading").Output()
if err != nil {
return 0
}
return parseDCMIPowerReading(string(out))
}
// parseDCMIPowerReading extracts the instantaneous power reading from ipmitool dcmi output.
// Sample: " Instantaneous power reading: 500 Watts"
func parseDCMIPowerReading(raw string) float64 {
for _, line := range strings.Split(raw, "\n") {
if !strings.Contains(strings.ToLower(line), "instantaneous") {
continue
}
parts := strings.Fields(line)
for i, p := range parts {
if strings.EqualFold(p, "Watts") && i > 0 {
val, err := strconv.ParseFloat(parts[i-1], 64)
if err == nil {
return val
}
}
}
}
return 0
}
// analyzeThrottling returns true if any GPU reported an active throttle reason
// during either load phase.
func analyzeThrottling(rows []FanStressRow) bool {
for _, row := range rows {
if row.Phase != "load1" && row.Phase != "load2" {
continue
}
for _, gpu := range row.GPUs {
if gpu.Throttled {
return true
}
}
}
return false
}
// analyzeMaxTemp returns the maximum value of the given extractor across all rows.
func analyzeMaxTemp(rows []FanStressRow, extract func(FanStressRow) float64) float64 {
var max float64
for _, row := range rows {
if v := extract(row); v > max {
max = v
}
}
return max
}
// analyzeFanResponse returns the seconds from load1 start until fan RPM first
// increased by more than 5% above the baseline average. Returns -1 if undetermined.
func analyzeFanResponse(rows []FanStressRow) float64 {
// Compute baseline average fan RPM.
var baseTotal, baseCount float64
for _, row := range rows {
if row.Phase != "baseline" {
continue
}
for _, f := range row.Fans {
baseTotal += f.RPM
baseCount++
}
}
if baseCount == 0 || baseTotal == 0 {
return -1
}
baseAvg := baseTotal / baseCount
threshold := baseAvg * 1.05 // 5% increase signals fan ramp-up
// Find elapsed time when load1 started.
var load1Start float64 = -1
for _, row := range rows {
if row.Phase == "load1" {
load1Start = row.ElapsedSec
break
}
}
if load1Start < 0 {
return -1
}
// Find first load1 row where average RPM crosses the threshold.
for _, row := range rows {
if row.Phase != "load1" {
continue
}
var total, count float64
for _, f := range row.Fans {
total += f.RPM
count++
}
if count > 0 && total/count >= threshold {
return row.ElapsedSec - load1Start
}
}
return -1
}
// WriteFanStressCSV writes the wide-format metrics CSV with one row per second.
// GPU columns are generated per index in gpuIndices order.
func WriteFanStressCSV(path string, rows []FanStressRow, gpuIndices []int) error {
if len(rows) == 0 {
return os.WriteFile(path, []byte("no data\n"), 0644)
}
var b strings.Builder
// Header: fixed system columns + per-GPU columns.
b.WriteString("timestamp_utc,elapsed_sec,phase,fan_avg_rpm,fan_min_rpm,fan_max_rpm,cpu_max_temp_c,sys_power_w")
for _, idx := range gpuIndices {
fmt.Fprintf(&b, ",gpu%d_temp_c,gpu%d_usage_pct,gpu%d_power_w,gpu%d_clock_mhz,gpu%d_throttled",
idx, idx, idx, idx, idx)
}
b.WriteRune('\n')
for _, row := range rows {
favg, fmin, fmax := fanRPMStats(row.Fans)
fmt.Fprintf(&b, "%s,%.1f,%s,%.0f,%.0f,%.0f,%.1f,%.1f",
row.TimestampUTC,
row.ElapsedSec,
row.Phase,
favg, fmin, fmax,
row.CPUMaxTempC,
row.SysPowerW,
)
gpuByIdx := make(map[int]GPUStressMetric, len(row.GPUs))
for _, g := range row.GPUs {
gpuByIdx[g.Index] = g
}
for _, idx := range gpuIndices {
g := gpuByIdx[idx]
throttled := 0
if g.Throttled {
throttled = 1
}
fmt.Fprintf(&b, ",%.1f,%.1f,%.1f,%.0f,%d",
g.TempC, g.UsagePct, g.PowerW, g.ClockMHz, throttled)
}
b.WriteRune('\n')
}
return os.WriteFile(path, []byte(b.String()), 0644)
}
// WriteFanSensorsCSV writes individual fan sensor readings in long (tidy) format.
func WriteFanSensorsCSV(path string, rows []FanStressRow) error {
var b strings.Builder
b.WriteString("timestamp_utc,elapsed_sec,phase,fan_name,rpm\n")
for _, row := range rows {
for _, f := range row.Fans {
fmt.Fprintf(&b, "%s,%.1f,%s,%s,%.0f\n",
row.TimestampUTC, row.ElapsedSec, row.Phase, f.Name, f.RPM)
}
}
return os.WriteFile(path, []byte(b.String()), 0644)
}
// fanRPMStats computes average, min, max RPM across all fans in a sample row.
func fanRPMStats(fans []FanReading) (avg, min, max float64) {
if len(fans) == 0 {
return 0, 0, 0
}
min = fans[0].RPM
max = fans[0].RPM
var total float64
for _, f := range fans {
total += f.RPM
if f.RPM < min {
min = f.RPM
}
if f.RPM > max {
max = f.RPM
}
}
return total / float64(len(fans)), min, max
}

View File

@@ -3,6 +3,8 @@ package platform
import (
"errors"
"os"
"os/exec"
"path/filepath"
"testing"
)
@@ -91,3 +93,90 @@ func TestClassifySATResult(t *testing.T) {
})
}
}
func TestParseStorageDevicesSkipsUSBDisks(t *testing.T) {
t.Parallel()
raw := "nvme0n1 disk nvme\nsda disk usb\nloop0 loop\nsdb disk sata\n"
got := parseStorageDevices(raw)
want := []string{"/dev/nvme0n1", "/dev/sdb"}
if len(got) != len(want) {
t.Fatalf("len(devices)=%d want %d (%v)", len(got), len(want), got)
}
for i := range want {
if got[i] != want[i] {
t.Fatalf("devices[%d]=%q want %q", i, got[i], want[i])
}
}
}
func TestResolveROCmSMICommandFromPATH(t *testing.T) {
t.Setenv("PATH", t.TempDir())
toolPath := filepath.Join(os.Getenv("PATH"), "rocm-smi")
if err := os.WriteFile(toolPath, []byte("#!/bin/sh\nexit 0\n"), 0755); err != nil {
t.Fatalf("write rocm-smi: %v", err)
}
cmd, err := resolveROCmSMICommand("--showproductname")
if err != nil {
t.Fatalf("resolveROCmSMICommand error: %v", err)
}
if len(cmd) != 2 {
t.Fatalf("cmd len=%d want 2 (%v)", len(cmd), cmd)
}
if cmd[0] != toolPath {
t.Fatalf("cmd[0]=%q want %q", cmd[0], toolPath)
}
}
func TestResolveROCmSMICommandFallsBackToROCmTree(t *testing.T) {
tmp := t.TempDir()
execPath := filepath.Join(tmp, "opt", "rocm", "bin", "rocm-smi")
if err := os.MkdirAll(filepath.Dir(execPath), 0755); err != nil {
t.Fatalf("mkdir: %v", err)
}
if err := os.WriteFile(execPath, []byte("#!/bin/sh\nexit 0\n"), 0755); err != nil {
t.Fatalf("write rocm-smi: %v", err)
}
oldGlob := rocmSMIExecutableGlobs
oldScriptGlobs := rocmSMIScriptGlobs
rocmSMIExecutableGlobs = []string{execPath}
rocmSMIScriptGlobs = nil
t.Cleanup(func() {
rocmSMIExecutableGlobs = oldGlob
rocmSMIScriptGlobs = oldScriptGlobs
})
t.Setenv("PATH", "")
cmd, err := resolveROCmSMICommand("--showallinfo")
if err != nil {
t.Fatalf("resolveROCmSMICommand error: %v", err)
}
if len(cmd) != 2 {
t.Fatalf("cmd len=%d want 2 (%v)", len(cmd), cmd)
}
if cmd[0] != execPath {
t.Fatalf("cmd[0]=%q want %q", cmd[0], execPath)
}
}
func TestRunROCmSMIReportsMissingCommand(t *testing.T) {
oldLookPath := satLookPath
oldExecGlobs := rocmSMIExecutableGlobs
oldScriptGlobs := rocmSMIScriptGlobs
satLookPath = func(string) (string, error) { return "", exec.ErrNotFound }
rocmSMIExecutableGlobs = nil
rocmSMIScriptGlobs = nil
t.Cleanup(func() {
satLookPath = oldLookPath
rocmSMIExecutableGlobs = oldExecGlobs
rocmSMIScriptGlobs = oldScriptGlobs
})
if _, err := runROCmSMI("--showproductname"); err == nil {
t.Fatal("expected missing rocm-smi error")
}
}

View File

@@ -24,15 +24,23 @@ var techDumpFixedCommands = []struct {
{Name: "sensors", Args: []string{"-j"}, File: "sensors.json"},
{Name: "ipmitool", Args: []string{"fru", "print"}, File: "ipmitool-fru.txt"},
{Name: "ipmitool", Args: []string{"sdr"}, File: "ipmitool-sdr.txt"},
{Name: "nvme", Args: []string{"list", "-o", "json"}, File: "nvme-list.json"},
}
var techDumpNvidiaCommands = []struct {
Name string
Args []string
File string
}{
{Name: "nvidia-smi", Args: []string{"-q"}, File: "nvidia-smi-q.txt"},
{Name: "nvidia-smi", Args: []string{"--query-gpu=index,pci.bus_id,serial,vbios_version,temperature.gpu,power.draw,ecc.errors.uncorrected.aggregate.total,ecc.errors.corrected.aggregate.total,clocks_throttle_reasons.hw_slowdown", "--format=csv,noheader,nounits"}, File: "nvidia-smi-query.csv"},
{Name: "nvme", Args: []string{"list", "-o", "json"}, File: "nvme-list.json"},
}
type lsblkDumpRoot struct {
Blockdevices []struct {
Name string `json:"name"`
Type string `json:"type"`
Tran string `json:"tran"`
} `json:"blockdevices"`
}
@@ -50,6 +58,15 @@ func (s *System) CaptureTechnicalDump(baseDir string) error {
for _, cmd := range techDumpFixedCommands {
writeCommandDump(filepath.Join(baseDir, cmd.File), cmd.Name, cmd.Args...)
}
switch s.DetectGPUVendor() {
case "nvidia":
for _, cmd := range techDumpNvidiaCommands {
writeCommandDump(filepath.Join(baseDir, cmd.File), cmd.Name, cmd.Args...)
}
case "amd":
writeROCmSMIDump(filepath.Join(baseDir, "rocm-smi.txt"))
writeROCmSMIDump(filepath.Join(baseDir, "rocm-smi-showallinfo.txt"), "--showallinfo")
}
for _, dev := range lsblkDumpDevices(filepath.Join(baseDir, "lsblk.json")) {
writeCommandDump(filepath.Join(baseDir, "smartctl-"+sanitizeDumpName(dev)+".json"), "smartctl", "-j", "-a", "/dev/"+dev)
@@ -69,6 +86,14 @@ func writeCommandDump(path, name string, args ...string) {
_ = os.WriteFile(path, out, 0644)
}
func writeROCmSMIDump(path string, args ...string) {
out, err := runROCmSMI(args...)
if err != nil && len(out) == 0 {
return
}
_ = os.WriteFile(path, out, 0644)
}
func lsblkDumpDevices(path string) []string {
raw, err := os.ReadFile(path)
if err != nil {
@@ -80,6 +105,9 @@ func lsblkDumpDevices(path string) []string {
}
var devices []string
for _, dev := range root.Blockdevices {
if strings.EqualFold(strings.TrimSpace(dev.Tran), "usb") {
continue
}
if dev.Type == "disk" && strings.TrimSpace(dev.Name) != "" {
devices = append(devices, strings.TrimSpace(dev.Name))
}

View File

@@ -12,12 +12,12 @@ func TestLSBLKDumpDevices(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "lsblk.json")
if err := os.WriteFile(path, []byte(`{"blockdevices":[{"name":"sda","type":"disk"},{"name":"sda1","type":"part"},{"name":"nvme0n1","type":"disk"}]}`), 0644); err != nil {
if err := os.WriteFile(path, []byte(`{"blockdevices":[{"name":"sda","type":"disk","tran":"usb"},{"name":"sda1","type":"part"},{"name":"nvme0n1","type":"disk","tran":"nvme"},{"name":"sdb","type":"disk","tran":"sata"}]}`), 0644); err != nil {
t.Fatalf("write lsblk fixture: %v", err)
}
got := lsblkDumpDevices(path)
want := []string{"nvme0n1", "sda"}
want := []string{"nvme0n1", "sdb"}
if !reflect.DeepEqual(got, want) {
t.Fatalf("lsblkDumpDevices=%v want %v", got, want)
}

View File

@@ -1,156 +0,0 @@
package tui
import (
"time"
tea "github.com/charmbracelet/bubbletea"
)
func (m model) updateStaticForm(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
switch msg.String() {
case "esc":
m.screen = screenNetwork
m.formFields = nil
m.formIndex = 0
return m, nil
case "up", "shift+tab":
if m.formIndex > 0 {
m.formIndex--
}
case "down", "tab":
if m.formIndex < len(m.formFields)-1 {
m.formIndex++
}
case "enter":
if m.formIndex < len(m.formFields)-1 {
m.formIndex++
return m, nil
}
cfg := m.app.ParseStaticIPv4Config(m.selectedIface, []string{
m.formFields[0].Value,
m.formFields[1].Value,
m.formFields[2].Value,
m.formFields[3].Value,
})
m.busy = true
m.busyTitle = "Static IPv4: " + m.selectedIface
return m, func() tea.Msg {
result, err := m.app.SetStaticIPv4Result(cfg)
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenNetwork}
}
case "backspace":
field := &m.formFields[m.formIndex]
if len(field.Value) > 0 {
field.Value = field.Value[:len(field.Value)-1]
}
default:
if msg.Type == tea.KeyRunes && len(msg.Runes) > 0 {
m.formFields[m.formIndex].Value += string(msg.Runes)
}
}
return m, nil
}
func (m model) updateConfirm(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
switch msg.String() {
case "left", "up", "tab":
if m.cursor > 0 {
m.cursor--
}
case "right", "down":
if m.cursor < 1 {
m.cursor++
}
case "esc":
m.screen = m.confirmCancelTarget()
m.cursor = 0
m.pendingAction = actionNone
return m, nil
case "enter":
if m.cursor == 1 { // Cancel
m.screen = m.confirmCancelTarget()
m.cursor = 0
m.pendingAction = actionNone
return m, nil
}
m.busy = true
switch m.pendingAction {
case actionExportBundle:
m.busyTitle = "Export support bundle"
target := *m.selectedTarget
return m, func() tea.Msg {
result, err := m.app.ExportSupportBundleResult(target)
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenMain}
}
case actionRunAll:
return m.executeRunAll()
case actionRunMemorySAT:
m.busyTitle = "Memory test"
m.progressPrefix = "memory"
m.progressSince = time.Now()
m.progressLines = nil
since := m.progressSince
return m, tea.Batch(
func() tea.Msg {
result, err := m.app.RunMemoryAcceptancePackResult("")
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenHealthCheck}
},
pollSATProgress("memory", since),
)
case actionRunStorageSAT:
m.busyTitle = "Storage test"
m.progressPrefix = "storage"
m.progressSince = time.Now()
m.progressLines = nil
since := m.progressSince
return m, tea.Batch(
func() tea.Msg {
result, err := m.app.RunStorageAcceptancePackResult("")
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenHealthCheck}
},
pollSATProgress("storage", since),
)
case actionRunCPUSAT:
m.busyTitle = "CPU test"
m.progressPrefix = "cpu"
m.progressSince = time.Now()
m.progressLines = nil
since := m.progressSince
durationSec := hcCPUDurations[m.hcMode]
return m, tea.Batch(
func() tea.Msg {
result, err := m.app.RunCPUAcceptancePackResult("", durationSec)
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenHealthCheck}
},
pollSATProgress("cpu", since),
)
case actionRunAMDGPUSAT:
m.busyTitle = "AMD GPU test"
m.progressPrefix = "gpu-amd"
m.progressSince = time.Now()
m.progressLines = nil
since := m.progressSince
return m, tea.Batch(
func() tea.Msg {
result, err := m.app.RunAMDAcceptancePackResult("")
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenHealthCheck}
},
pollSATProgress("gpu-amd", since),
)
}
case "ctrl+c":
return m, tea.Quit
}
return m, nil
}
func (m model) confirmCancelTarget() screen {
switch m.pendingAction {
case actionExportBundle:
return screenExportTargets
case actionRunAll, actionRunMemorySAT, actionRunStorageSAT, actionRunCPUSAT, actionRunAMDGPUSAT:
return screenHealthCheck
default:
return screenMain
}
}

View File

@@ -1,45 +0,0 @@
package tui
import (
"bee/audit/internal/app"
"bee/audit/internal/platform"
)
type resultMsg struct {
title string
body string
err error
back screen
}
type servicesMsg struct {
services []string
err error
}
type interfacesMsg struct {
ifaces []platform.InterfaceInfo
err error
}
type exportTargetsMsg struct {
targets []platform.RemovableTarget
err error
}
type panelMsg struct {
data app.HardwarePanelData
}
type nvidiaGPUsMsg struct {
gpus []platform.NvidiaGPU
err error
}
type nvtopClosedMsg struct{}
type nvidiaSATDoneMsg struct {
title string
body string
err error
}

View File

@@ -1,14 +0,0 @@
package tui
import tea "github.com/charmbracelet/bubbletea"
func (m model) handleExportTargetsMenu() (tea.Model, tea.Cmd) {
if len(m.targets) == 0 {
return m, resultCmd("Export support bundle", "No removable filesystems found", nil, screenMain)
}
target := m.targets[m.cursor]
m.selectedTarget = &target
m.pendingAction = actionExportBundle
m.screen = screenConfirm
return m, nil
}

View File

@@ -1,307 +0,0 @@
package tui
import (
"context"
"fmt"
"strings"
tea "github.com/charmbracelet/bubbletea"
)
// Component indices.
const (
hcGPU = 0
hcMemory = 1
hcStorage = 2
hcCPU = 3
)
// Cursor positions in Health Check screen.
const (
hcCurGPU = 0
hcCurMemory = 1
hcCurStorage = 2
hcCurCPU = 3
hcCurSelectAll = 4
hcCurModeQuick = 5
hcCurModeStd = 6
hcCurModeExpr = 7
hcCurRunAll = 8
hcCurTotal = 9
)
// hcModeDurations maps mode index (0=Quick,1=Standard,2=Express) to GPU stress seconds.
var hcModeDurations = [3]int{600, 3600, 28800}
// hcCPUDurations maps mode index to CPU stress-ng seconds.
var hcCPUDurations = [3]int{60, 300, 900}
func (m model) enterHealthCheck() (tea.Model, tea.Cmd) {
m.screen = screenHealthCheck
if !m.hcInitialized {
m.hcSel = [4]bool{true, true, true, true}
m.hcMode = 0
m.hcCursor = 0
m.hcInitialized = true
}
return m, nil
}
func (m model) updateHealthCheck(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
switch msg.String() {
case "up", "k":
if m.hcCursor > 0 {
m.hcCursor--
}
case "down", "j":
if m.hcCursor < hcCurTotal-1 {
m.hcCursor++
}
case " ":
switch m.hcCursor {
case hcCurGPU, hcCurMemory, hcCurStorage, hcCurCPU:
m.hcSel[m.hcCursor] = !m.hcSel[m.hcCursor]
case hcCurSelectAll:
allOn := m.hcSel[0] && m.hcSel[1] && m.hcSel[2] && m.hcSel[3]
for i := range m.hcSel {
m.hcSel[i] = !allOn
}
case hcCurModeQuick, hcCurModeStd, hcCurModeExpr:
m.hcMode = m.hcCursor - hcCurModeQuick
}
case "enter":
switch m.hcCursor {
case hcCurGPU, hcCurMemory, hcCurStorage, hcCurCPU:
return m.hcRunSingle(m.hcCursor)
case hcCurSelectAll:
allOn := m.hcSel[0] && m.hcSel[1] && m.hcSel[2] && m.hcSel[3]
for i := range m.hcSel {
m.hcSel[i] = !allOn
}
case hcCurModeQuick, hcCurModeStd, hcCurModeExpr:
m.hcMode = m.hcCursor - hcCurModeQuick
case hcCurRunAll:
return m.hcRunAll()
}
case "g", "G":
return m.hcRunSingle(hcGPU)
case "m", "M":
return m.hcRunSingle(hcMemory)
case "s", "S":
return m.hcRunSingle(hcStorage)
case "c", "C":
return m.hcRunSingle(hcCPU)
case "r", "R":
return m.hcRunAll()
case "a", "A":
allOn := m.hcSel[0] && m.hcSel[1] && m.hcSel[2] && m.hcSel[3]
for i := range m.hcSel {
m.hcSel[i] = !allOn
}
case "1":
m.hcMode = 0
case "2":
m.hcMode = 1
case "3":
m.hcMode = 2
case "esc":
m.screen = screenMain
m.cursor = 0
case "q", "ctrl+c":
return m, tea.Quit
}
return m, nil
}
func (m model) hcRunSingle(idx int) (tea.Model, tea.Cmd) {
switch idx {
case hcGPU:
if m.app.DetectGPUVendor() == "amd" {
m.pendingAction = actionRunAMDGPUSAT
m.screen = screenConfirm
m.cursor = 0
return m, nil
}
m.nvidiaDurIdx = m.hcMode
return m.enterNvidiaSATSetup()
case hcMemory:
m.pendingAction = actionRunMemorySAT
m.screen = screenConfirm
m.cursor = 0
return m, nil
case hcStorage:
m.pendingAction = actionRunStorageSAT
m.screen = screenConfirm
m.cursor = 0
return m, nil
case hcCPU:
m.pendingAction = actionRunCPUSAT
m.screen = screenConfirm
m.cursor = 0
return m, nil
}
return m, nil
}
func (m model) hcRunAll() (tea.Model, tea.Cmd) {
for _, sel := range m.hcSel {
if sel {
m.pendingAction = actionRunAll
m.screen = screenConfirm
m.cursor = 0
return m, nil
}
}
return m, nil
}
func (m model) executeRunAll() (tea.Model, tea.Cmd) {
durationSec := hcModeDurations[m.hcMode]
durationIdx := m.hcMode
sel := m.hcSel
app := m.app
m.busy = true
m.busyTitle = "Health Check"
return m, func() tea.Msg {
var parts []string
if sel[hcGPU] {
vendor := app.DetectGPUVendor()
if vendor == "amd" {
r, err := app.RunAMDAcceptancePackResult("")
body := r.Body
if err != nil {
body += "\nERROR: " + err.Error()
}
parts = append(parts, "=== GPU (AMD) ===\n"+body)
} else {
gpus, err := app.ListNvidiaGPUs()
if err != nil || len(gpus) == 0 {
parts = append(parts, "=== GPU ===\nNo NVIDIA GPUs detected or driver not loaded.")
} else {
var indices []int
sizeMB := 0
for _, g := range gpus {
indices = append(indices, g.Index)
if sizeMB == 0 || g.MemoryMB < sizeMB {
sizeMB = g.MemoryMB
}
}
if sizeMB == 0 {
sizeMB = 64
}
r, err := app.RunNvidiaAcceptancePackWithOptions(context.Background(), "", durationSec, sizeMB, indices)
body := r.Body
if err != nil {
body += "\nERROR: " + err.Error()
}
parts = append(parts, "=== GPU ===\n"+body)
}
}
}
if sel[hcMemory] {
r, err := app.RunMemoryAcceptancePackResult("")
body := r.Body
if err != nil {
body += "\nERROR: " + err.Error()
}
parts = append(parts, "=== MEMORY ===\n"+body)
}
if sel[hcStorage] {
r, err := app.RunStorageAcceptancePackResult("")
body := r.Body
if err != nil {
body += "\nERROR: " + err.Error()
}
parts = append(parts, "=== STORAGE ===\n"+body)
}
if sel[hcCPU] {
cpuDur := hcCPUDurations[durationIdx]
r, err := app.RunCPUAcceptancePackResult("", cpuDur)
body := r.Body
if err != nil {
body += "\nERROR: " + err.Error()
}
parts = append(parts, "=== CPU ===\n"+body)
}
combined := strings.Join(parts, "\n\n")
if combined == "" {
combined = "No components selected."
}
return resultMsg{title: "Health Check", body: combined, back: screenHealthCheck}
}
}
func renderHealthCheck(m model) string {
var b strings.Builder
fmt.Fprintln(&b, "HEALTH CHECK")
fmt.Fprintln(&b)
fmt.Fprintln(&b, " Diagnostics:")
fmt.Fprintln(&b)
type comp struct{ name, desc, key string }
comps := []comp{
{"GPU", "nvidia/amd auto-detect", "G"},
{"MEMORY", "memtester", "M"},
{"STORAGE", "smartctl + NVMe self-test", "S"},
{"CPU", "audit diagnostics", "C"},
}
for i, c := range comps {
pfx := " "
if m.hcCursor == i {
pfx = "> "
}
ch := "[ ]"
if m.hcSel[i] {
ch = "[x]"
}
fmt.Fprintf(&b, "%s%s %-8s %-28s [%s]\n", pfx, ch, c.name, c.desc, c.key)
}
fmt.Fprintln(&b, " ─────────────────────────────────────────────────")
{
pfx := " "
if m.hcCursor == hcCurSelectAll {
pfx = "> "
}
allOn := m.hcSel[0] && m.hcSel[1] && m.hcSel[2] && m.hcSel[3]
ch := "[ ]"
if allOn {
ch = "[x]"
}
fmt.Fprintf(&b, "%s%s Select / Deselect All [A]\n", pfx, ch)
}
fmt.Fprintln(&b)
fmt.Fprintln(&b, " Mode:")
modes := []struct{ label, key string }{
{"Quick", "1"},
{"Standard", "2"},
{"Express", "3"},
}
for i, mode := range modes {
pfx := " "
if m.hcCursor == hcCurModeQuick+i {
pfx = "> "
}
radio := "( )"
if m.hcMode == i {
radio = "(*)"
}
fmt.Fprintf(&b, "%s%s %-10s [%s]\n", pfx, radio, mode.label, mode.key)
}
fmt.Fprintln(&b)
{
pfx := " "
if m.hcCursor == hcCurRunAll {
pfx = "> "
}
fmt.Fprintf(&b, "%s[ RUN ALL [R] ]\n", pfx)
}
fmt.Fprintln(&b)
fmt.Fprintln(&b, "─────────────────────────────────────────────────────────────────")
fmt.Fprint(&b, "[↑↓] move [space/enter] toggle [letter] single test [R] run all [Esc] back")
return b.String()
}

View File

@@ -1,27 +0,0 @@
package tui
import (
tea "github.com/charmbracelet/bubbletea"
)
func (m model) handleMainMenu() (tea.Model, tea.Cmd) {
switch m.cursor {
case 0: // Health Check
return m.enterHealthCheck()
case 1: // Export support bundle
m.pendingAction = actionExportBundle
m.busy = true
m.busyTitle = "Export support bundle"
return m, func() tea.Msg {
targets, err := m.app.ListRemovableTargets()
return exportTargetsMsg{targets: targets, err: err}
}
case 2: // Settings
m.screen = screenSettings
m.cursor = 0
return m, nil
case 3: // Exit
return m, tea.Quit
}
return m, nil
}

View File

@@ -1,76 +0,0 @@
package tui
import (
"strings"
tea "github.com/charmbracelet/bubbletea"
)
func (m model) handleNetworkMenu() (tea.Model, tea.Cmd) {
switch m.cursor {
case 0:
m.busy = true
m.busyTitle = "Network status"
return m, func() tea.Msg {
result, err := m.app.NetworkStatus()
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenNetwork}
}
case 1:
m.busy = true
m.busyTitle = "DHCP all interfaces"
return m, func() tea.Msg {
result, err := m.app.DHCPAllResult()
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenNetwork}
}
case 2:
m.pendingAction = actionDHCPOne
m.busy = true
m.busyTitle = "Interfaces"
return m, func() tea.Msg {
ifaces, err := m.app.ListInterfaces()
return interfacesMsg{ifaces: ifaces, err: err}
}
case 3:
m.pendingAction = actionStaticIPv4
m.busy = true
m.busyTitle = "Interfaces"
return m, func() tea.Msg {
ifaces, err := m.app.ListInterfaces()
return interfacesMsg{ifaces: ifaces, err: err}
}
case 4:
m.screen = screenSettings
m.cursor = 0
return m, nil
}
return m, nil
}
func (m model) handleInterfacePickMenu() (tea.Model, tea.Cmd) {
if len(m.interfaces) == 0 {
return m, resultCmd("interfaces", "No physical interfaces found", nil, screenNetwork)
}
m.selectedIface = m.interfaces[m.cursor].Name
switch m.pendingAction {
case actionDHCPOne:
m.busy = true
m.busyTitle = "DHCP on " + m.selectedIface
return m, func() tea.Msg {
result, err := m.app.DHCPOneResult(m.selectedIface)
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenNetwork}
}
case actionStaticIPv4:
defaults := m.app.DefaultStaticIPv4FormFields(m.selectedIface)
m.formFields = []formField{
{Label: "IPv4 address", Value: defaults[0]},
{Label: "Prefix", Value: defaults[1]},
{Label: "Gateway", Value: strings.TrimSpace(defaults[2])},
{Label: "DNS (space-separated)", Value: defaults[3]},
}
m.formIndex = 0
m.screen = screenStaticForm
return m, nil
default:
return m, nil
}
}

View File

@@ -1,238 +0,0 @@
package tui
import (
"context"
"fmt"
"os/exec"
"strings"
"bee/audit/internal/platform"
tea "github.com/charmbracelet/bubbletea"
)
var nvidiaDurationOptions = []struct {
label string
seconds int
}{
{"10 minutes", 600},
{"1 hour", 3600},
{"8 hours", 28800},
{"24 hours", 86400},
}
// enterNvidiaSATSetup resets the setup screen and starts loading GPU list.
func (m model) enterNvidiaSATSetup() (tea.Model, tea.Cmd) {
m.screen = screenNvidiaSATSetup
m.nvidiaGPUs = nil
m.nvidiaGPUSel = nil
m.nvidiaDurIdx = 0
m.nvidiaSATCursor = 0
m.busy = true
m.busyTitle = "NVIDIA SAT"
return m, func() tea.Msg {
gpus, err := m.app.ListNvidiaGPUs()
return nvidiaGPUsMsg{gpus: gpus, err: err}
}
}
// handleNvidiaGPUsMsg processes the GPU list response.
func (m model) handleNvidiaGPUsMsg(msg nvidiaGPUsMsg) (tea.Model, tea.Cmd) {
m.busy = false
m.busyTitle = ""
if msg.err != nil {
m.title = "NVIDIA SAT"
m.body = fmt.Sprintf("Failed to list GPUs: %v", msg.err)
m.prevScreen = screenHealthCheck
m.screen = screenOutput
return m, nil
}
m.nvidiaGPUs = msg.gpus
m.nvidiaGPUSel = make([]bool, len(msg.gpus))
for i := range m.nvidiaGPUSel {
m.nvidiaGPUSel[i] = true // all selected by default
}
m.nvidiaSATCursor = 0
return m, nil
}
// updateNvidiaSATSetup handles keys on the setup screen.
func (m model) updateNvidiaSATSetup(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
numDur := len(nvidiaDurationOptions)
numGPU := len(m.nvidiaGPUs)
totalItems := numDur + numGPU + 2 // +2: Start, Cancel
switch msg.String() {
case "up", "k":
if m.nvidiaSATCursor > 0 {
m.nvidiaSATCursor--
}
case "down", "j":
if m.nvidiaSATCursor < totalItems-1 {
m.nvidiaSATCursor++
}
case " ":
switch {
case m.nvidiaSATCursor < numDur:
m.nvidiaDurIdx = m.nvidiaSATCursor
case m.nvidiaSATCursor < numDur+numGPU:
i := m.nvidiaSATCursor - numDur
m.nvidiaGPUSel[i] = !m.nvidiaGPUSel[i]
}
case "enter":
startIdx := numDur + numGPU
cancelIdx := startIdx + 1
switch {
case m.nvidiaSATCursor < numDur:
m.nvidiaDurIdx = m.nvidiaSATCursor
case m.nvidiaSATCursor < startIdx:
i := m.nvidiaSATCursor - numDur
m.nvidiaGPUSel[i] = !m.nvidiaGPUSel[i]
case m.nvidiaSATCursor == startIdx:
return m.startNvidiaSAT()
case m.nvidiaSATCursor == cancelIdx:
m.screen = screenHealthCheck
m.cursor = 0
}
case "esc":
m.screen = screenHealthCheck
m.cursor = 0
case "ctrl+c", "q":
return m, tea.Quit
}
return m, nil
}
// startNvidiaSAT launches the SAT and nvtop.
func (m model) startNvidiaSAT() (tea.Model, tea.Cmd) {
var selectedGPUs []platform.NvidiaGPU
for i, sel := range m.nvidiaGPUSel {
if sel {
selectedGPUs = append(selectedGPUs, m.nvidiaGPUs[i])
}
}
if len(selectedGPUs) == 0 {
selectedGPUs = m.nvidiaGPUs // fallback: use all if none explicitly selected
}
sizeMB := 0
for _, g := range selectedGPUs {
if sizeMB == 0 || g.MemoryMB < sizeMB {
sizeMB = g.MemoryMB
}
}
if sizeMB == 0 {
sizeMB = 64
}
var gpuIndices []int
for _, g := range selectedGPUs {
gpuIndices = append(gpuIndices, g.Index)
}
durationSec := nvidiaDurationOptions[m.nvidiaDurIdx].seconds
ctx, cancel := context.WithCancel(context.Background())
m.nvidiaSATCancel = cancel
m.nvidiaSATAborted = false
m.screen = screenNvidiaSATRunning
m.nvidiaSATCursor = 0
satCmd := func() tea.Msg {
result, err := m.app.RunNvidiaAcceptancePackWithOptions(ctx, "", durationSec, sizeMB, gpuIndices)
return nvidiaSATDoneMsg{title: result.Title, body: result.Body, err: err}
}
nvtopPath, lookErr := exec.LookPath("nvtop")
if lookErr != nil {
// nvtop not available: just run the SAT, show running screen
return m, satCmd
}
return m, tea.Batch(
satCmd,
tea.ExecProcess(exec.Command(nvtopPath), func(_ error) tea.Msg {
return nvtopClosedMsg{}
}),
)
}
// updateNvidiaSATRunning handles keys on the running screen.
func (m model) updateNvidiaSATRunning(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
switch msg.String() {
case "o", "O":
nvtopPath, err := exec.LookPath("nvtop")
if err != nil {
return m, nil
}
return m, tea.ExecProcess(exec.Command(nvtopPath), func(_ error) tea.Msg {
return nvtopClosedMsg{}
})
case "a", "A":
if m.nvidiaSATCancel != nil {
m.nvidiaSATCancel()
m.nvidiaSATCancel = nil
}
m.nvidiaSATAborted = true
m.screen = screenHealthCheck
m.cursor = 0
case "ctrl+c":
return m, tea.Quit
}
return m, nil
}
// renderNvidiaSATSetup renders the setup screen.
func renderNvidiaSATSetup(m model) string {
var b strings.Builder
fmt.Fprintln(&b, "NVIDIA SAT")
fmt.Fprintln(&b)
fmt.Fprintln(&b, "Duration:")
for i, opt := range nvidiaDurationOptions {
radio := "( )"
if i == m.nvidiaDurIdx {
radio = "(*)"
}
prefix := " "
if m.nvidiaSATCursor == i {
prefix = "> "
}
fmt.Fprintf(&b, "%s%s %s\n", prefix, radio, opt.label)
}
fmt.Fprintln(&b)
if len(m.nvidiaGPUs) == 0 {
fmt.Fprintln(&b, "GPUs: (none detected)")
} else {
fmt.Fprintln(&b, "GPUs:")
for i, gpu := range m.nvidiaGPUs {
check := "[ ]"
if m.nvidiaGPUSel[i] {
check = "[x]"
}
prefix := " "
if m.nvidiaSATCursor == len(nvidiaDurationOptions)+i {
prefix = "> "
}
fmt.Fprintf(&b, "%s%s %d: %s (%d MB)\n", prefix, check, gpu.Index, gpu.Name, gpu.MemoryMB)
}
}
fmt.Fprintln(&b)
startIdx := len(nvidiaDurationOptions) + len(m.nvidiaGPUs)
startPfx := " "
cancelPfx := " "
if m.nvidiaSATCursor == startIdx {
startPfx = "> "
}
if m.nvidiaSATCursor == startIdx+1 {
cancelPfx = "> "
}
fmt.Fprintf(&b, "%sStart\n", startPfx)
fmt.Fprintf(&b, "%sCancel\n", cancelPfx)
fmt.Fprintln(&b)
b.WriteString("[↑/↓] move [space] toggle [enter] select [esc] cancel\n")
return b.String()
}
// renderNvidiaSATRunning renders the running screen.
func renderNvidiaSATRunning() string {
return "NVIDIA SAT\n\nTest is running...\n\n[o] Open nvtop [a] Abort test [ctrl+c] quit\n"
}

View File

@@ -1,47 +0,0 @@
package tui
import (
"bee/audit/internal/platform"
tea "github.com/charmbracelet/bubbletea"
)
func (m model) handleServicesMenu() (tea.Model, tea.Cmd) {
if len(m.services) == 0 {
return m, resultCmd("Services", "No bee-* services found.", nil, screenSettings)
}
m.selectedService = m.services[m.cursor]
m.screen = screenServiceAction
m.cursor = 0
return m, nil
}
func (m model) handleServiceActionMenu() (tea.Model, tea.Cmd) {
action := m.serviceMenu[m.cursor]
if action == "back" {
m.screen = screenServices
m.cursor = 0
return m, nil
}
m.busy = true
m.busyTitle = "service: " + m.selectedService
return m, func() tea.Msg {
switch action {
case "Status":
result, err := m.app.ServiceStatusResult(m.selectedService)
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenServiceAction}
case "Restart":
result, err := m.app.ServiceActionResult(m.selectedService, platform.ServiceRestart)
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenServiceAction}
case "Start":
result, err := m.app.ServiceActionResult(m.selectedService, platform.ServiceStart)
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenServiceAction}
case "Stop":
result, err := m.app.ServiceActionResult(m.selectedService, platform.ServiceStop)
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenServiceAction}
default:
return resultMsg{title: "Service", body: "Unknown action.", back: screenServiceAction}
}
}
}

View File

@@ -1,64 +0,0 @@
package tui
import tea "github.com/charmbracelet/bubbletea"
func (m model) handleSettingsMenu() (tea.Model, tea.Cmd) {
switch m.cursor {
case 0: // Network
m.screen = screenNetwork
m.cursor = 0
return m, nil
case 1: // Services
m.busy = true
m.busyTitle = "Services"
return m, func() tea.Msg {
services, err := m.app.ListBeeServices()
return servicesMsg{services: services, err: err}
}
case 2: // Re-run audit
m.busy = true
m.busyTitle = "Re-run audit"
runtimeMode := m.runtimeMode
return m, func() tea.Msg {
result, err := m.app.RunAuditNow(runtimeMode)
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenSettings}
}
case 3: // Run self-check
m.busy = true
m.busyTitle = "Self-check"
return m, func() tea.Msg {
result, err := m.app.RunRuntimePreflightResult()
return resultMsg{title: result.Title, body: result.Body, err: err, back: screenSettings}
}
case 4: // Runtime issues
m.busy = true
m.busyTitle = "Runtime issues"
return m, func() tea.Msg {
result := m.app.RuntimeHealthResult()
return resultMsg{title: result.Title, body: result.Body, back: screenSettings}
}
case 5: // Audit logs
m.busy = true
m.busyTitle = "Audit logs"
return m, func() tea.Msg {
result := m.app.AuditLogTailResult()
return resultMsg{title: result.Title, body: result.Body, back: screenSettings}
}
case 6: // Check tools
m.busy = true
m.busyTitle = "Check tools"
return m, func() tea.Msg {
result := m.app.ToolCheckResult([]string{
"dmidecode", "smartctl", "nvme", "ipmitool", "lspci",
"ethtool", "bee", "nvidia-smi", "bee-gpu-stress",
"memtester", "dhclient", "lsblk", "mount",
})
return resultMsg{title: result.Title, body: result.Body, back: screenSettings}
}
case 7: // Back
m.screen = screenMain
m.cursor = 0
return m, nil
}
return m, nil
}

View File

@@ -1,579 +0,0 @@
package tui
import (
"strings"
"testing"
"bee/audit/internal/app"
"bee/audit/internal/platform"
"bee/audit/internal/runtimeenv"
tea "github.com/charmbracelet/bubbletea"
)
func newTestModel() model {
return newModel(app.New(platform.New()), runtimeenv.ModeLocal)
}
func sendKey(t *testing.T, m model, key tea.KeyType) model {
t.Helper()
next, _ := m.Update(tea.KeyMsg{Type: key})
return next.(model)
}
func TestUpdateMainMenuCursorNavigation(t *testing.T) {
t.Parallel()
m := newTestModel()
m = sendKey(t, m, tea.KeyDown)
if m.cursor != 1 {
t.Fatalf("cursor=%d want 1 after down", m.cursor)
}
m = sendKey(t, m, tea.KeyDown)
if m.cursor != 2 {
t.Fatalf("cursor=%d want 2 after second down", m.cursor)
}
m = sendKey(t, m, tea.KeyUp)
if m.cursor != 1 {
t.Fatalf("cursor=%d want 1 after up", m.cursor)
}
}
func TestUpdateMainMenuEnterActions(t *testing.T) {
t.Parallel()
tests := []struct {
name string
cursor int
wantScreen screen
wantBusy bool
wantCmd bool
}{
{name: "health_check", cursor: 0, wantScreen: screenHealthCheck},
{name: "export", cursor: 1, wantScreen: screenMain, wantBusy: true, wantCmd: true},
{name: "settings", cursor: 2, wantScreen: screenSettings},
{name: "exit", cursor: 3, wantScreen: screenMain, wantCmd: true},
}
for _, test := range tests {
test := test
t.Run(test.name, func(t *testing.T) {
t.Parallel()
m := newTestModel()
m.cursor = test.cursor
next, cmd := m.Update(tea.KeyMsg{Type: tea.KeyEnter})
got := next.(model)
if got.screen != test.wantScreen {
t.Fatalf("screen=%q want %q", got.screen, test.wantScreen)
}
if got.busy != test.wantBusy {
t.Fatalf("busy=%v want %v", got.busy, test.wantBusy)
}
if (cmd != nil) != test.wantCmd {
t.Fatalf("cmd present=%v want %v", cmd != nil, test.wantCmd)
}
})
}
}
func TestUpdateConfirmCancelViaKeys(t *testing.T) {
t.Parallel()
m := newTestModel()
m.screen = screenConfirm
m.pendingAction = actionRunMemorySAT
next, _ := m.Update(tea.KeyMsg{Type: tea.KeyRight})
got := next.(model)
if got.cursor != 1 {
t.Fatalf("cursor=%d want 1 after right", got.cursor)
}
next, _ = got.Update(tea.KeyMsg{Type: tea.KeyEnter})
got = next.(model)
if got.screen != screenHealthCheck {
t.Fatalf("screen=%q want %q", got.screen, screenHealthCheck)
}
if got.cursor != 0 {
t.Fatalf("cursor=%d want 0 after cancel", got.cursor)
}
}
func TestMainMenuSimpleTransitions(t *testing.T) {
t.Parallel()
tests := []struct {
name string
cursor int
wantScreen screen
}{
{name: "health_check", cursor: 0, wantScreen: screenHealthCheck},
{name: "settings", cursor: 2, wantScreen: screenSettings},
}
for _, test := range tests {
test := test
t.Run(test.name, func(t *testing.T) {
t.Parallel()
m := newTestModel()
m.cursor = test.cursor
next, cmd := m.handleMainMenu()
got := next.(model)
if cmd != nil {
t.Fatalf("expected nil cmd for %s", test.name)
}
if got.screen != test.wantScreen {
t.Fatalf("screen=%q want %q", got.screen, test.wantScreen)
}
if got.cursor != 0 {
t.Fatalf("cursor=%d want 0", got.cursor)
}
})
}
}
func TestMainMenuExportSetsBusy(t *testing.T) {
t.Parallel()
m := newTestModel()
m.cursor = 1 // Export support bundle
next, cmd := m.handleMainMenu()
got := next.(model)
if !got.busy {
t.Fatal("busy=false for export")
}
if cmd == nil {
t.Fatal("expected async cmd for export")
}
}
func TestMainViewRendersTwoColumns(t *testing.T) {
t.Parallel()
m := newTestModel()
m.cursor = 1
view := m.View()
for _, want := range []string{
"bee",
"Health Check",
"> Export support bundle",
"Settings",
"Exit",
"│",
"[↑↓] move",
} {
if !strings.Contains(view, want) {
t.Fatalf("view missing %q\nview:\n%s", want, view)
}
}
}
func TestEscapeNavigation(t *testing.T) {
t.Parallel()
tests := []struct {
name string
screen screen
wantScreen screen
}{
{name: "network to settings", screen: screenNetwork, wantScreen: screenSettings},
{name: "services to settings", screen: screenServices, wantScreen: screenSettings},
{name: "settings to main", screen: screenSettings, wantScreen: screenMain},
{name: "service action to services", screen: screenServiceAction, wantScreen: screenServices},
{name: "export targets to main", screen: screenExportTargets, wantScreen: screenMain},
{name: "interface pick to network", screen: screenInterfacePick, wantScreen: screenNetwork},
}
for _, test := range tests {
test := test
t.Run(test.name, func(t *testing.T) {
t.Parallel()
m := newTestModel()
m.screen = test.screen
m.cursor = 3
next, _ := m.updateKey(tea.KeyMsg{Type: tea.KeyEsc})
got := next.(model)
if got.screen != test.wantScreen {
t.Fatalf("screen=%q want %q", got.screen, test.wantScreen)
}
if got.cursor != 0 {
t.Fatalf("cursor=%d want 0", got.cursor)
}
})
}
}
func TestHealthCheckEscReturnsToMain(t *testing.T) {
t.Parallel()
m := newTestModel()
m.screen = screenHealthCheck
m.hcCursor = 3
next, _ := m.updateHealthCheck(tea.KeyMsg{Type: tea.KeyEsc})
got := next.(model)
if got.screen != screenMain {
t.Fatalf("screen=%q want %q", got.screen, screenMain)
}
if got.cursor != 0 {
t.Fatalf("cursor=%d want 0", got.cursor)
}
}
func TestOutputScreenReturnsToPreviousScreen(t *testing.T) {
t.Parallel()
m := newTestModel()
m.screen = screenOutput
m.prevScreen = screenNetwork
m.title = "title"
m.body = "body"
next, _ := m.updateKey(tea.KeyMsg{Type: tea.KeyEnter})
got := next.(model)
if got.screen != screenNetwork {
t.Fatalf("screen=%q want %q", got.screen, screenNetwork)
}
if got.title != "" || got.body != "" {
t.Fatalf("expected output state cleared, got title=%q body=%q", got.title, got.body)
}
}
func TestHealthCheckGPUOpensNvidiaSATSetup(t *testing.T) {
t.Parallel()
m := newTestModel()
m.screen = screenHealthCheck
m.hcInitialized = true
m.hcSel = [4]bool{true, true, true, true}
next, cmd := m.hcRunSingle(hcGPU)
got := next.(model)
if cmd == nil {
t.Fatal("expected non-nil cmd (GPU list loader)")
}
if got.screen != screenNvidiaSATSetup {
t.Fatalf("screen=%q want %q", got.screen, screenNvidiaSATSetup)
}
// esc from setup returns to health check
next, _ = got.updateNvidiaSATSetup(tea.KeyMsg{Type: tea.KeyEsc})
got = next.(model)
if got.screen != screenHealthCheck {
t.Fatalf("screen after esc=%q want %q", got.screen, screenHealthCheck)
}
}
func TestHealthCheckRunSingleMapsActions(t *testing.T) {
t.Parallel()
tests := []struct {
idx int
want actionKind
}{
{idx: hcMemory, want: actionRunMemorySAT},
{idx: hcStorage, want: actionRunStorageSAT},
}
for _, test := range tests {
m := newTestModel()
m.screen = screenHealthCheck
m.hcInitialized = true
next, _ := m.hcRunSingle(test.idx)
got := next.(model)
if got.pendingAction != test.want {
t.Fatalf("idx=%d pendingAction=%q want %q", test.idx, got.pendingAction, test.want)
}
if got.screen != screenConfirm {
t.Fatalf("idx=%d screen=%q want %q", test.idx, got.screen, screenConfirm)
}
}
}
func TestExportTargetSelectionOpensConfirm(t *testing.T) {
t.Parallel()
m := newTestModel()
m.screen = screenExportTargets
m.targets = []platform.RemovableTarget{{Device: "/dev/sdb1", FSType: "vfat", Size: "16G"}}
next, cmd := m.handleExportTargetsMenu()
got := next.(model)
if cmd != nil {
t.Fatal("expected nil cmd")
}
if got.screen != screenConfirm {
t.Fatalf("screen=%q want %q", got.screen, screenConfirm)
}
if got.pendingAction != actionExportBundle {
t.Fatalf("pendingAction=%q want %q", got.pendingAction, actionExportBundle)
}
if got.selectedTarget == nil || got.selectedTarget.Device != "/dev/sdb1" {
t.Fatalf("selectedTarget=%+v want /dev/sdb1", got.selectedTarget)
}
}
func TestInterfacePickStaticIPv4OpensForm(t *testing.T) {
t.Parallel()
m := newTestModel()
m.pendingAction = actionStaticIPv4
m.interfaces = []platform.InterfaceInfo{{Name: "eth0"}}
next, cmd := m.handleInterfacePickMenu()
got := next.(model)
if cmd != nil {
t.Fatal("expected nil cmd")
}
if got.screen != screenStaticForm {
t.Fatalf("screen=%q want %q", got.screen, screenStaticForm)
}
if got.selectedIface != "eth0" {
t.Fatalf("selectedIface=%q want eth0", got.selectedIface)
}
if len(got.formFields) != 4 {
t.Fatalf("len(formFields)=%d want 4", len(got.formFields))
}
}
func TestResultMsgUsesExplicitBackScreen(t *testing.T) {
t.Parallel()
m := newTestModel()
m.screen = screenConfirm
next, _ := m.Update(resultMsg{title: "done", body: "ok", back: screenNetwork})
got := next.(model)
if got.screen != screenOutput {
t.Fatalf("screen=%q want %q", got.screen, screenOutput)
}
if got.prevScreen != screenNetwork {
t.Fatalf("prevScreen=%q want %q", got.prevScreen, screenNetwork)
}
}
func TestConfirmCancelTarget(t *testing.T) {
t.Parallel()
m := newTestModel()
m.pendingAction = actionExportBundle
if got := m.confirmCancelTarget(); got != screenExportTargets {
t.Fatalf("export cancel target=%q want %q", got, screenExportTargets)
}
m.pendingAction = actionRunAll
if got := m.confirmCancelTarget(); got != screenHealthCheck {
t.Fatalf("run all cancel target=%q want %q", got, screenHealthCheck)
}
m.pendingAction = actionRunMemorySAT
if got := m.confirmCancelTarget(); got != screenHealthCheck {
t.Fatalf("memory sat cancel target=%q want %q", got, screenHealthCheck)
}
m.pendingAction = actionRunStorageSAT
if got := m.confirmCancelTarget(); got != screenHealthCheck {
t.Fatalf("storage sat cancel target=%q want %q", got, screenHealthCheck)
}
m.pendingAction = actionNone
if got := m.confirmCancelTarget(); got != screenMain {
t.Fatalf("default cancel target=%q want %q", got, screenMain)
}
}
func TestViewBusyStateIsMinimal(t *testing.T) {
t.Parallel()
m := newTestModel()
m.busy = true
view := m.View()
want := "bee\n\nWorking...\n\n[ctrl+c] quit\n"
if view != want {
t.Fatalf("busy view mismatch\nwant:\n%s\ngot:\n%s", want, view)
}
}
func TestViewBusyStateUsesBusyTitle(t *testing.T) {
t.Parallel()
m := newTestModel()
m.busy = true
m.busyTitle = "Export support bundle"
view := m.View()
for _, want := range []string{
"Export support bundle",
"Working...",
"[ctrl+c] quit",
} {
if !strings.Contains(view, want) {
t.Fatalf("view missing %q\nview:\n%s", want, view)
}
}
}
func TestViewOutputScreenRendersBodyAndBackHint(t *testing.T) {
t.Parallel()
m := newTestModel()
m.screen = screenOutput
m.title = "Run audit"
m.body = "audit output: /appdata/bee/export/bee-audit.json\n"
view := m.View()
for _, want := range []string{
"Run audit",
"audit output: /appdata/bee/export/bee-audit.json",
"[enter/esc] back [ctrl+c] quit",
} {
if !strings.Contains(view, want) {
t.Fatalf("view missing %q\nview:\n%s", want, view)
}
}
}
func TestViewExportTargetsRendersDeviceMetadata(t *testing.T) {
t.Parallel()
m := newTestModel()
m.screen = screenExportTargets
m.targets = []platform.RemovableTarget{
{
Device: "/dev/sdb1",
FSType: "vfat",
Size: "29G",
Label: "BEEUSB",
Mountpoint: "/media/bee",
},
}
view := m.View()
for _, want := range []string{
"Export support bundle",
"Select removable filesystem",
"> /dev/sdb1 [vfat 29G] label=BEEUSB mounted=/media/bee",
} {
if !strings.Contains(view, want) {
t.Fatalf("view missing %q\nview:\n%s", want, view)
}
}
}
func TestViewStaticFormRendersFields(t *testing.T) {
t.Parallel()
m := newTestModel()
m.screen = screenStaticForm
m.selectedIface = "enp1s0"
m.formFields = []formField{
{Label: "Address", Value: "192.0.2.10/24"},
{Label: "Gateway", Value: "192.0.2.1"},
{Label: "DNS", Value: "1.1.1.1"},
}
m.formIndex = 1
view := m.View()
for _, want := range []string{
"Static IPv4: enp1s0",
" Address: 192.0.2.10/24",
"> Gateway: 192.0.2.1",
" DNS: 1.1.1.1",
"[tab/↑/↓] move [enter] next/submit [backspace] delete [esc] cancel",
} {
if !strings.Contains(view, want) {
t.Fatalf("view missing %q\nview:\n%s", want, view)
}
}
}
func TestViewConfirmScreenMatchesPendingExport(t *testing.T) {
t.Parallel()
m := newTestModel()
m.screen = screenConfirm
m.pendingAction = actionExportBundle
m.selectedTarget = &platform.RemovableTarget{Device: "/dev/sdb1"}
view := m.View()
for _, want := range []string{
"Export support bundle",
"Copy support bundle to /dev/sdb1?",
"> Confirm",
" Cancel",
} {
if !strings.Contains(view, want) {
t.Fatalf("view missing %q\nview:\n%s", want, view)
}
}
}
func TestResultMsgClearsBusyAndPendingAction(t *testing.T) {
t.Parallel()
m := newTestModel()
m.busy = true
m.busyTitle = "Export support bundle"
m.pendingAction = actionExportBundle
m.screen = screenConfirm
next, _ := m.Update(resultMsg{title: "Export support bundle", body: "done", back: screenMain})
got := next.(model)
if got.busy {
t.Fatal("busy=true want false")
}
if got.busyTitle != "" {
t.Fatalf("busyTitle=%q want empty", got.busyTitle)
}
if got.pendingAction != actionNone {
t.Fatalf("pendingAction=%q want empty", got.pendingAction)
}
}
func TestResultMsgErrorWithoutBodyFormatsCleanly(t *testing.T) {
t.Parallel()
m := newTestModel()
next, _ := m.Update(resultMsg{title: "Export support bundle", err: assertErr("boom"), back: screenMain})
got := next.(model)
if got.body != "ERROR: boom" {
t.Fatalf("body=%q want %q", got.body, "ERROR: boom")
}
}
type assertErr string
func (e assertErr) Error() string { return string(e) }

View File

@@ -1,192 +0,0 @@
package tui
import (
"strings"
"time"
"bee/audit/internal/app"
"bee/audit/internal/platform"
"bee/audit/internal/runtimeenv"
tea "github.com/charmbracelet/bubbletea"
)
type screen string
const (
screenMain screen = "main"
screenHealthCheck screen = "health_check"
screenSettings screen = "settings"
screenNetwork screen = "network"
screenInterfacePick screen = "interface_pick"
screenServices screen = "services"
screenServiceAction screen = "service_action"
screenExportTargets screen = "export_targets"
screenOutput screen = "output"
screenStaticForm screen = "static_form"
screenConfirm screen = "confirm"
screenNvidiaSATSetup screen = "nvidia_sat_setup"
screenNvidiaSATRunning screen = "nvidia_sat_running"
)
type actionKind string
const (
actionNone actionKind = ""
actionDHCPOne actionKind = "dhcp_one"
actionStaticIPv4 actionKind = "static_ipv4"
actionExportBundle actionKind = "export_bundle"
actionRunAll actionKind = "run_all"
actionRunMemorySAT actionKind = "run_memory_sat"
actionRunStorageSAT actionKind = "run_storage_sat"
actionRunCPUSAT actionKind = "run_cpu_sat"
actionRunAMDGPUSAT actionKind = "run_amd_gpu_sat"
)
type model struct {
app *app.App
runtimeMode runtimeenv.Mode
screen screen
prevScreen screen
cursor int
busy bool
busyTitle string
title string
body string
mainMenu []string
settingsMenu []string
networkMenu []string
serviceMenu []string
services []string
interfaces []platform.InterfaceInfo
targets []platform.RemovableTarget
selectedService string
selectedIface string
selectedTarget *platform.RemovableTarget
pendingAction actionKind
formFields []formField
formIndex int
// Hardware panel (right column)
panel app.HardwarePanelData
panelFocus bool
panelCursor int
// Health Check screen
hcSel [4]bool
hcMode int
hcCursor int
hcInitialized bool
// NVIDIA SAT setup
nvidiaGPUs []platform.NvidiaGPU
nvidiaGPUSel []bool
nvidiaDurIdx int
nvidiaSATCursor int
// NVIDIA SAT running
nvidiaSATCancel func()
nvidiaSATAborted bool
// SAT verbose progress (CPU / Memory / Storage / AMD GPU)
progressLines []string
progressPrefix string
progressSince time.Time
}
type formField struct {
Label string
Value string
}
func Run(application *app.App, runtimeMode runtimeenv.Mode) error {
options := []tea.ProgramOption{}
if runtimeMode != runtimeenv.ModeLiveCD {
options = append(options, tea.WithAltScreen())
}
program := tea.NewProgram(newModel(application, runtimeMode), options...)
_, err := program.Run()
return err
}
func newModel(application *app.App, runtimeMode runtimeenv.Mode) model {
return model{
app: application,
runtimeMode: runtimeMode,
screen: screenMain,
mainMenu: []string{
"Health Check",
"Export support bundle",
"Settings",
"Exit",
},
settingsMenu: []string{
"Network",
"Services",
"Re-run audit",
"Run self-check",
"Runtime issues",
"Audit logs",
"Check tools",
"Back",
},
networkMenu: []string{
"Show status",
"DHCP on all interfaces",
"DHCP on one interface",
"Set static IPv4",
"Back",
},
serviceMenu: []string{
"Status",
"Restart",
"Start",
"Stop",
"Back",
},
}
}
func (m model) Init() tea.Cmd {
return func() tea.Msg {
return panelMsg{data: m.app.LoadHardwarePanel()}
}
}
func (m model) confirmBody() (string, string) {
switch m.pendingAction {
case actionExportBundle:
if m.selectedTarget == nil {
return "Export support bundle", "No target selected"
}
return "Export support bundle", "Copy support bundle to " + m.selectedTarget.Device + "?"
case actionRunAll:
modes := []string{"Quick", "Standard", "Express"}
mode := modes[m.hcMode]
var sel []string
names := []string{"GPU", "Memory", "Storage", "CPU"}
for i, on := range m.hcSel {
if on {
sel = append(sel, names[i])
}
}
if len(sel) == 0 {
return "Health Check", "No components selected."
}
return "Health Check", "Run: " + strings.Join(sel, " + ") + "\nMode: " + mode
case actionRunMemorySAT:
return "Memory test", "Run memtester?"
case actionRunStorageSAT:
return "Storage test", "Run storage diagnostic pack?"
case actionRunCPUSAT:
modes := []string{"Quick (60s)", "Standard (300s)", "Express (900s)"}
return "CPU test", "Run stress-ng? Mode: " + modes[m.hcMode]
case actionRunAMDGPUSAT:
return "AMD GPU test", "Run AMD GPU diagnostic pack (rocm-smi)?"
default:
return "Confirm", "Proceed?"
}
}

View File

@@ -1,255 +0,0 @@
package tui
import (
"fmt"
"strings"
tea "github.com/charmbracelet/bubbletea"
)
func (m model) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
switch msg := msg.(type) {
case tea.KeyMsg:
if m.busy {
if msg.String() == "ctrl+c" {
return m, tea.Quit
}
return m, nil
}
return m.updateKey(msg)
case satProgressMsg:
if m.busy && m.progressPrefix != "" {
if len(msg.lines) > 0 {
m.progressLines = msg.lines
}
return m, pollSATProgress(m.progressPrefix, m.progressSince)
}
return m, nil
case resultMsg:
m.busy = false
m.busyTitle = ""
m.progressLines = nil
m.progressPrefix = ""
m.title = msg.title
if msg.err != nil {
body := strings.TrimSpace(msg.body)
if body == "" {
m.body = fmt.Sprintf("ERROR: %v", msg.err)
} else {
m.body = fmt.Sprintf("%s\n\nERROR: %v", body, msg.err)
}
} else {
m.body = msg.body
}
m.pendingAction = actionNone
if msg.back != "" {
m.prevScreen = msg.back
} else {
m.prevScreen = m.screen
}
m.screen = screenOutput
m.cursor = 0
return m, nil
case servicesMsg:
m.busy = false
m.busyTitle = ""
if msg.err != nil {
m.title = "Services"
m.body = msg.err.Error()
m.prevScreen = screenSettings
m.screen = screenOutput
return m, nil
}
m.services = msg.services
m.screen = screenServices
m.cursor = 0
return m, nil
case interfacesMsg:
m.busy = false
m.busyTitle = ""
if msg.err != nil {
m.title = "interfaces"
m.body = msg.err.Error()
m.prevScreen = screenNetwork
m.screen = screenOutput
return m, nil
}
m.interfaces = msg.ifaces
m.screen = screenInterfacePick
m.cursor = 0
return m, nil
case exportTargetsMsg:
m.busy = false
m.busyTitle = ""
if msg.err != nil {
m.title = "export"
m.body = msg.err.Error()
m.prevScreen = screenMain
m.screen = screenOutput
return m, nil
}
m.targets = msg.targets
m.screen = screenExportTargets
m.cursor = 0
return m, nil
case panelMsg:
m.panel = msg.data
return m, nil
case nvidiaGPUsMsg:
return m.handleNvidiaGPUsMsg(msg)
case nvtopClosedMsg:
return m, nil
case nvidiaSATDoneMsg:
if m.nvidiaSATAborted {
return m, nil
}
if m.nvidiaSATCancel != nil {
m.nvidiaSATCancel()
m.nvidiaSATCancel = nil
}
m.prevScreen = screenHealthCheck
m.screen = screenOutput
m.title = msg.title
if msg.err != nil {
body := strings.TrimSpace(msg.body)
if body == "" {
m.body = fmt.Sprintf("ERROR: %v", msg.err)
} else {
m.body = fmt.Sprintf("%s\n\nERROR: %v", body, msg.err)
}
} else {
m.body = msg.body
}
return m, nil
}
return m, nil
}
func (m model) updateKey(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
switch m.screen {
case screenMain:
return m.updateMain(msg)
case screenHealthCheck:
return m.updateHealthCheck(msg)
case screenSettings:
return m.updateMenu(msg, len(m.settingsMenu), m.handleSettingsMenu)
case screenNetwork:
return m.updateMenu(msg, len(m.networkMenu), m.handleNetworkMenu)
case screenServices:
return m.updateMenu(msg, len(m.services), m.handleServicesMenu)
case screenServiceAction:
return m.updateMenu(msg, len(m.serviceMenu), m.handleServiceActionMenu)
case screenNvidiaSATSetup:
return m.updateNvidiaSATSetup(msg)
case screenNvidiaSATRunning:
return m.updateNvidiaSATRunning(msg)
case screenExportTargets:
return m.updateMenu(msg, len(m.targets), m.handleExportTargetsMenu)
case screenInterfacePick:
return m.updateMenu(msg, len(m.interfaces), m.handleInterfacePickMenu)
case screenOutput:
switch msg.String() {
case "esc", "enter", "q":
m.screen = m.prevScreen
m.body = ""
m.title = ""
m.pendingAction = actionNone
// Refresh panel when returning to main screen.
if m.prevScreen == screenMain {
return m, func() tea.Msg { return panelMsg{data: m.app.LoadHardwarePanel()} }
}
return m, nil
case "ctrl+c":
return m, tea.Quit
}
case screenStaticForm:
return m.updateStaticForm(msg)
case screenConfirm:
return m.updateConfirm(msg)
}
if msg.String() == "ctrl+c" {
return m, tea.Quit
}
return m, nil
}
// updateMain handles keys on the main (two-column) screen.
func (m model) updateMain(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
if m.panelFocus {
return m.updateMainPanel(msg)
}
// Switch focus to right panel.
if (msg.String() == "tab" || msg.String() == "right" || msg.String() == "l") && len(m.panel.Rows) > 0 {
m.panelFocus = true
return m, nil
}
return m.updateMenu(msg, len(m.mainMenu), m.handleMainMenu)
}
// updateMainPanel handles keys when right panel has focus.
func (m model) updateMainPanel(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
switch msg.String() {
case "up", "k":
if m.panelCursor > 0 {
m.panelCursor--
}
case "down", "j":
if m.panelCursor < len(m.panel.Rows)-1 {
m.panelCursor++
}
case "enter":
if m.panelCursor < len(m.panel.Rows) {
key := m.panel.Rows[m.panelCursor].Key
m.busy = true
m.busyTitle = key
return m, func() tea.Msg {
r := m.app.ComponentDetailResult(key)
return resultMsg{title: r.Title, body: r.Body, back: screenMain}
}
}
case "tab", "left", "h", "esc":
m.panelFocus = false
case "q", "ctrl+c":
return m, tea.Quit
}
return m, nil
}
func (m model) updateMenu(msg tea.KeyMsg, size int, onEnter func() (tea.Model, tea.Cmd)) (tea.Model, tea.Cmd) {
if size == 0 {
size = 1
}
switch msg.String() {
case "up", "k":
if m.cursor > 0 {
m.cursor--
}
case "down", "j":
if m.cursor < size-1 {
m.cursor++
}
case "enter":
return onEnter()
case "esc":
switch m.screen {
case screenNetwork, screenServices:
m.screen = screenSettings
m.cursor = 0
case screenSettings:
m.screen = screenMain
m.cursor = 0
case screenServiceAction:
m.screen = screenServices
m.cursor = 0
case screenExportTargets:
m.screen = screenMain
m.cursor = 0
case screenInterfacePick:
m.screen = screenNetwork
m.cursor = 0
}
case "q", "ctrl+c":
return m, tea.Quit
}
return m, nil
}

View File

@@ -1,233 +0,0 @@
package tui
import (
"fmt"
"strings"
"bee/audit/internal/platform"
"github.com/charmbracelet/lipgloss"
tea "github.com/charmbracelet/bubbletea"
)
// Column widths for two-column main layout.
const leftColWidth = 30
var (
stylePass = lipgloss.NewStyle().Foreground(lipgloss.Color("10")) // bright green
styleFail = lipgloss.NewStyle().Foreground(lipgloss.Color("9")) // bright red
styleCancel = lipgloss.NewStyle().Foreground(lipgloss.Color("11")) // bright yellow
styleNA = lipgloss.NewStyle().Foreground(lipgloss.Color("8")) // dark gray
)
func colorStatus(status string) string {
switch status {
case "PASS":
return stylePass.Render("PASS")
case "FAIL":
return styleFail.Render("FAIL")
case "CANCEL":
return styleCancel.Render("CANC")
default:
return styleNA.Render("N/A ")
}
}
func (m model) View() string {
if m.busy {
title := "bee"
if m.busyTitle != "" {
title = m.busyTitle
}
if len(m.progressLines) > 0 {
var b strings.Builder
fmt.Fprintf(&b, "%s\n\n", title)
for _, l := range m.progressLines {
fmt.Fprintf(&b, " %s\n", l)
}
b.WriteString("\n[ctrl+c] quit\n")
return b.String()
}
return fmt.Sprintf("%s\n\nWorking...\n\n[ctrl+c] quit\n", title)
}
switch m.screen {
case screenMain:
return renderTwoColumnMain(m)
case screenHealthCheck:
return renderHealthCheck(m)
case screenSettings:
return renderMenu("Settings", "Select action", m.settingsMenu, m.cursor)
case screenNetwork:
return renderMenu("Network", "Select action", m.networkMenu, m.cursor)
case screenServices:
return renderMenu("Services", "Select service", m.services, m.cursor)
case screenServiceAction:
return renderMenu("Service: "+m.selectedService, "Select action", m.serviceMenu, m.cursor)
case screenExportTargets:
return renderMenu("Export support bundle", "Select removable filesystem", renderTargetItems(m.targets), m.cursor)
case screenInterfacePick:
return renderMenu("Interfaces", "Select interface", renderInterfaceItems(m.interfaces), m.cursor)
case screenStaticForm:
return renderForm("Static IPv4: "+m.selectedIface, m.formFields, m.formIndex)
case screenConfirm:
title, body := m.confirmBody()
return renderConfirm(title, body, m.cursor)
case screenNvidiaSATSetup:
return renderNvidiaSATSetup(m)
case screenNvidiaSATRunning:
return renderNvidiaSATRunning()
case screenOutput:
return fmt.Sprintf("%s\n\n%s\n\n[enter/esc] back [ctrl+c] quit\n", m.title, strings.TrimSpace(m.body))
default:
return "bee\n"
}
}
// renderTwoColumnMain renders the main screen with menu on the left and hardware panel on the right.
func renderTwoColumnMain(m model) string {
// Left column lines
leftLines := []string{"bee", ""}
for i, item := range m.mainMenu {
pfx := " "
if !m.panelFocus && m.cursor == i {
pfx = "> "
}
leftLines = append(leftLines, pfx+item)
}
// Right column lines
rightLines := buildPanelLines(m)
// Render side by side
var b strings.Builder
maxRows := max(len(leftLines), len(rightLines))
for i := 0; i < maxRows; i++ {
l := ""
if i < len(leftLines) {
l = leftLines[i]
}
r := ""
if i < len(rightLines) {
r = rightLines[i]
}
w := lipgloss.Width(l)
if w < leftColWidth {
l += strings.Repeat(" ", leftColWidth-w)
}
b.WriteString(l + " │ " + r + "\n")
}
sep := strings.Repeat("─", leftColWidth) + "─┴─" + strings.Repeat("─", 46)
b.WriteString(sep + "\n")
if m.panelFocus {
b.WriteString("[↑↓] move [enter] details [tab/←] menu [ctrl+c] quit\n")
} else {
b.WriteString("[↑↓] move [enter] select [tab/→] panel [ctrl+c] quit\n")
}
return b.String()
}
func buildPanelLines(m model) []string {
p := m.panel
var lines []string
for _, h := range p.Header {
lines = append(lines, h)
}
if len(p.Header) > 0 && len(p.Rows) > 0 {
lines = append(lines, "")
}
for i, row := range p.Rows {
pfx := " "
if m.panelFocus && m.panelCursor == i {
pfx = "> "
}
status := colorStatus(row.Status)
lines = append(lines, fmt.Sprintf("%s%s %-4s %s", pfx, status, row.Key, row.Detail))
}
return lines
}
func renderTargetItems(targets []platform.RemovableTarget) []string {
items := make([]string, 0, len(targets))
for _, target := range targets {
desc := fmt.Sprintf("%s [%s %s]", target.Device, target.FSType, target.Size)
if target.Label != "" {
desc += " label=" + target.Label
}
if target.Mountpoint != "" {
desc += " mounted=" + target.Mountpoint
}
items = append(items, desc)
}
return items
}
func renderInterfaceItems(interfaces []platform.InterfaceInfo) []string {
items := make([]string, 0, len(interfaces))
for _, iface := range interfaces {
label := iface.Name
if len(iface.IPv4) > 0 {
label += " [" + strings.Join(iface.IPv4, ", ") + "]"
}
items = append(items, label)
}
return items
}
func renderMenu(title, subtitle string, items []string, cursor int) string {
var body strings.Builder
fmt.Fprintf(&body, "%s\n\n%s\n\n", title, subtitle)
if len(items) == 0 {
body.WriteString("(no items)\n")
} else {
for i, item := range items {
prefix := " "
if i == cursor {
prefix = "> "
}
fmt.Fprintf(&body, "%s%s\n", prefix, item)
}
}
body.WriteString("\n[↑/↓] move [enter] select [esc] back [ctrl+c] quit\n")
return body.String()
}
func renderForm(title string, fields []formField, idx int) string {
var body strings.Builder
fmt.Fprintf(&body, "%s\n\n", title)
for i, field := range fields {
prefix := " "
if i == idx {
prefix = "> "
}
fmt.Fprintf(&body, "%s%s: %s\n", prefix, field.Label, field.Value)
}
body.WriteString("\n[tab/↑/↓] move [enter] next/submit [backspace] delete [esc] cancel\n")
return body.String()
}
func renderConfirm(title, body string, cursor int) string {
options := []string{"Confirm", "Cancel"}
var out strings.Builder
fmt.Fprintf(&out, "%s\n\n%s\n\n", title, body)
for i, option := range options {
prefix := " "
if i == cursor {
prefix = "> "
}
fmt.Fprintf(&out, "%s%s\n", prefix, option)
}
out.WriteString("\n[←/→/↑/↓] move [enter] select [esc] cancel\n")
return out.String()
}
func resultCmd(title, body string, err error, back screen) tea.Cmd {
return func() tea.Msg {
return resultMsg{title: title, body: body, err: err, back: back}
}
}

591
audit/internal/webui/api.go Normal file
View File

@@ -0,0 +1,591 @@
package webui
import (
"bufio"
"context"
"encoding/json"
"fmt"
"io"
"net/http"
"os/exec"
"path/filepath"
"strings"
"sync/atomic"
"time"
"bee/audit/internal/app"
"bee/audit/internal/platform"
)
// ── Job ID counter ────────────────────────────────────────────────────────────
var jobCounter atomic.Uint64
func newJobID(prefix string) string {
return fmt.Sprintf("%s-%d", prefix, jobCounter.Add(1))
}
// ── SSE helpers ───────────────────────────────────────────────────────────────
func sseWrite(w http.ResponseWriter, event, data string) bool {
f, ok := w.(http.Flusher)
if !ok {
return false
}
if event != "" {
fmt.Fprintf(w, "event: %s\n", event)
}
fmt.Fprintf(w, "data: %s\n\n", data)
f.Flush()
return true
}
func sseStart(w http.ResponseWriter) bool {
_, ok := w.(http.Flusher)
if !ok {
http.Error(w, "streaming not supported", http.StatusInternalServerError)
return false
}
w.Header().Set("Content-Type", "text/event-stream")
w.Header().Set("Cache-Control", "no-cache")
w.Header().Set("Connection", "keep-alive")
w.Header().Set("Access-Control-Allow-Origin", "*")
return true
}
// streamJob streams lines from a jobState to a SSE response.
func streamJob(w http.ResponseWriter, r *http.Request, j *jobState) {
if !sseStart(w) {
return
}
existing, ch := j.subscribe()
for _, line := range existing {
sseWrite(w, "", line)
}
if ch == nil {
// Job already finished
sseWrite(w, "done", j.err)
return
}
for {
select {
case line, ok := <-ch:
if !ok {
sseWrite(w, "done", j.err)
return
}
sseWrite(w, "", line)
case <-r.Context().Done():
return
}
}
}
// runCmdJob runs an exec.Cmd as a background job, streaming stdout+stderr lines.
func runCmdJob(j *jobState, cmd *exec.Cmd) {
pr, pw := io.Pipe()
cmd.Stdout = pw
cmd.Stderr = pw
if err := cmd.Start(); err != nil {
j.finish(err.Error())
return
}
go func() {
scanner := bufio.NewScanner(pr)
for scanner.Scan() {
j.append(scanner.Text())
}
}()
err := cmd.Wait()
_ = pw.Close()
if err != nil {
j.finish(err.Error())
} else {
j.finish("")
}
}
// ── Audit ─────────────────────────────────────────────────────────────────────
func (h *handler) handleAPIAuditRun(w http.ResponseWriter, r *http.Request) {
if h.opts.App == nil {
writeError(w, http.StatusServiceUnavailable, "app not configured")
return
}
id := newJobID("audit")
j := globalJobs.create(id)
go func() {
j.append("Running audit...")
result, err := h.opts.App.RunAuditNow(h.opts.RuntimeMode)
if err != nil {
j.append("ERROR: " + err.Error())
j.finish(err.Error())
return
}
for _, line := range strings.Split(result.Body, "\n") {
if line != "" {
j.append(line)
}
}
j.finish("")
}()
writeJSON(w, map[string]string{"job_id": id})
}
func (h *handler) handleAPIAuditStream(w http.ResponseWriter, r *http.Request) {
id := r.URL.Query().Get("job_id")
j, ok := globalJobs.get(id)
if !ok {
http.Error(w, "job not found", http.StatusNotFound)
return
}
streamJob(w, r, j)
}
// ── SAT ───────────────────────────────────────────────────────────────────────
func (h *handler) handleAPISATRun(target string) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
if h.opts.App == nil {
writeError(w, http.StatusServiceUnavailable, "app not configured")
return
}
id := newJobID("sat-" + target)
j := globalJobs.create(id)
ctx, cancel := context.WithCancel(context.Background())
j.cancel = cancel
go func() {
defer cancel()
j.append(fmt.Sprintf("Starting %s acceptance test...", target))
var (
archive string
err error
)
// Parse optional parameters
var body struct {
Duration int `json:"duration"`
DiagLevel int `json:"diag_level"`
GPUIndices []int `json:"gpu_indices"`
}
body.DiagLevel = 1
if r.ContentLength > 0 {
_ = json.NewDecoder(r.Body).Decode(&body)
}
switch target {
case "nvidia":
if len(body.GPUIndices) > 0 || body.DiagLevel > 0 {
result, e := h.opts.App.RunNvidiaAcceptancePackWithOptions(
ctx, "", body.DiagLevel, body.GPUIndices,
)
if e != nil {
err = e
} else {
archive = result.Body
}
} else {
archive, err = h.opts.App.RunNvidiaAcceptancePack("")
}
case "memory":
archive, err = h.opts.App.RunMemoryAcceptancePack("")
case "storage":
archive, err = h.opts.App.RunStorageAcceptancePack("")
case "cpu":
dur := body.Duration
if dur <= 0 {
dur = 60
}
archive, err = h.opts.App.RunCPUAcceptancePack("", dur)
}
if err != nil {
if ctx.Err() != nil {
j.append("Aborted.")
j.finish("aborted")
} else {
j.append("ERROR: " + err.Error())
j.finish(err.Error())
}
return
}
j.append(fmt.Sprintf("Archive written: %s", archive))
j.finish("")
}()
writeJSON(w, map[string]string{"job_id": id})
}
}
func (h *handler) handleAPISATStream(w http.ResponseWriter, r *http.Request) {
id := r.URL.Query().Get("job_id")
j, ok := globalJobs.get(id)
if !ok {
http.Error(w, "job not found", http.StatusNotFound)
return
}
streamJob(w, r, j)
}
func (h *handler) handleAPISATAbort(w http.ResponseWriter, r *http.Request) {
id := r.URL.Query().Get("job_id")
j, ok := globalJobs.get(id)
if !ok {
http.Error(w, "job not found", http.StatusNotFound)
return
}
if j.abort() {
writeJSON(w, map[string]string{"status": "aborted"})
} else {
writeJSON(w, map[string]string{"status": "not_running"})
}
}
// ── Services ──────────────────────────────────────────────────────────────────
func (h *handler) handleAPIServicesList(w http.ResponseWriter, r *http.Request) {
if h.opts.App == nil {
writeError(w, http.StatusServiceUnavailable, "app not configured")
return
}
names, err := h.opts.App.ListBeeServices()
if err != nil {
writeError(w, http.StatusInternalServerError, err.Error())
return
}
type serviceInfo struct {
Name string `json:"name"`
State string `json:"state"`
Body string `json:"body"`
}
result := make([]serviceInfo, 0, len(names))
for _, name := range names {
state := h.opts.App.ServiceState(name)
body, _ := h.opts.App.ServiceStatus(name)
result = append(result, serviceInfo{Name: name, State: state, Body: body})
}
writeJSON(w, result)
}
func (h *handler) handleAPIServicesAction(w http.ResponseWriter, r *http.Request) {
if h.opts.App == nil {
writeError(w, http.StatusServiceUnavailable, "app not configured")
return
}
var req struct {
Name string `json:"name"`
Action string `json:"action"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
writeError(w, http.StatusBadRequest, "invalid request body")
return
}
var action platform.ServiceAction
switch req.Action {
case "start":
action = platform.ServiceStart
case "stop":
action = platform.ServiceStop
case "restart":
action = platform.ServiceRestart
default:
writeError(w, http.StatusBadRequest, "action must be start|stop|restart")
return
}
result, err := h.opts.App.ServiceActionResult(req.Name, action)
if err != nil {
writeError(w, http.StatusInternalServerError, err.Error())
return
}
writeJSON(w, map[string]string{"status": "ok", "output": result.Body})
}
// ── Network ───────────────────────────────────────────────────────────────────
func (h *handler) handleAPINetworkStatus(w http.ResponseWriter, r *http.Request) {
if h.opts.App == nil {
writeError(w, http.StatusServiceUnavailable, "app not configured")
return
}
ifaces, err := h.opts.App.ListInterfaces()
if err != nil {
writeError(w, http.StatusInternalServerError, err.Error())
return
}
writeJSON(w, map[string]any{
"interfaces": ifaces,
"default_route": h.opts.App.DefaultRoute(),
})
}
func (h *handler) handleAPINetworkDHCP(w http.ResponseWriter, r *http.Request) {
if h.opts.App == nil {
writeError(w, http.StatusServiceUnavailable, "app not configured")
return
}
var req struct {
Interface string `json:"interface"`
}
_ = json.NewDecoder(r.Body).Decode(&req)
var result app.ActionResult
var err error
if req.Interface == "" || req.Interface == "all" {
result, err = h.opts.App.DHCPAllResult()
} else {
result, err = h.opts.App.DHCPOneResult(req.Interface)
}
if err != nil {
writeError(w, http.StatusInternalServerError, err.Error())
return
}
writeJSON(w, map[string]string{"status": "ok", "output": result.Body})
}
func (h *handler) handleAPINetworkStatic(w http.ResponseWriter, r *http.Request) {
if h.opts.App == nil {
writeError(w, http.StatusServiceUnavailable, "app not configured")
return
}
var req struct {
Interface string `json:"interface"`
Address string `json:"address"`
Prefix string `json:"prefix"`
Gateway string `json:"gateway"`
DNS []string `json:"dns"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
writeError(w, http.StatusBadRequest, "invalid request body")
return
}
cfg := platform.StaticIPv4Config{
Interface: req.Interface,
Address: req.Address,
Prefix: req.Prefix,
Gateway: req.Gateway,
DNS: req.DNS,
}
result, err := h.opts.App.SetStaticIPv4Result(cfg)
if err != nil {
writeError(w, http.StatusInternalServerError, err.Error())
return
}
writeJSON(w, map[string]string{"status": "ok", "output": result.Body})
}
// ── Export ────────────────────────────────────────────────────────────────────
func (h *handler) handleAPIExportList(w http.ResponseWriter, r *http.Request) {
entries, err := listExportFiles(h.opts.ExportDir)
if err != nil {
writeError(w, http.StatusInternalServerError, err.Error())
return
}
writeJSON(w, entries)
}
func (h *handler) handleAPIExportBundle(w http.ResponseWriter, r *http.Request) {
archive, err := app.BuildSupportBundle(h.opts.ExportDir)
if err != nil {
writeError(w, http.StatusInternalServerError, err.Error())
return
}
writeJSON(w, map[string]string{
"status": "ok",
"path": archive,
"url": "/export/support.tar.gz",
})
}
// ── Tools ─────────────────────────────────────────────────────────────────────
var standardTools = []string{
"dmidecode", "smartctl", "nvme", "lspci", "ipmitool",
"nvidia-smi", "memtester", "stress-ng", "nvtop",
"mstflint", "qrencode",
}
func (h *handler) handleAPIToolsCheck(w http.ResponseWriter, r *http.Request) {
if h.opts.App == nil {
writeError(w, http.StatusServiceUnavailable, "app not configured")
return
}
statuses := h.opts.App.CheckTools(standardTools)
writeJSON(w, statuses)
}
// ── Preflight ─────────────────────────────────────────────────────────────────
func (h *handler) handleAPIPreflight(w http.ResponseWriter, r *http.Request) {
data, err := loadSnapshot(filepath.Join(h.opts.ExportDir, "runtime-health.json"))
if err != nil {
writeError(w, http.StatusNotFound, "runtime health not found")
return
}
w.Header().Set("Content-Type", "application/json; charset=utf-8")
w.Header().Set("Cache-Control", "no-store")
_, _ = w.Write(data)
}
// ── Install ───────────────────────────────────────────────────────────────────
func (h *handler) handleAPIInstallDisks(w http.ResponseWriter, r *http.Request) {
if h.opts.App == nil {
writeError(w, http.StatusServiceUnavailable, "app not configured")
return
}
disks, err := h.opts.App.ListInstallDisks()
if err != nil {
writeError(w, http.StatusInternalServerError, err.Error())
return
}
type diskJSON struct {
Device string `json:"device"`
Model string `json:"model"`
Size string `json:"size"`
SizeBytes int64 `json:"size_bytes"`
MountedParts []string `json:"mounted_parts"`
Warnings []string `json:"warnings"`
}
result := make([]diskJSON, 0, len(disks))
for _, d := range disks {
result = append(result, diskJSON{
Device: d.Device,
Model: d.Model,
Size: d.Size,
SizeBytes: d.SizeBytes,
MountedParts: d.MountedParts,
Warnings: platform.DiskWarnings(d),
})
}
writeJSON(w, result)
}
func (h *handler) handleAPIInstallRun(w http.ResponseWriter, r *http.Request) {
if h.opts.App == nil {
writeError(w, http.StatusServiceUnavailable, "app not configured")
return
}
var req struct {
Device string `json:"device"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.Device == "" {
writeError(w, http.StatusBadRequest, "device is required")
return
}
// Whitelist: only allow devices that ListInstallDisks() returns.
disks, err := h.opts.App.ListInstallDisks()
if err != nil {
writeError(w, http.StatusInternalServerError, err.Error())
return
}
allowed := false
for _, d := range disks {
if d.Device == req.Device {
allowed = true
break
}
}
if !allowed {
writeError(w, http.StatusBadRequest, "device not in install candidate list")
return
}
h.installMu.Lock()
if h.installJob != nil && !h.installJob.isDone() {
h.installMu.Unlock()
writeError(w, http.StatusConflict, "install already running")
return
}
j := &jobState{}
h.installJob = j
h.installMu.Unlock()
logFile := platform.InstallLogPath(req.Device)
go runCmdJob(j, exec.CommandContext(r.Context(), "bee-install", req.Device, logFile))
w.WriteHeader(http.StatusNoContent)
}
func (h *handler) handleAPIInstallStream(w http.ResponseWriter, r *http.Request) {
h.installMu.Lock()
j := h.installJob
h.installMu.Unlock()
if j == nil {
if !sseStart(w) {
return
}
sseWrite(w, "done", "")
return
}
streamJob(w, r, j)
}
// ── Metrics SSE ───────────────────────────────────────────────────────────────
func (h *handler) handleAPIMetricsStream(w http.ResponseWriter, r *http.Request) {
if !sseStart(w) {
return
}
ticker := time.NewTicker(time.Second)
defer ticker.Stop()
for {
select {
case <-r.Context().Done():
return
case <-ticker.C:
sample := platform.SampleLiveMetrics()
// Feed server ring buffers
for _, t := range sample.Temps {
if t.Name == "CPU" {
h.ringCPUTemp.push(t.Celsius)
break
}
}
h.ringPower.push(sample.PowerW)
h.ringCPULoad.push(sample.CPULoadPct)
h.ringMemLoad.push(sample.MemLoadPct)
// Feed fan ring buffers (grow on first sight)
h.ringsMu.Lock()
for i, fan := range sample.Fans {
for len(h.ringFans) <= i {
h.ringFans = append(h.ringFans, newMetricsRing(120))
h.fanNames = append(h.fanNames, fan.Name)
}
h.ringFans[i].push(float64(fan.RPM))
}
// Feed per-GPU ring buffers (grow on first sight)
for _, gpu := range sample.GPUs {
idx := gpu.GPUIndex
for len(h.gpuRings) <= idx {
h.gpuRings = append(h.gpuRings, &gpuRings{
Temp: newMetricsRing(120),
Util: newMetricsRing(120),
MemUtil: newMetricsRing(120),
Power: newMetricsRing(120),
})
}
h.gpuRings[idx].Temp.push(gpu.TempC)
h.gpuRings[idx].Util.push(gpu.UsagePct)
h.gpuRings[idx].MemUtil.push(gpu.MemUsagePct)
h.gpuRings[idx].Power.push(gpu.PowerW)
}
h.ringsMu.Unlock()
b, err := json.Marshal(sample)
if err != nil {
continue
}
if !sseWrite(w, "metrics", string(b)) {
return
}
}
}
}

View File

@@ -0,0 +1,102 @@
package webui
import (
"sync"
"time"
)
// jobState holds the output lines and completion status of an async job.
type jobState struct {
lines []string
done bool
err string
mu sync.Mutex
subs []chan string
cancel func() // optional cancel function; nil if job is not cancellable
}
// abort cancels the job if it has a cancel function and is not yet done.
func (j *jobState) abort() bool {
j.mu.Lock()
defer j.mu.Unlock()
if j.done || j.cancel == nil {
return false
}
j.cancel()
return true
}
func (j *jobState) append(line string) {
j.mu.Lock()
defer j.mu.Unlock()
j.lines = append(j.lines, line)
for _, ch := range j.subs {
select {
case ch <- line:
default:
}
}
}
func (j *jobState) finish(errMsg string) {
j.mu.Lock()
defer j.mu.Unlock()
j.done = true
j.err = errMsg
for _, ch := range j.subs {
close(ch)
}
j.subs = nil
}
// subscribe returns a channel that receives all future lines.
// Existing lines are returned first, then the channel streams new ones.
func (j *jobState) subscribe() ([]string, <-chan string) {
j.mu.Lock()
defer j.mu.Unlock()
existing := make([]string, len(j.lines))
copy(existing, j.lines)
if j.done {
return existing, nil
}
ch := make(chan string, 256)
j.subs = append(j.subs, ch)
return existing, ch
}
// jobManager manages async jobs identified by string IDs.
type jobManager struct {
mu sync.Mutex
jobs map[string]*jobState
}
var globalJobs = &jobManager{jobs: make(map[string]*jobState)}
func (m *jobManager) create(id string) *jobState {
m.mu.Lock()
defer m.mu.Unlock()
j := &jobState{}
m.jobs[id] = j
// Schedule cleanup after 30 minutes
go func() {
time.Sleep(30 * time.Minute)
m.mu.Lock()
delete(m.jobs, id)
m.mu.Unlock()
}()
return j
}
// isDone returns true if the job has finished (either successfully or with error).
func (j *jobState) isDone() bool {
j.mu.Lock()
defer j.mu.Unlock()
return j.done
}
func (m *jobManager) get(id string) (*jobState, bool) {
m.mu.Lock()
defer m.mu.Unlock()
j, ok := m.jobs[id]
return j, ok
}

View File

@@ -0,0 +1,786 @@
package webui
import (
"encoding/json"
"fmt"
"html"
"net/url"
"os"
"path/filepath"
"sort"
"strings"
)
// ── Layout ────────────────────────────────────────────────────────────────────
func layoutHead(title string) string {
return `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width,initial-scale=1">
<title>` + html.EscapeString(title) + `</title>
<style>
:root{--bg:#fff;--surface:#fff;--surface-2:#f9fafb;--border:rgba(34,36,38,.15);--border-lite:rgba(34,36,38,.1);--ink:rgba(0,0,0,.87);--muted:rgba(0,0,0,.6);--accent:#2185d0;--accent-dark:#1678c2;--crit-bg:#fff6f6;--crit-fg:#9f3a38;--crit-border:#e0b4b4;--ok-bg:#fcfff5;--ok-fg:#2c662d;--warn-bg:#fffaf3;--warn-fg:#573a08}
*{box-sizing:border-box;margin:0;padding:0}
body{font:14px/1.5 Lato,"Helvetica Neue",Arial,Helvetica,sans-serif;background:var(--bg);color:var(--ink);display:flex;min-height:100vh}
a{color:var(--accent);text-decoration:none}
/* Sidebar */
.sidebar{width:210px;min-height:100vh;background:#1b1c1d;flex-shrink:0;display:flex;flex-direction:column}
.sidebar-logo{padding:18px 16px 12px;font-size:18px;font-weight:700;color:#fff;letter-spacing:-.5px}
.sidebar-logo span{color:rgba(255,255,255,.5);font-weight:400;font-size:12px;display:block;margin-top:2px}
.nav{flex:1}
.nav-item{display:block;padding:10px 16px;color:rgba(255,255,255,.7);font-size:13px;border-left:3px solid transparent;transition:all .15s}
.nav-item:hover{color:#fff;background:rgba(255,255,255,.08)}
.nav-item.active{color:#fff;background:rgba(33,133,208,.25);border-left-color:var(--accent)}
/* Content */
.main{flex:1;display:flex;flex-direction:column;overflow:auto}
.topbar{padding:13px 24px;background:#1b1c1d;display:flex;align-items:center;gap:12px}
.topbar h1{font-size:16px;font-weight:700;color:rgba(255,255,255,.9)}
.content{padding:24px;flex:1}
/* Cards */
.card{background:var(--surface);border:1px solid var(--border);border-radius:4px;box-shadow:0 1px 2px rgba(34,36,38,.15);margin-bottom:16px;overflow:hidden}
.card-head{padding:11px 16px;background:var(--surface-2);border-bottom:1px solid var(--border);font-weight:700;font-size:13px;display:flex;align-items:center;gap:8px}
.card-body{padding:16px}
/* Buttons */
.btn{display:inline-flex;align-items:center;gap:6px;padding:8px 16px;border-radius:4px;font-size:13px;font-weight:700;cursor:pointer;border:none;transition:background .1s;font-family:inherit}
.btn-primary{background:var(--accent);color:#fff}.btn-primary:hover{background:var(--accent-dark)}
.btn-danger{background:#db2828;color:#fff}.btn-danger:hover{background:#b91c1c}
.btn-secondary{background:var(--surface-2);color:var(--ink);border:1px solid var(--border)}.btn-secondary:hover{background:#eee}
.btn-sm{padding:5px 10px;font-size:12px}
/* Tables */
table{width:100%;border-collapse:collapse;font-size:13px;background:var(--surface)}
th{text-align:left;padding:9px 14px;color:var(--ink);font-weight:700;background:var(--surface-2);border-bottom:1px solid var(--border-lite)}
td{padding:9px 14px;border-top:1px solid var(--border-lite)}
tr:first-child td{border-top:0}
tbody tr:hover td{background:rgba(0,0,0,.03)}
/* Status badges */
.badge{display:inline-block;padding:2px 9px;border-radius:4px;font-size:11px;font-weight:700}
.badge-ok{background:var(--ok-bg);color:var(--ok-fg);border:1px solid #a3c293}
.badge-warn{background:var(--warn-bg);color:var(--warn-fg);border:1px solid #c9ba9b}
.badge-err{background:var(--crit-bg);color:var(--crit-fg);border:1px solid var(--crit-border)}
.badge-unknown{background:var(--surface-2);color:var(--muted);border:1px solid var(--border)}
/* Output terminal */
.terminal{background:#1b1c1d;border:1px solid rgba(0,0,0,.2);border-radius:4px;padding:14px;font-family:monospace;font-size:12px;color:#b5cea8;max-height:400px;overflow-y:auto;white-space:pre-wrap;word-break:break-all}
/* Forms */
.form-row{margin-bottom:14px}
.form-row label{display:block;font-size:12px;color:var(--muted);margin-bottom:5px;font-weight:700}
.form-row input,.form-row select{width:100%;padding:8px 10px;background:var(--surface);border:1px solid var(--border);border-radius:4px;color:var(--ink);font-size:13px;outline:none;font-family:inherit}
.form-row input:focus,.form-row select:focus{border-color:var(--accent);box-shadow:0 0 0 2px rgba(33,133,208,.2)}
/* Grid */
.grid2{display:grid;grid-template-columns:1fr 1fr;gap:16px}
.grid3{display:grid;grid-template-columns:1fr 1fr 1fr;gap:16px}
@media(max-width:900px){.grid2,.grid3{grid-template-columns:1fr}}
/* iframe viewer */
.viewer-frame{width:100%;height:calc(100vh - 160px);border:0;border-radius:4px;background:var(--surface-2)}
/* Alerts */
.alert{padding:10px 14px;border-radius:4px;font-size:13px;margin-bottom:14px}
.alert-info{background:#dff0ff;border:1px solid #a9d4f5;color:#1e3a5f}
.alert-warn{background:var(--warn-bg);border:1px solid #c9ba9b;color:var(--warn-fg)}
</style>
</head>
<body>
`
}
func layoutNav(active string) string {
items := []struct{ id, label, href string }{
{"dashboard", "Dashboard", "/"},
{"viewer", "Audit Snapshot", "/viewer"},
{"metrics", "Metrics", "/metrics"},
{"tests", "Acceptance Tests", "/tests"},
{"burn-in", "Burn-in", "/burn-in"},
{"network", "Network", "/network"},
{"services", "Services", "/services"},
{"export", "Export", "/export"},
{"tools", "Tools", "/tools"},
{"install", "Install to Disk", "/install"},
}
var b strings.Builder
b.WriteString(`<aside class="sidebar">`)
b.WriteString(`<div class="sidebar-logo">bee<span>hardware audit</span></div>`)
b.WriteString(`<nav class="nav">`)
for _, item := range items {
cls := "nav-item"
if item.id == active {
cls += " active"
}
b.WriteString(fmt.Sprintf(`<a class="%s" href="%s">%s</a>`,
cls, item.href, item.label))
}
b.WriteString(`</nav></aside>`)
return b.String()
}
// renderPage dispatches to the appropriate page renderer.
func renderPage(page string, opts HandlerOptions) string {
var pageID, title, body string
switch page {
case "dashboard", "":
pageID = "dashboard"
title = "Dashboard"
body = renderDashboard(opts)
case "metrics":
pageID = "metrics"
title = "Live Metrics"
body = renderMetrics()
case "tests":
pageID = "tests"
title = "Acceptance Tests"
body = renderTests()
case "burn-in":
pageID = "burn-in"
title = "Burn-in Tests"
body = renderBurnIn()
case "network":
pageID = "network"
title = "Network"
body = renderNetwork()
case "services":
pageID = "services"
title = "Services"
body = renderServices()
case "export":
pageID = "export"
title = "Export"
body = renderExport(opts.ExportDir)
case "tools":
pageID = "tools"
title = "Tools"
body = renderTools()
case "install":
pageID = "install"
title = "Install to Disk"
body = renderInstall()
default:
pageID = "dashboard"
title = "Not Found"
body = `<div class="alert alert-warn">Page not found.</div>`
}
return layoutHead(opts.Title+" — "+title) +
layoutNav(pageID) +
`<div class="main"><div class="topbar"><h1>` + html.EscapeString(title) + `</h1></div><div class="content">` +
body +
`</div></div></body></html>`
}
// ── Dashboard ─────────────────────────────────────────────────────────────────
func renderDashboard(opts HandlerOptions) string {
var b strings.Builder
b.WriteString(`<div class="grid2">`)
// Left: health summary
b.WriteString(`<div>`)
b.WriteString(renderHealthCard(opts))
b.WriteString(`</div>`)
// Right: quick actions
b.WriteString(`<div>`)
b.WriteString(`<div class="card"><div class="card-head">Quick Actions</div><div class="card-body">`)
b.WriteString(`<a class="btn btn-primary" href="/export/support.tar.gz" style="display:block;margin-bottom:10px">⬇ Download Support Bundle</a>`)
b.WriteString(`<a class="btn btn-secondary" href="/audit.json" style="display:block;margin-bottom:10px" target="_blank">📄 Open audit.json</a>`)
b.WriteString(`<a class="btn btn-secondary" href="/export/" style="display:block">📁 Browse Export Files</a>`)
b.WriteString(`<div style="margin-top:14px"><button class="btn btn-secondary" onclick="runAudit()">▶ Re-run Audit</button></div>`)
b.WriteString(`</div></div>`)
b.WriteString(`</div>`)
b.WriteString(`</div>`)
// Audit run output div
b.WriteString(`<div id="audit-output" style="display:none" class="card"><div class="card-head">Audit Output</div><div class="card-body"><div id="audit-terminal" class="terminal"></div></div></div>`)
b.WriteString(`<script>
function runAudit() {
document.getElementById('audit-output').style.display='block';
const term = document.getElementById('audit-terminal');
term.textContent = 'Starting audit...\n';
fetch('/api/audit/run', {method:'POST'})
.then(r => r.json())
.then(d => {
const es = new EventSource('/api/audit/stream?job_id=' + d.job_id);
es.onmessage = e => { term.textContent += e.data + '\n'; term.scrollTop = term.scrollHeight; };
es.addEventListener('done', e => { es.close(); term.textContent += (e.data ? '\\nERROR: ' + e.data : '\\nDone.') + '\n'; location.reload(); });
});
}
</script>`)
return b.String()
}
func renderHealthCard(opts HandlerOptions) string {
data, err := loadSnapshot(filepath.Join(opts.ExportDir, "runtime-health.json"))
if err != nil {
return `<div class="card"><div class="card-head">Runtime Health</div><div class="card-body"><span class="badge badge-unknown">No data</span></div></div>`
}
var health map[string]any
if err := json.Unmarshal(data, &health); err != nil {
return `<div class="card"><div class="card-head">Runtime Health</div><div class="card-body"><span class="badge badge-err">Parse error</span></div></div>`
}
status := fmt.Sprintf("%v", health["status"])
badge := "badge-ok"
if status == "PARTIAL" {
badge = "badge-warn"
} else if status == "FAIL" || status == "FAILED" {
badge = "badge-err"
}
var b strings.Builder
b.WriteString(`<div class="card"><div class="card-head">Runtime Health</div><div class="card-body">`)
b.WriteString(fmt.Sprintf(`<div style="margin-bottom:10px"><span class="badge %s">%s</span></div>`, badge, html.EscapeString(status)))
if issues, ok := health["issues"].([]any); ok && len(issues) > 0 {
b.WriteString(`<div style="font-size:12px;color:#f87171">Issues:<br>`)
for _, issue := range issues {
if m, ok := issue.(map[string]any); ok {
b.WriteString(html.EscapeString(fmt.Sprintf("%v: %v", m["code"], m["message"])) + "<br>")
}
}
b.WriteString(`</div>`)
}
b.WriteString(`</div></div>`)
return b.String()
}
// ── Metrics ───────────────────────────────────────────────────────────────────
func renderMetrics() string {
return `<p style="color:var(--muted);font-size:13px;margin-bottom:16px">Live metrics — updated every 2 seconds. Charts use go-analyze/charts (grafana theme).</p>
<div class="card" style="margin-bottom:16px">
<div class="card-head">Server</div>
<div class="card-body" style="padding:8px">
<img id="chart-server" src="/api/metrics/chart/server.svg" style="width:100%;display:block;border-radius:6px" alt="Server metrics">
<div id="sys-table" style="margin-top:8px;font-size:12px"></div>
</div>
</div>
<div id="gpu-charts"></div>
<script>
let knownGPUs = [];
function refreshCharts() {
const t = '?t=' + Date.now();
const srv = document.getElementById('chart-server');
if (srv) srv.src = srv.src.split('?')[0] + t;
knownGPUs.forEach(idx => {
const el = document.getElementById('chart-gpu-' + idx);
if (el) el.src = el.src.split('?')[0] + t;
});
}
setInterval(refreshCharts, 2000);
const es = new EventSource('/api/metrics/stream');
es.addEventListener('metrics', e => {
const d = JSON.parse(e.data);
// Add GPU chart cards as GPUs appear
(d.gpus||[]).forEach(g => {
if (knownGPUs.includes(g.index)) return;
knownGPUs.push(g.index);
const div = document.createElement('div');
div.className = 'card';
div.style.marginBottom = '16px';
div.innerHTML = '<div class="card-head">GPU ' + g.index + '</div>' +
'<div class="card-body" style="padding:8px">' +
'<img id="chart-gpu-' + g.index + '" src="/api/metrics/chart/gpu/' + g.index + '.svg" style="width:100%;display:block;border-radius:6px" alt="GPU ' + g.index + '">' +
'<div id="gpu-table-' + g.index + '" style="margin-top:8px;font-size:12px"></div>' +
'</div>';
document.getElementById('gpu-charts').appendChild(div);
});
// Update numeric tables
let sysHTML = '';
const cpuTemp = (d.temps||[]).find(t => t.name==='CPU');
if (cpuTemp) sysHTML += '<tr><td>CPU Temp</td><td>'+cpuTemp.celsius.toFixed(1)+'°C</td></tr>';
if (d.cpu_load_pct) sysHTML += '<tr><td>CPU Load</td><td>'+d.cpu_load_pct.toFixed(1)+'%</td></tr>';
if (d.mem_load_pct) sysHTML += '<tr><td>Mem Load</td><td>'+d.mem_load_pct.toFixed(1)+'%</td></tr>';
(d.fans||[]).forEach(f => sysHTML += '<tr><td>'+f.name+'</td><td>'+f.rpm+' RPM</td></tr>');
if (d.power_w) sysHTML += '<tr><td>Power</td><td>'+d.power_w.toFixed(0)+' W</td></tr>';
const st = document.getElementById('sys-table');
if (st) st.innerHTML = sysHTML ? '<table>'+sysHTML+'</table>' : '<p style="color:var(--muted)">No sensor data (ipmitool/sensors required)</p>';
(d.gpus||[]).forEach(g => {
const t = document.getElementById('gpu-table-' + g.index);
if (!t) return;
t.innerHTML = '<table>' +
'<tr><td>Temp</td><td>'+g.temp_c+'°C</td>' +
'<td>Load</td><td>'+g.usage_pct+'%</td>' +
'<td>Mem</td><td>'+g.mem_usage_pct+'%</td>' +
'<td>Power</td><td>'+g.power_w+' W</td></tr></table>';
});
});
es.onerror = () => {};
</script>`
}
// ── Acceptance Tests ──────────────────────────────────────────────────────────
func renderTests() string {
return `<p style="color:var(--muted);font-size:13px;margin-bottom:16px">Run hardware acceptance tests and view results.</p>
<div class="grid2">
` + renderSATCard("nvidia", "NVIDIA GPU", `<div class="form-row"><label>Diag Level</label><select id="sat-nvidia-level"><option value="1">Level 1 — Quick</option><option value="2">Level 2 — Standard</option><option value="3">Level 3 — Extended</option><option value="4">Level 4 — Full</option></select></div>`) +
renderSATCard("memory", "Memory", "") +
renderSATCard("storage", "Storage", "") +
renderSATCard("cpu", "CPU", `<div class="form-row"><label>Duration (seconds)</label><input type="number" id="sat-cpu-dur" value="60" min="10"></div>`) +
`</div>
<div id="sat-output" style="display:none;margin-top:16px" class="card">
<div class="card-head">Test Output <span id="sat-title"></span></div>
<div class="card-body"><div id="sat-terminal" class="terminal"></div></div>
</div>
<script>
let satES = null;
function runSAT(target) {
if (satES) satES.close();
const body = {};
if (target === 'nvidia') body.diag_level = parseInt(document.getElementById('sat-nvidia-level').value)||1;
if (target === 'cpu') body.duration = parseInt(document.getElementById('sat-cpu-dur').value)||60;
document.getElementById('sat-output').style.display='block';
document.getElementById('sat-title').textContent = '— ' + target;
const term = document.getElementById('sat-terminal');
term.textContent = 'Starting ' + target + ' test...\n';
fetch('/api/sat/'+target+'/run', {method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify(body)})
.then(r => r.json())
.then(d => {
satES = new EventSource('/api/sat/stream?job_id='+d.job_id);
satES.onmessage = e => { term.textContent += e.data+'\n'; term.scrollTop=term.scrollHeight; };
satES.addEventListener('done', e => { satES.close(); term.textContent += (e.data ? '\nERROR: '+e.data : '\nCompleted.')+'\n'; });
});
}
</script>`
}
func renderSATCard(id, label, extra string) string {
return fmt.Sprintf(`<div class="card"><div class="card-head">%s</div><div class="card-body">%s<button class="btn btn-primary" onclick="runSAT('%s')">▶ Run Test</button></div></div>`,
label, extra, id)
}
// ── Burn-in ───────────────────────────────────────────────────────────────────
func renderBurnIn() string {
return `<p style="color:var(--muted);font-size:13px;margin-bottom:16px">Long-running GPU and system stress tests. Check <a href="/metrics" style="color:var(--accent)">Metrics</a> page for live telemetry.</p>
<div class="grid2">
<div class="card"><div class="card-head">GPU Platform Stress</div><div class="card-body">
<div class="form-row"><label>Duration</label><select id="bi-dur"><option value="600">10 minutes</option><option value="3600">1 hour</option><option value="28800">8 hours</option><option value="86400">24 hours</option></select></div>
<button class="btn btn-primary" onclick="runBurnIn('nvidia')">▶ Start GPU Stress</button>
</div></div>
<div class="card"><div class="card-head">CPU Stress</div><div class="card-body">
<div class="form-row"><label>Duration (seconds)</label><input type="number" id="bi-cpu-dur" value="300" min="60"></div>
<button class="btn btn-primary" onclick="runBurnIn('cpu')">▶ Start CPU Stress</button>
</div></div>
</div>
<div id="bi-output" style="display:none;margin-top:16px" class="card">
<div class="card-head">Output</div>
<div class="card-body"><div id="bi-terminal" class="terminal"></div></div>
</div>
<script>
let biES = null;
function runBurnIn(target) {
if (biES) biES.close();
const body = {};
if (target === 'nvidia') body.duration = parseInt(document.getElementById('bi-dur').value)||600;
if (target === 'cpu') body.duration = parseInt(document.getElementById('bi-cpu-dur').value)||300;
document.getElementById('bi-output').style.display='block';
const term = document.getElementById('bi-terminal');
term.textContent = 'Starting ' + target + ' burn-in...\n';
fetch('/api/sat/'+target+'/run', {method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify(body)})
.then(r => r.json())
.then(d => {
biES = new EventSource('/api/sat/stream?job_id='+d.job_id);
biES.onmessage = e => { term.textContent += e.data+'\n'; term.scrollTop=term.scrollHeight; };
biES.addEventListener('done', e => { biES.close(); term.textContent += (e.data ? '\nERROR: '+e.data : '\nCompleted.')+'\n'; });
});
}
</script>`
}
// ── Network ───────────────────────────────────────────────────────────────────
func renderNetwork() string {
return `<div class="card"><div class="card-head">Network Interfaces</div><div class="card-body">
<div id="iface-table"><p style="color:var(--muted);font-size:13px">Loading...</p></div>
</div></div>
<div class="grid2">
<div class="card"><div class="card-head">DHCP</div><div class="card-body">
<div class="form-row"><label>Interface (leave empty for all)</label><input type="text" id="dhcp-iface" placeholder="eth0"></div>
<button class="btn btn-primary" onclick="runDHCP()">▶ Run DHCP</button>
<div id="dhcp-out" style="margin-top:10px;font-size:12px;color:var(--ok-fg)"></div>
</div></div>
<div class="card"><div class="card-head">Static IPv4</div><div class="card-body">
<div class="form-row"><label>Interface</label><input type="text" id="st-iface" placeholder="eth0"></div>
<div class="form-row"><label>Address</label><input type="text" id="st-addr" placeholder="192.168.1.100"></div>
<div class="form-row"><label>Prefix length</label><input type="text" id="st-prefix" placeholder="24"></div>
<div class="form-row"><label>Gateway</label><input type="text" id="st-gw" placeholder="192.168.1.1"></div>
<div class="form-row"><label>DNS (comma-separated)</label><input type="text" id="st-dns" placeholder="8.8.8.8,8.8.4.4"></div>
<button class="btn btn-primary" onclick="setStatic()">Apply Static IP</button>
<div id="static-out" style="margin-top:10px;font-size:12px;color:var(--ok-fg)"></div>
</div></div>
</div>
<script>
function loadNetwork() {
fetch('/api/network').then(r=>r.json()).then(d => {
const rows = (d.interfaces||[]).map(i =>
'<tr><td>'+i.Name+'</td><td><span class="badge '+(i.State==='up'?'badge-ok':'badge-warn')+'">'+i.State+'</span></td><td>'+(i.IPv4||[]).join(', ')+'</td></tr>'
).join('');
document.getElementById('iface-table').innerHTML =
'<table><tr><th>Interface</th><th>State</th><th>Addresses</th></tr>'+rows+'</table>' +
(d.default_route ? '<p style="font-size:12px;color:var(--muted);margin-top:8px">Default route: '+d.default_route+'</p>' : '');
});
}
function runDHCP() {
const iface = document.getElementById('dhcp-iface').value.trim();
fetch('/api/network/dhcp',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify({interface:iface||'all'})})
.then(r=>r.json()).then(d => {
document.getElementById('dhcp-out').textContent = d.output || d.error || 'Done.';
loadNetwork();
});
}
function setStatic() {
const dns = document.getElementById('st-dns').value.split(',').map(s=>s.trim()).filter(Boolean);
fetch('/api/network/static',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify({
interface: document.getElementById('st-iface').value,
address: document.getElementById('st-addr').value,
prefix: document.getElementById('st-prefix').value,
gateway: document.getElementById('st-gw').value,
dns: dns,
})}).then(r=>r.json()).then(d => {
document.getElementById('static-out').textContent = d.output || d.error || 'Done.';
loadNetwork();
});
}
loadNetwork();
</script>`
}
// ── Services ──────────────────────────────────────────────────────────────────
func renderServices() string {
return `<div class="card"><div class="card-head">Bee Services <button class="btn btn-sm btn-secondary" onclick="loadServices()" style="margin-left:auto">↻ Refresh</button></div>
<div class="card-body">
<div id="svc-table"><p style="color:var(--muted);font-size:13px">Loading...</p></div>
</div></div>
<div id="svc-out" style="display:none;margin-top:8px" class="card">
<div class="card-head">Output</div>
<div class="card-body" style="padding:10px"><div id="svc-terminal" class="terminal" style="max-height:150px"></div></div>
</div>
<script>
function loadServices() {
fetch('/api/services').then(r=>r.json()).then(svcs => {
const rows = svcs.map(s => {
const st = s.state||'unknown';
const badge = st==='active' ? 'badge-ok' : st==='failed' ? 'badge-err' : 'badge-warn';
const id = 'svc-body-'+s.name.replace(/[^a-z0-9]/g,'-');
const body = (s.body||'').replace(/</g,'&lt;').replace(/>/g,'&gt;');
return '<tr>' +
'<td style="white-space:nowrap">'+s.name+'</td>' +
'<td style="white-space:nowrap"><span class="badge '+badge+'" style="cursor:pointer" onclick="toggleBody(\''+id+'\')">'+st+' ▾</span>' +
'<div id="'+id+'" style="display:none;margin-top:6px"><pre style="font-size:11px;white-space:pre-wrap;word-break:break-all;max-height:200px;overflow-y:auto;background:#1b1c1d;padding:8px;border-radius:4px;color:#b5cea8">'+body+'</pre></div>' +
'</td>' +
'<td style="white-space:nowrap">' +
'<button class="btn btn-sm btn-secondary" onclick="svcAction(\''+s.name+'\',\'start\')">Start</button> ' +
'<button class="btn btn-sm btn-secondary" onclick="svcAction(\''+s.name+'\',\'stop\')">Stop</button> ' +
'<button class="btn btn-sm btn-secondary" onclick="svcAction(\''+s.name+'\',\'restart\')">Restart</button>' +
'</td></tr>';
}).join('');
document.getElementById('svc-table').innerHTML =
'<table><tr><th>Service</th><th>Status</th><th>Actions</th></tr>'+rows+'</table>';
});
}
function toggleBody(id) {
const el = document.getElementById(id);
if (el) el.style.display = el.style.display==='none' ? 'block' : 'none';
}
function svcAction(name, action) {
fetch('/api/services/action',{method:'POST',headers:{'Content-Type':'application/json'},body:JSON.stringify({name,action})})
.then(r=>r.json()).then(d => {
document.getElementById('svc-out').style.display='block';
document.getElementById('svc-terminal').textContent = d.output || d.error || action+' '+name;
setTimeout(loadServices, 1000);
});
}
loadServices();
</script>`
}
// ── Export ────────────────────────────────────────────────────────────────────
func renderExport(exportDir string) string {
entries, _ := listExportFiles(exportDir)
var rows strings.Builder
for _, e := range entries {
rows.WriteString(fmt.Sprintf(`<tr><td><a href="/export/file?path=%s" target="_blank">%s</a></td></tr>`,
url.QueryEscape(e), html.EscapeString(e)))
}
if len(entries) == 0 {
rows.WriteString(`<tr><td style="color:var(--muted)">No export files found.</td></tr>`)
}
return `<div class="grid2">
<div class="card"><div class="card-head">Support Bundle</div><div class="card-body">
<p style="font-size:13px;color:var(--muted);margin-bottom:12px">Creates a tar.gz archive of all audit files, SAT results, and logs.</p>
<a class="btn btn-primary" href="/export/support.tar.gz">⬇ Download Support Bundle</a>
</div></div>
<div class="card"><div class="card-head">Export Files</div><div class="card-body">
<table><tr><th>File</th></tr>` + rows.String() + `</table>
</div></div>
</div>`
}
func listExportFiles(exportDir string) ([]string, error) {
var entries []string
err := filepath.Walk(strings.TrimSpace(exportDir), func(path string, info os.FileInfo, err error) error {
if err != nil {
return err
}
if info.IsDir() {
return nil
}
rel, err := filepath.Rel(exportDir, path)
if err != nil {
return err
}
entries = append(entries, rel)
return nil
})
if err != nil && !os.IsNotExist(err) {
return nil, err
}
sort.Strings(entries)
return entries, nil
}
// ── Tools ─────────────────────────────────────────────────────────────────────
func renderTools() string {
return `<div class="card"><div class="card-head">Tool Check <button class="btn btn-sm btn-secondary" onclick="checkTools()" style="margin-left:auto">↻ Check</button></div>
<div class="card-body"><div id="tools-table"><p style="color:var(--muted);font-size:13px">Click Check to verify installed tools.</p></div></div></div>
<script>
function checkTools() {
document.getElementById('tools-table').innerHTML = '<p style="color:var(--muted);font-size:13px">Checking...</p>';
fetch('/api/tools/check').then(r=>r.json()).then(tools => {
const rows = tools.map(t =>
'<tr><td>'+t.Name+'</td><td><span class="badge '+(t.OK ? 'badge-ok' : 'badge-err')+'">'+(t.OK ? '✓ '+t.Path : '✗ missing')+'</span></td></tr>'
).join('');
document.getElementById('tools-table').innerHTML =
'<table><tr><th>Tool</th><th>Status</th></tr>'+rows+'</table>';
});
}
checkTools();
</script>`
}
// ── Install to Disk ──────────────────────────────────────────────────────────
func renderInstall() string {
return `
<div class="card">
<div class="card-head">Install Live System to Disk</div>
<div class="card-body">
<div class="alert alert-warn" style="margin-bottom:16px">
<strong>Warning:</strong> Installing will <strong>completely erase</strong> the selected
disk and write the live system onto it. All existing data on the target disk will be lost.
This operation cannot be undone.
</div>
<div id="install-loading" style="color:var(--muted);font-size:13px">Loading disk list…</div>
<div id="install-disk-section" style="display:none">
<div class="card" style="margin-bottom:0">
<table id="install-disk-table">
<thead><tr><th></th><th>Device</th><th>Model</th><th>Size</th><th>Status</th></tr></thead>
<tbody id="install-disk-tbody"></tbody>
</table>
</div>
<div style="margin-top:12px">
<button class="btn btn-secondary btn-sm" onclick="installRefreshDisks()">↻ Refresh</button>
</div>
</div>
<div id="install-confirm-section" style="display:none;margin-top:20px">
<div id="install-confirm-warn" class="alert" style="background:#fff6f6;border:1px solid #e0b4b4;color:#9f3a38;font-size:13px"></div>
<div class="form-row" style="max-width:360px">
<label>Type the device name to confirm (e.g. /dev/sda)</label>
<input type="text" id="install-confirm-input" placeholder="/dev/..." oninput="installCheckConfirm()" autocomplete="off" spellcheck="false">
</div>
<button class="btn btn-danger" id="install-start-btn" disabled onclick="installStart()">Install to Disk</button>
<button class="btn btn-secondary" style="margin-left:8px" onclick="installDeselect()">Cancel</button>
</div>
<div id="install-progress-section" style="display:none;margin-top:20px">
<div class="card-head" style="margin-bottom:8px">Installation Progress</div>
<div id="install-terminal" class="terminal" style="max-height:500px"></div>
<div id="install-status" style="margin-top:12px;font-size:13px"></div>
</div>
</div>
</div>
<style>
#install-disk-tbody tr{cursor:pointer}
#install-disk-tbody tr.selected td{background:rgba(33,133,208,.1)}
#install-disk-tbody tr:hover td{background:rgba(33,133,208,.07)}
</style>
<script>
var _installSelected = null;
function installRefreshDisks() {
document.getElementById('install-loading').style.display = '';
document.getElementById('install-disk-section').style.display = 'none';
document.getElementById('install-confirm-section').style.display = 'none';
_installSelected = null;
fetch('/api/install/disks').then(function(r){ return r.json(); }).then(function(disks){
document.getElementById('install-loading').style.display = 'none';
var tbody = document.getElementById('install-disk-tbody');
tbody.innerHTML = '';
if (!disks || disks.length === 0) {
tbody.innerHTML = '<tr><td colspan="5" style="color:var(--muted);text-align:center">No installable disks found</td></tr>';
} else {
disks.forEach(function(d) {
var warnings = (d.warnings || []);
var statusHtml;
if (warnings.length === 0) {
statusHtml = '<span class="badge badge-ok">OK</span>';
} else {
var hasSmall = warnings.some(function(w){ return w.indexOf('too small') >= 0; });
statusHtml = warnings.map(function(w){
var cls = hasSmall ? 'badge-err' : 'badge-warn';
return '<span class="badge ' + cls + '" title="' + w.replace(/"/g,'&quot;') + '">' +
(w.length > 40 ? w.substring(0,38)+'…' : w) + '</span>';
}).join(' ');
}
var mountedNote = (d.mounted_parts && d.mounted_parts.length > 0)
? ' <span style="color:var(--warn-fg);font-size:11px">(mounted)</span>' : '';
var tr = document.createElement('tr');
tr.dataset.device = d.device;
tr.dataset.model = d.model || 'Unknown';
tr.dataset.size = d.size;
tr.dataset.warnings = JSON.stringify(warnings);
tr.innerHTML =
'<td><input type="radio" name="install-disk" value="' + d.device + '"></td>' +
'<td><code>' + d.device + '</code>' + mountedNote + '</td>' +
'<td>' + (d.model || '—') + '</td>' +
'<td>' + d.size + '</td>' +
'<td>' + statusHtml + '</td>';
tr.addEventListener('click', function(){ installSelectDisk(this); });
tbody.appendChild(tr);
});
}
document.getElementById('install-disk-section').style.display = '';
}).catch(function(e){
document.getElementById('install-loading').textContent = 'Failed to load disk list: ' + e;
});
}
function installSelectDisk(tr) {
document.querySelectorAll('#install-disk-tbody tr').forEach(function(r){ r.classList.remove('selected'); });
tr.classList.add('selected');
var radio = tr.querySelector('input[type=radio]');
if (radio) radio.checked = true;
_installSelected = {
device: tr.dataset.device,
model: tr.dataset.model,
size: tr.dataset.size,
warnings: JSON.parse(tr.dataset.warnings || '[]')
};
var warnBox = document.getElementById('install-confirm-warn');
var warnLines = '<strong>⚠ DANGER:</strong> ' + _installSelected.device +
' (' + _installSelected.model + ', ' + _installSelected.size + ')' +
' will be <strong>completely erased</strong> and repartitioned. All data will be lost.<br>';
if (_installSelected.warnings.length > 0) {
warnLines += '<br>' + _installSelected.warnings.map(function(w){ return '• ' + w; }).join('<br>');
}
warnBox.innerHTML = warnLines;
document.getElementById('install-confirm-input').value = '';
document.getElementById('install-start-btn').disabled = true;
document.getElementById('install-confirm-section').style.display = '';
document.getElementById('install-progress-section').style.display = 'none';
}
function installDeselect() {
_installSelected = null;
document.querySelectorAll('#install-disk-tbody tr').forEach(function(r){ r.classList.remove('selected'); });
document.querySelectorAll('#install-disk-tbody input[type=radio]').forEach(function(r){ r.checked = false; });
document.getElementById('install-confirm-section').style.display = 'none';
}
function installCheckConfirm() {
var val = document.getElementById('install-confirm-input').value.trim();
var ok = _installSelected && val === _installSelected.device;
document.getElementById('install-start-btn').disabled = !ok;
}
function installStart() {
if (!_installSelected) return;
document.getElementById('install-confirm-section').style.display = 'none';
document.getElementById('install-disk-section').style.display = 'none';
document.getElementById('install-loading').style.display = 'none';
var prog = document.getElementById('install-progress-section');
var term = document.getElementById('install-terminal');
var status = document.getElementById('install-status');
prog.style.display = '';
term.textContent = '';
status.textContent = 'Starting installation…';
status.style.color = 'var(--muted)';
fetch('/api/install/run', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({device: _installSelected.device})
}).then(function(r){
if (r.status === 204) {
installStreamLog();
} else {
return r.json().then(function(j){ throw new Error(j.error || r.statusText); });
}
}).catch(function(e){
status.textContent = 'Error: ' + e;
status.style.color = 'var(--crit-fg)';
});
}
function installStreamLog() {
var term = document.getElementById('install-terminal');
var status = document.getElementById('install-status');
var es = new EventSource('/api/install/stream');
es.onmessage = function(e) {
term.textContent += e.data + '\n';
term.scrollTop = term.scrollHeight;
};
es.addEventListener('done', function(e) {
es.close();
if (!e.data) {
status.innerHTML = '<span style="color:var(--ok-fg);font-weight:700">✓ Installation complete.</span> Remove the ISO and reboot.';
var rebootBtn = document.createElement('button');
rebootBtn.className = 'btn btn-primary btn-sm';
rebootBtn.style.marginLeft = '12px';
rebootBtn.textContent = 'Reboot now';
rebootBtn.onclick = function(){
fetch('/api/services/action', {method:'POST',headers:{'Content-Type':'application/json'},
body: JSON.stringify({name:'', action:'reboot'})});
};
status.appendChild(rebootBtn);
} else {
status.textContent = '✗ Installation failed: ' + e.data;
status.style.color = 'var(--crit-fg)';
}
});
es.onerror = function() {
es.close();
status.textContent = '✗ Stream disconnected.';
status.style.color = 'var(--crit-fg)';
};
}
// Auto-load on page open.
installRefreshDisks();
</script>
`
}
func renderExportIndex(exportDir string) (string, error) {
entries, err := listExportFiles(exportDir)
if err != nil {
return "", err
}
var body strings.Builder
body.WriteString(`<!DOCTYPE html><html><head><meta charset="utf-8"><title>Bee Export Files</title></head><body>`)
body.WriteString(`<h1>Bee Export Files</h1><ul>`)
for _, entry := range entries {
body.WriteString(`<li><a href="/export/file?path=` + url.QueryEscape(entry) + `">` + html.EscapeString(entry) + `</a></li>`)
}
if len(entries) == 0 {
body.WriteString(`<li>No export files found.</li>`)
}
body.WriteString(`</ul></body></html>`)
return body.String(), nil
}

View File

@@ -1,138 +1,405 @@
package webui
import (
"encoding/json"
"errors"
"fmt"
"html"
"net/http"
"net/url"
"os"
"path/filepath"
"sort"
"strings"
"sync"
"time"
"bee/audit/internal/app"
"bee/audit/internal/runtimeenv"
gocharts "github.com/go-analyze/charts"
"reanimator/chart/viewer"
chartweb "reanimator/chart/web"
"reanimator/chart/web"
)
const defaultTitle = "Bee Hardware Audit"
// HandlerOptions configures the web UI handler.
type HandlerOptions struct {
Title string
AuditPath string
ExportDir string
Title string
AuditPath string
ExportDir string
App *app.App
RuntimeMode runtimeenv.Mode
}
// metricsRing holds a rolling window of live metric samples.
type metricsRing struct {
mu sync.Mutex
vals []float64
labels []string
size int
}
func newMetricsRing(size int) *metricsRing {
return &metricsRing{size: size, vals: make([]float64, 0, size), labels: make([]string, 0, size)}
}
func (r *metricsRing) push(v float64) {
r.mu.Lock()
defer r.mu.Unlock()
if len(r.vals) >= r.size {
r.vals = r.vals[1:]
r.labels = r.labels[1:]
}
r.vals = append(r.vals, v)
r.labels = append(r.labels, time.Now().Format("15:04"))
}
func (r *metricsRing) snapshot() ([]float64, []string) {
r.mu.Lock()
defer r.mu.Unlock()
v := make([]float64, len(r.vals))
l := make([]string, len(r.labels))
copy(v, r.vals)
copy(l, r.labels)
return v, l
}
// gpuRings holds per-GPU ring buffers.
type gpuRings struct {
Temp *metricsRing
Util *metricsRing
MemUtil *metricsRing
Power *metricsRing
}
// handler is the HTTP handler for the web UI.
type handler struct {
opts HandlerOptions
mux *http.ServeMux
// server rings
ringCPUTemp *metricsRing
ringCPULoad *metricsRing
ringMemLoad *metricsRing
ringPower *metricsRing
ringFans []*metricsRing
fanNames []string
// per-GPU rings (index = GPU index)
gpuRings []*gpuRings
ringsMu sync.Mutex
// install job (at most one at a time)
installJob *jobState
installMu sync.Mutex
}
// NewHandler creates the HTTP mux with all routes.
func NewHandler(opts HandlerOptions) http.Handler {
title := strings.TrimSpace(opts.Title)
if title == "" {
title = defaultTitle
if strings.TrimSpace(opts.Title) == "" {
opts.Title = defaultTitle
}
if strings.TrimSpace(opts.ExportDir) == "" {
opts.ExportDir = app.DefaultExportDir
}
if opts.RuntimeMode == "" {
opts.RuntimeMode = runtimeenv.ModeAuto
}
auditPath := strings.TrimSpace(opts.AuditPath)
exportDir := strings.TrimSpace(opts.ExportDir)
if exportDir == "" {
exportDir = app.DefaultExportDir
h := &handler{
opts: opts,
ringCPUTemp: newMetricsRing(120),
ringCPULoad: newMetricsRing(120),
ringMemLoad: newMetricsRing(120),
ringPower: newMetricsRing(120),
}
mux := http.NewServeMux()
mux.Handle("GET /static/", http.StripPrefix("/static/", chartweb.Static()))
mux.HandleFunc("GET /healthz", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Cache-Control", "no-store")
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte("ok"))
})
mux.HandleFunc("GET /audit.json", func(w http.ResponseWriter, r *http.Request) {
data, err := loadSnapshot(auditPath)
if err != nil {
if errors.Is(err, os.ErrNotExist) {
http.Error(w, "audit snapshot not found", http.StatusNotFound)
return
}
http.Error(w, fmt.Sprintf("read audit snapshot: %v", err), http.StatusInternalServerError)
return
}
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "application/json; charset=utf-8")
_, _ = w.Write(data)
})
mux.HandleFunc("GET /export/support.tar.gz", func(w http.ResponseWriter, r *http.Request) {
archive, err := app.BuildSupportBundle(exportDir)
if err != nil {
http.Error(w, fmt.Sprintf("build support bundle: %v", err), http.StatusInternalServerError)
return
}
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "application/gzip")
w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=%q", filepath.Base(archive)))
http.ServeFile(w, r, archive)
})
mux.HandleFunc("GET /runtime-health.json", func(w http.ResponseWriter, r *http.Request) {
data, err := loadSnapshot(filepath.Join(exportDir, "runtime-health.json"))
if err != nil {
if errors.Is(err, os.ErrNotExist) {
http.Error(w, "runtime health not found", http.StatusNotFound)
return
}
http.Error(w, fmt.Sprintf("read runtime health: %v", err), http.StatusInternalServerError)
return
}
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "application/json; charset=utf-8")
_, _ = w.Write(data)
})
mux.HandleFunc("GET /export/", func(w http.ResponseWriter, r *http.Request) {
body, err := renderExportIndex(exportDir)
if err != nil {
http.Error(w, fmt.Sprintf("render export index: %v", err), http.StatusInternalServerError)
return
}
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "text/html; charset=utf-8")
_, _ = w.Write([]byte(body))
})
mux.HandleFunc("GET /export/file", func(w http.ResponseWriter, r *http.Request) {
rel := strings.TrimSpace(r.URL.Query().Get("path"))
if rel == "" {
http.Error(w, "path is required", http.StatusBadRequest)
return
}
clean := filepath.Clean(rel)
if clean == "." || strings.HasPrefix(clean, "..") {
http.Error(w, "invalid path", http.StatusBadRequest)
return
}
http.ServeFile(w, r, filepath.Join(exportDir, clean))
})
mux.HandleFunc("GET /viewer", func(w http.ResponseWriter, r *http.Request) {
snapshot, err := loadSnapshot(auditPath)
if err != nil && !errors.Is(err, os.ErrNotExist) {
http.Error(w, fmt.Sprintf("read audit snapshot: %v", err), http.StatusInternalServerError)
return
}
html, err := viewer.RenderHTML(snapshot, title)
if err != nil {
http.Error(w, fmt.Sprintf("render snapshot: %v", err), http.StatusInternalServerError)
return
}
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "text/html; charset=utf-8")
_, _ = w.Write(html)
})
mux.HandleFunc("GET /", func(w http.ResponseWriter, r *http.Request) {
noticeTitle, noticeBody := runtimeNotice(filepath.Join(exportDir, "runtime-health.json"))
body := renderShellPage(title, noticeTitle, noticeBody)
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "text/html; charset=utf-8")
_, _ = w.Write([]byte(body))
})
// ── Infrastructure ──────────────────────────────────────────────────────
mux.HandleFunc("GET /healthz", h.handleHealthz)
// ── Existing read-only endpoints (preserved for compatibility) ──────────
mux.HandleFunc("GET /audit.json", h.handleAuditJSON)
mux.HandleFunc("GET /runtime-health.json", h.handleRuntimeHealthJSON)
mux.HandleFunc("GET /export/support.tar.gz", h.handleSupportBundleDownload)
mux.HandleFunc("GET /export/file", h.handleExportFile)
mux.HandleFunc("GET /export/", h.handleExportIndex)
mux.HandleFunc("GET /viewer", h.handleViewer)
// ── API ──────────────────────────────────────────────────────────────────
// Audit
mux.HandleFunc("POST /api/audit/run", h.handleAPIAuditRun)
mux.HandleFunc("GET /api/audit/stream", h.handleAPIAuditStream)
// SAT
mux.HandleFunc("POST /api/sat/nvidia/run", h.handleAPISATRun("nvidia"))
mux.HandleFunc("POST /api/sat/memory/run", h.handleAPISATRun("memory"))
mux.HandleFunc("POST /api/sat/storage/run", h.handleAPISATRun("storage"))
mux.HandleFunc("POST /api/sat/cpu/run", h.handleAPISATRun("cpu"))
mux.HandleFunc("GET /api/sat/stream", h.handleAPISATStream)
mux.HandleFunc("POST /api/sat/abort", h.handleAPISATAbort)
// Services
mux.HandleFunc("GET /api/services", h.handleAPIServicesList)
mux.HandleFunc("POST /api/services/action", h.handleAPIServicesAction)
// Network
mux.HandleFunc("GET /api/network", h.handleAPINetworkStatus)
mux.HandleFunc("POST /api/network/dhcp", h.handleAPINetworkDHCP)
mux.HandleFunc("POST /api/network/static", h.handleAPINetworkStatic)
// Export
mux.HandleFunc("GET /api/export/list", h.handleAPIExportList)
mux.HandleFunc("POST /api/export/bundle", h.handleAPIExportBundle)
// Tools
mux.HandleFunc("GET /api/tools/check", h.handleAPIToolsCheck)
// Preflight
mux.HandleFunc("GET /api/preflight", h.handleAPIPreflight)
// Install
mux.HandleFunc("GET /api/install/disks", h.handleAPIInstallDisks)
mux.HandleFunc("POST /api/install/run", h.handleAPIInstallRun)
mux.HandleFunc("GET /api/install/stream", h.handleAPIInstallStream)
// Metrics — SSE stream of live sensor data + server-side SVG charts
mux.HandleFunc("GET /api/metrics/stream", h.handleAPIMetricsStream)
mux.HandleFunc("GET /api/metrics/chart/", h.handleMetricsChartSVG)
// Reanimator chart static assets (viewer template expects /static/*)
mux.Handle("GET /static/", http.StripPrefix("/static/", web.Static()))
// ── Pages ────────────────────────────────────────────────────────────────
mux.HandleFunc("GET /", h.handlePage)
h.mux = mux
return mux
}
// ListenAndServe starts the HTTP server.
func ListenAndServe(addr string, opts HandlerOptions) error {
return http.ListenAndServe(addr, NewHandler(opts))
}
// ── Infrastructure handlers ──────────────────────────────────────────────────
func (h *handler) handleHealthz(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Cache-Control", "no-store")
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte("ok"))
}
// ── Compatibility endpoints ──────────────────────────────────────────────────
func (h *handler) handleAuditJSON(w http.ResponseWriter, r *http.Request) {
data, err := loadSnapshot(h.opts.AuditPath)
if err != nil {
if errors.Is(err, os.ErrNotExist) {
http.Error(w, "audit snapshot not found", http.StatusNotFound)
return
}
http.Error(w, fmt.Sprintf("read audit snapshot: %v", err), http.StatusInternalServerError)
return
}
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "application/json; charset=utf-8")
_, _ = w.Write(data)
}
func (h *handler) handleRuntimeHealthJSON(w http.ResponseWriter, r *http.Request) {
data, err := loadSnapshot(filepath.Join(h.opts.ExportDir, "runtime-health.json"))
if err != nil {
if errors.Is(err, os.ErrNotExist) {
http.Error(w, "runtime health not found", http.StatusNotFound)
return
}
http.Error(w, fmt.Sprintf("read runtime health: %v", err), http.StatusInternalServerError)
return
}
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "application/json; charset=utf-8")
_, _ = w.Write(data)
}
func (h *handler) handleSupportBundleDownload(w http.ResponseWriter, r *http.Request) {
archive, err := app.BuildSupportBundle(h.opts.ExportDir)
if err != nil {
http.Error(w, fmt.Sprintf("build support bundle: %v", err), http.StatusInternalServerError)
return
}
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "application/gzip")
w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=%q", filepath.Base(archive)))
http.ServeFile(w, r, archive)
}
func (h *handler) handleExportFile(w http.ResponseWriter, r *http.Request) {
rel := strings.TrimSpace(r.URL.Query().Get("path"))
if rel == "" {
http.Error(w, "path is required", http.StatusBadRequest)
return
}
clean := filepath.Clean(rel)
if clean == "." || strings.HasPrefix(clean, "..") {
http.Error(w, "invalid path", http.StatusBadRequest)
return
}
http.ServeFile(w, r, filepath.Join(h.opts.ExportDir, clean))
}
func (h *handler) handleExportIndex(w http.ResponseWriter, r *http.Request) {
body, err := renderExportIndex(h.opts.ExportDir)
if err != nil {
http.Error(w, fmt.Sprintf("render export index: %v", err), http.StatusInternalServerError)
return
}
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "text/html; charset=utf-8")
_, _ = w.Write([]byte(body))
}
func (h *handler) handleViewer(w http.ResponseWriter, r *http.Request) {
snapshot, _ := loadSnapshot(h.opts.AuditPath)
body, err := viewer.RenderHTML(snapshot, h.opts.Title)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "text/html; charset=utf-8")
_, _ = w.Write(body)
}
func (h *handler) handleMetricsChartSVG(w http.ResponseWriter, r *http.Request) {
path := strings.TrimPrefix(r.URL.Path, "/api/metrics/chart/")
path = strings.TrimSuffix(path, ".svg")
var datasets [][]float64
var names []string
var labels []string
var title string
switch {
case path == "server":
title = "Server"
vCPUTemp, l := h.ringCPUTemp.snapshot()
vCPULoad, _ := h.ringCPULoad.snapshot()
vMemLoad, _ := h.ringMemLoad.snapshot()
vPower, _ := h.ringPower.snapshot()
labels = l
datasets = [][]float64{vCPUTemp, vCPULoad, vMemLoad, vPower}
names = []string{"CPU Temp °C", "CPU Load %", "Mem Load %", "Power W"}
h.ringsMu.Lock()
for i, fr := range h.ringFans {
fv, _ := fr.snapshot()
datasets = append(datasets, fv)
name := "Fan"
if i < len(h.fanNames) {
name = h.fanNames[i]
}
names = append(names, name+" RPM")
}
h.ringsMu.Unlock()
case strings.HasPrefix(path, "gpu/"):
idxStr := strings.TrimPrefix(path, "gpu/")
idx := 0
fmt.Sscanf(idxStr, "%d", &idx)
h.ringsMu.Lock()
var gr *gpuRings
if idx < len(h.gpuRings) {
gr = h.gpuRings[idx]
}
h.ringsMu.Unlock()
if gr == nil {
http.NotFound(w, r)
return
}
vTemp, l := gr.Temp.snapshot()
vUtil, _ := gr.Util.snapshot()
vMemUtil, _ := gr.MemUtil.snapshot()
vPower, _ := gr.Power.snapshot()
labels = l
title = fmt.Sprintf("GPU %d", idx)
datasets = [][]float64{vTemp, vUtil, vMemUtil, vPower}
names = []string{"Temp °C", "Load %", "Mem %", "Power W"}
default:
http.NotFound(w, r)
return
}
// Ensure all datasets same length as labels
n := len(labels)
if n == 0 {
n = 1
labels = []string{""}
}
for i := range datasets {
if len(datasets[i]) == 0 {
datasets[i] = make([]float64, n)
}
}
sparse := sparseLabels(labels, 6)
opt := gocharts.NewLineChartOptionWithData(datasets)
opt.Title = gocharts.TitleOption{Text: title}
opt.XAxis.Labels = sparse
opt.Legend = gocharts.LegendOption{SeriesNames: names}
p := gocharts.NewPainter(gocharts.PainterOptions{
OutputFormat: gocharts.ChartOutputSVG,
Width: 1400,
Height: 280,
}, gocharts.PainterThemeOption(gocharts.GetTheme("grafana")))
if err := p.LineChart(opt); err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
buf, err := p.Bytes()
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "image/svg+xml")
w.Header().Set("Cache-Control", "no-store")
_, _ = w.Write(buf)
}
func safeIdx(s []float64, i int) float64 {
if i < len(s) {
return s[i]
}
return 0
}
func sparseLabels(labels []string, n int) []string {
out := make([]string, len(labels))
step := len(labels) / n
if step < 1 {
step = 1
}
for i, l := range labels {
if i%step == 0 {
out[i] = l
}
}
return out
}
// ── Page handler ─────────────────────────────────────────────────────────────
func (h *handler) handlePage(w http.ResponseWriter, r *http.Request) {
page := strings.TrimPrefix(r.URL.Path, "/")
if page == "" {
page = "dashboard"
}
body := renderPage(page, h.opts)
w.Header().Set("Cache-Control", "no-store")
w.Header().Set("Content-Type", "text/html; charset=utf-8")
_, _ = w.Write([]byte(body))
}
// ── Helpers ──────────────────────────────────────────────────────────────────
func loadSnapshot(path string) ([]byte, error) {
if strings.TrimSpace(path) == "" {
return nil, os.ErrNotExist
@@ -140,101 +407,17 @@ func loadSnapshot(path string) ([]byte, error) {
return os.ReadFile(path)
}
func runtimeNotice(path string) (string, string) {
health, err := app.ReadRuntimeHealth(path)
if err != nil {
return "Runtime Health", "No runtime health snapshot found yet."
}
body := fmt.Sprintf("Status: %s. Export dir: %s. Driver ready: %t. CUDA ready: %t. Network: %s. Export files: /export/",
firstNonEmpty(health.Status, "UNKNOWN"),
firstNonEmpty(health.ExportDir, app.DefaultExportDir),
health.DriverReady,
health.CUDAReady,
firstNonEmpty(health.NetworkStatus, "UNKNOWN"),
)
if len(health.Issues) > 0 {
body += " Issues: "
parts := make([]string, 0, len(health.Issues))
for _, issue := range health.Issues {
parts = append(parts, issue.Code)
}
body += strings.Join(parts, ", ")
}
return "Runtime Health", body
// writeJSON sends v as JSON with status 200.
func writeJSON(w http.ResponseWriter, v any) {
w.Header().Set("Content-Type", "application/json; charset=utf-8")
w.Header().Set("Cache-Control", "no-store")
_ = json.NewEncoder(w).Encode(v)
}
func renderExportIndex(exportDir string) (string, error) {
var entries []string
err := filepath.Walk(strings.TrimSpace(exportDir), func(path string, info os.FileInfo, err error) error {
if err != nil {
return err
}
if info.IsDir() {
return nil
}
rel, err := filepath.Rel(exportDir, path)
if err != nil {
return err
}
entries = append(entries, rel)
return nil
})
if err != nil && !errors.Is(err, os.ErrNotExist) {
return "", err
}
sort.Strings(entries)
var body strings.Builder
body.WriteString("<!DOCTYPE html><html><head><meta charset=\"utf-8\"><title>Bee Export Files</title></head><body>")
body.WriteString("<h1>Bee Export Files</h1><ul>")
for _, entry := range entries {
body.WriteString("<li><a href=\"/export/file?path=" + url.QueryEscape(entry) + "\">" + html.EscapeString(entry) + "</a></li>")
}
if len(entries) == 0 {
body.WriteString("<li>No export files found.</li>")
}
body.WriteString("</ul></body></html>")
return body.String(), nil
}
func renderShellPage(title, noticeTitle, noticeBody string) string {
var body strings.Builder
body.WriteString("<!DOCTYPE html><html><head><meta charset=\"utf-8\"><meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">")
body.WriteString("<title>" + html.EscapeString(title) + "</title>")
body.WriteString(`<style>
body{margin:0;font-family:system-ui,-apple-system,BlinkMacSystemFont,"Segoe UI",sans-serif;background:#f4f1ea;color:#1b1b18}
.shell{min-height:100vh;display:grid;grid-template-rows:auto auto 1fr}
.header{padding:18px 20px 12px;border-bottom:1px solid rgba(0,0,0,.08);background:#fbf8f2}
.header h1{margin:0;font-size:24px}
.header p{margin:6px 0 0;color:#5a5a52}
.actions{display:flex;flex-wrap:wrap;gap:10px;padding:12px 20px;background:#fbf8f2}
.actions a{display:inline-block;text-decoration:none;padding:10px 14px;border-radius:999px;background:#1f5f4a;color:#fff;font-weight:600}
.actions a.secondary{background:#d8e5dd;color:#17372b}
.notice{margin:16px 20px 0;padding:14px 16px;border-radius:14px;background:#fff7df;border:1px solid #ead9a4}
.notice h2{margin:0 0 6px;font-size:16px}
.notice p{margin:0;color:#4f4a37}
.viewer-wrap{padding:16px 20px 20px}
.viewer{width:100%;height:calc(100vh - 170px);border:0;border-radius:18px;background:#fff;box-shadow:0 12px 40px rgba(0,0,0,.08)}
@media (max-width:720px){.viewer{height:calc(100vh - 240px)}}
</style></head><body><div class="shell">`)
body.WriteString("<header class=\"header\"><h1>" + html.EscapeString(title) + "</h1><p>Audit viewer with support bundle and raw export access.</p></header>")
body.WriteString("<nav class=\"actions\">")
body.WriteString("<a href=\"/export/support.tar.gz\">Download support bundle</a>")
body.WriteString("<a class=\"secondary\" href=\"/audit.json\">Open audit.json</a>")
body.WriteString("<a class=\"secondary\" href=\"/runtime-health.json\">Open runtime-health.json</a>")
body.WriteString("<a class=\"secondary\" href=\"/export/\">Browse export files</a>")
body.WriteString("</nav>")
if strings.TrimSpace(noticeTitle) != "" {
body.WriteString("<section class=\"notice\"><h2>" + html.EscapeString(noticeTitle) + "</h2><p>" + html.EscapeString(noticeBody) + "</p></section>")
}
body.WriteString("<main class=\"viewer-wrap\"><iframe class=\"viewer\" src=\"/viewer\" loading=\"eager\" referrerpolicy=\"same-origin\"></iframe></main>")
body.WriteString("</div></body></html>")
return body.String()
}
func firstNonEmpty(value, fallback string) string {
value = strings.TrimSpace(value)
if value == "" {
return fallback
}
return value
// writeError sends a JSON error response.
func writeError(w http.ResponseWriter, status int, msg string) {
w.Header().Set("Content-Type", "application/json; charset=utf-8")
w.Header().Set("Cache-Control", "no-store")
w.WriteHeader(status)
_ = json.NewEncoder(w).Encode(map[string]string{"error": msg})
}

2
bible

Submodule bible updated: 688b87e98d...456c1f022c

View File

@@ -0,0 +1,38 @@
# Charting architecture
## Decision: one chart engine for all live metrics
**Engine:** `github.com/go-analyze/charts` (pure Go, no CGO, SVG output)
**Theme:** `grafana` (dark background, coloured lines)
All live metrics charts in the web UI are server-side SVG images served by Go
and polled by the browser every 2 seconds via `<img src="...?t=now">`.
There is no client-side canvas or JS chart library.
### Why go-analyze/charts
- Pure Go, no CGO — builds cleanly inside the live-build container
- SVG output — crisp at any display resolution, full-width without pixelation
- Grafana theme matches the dark web UI colour scheme
- Active fork of the archived wcharczuk/go-chart
### SAT stress-test charts
The `drawGPUChartSVG` function in `platform/gpu_metrics.go` is a separate
self-contained SVG renderer used **only** for completed SAT run reports
(HTML export, burn-in summaries). It is not used for live metrics.
### Live metrics chart endpoints
| Path | Content |
|------|---------|
| `GET /api/metrics/chart/server.svg` | CPU temp, CPU load %, mem load %, power W, fan RPMs |
| `GET /api/metrics/chart/gpu/{idx}.svg` | GPU temp °C, load %, mem %, power W |
Charts are 1400 × 280 px SVG. The page renders them at `width: 100%` in a
single-column layout so they always fill the viewport width.
### Ring buffers
Each metric is stored in a 120-sample ring buffer (2 minutes of history at 1 Hz).
Buffers are per-server or per-GPU and grow dynamically as new GPUs appear.

View File

@@ -9,6 +9,8 @@ DHCP is used only for LAN (operator SSH access). Internet is NOT available.
## Boot sequence (single ISO)
The live system is expected to boot with `toram`, so `live-boot` copies the full read-only medium into RAM before mounting the root filesystem. After that point, runtime must not depend on the original USB/BMC virtual media staying readable.
`systemd` boot order:
```
@@ -20,11 +22,12 @@ local-fs.target
│ creates /dev/nvidia* nodes)
├── bee-audit.service (runs `bee audit` → /var/log/bee-audit.json,
│ never blocks boot on partial collector failures)
── bee-web.service (runs `bee web` on :80,
reads the latest audit snapshot on each request)
── bee-web.service (runs `bee web` on :80 — full interactive web UI)
└── bee-desktop.service (startx → openbox + chromium http://localhost/)
```
**Critical invariants:**
- The live ISO boots with `boot=live toram`. Runtime binaries must continue working even if the original boot media disappears after early boot.
- OpenSSH MUST start without network. `bee-sshsetup.service` runs before `ssh.service`.
- `bee-network.service` uses `dhclient -nw` (background) — network bring-up is best effort and non-blocking.
- `bee-nvidia.service` loads modules via `insmod` with absolute paths — NOT `modprobe`.
@@ -41,17 +44,21 @@ Local-console behavior:
```text
tty1
└── live-config autologin → bee
└── /home/bee/.profile
└── exec menu
└── /usr/local/bin/bee-tui
└── sudo -n /usr/local/bin/bee tui --runtime livecd
└── /home/bee/.profile (prints web UI URLs)
display :0
└── bee-desktop.service (User=bee)
└── startx /usr/local/bin/bee-openbox-session -- :0
├── tint2 (taskbar)
├── chromium http://localhost/
└── openbox (WM)
```
Rules:
- local `tty1` lands in user `bee`, not directly in `root`
- `menu` must work without typing `sudo`
- TUI actions still run as `root` via `sudo -n`
- SSH is independent from the tty1 path
- `bee-desktop.service` starts X11 + openbox + Chromium automatically after `bee-web.service`
- Chromium opens `http://localhost/` — the full interactive web UI
- SSH is independent from the desktop path
- serial console support is enabled for VM boot debugging
## ISO build sequence
@@ -71,24 +78,39 @@ build-in-container.sh [--authorized-keys /path/to/keys]
d. build kernel modules against Debian headers
e. create `libnvidia-ml.so.1` / `libcuda.so.1` symlinks in cache
f. cache in `dist/nvidia-<version>-<kver>/`
7. inject NVIDIA `.ko` → staged `/usr/local/lib/nvidia/`
8. inject `nvidia-smi` → staged `/usr/local/bin/nvidia-smi`
9. inject `libnvidia-ml` + `libcuda` → staged `/usr/lib/`
10. write staged `/etc/bee-release` (versions + git commit)
11. patch staged `motd` with build metadata
12. copy `iso/builder/` into a temporary live-build workdir under `dist/`
13. sync staged overlay into workdir `config/includes.chroot/`
14. run `lb config && lb build` inside the privileged builder container
7. `build-cublas.sh`:
a. download `libcublas`, `libcublasLt`, `libcudart` runtime + dev packages from the NVIDIA CUDA Debian repo
b. verify packages against repo `Packages.gz`
c. extract headers for `bee-gpu-stress` build
d. cache userspace libs in `dist/cublas-<version>+cuda<series>/`
8. build `bee-gpu-stress` against extracted cuBLASLt/cudart headers
9. inject NVIDIA `.ko` → staged `/usr/local/lib/nvidia/`
10. inject `nvidia-smi` → staged `/usr/local/bin/nvidia-smi`
11. inject `libnvidia-ml` + `libcuda` + `libcublas` + `libcublasLt` + `libcudart` → staged `/usr/lib/`
12. write staged `/etc/bee-release` (versions + git commit)
13. patch staged `motd` with build metadata
14. copy `iso/builder/` into a temporary live-build workdir under `dist/`
15. sync staged overlay into workdir `config/includes.chroot/`
16. run `lb config && lb build` inside the privileged builder container
```
Build host notes:
- `build-in-container.sh` targets `linux/amd64` builder containers by default, including Docker Desktop on macOS / Apple Silicon.
- Override with `BEE_BUILDER_PLATFORM=<os/arch>` only if you intentionally need a different container platform.
- If the local builder image under the same tag was previously built for the wrong architecture, the script rebuilds it automatically.
**Critical invariants:**
- `DEBIAN_KERNEL_ABI` in `iso/builder/VERSIONS` pins the exact kernel ABI used in BOTH places:
1. `build-in-container.sh` / `build-nvidia-module.sh` — Debian kernel headers for module build
2. `auto/config``linux-image-${DEBIAN_KERNEL_ABI}` in the ISO
- NVIDIA modules go to staged `usr/local/lib/nvidia/` — NOT to `/lib/modules/<kver>/extra/`.
- `bee-gpu-stress` must be built against cached CUDA userspace headers from `build-cublas.sh`, not against random host-installed CUDA headers.
- The live ISO must ship `libcublas`, `libcublasLt`, and `libcudart` together with `libcuda` so tensor-core stress works without internet or package installs at boot.
- The source overlay in `iso/overlay/` is treated as immutable source. Build-time files are injected only into the staged overlay.
- The live-build workdir under `dist/` is disposable; source files under `iso/builder/` stay clean.
- Container build requires `--privileged` because `live-build` uses mounts/chroots/loop devices during ISO assembly.
- On macOS / Docker Desktop, the builder still must run as `linux/amd64` so the shipped ISO binaries remain `amd64`.
- Operators must provision enough RAM to hold the full compressed live medium plus normal runtime overhead, because `toram` copies the entire read-only ISO payload into memory before the system reaches steady state.
## Post-boot smoke test
@@ -104,7 +126,7 @@ Key checks: NVIDIA modules loaded, `nvidia-smi` sees all GPUs, lib symlinks pres
systemd services running, audit completed with NVIDIA enrichment, LAN reachability.
Current validation state:
- local/libvirt VM boot path is validated for `systemd`, SSH, `bee audit`, `bee-network`, and TUI startup
- local/libvirt VM boot path is validated for `systemd`, SSH, `bee audit`, `bee-network`, and Web UI startup
- real hardware validation is still required before treating the ISO as release-ready
## Overlay mechanism
@@ -131,43 +153,32 @@ Current validation state:
Every collector returns `nil, nil` on tool-not-found. Errors are logged, never fatal.
Acceptance flows:
- `bee sat nvidia` → diagnostic archive with `nvidia-smi -q` + `nvidia-bug-report` + lightweight `bee-gpu-stress`
- `bee sat nvidia` → diagnostic archive with `nvidia-smi -q` + `nvidia-bug-report` + mixed-precision `bee-gpu-stress`
- `bee sat memory``memtester` archive
- `bee sat storage` → SMART/NVMe diagnostic archive and short self-test trigger where supported
- SAT `summary.txt` now includes `overall_status` and per-job `*_status` values (`OK`, `FAILED`, `UNSUPPORTED`)
- `bee-gpu-stress` should prefer cuBLASLt GEMM load over the old integer/PTX burn path:
- Ampere: `fp16` + `fp32`/TF32 tensor-core load
- Ada / Hopper: add `fp8`
- Blackwell+: add `fp4`
- PTX fallback is only for missing cuBLASLt/userspace or unsupported narrow datatypes
- Runtime overrides:
- `BEE_GPU_STRESS_SECONDS`
- `BEE_GPU_STRESS_SIZE_MB`
- `BEE_MEMTESTER_SIZE_MB`
- `BEE_MEMTESTER_PASSES`
## NVIDIA SAT TUI flow (v1.0.0+)
## NVIDIA SAT Web UI flow
```
TUI: Acceptance tests → NVIDIA command pack
1. screenNvidiaSATSetup
a. enumerate GPUs via `nvidia-smi --query-gpu=index,name,memory.total`
b. user selects duration preset: 10 min / 1 h / 8 h / 24 h
c. user selects GPUs via checkboxes (all selected by default)
d. memory size = max(selected GPU memory) — auto-detected, not exposed to user
2. Start → screenNvidiaSATRunning
a. CUDA_VISIBLE_DEVICES set to selected GPU indices
b. tea.Batch: SAT goroutine + tea.ExecProcess(nvtop) launched concurrently
c. nvtop occupies full terminal; SAT result queues in background
d. [o] reopen nvtop at any time; [a] abort (cancels context → kills bee-gpu-stress)
3. GPU metrics collection (during bee-gpu-stress)
- background goroutine polls `nvidia-smi` every second
- per-second rows: elapsed, GPU index, temp°C, usage%, power W, clock MHz
- outputs: gpu-metrics.csv, gpu-metrics.html (offline SVG chart), gpu-metrics-term.txt
4. After SAT completes
- result shown in screenOutput with terminal line-chart (gpu-metrics-term.txt)
- chart is asciigraph-style: box-drawing chars (╭╮╰╯─│), 4 series per GPU,
Y axis with ticks, ANSI colours (red=temp, blue=usage, green=power, yellow=clock)
Web UI: Acceptance Tests page → Run Test button
1. POST /api/sat/nvidia/run → returns job_id
2. GET /api/sat/stream?job_id=... (SSE) — streams stdout/stderr lines live
3. After completion — archive written to /appdata/bee/export/bee-sat/
summary.txt contains overall_status (OK / FAILED) and per-job status values
```
**Critical invariants:**
- `nvtop` must be in `iso/builder/config/package-lists/bee.list.chroot` (baked into ISO).
- `bee-gpu-stress` uses `exec.CommandContext` — aborted on cancel.
- `bee-gpu-stress` uses `exec.CommandContext` — killed on job context cancel.
- Metric goroutine uses stopCh/doneCh pattern; main goroutine waits `<-doneCh` before reading rows (no mutex needed).
- If `nvtop` is not found on PATH, SAT still runs without it (graceful degradation).
- SVG chart is fully offline: no JS, no external CSS, pure inline SVG.

View File

@@ -21,13 +21,14 @@ Fills gaps where Redfish/logpile is blind:
- Read-only hardware inventory: board, CPU, memory, storage, PCIe, PSU, GPU, NIC, RAID
- Machine-readable health summary derived from collector verdicts
- Operator-triggered acceptance tests for NVIDIA, memory, and storage
- NVIDIA SAT includes both diagnostic collection and lightweight GPU stress via `bee-gpu-stress`
- NVIDIA SAT includes both diagnostic collection and mixed-precision GPU stress via `bee-gpu-stress`
- `bee-gpu-stress` should exercise tensor/inference paths (`fp16`, `fp32`/TF32, `fp8`, `fp4` when supported by the GPU/userspace stack) and fall back to Driver API PTX burn only if cuBLASLt is unavailable
- Automatic boot audit with operator-facing local console and SSH access
- NVIDIA proprietary driver loaded at boot for GPU enrichment via `nvidia-smi`
- SSH access (OpenSSH) always available for inspection and debugging
- Interactive Go TUI via `bee tui` for network setup, service management, and acceptance tests
- Read-only web viewer via `bee web`, rendering the latest audit snapshot through the embedded Reanimator Chart
- Local `tty1` operator UX: `bee` autologin, `menu` auto-start, privileged actions via `sudo -n`
- Full web UI via `bee web` on port 80: interactive control panel with live metrics, SAT tests, network config, service management, export, and tools
- Local operator desktop: openbox + Xorg + Chromium auto-opening `http://localhost/`
- Local `tty1` operator UX: `bee` autologin, openbox desktop auto-starts with Chromium on `http://localhost/`
## Network isolation — CRITICAL
@@ -69,15 +70,18 @@ Fills gaps where Redfish/logpile is blind:
| SSH | OpenSSH server |
| NVIDIA driver | Proprietary `.run` installer, built against Debian kernel headers |
| NVIDIA modules | Loaded via `insmod` from `/usr/local/lib/nvidia/` |
| GPU stress backend | `bee-gpu-stress` + cuBLASLt/cuBLAS/cudart mixed-precision GEMM, with Driver API PTX fallback |
| Builder | Debian 12 host/VM or Debian 12 container image |
## Operator UX
- On the live ISO, `tty1` autologins as `bee`
- The login profile auto-runs `menu`, which enters the Go TUI
- The TUI itself executes privileged actions as `root` via `sudo -n`
- `bee-desktop.service` starts X11 + openbox + Chromium on display `:0`
- Chromium opens `http://localhost/` — the full web UI
- SSH remains available independently of the local console path
- Remote operators can open `http://<ip>/` in any browser on the same LAN
- VM-oriented builds also include `qemu-guest-agent` and serial console support for debugging
- The ISO boots with `toram`, so loss of the original USB/BMC virtual media after boot should not break already-installed runtime binaries
## Runtime split
@@ -85,6 +89,7 @@ Fills gaps where Redfish/logpile is blind:
- Live-ISO-only responsibilities stay in `iso/` integration code
- Live ISO launches the Go CLI with `--runtime livecd`
- Local/manual runs use `--runtime auto` or `--runtime local`
- Live ISO targets must have enough RAM for the full compressed live medium plus runtime working set because the boot medium is copied into memory at startup
## Key paths
@@ -99,7 +104,10 @@ Fills gaps where Redfish/logpile is blind:
| `internal/chart/` | Git submodule with `reanimator/chart`, embedded into `bee web` |
| `iso/builder/VERSIONS` | Pinned versions: Debian, Go, NVIDIA driver, kernel ABI |
| `iso/builder/smoketest.sh` | Post-boot smoke test — run via SSH to verify live ISO |
| `iso/overlay/etc/profile.d/bee.sh` | `menu` helper + tty1 auto-start policy |
| `iso/overlay/home/bee/.profile` | `bee` shell profile for local console startup |
| `iso/overlay/etc/profile.d/bee.sh` | tty1 welcome message with web UI URLs |
| `iso/overlay/home/bee/.profile` | `bee` shell profile (PATH only) |
| `iso/overlay/etc/systemd/system/bee-desktop.service` | starts X11 + openbox + chromium |
| `iso/overlay/usr/local/bin/bee-desktop` | startx wrapper for bee-desktop.service |
| `iso/overlay/usr/local/bin/bee-openbox-session` | xinitrc: tint2 + chromium + openbox |
| `dist/` | Build outputs (gitignored) |
| `iso/out/` | Downloaded ISO files (gitignored) |

58
iso/README.md Normal file
View File

@@ -0,0 +1,58 @@
# ISO Build
`bee` ISO is built inside a Debian 12 builder container via `iso/builder/build-in-container.sh`.
## Requirements
- Docker Desktop or another Docker-compatible container runtime
- Privileged containers enabled
- Enough free disk space for builder cache, Debian live-build artifacts, NVIDIA driver cache, and CUDA userspace packages
## Build On macOS
From the repository root:
```sh
sh iso/builder/build-in-container.sh
```
The script defaults to `linux/amd64` builder containers, so it works on:
- Intel Mac
- Apple Silicon (`M1` / `M2` / `M3` / `M4`) via Docker Desktop's Linux VM
You do not need to pass `--platform` manually for normal ISO builds.
## Useful Options
Build with explicit SSH keys baked into the ISO:
```sh
sh iso/builder/build-in-container.sh --authorized-keys ~/.ssh/id_ed25519.pub
```
Rebuild the builder image:
```sh
sh iso/builder/build-in-container.sh --rebuild-image
```
Use a custom cache directory:
```sh
sh iso/builder/build-in-container.sh --cache-dir /path/to/cache
```
## Notes
- The builder image is automatically rebuilt if the local tag exists for the wrong architecture.
- The live ISO boots with Debian `live-boot` `toram`, so the read-only medium is copied into RAM during boot and the runtime no longer depends on the original USB/BMC virtual media staying present.
- Target systems need enough RAM for the full compressed live medium plus normal runtime overhead, or boot may fail before reaching the TUI.
- Override the container platform only if you know why:
```sh
BEE_BUILDER_PLATFORM=linux/amd64 sh iso/builder/build-in-container.sh
```
- The shipped ISO is still `amd64`.
- Output ISO artifacts are written under `dist/`.

View File

@@ -26,6 +26,20 @@ RUN apt-get update -qq && apt-get install -y \
linux-headers-amd64 \
&& rm -rf /var/lib/apt/lists/*
# Add NVIDIA CUDA repo and install nvcc (needed to compile nccl-tests)
RUN wget -qO /tmp/cuda-keyring.gpg \
https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/3bf863cc.pub \
&& gpg --dearmor < /tmp/cuda-keyring.gpg \
> /usr/share/keyrings/nvidia-cuda.gpg \
&& rm /tmp/cuda-keyring.gpg \
&& echo "deb [signed-by=/usr/share/keyrings/nvidia-cuda.gpg] \
https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/ /" \
> /etc/apt/sources.list.d/cuda.list \
&& apt-get update -qq \
&& apt-get install -y cuda-nvcc-12-8 \
&& rm -rf /var/lib/apt/lists/* \
&& ln -sfn /usr/local/cuda-12.8 /usr/local/cuda
RUN arch="$(dpkg --print-architecture)" \
&& case "$arch" in \
amd64) goarch=amd64 ;; \

View File

@@ -4,5 +4,12 @@ NVIDIA_DRIVER_VERSION=590.48.01
NCCL_VERSION=2.28.9-1
NCCL_CUDA_VERSION=13.0
NCCL_SHA256=2e6faafd2c19cffc7738d9283976a3200ea9db9895907f337f0c7e5a25563186
NCCL_TESTS_VERSION=2.13.10
NVCC_VERSION=12.8
CUBLAS_VERSION=13.0.2.14-1
CUDA_USERSPACE_VERSION=13.0.96-1
DCGM_VERSION=3.3.9
ROCM_VERSION=6.3.4
ROCM_SMI_VERSION=7.4.0.60304-76~22.04
GO_VERSION=1.24.0
AUDIT_VERSION=1.0.0

View File

@@ -32,6 +32,7 @@ lb config noauto \
--memtest none \
--iso-volume "EASY-BEE" \
--iso-application "EASY-BEE" \
--bootappend-live "boot=live components console=tty0 console=ttyS0,115200n8 username=bee user-fullname=Bee modprobe.blacklist=nouveau" \
--bootappend-live "boot=live components nomodeset video=1920x1080 console=tty0 console=ttyS0,115200n8 loglevel=7 username=bee user-fullname=Bee modprobe.blacklist=nouveau" \
--apt-recommends false \
--chroot-squashfs-compression-type zstd \
"${@}"

File diff suppressed because it is too large Load Diff

190
iso/builder/build-cublas.sh Normal file
View File

@@ -0,0 +1,190 @@
#!/bin/sh
# build-cublas.sh — download cuBLASLt/cuBLAS/cudart runtime + headers for bee-gpu-stress.
#
# Downloads .deb packages from NVIDIA's CUDA apt repository (Debian 12, x86_64),
# verifies them against Packages.gz, and extracts the small subset we need:
# - headers for compiling bee-gpu-stress against cuBLASLt
# - runtime libs for libcublas, libcublasLt, libcudart inside the ISO
set -e
CUBLAS_VERSION="$1"
CUDA_USERSPACE_VERSION="$2"
CUDA_SERIES="$3"
DIST_DIR="$4"
[ -n "$CUBLAS_VERSION" ] || { echo "usage: $0 <cublas-version> <cuda-userspace-version> <cuda-series> <dist-dir>"; exit 1; }
[ -n "$CUDA_USERSPACE_VERSION" ] || { echo "usage: $0 <cublas-version> <cuda-userspace-version> <cuda-series> <dist-dir>"; exit 1; }
[ -n "$CUDA_SERIES" ] || { echo "usage: $0 <cublas-version> <cuda-userspace-version> <cuda-series> <dist-dir>"; exit 1; }
[ -n "$DIST_DIR" ] || { echo "usage: $0 <cublas-version> <cuda-userspace-version> <cuda-series> <dist-dir>"; exit 1; }
CUDA_SERIES_DASH=$(printf '%s' "$CUDA_SERIES" | tr '.' '-')
REPO_BASE="https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64"
CACHE_DIR="${DIST_DIR}/cublas-${CUBLAS_VERSION}+cuda${CUDA_SERIES}"
CACHE_ROOT="${BEE_CACHE_DIR:-${DIST_DIR}/cache}"
DOWNLOAD_CACHE_DIR="${CACHE_ROOT}/cublas-downloads"
PACKAGES_GZ="${DOWNLOAD_CACHE_DIR}/Packages.gz"
echo "=== cuBLAS ${CUBLAS_VERSION} / cudart ${CUDA_USERSPACE_VERSION} / CUDA ${CUDA_SERIES} ==="
if [ -f "${CACHE_DIR}/include/cublasLt.h" ] && [ -f "${CACHE_DIR}/include/cuda_runtime_api.h" ] \
&& [ -f "${CACHE_DIR}/include/crt/host_defines.h" ] \
&& [ -f "${CACHE_DIR}/include/nv/target" ] \
&& [ "$(find "${CACHE_DIR}/lib" \( -name 'libcublas.so*' -o -name 'libcublasLt.so*' -o -name 'libcudart.so*' \) 2>/dev/null | wc -l)" -gt 0 ]; then
echo "=== cuBLAS cached, skipping download ==="
echo "cache: $CACHE_DIR"
exit 0
fi
mkdir -p "${DOWNLOAD_CACHE_DIR}" "${CACHE_DIR}/include" "${CACHE_DIR}/lib"
echo "=== downloading Packages.gz ==="
wget -q -O "${PACKAGES_GZ}" "${REPO_BASE}/Packages.gz"
lookup_pkg() {
pkg="$1"
ver="$2" # if empty, match any version (first found)
gzip -dc "${PACKAGES_GZ}" | awk -v pkg="$pkg" -v ver="$ver" '
/^Package: / { cur_pkg=$2; gsub(/\r/, "", cur_pkg) }
/^Version: / { cur_ver=$2; gsub(/\r/, "", cur_ver) }
/^Filename: / { cur_file=$2; gsub(/\r/, "", cur_file) }
/^SHA256: / { cur_sha=$2; gsub(/\r/, "", cur_sha) }
/^$/ {
if (cur_pkg == pkg && (ver == "" || cur_ver == ver)) {
print cur_file " " cur_sha
printed=1
exit
}
cur_pkg=""; cur_ver=""; cur_file=""; cur_sha=""
}
END {
if (!printed && cur_pkg == pkg && (ver == "" || cur_ver == ver)) {
print cur_file " " cur_sha
}
}'
}
download_verified_pkg() {
pkg="$1"
ver="$2"
meta="$(lookup_pkg "$pkg" "$ver")"
[ -n "$meta" ] || { echo "ERROR: package metadata not found for ${pkg} ${ver}"; exit 1; }
repo_file="$(printf '%s\n' "$meta" | awk '{print $1}')"
repo_sha="$(printf '%s\n' "$meta" | awk '{print $2}')"
[ -n "$repo_file" ] || { echo "ERROR: package filename missing for ${pkg}"; exit 1; }
[ -n "$repo_sha" ] || { echo "ERROR: package sha missing for ${pkg}"; exit 1; }
out="${DOWNLOAD_CACHE_DIR}/$(basename "$repo_file")"
if [ -f "$out" ]; then
actual_sha="$(sha256sum "$out" | awk '{print $1}')"
if [ "$actual_sha" = "$repo_sha" ]; then
echo "=== using cached $(basename "$repo_file") ===" >&2
printf '%s\n' "$out"
return 0
fi
echo "=== removing stale $(basename "$repo_file") (sha256 mismatch) ===" >&2
rm -f "$out"
fi
echo "=== downloading $(basename "$repo_file") ===" >&2
wget --show-progress -O "$out" "${REPO_BASE}/$(basename "$repo_file")"
actual_sha="$(sha256sum "$out" | awk '{print $1}')"
if [ "$actual_sha" != "$repo_sha" ]; then
echo "ERROR: sha256 mismatch for $(basename "$repo_file")" >&2
echo " expected: $repo_sha" >&2
echo " actual: $actual_sha" >&2
rm -f "$out"
exit 1
fi
echo "sha256 OK: $(basename "$repo_file")" >&2
printf '%s\n' "$out"
}
extract_deb() {
deb="$1"
dst="$2"
mkdir -p "$dst"
(
cd "$dst"
ar x "$deb"
data_tar=$(ls data.tar.* 2>/dev/null | head -1)
[ -n "$data_tar" ] || { echo "ERROR: data.tar.* not found in $deb"; exit 1; }
tar xf "$data_tar"
)
}
copy_headers() {
from="$1"
if [ -d "${from}/usr/include" ]; then
cp -a "${from}/usr/include/." "${CACHE_DIR}/include/"
fi
# NVIDIA CUDA packages install headers under /usr/local/cuda-X.Y/targets/x86_64-linux/include/
find "$from" -type d -name include | while read -r inc_dir; do
case "$inc_dir" in
*/usr/include) ;; # already handled above
*)
if find "${inc_dir}" -maxdepth 3 \( -name '*.h' -o -type f \) | grep -q .; then
cp -a "${inc_dir}/." "${CACHE_DIR}/include/"
fi
;;
esac
done
}
copy_libs() {
from="$1"
find "$from" \( -name 'libcublas.so*' -o -name 'libcublasLt.so*' -o -name 'libcudart.so*' \) \
\( -type f -o -type l \) -exec cp -a {} "${CACHE_DIR}/lib/" \;
}
make_links() {
base="$1"
versioned=$(find "${CACHE_DIR}/lib" -maxdepth 1 -name "${base}.so.[0-9]*" -type f | sort | head -1)
[ -n "$versioned" ] || return 0
soname=$(printf '%s\n' "$versioned" | sed -E "s#.*/(${base}\.so\.[0-9]+).*#\\1#")
target=$(basename "$versioned")
ln -sf "$target" "${CACHE_DIR}/lib/${soname}" 2>/dev/null || true
ln -sf "${soname}" "${CACHE_DIR}/lib/${base}.so" 2>/dev/null || true
}
TMP_DIR=$(mktemp -d)
trap 'rm -rf "$TMP_DIR"' EXIT INT TERM
CUBLAS_RT_DEB=$(download_verified_pkg "libcublas-${CUDA_SERIES_DASH}" "${CUBLAS_VERSION}")
CUBLAS_DEV_DEB=$(download_verified_pkg "libcublas-dev-${CUDA_SERIES_DASH}" "${CUBLAS_VERSION}")
CUDART_RT_DEB=$(download_verified_pkg "cuda-cudart-${CUDA_SERIES_DASH}" "${CUDA_USERSPACE_VERSION}")
CUDART_DEV_DEB=$(download_verified_pkg "cuda-cudart-dev-${CUDA_SERIES_DASH}" "${CUDA_USERSPACE_VERSION}")
CUDA_CRT_DEB=$(download_verified_pkg "cuda-crt-${CUDA_SERIES_DASH}" "")
CUDA_CCCL_DEB=$(download_verified_pkg "cuda-cccl-${CUDA_SERIES_DASH}" "")
extract_deb "$CUBLAS_RT_DEB" "${TMP_DIR}/cublas-rt"
extract_deb "$CUBLAS_DEV_DEB" "${TMP_DIR}/cublas-dev"
extract_deb "$CUDART_RT_DEB" "${TMP_DIR}/cudart-rt"
extract_deb "$CUDART_DEV_DEB" "${TMP_DIR}/cudart-dev"
extract_deb "$CUDA_CRT_DEB" "${TMP_DIR}/cuda-crt"
extract_deb "$CUDA_CCCL_DEB" "${TMP_DIR}/cuda-cccl"
copy_headers "${TMP_DIR}/cublas-dev"
copy_headers "${TMP_DIR}/cudart-dev"
copy_headers "${TMP_DIR}/cuda-crt"
copy_headers "${TMP_DIR}/cuda-cccl"
copy_libs "${TMP_DIR}/cublas-rt"
copy_libs "${TMP_DIR}/cudart-rt"
make_links "libcublas"
make_links "libcublasLt"
make_links "libcudart"
[ -f "${CACHE_DIR}/include/cublasLt.h" ] || { echo "ERROR: cublasLt.h not extracted"; exit 1; }
[ -f "${CACHE_DIR}/include/cuda_runtime_api.h" ] || { echo "ERROR: cuda_runtime_api.h not extracted"; exit 1; }
[ "$(find "${CACHE_DIR}/lib" -maxdepth 1 -name 'libcublasLt.so*' | wc -l)" -gt 0 ] || { echo "ERROR: libcublasLt not extracted"; exit 1; }
[ "$(find "${CACHE_DIR}/lib" -maxdepth 1 -name 'libcublas.so*' | wc -l)" -gt 0 ] || { echo "ERROR: libcublas not extracted"; exit 1; }
[ "$(find "${CACHE_DIR}/lib" -maxdepth 1 -name 'libcudart.so*' | wc -l)" -gt 0 ] || { echo "ERROR: libcudart not extracted"; exit 1; }
echo "=== cuBLAS extraction complete ==="
echo "cache: $CACHE_DIR"
echo "headers: $(find "${CACHE_DIR}/include" -type f | wc -l)"
echo "libs: $(find "${CACHE_DIR}/lib" -maxdepth 1 \( -name 'libcublas*.so*' -o -name 'libcudart.so*' \) | wc -l)"

View File

@@ -7,9 +7,11 @@ REPO_ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
BUILDER_DIR="${REPO_ROOT}/iso/builder"
CONTAINER_TOOL="${CONTAINER_TOOL:-docker}"
IMAGE_TAG="${BEE_BUILDER_IMAGE:-bee-iso-builder}"
BUILDER_PLATFORM="${BEE_BUILDER_PLATFORM:-linux/amd64}"
CACHE_DIR="${BEE_BUILDER_CACHE_DIR:-${REPO_ROOT}/dist/container-cache}"
AUTH_KEYS=""
REBUILD_IMAGE=0
CLEAN_CACHE=0
. "${BUILDER_DIR}/VERSIONS"
@@ -27,19 +29,43 @@ while [ $# -gt 0 ]; do
AUTH_KEYS="$2"
shift 2
;;
--clean-build)
CLEAN_CACHE=1
REBUILD_IMAGE=1
shift
;;
*)
echo "unknown arg: $1" >&2
echo "usage: $0 [--cache-dir /path] [--rebuild-image] [--authorized-keys /path/to/authorized_keys]" >&2
echo "usage: $0 [--cache-dir /path] [--rebuild-image] [--clean-build] [--authorized-keys /path/to/authorized_keys]" >&2
exit 1
;;
esac
done
if [ "$CLEAN_CACHE" = "1" ]; then
echo "=== cleaning build cache: ${CACHE_DIR} ==="
rm -rf "${CACHE_DIR:?}/go-build" \
"${CACHE_DIR:?}/go-mod" \
"${CACHE_DIR:?}/tmp" \
"${CACHE_DIR:?}/bee" \
"${CACHE_DIR:?}/lb-packages"
echo "=== cleaning live-build work dir: ${REPO_ROOT}/dist/live-build-work ==="
rm -rf "${REPO_ROOT}/dist/live-build-work"
echo "=== caches cleared, proceeding with build ==="
fi
if ! command -v "$CONTAINER_TOOL" >/dev/null 2>&1; then
echo "container tool not found: $CONTAINER_TOOL" >&2
exit 1
fi
PLATFORM_OS="${BUILDER_PLATFORM%/*}"
PLATFORM_ARCH="${BUILDER_PLATFORM#*/}"
if [ -z "$PLATFORM_OS" ] || [ -z "$PLATFORM_ARCH" ] || [ "$PLATFORM_OS" = "$BUILDER_PLATFORM" ]; then
echo "invalid BEE_BUILDER_PLATFORM: ${BUILDER_PLATFORM} (expected os/arch, e.g. linux/amd64)" >&2
exit 1
fi
if [ -n "$AUTH_KEYS" ]; then
[ -f "$AUTH_KEYS" ] || { echo "authorized_keys not found: $AUTH_KEYS" >&2; exit 1; }
AUTH_KEYS_ABS="$(cd "$(dirname "$AUTH_KEYS")" && pwd)/$(basename "$AUTH_KEYS")"
@@ -56,17 +82,35 @@ mkdir -p \
IMAGE_REF="${IMAGE_TAG}:debian${DEBIAN_VERSION}"
if [ "$REBUILD_IMAGE" = "1" ] || ! "$CONTAINER_TOOL" image inspect "${IMAGE_REF}" >/dev/null 2>&1; then
image_matches_platform() {
actual_platform="$("$CONTAINER_TOOL" image inspect --format '{{.Os}}/{{.Architecture}}' "${IMAGE_REF}" 2>/dev/null || true)"
[ "$actual_platform" = "${BUILDER_PLATFORM}" ]
}
NEED_BUILD_IMAGE=0
if [ "$REBUILD_IMAGE" = "1" ]; then
NEED_BUILD_IMAGE=1
elif ! "$CONTAINER_TOOL" image inspect "${IMAGE_REF}" >/dev/null 2>&1; then
NEED_BUILD_IMAGE=1
elif ! image_matches_platform; then
actual_platform="$("$CONTAINER_TOOL" image inspect --format '{{.Os}}/{{.Architecture}}' "${IMAGE_REF}" 2>/dev/null || echo unknown)"
echo "=== rebuilding builder image ${IMAGE_REF}: platform mismatch (${actual_platform} != ${BUILDER_PLATFORM}) ==="
NEED_BUILD_IMAGE=1
fi
if [ "$NEED_BUILD_IMAGE" = "1" ]; then
"$CONTAINER_TOOL" build \
--platform "${BUILDER_PLATFORM}" \
--build-arg GO_VERSION="${GO_VERSION}" \
-t "${IMAGE_REF}" \
"${BUILDER_DIR}"
else
echo "=== using existing builder image ${IMAGE_REF} ==="
echo "=== using existing builder image ${IMAGE_REF} (${BUILDER_PLATFORM}) ==="
fi
set -- \
run --rm --privileged \
--platform "${BUILDER_PLATFORM}" \
-v "${REPO_ROOT}:/work" \
-v "${CACHE_DIR}:/cache" \
-e BEE_CONTAINER_BUILD=1 \
@@ -80,6 +124,7 @@ set -- \
if [ -n "$AUTH_KEYS" ]; then
set -- run --rm --privileged \
--platform "${BUILDER_PLATFORM}" \
-v "${REPO_ROOT}:/work" \
-v "${CACHE_DIR}:/cache" \
-v "${AUTH_KEYS_DIR}:/tmp/bee-authkeys:ro" \

141
iso/builder/build-nccl-tests.sh Executable file
View File

@@ -0,0 +1,141 @@
#!/bin/sh
# build-nccl-tests.sh — build nccl-tests all_reduce_perf for the LiveCD.
#
# Downloads nccl-tests source from GitHub, downloads libnccl-dev .deb for
# nccl.h, and compiles all_reduce_perf with nvcc (cuda-nvcc-13-0).
#
# Output is cached in DIST_DIR/nccl-tests-<version>/ so subsequent builds
# are instant unless NCCL_TESTS_VERSION changes.
#
# Output layout:
# $CACHE_DIR/bin/all_reduce_perf
set -e
NCCL_TESTS_VERSION="$1"
NCCL_VERSION="$2"
NCCL_CUDA_VERSION="$3"
DIST_DIR="$4"
NVCC_VERSION="${5:-}"
DEBIAN_VERSION="${6:-12}"
[ -n "$NCCL_TESTS_VERSION" ] || { echo "usage: $0 <nccl-tests-version> <nccl-version> <cuda-version> <dist-dir> [nvcc-version] [debian-version]"; exit 1; }
[ -n "$NCCL_VERSION" ] || { echo "usage: $0 <nccl-tests-version> <nccl-version> <cuda-version> <dist-dir> [nvcc-version] [debian-version]"; exit 1; }
[ -n "$NCCL_CUDA_VERSION" ] || { echo "usage: $0 <nccl-tests-version> <nccl-version> <cuda-version> <dist-dir> [nvcc-version] [debian-version]"; exit 1; }
[ -n "$DIST_DIR" ] || { echo "usage: $0 <nccl-tests-version> <nccl-version> <cuda-version> <dist-dir> [nvcc-version] [debian-version]"; exit 1; }
echo "=== nccl-tests ${NCCL_TESTS_VERSION} ==="
CACHE_DIR="${DIST_DIR}/nccl-tests-${NCCL_TESTS_VERSION}"
CACHE_ROOT="${BEE_CACHE_DIR:-${DIST_DIR}/cache}"
DOWNLOAD_CACHE_DIR="${CACHE_ROOT}/nccl-tests-downloads"
if [ -f "${CACHE_DIR}/bin/all_reduce_perf" ]; then
echo "=== nccl-tests cached, skipping build ==="
echo "binary: ${CACHE_DIR}/bin/all_reduce_perf"
exit 0
fi
# Resolve nvcc path (cuda-nvcc-X-Y installs to /usr/local/cuda-X.Y/bin/nvcc)
NVCC_VERSION_PATH="$(echo "${NVCC_VERSION}" | tr '.' '.')"
NVCC=""
for candidate in nvcc "/usr/local/cuda-${NVCC_VERSION_PATH}/bin/nvcc" /usr/local/cuda-12/bin/nvcc /usr/local/cuda/bin/nvcc; do
if command -v "$candidate" >/dev/null 2>&1 || [ -x "$candidate" ]; then
NVCC="$candidate"
break
fi
done
[ -n "$NVCC" ] || { echo "ERROR: nvcc not found — install cuda-nvcc-$(echo "${NVCC_VERSION}" | tr '.' '-')"; exit 1; }
echo "nvcc: $NVCC"
# Determine CUDA_HOME from nvcc location
CUDA_HOME="$(dirname "$(dirname "$NVCC")")"
echo "CUDA_HOME: $CUDA_HOME"
# Download libnccl-dev for nccl.h
REPO_BASE="https://developer.download.nvidia.com/compute/cuda/repos/debian${DEBIAN_VERSION}/x86_64"
DEV_PKG="libnccl-dev_${NCCL_VERSION}+cuda${NCCL_CUDA_VERSION}_amd64.deb"
DEV_URL="${REPO_BASE}/${DEV_PKG}"
mkdir -p "$DOWNLOAD_CACHE_DIR"
DEV_DEB="${DOWNLOAD_CACHE_DIR}/${DEV_PKG}"
if [ ! -f "$DEV_DEB" ]; then
echo "=== downloading libnccl-dev ==="
wget --show-progress -O "$DEV_DEB" "$DEV_URL"
fi
# Extract nccl.h from libnccl-dev
NCCL_INCLUDE_TMP=$(mktemp -d)
trap 'rm -rf "$NCCL_INCLUDE_TMP" "$BUILD_TMP"' EXIT INT TERM
cd "$NCCL_INCLUDE_TMP"
ar x "$DEV_DEB"
DATA_TAR=$(ls data.tar.* 2>/dev/null | head -1)
[ -n "$DATA_TAR" ] || { echo "ERROR: data.tar.* not found in libnccl-dev .deb"; exit 1; }
tar xf "$DATA_TAR"
# nccl.h lands in ./usr/include/ or ./usr/local/cuda-X.Y/targets/.../include/
NCCL_H=$(find . -name 'nccl.h' -type f 2>/dev/null | head -1)
[ -n "$NCCL_H" ] || { echo "ERROR: nccl.h not found in libnccl-dev package"; exit 1; }
NCCL_INCLUDE_DIR="$(pwd)/$(dirname "$NCCL_H")"
echo "nccl.h: $NCCL_H"
# libnccl.so comes from the already-built NCCL cache (build-nccl.sh ran first)
NCCL_LIB_DIR="${DIST_DIR}/nccl-${NCCL_VERSION}+cuda${NCCL_CUDA_VERSION}/lib"
[ -d "$NCCL_LIB_DIR" ] || { echo "ERROR: NCCL lib dir not found at $NCCL_LIB_DIR — run build-nccl.sh first"; exit 1; }
echo "nccl lib: $NCCL_LIB_DIR"
# Download nccl-tests source
SRC_TAR="${DOWNLOAD_CACHE_DIR}/nccl-tests-v${NCCL_TESTS_VERSION}.tar.gz"
SRC_URL="https://github.com/NVIDIA/nccl-tests/archive/refs/tags/v${NCCL_TESTS_VERSION}.tar.gz"
if [ ! -f "$SRC_TAR" ]; then
echo "=== downloading nccl-tests v${NCCL_TESTS_VERSION} ==="
wget --show-progress -O "$SRC_TAR" "$SRC_URL"
fi
# Extract and build
BUILD_TMP=$(mktemp -d)
cd "$BUILD_TMP"
tar xf "$SRC_TAR"
SRC_DIR=$(ls -d nccl-tests-* 2>/dev/null | head -1)
[ -n "$SRC_DIR" ] || { echo "ERROR: source directory not found in archive"; exit 1; }
cd "$SRC_DIR"
echo "=== building all_reduce_perf ==="
# Pick gencode based on the actual nvcc version:
# CUDA 12.x — Volta..Blackwell (sm_70..sm_100)
# CUDA 13.x — Hopper..Blackwell (sm_90..sm_100, Pascal/Volta/Ampere dropped)
NVCC_MAJOR=$("$NVCC" --version 2>/dev/null | grep -oE 'release [0-9]+' | awk '{print $2}' | head -1)
echo "nvcc major version: ${NVCC_MAJOR:-unknown}"
if [ "${NVCC_MAJOR:-0}" -ge 13 ] 2>/dev/null; then
GENCODE="-gencode=arch=compute_90,code=sm_90 \
-gencode=arch=compute_100,code=sm_100"
echo "gencode: sm_90 sm_100 (CUDA 13+)"
else
GENCODE="-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_80,code=sm_80 \
-gencode=arch=compute_86,code=sm_86 \
-gencode=arch=compute_90,code=sm_90 \
-gencode=arch=compute_100,code=sm_100"
echo "gencode: sm_70..sm_100 (CUDA 12)"
fi
LIBRARY_PATH="$NCCL_LIB_DIR${LIBRARY_PATH:+:$LIBRARY_PATH}" \
make MPI=0 \
NVCC="$NVCC" \
CUDA_HOME="$CUDA_HOME" \
NCCL_HOME="$NCCL_INCLUDE_DIR/.." \
NCCL_LIB="$NCCL_LIB_DIR" \
NVCC_GENCODE="$GENCODE" \
BUILDDIR="./build"
[ -f "./build/all_reduce_perf" ] || { echo "ERROR: all_reduce_perf not found after build"; exit 1; }
mkdir -p "${CACHE_DIR}/bin"
cp "./build/all_reduce_perf" "${CACHE_DIR}/bin/all_reduce_perf"
chmod +x "${CACHE_DIR}/bin/all_reduce_perf"
echo "=== nccl-tests build complete ==="
echo "binary: ${CACHE_DIR}/bin/all_reduce_perf"
ls -lh "${CACHE_DIR}/bin/all_reduce_perf"

View File

@@ -46,7 +46,8 @@ CACHE_DIR="${DIST_DIR}/nvidia-${NVIDIA_VERSION}-${KVER}"
CACHE_ROOT="${BEE_CACHE_DIR:-${DIST_DIR}/cache}"
DOWNLOAD_CACHE_DIR="${CACHE_ROOT}/nvidia-downloads"
EXTRACT_CACHE_DIR="${CACHE_ROOT}/nvidia-extract"
if [ -d "$CACHE_DIR/modules" ] && [ -f "$CACHE_DIR/bin/nvidia-smi" ]; then
if [ -d "$CACHE_DIR/modules" ] && [ -f "$CACHE_DIR/bin/nvidia-smi" ] \
&& [ "$(ls "$CACHE_DIR/lib/libnvidia-ptxjitcompiler.so."* 2>/dev/null | wc -l)" -gt 0 ]; then
echo "=== NVIDIA cached, skipping build ==="
echo "cache: $CACHE_DIR"
echo "modules: $(ls "$CACHE_DIR/modules/"*.ko 2>/dev/null | wc -l) .ko files"
@@ -129,8 +130,10 @@ else
echo "WARNING: no firmware/ dir found in installer (may be needed for Hopper GPUs)"
fi
# Copy ALL userspace library files
for lib in libnvidia-ml libcuda; do
# Copy ALL userspace library files.
# libnvidia-ptxjitcompiler is required by libcuda for PTX JIT compilation
# (cuModuleLoadDataEx with PTX source) — without it CUDA_ERROR_JIT_COMPILER_NOT_FOUND.
for lib in libnvidia-ml libcuda libnvidia-ptxjitcompiler; do
count=0
for f in $(find "$EXTRACT_DIR" -maxdepth 1 -name "${lib}.so.*" 2>/dev/null); do
cp "$f" "$CACHE_DIR/lib/" && count=$((count+1))
@@ -147,7 +150,7 @@ ko_count=$(ls "$CACHE_DIR/modules/"*.ko 2>/dev/null | wc -l)
[ "$ko_count" -gt 0 ] || { echo "ERROR: no .ko files built in $CACHE_DIR/modules/"; exit 1; }
# Create soname symlinks: use [0-9][0-9]* to avoid circular symlink (.so.1 has single digit)
for lib in libnvidia-ml libcuda; do
for lib in libnvidia-ml libcuda libnvidia-ptxjitcompiler; do
versioned=$(ls "$CACHE_DIR/lib/${lib}.so."[0-9][0-9]* 2>/dev/null | head -1)
[ -n "$versioned" ] || continue
base=$(basename "$versioned")

View File

@@ -28,12 +28,82 @@ done
. "${BUILDER_DIR}/VERSIONS"
export PATH="$PATH:/usr/local/go/bin"
# Allow git to read the bind-mounted repo (different UID inside container).
git config --global safe.directory "${REPO_ROOT}"
mkdir -p "${DIST_DIR}"
mkdir -p "${CACHE_ROOT}"
: "${GOCACHE:=${CACHE_ROOT}/go-build}"
: "${GOMODCACHE:=${CACHE_ROOT}/go-mod}"
export GOCACHE GOMODCACHE
resolve_audit_version() {
if [ -n "${BEE_AUDIT_VERSION:-}" ]; then
echo "${BEE_AUDIT_VERSION}"
return 0
fi
tag="$(git -C "${REPO_ROOT}" describe --tags --match 'audit/v*' --abbrev=7 --dirty 2>/dev/null || true)"
if [ -z "${tag}" ]; then
tag="$(git -C "${REPO_ROOT}" describe --tags --match 'v[0-9]*' --abbrev=7 --dirty 2>/dev/null || true)"
fi
case "${tag}" in
audit/v*)
echo "${tag#audit/v}"
return 0
;;
v*)
echo "${tag#v}"
return 0
;;
"")
;;
*)
echo "${tag}"
return 0
;;
esac
if [ -n "${AUDIT_VERSION:-}" ]; then
echo "${AUDIT_VERSION}"
return 0
fi
date +%Y%m%d
}
# ISO image versioned separately from the audit binary (iso/v* tags).
resolve_iso_version() {
if [ -n "${BEE_ISO_VERSION:-}" ]; then
echo "${BEE_ISO_VERSION}"
return 0
fi
# Plain v* tags (e.g. v2.7) take priority — this is the current tagging scheme
tag="$(git -C "${REPO_ROOT}" describe --tags --match 'v[0-9]*' --abbrev=7 --dirty 2>/dev/null || true)"
case "${tag}" in
v*)
echo "${tag#v}"
return 0
;;
esac
# Legacy iso/v* tags fallback
tag="$(git -C "${REPO_ROOT}" describe --tags --match 'iso/v*' --abbrev=7 --dirty 2>/dev/null || true)"
case "${tag}" in
iso/v*)
echo "${tag#iso/v}"
return 0
;;
esac
# Fall back to audit version so the name is still meaningful
resolve_audit_version
}
AUDIT_VERSION_EFFECTIVE="$(resolve_audit_version)"
ISO_VERSION_EFFECTIVE="$(resolve_iso_version)"
# Auto-detect kernel ABI: refresh apt index, then query current linux-image-amd64 dependency.
# If headers for the detected ABI are not yet installed (kernel updated since image build),
# install them on the fly so NVIDIA modules and ISO kernel always match.
@@ -64,6 +134,7 @@ fi
echo "=== bee ISO build ==="
echo "Debian: ${DEBIAN_VERSION}, Kernel ABI: ${DEBIAN_KERNEL_ABI}, Go: ${GO_VERSION}"
echo "Audit version: ${AUDIT_VERSION_EFFECTIVE}, ISO version: ${ISO_VERSION_EFFECTIVE}"
echo ""
echo "=== syncing git submodules ==="
@@ -83,7 +154,7 @@ if [ "$NEED_BUILD" = "1" ]; then
cd "${REPO_ROOT}/audit"
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 \
go build \
-ldflags "-s -w -X main.Version=${AUDIT_VERSION:-$(date +%Y%m%d)}" \
-ldflags "-s -w -X main.Version=${AUDIT_VERSION_EFFECTIVE}" \
-o "$BEE_BIN" \
./cmd/bee
echo "binary: $BEE_BIN"
@@ -101,6 +172,16 @@ else
echo "=== bee binary up to date, skipping build ==="
fi
echo ""
echo "=== downloading cuBLAS/cuBLASLt/cudart ${NCCL_CUDA_VERSION} userspace ==="
sh "${BUILDER_DIR}/build-cublas.sh" \
"${CUBLAS_VERSION}" \
"${CUDA_USERSPACE_VERSION}" \
"${NCCL_CUDA_VERSION}" \
"${DIST_DIR}"
CUBLAS_CACHE="${DIST_DIR}/cublas-${CUBLAS_VERSION}+cuda${NCCL_CUDA_VERSION}"
GPU_STRESS_NEED_BUILD=1
if [ -f "$GPU_STRESS_BIN" ] && [ "${BUILDER_DIR}/bee-gpu-stress.c" -ot "$GPU_STRESS_BIN" ]; then
GPU_STRESS_NEED_BUILD=0
@@ -109,18 +190,37 @@ fi
if [ "$GPU_STRESS_NEED_BUILD" = "1" ]; then
echo "=== building bee-gpu-stress ==="
gcc -O2 -s -Wall -Wextra \
-I"${CUBLAS_CACHE}/include" \
-o "$GPU_STRESS_BIN" \
"${BUILDER_DIR}/bee-gpu-stress.c" \
-ldl
-ldl -lm
echo "binary: $GPU_STRESS_BIN"
else
echo "=== bee-gpu-stress up to date, skipping build ==="
fi
echo "=== preparing staged overlay ==="
rm -rf "${BUILD_WORK_DIR}" "${OVERLAY_STAGE_DIR}"
# Sync builder config into work dir, preserving lb cache (chroot + packages).
# We do NOT rm -rf BUILD_WORK_DIR so lb can reuse its chroot on repeat builds.
mkdir -p "${BUILD_WORK_DIR}" "${OVERLAY_STAGE_DIR}"
rsync -a "${BUILDER_DIR}/" "${BUILD_WORK_DIR}/"
rsync -a --delete \
--exclude='cache/' \
--exclude='chroot/' \
--exclude='.build/' \
--exclude='*.iso' \
--exclude='*.packages' \
--exclude='*.contents' \
--exclude='*.files' \
"${BUILDER_DIR}/" "${BUILD_WORK_DIR}/"
# Also persist package cache to CACHE_ROOT so it survives a manual wipe of BUILD_WORK_DIR.
LB_PKG_CACHE="${CACHE_ROOT}/lb-packages"
mkdir -p "${LB_PKG_CACHE}"
if [ -d "${BUILD_WORK_DIR}/cache/packages.chroot" ]; then
rsync -a --delete "${BUILD_WORK_DIR}/cache/packages.chroot/" "${LB_PKG_CACHE}/"
elif [ -d "${LB_PKG_CACHE}" ] && [ "$(ls -A "${LB_PKG_CACHE}" 2>/dev/null)" ]; then
mkdir -p "${BUILD_WORK_DIR}/cache/packages.chroot"
rsync -a "${LB_PKG_CACHE}/" "${BUILD_WORK_DIR}/cache/packages.chroot/"
fi
rsync -a "${OVERLAY_DIR}/" "${OVERLAY_STAGE_DIR}/"
rm -f \
"${OVERLAY_STAGE_DIR}/etc/bee-ssh-password-fallback" \
@@ -128,7 +228,8 @@ rm -f \
"${OVERLAY_STAGE_DIR}/root/.ssh/authorized_keys" \
"${OVERLAY_STAGE_DIR}/usr/local/bin/bee" \
"${OVERLAY_STAGE_DIR}/usr/local/bin/bee-gpu-stress" \
"${OVERLAY_STAGE_DIR}/usr/local/bin/bee-smoketest"
"${OVERLAY_STAGE_DIR}/usr/local/bin/bee-smoketest" \
"${OVERLAY_STAGE_DIR}/usr/local/bin/all_reduce_perf"
# --- inject authorized_keys for SSH access ---
AUTHORIZED_KEYS_FILE="${OVERLAY_STAGE_DIR}/root/.ssh/authorized_keys"
@@ -225,13 +326,33 @@ NCCL_CACHE="${DIST_DIR}/nccl-${NCCL_VERSION}+cuda${NCCL_CUDA_VERSION}"
cp "${NCCL_CACHE}/lib/"* "${OVERLAY_STAGE_DIR}/usr/lib/"
echo "=== NCCL: $(ls "${NCCL_CACHE}/lib/" | wc -l) files injected into /usr/lib/ ==="
# Inject cuBLAS/cuBLASLt/cudart runtime libs used by bee-gpu-stress tensor-core GEMM path
cp "${CUBLAS_CACHE}/lib/"* "${OVERLAY_STAGE_DIR}/usr/lib/"
echo "=== cuBLAS: $(ls "${CUBLAS_CACHE}/lib/" | wc -l) files injected into /usr/lib/ ==="
# --- build nccl-tests ---
echo ""
echo "=== building nccl-tests ${NCCL_TESTS_VERSION} ==="
sh "${BUILDER_DIR}/build-nccl-tests.sh" \
"${NCCL_TESTS_VERSION}" \
"${NCCL_VERSION}" \
"${NCCL_CUDA_VERSION}" \
"${DIST_DIR}" \
"${NVCC_VERSION}" \
"${DEBIAN_VERSION}"
NCCL_TESTS_CACHE="${DIST_DIR}/nccl-tests-${NCCL_TESTS_VERSION}"
cp "${NCCL_TESTS_CACHE}/bin/all_reduce_perf" "${OVERLAY_STAGE_DIR}/usr/local/bin/all_reduce_perf"
chmod +x "${OVERLAY_STAGE_DIR}/usr/local/bin/all_reduce_perf"
echo "=== all_reduce_perf injected ==="
# --- embed build metadata ---
mkdir -p "${OVERLAY_STAGE_DIR}/etc"
BUILD_DATE="$(date +%Y-%m-%d)"
GIT_COMMIT="$(git -C "${REPO_ROOT}" rev-parse --short HEAD 2>/dev/null || echo unknown)"
cat > "${OVERLAY_STAGE_DIR}/etc/bee-release" <<EOF
BEE_ISO_VERSION=${AUDIT_VERSION}
BEE_AUDIT_VERSION=${AUDIT_VERSION}
BEE_ISO_VERSION=${ISO_VERSION_EFFECTIVE}
BEE_AUDIT_VERSION=${AUDIT_VERSION_EFFECTIVE}
BUILD_DATE=${BUILD_DATE}
GIT_COMMIT=${GIT_COMMIT}
DEBIAN_VERSION=${DEBIAN_VERSION}
@@ -239,6 +360,9 @@ DEBIAN_KERNEL_ABI=${DEBIAN_KERNEL_ABI}
NVIDIA_DRIVER_VERSION=${NVIDIA_DRIVER_VERSION}
NCCL_VERSION=${NCCL_VERSION}
NCCL_CUDA_VERSION=${NCCL_CUDA_VERSION}
CUBLAS_VERSION=${CUBLAS_VERSION}
CUDA_USERSPACE_VERSION=${CUDA_USERSPACE_VERSION}
NCCL_TESTS_VERSION=${NCCL_TESTS_VERSION}
EOF
# Patch motd with build info
@@ -249,6 +373,14 @@ if [ -f "${OVERLAY_STAGE_DIR}/etc/motd" ]; then
mv "${OVERLAY_STAGE_DIR}/etc/motd.patched" "${OVERLAY_STAGE_DIR}/etc/motd"
fi
# --- substitute version placeholders in package list ---
sed -i \
-e "s/%%DCGM_VERSION%%/${DCGM_VERSION}/g" \
-e "s/%%ROCM_VERSION%%/${ROCM_VERSION}/g" \
-e "s/%%ROCM_SMI_VERSION%%/${ROCM_SMI_VERSION}/g" \
"${BUILD_WORK_DIR}/config/package-lists/bee.list.chroot" \
"${BUILD_WORK_DIR}/config/archives/rocm.list.chroot"
# --- sync overlay into live-build includes.chroot ---
LB_DIR="${BUILD_WORK_DIR}"
LB_INCLUDES="${LB_DIR}/config/includes.chroot"
@@ -272,7 +404,7 @@ lb build 2>&1
# live-build outputs live-image-amd64.hybrid.iso in LB_DIR
ISO_RAW="${LB_DIR}/live-image-amd64.hybrid.iso"
ISO_OUT="${DIST_DIR}/bee-debian${DEBIAN_VERSION}-v${AUDIT_VERSION}-amd64.iso"
ISO_OUT="${DIST_DIR}/bee-debian${DEBIAN_VERSION}-v${ISO_VERSION_EFFECTIVE}-amd64.iso"
if [ -f "$ISO_RAW" ]; then
cp "$ISO_RAW" "$ISO_OUT"
echo ""

View File

@@ -0,0 +1,29 @@
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v2.0.22 (GNU/Linux)
mQINBGJYmlEBEAC6nJmeqByeReM+MSy4palACCnfOg4pOxffrrkldxz4jrDOZNK4
q8KG+ZbXrkdP0e9qTFRvZzN+A6Jw3ySfoiKXRBw5l2Zp81AYkghV641OpWNjZOyL
syKEtST9LR1ttHv1ZI71pj8NVG/EnpimZPOblEJ1OpibJJCXLrbn+qcJ8JNuGTSK
6v2aLBmhR8VR/aSJpmkg7fFjcGklweTI8+Ibj72HuY9JRD/+dtUoSh7z037mWo56
ee02lPFRD0pHOEAlLSXxFO/SDqRVMhcgHk0a8roCF+9h5Ni7ZUyxlGK/uHkqN7ED
/U/ATpGKgvk4t23eTpdRC8FXAlBZQyf/xnhQXsyF/z7+RV5CL0o1zk1LKgo+5K32
5ka5uZb6JSIrEPUaCPEMXu6EEY8zSFnCrRS/Vjkfvc9ViYZWzJ387WTjAhMdS7wd
PmdDWw2ASGUP4FrfCireSZiFX+ZAOspKpZdh0P5iR5XSx14XDt3jNK2EQQboaJAD
uqksItatOEYNu4JsCbc24roJvJtGhpjTnq1/dyoy6K433afU0DS2ZPLthLpGqeyK
MKNY7a2WjxhRmCSu5Zok/fGKcO62XF8a3eSj4NzCRv8LM6mG1Oekz6Zz+tdxHg19
ufHO0et7AKE5q+5VjE438Xpl4UWbM/Voj6VPJ9uzywDcnZXpeOqeTQh2pQARAQAB
tCBjdWRhdG9vbHMgPGN1ZGF0b29sc0BudmlkaWEuY29tPokCOQQTAQIAIwUCYlia
UQIbAwcLCQgHAwIBBhUIAgkKCwQWAgMBAh4BAheAAAoJEKS0aZY7+GPM1y4QALKh
BqSozrYbe341Qu7SyxHQgjRCGi4YhI3bHCMj5F6vEOHnwiFH6YmFkxCYtqcGjca6
iw7cCYMow/hgKLAPwkwSJ84EYpGLWx62+20rMM4OuZwauSUcY/kE2WgnQ74zbh3+
MHs56zntJFfJ9G+NYidvwDWeZn5HIzR4CtxaxRgpiykg0s3ps6X0U+vuVcLnutBF
7r81astvlVQERFbce/6KqHK+yj843Qrhb3JEolUoOETK06nD25bVtnAxe0QEyA90
9MpRNLfR6BdjPpxqhphDcMOhJfyubAroQUxG/7S+Yw+mtEqHrL/dz9iEYqodYiSo
zfi0b+HFI59sRkTfOBDBwb3kcARExwnvLJmqijiVqWkoJ3H67oA0XJN2nelucw+A
Hb+Jt9BWjyzKWlLFDnVHdGicyRJ0I8yqi32w8hGeXmu3tU58VWJrkXEXadBftmci
pemb6oZ/r5SCkW6kxr2PsNWcJoebUdynyOQGbVwpMtJAnjOYp0ObKOANbcIg+tsi
kyCIO5TiY3ADbBDPCeZK8xdcugXoW5WFwACGC0z+Cn0mtw8z3VGIPAMSCYmLusgW
t2+EpikwrP2inNp5Pc+YdczRAsa4s30Jpyv/UHEG5P9GKnvofaxJgnU56lJIRPzF
iCUGy6cVI0Fq777X/ME1K6A/bzZ4vRYNx8rUmVE5
=DO7z
-----END PGP PUBLIC KEY BLOCK-----

View File

@@ -0,0 +1 @@
deb https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/ /

Binary file not shown.

View File

@@ -0,0 +1 @@
deb https://repo.radeon.com/rocm/apt/%%ROCM_VERSION%% jammy main

View File

@@ -8,7 +8,7 @@ else
fi
if loadfont $font ; then
set gfxmode=800x600
set gfxmode=1920x1080,1280x1024,auto
set gfxpayload=keep
insmod efi_gop
insmod efi_uga

View File

@@ -10,12 +10,22 @@ echo " ╚══════╝╚═╝ ╚═╝╚══════╝
echo ""
menuentry "EASY-BEE" {
linux @KERNEL_LIVE@ @APPEND_LIVE@
linux @KERNEL_LIVE@ @APPEND_LIVE@ bee.nvidia.mode=normal
initrd @INITRD_LIVE@
}
menuentry "EASY-BEE (load to RAM)" {
linux @KERNEL_LIVE@ @APPEND_LIVE@ toram bee.nvidia.mode=normal
initrd @INITRD_LIVE@
}
menuentry "EASY-BEE (NVIDIA GSP=off)" {
linux @KERNEL_LIVE@ @APPEND_LIVE@ bee.nvidia.mode=gsp-off
initrd @INITRD_LIVE@
}
menuentry "EASY-BEE (fail-safe)" {
linux @KERNEL_LIVE@ @APPEND_LIVE@ memtest noapic noapm nodma nomce nolapic nosmp vga=normal
linux @KERNEL_LIVE@ @APPEND_LIVE@ bee.nvidia.mode=gsp-off memtest noapic noapm nodma nomce nolapic nosmp vga=normal
initrd @INITRD_LIVE@
}

View File

@@ -0,0 +1,24 @@
label live-@FLAVOUR@-normal
menu label ^EASY-BEE
menu default
linux @LINUX@
initrd @INITRD@
append @APPEND_LIVE@ bee.nvidia.mode=normal
label live-@FLAVOUR@-toram
menu label EASY-BEE (^load to RAM)
linux @LINUX@
initrd @INITRD@
append @APPEND_LIVE@ toram bee.nvidia.mode=normal
label live-@FLAVOUR@-gsp-off
menu label EASY-BEE (^NVIDIA GSP=off)
linux @LINUX@
initrd @INITRD@
append @APPEND_LIVE@ bee.nvidia.mode=gsp-off
label live-@FLAVOUR@-failsafe
menu label EASY-BEE (^fail-safe)
linux @LINUX@
initrd @INITRD@
append @APPEND_LIVE@ bee.nvidia.mode=gsp-off memtest noapic noapm nodma nomce nolapic nosmp vga=normal

View File

@@ -5,7 +5,23 @@ set -e
echo "=== bee chroot setup ==="
ensure_bee_console_user() {
if id bee >/dev/null 2>&1; then
usermod -d /home/bee -s /bin/sh bee 2>/dev/null || true
else
useradd -d /home/bee -m -s /bin/sh -U bee
fi
mkdir -p /home/bee
chown -R bee:bee /home/bee
echo "bee:eeb" | chpasswd
usermod -aG sudo,video,input bee 2>/dev/null || true
}
ensure_bee_console_user
# Enable bee services
systemctl enable nvidia-dcgm.service 2>/dev/null || true
systemctl enable bee-network.service
systemctl enable bee-nvidia.service
systemctl enable bee-preflight.service
@@ -13,20 +29,29 @@ systemctl enable bee-audit.service
systemctl enable bee-web.service
systemctl enable bee-sshsetup.service
systemctl enable ssh.service
systemctl enable lightdm.service 2>/dev/null || true
systemctl enable qemu-guest-agent.service 2>/dev/null || true
systemctl enable serial-getty@ttyS0.service 2>/dev/null || true
systemctl enable serial-getty@ttyS1.service 2>/dev/null || true
systemctl enable bee-journal-mirror@ttyS1.service 2>/dev/null || true
# Ensure scripts are executable
chmod +x /usr/local/bin/bee-network.sh 2>/dev/null || true
chmod +x /usr/local/bin/bee-nvidia-load 2>/dev/null || true
chmod +x /usr/local/bin/bee-sshsetup 2>/dev/null || true
chmod +x /usr/local/bin/bee-smoketest 2>/dev/null || true
chmod +x /usr/local/bin/bee-tui 2>/dev/null || true
chmod +x /usr/local/bin/bee 2>/dev/null || true
chmod +x /usr/local/bin/bee-log-run 2>/dev/null || true
# Reload udev rules
udevadm control --reload-rules 2>/dev/null || true
# rocm-smi symlink (package installs to /opt/rocm-*/bin/rocm-smi)
if [ ! -e /usr/local/bin/rocm-smi ]; then
smi_path="$(find /opt -path '*/bin/rocm-smi' -type f 2>/dev/null | sort | tail -1)"
[ -n "${smi_path}" ] && ln -sf "${smi_path}" /usr/local/bin/rocm-smi
fi
# Create export directory
mkdir -p /appdata/bee/export

View File

@@ -1,39 +0,0 @@
#!/bin/sh
# 9001-amd-rocm.hook.chroot — install AMD ROCm SMI tool for Instinct GPU monitoring.
# Runs inside the live-build chroot. Adds AMD's apt repository and installs
# rocm-smi-lib which provides the `rocm-smi` CLI (analogous to nvidia-smi).
set -e
ROCM_VERSION="6.4"
ROCM_KEYRING="/etc/apt/keyrings/rocm.gpg"
ROCM_LIST="/etc/apt/sources.list.d/rocm.list"
echo "=== AMD ROCm ${ROCM_VERSION}: adding repository ==="
mkdir -p /etc/apt/keyrings
# Download and import AMD GPG key
if ! wget -qO- "https://repo.radeon.com/rocm/rocm.gpg.key" \
| gpg --dearmor > "${ROCM_KEYRING}"; then
echo "WARN: failed to fetch AMD ROCm GPG key — skipping ROCm install"
exit 0
fi
cat > "${ROCM_LIST}" <<EOF
deb [arch=amd64 signed-by=${ROCM_KEYRING}] https://repo.radeon.com/rocm/apt/${ROCM_VERSION} bookworm main
EOF
apt-get update -qq
# rocm-smi-lib provides the rocm-smi CLI tool for GPU monitoring
if apt-get install -y --no-install-recommends rocm-smi-lib 2>/dev/null; then
echo "=== AMD ROCm: rocm-smi installed ==="
rocm-smi --version 2>/dev/null || true
else
echo "WARN: rocm-smi-lib install failed — GPU monitoring unavailable"
fi
# Clean up apt lists to keep ISO size down
rm -f "${ROCM_LIST}"
apt-get clean

View File

@@ -0,0 +1,32 @@
#!/bin/sh
# 9999-slim.hook.chroot — strip non-essential files to reduce squashfs size.
set -e
# ── Man pages and documentation ───────────────────────────────────────────────
find /usr/share/man -mindepth 1 -delete 2>/dev/null || true
find /usr/share/doc -mindepth 1 ! -name 'copyright' -delete 2>/dev/null || true
find /usr/share/info -mindepth 1 -delete 2>/dev/null || true
find /usr/share/groff -mindepth 1 -delete 2>/dev/null || true
find /usr/share/lintian -mindepth 1 -delete 2>/dev/null || true
# ── Locales — keep only C and en_US ──────────────────────────────────────────
find /usr/share/locale -mindepth 1 -maxdepth 1 \
! -name 'en' ! -name 'en_US' ! -name 'locale.alias' \
-exec rm -rf {} + 2>/dev/null || true
find /usr/share/i18n/locales -mindepth 1 \
! -name 'en_US' ! -name 'i18n' ! -name 'iso14651_t1' ! -name 'iso14651_t1_common' \
-delete 2>/dev/null || true
# ── Python cache ──────────────────────────────────────────────────────────────
find /usr /opt -name '__pycache__' -type d -exec rm -rf {} + 2>/dev/null || true
find /usr /opt -name '*.pyc' -delete 2>/dev/null || true
# ── APT cache and lists ───────────────────────────────────────────────────────
apt-get clean
rm -rf /var/lib/apt/lists/*
# ── Misc ──────────────────────────────────────────────────────────────────────
rm -rf /tmp/* /var/tmp/* 2>/dev/null || true
find /var/log -type f -delete 2>/dev/null || true
echo "=== slim: done ==="

View File

@@ -18,8 +18,15 @@ qemu-guest-agent
# SSH
openssh-server
# Disk installer
squashfs-tools
parted
grub-pc-bin
grub-efi-amd64-bin
# Filesystem support for USB export targets
exfatprogs
exfat-fuse
ntfs-3g
# Utilities
@@ -41,9 +48,33 @@ stress-ng
# QR codes (for displaying audit results)
qrencode
# Local desktop (openbox + chromium kiosk)
openbox
tint2
xorg
xterm
chromium
xserver-xorg-video-fbdev
xserver-xorg-video-vesa
lightdm
# Firmware
firmware-linux-free
firmware-linux-nonfree
firmware-misc-nonfree
firmware-amd-graphics
firmware-realtek
firmware-intel-sound
firmware-bnx2
firmware-bnx2x
firmware-cavium
firmware-qlogic
# NVIDIA DCGM (Data Center GPU Manager) — dcgmi diag for acceptance testing
datacenter-gpu-manager=1:%%DCGM_VERSION%%
# AMD ROCm SMI — GPU monitoring for Instinct cards (repo: rocm/apt/6.3.4 jammy)
rocm-smi-lib=%%ROCM_SMI_VERSION%%
# glibc compat helpers (for any external binaries that need it)
libc6

View File

@@ -26,6 +26,15 @@ echo ""
KVER=$(uname -r)
info "kernel: $KVER"
NVIDIA_BOOT_MODE="normal"
for arg in $(cat /proc/cmdline 2>/dev/null); do
case "$arg" in
bee.nvidia.mode=*)
NVIDIA_BOOT_MODE="${arg#*=}"
;;
esac
done
info "nvidia boot mode: ${NVIDIA_BOOT_MODE}"
# --- PATH & binaries ---
echo "-- PATH & binaries --"
@@ -53,17 +62,25 @@ else
fail "NVIDIA ko dir missing: $KO_DIR"
fi
for mod in nvidia nvidia_modeset nvidia_uvm; do
if /sbin/lsmod 2>/dev/null | grep -q "^nvidia "; then
ok "module loaded: nvidia"
else
fail "module NOT loaded: nvidia"
fi
for mod in nvidia_modeset nvidia_uvm; do
if /sbin/lsmod 2>/dev/null | grep -q "^$mod "; then
ok "module loaded: $mod"
elif [ "${NVIDIA_BOOT_MODE}" = "normal" ] || [ "${NVIDIA_BOOT_MODE}" = "full" ]; then
fail "module NOT loaded in normal mode: $mod"
else
fail "module NOT loaded: $mod"
warn "module not loaded in GSP-off mode: $mod"
fi
done
echo ""
echo "-- NVIDIA device nodes --"
for dev in nvidiactl nvidia0 nvidia-uvm; do
for dev in nvidiactl nvidia0; do
if [ -e "/dev/$dev" ]; then
ok "/dev/$dev exists"
else
@@ -71,6 +88,14 @@ for dev in nvidiactl nvidia0 nvidia-uvm; do
fi
done
if [ -e /dev/nvidia-uvm ]; then
ok "/dev/nvidia-uvm exists"
elif [ "${NVIDIA_BOOT_MODE}" = "normal" ] || [ "${NVIDIA_BOOT_MODE}" = "full" ]; then
fail "/dev/nvidia-uvm missing in normal mode"
else
warn "/dev/nvidia-uvm missing — CUDA stress path may be unavailable until loaded on demand"
fi
echo ""
echo "-- nvidia-smi --"
if PATH="/usr/local/bin:$PATH" command -v nvidia-smi >/dev/null 2>&1; then

View File

@@ -0,0 +1,2 @@
allowed_users=anybody
needs_root_rights=yes

View File

@@ -0,0 +1,11 @@
Section "Device"
Identifier "fbdev"
Driver "fbdev"
Option "fbdev" "/dev/fb0"
EndSection
Section "Screen"
Identifier "screen0"
Device "fbdev"
DefaultDepth 24
EndSection

View File

@@ -0,0 +1,5 @@
[Seat:*]
autologin-user=bee
autologin-user-timeout=0
autologin-session=openbox
user-session=openbox

View File

@@ -12,6 +12,6 @@
Export dir: /appdata/bee/export
Self-check: /appdata/bee/export/runtime-health.json
Open TUI: bee-tui
Web UI: http://<ip>/
SSH access: key auth (developers) or bee/eeb (password fallback)

View File

@@ -1,20 +1,18 @@
export PATH="$PATH:/usr/local/bin"
export PATH="$PATH:/usr/local/bin:/opt/rocm/bin:/opt/rocm/sbin"
menu() {
if [ -x /usr/local/bin/bee-tui ]; then
/usr/local/bin/bee-tui "$@"
else
echo "bee-tui is not installed"
return 1
fi
}
# On the local console, keep the shell visible and let the operator
# start the TUI explicitly. This avoids black-screen failures if the
# terminal implementation does not support the TUI well.
# Print web UI URLs on the local console at login.
if [ -z "${SSH_CONNECTION:-}" ] \
&& [ -z "${SSH_TTY:-}" ] \
&& [ "$(tty 2>/dev/null)" = "/dev/tty1" ]; then
&& [ -z "${SSH_TTY:-}" ]; then
echo "Bee live environment ready."
echo "Run 'menu' to open the TUI."
echo ""
echo " Web UI (local): http://localhost/"
# Print IP addresses for remote access
_ips=$(ip -4 addr show scope global 2>/dev/null | awk '/inet /{print $2}' | cut -d/ -f1)
for _ip in $_ips; do
echo " Web UI (remote): http://$_ip/"
done
unset _ips _ip
echo ""
echo " Network setup: netconf"
echo " Kernel logs: Alt+F2 | Extra shell: Alt+F3"
fi

View File

@@ -0,0 +1,4 @@
[Journal]
# Do not forward service logs to the console — prevents log spam on
# physical monitors and the local openbox desktop.
ForwardToConsole=no

View File

@@ -0,0 +1,4 @@
[Journal]
ForwardToConsole=yes
TTYPath=/dev/ttyS0
MaxLevelConsole=info

View File

@@ -5,9 +5,9 @@ Before=bee-web.service
[Service]
Type=oneshot
ExecStart=/bin/sh -c '/usr/local/bin/bee audit --runtime livecd --output file:/appdata/bee/export/bee-audit.json; rc=$?; if [ "$rc" -ne 0 ]; then echo "[bee-audit] WARN: audit exited with rc=$rc"; fi; exit 0'
StandardOutput=append:/appdata/bee/export/bee-audit.log
StandardError=append:/appdata/bee/export/bee-audit.log
ExecStart=/usr/local/bin/bee-log-run /appdata/bee/export/bee-audit.log /bin/sh -c '/usr/local/bin/bee audit --runtime livecd --output file:/appdata/bee/export/bee-audit.json; rc=$?; if [ "$rc" -ne 0 ]; then echo "[bee-audit] WARN: audit exited with rc=$rc"; fi; exit 0'
StandardOutput=journal
StandardError=journal
RemainAfterExit=yes
[Install]

View File

@@ -0,0 +1,16 @@
[Unit]
Description=Bee: mirror system journal to %I
After=systemd-journald.service
Requires=systemd-journald.service
ConditionPathExists=/dev/%I
[Service]
Type=simple
ExecStart=/bin/sh -c 'exec journalctl -f -n 200 -o short-monotonic > /dev/%I'
Restart=always
RestartSec=1
StandardOutput=null
StandardError=journal
[Install]
WantedBy=multi-user.target

View File

@@ -5,9 +5,9 @@ Before=network-online.target bee-audit.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/bee-network.sh
StandardOutput=append:/appdata/bee/export/bee-network.log
StandardError=append:/appdata/bee/export/bee-network.log
ExecStart=/usr/local/bin/bee-log-run /appdata/bee/export/bee-network.log /usr/local/bin/bee-network.sh
StandardOutput=journal
StandardError=journal
RemainAfterExit=yes
[Install]

View File

@@ -5,9 +5,9 @@ Before=bee-audit.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/bee-nvidia-load
StandardOutput=append:/appdata/bee/export/bee-nvidia.log
StandardError=append:/appdata/bee/export/bee-nvidia.log
ExecStart=/usr/local/bin/bee-log-run /appdata/bee/export/bee-nvidia.log /usr/local/bin/bee-nvidia-load
StandardOutput=journal
StandardError=journal
RemainAfterExit=yes
[Install]

View File

@@ -5,9 +5,9 @@ Before=bee-audit.service
[Service]
Type=oneshot
ExecStart=/bin/sh -c '/usr/local/bin/bee preflight --output file:/appdata/bee/export/runtime-health.json; rc=$?; if [ "$rc" -ne 0 ]; then echo "[bee-preflight] WARN: preflight exited with rc=$rc"; fi; exit 0'
StandardOutput=append:/appdata/bee/export/runtime-health.log
StandardError=append:/appdata/bee/export/runtime-health.log
ExecStart=/usr/local/bin/bee-log-run /appdata/bee/export/runtime-health.log /bin/sh -c '/usr/local/bin/bee preflight --output file:/appdata/bee/export/runtime-health.json; rc=$?; if [ "$rc" -ne 0 ]; then echo "[bee-preflight] WARN: preflight exited with rc=$rc"; fi; exit 0'
StandardOutput=journal
StandardError=journal
RemainAfterExit=yes
[Install]

View File

@@ -5,9 +5,9 @@ Before=ssh.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/bee-sshsetup
StandardOutput=append:/appdata/bee/export/bee-sshsetup.log
StandardError=append:/appdata/bee/export/bee-sshsetup.log
ExecStart=/usr/local/bin/bee-log-run /appdata/bee/export/bee-sshsetup.log /usr/local/bin/bee-sshsetup
StandardOutput=journal
StandardError=journal
RemainAfterExit=yes
[Install]

View File

@@ -5,11 +5,11 @@ Wants=bee-audit.service
[Service]
Type=simple
ExecStart=/usr/local/bin/bee web --listen :80 --audit-path /appdata/bee/export/bee-audit.json --export-dir /appdata/bee/export --title "Bee Hardware Audit"
ExecStart=/usr/local/bin/bee-log-run /appdata/bee/export/bee-web.log /usr/local/bin/bee web --listen :80 --audit-path /appdata/bee/export/bee-audit.json --export-dir /appdata/bee/export --title "Bee Hardware Audit"
Restart=always
RestartSec=2
StandardOutput=append:/appdata/bee/export/bee-web.log
StandardError=append:/appdata/bee/export/bee-web.log
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target

View File

@@ -0,0 +1,6 @@
[Service]
# On server hardware without a usable framebuffer X may fail to start.
# Limit restarts so the console is not flooded on headless deployments.
RestartSec=10
StartLimitIntervalSec=60
StartLimitBurst=3

View File

@@ -1,13 +1 @@
export PATH="/usr/local/bin:$PATH"
if [ -z "${SSH_CONNECTION:-}" ] \
&& [ -z "${SSH_TTY:-}" ] \
&& [ "$(tty 2>/dev/null)" = "/dev/tty1" ]; then
if command -v menu >/dev/null 2>&1; then
menu
elif [ -x /usr/local/bin/bee-tui ]; then
/usr/local/bin/bee-tui
else
echo "Bee menu is unavailable."
fi
fi

View File

@@ -0,0 +1,189 @@
#!/bin/bash
# bee-install — install the live system to a local disk.
#
# Usage: bee-install <device> [logfile]
# device — target block device, e.g. /dev/sda (will be WIPED)
# logfile — optional path to write progress log (default: /tmp/bee-install.log)
#
# Layout (UEFI): GPT, /dev/sdX1=EFI 512MB vfat, /dev/sdX2=root ext4
# Layout (BIOS): MBR, /dev/sdX1=root ext4
#
# Squashfs source: /run/live/medium/live/filesystem.squashfs
set -euo pipefail
DEVICE="${1:-}"
LOGFILE="${2:-/tmp/bee-install.log}"
if [ -z "$DEVICE" ]; then
echo "Usage: bee-install <device> [logfile]" >&2
exit 1
fi
if [ ! -b "$DEVICE" ]; then
echo "ERROR: $DEVICE is not a block device" >&2
exit 1
fi
SQUASHFS="/run/live/medium/live/filesystem.squashfs"
if [ ! -f "$SQUASHFS" ]; then
echo "ERROR: squashfs not found at $SQUASHFS" >&2
exit 1
fi
MOUNT_ROOT="/mnt/bee-install-root"
# ------------------------------------------------------------------
log() { echo "[$(date +%H:%M:%S)] $*" | tee -a "$LOGFILE"; }
die() { log "ERROR: $*"; exit 1; }
# ------------------------------------------------------------------
# Detect UEFI
if [ -d /sys/firmware/efi ]; then
UEFI=1
log "Boot mode: UEFI"
else
UEFI=0
log "Boot mode: BIOS/legacy"
fi
# Determine partition names (nvme uses p-suffix: nvme0n1p1)
if echo "$DEVICE" | grep -qE 'nvme|mmcblk'; then
PART_PREFIX="${DEVICE}p"
else
PART_PREFIX="${DEVICE}"
fi
if [ "$UEFI" = "1" ]; then
PART_EFI="${PART_PREFIX}1"
PART_ROOT="${PART_PREFIX}2"
else
PART_ROOT="${PART_PREFIX}1"
fi
# ------------------------------------------------------------------
log "=== BEE DISK INSTALLER ==="
log "Target device : $DEVICE"
log "Root partition: $PART_ROOT"
[ "$UEFI" = "1" ] && log "EFI partition : $PART_EFI"
log "Squashfs : $SQUASHFS ($(du -sh "$SQUASHFS" | cut -f1))"
log "Log : $LOGFILE"
log ""
# ------------------------------------------------------------------
log "--- Step 1/7: Unmounting target device ---"
# Unmount any partitions on target device
for part in "${DEVICE}"* ; do
if [ "$part" = "$DEVICE" ]; then continue; fi
if mountpoint -q "$part" 2>/dev/null; then
log " umount $part"
umount "$part" || true
fi
done
# Also unmount our mount point if leftover
umount "${MOUNT_ROOT}" 2>/dev/null || true
umount "${MOUNT_ROOT}/boot/efi" 2>/dev/null || true
# ------------------------------------------------------------------
log "--- Step 2/7: Partitioning $DEVICE ---"
if [ "$UEFI" = "1" ]; then
parted -s "$DEVICE" mklabel gpt
parted -s "$DEVICE" mkpart EFI fat32 1MiB 513MiB
parted -s "$DEVICE" set 1 esp on
parted -s "$DEVICE" mkpart root ext4 513MiB 100%
else
parted -s "$DEVICE" mklabel msdos
parted -s "$DEVICE" mkpart primary ext4 1MiB 100%
parted -s "$DEVICE" set 1 boot on
fi
# Wait for kernel to see new partitions
sleep 1
partprobe "$DEVICE" 2>/dev/null || true
sleep 1
log " Partitioning done."
# ------------------------------------------------------------------
log "--- Step 3/7: Formatting ---"
if [ "$UEFI" = "1" ]; then
mkfs.vfat -F32 -n EFI "$PART_EFI"
log " $PART_EFI formatted as vfat (EFI)"
fi
mkfs.ext4 -F -L bee-root "$PART_ROOT"
log " $PART_ROOT formatted as ext4"
# ------------------------------------------------------------------
log "--- Step 4/7: Mounting target ---"
mkdir -p "$MOUNT_ROOT"
mount "$PART_ROOT" "$MOUNT_ROOT"
if [ "$UEFI" = "1" ]; then
mkdir -p "${MOUNT_ROOT}/boot/efi"
mount "$PART_EFI" "${MOUNT_ROOT}/boot/efi"
fi
log " Mounted."
# ------------------------------------------------------------------
log "--- Step 5/7: Unpacking filesystem (this takes 10-20 minutes) ---"
log " Source: $SQUASHFS"
log " Target: $MOUNT_ROOT"
unsquashfs -f -d "$MOUNT_ROOT" "$SQUASHFS" 2>&1 | \
grep -E '^\[|^inod|^created|^extract' | \
while read -r line; do log " $line"; done || true
log " Unpack complete."
# ------------------------------------------------------------------
log "--- Step 6/7: Configuring installed system ---"
# Write /etc/fstab
ROOT_UUID=$(blkid -s UUID -o value "$PART_ROOT")
log " Root UUID: $ROOT_UUID"
cat > "${MOUNT_ROOT}/etc/fstab" <<FSTAB
# Generated by bee-install
UUID=${ROOT_UUID} / ext4 defaults,errors=remount-ro 0 1
tmpfs /tmp tmpfs defaults,size=512m 0 0
FSTAB
if [ "$UEFI" = "1" ]; then
EFI_UUID=$(blkid -s UUID -o value "$PART_EFI")
echo "UUID=${EFI_UUID} /boot/efi vfat umask=0077 0 1" >> "${MOUNT_ROOT}/etc/fstab"
fi
log " fstab written."
# Remove live-boot persistence markers so installed system boots normally
rm -f "${MOUNT_ROOT}/etc/live/boot.conf" 2>/dev/null || true
rm -f "${MOUNT_ROOT}/etc/live/live.conf" 2>/dev/null || true
# Bind mount virtual filesystems for chroot
mount --bind /dev "${MOUNT_ROOT}/dev"
mount --bind /proc "${MOUNT_ROOT}/proc"
mount --bind /sys "${MOUNT_ROOT}/sys"
[ "$UEFI" = "1" ] && mount --bind /sys/firmware/efi/efivars "${MOUNT_ROOT}/sys/firmware/efi/efivars" 2>/dev/null || true
# ------------------------------------------------------------------
log "--- Step 7/7: Installing GRUB bootloader ---"
if [ "$UEFI" = "1" ]; then
chroot "$MOUNT_ROOT" grub-install \
--target=x86_64-efi \
--efi-directory=/boot/efi \
--bootloader-id=bee \
--recheck 2>&1 | while read -r line; do log " $line"; done || true
else
chroot "$MOUNT_ROOT" grub-install \
--target=i386-pc \
--recheck \
"$DEVICE" 2>&1 | while read -r line; do log " $line"; done || true
fi
chroot "$MOUNT_ROOT" update-grub 2>&1 | while read -r line; do log " $line"; done || true
log " GRUB installed."
# ------------------------------------------------------------------
# Cleanup
log "--- Cleanup ---"
umount "${MOUNT_ROOT}/sys/firmware/efi/efivars" 2>/dev/null || true
umount "${MOUNT_ROOT}/sys" 2>/dev/null || true
umount "${MOUNT_ROOT}/proc" 2>/dev/null || true
umount "${MOUNT_ROOT}/dev" 2>/dev/null || true
[ "$UEFI" = "1" ] && umount "${MOUNT_ROOT}/boot/efi" 2>/dev/null || true
umount "$MOUNT_ROOT" 2>/dev/null || true
rmdir "$MOUNT_ROOT" 2>/dev/null || true
log ""
log "=== INSTALLATION COMPLETE ==="
log "Remove the ISO and reboot to start the installed system."

View File

@@ -0,0 +1,29 @@
#!/bin/bash
# bee-log-run — run a command, append its output to a file, and keep stdout/stderr
# connected to systemd so journald and the serial console also receive the logs.
set -o pipefail
log_file="$1"
shift
if [ -z "$log_file" ] || [ "$#" -eq 0 ]; then
echo "usage: $0 <log-file> <command> [args...]" >&2
exit 2
fi
mkdir -p "$(dirname "$log_file")"
serial_sink() {
local tty="$1"
if [ -w "$tty" ]; then
cat > "$tty"
else
cat > /dev/null
fi
}
"$@" 2>&1 | tee -a "$log_file" \
>(serial_sink /dev/ttyS0) \
>(serial_sink /dev/ttyS1)
exit "${PIPESTATUS[0]}"

View File

@@ -22,24 +22,61 @@ fi
log "module dir: $NVIDIA_KO_DIR"
ls "$NVIDIA_KO_DIR"/*.ko 2>/dev/null | sed 's/^/ /' || true
# Some kernels expose backlight helper symbols only after loading `video`.
modprobe video >/dev/null 2>&1 && log "loaded helper module: video" || log "helper module unavailable: video"
cmdline_param() {
key="$1"
for token in $(cat /proc/cmdline 2>/dev/null); do
case "$token" in
"$key"=*)
echo "${token#*=}"
return 0
;;
esac
done
return 1
}
# Load modules via insmod (direct load — no depmod needed)
for mod in nvidia nvidia-modeset nvidia-uvm; do
nvidia_mode="$(cmdline_param bee.nvidia.mode || true)"
if [ -z "$nvidia_mode" ]; then
nvidia_mode="normal"
fi
log "boot mode: $nvidia_mode"
load_module() {
mod="$1"
shift
ko="$NVIDIA_KO_DIR/${mod}.ko"
[ -f "$ko" ] || ko="$NVIDIA_KO_DIR/${mod//-/_}.ko"
if [ -f "$ko" ]; then
if insmod "$ko"; then
log "loaded: $mod"
else
log "WARN: failed to load: $mod"
dmesg | tail -n 5 | sed 's/^/ dmesg: /' || true
fi
else
if [ ! -f "$ko" ]; then
log "WARN: not found: $ko"
return 1
fi
done
if insmod "$ko" "$@"; then
log "loaded: $mod $*"
return 0
fi
log "WARN: failed to load: $mod"
dmesg | tail -n 10 | sed 's/^/ dmesg: /' || true
return 1
}
case "$nvidia_mode" in
normal|full)
if ! load_module nvidia; then
exit 1
fi
load_module nvidia-modeset || true
load_module nvidia-uvm || true
;;
gsp-off|safe|*)
# NVIDIA documents that GSP firmware is enabled by default on newer GPUs and can
# be disabled via NVreg_EnableGpuFirmware=0. Safe mode keeps the live ISO on the
# conservative path for platforms where full boot-time GSP init is unstable.
if ! load_module nvidia NVreg_EnableGpuFirmware=0; then
exit 1
fi
log "GSP-off mode: skipping nvidia-modeset and nvidia-uvm during boot"
;;
esac
# Create /dev/nvidia* device nodes (udev rules absent since we use .run installer)
nvidia_major=$(grep -m1 ' nvidiactl$' /proc/devices | awk '{print $1}')
@@ -61,8 +98,11 @@ if [ -n "$uvm_major" ]; then
&& log "created /dev/nvidia-uvm (major $uvm_major)" \
|| log "WARN: /dev/nvidia-uvm already exists"
mknod -m 666 /dev/nvidia-uvm-tools c "$uvm_major" 1 || true
else
log "WARN: nvidia-uvm not in /proc/devices"
fi
# Refresh dynamic linker cache so that NVIDIA/NCCL libs injected into /usr/lib/
# are visible to dlopen() calls (libcuda, libnvidia-ptxjitcompiler, libnccl, etc.)
ldconfig 2>/dev/null || true
log "ldconfig refreshed"
log "done"

View File

@@ -0,0 +1,24 @@
#!/bin/sh
# openbox session: launch tint2 taskbar + chromium, then openbox as WM.
# This file is used as an xinitrc by bee-desktop.
# Wait for bee-web to be accepting connections (up to 15 seconds)
i=0
while [ $i -lt 15 ]; do
if curl -sf http://localhost/healthz >/dev/null 2>&1; then
break
fi
sleep 1
i=$((i+1))
done
tint2 &
chromium \
--disable-infobars \
--disable-translate \
--no-first-run \
--disable-session-crashed-bubble \
--disable-features=TranslateUI \
http://localhost/ &
exec openbox

View File

@@ -1,9 +0,0 @@
#!/bin/sh
clear
if [ "$(id -u)" -ne 0 ]; then
exec sudo -n /usr/local/bin/bee tui --runtime livecd "$@"
fi
exec /usr/local/bin/bee tui --runtime livecd "$@"

163
iso/overlay/usr/local/bin/netconf Executable file
View File

@@ -0,0 +1,163 @@
#!/bin/sh
# Quick network configurator for the local console.
# Type 'a' at any prompt to abort, 'b' to go back.
set -e
abort() { echo "Aborted."; exit 0; }
ask() {
# ask VARNAME "prompt" [default]
# Sets VARNAME. Returns 1 on 'b' (back), calls abort on 'a'.
_var="$1"; _prompt="$2"; _default="$3"
while true; do
if [ -n "$_default" ]; then
printf "%s [%s] (b=back a=abort): " "$_prompt" "$_default"
else
printf "%s (b=back a=abort): " "$_prompt"
fi
read _input
case "$_input" in
a|A) abort ;;
b|B) return 1 ;;
"")
if [ -n "$_default" ]; then
eval "$_var=\"\$_default\""
return 0
else
echo " Required — please enter a value."
fi
;;
*)
eval "$_var=\"\$_input\""
return 0
;;
esac
done
}
# ── Step 1: choose interface ───────────────────────────────────────────────────
choose_iface() {
IFACES=$(ip -o link show | awk -F': ' '$2 != "lo" {print $2}' | cut -d@ -f1)
if [ -z "$IFACES" ]; then
echo "No network interfaces found."
exit 1
fi
echo ""
echo "Interfaces:"
i=1
for iface in $IFACES; do
ip=$(ip -4 addr show "$iface" 2>/dev/null | awk '/inet /{print $2}' | head -1)
echo " $i) $iface ${ip:-no IP}"
i=$((i+1))
done
echo ""
FIRST=$(echo "$IFACES" | head -1)
while true; do
printf "Interface number or name [%s] (a=abort): " "$FIRST"
read INPUT
case "$INPUT" in
a|A) abort ;;
"")
IFACE="$FIRST"
break
;;
*)
if echo "$INPUT" | grep -qE '^[0-9]+$'; then
IFACE=$(echo "$IFACES" | awk "NR==$INPUT")
if [ -z "$IFACE" ]; then
echo " No interface #$INPUT — try again."
continue
fi
else
# Validate name exists
if ! echo "$IFACES" | grep -qx "$INPUT"; then
echo " Unknown interface '$INPUT' — try again."
continue
fi
IFACE="$INPUT"
fi
break
;;
esac
done
echo "Selected: $IFACE"
}
# ── Step 2: choose mode ────────────────────────────────────────────────────────
choose_mode() {
echo ""
echo " 1) DHCP"
echo " 2) Static IP"
echo ""
while true; do
printf "Mode [1] (b=back a=abort): "
read INPUT
case "$INPUT" in
a|A) abort ;;
b|B) return 1 ;;
""|1) MODE=dhcp; break ;;
2) MODE=static; break ;;
*) echo " Enter 1 or 2." ;;
esac
done
}
# ── Step 3a: DHCP ─────────────────────────────────────────────────────────────
run_dhcp() {
echo "Running DHCP on $IFACE..."
dhclient -v "$IFACE"
}
# ── Step 3b: Static ───────────────────────────────────────────────────────────
run_static() {
while true; do
ask ADDR "IP address (e.g. 192.168.1.100/24)" || return 1
# Basic format check: must contain a dot and a /
if ! echo "$ADDR" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/[0-9]+$'; then
echo " Invalid format — use x.x.x.x/prefix (e.g. 192.168.1.10/24)."
continue
fi
break
done
while true; do
ask GW "Gateway (e.g. 192.168.1.1)" || return 1
if ! echo "$GW" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
echo " Invalid IP address."
continue
fi
break
done
ask DNS "DNS server" "8.8.8.8" || return 1
ip addr flush dev "$IFACE"
ip addr add "$ADDR" dev "$IFACE"
ip link set "$IFACE" up
ip route add default via "$GW" 2>/dev/null || true
echo "nameserver $DNS" > /etc/resolv.conf
echo "Done."
}
# ── Main loop ─────────────────────────────────────────────────────────────────
choose_iface
while true; do
choose_mode || { choose_iface; continue; }
if [ "$MODE" = "dhcp" ]; then
run_dhcp && break
else
run_static && break || continue
fi
done
echo ""
ip -4 addr show "$IFACE"

Some files were not shown because too many files have changed in this diff Show More