210 Commits

Author SHA1 Message Date
9e55728053 feat(iso): replace --clean-cache with --clean-build (cleans + rebuilds)
--clean-build clears all caches (Go, NVIDIA, lb packages, work dir)
and rebuilds the Docker image, then proceeds with a full clean build.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v2.8
2026-03-28 10:12:21 +03:00
4b8023c1cb feat(iso): add --clean-cache option to build-in-container.sh
Removes all cached build artifacts: Go cache, NVIDIA/NCCL/cuBLAS
downloads, lb package cache, and live-build work dir. Use before
a clean rebuild or when switching Debian/kernel versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:11:31 +03:00
4c8417d20a feat(webui): add Install to Disk page
Expose the existing bee-install script through the web UI:
- platform/install.go: remove USB exclusion, add SizeBytes/MountedParts
  fields, add MinInstallBytes()/DiskWarnings() safety checks (size,
  mounted partitions, toram+low-RAM warning)
- webui: add GET /api/install/disks, POST /api/install/run,
  GET /api/install/stream endpoints
- webui: add Install to Disk page with disk table, warning badges,
  device-name confirmation gate, SSE progress terminal, reboot button

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:11:16 +03:00
0755374dd2 perf(iso): speed up builds — zstd squashfs + preserve lb chroot cache
- Switch squashfs compression from xz to zstd (3-5x faster compression,
  ~10-15% larger but decompresses faster at boot)
- Stop rm -rf BUILD_WORK_DIR on each build; rsync only config changes
  so lb can reuse its chroot across builds (skips apt install step)
- Keep lb-packages cache in CACHE_ROOT as fallback if work dir is wiped

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:10:29 +03:00
c70ae274fa revert(iso): remove apt-cacher-ng support, use lb package cache instead
apt-cacher-ng requires a separate container; lb's own package cache
persisted in --cache-dir is simpler and sufficient.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 10:02:34 +03:00
23ad7ff534 feat(iso): persist lb package cache across builds in cache dir
Saves cache/packages.chroot before wiping BUILD_WORK_DIR and
restores it after, so apt packages are not re-downloaded on every
build. Cache lives in --cache-dir (same place as Go/NVIDIA cache).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 09:59:55 +03:00
de130966f7 feat(iso): add APT_PROXY support to speed up builds via apt-cacher-ng
Pass APT_PROXY=http://host:3142 to build-in-container.sh to route
all apt traffic through a local cache. Also supports --apt-proxy flag.
Mirrors in auto/config are set from BEE_APT_PROXY env when present.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 09:57:54 +03:00
c6fbfc8306 fix(boot): restore toram as menu option only, not default boot param
toram was incorrectly added to the default bootappend-live causing
every boot to copy the full ISO to RAM (slow on BMC virtual media).
Default boot reads squashfs from media; toram is available as a
separate menu entry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 09:52:25 +03:00
35ad1c74d9 feat(iso): add slim hook to strip locales/man pages/apt cache from squashfs
Removes ~100-300MB from the squashfs: man pages, non-en locales,
python cache, apt lists and package cache, temp files and logs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 08:44:02 +03:00
4a02e74b17 fix(iso): add git safe.directory so git describe sees v* tags inside container
Without this, git refuses to read the bind-mounted repo (UID mismatch)
and describe returns empty, causing the version to fall back to iso/v1.0.20.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 08:23:37 +03:00
cd2853ad99 fix(webui): fix viewer static path so Reanimator Chart CSS loads correctly
Mount chart submodule static assets at /static/ (matching the template's
hardcoded href), fix nav to include Audit Snapshot tab, remove dead
renderViewerPage code and iframe from Dashboard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 08:19:17 +03:00
6caf771d6e fix(boot): restore toram kernel parameter
Without toram the squashfs is read from the physical medium at runtime.
Disconnecting the USB/CD after boot causes SQUASHFS I/O errors on any
uncached block, making all X11 apps crash with SIGBUS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 08:04:37 +03:00
14fa87b7d7 feat(netconf): add input validation, 'b' to go back, 'a' to abort
- All prompts accept 'a' = abort, 'b' = back to previous step
- Interface input: validate numeric range and name existence, re-prompt on bad input
- IP address: regex check x.x.x.x/prefix format
- Gateway: regex check x.x.x.x format
- Main loop: 'b' at mode selection goes back to interface list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 07:31:23 +03:00
600ece911b fix(desktop): remove forced 1920x1080 modeline, limit LightDM restarts
On real server hardware (IPMI/BMC AST chip + nomodeset) the VESA
framebuffer is set by BIOS at whatever resolution it chooses (often
1024x768 or 1280x1024). The hardcoded 1920x1080 Modeline caused X to
fail → LightDM crash-loop → SOL console flooded with systemd messages.

- Remove Monitor section / Modeline from xorg.conf — fbdev now uses
  whatever framebuffer resolution the kernel provides
- Add lightdm.service.d/bee-limits.conf: RestartSec=10,
  max 3 restarts per 60s so headless hardware doesn't spam the console

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 07:30:51 +03:00
2d424c63cb fix(netconf): accept interface number as input, not just name
User sees a numbered list but could only type the name.
Now numeric input is resolved to the interface name via awk NR==N.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 07:27:49 +03:00
50f28d1ee6 chore: drop legacy TUI/dead code
- Delete audit/internal/app/panel.go (388 lines, zero callers — TUI panel remnant)
- Delete RenderGPULiveChart() from platform/gpu_metrics.go (~155 lines, never called)
- Move formatSATDetail/cleanSummaryKey helpers to app.go (still used)
- Update motd: replace bee-tui with Web UI hint
- Update journald.conf.d comment: remove bee-tui reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 07:27:30 +03:00
3579747ae3 fix(iso): prioritise v[0-9]* tags over iso/v* for ISO filename
Plain v2.x tags are now the active tagging scheme; iso/v1.0.x tags
are legacy. Swap priority in resolve_iso_version so the ISO is named
bee-debian12-v2.x-amd64.iso instead of v1.0.x-N-gHASH.
Also tighten the v* pattern to v[0-9]* to avoid accidentally matching
other prefixed tags in both resolve functions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 23:34:09 +03:00
09dc7d2613 feat(webui): apply light theme from chart submodule CSS
Replace dark #0f1117 theme with clean white/Semantic-UI-inspired
design matching the updated internal/chart submodule: white surface,
dark sidebar (#1b1c1d), Lato font, blue accent (#2185d0), subtle
borders. Also update submodule pointer to latest commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 23:31:29 +03:00
ec0b7f7ff9 feat(metrics): single chart engine + full-width stacked layout
- One engine: go-analyze/charts (grafana theme) for all live metrics
- Server chart: CPU temp, CPU load%, mem load%, power W, fan RPMs
- GPU charts: temp, load%, mem%, power W — one card per GPU, added dynamically
- Charts 1400x280px SVG, rendered at width:100% in single-column layout
- Add CPU load (from /proc/stat) and mem load (from /proc/meminfo) to LiveMetricSample
- Add GPU mem utilization to GPUMetricRow (nvidia-smi utilization.memory)
- Document charting architecture in bible-local/architecture/charting.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v2.7
2026-03-27 23:26:13 +03:00
e7a7ff54b9 chore: add Makefile with run/build/test targets
make run                          — starts web UI on :8080
make run LISTEN=:9090             — custom port
make run AUDIT_PATH=/tmp/bee.json — with audit data
make build / make test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 23:14:53 +03:00
b4371e291e fix(build): resolve ISO version from plain v* tags (e.g. v2.6)
resolve_iso_version only matched iso/v* pattern; GUI release tags
(v2, v2.1 ... v2.6) were ignored, falling back to the old v1.0.20
annotated tag via resolve_audit_version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v2.6
2026-03-27 23:11:33 +03:00
c22b53a406 feat(boot): set 1920x1080 resolution for framebuffer and GRUB
- Add video=1920x1080 to kernel cmdline (sets fbdev to Full HD)
- Update GRUB gfxmode to 1920x1080 (fallback to 1280x1024,auto)
- Add Xorg Monitor section with 1920x1080 Modeline and preferred mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 23:10:18 +03:00
ff0acc3698 feat(webui): server-side SVG charts + reanimator-chart viewer
Metrics:
- Replace canvas JS charts with server-side SVG via go-analyze/charts
- Add ring buffers (120 samples) for CPU temp and power
- /api/metrics/chart/{name}.svg endpoint serves live SVG, polled every 2s

Dashboard:
- Replace custom renderViewerPage with viewer.RenderHTML() from reanimator/chart submodule
- Mount chart static assets at /chart/static/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v2.5
2026-03-27 23:07:47 +03:00
d50760e7c6 fix(webui): remove emojis from nav, fix metrics chart sizing
- Remove all emojis from sidebar nav and logo (broken on server console fonts)
- Fix canvas chart: use parentElement.getBoundingClientRect() for width,
  set explicit H=120px — fixes empty charts when offsetWidth/Height is 0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 22:49:09 +03:00
ed4f8be019 fix(webui): services table — show state badge, full status on click
Replace raw systemctl output in table cell with:
- state badge (active/failed/inactive) — click to expand
- full systemctl status in collapsible pre block (max 200px scroll)
Fixes layout explosion from multi-line status text in table.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 22:47:59 +03:00
883592d029 feat(desktop): switch to LightDM for X startup (matches Ubuntu LiveCD)
startx from user shell has /dev/fb0 permission issues and is fragile.
LightDM starts Xorg as root — standard LiveCD approach that works
on server hardware / IPMI KVM with nomodeset + fbdev/vesa.

- Add lightdm package, configure autologin as bee/openbox session
- Add /usr/share/xsessions/openbox.desktop
- Remove startx from .profile (LightDM manages X lifecycle)
- Remove Xwrapper.config needs_root_rights workaround (no longer needed)
- Enable lightdm.service in setup hook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v2.3 v2.4
2026-03-27 22:17:59 +03:00
a6dcaf1c7e fix(desktop): fix X permissions for server hardware (IPMI KVM)
- Add bee user to video,input groups (fixes /dev/fb0 permission denied)
- Add Xwrapper.config: needs_root_rights=yes (X gets hw access)
- Add xserver-xorg-video-vesa as fallback driver
- Remove dead bee-tui chmod from setup hook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v2.2
2026-03-27 22:07:25 +03:00
88727fb590 fix(desktop): don't exec startx — fall back to shell on X failure
If X fails to start, the user gets a working shell prompt instead
of a dead session or autologin loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v2.1
2026-03-27 21:48:26 +03:00
c9f5224c42 feat(console): add netconf command for quick network setup
Interactive script: lists interfaces, DHCP or static IP config.
Shown as hint in tty1 welcome message.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v2
2026-03-27 21:07:14 +03:00
7cb5c02a9b fix(desktop): force fbdev Xorg driver for server framebuffer
Explicit xorg.conf.d config prevents Xorg from trying KMS/DRM
drivers that fail on server hardware with nomodeset.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 21:05:42 +03:00
c1aa3cf491 fix(desktop): start X on vt1 from .profile for IPMI KVM compatibility
startx from autologin shell targets VT1 directly — KVM sees the
graphical UI without VT switching. Remove bee-desktop.service
(systemd-launched X defaults to VT7, invisible on KVM).
Add xserver-xorg-video-fbdev for server AST/VGA framebuffer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 21:03:59 +03:00
f7eb75c57c fix(iso): replace grub-pc/grub-efi-amd64 with -bin variants to fix package conflict
grub-pc and grub-efi-amd64 conflict with each other in Debian 12.
The -bin packages provide the same grub-install binaries without conflict.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 20:12:18 +03:00
004cc4910d feat(webui): replace TUI with full web UI + local openbox desktop
- Remove audit/internal/tui/ (~3000 LOC, bubbletea/lipgloss/reanimator deps)
- Add /api/* REST+SSE endpoints: audit, SAT (nvidia/memory/storage/cpu),
  services, network, export, tools, live metrics stream
- Add async job manager with SSE streaming for long-running operations
- Add platform.SampleLiveMetrics() for live fan/temp/power/GPU polling
- Add multi-page web UI (vanilla JS): Dashboard, Metrics charts, Tests,
  Burn-in, Network, Services, Export, Tools
- Add bee-desktop.service: openbox + Xorg + Chromium opening http://localhost/
- Add openbox/tint2/xorg/xinit/xterm/chromium to ISO package list
- Update .profile, bee.sh, and bible-local docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 19:21:14 +03:00
ed1cceed8c fix(boot): add nomodeset to fix black screen on server VGA/IPMI KVM (AST chip KMS) 2026-03-27 00:13:36 +03:00
9fe9f061f8 fix(nccl-tests): set LIBRARY_PATH so ld finds libnccl.so in nccl cache 2026-03-26 23:59:06 +03:00
837a1fb981 fix(nccl-tests): pin /usr/local/cuda→12.8 symlink, auto-detect gencode by nvcc version 2026-03-26 23:54:07 +03:00
1f43b4e050 fix(nccl-tests): pass NCCL_LIB from nccl cache to fix -lnccl link error 2026-03-26 23:52:25 +03:00
83bbc8a1bc fix(nccl-tests): upgrade to cuda-nvcc-12-8, add sm_100 (Blackwell B100/B200) 2026-03-26 23:51:26 +03:00
896bdb6ee8 fix(nccl-tests): use cuda-nvcc-12-6 to support Ampere/Volta (sm_70..sm_90) 2026-03-26 23:50:36 +03:00
5407c26e25 fix(nccl-tests): CUDA 13.0 supports only sm_90+ (Hopper/H100) 2026-03-26 23:49:45 +03:00
4fddaba9c5 fix(nccl-tests): limit CUDA gencode to sm_70+ (CUDA 13 dropped Pascal) 2026-03-26 23:48:40 +03:00
d2f384b6eb fix(nccl-tests): use plain make instead of non-existent all_reduce_perf target 2026-03-26 23:47:49 +03:00
25f0f30aaf fix(boot): fix black screen on monitor, stop log spam on console
- Add console=tty0 so VGA display gets kernel output (was serial-only)
- Change loglevel=7→3 (debug→errors only)
- Add quiet to suppress verbose kernel boot messages
- journald: ForwardToConsole=no so service logs don't flood tty1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 23:45:09 +03:00
a57b037a91 feat(installer): add 'Install to disk' in Tools submenu
Copies the live system to a local disk via unsquashfs — no debootstrap,
no network required. Supports UEFI (GPT+EFI) and BIOS (MBR) layouts.

ISO:
- Add squashfs-tools, parted, grub-pc, grub-efi-amd64 to package list
- New overlay script bee-install: partitions, formats, unsquashfs,
  writes fstab, runs grub-install+update-grub in chroot

Go TUI:
- Settings → Tools submenu (Install to disk, Check tools)
- Disk picker screen: lists non-USB, non-boot disks via lsblk
- Confirm screen warns about data loss
- Runs with live progress tail of /tmp/bee-install.log
- platform/install.go: ListInstallDisks, InstallToDisk, findLiveBootDevice

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 23:35:01 +03:00
5644231f9a feat(nccl): add nccl-tests all_reduce_perf for GPU bandwidth testing
- Dockerfile: install cuda-nvcc-13-0 from NVIDIA repo for compilation
- build-nccl-tests.sh: downloads libnccl-dev for nccl.h, builds all_reduce_perf
- build.sh: runs nccl-tests build, injects binary into /usr/local/bin/
- platform: RunNCCLTests() auto-detects GPU count, runs all_reduce_perf
- TUI: NCCL bandwidth test entry in Burn-in Tests screen [N] hotkey

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 23:22:19 +03:00
eea98e6d76 feat(dcgm): add NVIDIA DCGM diagnostics, fix KVM console
- Add 9002-nvidia-dcgm.hook.chroot: installs datacenter-gpu-manager
  from NVIDIA apt repo during live-build
- Enable nvidia-dcgm.service in chroot setup hook
- Replace bee-gpu-stress with dcgmi diag (levels 1-4) in NVIDIA SAT
- TUI: replace GPU checkbox + duration UI with DCGM level selection
- Remove console=tty2 from boot params: KVM/VGA now shows tty1
  where bee-tui runs, fixing unresponsive console

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 23:08:12 +03:00
967455194c feat(iso): make toram optional, add 'load to RAM' boot menu entry
Default boot no longer loads ISO to RAM (slow on BMC virtual media).
Separate menu entry added for toram in both GRUB and isolinux.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 21:45:04 +03:00
79dabf3efb fix(build): link bee-gpu-stress with -lm for sqrt()
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 20:55:14 +03:00
1336f5b95c fix(cublas): copy include dirs containing files without .h extension
nv/target has no .h suffix; use -type f instead of -name '*.h' to
detect non-empty include directories.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 20:53:08 +03:00
31486a31c1 fix(cublas): add cuda-cccl package for nv/target header
cuda_fp16.h (included by cublas_api.h) requires <nv/target> from
the CUDA C++ Core Libraries (cuda-cccl-13-0).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 20:49:46 +03:00