nvtop is no longer shown during NVIDIA SAT runs.
[o] Open nvtop shortcut also removed from the running screen.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Apply the same pattern as NVIDIA SAT: launch nvtop via tea.ExecProcess
so it occupies the full terminal as a live GPU chart (temp, power, fan,
utilisation lines) while the stress test runs in the background.
- Add screenGPUStressRunning screen + dedicated running/render handlers
- startGPUStressTest: tea.Batch(stress goroutine, tea.ExecProcess(nvtop))
- [o] reopen nvtop at any time; [a] abort (cancels context)
- Graceful degradation: test still runs if nvtop is not on PATH
- gpuStressDoneMsg routes result to screenOutput on completion
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two-phase GPU thermal cycling test with per-second telemetry:
- Phases: baseline → load1 → pause (no cooldown) → load2 → cooldown
- Monitors: fan RPM (ipmitool sdr), CPU/server temps (ipmitool/sensors),
system power (ipmitool dcmi), GPU temp/power/usage/clock/throttle (nvidia-smi)
- Detects throttling via clocks_throttle_reasons.active bitmask
- Measures fan response lag from load start (validates case-04 ~2s lag)
- Exports metrics.csv (wide format, one row/sec) and fan-sensors.csv (long format)
- TUI: adds [F] Fan Stress Test to Health Check screen with Quick/Standard/Express modes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- New tui/sat_progress.go: polls {DefaultSATBaseDir}/{prefix}-*/verbose.log every 300ms and parses completed/in-progress steps
- Busy screen now shows each step as PASS lscpu (234ms) / FAIL stress-ng (60.0s) / ... sensors-after instead of just "Working..."
2. Test results shown on screen (instead of just "Archive written to /path")
- RunCPUAcceptancePackResult, RunMemoryAcceptancePackResult, RunStorageAcceptancePackResult, RunAMDAcceptancePackResult now read summary.txt from the run directory and return a formatted per-step result:
Run: 2025-03-25T10:00:00Z
PASS lscpu
PASS sensors-before
FAIL stress-ng
PASS sensors-after
Overall: FAILED (ok=3 failed=1)
3. AMD GPU SAT with auto-detection
- platform.System.DetectGPUVendor(): checks /dev/nvidia0 → "nvidia", /dev/kfd → "amd"
- platform.System.RunAMDAcceptancePack(): runs rocm-smi, rocm-smi --showallinfo, dmidecode
- GPU SAT (G key / GPU row enter) automatically routes to AMD or NVIDIA based on detected vendor
- "Run All" also auto-detects vendor
4. Panel detail view
- GPU detail now shows the most recent (NVIDIA or AMD) SAT result, whichever is newer
- All SAT detail views use the same human-readable formatSATDetail format
--start is not a valid nvme-cli flag; correct syntax is -s 1 (short test).
Add --wait so the command blocks until the test completes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
isGPUDevice matched all AMD vendor PCIe devices (SATA, crypto coprocessors,
PCIe dummies) because of a broad strings.Contains(vendor,"amd") check.
Remove it — AMD Instinct/Radeon GPUs are caught by ProcessingAccelerator /
DisplayController class. Also exclude ASPEED (BMC VGA adapter).
Add clear before bee-tui to avoid dirty terminal output.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BMC:
- collector/board.go: collectBMCFirmware() via ipmitool mc info, graceful skip if /dev/ipmi0 absent
- collector/collector.go: append BMC firmware record to snap.Firmware
- app/panel.go: show BMC version in TUI right-panel header alongside BIOS
CPU SAT:
- platform/sat.go: RunCPUAcceptancePack(baseDir, durationSec) — lscpu + sensors before/after + stress-ng
- app/app.go: RunCPUAcceptancePack + RunCPUAcceptancePackResult methods, satRunner interface updated
- app/panel.go: CPU row now reads real PASS/FAIL from cpu-*/summary.txt via satStatuses(); cpuDetailResult shows last SAT summary + audit data
- tui/types.go: actionRunCPUSAT, confirmBody for CPU test with mode label
- tui/screen_health_check.go: hcCPUDurations [60,300,900]s; hcRunSingle(CPU)→confirm screen; executeRunAll uses RunCPUAcceptancePackResult
- tui/forms.go: actionRunCPUSAT → RunCPUAcceptancePackResult with mode duration
- cmd/bee/main.go: bee sat cpu [--duration N] subcommand
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace 12-item flat menu with 4-item main menu: Health Check, Export support bundle, Settings, Exit
- Add Health Check screen (Lenovo-style): per-component checkboxes (GPU/MEM/DISK/CPU), Quick/Standard/Express modes, Run All, letter hotkeys G/M/S/C/R/A/1/2/3
- Add two-column main screen: left = menu, right = hardware panel with colored PASS/FAIL/CANCEL/N/A status per component; Tab/→ switches focus, Enter opens component detail
- Add app.LoadHardwarePanel() + ComponentDetailResult() reading audit JSON and SAT summary.txt files
- Move Network/Services/audit actions into Settings submenu
- Export: support bundle only (remove separate audit JSON export)
- Delete screen_acceptance.go; add screen_health_check.go, screen_settings.go, app/panel.go
- Add BMC + CPU stress-ng tests to backlog
- Update bible submodule
- Rewrite tui_test.go for new screen/action structure
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- TUI: duration presets (10m/1h/8h/24h), GPU multi-select checkboxes
- nvtop launched concurrently with SAT via tea.ExecProcess; can reopen or abort
- GPU metrics collected per-second during bee-gpu-stress (temp/usage/power/clock)
- Outputs: gpu-metrics.csv, gpu-metrics.html (offline SVG), gpu-metrics-term.txt
- Terminal chart: asciigraph-style line chart with box-drawing chars and ANSI colours
- AUDIT_VERSION bumped 0.1.1 → 1.0.0; nvtop added to ISO package list
- runtime-flows.md updated with full NVIDIA SAT TUI flow documentation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>