bee/audit/internal/platform at f4a19c0a00466b12c82e1fa00ad016a7b240a1bd - bee - MCHUS git PRO

reanimator/bee

Michael Chus f4a19c0a00 Add power calibration step to benchmark; fix PowerSustainScore reference

Before the per-GPU compute phases, run `dcgmi diag -r targeted_power`
for 45 s while collecting nvidia-smi power metrics in parallel.
The p95 power per GPU is stored as calibrated_peak_power_w and used
as the denominator for PowerSustainScore instead of the hardware default
limit, which bee-gpu-burn cannot reach because it is compute-only.
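
As a rough illustration of the p95 step, the sketch below takes a slice of per-GPU power samples and returns the 95th percentile. The commit message does not say which percentile method the benchmark uses; nearest-rank is assumed here, and the function name and sample values are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// percentileNearestRank returns the p-th percentile (1..100) of samples using
// the nearest-rank method. The second result is false when there is no data
// or p is out of range. Assumption: the real benchmark may use another method.
func percentileNearestRank(samples []float64, p int) (float64, bool) {
	if len(samples) == 0 || p < 1 || p > 100 {
		return 0, false
	}
	// Copy before sorting so the caller's slice keeps its original order.
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	// rank = ceil(p*n/100), computed in integers to avoid float rounding.
	rank := (p*len(sorted) + 99) / 100
	return sorted[rank-1], true
}

func main() {
	// Hypothetical power draw samples for one GPU, in watts.
	samplesW := []float64{412, 598, 605, 611, 603, 640, 607, 600, 609, 602}
	if peak, ok := percentileNearestRank(samplesW, 95); ok {
		fmt.Printf("calibrated_peak_power_w = %.0f\n", peak)
	}
}
```

A percentile rather than the maximum keeps a single transient spike from inflating the reference, although with very few samples nearest-rank p95 degenerates to the maximum.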

Fallback chain: calibrated peak → default limit → enforced limit.
If dcgmi is absent or the run fails, calibration is skipped silently.
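
The fallback chain can be sketched as a function that returns the first usable value in the stated order. This is a minimal sketch, not the code in benchmark.go: the function name, the source labels, and the convention that a non-positive value means "not available" are assumptions.

```go
package main

import "fmt"

// powerReferenceW picks the denominator for the power score following the
// order from the commit message: calibrated peak, then default limit, then
// enforced limit. Assumption: a value <= 0 means the reading is unavailable.
func powerReferenceW(calibratedPeakW, defaultLimitW, enforcedLimitW float64) (float64, string) {
	switch {
	case calibratedPeakW > 0:
		return calibratedPeakW, "calibrated_peak"
	case defaultLimitW > 0:
		return defaultLimitW, "default_limit"
	case enforcedLimitW > 0:
		return enforcedLimitW, "enforced_limit"
	}
	// Nothing usable: the caller has to decide how to score this GPU.
	return 0, "none"
}

func main() {
	// Calibration was skipped (for example, dcgmi is absent), so the
	// default limit is used. The wattages are hypothetical.
	ref, src := powerReferenceW(0, 700, 650)
	fmt.Printf("power reference: %.0f W (%s)\n", ref, src)
}
```

Returning the source label alongside the value makes it possible to show in a report which reference was used, which matters because calibration is skipped silently.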

Adjust composite score weights to match the new honest power reference:
  base 0.35, thermal 0.25, stability 0.25, power 0.15, NCCL bonus 0.10.
Power weight reduced (0.20→0.15) because even with a calibrated reference
bee-gpu-burn reaches ~60-75% of TDP by design (no concurrent mem stress).
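
To make the weights concrete, here is a hedged sketch of how they might combine. The commit message lists only the weights; the weighted-sum form, the 0..1 range of each sub-score, the additive NCCL bonus, and all identifiers below are assumptions rather than the project's actual formula. The four main weights sum to 1.00, so under these assumptions the bonus can lift the total to at most 1.10.

```go
package main

import "fmt"

// Weights as listed in the commit message.
const (
	weightBase      = 0.35
	weightThermal   = 0.25
	weightStability = 0.25
	weightPower     = 0.15
	weightNCCLBonus = 0.10
)

// subScores holds hypothetical per-category scores, each assumed to be in 0..1.
type subScores struct {
	Base, Thermal, Stability, Power, NCCL float64
}

// compositeScore is an assumed weighted sum with the NCCL term added on top.
func compositeScore(s subScores) float64 {
	return weightBase*s.Base +
		weightThermal*s.Thermal +
		weightStability*s.Stability +
		weightPower*s.Power +
		weightNCCLBonus*s.NCCL
}

func main() {
	// A GPU that is otherwise perfect but sustains only 70% of the
	// calibrated power reference (hypothetical numbers).
	s := subScores{Base: 1, Thermal: 1, Stability: 1, Power: 0.70}
	fmt.Printf("composite = %.3f\n", compositeScore(s))
}
```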

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-12 22:06:46 +03:00

benchmark_report.go

Fix GPU model propagation, export filenames, PSU/service status, and chart perf

2026-04-11 10:05:27 +03:00

benchmark_test.go

Fix GPU clock lock normalization for Blackwell (clocks.max.* unsupported)

2026-04-12 13:33:54 +03:00

benchmark_types.go

Add power calibration step to benchmark; fix PowerSustainScore reference

2026-04-12 22:06:46 +03:00

benchmark.go

Add power calibration step to benchmark; fix PowerSustainScore reference

2026-04-12 22:06:46 +03:00

error_patterns.go

feat(watchdog): hardware error monitor + unified component status store

2026-04-02 19:20:59 +03:00

export_test.go

iso: improve burn-in, export, and live boot

2026-03-26 18:56:19 +03:00

export.go

iso: improve burn-in, export, and live boot

2026-03-26 18:56:19 +03:00

gpu_metrics.go

UI: amber accents, smaller wallpaper logo, new support bundle name, drop display resolution

2026-04-08 21:37:01 +03:00

install_to_ram_linux.go

Improve install-to-RAM verification for ISO boots

2026-04-07 20:21:06 +03:00

install_to_ram_other.go

Improve install-to-RAM verification for ISO boots

2026-04-07 20:21:06 +03:00

install_to_ram_test.go

Improve install-to-RAM verification for ISO boots

2026-04-07 20:21:06 +03:00

install_to_ram.go

Improve install-to-RAM verification for ISO boots

2026-04-07 20:21:06 +03:00

install.go

feat(webui): show current boot source

2026-04-02 15:36:32 +03:00

kill_workers.go

Fix nvidia-targeted-stress failing with DCGM_ST_IN_USE (-34)

2026-04-05 20:21:36 +03:00

live_metrics_test.go

fix(metrics): stabilize cpu and power sampling

2026-04-01 09:40:42 +03:00

live_metrics.go

fix(metrics): stabilize cpu and power sampling

2026-04-01 09:40:42 +03:00

network_test.go

release: v3.1

2026-03-28 22:51:36 +03:00

network.go

fix(network): strip linkdown/dead/onlink flags when restoring routes

2026-03-29 10:39:16 +03:00

nvidia_stress.go

Add staged NVIDIA burn ramp-up mode

2026-04-09 15:21:14 +03:00

parse.go

Refactor bee CLI and LiveCD integration

2026-03-13 16:52:16 +03:00

platform_stress_test.go

fix(stress): keep platform burn responsive under load

2026-03-31 22:28:26 +03:00

platform_stress.go

Fix GPU model propagation, export filenames, PSU/service status, and chart perf

2026-04-11 10:05:27 +03:00

runtime.go

Add USB export drive and LiveCD-in-RAM checks to Runtime Health

2026-04-11 10:05:27 +03:00

sat_fan_stress_test.go

fix(metrics): stabilize cpu and power sampling

2026-04-01 09:40:42 +03:00

sat_fan_stress.go

Fix GPU model propagation, export filenames, PSU/service status, and chart perf

2026-04-11 10:05:27 +03:00

sat_test.go

Refactor validate modes, fix benchmark report and IPMI power

2026-04-08 00:42:12 +03:00

sat.go

Fix GPU model propagation, export filenames, PSU/service status, and chart perf

2026-04-11 10:05:27 +03:00

services.go

Fix service control buttons: sudo, real error output, UX feedback

2026-04-05 20:25:41 +03:00

system_test.go

Refactor bee CLI and LiveCD integration

2026-03-13 16:52:16 +03:00

techdump_test.go

Tighten support bundles and fix AMD runtime checks

2026-03-25 19:35:25 +03:00

techdump.go

Warn on PCIe link speed degradation and collect lspci -vvv in techdump

2026-04-12 12:42:17 +03:00

tools.go

Refactor bee CLI and LiveCD integration

2026-03-13 16:52:16 +03:00

types_test.go

WIP: checkpoint current tree

2026-04-05 12:05:00 +03:00

types.go

Add staged NVIDIA burn ramp-up mode

2026-04-09 15:21:14 +03:00