reanimator/bee - bee - MCHUS git PRO

Michael Chus f4a19c0a00 Add power calibration step to benchmark; fix PowerSustainScore reference

Before the per-GPU compute phases, run `dcgmi diag -r targeted_power`
for 45 s while collecting nvidia-smi power metrics in parallel.
The p95 power per GPU is stored as calibrated_peak_power_w and used
as the denominator for PowerSustainScore instead of the hardware default
limit, which bee-gpu-burn cannot reach because it is compute-only.
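The p95 step above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the samples were collected as CSV lines of `index, power.draw` (for example via `nvidia-smi --query-gpu=index,power.draw --format=csv,noheader,nounits`), and it assumes the nearest-rank percentile definition, which the commit does not specify. The function name `p95_power_per_gpu` is hypothetical.

```python
import csv
from collections import defaultdict


def p95_power_per_gpu(csv_text):
    """Return {gpu_index: p95 power in watts} from 'index, power' CSV lines.

    Assumption: nearest-rank percentile; the real benchmark may interpolate.
    """
    samples = defaultdict(list)
    for row in csv.reader(csv_text.splitlines()):
        if len(row) < 2:
            continue
        try:
            idx = int(row[0].strip())
            watts = float(row[1].strip())
        except ValueError:
            # Skip unparsable readings such as "[N/A]".
            continue
        samples[idx].append(watts)

    result = {}
    for idx, values in samples.items():
        values.sort()
        # Nearest-rank p95: rank = ceil(0.95 * n), computed in integers
        # to avoid floating-point rounding at exact boundaries.
        rank = (95 * len(values) + 99) // 100
        result[idx] = values[rank - 1]
    return result
```

Using a high percentile rather than the maximum keeps a single transient spike from inflating the calibrated reference.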

Fallback chain: calibrated peak → default limit → enforced limit.
If dcgmi is absent or the run fails, calibration is skipped silently
and the next reference in the chain is used.
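The fallback chain can be pictured as picking the first usable value in order. The sketch below is an assumption about the shape of that logic; the helper name `power_reference_w` and the rule "usable means present and greater than zero" are hypothetical and not taken from the repository.

```python
def power_reference_w(calibrated_peak_w, default_limit_w, enforced_limit_w):
    """Pick the denominator for PowerSustainScore.

    Order follows the commit message: calibrated peak, then default
    limit, then enforced limit. Returns (source_name, watts), or
    (None, None) if no candidate is usable.
    """
    candidates = (
        ("calibrated_peak", calibrated_peak_w),
        ("default_limit", default_limit_w),
        ("enforced_limit", enforced_limit_w),
    )
    for name, value in candidates:
        # Assumption: a missing or non-positive reading is not usable.
        if value is not None and value > 0:
            return name, value
    return None, None
```

Returning the source name alongside the value makes it possible to record which reference a given score was computed against.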

Adjust composite score weights to match the new honest power reference:
  base 0.35, thermal 0.25, stability 0.25, power 0.15, NCCL bonus 0.10.
The power weight drops from 0.20 to 0.15 because, even with a calibrated
reference, bee-gpu-burn reaches only ~60-75% of TDP by design (it applies
no concurrent memory stress).
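As a rough illustration of the weights: the four core weights sum to 1.00 (0.35 + 0.25 + 0.25 + 0.15), with the NCCL bonus on top. The sketch below assumes each sub-score is normalized to [0, 1] and that the composite is a plain weighted sum with an additive bonus; the commit does not state the actual formula, so `composite_score` and its parameters are hypothetical.

```python
# Weights as listed in the commit message.
WEIGHTS = {"base": 0.35, "thermal": 0.25, "stability": 0.25, "power": 0.15}
NCCL_BONUS = 0.10


def composite_score(base, thermal, stability, power, nccl=0.0):
    """Weighted sum of sub-scores plus an additive NCCL bonus.

    Assumption: every input is already normalized to the range [0, 1].
    """
    parts = {
        "base": base,
        "thermal": thermal,
        "stability": stability,
        "power": power,
    }
    core = sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)
    return core + NCCL_BONUS * nccl
```

Under these assumptions, a GPU that is perfect on every axis except sustaining 70% of its power reference would lose 0.045 of the core score (0.15 × 0.30).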

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-12 22:06:46 +03:00

audit

Add power calibration step to benchmark; fix PowerSustainScore reference

2026-04-12 22:06:46 +03:00

bible @ 1d89a4918e

Update bible submodule

2026-04-08 07:14:31 +03:00

bible-local

Fix GPU model propagation, export filenames, PSU/service status, and chart perf

2026-04-11 10:05:27 +03:00

internal

chore: commit pending repo changes