• v8.2 f87461ee4a

    Detect thermal throttle with fans below 100% as cooling misconfiguration

    mchus released this 2026-04-14 21:44:57 +03:00 | 177 commits to main since this release

    During power calibration: if a thermal throttle (sw_thermal/hw_thermal)
    causes a ≥20% clock drop while the server fans' P95 duty cycle is below
    98%, record a CoolingWarning on the GPU result and emit an actionable
    finding telling the operator to rerun with fans manually fixed at 100%.
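    The calibration rule above can be sketched as follows. Only the thresholds
    (≥20% clock drop, P95 fan duty below 98%) and the sw_thermal/hw_thermal
    reason names come from this release note. The sample format, the
    nearest-rank P95, measuring the drop from peak to lowest clock, the
    CoolingWarning fields, and the helpers detect_cooling_warning and
    cooling_finding are hypothetical illustrations, not the project's code.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

# Thresholds from the release note; everything else here is hypothetical.
CLOCK_DROP_THRESHOLD = 0.20   # throttle must cause a >= 20% clock drop
FAN_P95_THRESHOLD = 98.0      # fans count as "not at full speed" below 98% P95
THERMAL_REASONS = {"sw_thermal", "hw_thermal"}


@dataclass
class CoolingWarning:
    # Hypothetical shape of the warning attached to a GPU result.
    fan_p95_pct: float
    clock_drop_pct: float
    reasons: tuple


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def detect_cooling_warning(
    fan_duty_pct: Sequence[float],
    clocks_mhz: Sequence[float],
    throttle_reasons: Sequence[str],
) -> Optional[CoolingWarning]:
    """Return a warning if a thermal throttle coincides with fans below full duty."""
    thermal = tuple(sorted(THERMAL_REASONS.intersection(throttle_reasons)))
    if not thermal or not fan_duty_pct or not clocks_mhz:
        return None
    peak = max(clocks_mhz)
    if peak <= 0:
        return None
    # Assumption: clock drop is measured from the peak to the lowest observed clock.
    drop = (peak - min(clocks_mhz)) / peak
    fan = p95(fan_duty_pct)
    if drop >= CLOCK_DROP_THRESHOLD and fan < FAN_P95_THRESHOLD:
        return CoolingWarning(
            fan_p95_pct=fan,
            clock_drop_pct=round(drop * 100, 1),
            reasons=thermal,
        )
    return None


def cooling_finding(w: CoolingWarning) -> str:
    # Hypothetical wording of the actionable finding shown to the operator.
    return (
        f"Thermal throttle ({', '.join(w.reasons)}) dropped clocks by "
        f"{w.clock_drop_pct:.1f}% while fans were at {w.fan_p95_pct:.0f}% (P95). "
        "Rerun with fans manually fixed at 100%."
    )


if __name__ == "__main__":
    warning = detect_cooling_warning([60, 62, 65, 70], [1980, 1900, 1500], ["hw_thermal"])
    if warning is not None:
        print(cooling_finding(warning))
```

    Requiring both conditions keeps a throttle seen with fans already at
    full speed (a genuine thermal limit) from being reported as a cooling
    misconfiguration.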

    During the steady-state benchmark: the same signal enriches the existing
    thermal_limited finding with the fan duty cycle and clock drift values.
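    A minimal sketch of the steady-state enrichment, assuming findings are
    records with an id, a message, and a details map. The Finding type, the
    enrich_thermal_limited helper, and defining clock drift as the relative
    change from the first to the last steady-state sample are hypothetical;
    only the thermal_limited id and the 20%/98% thresholds come from this note.
    The sketch only updates an existing finding and never creates one.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence

CLOCK_DROP_THRESHOLD_PCT = 20.0  # from the release note
FAN_P95_THRESHOLD = 98.0         # from the release note


@dataclass
class Finding:
    # Hypothetical finding record: identifier, message, free-form details.
    id: str
    message: str
    details: Dict[str, float] = field(default_factory=dict)


def p95(values: Sequence[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def enrich_thermal_limited(
    findings: List[Finding],
    fan_duty_pct: Sequence[float],
    clocks_mhz: Sequence[float],
) -> bool:
    """Attach fan duty and clock drift to an existing thermal_limited finding.

    Returns True only if a thermal_limited finding was present and the
    low-fan / large-drift signal fired.
    """
    if not fan_duty_pct or not clocks_mhz or clocks_mhz[0] <= 0:
        return False
    fan = p95(fan_duty_pct)
    # Assumption: drift is the relative change from first to last sample, in %.
    drift = (clocks_mhz[0] - clocks_mhz[-1]) / clocks_mhz[0] * 100
    if fan >= FAN_P95_THRESHOLD or drift < CLOCK_DROP_THRESHOLD_PCT:
        return False
    for f in findings:
        if f.id == "thermal_limited":
            f.details["fan_p95_pct"] = fan
            f.details["clock_drift_pct"] = round(drift, 1)
            f.message += f" (fans {fan:.0f}% P95, clock drift {drift:.1f}%)"
            return True
    return False


if __name__ == "__main__":
    found = [Finding("thermal_limited", "GPU clocks were thermally limited")]
    enrich_thermal_limited(found, [55, 58, 60], [1950, 1800, 1500])
    print(found[0].message, found[0].details)
```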

    Covers both the main benchmark (buildBenchmarkFindings) and the power
    bench (NvidiaPowerBenchResult.Findings).

    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
