targeted_stress, targeted_power, and the Level 2/3 diag were dispatched
one GPU at a time from the UI, turning a single dcgmi command into 8
sequential ~350–450 s runs. DCGM supports -i with a comma-separated list
of GPU indices and runs the diagnostic on all of them in parallel.
Move nvidia, nvidia-targeted-stress, nvidia-targeted-power into
nvidiaAllGPUTargets so expandSATTarget passes all selected indices in one
API call. Simplify runNvidiaValidateSet to match runNvidiaFabricValidate.
Update sat.go constants and page_validate.go estimates to reflect all-GPU
simultaneous execution (remove n× multiplier from total time estimates).
Stress test on 8-GPU system: ~5.3 h → ~2.5 h.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 11:53:25 +03:00
2 changed files with 30 additions and 83 deletions
`Runs a controlled NVIDIA DCGM load to check stability under moderate stress.`,
`Runs a controlled NVIDIA DCGM load to check stability under moderate stress.`,
`<code>dcgmi diag targeted_stress</code>`,
`<code>dcgmi diag targeted_stress</code>`,
func()string{
"Skipped in Validate. Stress: "+validateFmtDur(platform.SATEstimatedNvidiaTargetedStressSec)+` (all GPUs simultaneously).<p id="sat-ts-mode-hint" style="color:var(--warn-fg);font-size:12px;margin:8px 0 0">Only runs in Stress mode. Switch mode above to enable in Run All.</p>`,
returns+`<p id="sat-ts-mode-hint" style="color:var(--warn-fg);font-size:12px;margin:8px 0 0">Only runs in Stress mode. Switch mode above to enable in Run All.</p>`
`Checks that the GPU can sustain its declared power delivery envelope. Pass/fail determined by DCGM.`,
`Checks that the GPU can sustain its declared power delivery envelope. Pass/fail determined by DCGM.`,
`<code>dcgmi diag targeted_power</code>`,
`<code>dcgmi diag targeted_power</code>`,
func()string{
"Skipped in Validate. Stress: "+validateFmtDur(platform.SATEstimatedNvidiaTargetedPowerSec)+` (all GPUs simultaneously).<p id="sat-tp-mode-hint" style="color:var(--warn-fg);font-size:12px;margin:8px 0 0">Only runs in Stress mode. Switch mode above to enable in Run All.</p>`,
returns+`<p id="sat-tp-mode-hint" style="color:var(--warn-fg);font-size:12px;margin:8px 0 0">Only runs in Stress mode. Switch mode above to enable in Run All.</p>`
}(),
))+
))+
`</div>`+
`</div>`+
`<div id="sat-card-nvidia-pulse">`+
`<div id="sat-card-nvidia-pulse">`+
@@ -382,8 +360,8 @@ function runSATWithOverrides(target, overrides) {
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.