Fix pulse_test: run all GPUs simultaneously, not per-GPU

pulse_test is a PSU/power-delivery test, not a per-GPU compute test.
Its purpose is to synchronously pulse all GPUs between idle and full
load to create worst-case transient spikes on the power supply.
Running it one GPU at a time produces only a fraction of the PSU load
and can miss PSU-level failures.

- Move nvidia-pulse from nvidiaPerGPUTargets to nvidiaAllGPUTargets
  (same dispatch path as NCCL and NVBandwidth)
- Change card onclick to runNvidiaFabricValidate (all selected GPUs at once)
- Update card title to "NVIDIA PSU Pulse Test" and description to
  explain why synchronous multi-GPU execution is required
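
As a rough illustration of the dispatch change described above, the sketch below shows how a target could be expanded into tasks depending on which list it belongs to. Only the two target lists come from this commit; `planRuns` and the task shape are hypothetical and are not the actual `expandSATTarget` implementation.

```javascript
// Target lists as they stand after this commit.
const nvidiaPerGPUTargets = ['nvidia', 'nvidia-targeted-stress', 'nvidia-targeted-power'];
const nvidiaAllGPUTargets = ['nvidia-pulse', 'nvidia-interconnect', 'nvidia-bandwidth'];

// planRuns is a hypothetical helper used only for illustration.
// It returns the list of tasks that would be enqueued for a target.
function planRuns(target, selectedGPUs) {
  if (nvidiaAllGPUTargets.includes(target)) {
    // One task covering every selected GPU, so the load is applied at once.
    return [{ target: target, gpus: selectedGPUs.slice() }];
  }
  if (nvidiaPerGPUTargets.includes(target)) {
    // One task per GPU, each run in isolation.
    return selectedGPUs.map(function (i) {
      return { target: target, gpus: [i] };
    });
  }
  // Any other target: a single task with no GPU list.
  return [{ target: target, gpus: [] }];
}

console.log(planRuns('nvidia-pulse', [0, 1, 2, 3]).length);          // 1
console.log(planRuns('nvidia-targeted-power', [0, 1, 2, 3]).length); // 4
```

With four GPUs selected, nvidia-pulse becomes a single task that loads all four together, while a per-GPU target still expands into four separate tasks.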

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 00:19:11 +03:00
parent b9be93c213
commit 6937a4c6ec

@@ -1112,11 +1112,11 @@ func renderValidate(opts HandlerOptions) string {
 )) +
 `</div>` +
 `<div id="sat-card-nvidia-pulse">` +
-renderSATCard("nvidia-pulse", "NVIDIA Pulse Test", "runNvidiaValidateSet('nvidia-pulse')", "", renderValidateCardBody(
+renderSATCard("nvidia-pulse", "NVIDIA PSU Pulse Test", "runNvidiaFabricValidate('nvidia-pulse')", "", renderValidateCardBody(
 inv.NVIDIA,
-`Verifies GPU transient power response using DCGM pulse load. Pass/fail determined by DCGM.`,
+`Tests power supply transient response by pulsing all GPUs simultaneously between idle and full load. Synchronous pulses across all GPUs create worst-case PSU load spikes — running per-GPU would miss PSU-level failures.`,
 `<code>dcgmi diag pulse_test</code>`,
-`Skipped in Validate mode. Runs in Stress mode only. Runs one GPU at a time.<p id="sat-pt-mode-hint" style="color:var(--warn-fg);font-size:12px;margin:8px 0 0">Only runs in Stress mode. Switch mode above to enable in Run All.</p>`,
+`Skipped in Validate mode. Runs in Stress mode only. Runs all selected GPUs simultaneously — synchronous pulsing is required to stress the PSU.<p id="sat-pt-mode-hint" style="color:var(--warn-fg);font-size:12px;margin:8px 0 0">Only runs in Stress mode. Switch mode above to enable in Run All.</p>`,
 )) +
 `</div>` +
 `<div id="sat-card-nvidia-interconnect">` +
@@ -1321,8 +1321,9 @@ function runSATWithOverrides(target, overrides) {
 return enqueueSATTarget(target, overrides)
 .then(d => streamSATTask(d.task_id, title, false));
 }
-const nvidiaPerGPUTargets = ['nvidia', 'nvidia-targeted-stress', 'nvidia-targeted-power', 'nvidia-pulse'];
-const nvidiaAllGPUTargets = ['nvidia-interconnect', 'nvidia-bandwidth'];
+const nvidiaPerGPUTargets = ['nvidia', 'nvidia-targeted-stress', 'nvidia-targeted-power'];
+// pulse_test and fabric tests run on all selected GPUs simultaneously
+const nvidiaAllGPUTargets = ['nvidia-pulse', 'nvidia-interconnect', 'nvidia-bandwidth'];
 function expandSATTarget(target) {
 if (nvidiaAllGPUTargets.indexOf(target) >= 0) {
 const selected = satSelectedGPUIndices();