bee/audit/internal at aa284ae7542748d8d59b0c0951a4fbe47a89f905 - bee


Mikhail Chusavitin 679aeb9947 Run NVIDIA DCGM diag tests on all selected GPUs simultaneously

targeted_stress, targeted_power, and the Level 2/3 diag were dispatched
one GPU at a time from the UI, so what could be a single dcgmi command
became 8 sequential runs of ~350–450 s each. DCGM accepts -i with a
comma-separated list of GPU indices and runs the diagnostic on all of
them in parallel.

Move nvidia, nvidia-targeted-stress, nvidia-targeted-power into
nvidiaAllGPUTargets so expandSATTarget passes all selected indices in one
API call. Simplify runNvidiaValidateSet to match runNvidiaFabricValidate.
Update sat.go constants and page_validate.go estimates to reflect all-GPU
simultaneous execution (remove n× multiplier from total time estimates).

Stress test on 8-GPU system: ~5.3 h → ~2.5 h.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-20 11:53:25 +03:00

app

Globalize autotuned system power source

2026-04-20 07:02:12 +03:00

collector

Fix PSU slot regex: match MSI underscore format PSU1_POWER_IN

2026-04-19 19:03:02 +03:00

platform

Run NVIDIA DCGM diag tests on all selected GPUs simultaneously

2026-04-20 11:53:25 +03:00

runtimeenv

Refactor bee CLI and LiveCD integration