bee/audit/internal at d1a22d782dc6701e13b9badd55f47acbde5903c1 - bee

Files

Michael Chus d1a22d782d Move power diag tests to validate/stress; fix GPU burn power saturation

- bee-gpu-stress.c: remove per-wave cuCtxSynchronize barrier in both
  cuBLASLt and PTX hot loops; sync at most once/sec so the GPU queue
  stays continuously full — eliminates the CPU↔GPU ping-pong that
  prevented reaching full TDP
- sat_fan_stress.go: default SizeMB 0 (auto = 95% VRAM) instead of
  hardcoded 64 MB; tiny matrices caused <0.1 ms kernels where CPU
  re-queue overhead dominated
- pages.go: move nvidia-targeted-power and nvidia-pulse from Burn →
  Validate stress section alongside nvidia-targeted-stress; these are
  DCGM pass/fail diagnostics, not sustained burn loads; remove the
  Power Delivery / Power Budget card from Burn entirely

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-08 00:13:52 +03:00

app

Add NVIDIA self-heal tools and per-GPU SAT status

2026-04-07 20:20:05 +03:00

collector

Add NVIDIA self-heal tools and per-GPU SAT status

2026-04-07 20:20:05 +03:00

platform

Move power diag tests to validate/stress; fix GPU burn power saturation

2026-04-08 00:13:52 +03:00

runtimeenv

Refactor bee CLI and LiveCD integration