- sat.go: DetectGPUVendor lspci fallback now checks GPU device classes
([0300]/[0302]/[0380]) per line instead of scanning the whole output for
a vendor name; AMD EPYC servers have dozens of AMD-branded PCIe entries
(Root Complex, IOMMU, Host Bridge) that falsely matched the old check
- blackbox.go: fix deadlock in finishCycle: it held w.mu while calling
persistState(), which acquires rt.mu and then tries to acquire w.mu
again inside persistStateLocked(); finishCycle now releases w.mu before
calling persistState()
- build.sh: remove NVIDIA-specific overlay files (bee-gpu-burn,
bee-john-gpu-stress, bee-nccl-gpu-stress, bee-nvidia-recover,
bee-dcgmproftester-staggered, bee-check-nvswitch,
nvidia-fabricmanager.service.d/) for non-nvidia build variants
- bee-selfheal: gate NVIDIA recovery on BEE_GPU_VENDOR=nvidia so the
script does not attempt to restart bee-nvidia.service on NOGPU builds
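
For reviewers, the per-line class check from the sat.go item can be
sketched roughly as below. This is an illustration only, not the code in
sat.go: vendorFromLspci, hasGPUClass, the return strings, and the sample
lspci -nn lines are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// PCI class codes named in the sat.go item: VGA compatible controller,
// 3D controller, and (other) display controller.
var gpuClassCodes = []string{"[0300]", "[0302]", "[0380]"}

// hasGPUClass reports whether one lspci -nn line carries a GPU class code.
func hasGPUClass(line string) bool {
	for _, code := range gpuClassCodes {
		if strings.Contains(line, code) {
			return true
		}
	}
	return false
}

// vendorFromLspci is a hypothetical stand-in for the lspci fallback.
// It looks at the vendor name only on lines that describe a GPU-class
// device, so AMD host bridges and IOMMUs are skipped.
// The return values "nvidia", "amd", and "" are assumptions.
func vendorFromLspci(output string) string {
	for _, line := range strings.Split(output, "\n") {
		if !hasGPUClass(line) {
			continue
		}
		lower := strings.ToLower(line)
		switch {
		case strings.Contains(lower, "nvidia"):
			return "nvidia"
		case strings.Contains(lower, "advanced micro devices"),
			strings.Contains(lower, "amd"):
			return "amd"
		}
	}
	return ""
}

func main() {
	// Made-up sample in the shape of lspci -nn output on an EPYC host.
	sample := strings.Join([]string{
		"00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex [1022:1480]",
		"00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU [1022:1481]",
		"41:00.0 3D controller [0302]: NVIDIA Corporation GA100 [A100 SXM4 40GB] [10de:20b0]",
	}, "\n")
	fmt.Printf("%q\n", vendorFromLspci(sample))
}
```

A whole-output substring scan would see "AMD" in the first two sample
lines; the per-line check ignores them because their class codes are
[0600] and [0806].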

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>