Decision: Use NVIDIA proprietary driver, not open kernel modules
Date: 2026-03-05
Status: active
Context
bee needs to collect GPU serial numbers, VBIOS versions, and ECC telemetry via nvidia-smi.
Two options exist: NVIDIA open-gpu-kernel-modules (MIT/GPLv2, GitHub) or the official
proprietary .run installer.
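As a rough illustration of the collection step described above, the following sketch queries nvidia-smi in CSV mode and parses one record per GPU. The function names and the particular ECC counters chosen are assumptions made for this example, not part of bee itself; the field names are ones `nvidia-smi --query-gpu` accepts.

```python
import csv
import io
import subprocess

# Fields passed to `nvidia-smi --query-gpu`. Which ECC counters bee actually
# records is an assumption; these are the volatile totals.
QUERY_FIELDS = [
    "serial",
    "vbios_version",
    "ecc.mode.current",
    "ecc.errors.corrected.volatile.total",
    "ecc.errors.uncorrected.volatile.total",
]


def parse_gpu_inventory(csv_text: str) -> list[dict[str, str]]:
    """Turn `--format=csv,noheader` output into one dict per GPU."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [
        dict(zip(QUERY_FIELDS, (cell.strip() for cell in row)))
        for row in rows
        if row  # skip blank lines
    ]


def query_gpu_inventory() -> list[dict[str, str]]:
    """Run nvidia-smi and parse its output; needs a loaded NVIDIA driver."""
    cmd = [
        "nvidia-smi",
        f"--query-gpu={','.join(QUERY_FIELDS)}",
        "--format=csv,noheader",
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_gpu_inventory(out)
```

Keeping the parser separate from the subprocess call lets it be exercised on captured output from a machine without a GPU.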
Decision
Use the official proprietary NVIDIA .run installer for both kernel modules and nvidia-smi.
Consequences
- Kernel modules and nvidia-smi come from a single verified source.
- NVIDIA publishes .sha256sum alongside each installer; download and verify before use.
- Driver version pinned in iso/builder/VERSIONS as NVIDIA_DRIVER_VERSION.
- DCGM must track the CUDA user-mode driver major version exposed by nvidia-smi.
- For NVIDIA driver branch 590 with CUDA 13.x, use DCGM 4 package family datacenter-gpu-manager-4-cuda13; legacy datacenter-gpu-manager 3.x does not provide a working path for this stack.
- Build process: download .run, extract, compile kernel/ sources against linux-lts-dev.
- Modules cached in dist/nvidia-<version>-<kver>/; rebuild only on version or kernel change.
- ISO size increases by ~50 MB for .ko files + nvidia-smi.
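The verification, caching, and DCGM rules listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's build script: all function names are hypothetical, the checksum file is assumed to follow the usual `sha256sum` layout (hex digest first, then the file name), and only the CUDA 13 mapping named in this record is encoded.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large installer never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_installer(installer: Path, sums_file: Path) -> bool:
    """Compare the installer's digest with the first field of the checksum file.

    Assumption: the file looks like `sha256sum` output, "<hex digest>  <name>".
    """
    fields = sums_file.read_text().split()
    if not fields:
        return False
    return sha256_of(installer) == fields[0].lower()


def module_cache_dir(dist: Path, driver_version: str, kernel_version: str) -> Path:
    """Cache directory keyed on both versions: dist/nvidia-<version>-<kver>/."""
    return dist / f"nvidia-{driver_version}-{kernel_version}"


def needs_rebuild(dist: Path, driver_version: str, kernel_version: str) -> bool:
    """Rebuild only when no cache exists for this driver/kernel pair."""
    return not module_cache_dir(dist, driver_version, kernel_version).is_dir()


def dcgm_package_for(cuda_version: str) -> str:
    """Map the CUDA version reported by nvidia-smi to a DCGM package family.

    Only the CUDA 13 pairing is stated in this record; anything else is
    rejected rather than guessed.
    """
    major = int(cuda_version.split(".")[0])
    if major == 13:
        return "datacenter-gpu-manager-4-cuda13"
    raise ValueError(f"no DCGM package recorded for CUDA major version {major}")
```

Because the cache directory name includes both the driver and kernel versions, a change to either yields a new path, which is what triggers a rebuild.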