bee/bible-local/decisions/2026-03-05-nvidia-proprietary-driver.md
2026-03-31 11:15:15 +03:00

Decision: Use NVIDIA proprietary driver, not open kernel modules

Date: 2026-03-05
Status: active

Context

bee needs to collect GPU serial numbers, VBIOS versions, and ECC telemetry via nvidia-smi. Two options exist: NVIDIA open-gpu-kernel-modules (MIT/GPLv2, GitHub) or the official proprietary .run installer.
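As an illustration of the data bee is after, the following minimal sketch queries nvidia-smi in CSV mode and parses the result. It is not bee's actual collector: the helper names (`parse_gpu_csv`, `query_gpus`) are hypothetical, and the choice of the two volatile ECC counters is an assumption about which ECC telemetry is wanted.

```python
import csv
import io
import subprocess

# Fields passed to `nvidia-smi --query-gpu`. Which ECC counters bee needs
# is an assumption; the volatile totals are used here as an example.
QUERY_FIELDS = [
    "serial",
    "vbios_version",
    "ecc.errors.corrected.volatile.total",
    "ecc.errors.uncorrected.volatile.total",
]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        values = [value.strip() for value in record]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi and return the parsed rows (needs a loaded driver)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_gpu_csv(result.stdout)
```

Keeping the parser separate from the subprocess call lets it be exercised on captured output from a machine without a GPU.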

Decision

Use the official proprietary NVIDIA .run installer for both kernel modules and nvidia-smi.

Consequences

  • Kernel modules and nvidia-smi come from a single verified source.
  • NVIDIA publishes a .sha256sum file alongside each installer; download it and verify the installer against it before use.
  • Driver version pinned in iso/builder/VERSIONS as NVIDIA_DRIVER_VERSION.
  • DCGM must track the CUDA user-mode driver major version exposed by nvidia-smi.
  • For NVIDIA driver branch 590 with CUDA 13.x, use the DCGM 4 package family (datacenter-gpu-manager-4-cuda13); the legacy datacenter-gpu-manager 3.x packages do not provide a working path for this stack.
  • Build process: download .run, extract, compile kernel/ sources against linux-lts-dev.
  • Modules cached in dist/nvidia-<version>-<kver>/; rebuild only when the driver version or the kernel version changes.
  • ISO size increases by ~50 MB for the .ko files plus nvidia-smi.
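
The pinning, verification, and caching consequences above can be sketched roughly as follows. This is a sketch under stated assumptions, not the actual builder code: it assumes iso/builder/VERSIONS holds KEY=VALUE lines, that the .sha256sum file uses the usual `sha256sum` layout (hex digest first, then the file name), and that a populated cache directory contains at least one .ko file. All function names are hypothetical.

```python
import hashlib
from pathlib import Path


def read_pinned_version(versions_text, key="NVIDIA_DRIVER_VERSION"):
    """Return the value for `key` from KEY=VALUE lines (assumed format)."""
    for line in versions_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            return value.strip().strip('"')
    raise KeyError(key)


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large installer is not read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_installer(installer_path, checksum_text):
    """Raise ValueError unless the installer matches the published digest."""
    expected = checksum_text.split()[0].lower()  # first field is the digest
    actual = sha256_of(installer_path)
    if actual != expected:
        raise ValueError(f"checksum mismatch: expected {expected}, got {actual}")


def module_cache_dir(dist, driver_version, kernel_version):
    """Cache location following the dist/nvidia-<version>-<kver>/ layout."""
    return Path(dist) / f"nvidia-{driver_version}-{kernel_version}"


def needs_rebuild(dist, driver_version, kernel_version):
    """True when no cached modules exist for this driver/kernel pair."""
    cache = module_cache_dir(dist, driver_version, kernel_version)
    return not (cache.is_dir() and any(cache.glob("*.ko")))
```

Because the cache path includes both the driver version and the kernel version, changing either one points to a directory that does not exist yet, which is what triggers the rebuild.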