# Decision: Use NVIDIA proprietary driver, not open kernel modules

**Date:** 2026-03-05
**Status:** active

## Context

bee needs to collect GPU serial numbers, VBIOS versions, and ECC telemetry via `nvidia-smi`.
Two options exist: NVIDIA open-gpu-kernel-modules (MIT/GPLv2, GitHub) or the official
proprietary `.run` installer.
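As a rough illustration of the collection step, the sketch below queries `nvidia-smi` in CSV mode and parses one record per GPU. The field names are ones listed by `nvidia-smi --help-query-gpu`; the exact set bee collects, and the helper names `parse_gpu_csv` and `query_gpus`, are assumptions made for this example.

```python
import csv
import io
import subprocess

# Query fields as named by `nvidia-smi --help-query-gpu`.
# Which fields bee actually collects is an assumption of this sketch.
FIELDS = [
    "index",
    "serial",
    "vbios_version",
    "ecc.mode.current",
    "ecc.errors.corrected.volatile.total",
    "ecc.errors.uncorrected.volatile.total",
]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader` output into one dict per GPU."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        dict(zip(FIELDS, (cell.strip() for cell in row)))
        for row in rows
        if row  # skip blank lines
    ]


def query_gpus():
    """Run nvidia-smi and return the parsed inventory.

    Only works on a host where the driver is loaded.
    """
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader",
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_gpu_csv(out)
```

Values are kept as strings here because fields a GPU does not support can come back as bracketed placeholder text rather than numbers, so any numeric conversion is best left to the caller.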

## Decision

Use the official proprietary NVIDIA `.run` installer for both kernel modules and `nvidia-smi`.

## Consequences

- Kernel modules and `nvidia-smi` come from a single verified source.
- NVIDIA publishes a `.sha256sum` file alongside each installer; download it and verify the installer before use.
- Driver version pinned in `iso/builder/VERSIONS` as `NVIDIA_DRIVER_VERSION`.
- DCGM must track the CUDA user-mode driver major version exposed by `nvidia-smi`.
- For NVIDIA driver branch `590` with CUDA `13.x`, use DCGM 4 package family `datacenter-gpu-manager-4-cuda13`; legacy `datacenter-gpu-manager` 3.x does not provide a working path for this stack.
- Build process: download `.run`, extract, compile `kernel/` sources against `linux-lts-dev`.
- Modules are cached in `dist/nvidia-<version>-<kver>/` and rebuilt only when the driver version or the kernel version changes.
- ISO size increases by ~50 MB for the `.ko` files and `nvidia-smi`.
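The checksum verification mentioned in the list above could look roughly like the following. This is a minimal sketch that assumes the `.sha256sum` file uses the usual `sha256sum` output layout (`<hex digest>  <filename>`); the function names are hypothetical and not taken from the bee build scripts.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large installer is not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(sha256sum_text, filename):
    """Find the digest recorded for `filename` in sha256sum-style text."""
    for line in sha256sum_text.splitlines():
        parts = line.split(maxsplit=1)
        # A leading '*' marks binary mode in sha256sum output; ignore it.
        if len(parts) == 2 and parts[1].lstrip("*").strip() == filename:
            return parts[0].lower()
    raise ValueError(f"no checksum entry for {filename}")


def verify_installer(run_path, sha256sum_path):
    """Return the digest if it matches the published one, else raise ValueError."""
    with open(sha256sum_path, "r", encoding="utf-8") as fh:
        wanted = expected_digest(fh.read(), os.path.basename(run_path))
    actual = sha256_of(run_path)
    if actual != wanted:
        raise ValueError(f"checksum mismatch for {run_path}")
    return actual
```

Raising on mismatch, rather than returning a boolean, makes it harder for a build script to ignore a failed check by accident.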
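The DCGM rule above can be expressed as a small lookup keyed on the CUDA major version that `nvidia-smi` prints in its header (`CUDA Version: X.Y`). The sketch below is an assumption about how a build script might enforce it. Only the CUDA 13 mapping comes from this record, so any other major version is rejected instead of guessed.

```python
import re

# Only the CUDA 13 entry is documented in this decision record.
# Further entries would need to be confirmed against NVIDIA's package repository.
DCGM_PACKAGES = {
    13: "datacenter-gpu-manager-4-cuda13",
}


def cuda_major(nvidia_smi_banner):
    """Extract the CUDA major version from nvidia-smi's header text."""
    match = re.search(r"CUDA Version:\s*(\d+)\.\d+", nvidia_smi_banner)
    if match is None:
        raise ValueError("no 'CUDA Version' field in nvidia-smi output")
    return int(match.group(1))


def dcgm_package(major):
    """Return the DCGM package for a CUDA major version, or fail loudly."""
    try:
        return DCGM_PACKAGES[major]
    except KeyError:
        raise ValueError(f"no known DCGM package for CUDA {major}.x") from None
```

Failing on an unknown major version turns a silent DCGM mismatch into a build-time error when `NVIDIA_DRIVER_VERSION` is bumped to a branch with a newer CUDA release.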