fix: create /dev/nvidia* nodes in bee-nvidia — mdev has no NVIDIA rules

Alpine uses mdev, which ships no rules for NVIDIA devices, so the device nodes
are never created. Without /dev/nvidiactl and /dev/nvidia0 through /dev/nvidia7,
nvidia-smi returns NVML_ERROR_LIBRARY_NOT_FOUND (exit 12) even though the kernel
modules are loaded and the libraries are present.

Fix: after insmod, read major numbers from /proc/devices and mknod the required
character devices (/dev/nvidiactl, /dev/nvidia{0-7}, /dev/nvidia-uvm).
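
A minimal sketch of that approach, not the exact code in bee-nvidia. The
PROC_DEVICES, DEV_DIR and MKNOD variables and the function names are hypothetical
and exist only so the logic can be exercised without root. It assumes the driver
registers itself in /proc/devices as "nvidia-frontend" or "nvidia", and that
nvidiactl uses minor 255 and nvidia-uvm uses minor 0.

```shell
#!/bin/sh
# Sketch only: create NVIDIA character device nodes on an mdev-based system.
# PROC_DEVICES, DEV_DIR and MKNOD are hypothetical overrides for testing.
PROC_DEVICES="${PROC_DEVICES:-/proc/devices}"
DEV_DIR="${DEV_DIR:-/dev}"
MKNOD="${MKNOD:-mknod}"

# Print the major number registered under driver name $1, or nothing.
major_of() {
    awk -v name="$1" '$2 == name { print $1; exit }' "$PROC_DEVICES"
}

# Create character device $1 with major $2 and minor $3 unless it exists.
make_node() {
    [ -e "$1" ] && return 0
    $MKNOD -m 666 "$1" c "$2" "$3"
}

create_nvidia_nodes() {
    # Assumption: the registered name is either "nvidia-frontend" or "nvidia".
    nv_major=$(major_of nvidia-frontend)
    [ -n "$nv_major" ] || nv_major=$(major_of nvidia)
    if [ -z "$nv_major" ]; then
        echo "nvidia major not found in $PROC_DEVICES" >&2
        return 1
    fi

    # Control device, then one node per possible GPU (0 through 7).
    make_node "$DEV_DIR/nvidiactl" "$nv_major" 255
    for i in 0 1 2 3 4 5 6 7; do
        make_node "$DEV_DIR/nvidia$i" "$nv_major" "$i"
    done

    # nvidia-uvm gets its own dynamically assigned major; skip if not loaded.
    uvm_major=$(major_of nvidia-uvm)
    if [ -n "$uvm_major" ]; then
        make_node "$DEV_DIR/nvidia-uvm" "$uvm_major" 0
    fi
}
```

Calling create_nvidia_nodes as root right after the insmod calls would create
the nodes. Reading the majors from /proc/devices instead of hardcoding them
matters mostly for nvidia-uvm, whose major is assigned at module load time.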

Add /dev/nvidia* node checks to smoketest for earlier failure detection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mikhail Chusavitin
2026-03-08 14:42:18 +03:00
parent 98f14b21c1
commit 5db3c3c74c
2 changed files with 31 additions and 0 deletions

@@ -62,6 +62,16 @@ for mod in nvidia nvidia_modeset nvidia_uvm; do
fi
done
echo ""
echo "-- NVIDIA device nodes --"
for dev in nvidiactl nvidia0 nvidia-uvm; do
    if [ -e "/dev/$dev" ]; then
        ok "/dev/$dev exists"
    else
        fail "/dev/$dev missing — nvidia-smi will return NVML_ERROR_LIBRARY_NOT_FOUND"
    fi
done
echo ""
echo "-- nvidia-smi --"
if PATH="/usr/local/bin:$PATH" command -v nvidia-smi >/dev/null 2>&1; then