Fix Runtime Health criteria: network, services, nvidia-fabricmanager

Network: report green if at least one interface has an IPv4 address
(drop the PARTIAL state).

Bee Services: treat inactive as OK. Oneshot services (bee-sshsetup,
bee-preflight, bee-network, bee-audit, etc.) complete successfully and
exit to inactive; only failed is a real problem.

nvidia-fabricmanager: add ExecCondition=bee-check-nvswitch drop-in so
the service is silently skipped (inactive, not failed) on systems
without NVSwitch hardware (e.g. H200 NVL with direct NVLink, no
NVSwitch chips). bee-check-nvswitch detects NVSwitch via lspci
(vendor 10de, class 0680).

bee-nvidia.service: add ConditionPathExists=/usr/local/bin/bee-nvidia-load
so the unit is a no-op if somehow present in a non-nvidia build.

bee-boot-status: read /etc/bee-gpu-vendor and exclude bee-nvidia from
CRITICAL/ALL on non-nvidia builds, preventing boot hang if the unit
is unexpectedly present.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 05:20:25 +03:00
parent dc07580adc
commit 4f6579e040
7 changed files with 27 additions and 21 deletions

@@ -69,6 +69,7 @@ chmod +x /usr/local/bin/bee-boot-status 2>/dev/null || true
 chmod +x /usr/local/bin/bee-install 2>/dev/null || true
 chmod +x /usr/local/bin/bee-gui-gate 2>/dev/null || true
 chmod +x /usr/local/bin/bee-remount-medium 2>/dev/null || true
+chmod +x /usr/local/bin/bee-check-nvswitch 2>/dev/null || true
 if [ "$GPU_VENDOR" = "nvidia" ]; then
 chmod +x /usr/local/bin/bee-nvidia-load 2>/dev/null || true
 chmod +x /usr/local/bin/bee-gpu-burn 2>/dev/null || true

@@ -2,6 +2,8 @@
 Description=Bee: load NVIDIA kernel modules and create device nodes
 After=local-fs.target udev.service bee-blackbox.service
 Before=bee-audit.service
+# Skip silently if bee-nvidia-load is absent (non-nvidia builds).
+ConditionPathExists=/usr/local/bin/bee-nvidia-load
 [Service]
 Type=oneshot

@@ -0,0 +1,4 @@
+[Service]
+# Skip fabricmanager on systems without NVSwitch hardware.
+# ExecCondition exits 1-254 → unit is silently skipped (inactive, not failed).
+ExecCondition=/usr/local/bin/bee-check-nvswitch
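
The comment in the drop-in relies on how systemd treats the exit status of an ExecCondition= command: 0 lets the unit start, 1 through 254 skips it without marking it failed, and 255 (or an abnormal exit) marks it failed. A minimal sketch of that mapping follows; `execcondition_outcome` is a hypothetical helper written for illustration and is not part of this commit.

```shell
#!/bin/sh
# execcondition_outcome (hypothetical): map an ExecCondition= exit status
# to the way systemd is documented to handle the unit.
#   0       -> remaining commands run
#   1..254  -> unit is skipped and not marked failed
#   255     -> unit is marked failed
execcondition_outcome() {
    code="$1"
    if [ "$code" -eq 0 ]; then
        echo run
    elif [ "$code" -ge 1 ] && [ "$code" -le 254 ]; then
        echo skipped
    else
        echo failed
    fi
}

# bee-check-nvswitch exits 1 when no NVSwitch is found, which lands in the
# "skipped" range rather than the "failed" one.
execcondition_outcome 1
```

This is why the check script exits 1 rather than 255 on hosts without NVSwitch: the unit ends up inactive instead of failed.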

@@ -3,8 +3,14 @@
 # Shows live service status until all bee services are done or failed,
 # then exits so getty can show the login prompt.
-CRITICAL="bee-preflight bee-nvidia bee-audit"
-ALL="bee-sshsetup ssh bee-network bee-nvidia bee-preflight bee-audit bee-web"
+GPU_VENDOR="$(cat /etc/bee-gpu-vendor 2>/dev/null || echo nvidia)"
+if [ "$GPU_VENDOR" = "nvidia" ]; then
+CRITICAL="bee-preflight bee-nvidia bee-audit"
+ALL="bee-sshsetup ssh bee-network bee-nvidia bee-preflight bee-audit bee-web"
+else
+CRITICAL="bee-preflight bee-audit"
+ALL="bee-sshsetup ssh bee-network bee-preflight bee-audit bee-web"
+fi
 svc_state() { systemctl is-active "$1.service" 2>/dev/null || echo "inactive"; }
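
The Bee Services rule from the commit message (inactive is OK for oneshot units, only failed is a real problem) can be sketched in shell as below. This is an illustration only, not the Runtime Health code changed by this commit; `classify_state` and `overall_health` are hypothetical names, and treating every other state as "pending" is an assumption.

```shell
#!/bin/sh
# classify_state (hypothetical): map a systemd active state to a verdict.
# active and inactive are both fine: a oneshot unit without
# RemainAfterExit=yes returns to inactive after a successful run.
classify_state() {
    case "$1" in
        active|inactive) echo ok ;;
        failed)          echo failed ;;
        *)               echo pending ;;  # assumption: e.g. activating
    esac
}

# overall_health (hypothetical): red if any given state is failed,
# otherwise green.
overall_health() {
    for state in "$@"; do
        if [ "$(classify_state "$state")" = "failed" ]; then
            echo red
            return 0
        fi
    done
    echo green
}

# Example: three finished oneshot units and one running service.
overall_health inactive inactive inactive active
```

The states are passed in as arguments so the sketch can be exercised without a running systemd.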

@@ -0,0 +1,4 @@
+#!/bin/sh
+# Exit 0 if NVSwitch hardware is detected; exit 1 to skip fabricmanager on non-NVSwitch systems.
+# NVSwitch appears in lspci as vendor 10de, class 0680 (Bridge, Other).
+lspci -Dn 2>/dev/null | awk '$2 == "0680:" && $3 ~ /^10de:/ { found=1; exit } END { exit(found ? 0 : 1) }'
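
To try the awk filter without NVSwitch hardware, the same expression can be wrapped in a function that reads `lspci -Dn`-style text from stdin. This is a sketch under stated assumptions: `has_nvswitch` is a hypothetical name, and the sample lines, including their device IDs, are made up for illustration rather than captured from a real host.

```shell
#!/bin/sh
# has_nvswitch (hypothetical): read `lspci -Dn`-formatted lines on stdin.
# Field 2 is the class code followed by a colon; field 3 is vendor:device.
# Returns 0 if any line has class 0680 and vendor 10de, otherwise 1.
has_nvswitch() {
    awk '$2 == "0680:" && $3 ~ /^10de:/ { found = 1; exit }
         END { exit(found ? 0 : 1) }'
}

# Illustrative input only; the device IDs are placeholders.
sample_with_switch='0000:07:00.0 0680: 10de:22a3 (rev a1)
0000:1b:00.0 0302: 10de:2330 (rev a1)'
sample_without_switch='0000:1b:00.0 0302: 10de:2335 (rev a1)
0000:00:1f.0 0601: 8086:7a06 (rev 11)'

for sample in "$sample_with_switch" "$sample_without_switch"; do
    if printf '%s\n' "$sample" | has_nvswitch; then
        echo "NVSwitch detected: fabricmanager would start"
    else
        echo "no NVSwitch: fabricmanager would be skipped"
    fi
done
```

Matching on both the class and the vendor means a class-0680 bridge from another vendor does not trigger fabricmanager.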