Fix NVMe SMART status always Unknown; fix GPU count including NVSwitches

nvme-cli emits smart-log counters as JSON strings and uses field names
avail_spare / percent_used instead of the prose names in the NVMe spec.
The nvmeSmartLog struct declared these as int64 with the wrong JSON
tags, so Unmarshal failed on the quoted values and the whole health
block was skipped, leaving every NVMe drive with status=Unknown.

Fix: switch all numeric fields to jsonInt64 (already used for lsblk
block sizes) which accepts both bare numbers and quoted strings, and
correct the avail_spare / percent_used tag names.
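
A minimal sketch of how a type like jsonInt64 can accept both encodings.
This is an illustrative reimplementation, not the project's actual code:
the method body, the trimmed smartLog struct, the parseSmartLog helper,
and the sample JSON are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// jsonInt64 is a hypothetical reimplementation for illustration only;
// the project's real type may be written differently.
type jsonInt64 int64

// UnmarshalJSON accepts a bare JSON integer (42) or a quoted decimal
// string ("42"). Anything else is reported as an error.
func (v *jsonInt64) UnmarshalJSON(data []byte) error {
	s := string(data)
	if s == "null" {
		return nil // keep the zero value
	}
	if len(s) >= 2 && s[0] == '"' {
		// Decode the JSON string first so escapes are handled correctly.
		var str string
		if err := json.Unmarshal(data, &str); err != nil {
			return err
		}
		s = str
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return fmt.Errorf("jsonInt64: cannot parse %q: %w", s, err)
	}
	*v = jsonInt64(n)
	return nil
}

// smartLog is a trimmed, hypothetical stand-in for nvmeSmartLog.
type smartLog struct {
	AvailSpare  jsonInt64 `json:"avail_spare"`
	PercentUsed jsonInt64 `json:"percent_used"`
}

// parseSmartLog decodes a smart-log JSON document into smartLog.
func parseSmartLog(raw []byte) (smartLog, error) {
	var out smartLog
	err := json.Unmarshal(raw, &out)
	return out, err
}

func main() {
	// Made-up sample: one quoted counter, one bare number.
	out, err := parseSmartLog([]byte(`{"avail_spare": "100", "percent_used": 3}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(out.AvailSpare, out.PercentUsed)
}
```

With plain int64 fields the quoted value would make json.Unmarshal
return an error, which is the failure described above.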

Also fix validateIsVendorGPU for NVIDIA: it previously counted any NVIDIA
PCIe device (including NVSwitch bridges) as a GPU, inflating the count
(12 instead of 8 on an HGX H100 system). It now also requires
device_class to be videocontroller, processingaccelerator, or
displaycontroller, matching the existing AMD filter logic.
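
The effect of the class filter can be sketched as follows. The pciDevice
struct, the lowercasing step, the "bridge" class string, and the sample
device names are assumptions made for illustration; only the vendor and
class conditions are taken from the change itself.

```go
package main

import (
	"fmt"
	"strings"
)

// pciDevice is a hypothetical, trimmed stand-in for
// schema.HardwarePCIeDevice.
type pciDevice struct {
	Manufacturer string
	Model        string
	DeviceClass  string
}

// isNVIDIAGPU requires both an NVIDIA vendor match and a GPU-like class.
// Lowercasing the inputs is an assumption about upstream normalization.
func isNVIDIAGPU(dev pciDevice) bool {
	model := strings.ToLower(dev.Model)
	manufacturer := strings.ToLower(dev.Manufacturer)
	class := strings.ToLower(dev.DeviceClass)

	isNVIDIAVendor := strings.Contains(model, "nvidia") || strings.Contains(manufacturer, "nvidia")
	isGPUClass := class == "videocontroller" || class == "processingaccelerator" || class == "displaycontroller"
	return isNVIDIAVendor && isGPUClass
}

// countGPUs counts the devices that pass the filter.
func countGPUs(devs []pciDevice) int {
	n := 0
	for _, d := range devs {
		if isNVIDIAGPU(d) {
			n++
		}
	}
	return n
}

// sampleHGX builds a made-up inventory: 8 accelerators plus 4 NVIDIA
// bridge devices, mirroring the 12-versus-8 discrepancy.
func sampleHGX() []pciDevice {
	var devs []pciDevice
	for i := 0; i < 8; i++ {
		devs = append(devs, pciDevice{"NVIDIA Corporation", "H100 SXM5", "processingaccelerator"})
	}
	for i := 0; i < 4; i++ {
		devs = append(devs, pciDevice{"NVIDIA Corporation", "NVSwitch", "bridge"})
	}
	return devs
}

func main() {
	fmt.Println(countGPUs(sampleHGX()))
}
```

A vendor-only check would count all 12 devices in this inventory; adding
the class condition leaves the 8 accelerators.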

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 18:06:32 +03:00
parent 74a3c65f64
commit e169a7722c
3 changed files with 89 additions and 25 deletions

@@ -642,7 +642,9 @@ func validateIsVendorGPU(dev schema.HardwarePCIeDevice, vendor string) bool {
 	}
 	switch vendor {
 	case "nvidia":
-		return strings.Contains(model, "nvidia") || strings.Contains(manufacturer, "nvidia")
+		isNVIDIAVendor := strings.Contains(model, "nvidia") || strings.Contains(manufacturer, "nvidia")
+		isGPUClass := class == "videocontroller" || class == "processingaccelerator" || class == "displaycontroller"
+		return isNVIDIAVendor && isGPUClass
 	case "amd":
 		isGPUClass := class == "processingaccelerator" || class == "displaycontroller" || class == "videocontroller"
 		isAMDVendor := strings.Contains(manufacturer, "advanced micro devices") || strings.Contains(manufacturer, "amd") || strings.Contains(manufacturer, "ati")