
Michael Chus ba16021cdb Fix GPU model propagation, export filenames, PSU/service status, and chart perf

- nvidia.go: add Name field to nvidiaGPUInfo, include model name in
  nvidia-smi query, set dev.Model in enrichPCIeWithNVIDIAData
- pages.go: fix duplicate GPU count in validate card summary (4 GPU: 4 x …
  → 4 x … GPU); fix PSU UNKNOWN fallback from hw.PowerSupplies; treat
  activating/deactivating/reloading service states as OK in Runtime Health
- support_bundle.go: use "150405" time format (no colons) for exFAT compat
- sat.go / benchmark.go / platform_stress.go / sat_fan_stress.go: remove
  .tar.gz archive creation from export dirs — export packs everything itself
- charts_svg.go: add min-max downsampling (1400 pt cap) for SVG chart perf
- benchmark_report.go / sat.go: normalize GPU fallback to "Unknown GPU"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-11 10:05:27 +03:00

4.6 KiB


GPU Model Name Propagation

How GPU model names are detected, stored, and displayed throughout the project.

Detection Sources

There are three separate pipelines for GPU model names (A, B, and C below). They use different structs and don't share state.

Pipeline A — Live / SAT (nvidia-smi query at runtime)

File: audit/internal/platform/sat.go

ListNvidiaGPUs() → NvidiaGPU.Name (field: name, from nvidia-smi --query-gpu=index,name,...)
ListNvidiaGPUStatuses() → NvidiaGPUStatus.Name
Used by: GPU selection UI, live metrics labels, burn/stress test logic
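
To make Pipeline A concrete, here is a minimal sketch of turning nvidia-smi CSV output into index/name pairs. It assumes the query is run as `nvidia-smi --query-gpu=index,name --format=csv,noheader`; the `gpuEntry` type and `parseGPUList` function are hypothetical stand-ins, not the actual `NvidiaGPU` struct or `ListNvidiaGPUs()` implementation in sat.go.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gpuEntry is a hypothetical stand-in for the project's NvidiaGPU struct.
type gpuEntry struct {
	Index int
	Name  string
}

// parseGPUList parses output in the shape produced by
//   nvidia-smi --query-gpu=index,name --format=csv,noheader
// where each line looks like "0, NVIDIA A100-SXM4-80GB".
func parseGPUList(out string) ([]gpuEntry, error) {
	var gpus []gpuEntry
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue // skip blank lines, including the trailing one
		}
		// Split on the first comma only, in case a model name contains one.
		idxStr, name, ok := strings.Cut(line, ",")
		if !ok {
			return nil, fmt.Errorf("unexpected nvidia-smi line: %q", line)
		}
		idx, err := strconv.Atoi(strings.TrimSpace(idxStr))
		if err != nil {
			return nil, fmt.Errorf("bad GPU index in %q: %w", line, err)
		}
		gpus = append(gpus, gpuEntry{Index: idx, Name: strings.TrimSpace(name)})
	}
	return gpus, nil
}

func main() {
	// Sample text only; in the real pipeline this would come from running nvidia-smi.
	sample := "0, NVIDIA A100-SXM4-80GB\n1, NVIDIA A100-SXM4-80GB\n"
	gpus, err := parseGPUList(sample)
	if err != nil {
		panic(err)
	}
	for _, g := range gpus {
		fmt.Printf("GPU %d: %s\n", g.Index, g.Name)
	}
}
```

Keeping the parser separate from the command invocation makes it testable on machines without an NVIDIA driver.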

Pipeline B — Benchmark results

File: audit/internal/platform/benchmark.go, line 124

queryBenchmarkGPUInfo(selected) → benchmarkGPUInfo.Name
Stored in BenchmarkGPUResult.Name (json:"name,omitempty")
Used by: benchmark history table, benchmark report

Pipeline C — Hardware audit JSON (PCIe schema)

File: audit/internal/schema/hardware.go

HardwarePCIeDevice.Model *string (field name is Model, not Name)
For AMD GPUs: populated by audit/internal/collector/amdgpu.go from info.Product
For NVIDIA GPUs: NOT populated by audit/internal/collector/nvidia.go — the NVIDIA enricher sets telemetry/status but skips the Model field
Used by: hardware summary page (hwDescribeGPU in pages.go:487)

Key Inconsistency: NVIDIA PCIe Model is Never Set

audit/internal/collector/nvidia.go — enrichPCIeWithNVIDIAData() enriches NVIDIA PCIe devices with telemetry and status but does not populate HardwarePCIeDevice.Model.

This means:

Hardware summary page shows "Unknown GPU" for all NVIDIA devices (falls back at pages.go:486)
AMD GPUs do have their model populated

The fix would be: copy gpu.Name from the SAT pipeline into dev.Model inside enrichPCIeWithNVIDIAData.
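
A minimal sketch of that fix, under stated assumptions: `pcieDevice` and `gpuInfo` are hypothetical, trimmed stand-ins for `HardwarePCIeDevice` and the collector's per-GPU struct, and matching devices to GPUs by PCI bus address (BDF) is assumed here for illustration rather than taken from nvidia.go.

```go
package main

import "fmt"

// pcieDevice is a hypothetical, trimmed stand-in for HardwarePCIeDevice.
type pcieDevice struct {
	BDF   string
	Model *string // nil means "model not known"
}

// gpuInfo is a hypothetical stand-in for the per-GPU data read from nvidia-smi.
type gpuInfo struct {
	BDF  string
	Name string
}

// enrichModel sets Model on every device that has a matching GPU with a
// non-empty name. Devices without a match are left untouched.
func enrichModel(devs []pcieDevice, gpus []gpuInfo) {
	byBDF := make(map[string]gpuInfo, len(gpus))
	for _, g := range gpus {
		byBDF[g.BDF] = g
	}
	for i := range devs {
		g, ok := byBDF[devs[i].BDF]
		if !ok || g.Name == "" {
			continue
		}
		name := g.Name // per-device copy, so pointers never alias
		devs[i].Model = &name
	}
}

func main() {
	devs := []pcieDevice{{BDF: "00000000:3B:00.0"}, {BDF: "00000000:AF:00.0"}}
	gpus := []gpuInfo{{BDF: "00000000:3B:00.0", Name: "NVIDIA A100-SXM4-80GB"}}
	enrichModel(devs, gpus)
	for _, d := range devs {
		if d.Model != nil {
			fmt.Println(d.BDF, *d.Model)
		} else {
			fmt.Println(d.BDF, "(model not set)")
		}
	}
}
```

Copying the name into a local variable before taking its address is deliberate: writing `&gpu.Name` directly inside a `for _, gpu := range ...` loop makes every device point at the same variable on Go versions before 1.22.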

Benchmark History "Unknown GPU" Issue

Symptom: Benchmark history table shows "GPU #N — Unknown GPU" columns instead of real GPU model names.

Root cause: BenchmarkGPUResult.Name has the tag json:"name,omitempty". If queryBenchmarkGPUInfo() fails (it logs a warning at benchmark.go:126) or returns empty names, Name stays empty and is omitted from the saved JSON. When such a result is loaded, Name is empty and the history table falls back to "Unknown GPU" at pages.go:2226 and pages.go:2237.

This happens for:

Older result files saved before the Name field was added
Runs where nvidia-smi query failed before the benchmark started
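
The effect of `omitempty` can be reproduced in isolation. The sketch below uses a hypothetical `gpuResult` struct that keeps only the `Name` tag from the document; the `Index` field and the `roundTrip` helper are illustrative assumptions, not part of `BenchmarkGPUResult`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gpuResult is a hypothetical, trimmed stand-in for BenchmarkGPUResult.
type gpuResult struct {
	Index int    `json:"index"`
	Name  string `json:"name,omitempty"`
}

// roundTrip marshals r, then unmarshals the bytes into a fresh struct,
// returning both the JSON text and the reloaded value.
func roundTrip(r gpuResult) (string, gpuResult, error) {
	raw, err := json.Marshal(r)
	if err != nil {
		return "", gpuResult{}, err
	}
	var loaded gpuResult
	if err := json.Unmarshal(raw, &loaded); err != nil {
		return "", gpuResult{}, err
	}
	return string(raw), loaded, nil
}

func main() {
	// Name was never set, e.g. because the GPU info query failed.
	saved, loaded, err := roundTrip(gpuResult{Index: 0})
	if err != nil {
		panic(err)
	}
	fmt.Println(saved)               // {"index":0}
	fmt.Printf("%q\n", loaded.Name)  // ""
}
```

Because the key is absent from the file, a loader cannot tell "name was empty" from "file predates the field"; both cases arrive as an empty string and hit the same fallback.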

Fallback Strings — Current State

Location	File	Fallback string
Hardware summary (PCIe)	`pages.go:486`	`"Unknown GPU"`
Benchmark report summary	`benchmark_report.go:43`	`"Unknown GPU"`
Benchmark report scorecard	`benchmark_report.go:93`	`"Unknown"` ← inconsistent
Benchmark report detail	`benchmark_report.go:122`	`"Unknown GPU"`
Benchmark history per-GPU col	`pages.go:2226`	`"Unknown GPU"`
Benchmark history parallel col	`pages.go:2237`	`"Unknown GPU"`
SAT status file write	`sat.go:922`	`"unknown"` ← lowercase, inconsistent
GPU selection API	`api.go:163`	`"GPU N"` (no "Unknown")

Rule: all UI fallbacks should use "Unknown GPU". The two outliers are benchmark_report.go:93 ("Unknown") and sat.go:922 ("unknown").
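
One way to enforce the rule is to route every call site through a single helper instead of repeating the string literal. The following is a sketch only; `unknownGPU`, `gpuDisplayName`, and `gpuDisplayNamePtr` are hypothetical names that do not exist in the codebase described here.

```go
package main

import (
	"fmt"
	"strings"
)

// unknownGPU is the single fallback string all UI call sites would share.
const unknownGPU = "Unknown GPU"

// gpuDisplayName returns the trimmed name, or the shared fallback when the
// name is empty or whitespace-only.
func gpuDisplayName(name string) string {
	if n := strings.TrimSpace(name); n != "" {
		return n
	}
	return unknownGPU
}

// gpuDisplayNamePtr is the same helper for optional fields such as a
// Model *string, where nil also means "unknown".
func gpuDisplayNamePtr(model *string) string {
	if model == nil {
		return unknownGPU
	}
	return gpuDisplayName(*model)
}

func main() {
	model := "NVIDIA A100-SXM4-80GB"
	fmt.Println(gpuDisplayName(""))        // Unknown GPU
	fmt.Println(gpuDisplayName(model))     // NVIDIA A100-SXM4-80GB
	fmt.Println(gpuDisplayNamePtr(nil))    // Unknown GPU
	fmt.Println(gpuDisplayNamePtr(&model)) // NVIDIA A100-SXM4-80GB
}
```

With a shared constant, outliers like "Unknown" or "unknown" cannot reappear through a typo at an individual call site.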

GPU Selection UI

File: audit/internal/webui/pages.go

Source: GET /api/gpus → api.go → ListNvidiaGPUs() → live nvidia-smi
Render: 'GPU ' + gpu.index + ' — ' + gpu.name + ' · ' + mem
Fallback: gpu.name || 'GPU ' + idx (JS, line ~1432)

This always shows the correct model because it queries nvidia-smi live. It is not connected to benchmark result data.

Data Flow Summary

nvidia-smi (live)
  └─ ListNvidiaGPUs() → NvidiaGPU.Name
       ├─ GPU selection UI (always correct)
       ├─ Live metrics labels (charts_svg.go)
       └─ SAT/burn status file (sat.go)

nvidia-smi (at benchmark start)
  └─ queryBenchmarkGPUInfo() → benchmarkGPUInfo.Name
       └─ BenchmarkGPUResult.Name (json:"name,omitempty")
            ├─ Benchmark report
            └─ Benchmark history table columns

nvidia-smi / lspci (audit collection)
  └─ HardwarePCIeDevice.Model (NVIDIA: NOT populated; AMD: populated)
       └─ Hardware summary page hwDescribeGPU()

What Needs Fixing

NVIDIA PCIe Model — enrichPCIeWithNVIDIAData() should set dev.Model = &gpu.Name
Fallback consistency — benchmark_report.go:93 should say "Unknown GPU" not "Unknown"; sat.go:922 should say "Unknown GPU" not "unknown"
Old benchmark JSONs — no fix possible for already-saved results with missing names (display-only issue)
