GPU Model Name Propagation
How GPU model names are detected, stored, and displayed throughout the project.
Detection Sources
There are three separate pipelines for GPU model names (A, B, and C below). They use different structs and don't share state.
Pipeline A — Live / SAT (nvidia-smi query at runtime)
File: audit/internal/platform/sat.go
- ListNvidiaGPUs() → NvidiaGPU.Name (field: name, from nvidia-smi --query-gpu=index,name,...)
- ListNvidiaGPUStatuses() → NvidiaGPUStatus.Name
- Used by: GPU selection UI, live metrics labels, burn/stress test logic
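To make the shape of Pipeline A concrete, here is a minimal sketch of turning nvidia-smi CSV output into a list of index/name pairs. It assumes the query is run as `nvidia-smi --query-gpu=index,name --format=csv,noheader`; the trimmed `NvidiaGPU` struct and the `parseGPUList` helper are hypothetical stand-ins, not the code in sat.go, and the sample GPU name is illustrative.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// NvidiaGPU is a trimmed stand-in; the real struct in sat.go has more fields.
type NvidiaGPU struct {
	Index int
	Name  string
}

// parseGPUList parses lines of the form "<index>, <name>", the shape produced by
// `nvidia-smi --query-gpu=index,name --format=csv,noheader`.
// Hypothetical helper for illustration only.
func parseGPUList(out string) ([]NvidiaGPU, error) {
	var gpus []NvidiaGPU
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		if strings.TrimSpace(line) == "" {
			continue // skip blank lines (also covers empty output)
		}
		parts := strings.SplitN(line, ",", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("unexpected line %q", line)
		}
		idx, err := strconv.Atoi(strings.TrimSpace(parts[0]))
		if err != nil {
			return nil, fmt.Errorf("bad index in %q: %w", line, err)
		}
		gpus = append(gpus, NvidiaGPU{Index: idx, Name: strings.TrimSpace(parts[1])})
	}
	return gpus, nil
}

func main() {
	// Illustrative sample output, not captured from a real machine.
	sample := "0, NVIDIA H100 80GB HBM3\n1, NVIDIA H100 80GB HBM3\n"
	gpus, err := parseGPUList(sample)
	if err != nil {
		panic(err)
	}
	for _, g := range gpus {
		fmt.Printf("GPU %d: %s\n", g.Index, g.Name)
	}
}
```

Because the name comes straight from the live query, anything built on this path sees the current model string and does not depend on saved results.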
Pipeline B — Benchmark results
File: audit/internal/platform/benchmark.go, line 124
- queryBenchmarkGPUInfo(selected) → benchmarkGPUInfo.Name
- Stored in BenchmarkGPUResult.Name (json:"name,omitempty")
- Used by: benchmark history table, benchmark report
Pipeline C — Hardware audit JSON (PCIe schema)
File: audit/internal/schema/hardware.go
- HardwarePCIeDevice.Model *string (field name is Model, not Name)
- For AMD GPUs: populated by audit/internal/collector/amdgpu.go from info.Product
- For NVIDIA GPUs: NOT populated by audit/internal/collector/nvidia.go; the NVIDIA enricher sets telemetry/status but skips the Model field
- Used by: hardware summary page (hwDescribeGPU in pages.go:487)
Key Inconsistency: NVIDIA PCIe Model is Never Set
audit/internal/collector/nvidia.go — enrichPCIeWithNVIDIAData() enriches NVIDIA PCIe devices with telemetry and status but does not populate HardwarePCIeDevice.Model.
This means:
- Hardware summary page shows "Unknown GPU" for all NVIDIA devices (falls back at pages.go:486)
- AMD GPUs do have their model populated
The fix would be: copy gpu.Name from the SAT pipeline into dev.Model inside enrichPCIeWithNVIDIAData.
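A minimal sketch of that copy step, under stated assumptions: the structs below are trimmed stand-ins with only the fields this document mentions, the PCI address (`BDF`) as the join key is an assumption, and `setModelFromSAT` is a hypothetical name. The real enrichPCIeWithNVIDIAData() may match devices differently.

```go
package main

import "fmt"

// NvidiaGPU is a trimmed stand-in for the SAT pipeline struct.
type NvidiaGPU struct {
	BDF  string // PCI address; assumed here as the join key
	Name string
}

// HardwarePCIeDevice is a trimmed stand-in for the audit schema struct.
type HardwarePCIeDevice struct {
	BDF   string
	Model *string
}

// setModelFromSAT fills Model for every device that has a matching GPU
// with a non-empty name. Devices without a match are left untouched.
func setModelFromSAT(devs []HardwarePCIeDevice, gpus []NvidiaGPU) {
	byBDF := make(map[string]string, len(gpus))
	for _, g := range gpus {
		if g.Name != "" {
			byBDF[g.BDF] = g.Name
		}
	}
	for i := range devs {
		if name, ok := byBDF[devs[i].BDF]; ok {
			v := name // local copy so each device points at its own string
			devs[i].Model = &v
		}
	}
}

func main() {
	devs := []HardwarePCIeDevice{{BDF: "0000:01:00.0"}, {BDF: "0000:02:00.0"}}
	gpus := []NvidiaGPU{{BDF: "0000:01:00.0", Name: "Example GPU"}}
	setModelFromSAT(devs, gpus)
	for _, d := range devs {
		if d.Model != nil {
			fmt.Println(d.BDF, *d.Model)
		} else {
			fmt.Println(d.BDF, "(model not set)")
		}
	}
}
```

Leaving Model as nil when no name is available keeps the existing fallback path on the summary page in charge of the placeholder text.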
Benchmark History "Unknown GPU" Issue
Symptom: Benchmark history table shows "GPU #N — Unknown GPU" columns instead of real GPU model names.
Root cause: BenchmarkGPUResult.Name has tag json:"name,omitempty". If queryBenchmarkGPUInfo() fails (warns at benchmark.go:126) or returns empty names, the Name field is never set and is omitted from JSON. Loaded results have empty Name → falls back to "Unknown GPU" at pages.go:2226, 2237.
This happens for:
- Older result files saved before the Name field was added
- Runs where nvidia-smi query failed before the benchmark started
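The omitempty behaviour behind this can be reproduced in isolation. The sketch below uses a trimmed `BenchmarkGPUResult` with only a name and an index; the `Index` field, its `json:"index"` tag, and the `roundTrip` helper are assumptions for illustration, not the project's types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BenchmarkGPUResult is a trimmed stand-in; only the Name tag matches the document.
type BenchmarkGPUResult struct {
	Index int    `json:"index"` // assumed field, for illustration
	Name  string `json:"name,omitempty"`
}

// roundTrip marshals a result and loads it back, returning the JSON text too.
func roundTrip(r BenchmarkGPUResult) (string, BenchmarkGPUResult, error) {
	raw, err := json.Marshal(r)
	if err != nil {
		return "", BenchmarkGPUResult{}, err
	}
	var loaded BenchmarkGPUResult
	if err := json.Unmarshal(raw, &loaded); err != nil {
		return "", BenchmarkGPUResult{}, err
	}
	return string(raw), loaded, nil
}

func main() {
	// Name was never set, as when the nvidia-smi query failed.
	text, loaded, err := roundTrip(BenchmarkGPUResult{Index: 0})
	if err != nil {
		panic(err)
	}
	fmt.Println(text)              // prints {"index":0}
	fmt.Println(loaded.Name == "") // prints true
}
```

The saved file carries no "name" key at all, so a loader cannot tell a failed query from a file written before the field existed; both arrive as an empty string.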
Fallback Strings — Current State
| Location | File | Fallback string |
|---|---|---|
| Hardware summary (PCIe) | pages.go:486 | "Unknown GPU" |
| Benchmark report summary | benchmark_report.go:43 | "Unknown GPU" |
| Benchmark report scorecard | benchmark_report.go:93 | "Unknown" ← inconsistent |
| Benchmark report detail | benchmark_report.go:122 | "Unknown GPU" |
| Benchmark history per-GPU col | pages.go:2226 | "Unknown GPU" |
| Benchmark history parallel col | pages.go:2237 | "Unknown GPU" |
| SAT status file write | sat.go:922 | "unknown" ← lowercase, inconsistent |
| GPU selection API | api.go:163 | "GPU N" (no "Unknown") |
Rule: all UI fallbacks should use "Unknown GPU". The two outliers are benchmark_report.go:93 ("Unknown") and sat.go:922 ("unknown").
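One way to keep the rule from drifting is a single shared constant and helper that every call site uses. This is a sketch only: `UnknownGPUName` and `gpuNameOrFallback` are hypothetical names, and treating whitespace-only names as missing is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// UnknownGPUName is the single fallback string for UI output (hypothetical constant).
const UnknownGPUName = "Unknown GPU"

// gpuNameOrFallback returns the trimmed name, or the shared fallback
// when the name is empty or whitespace-only.
func gpuNameOrFallback(name string) string {
	if s := strings.TrimSpace(name); s != "" {
		return s
	}
	return UnknownGPUName
}

func main() {
	fmt.Println(gpuNameOrFallback(""))            // prints Unknown GPU
	fmt.Println(gpuNameOrFallback(" Example GPU ")) // prints Example GPU
}
```

With a helper like this, the scorecard and SAT status writer would pick up the same string as the other rows in the table without each site spelling it out.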
GPU Selection UI
File: audit/internal/webui/pages.go
- Source: GET /api/gpus → api.go → ListNvidiaGPUs() → live nvidia-smi
- Render: 'GPU ' + gpu.index + ' — ' + gpu.name + ' · ' + mem
- Fallback: gpu.name || 'GPU ' + idx (JS, line ~1432)
This always shows the correct model because it queries nvidia-smi live. It is not connected to benchmark result data.
Data Flow Summary
nvidia-smi (live)
  └─ ListNvidiaGPUs() → NvidiaGPU.Name
       ├─ GPU selection UI (always correct)
       ├─ Live metrics labels (charts_svg.go)
       └─ SAT/burn status file (sat.go)

nvidia-smi (at benchmark start)
  └─ queryBenchmarkGPUInfo() → benchmarkGPUInfo.Name
       └─ BenchmarkGPUResult.Name (json:"name,omitempty")
            ├─ Benchmark report
            └─ Benchmark history table columns

nvidia-smi / lspci (audit collection)
  └─ HardwarePCIeDevice.Model (NVIDIA: NOT populated; AMD: populated)
       └─ Hardware summary page hwDescribeGPU()
Fixed Issues
All previously open items are resolved:
- NVIDIA PCIe Model: enrichPCIeWithNVIDIAData() sets dev.Model = &v (nvidia.go:78).
- Fallback consistency: sat.go and benchmark_report.go both use "Unknown GPU".
- tops_per_sm_per_ghz: computed in benchmark.go and stored in BenchmarkGPUScore.TOPSPerSMPerGHz.
- MultiprocessorCount, PowerLimitW, DefaultPowerLimitW: present in benchmark_types.go.
- Old benchmark JSONs: no fix possible for already-saved results with missing names (display-only issue).