63 Commits

Author SHA1 Message Date
8d80048117 redfish: MSI support, fix zero dates, BMC MAC, Assembly FRU, crawler cleanup
- Add MSI CG480-S5063 (H100 SXM5) support:
  - collectGPUsFromProcessors: find GPUs via Processors/ProcessorType=GPU,
    resolve serials from Chassis/<GpuId>
  - looksLikeGPU: skip Description="Display Device" PCIe sidecars
  - isVirtualStorageDrive: filter AMI virtual USB drives (0-byte)
  - enrichNICMACsFromNetworkDeviceFunctions: pull MACs for MSI NICs
  - parseCPUs: filter by ProcessorType, parse Socket, L1/L2/L3 from ProcessorMemory
  - parseMemory: Location.PartLocation.ServiceLabel slot fallback
  - shouldCrawlPath: block /SubProcessors subtrees
- Fix status_checked_at/status_changed_at serializing as 0001-01-01:
  change all StatusCheckedAt/StatusChangedAt fields to *time.Time
- Redfish crawler cleanup:
  - Block non-inventory branches: AccountService, CertificateService,
    EventService, Registries, SessionService, TaskService, manager config paths,
    OperatingConfigs, BootOptions, HostPostCode, Bios/Settings, OEM KVM paths
  - Add Assembly to critical endpoints (FRU data)
  - Remove BootOptions from priority seeds
- collectBMCMAC: read BMC MAC from Managers/*/EthernetInterfaces
- collectAssemblyFRU: extract FRU serial/part from Chassis/*/Assembly
- Firmware: remove NetworkProtocol noise, fix SecureBoot field,
  filter BMCImageN redundant backup slots

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 08:12:17 +03:00
21ea129933 misc: sds format support, convert limits, dell dedup, supermicro removal, bible updates
Parser / archive:
- Add .sds extension as tar-format alias (archive.go)
- Add tests for multipart upload size limits (multipart_limits_test.go)
- Remove supermicro crashdump parser (ADL-015)

Dell parser:
- Remove GPU duplicates from PCIeDevices (DCIM_VideoView vs DCIM_PCIDeviceView
  both list the same GPU; VideoView record is authoritative)

Server:
- Add LOGPILE_CONVERT_MAX_MB env var for independent convert batch size limit
- Improve "file too large" error message with current limit value

Web:
- Add CONVERT_MAX_FILES_PER_BATCH = 1000 cap
- Minor UI copy and CSS fixes

Bible:
- bible-local/06-parsers.md: add pci.ids enrichment rule (enrich model from
  pciids when name is empty but vendor_id+device_id are present)
- Sync bible submodule and local overview/architecture docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 22:23:44 +03:00
9c5512d238 dell: strip MAC from model names; fix device-bound firmware in dell/inspur
- Dell NICView: strip " - XX:XX:XX:XX:XX:XX" suffix from ProductName
  (Dell TSR embeds MAC in this field for every NIC port)
- Dell SoftwareIdentity: same strip applied to ElementName; store FQDD
  in FirmwareInfo.Description so exporter can filter device-bound entries
- Exporter: add isDeviceBoundFirmwareFQDD() to filter firmware entries
  whose Description matches NIC./PSU./Disk./RAID.Backplane./GPU. FQDD
  prefixes (prevents device firmware from appearing in hardware.firmware)
- Exporter: extend isDeviceBoundFirmwareName() to filter HGX GPU/NVSwitch
  firmware inventory IDs (_fw_gpu_, _fw_nvswitch_, _inforom_gpu_)
- Inspur: remove HDD firmware from Hardware.Firmware — already present
  in Storage.Firmware, duplicating it violates ADL-016
- bible-local/06-parsers.md: document firmware and MAC stripping rules
- bible-local/10-decisions.md: add ADL-016 (device-bound firmware) and
  ADL-017 (vendor-embedded MAC in model name fields)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 22:07:53 +03:00
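The MAC stripping and FQDD filtering above can be sketched as follows; the regular expression and the prefix list are assumptions reconstructed from the commit message, not the actual parser/exporter code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// macSuffix matches a trailing " - XX:XX:XX:XX:XX:XX" that Dell TSR
// embeds in NIC ProductName/ElementName fields.
var macSuffix = regexp.MustCompile(`\s+-\s+(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`)

func stripMACSuffix(name string) string {
	return macSuffix.ReplaceAllString(name, "")
}

// deviceBoundPrefixes is an illustrative list; the authoritative set
// lives in the exporter's isDeviceBoundFirmwareFQDD.
var deviceBoundPrefixes = []string{"NIC.", "PSU.", "Disk.", "RAID.Backplane.", "GPU."}

func isDeviceBoundFirmwareFQDD(fqdd string) bool {
	for _, p := range deviceBoundPrefixes {
		if strings.HasPrefix(fqdd, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(stripMACSuffix("Intel(R) Ethernet 10G X710 - A0:36:9F:12:34:56"))
	fmt.Println(isDeviceBoundFirmwareFQDD("NIC.Integrated.1-1-1")) // true
	fmt.Println(isDeviceBoundFirmwareFQDD("BIOS.Setup.1-1"))       // false
}
```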
206496efae unraid: parse dimm/nic/pcie and annotate duplicate serials 2026-03-01 18:14:45 +03:00
7d1a02cb72 Add H3C G5/G6 parsers with PSU and NIC extraction 2026-03-01 17:08:11 +03:00
070971685f Update bible paths kit/ → rules/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 16:57:52 +03:00
78806f9fa0 Add shared bible submodule, rename local bible to bible-local
- Add bible.git as submodule at bible/
- Move docs/bible/ → bible-local/ (project-specific architecture)
- Update CLAUDE.md to reference both bible/ and bible-local/
- Add AGENTS.md for Codex with same structure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 16:38:57 +03:00
4940cd9645 sync file-type support across upload/convert and fix collected_at timezone handling 2026-02-28 23:27:49 +03:00
736b77f055 server: infer archive collected_at from source events 2026-02-28 22:18:47 +03:00
0252264ddc parser: fallback zone-less source timestamps to Europe/Moscow 2026-02-28 22:17:00 +03:00
25e3b8bb42 Add convert mode batch workflow with full progress 2026-02-28 21:44:36 +03:00
bb4505a249 docs: track collection speed/metrics by server model 2026-02-28 19:27:53 +03:00
2fa4a1235a collector/redfish: make prefetch/post-probe adaptive with metrics 2026-02-28 19:05:34 +03:00
fe5da1dbd7 Fix NIC port count handling and apply pending exporter updates 2026-02-28 18:42:01 +03:00
612058ed16 redfish: optimize snapshot/plan-b crawl and add timing diagnostics 2026-02-28 17:56:04 +03:00
e0146adfff Improve Redfish recovery flow and raw export timing diagnostics 2026-02-28 16:55:58 +03:00
9a30705c9a improve redfish collection progress and robust hardware dedup/serial parsing 2026-02-28 16:07:42 +03:00
8dbbec3610 optimize redfish post-probe and add eta progress 2026-02-28 15:41:44 +03:00
4c60ebbf1d collector/redfish: remove pre-snapshot critical duplicate pass 2026-02-28 15:28:24 +03:00
c52fea2fec collector/redfish: emit critical warmup branch and eta progress 2026-02-28 15:21:49 +03:00
dae4744eb3 ui: show latest collect branch/eta message instead of generic running text 2026-02-28 15:19:36 +03:00
b6ff47fea8 collector/redfish: skip deep DIMM subresources and remove memory from critical warmup 2026-02-28 15:16:04 +03:00
1d282c4196 collector/redfish: collect and parse platform model fallback 2026-02-28 14:54:55 +03:00
f35cabac48 collector/redfish: fix server model fallback and GPU/NVMe regressions 2026-02-28 14:50:02 +03:00
a2c9e9a57f collector/redfish: add ETA estimates to snapshot and plan-B progress 2026-02-28 14:36:18 +03:00
b918363252 collector/redfish: dedupe model-only GPU rows from graphics controllers 2026-02-28 13:04:34 +03:00
6c19a58b24 collector/redfish: expand endpoint coverage and timestamp collect logs 2026-02-28 12:59:57 +03:00
9aadf2f1e9 collector/redfish: improve GPU SN/model fallback and warnings 2026-02-28 12:52:22 +03:00
Mikhail Chusavitin
ddab93a5ee Add release notes for v1.7.0 2026-02-25 13:31:54 +03:00
Mikhail Chusavitin
000199fbdc Add parse errors tab and improve error diagnostics UI 2026-02-25 13:28:19 +03:00
Mikhail Chusavitin
68592da9f5 Harden Redfish collection for slow BMC endpoints 2026-02-25 12:42:43 +03:00
Mikhail Chusavitin
b1dde592ae Expand Redfish best-effort snapshot crawling 2026-02-25 12:24:06 +03:00
Mikhail Chusavitin
693b7346ab Update docs and add release artifacts 2026-02-25 12:17:17 +03:00
Mikhail Chusavitin
a4a1a19a94 Improve Redfish raw replay recovery and GUI diagnostics 2026-02-25 12:16:31 +03:00
Mikhail Chusavitin
66fb90233f Unify Redfish analysis through raw replay and add storage volumes 2026-02-24 18:34:13 +03:00
Mikhail Chusavitin
7a1285db99 Expand Redfish storage fallback for enclosure Disk.Bay paths 2026-02-24 18:25:00 +03:00
Mikhail Chusavitin
144d298efa Show total current PSU power and rely on server voltage status 2026-02-24 18:22:38 +03:00
Mikhail Chusavitin
a6c90b6e77 Probe Supermicro NVMe Disk.Bay endpoints for drive inventory 2026-02-24 18:22:02 +03:00
Mikhail Chusavitin
2e348751f3 Use 230V nominal range for PSU voltage sensor highlighting 2026-02-24 18:07:34 +03:00
Mikhail Chusavitin
15dc86a0e4 Add PSU voltage sensors with 220V range highlighting 2026-02-24 18:05:26 +03:00
Mikhail Chusavitin
752b063613 Increase upload multipart limit for raw export bundles 2026-02-24 17:42:49 +03:00
Mikhail Chusavitin
6f66a8b2a1 Raise Redfish snapshot crawl limit and prioritize PCIe paths 2026-02-24 17:41:37 +03:00
Mikhail Chusavitin
ce30f943df Export raw bundles with collection logs and parser field snapshot 2026-02-24 17:36:44 +03:00
Mikhail Chusavitin
810c4b5ff9 Add raw export reanalyze flow for Redfish snapshots 2026-02-24 17:23:26 +03:00
Mikhail Chusavitin
5d9e9d73de Fix Redfish snapshot crawl deadlock and add debug progress 2026-02-24 16:22:37 +03:00
38cc051f23 docs: consolidate architecture docs into bible 2026-02-23 17:51:25 +03:00
Mikhail Chusavitin
fcd57c1ba9 docs: introduce project Bible and consolidate all architecture documentation
- Create docs/bible/ with 10 structured chapters (overview, architecture,
  API, data models, collectors, parsers, exporters, build, testing, decisions)
- All documentation in English per ADL-007
- Record all existing architectural decisions in docs/bible/10-decisions.md
- Slim README.md to user-facing quick start only
- Replace CLAUDE.md with a single directive to read and follow the Bible
- Remove absorbed files: REANIMATOR_EXPORT.md, docs/INTEGRATION_GUIDE.md,
  and all vendor parser README.md files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 14:15:35 +03:00
Mikhail Chusavitin
82ee513835 Add release build script 2026-02-20 14:04:21 +03:00
de5521a4e5 Introduce canonical hardware.devices repository and align UI/Reanimator exports 2026-02-17 19:07:18 +03:00
a82b55b144 docs: add release notes for v1.5.0 2026-02-17 18:11:18 +03:00
758fa66282 feat: improve inspur parsing and pci.ids integration 2026-02-17 18:09:36 +03:00
b33cca5fcc nvidia: improve component mapping, firmware, statuses and check times 2026-02-16 23:17:13 +03:00
514da76ddb Update Inspur parsing and align release docs 2026-02-15 23:13:47 +03:00
c13788132b Add release script and release notes (no artifacts) 2026-02-15 22:23:53 +03:00
5e49adaf05 Update parser and project changes 2026-02-15 22:02:07 +03:00
c7b2a7ab29 Fix NVIDIA GPU/NVSwitch parsing and Reanimator export statuses 2026-02-15 21:00:30 +03:00
0af3cee9b6 Add integration guide, example generator, and built binary 2026-02-15 20:08:46 +03:00
8715fcace4 Align Reanimator export with updated integration guide 2026-02-15 20:06:36 +03:00
1b1bc74fc7 Add Reanimator format export support
Implement export to Reanimator format for asset tracking integration.

Features:
- New API endpoint: GET /api/export/reanimator
- Web UI button "Экспорт Reanimator" in Configuration tab
- Auto-detect CPU manufacturer (Intel/AMD/ARM/Ampere)
- Generate PCIe serial numbers if missing
- Merge GPUs and NetworkAdapters into pcie_devices
- Filter components without serial numbers
- RFC3339 timestamp format
- Full compliance with Reanimator specification

Changes:
- Add reanimator_models.go: data models for Reanimator format
- Add reanimator_converter.go: conversion functions
- Add reanimator_converter_test.go: unit tests
- Add reanimator_integration_test.go: integration tests
- Update handlers.go: add handleExportReanimator
- Update server.go: register /api/export/reanimator route
- Update index.html: add export button
- Update CLAUDE.md: document export behavior
- Add REANIMATOR_EXPORT.md: implementation summary

Tests: All tests passing (15+ new tests)
Format spec: example/docs/INTEGRATION_GUIDE.md

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-12 21:54:37 +03:00
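The CPU manufacturer auto-detection listed above could be as simple as substring matching on the model string; the rules below are assumptions, not the converter's actual logic:

```go
package main

import (
	"fmt"
	"strings"
)

// detectCPUManufacturer guesses the vendor from a CPU model string.
// Matching rules are illustrative.
func detectCPUManufacturer(model string) string {
	m := strings.ToLower(model)
	switch {
	case strings.Contains(m, "intel") || strings.Contains(m, "xeon"):
		return "Intel"
	case strings.Contains(m, "amd") || strings.Contains(m, "epyc"):
		return "AMD"
	case strings.Contains(m, "ampere"): // check before "arm"
		return "Ampere"
	case strings.Contains(m, "arm") || strings.Contains(m, "neoverse"):
		return "ARM"
	default:
		return "Unknown"
	}
}

func main() {
	fmt.Println(detectCPUManufacturer("Intel(R) Xeon(R) Platinum 8480+")) // Intel
	fmt.Println(detectCPUManufacturer("AMD EPYC 9654"))                   // AMD
}
```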
77e25ddc02 Fix NVIDIA GPU serial number format extraction
Extract decimal serial numbers from devname parameters (e.g., "SXM5_SN_1653925027099")
instead of hex PCIe Device Serial Numbers. This provides the correct GPU serial
numbers as they appear in NVIDIA diagnostics tooling.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 22:57:50 +03:00
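The devname extraction described above might use a capture like this (the exact pattern is an assumption; the sample value comes from the commit message):

```go
package main

import (
	"fmt"
	"regexp"
)

// snRe pulls the decimal serial out of devname parameters such as
// "SXM5_SN_1653925027099".
var snRe = regexp.MustCompile(`_SN_(\d+)\b`)

func snFromDevname(devname string) string {
	if m := snRe.FindStringSubmatch(devname); m != nil {
		return m[1]
	}
	return "" // no serial embedded
}

func main() {
	fmt.Println(snFromDevname("SXM5_SN_1653925027099")) // 1653925027099
}
```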
bcce975fd6 Add GPU serial number extraction for NVIDIA diagnostics
Parse inventory/output.log to extract GPU serial numbers from lspci output,
expose them via serials API, and add GPU category to web UI.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 22:50:46 +03:00
8b065c6cca Harden zip reader and syslog scan 2026-02-06 00:03:25 +03:00
aa22034944 Add Unraid diagnostics parser and fix zip upload support
Implements comprehensive parser for Unraid diagnostics archives with support for:
- System information (OS version, BIOS, motherboard)
- CPU details from lscpu (model, cores, threads, frequency)
- Memory information
- Storage devices with SMART data integration
- Temperature sensors from disk array
- System event logs

Parser intelligently merges data from multiple sources:
- SMART files provide detailed disk information (model, S/N, firmware)
- vars.txt provides disk configuration and filesystem types
- Deduplication ensures clean results

Also fixes a critical bug where zip archives could not be uploaded via the web interface
due to a missing extractZipFromReader implementation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 23:54:55 +03:00
131 changed files with 72482 additions and 2439 deletions

.gitignore (7 lines changed)

@@ -62,3 +62,10 @@ go.work.sum
# Distribution binaries
dist/
# Release artifacts
release/
releases/
releases/**/SHA256SUMS.txt
releases/**/*.tar.gz
releases/**/*.zip

.gitmodules (new file, 6 lines)

@@ -0,0 +1,6 @@
[submodule "third_party/pciids"]
path = third_party/pciids
url = https://github.com/pciutils/pciids.git
[submodule "bible"]
path = bible
url = https://git.mchus.pro/mchus/bible.git

AGENTS.md (new file, 11 lines)

@@ -0,0 +1,11 @@
# LOGPile — Instructions for Codex
## Shared Engineering Rules
Read `bible/` — shared rules for all projects (CSV, logging, DB, tables, background tasks, code style).
Start with `bible/rules/patterns/` for specific contracts.
## Project Architecture
Read `bible-local/` — LOGPile specific architecture.
Read order: `bible-local/README.md` → `01-overview.md` → relevant files for the task.
Every architectural decision specific to this project must be recorded in `bible-local/10-decisions.md`.

CLAUDE.md (100 lines changed)

@@ -1,95 +1,11 @@
# LOGPile - Engineering Notes (for Claude/Codex)
# LOGPile — Instructions for Claude
## Project summary
## Shared Engineering Rules
Read `bible/` — shared rules for all projects (CSV, logging, DB, tables, background tasks, code style).
Start with `bible/rules/patterns/` for specific contracts.
LOGPile is a standalone Go app for BMC diagnostics analysis with embedded web UI.
## Project Architecture
Read `bible-local/` — LOGPile specific architecture.
Read order: `bible-local/README.md` → `01-overview.md` → relevant files for the task.
Current product modes:
1. Upload and parse vendor archives / JSON snapshots.
2. Collect live data via Redfish and analyze/export it.
## Runtime architecture
- Go + `net/http` (`http.ServeMux`)
- Embedded UI (`web/embed.go`, `//go:embed templates static`)
- In-memory state (`Server.result`, `Server.detectedVendor`)
- Job manager for live collect status/logs
Default port: `8082`.
## Key flows
### Upload flow (`POST /api/upload`)
- Accepts multipart file field `archive`.
- If file looks like JSON, parsed as `models.AnalysisResult` snapshot.
- Otherwise passed to archive parser (`parser.NewBMCParser().ParseFromReader(...)`).
- Result stored in memory and exposed by API/UI.
### Live flow (`POST /api/collect`)
- Validates request (`host/protocol/port/username/auth_type/tls_mode`).
- Runs collector asynchronously with progress callback.
- On success:
- source metadata set (`source_type=api`, protocol/host/date),
- result becomes current in-memory dataset.
- On failed/canceled previous dataset stays unchanged.
## Collectors
Registry: `internal/collector/registry.go`
- `redfish` (real collector):
- dynamic discovery of Systems/Chassis/Managers,
- CPU/RAM/Storage/GPU/PSU/NIC/PCIe/Firmware mapping,
- raw Redfish snapshot (`result.RawPayloads["redfish_tree"]`) for offline future analysis,
- progress logs include active collection stage and snapshot progress.
- `ipmi` is currently a mock collector scaffold.
## Export behavior
Endpoints:
- `/api/export/csv`
- `/api/export/json`
- `/api/export/txt`
Filename pattern for all exports:
`YYYY-MM-DD (SERVER MODEL) - SERVER SN.<ext>`
Notes:
- JSON export contains full `AnalysisResult`, including `raw_payloads`.
- TXT export is tabular and mirrors UI sections (no raw JSON section).
## CLI flags (`cmd/logpile/main.go`)
- `--port`
- `--file` (reserved/preload, not active workflow)
- `--version`
- `--no-browser`
- `--hold-on-crash` (default true on Windows) — keeps console open on fatal crash for debugging.
## Build / release
- `make build` -> single local binary (`CGO_ENABLED=0`).
- `make build-all` -> cross-platform binaries.
- Tags/releases are published with `tea`.
- Release notes live in `docs/releases/<tag>.md`.
## Testing expectations
Before merge:
```bash
go test ./...
```
If touching collectors/handlers, prefer adding or updating tests in:
- `internal/collector/*_test.go`
- `internal/server/*_test.go`
## Practical coding guidance
- Keep API contracts stable with frontend (`web/static/js/app.js`).
- When adding Redfish mappings, prefer tolerant/fallback parsing:
- alternate collection paths,
- `@odata.id` references and embedded members,
- deduping by serial/BDF/slot+model.
- Avoid breaking snapshot backward compatibility (`AnalysisResult` JSON shape).
Every architectural decision specific to this project must be recorded in `bible-local/10-decisions.md`.

Makefile

@@ -1,4 +1,4 @@
.PHONY: build run clean test build-all
.PHONY: build run clean test build-all update-pci-ids
BINARY_NAME=logpile
VERSION=$(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")
@@ -6,6 +6,7 @@ COMMIT=$(shell git rev-parse --short HEAD 2>/dev/null || echo "none")
LDFLAGS=-ldflags "-X main.version=$(VERSION) -X main.commit=$(COMMIT)"
build:
@if [ "$(SKIP_PCI_IDS_UPDATE)" != "1" ]; then ./scripts/update-pci-ids.sh --best-effort; fi
CGO_ENABLED=0 go build $(LDFLAGS) -o bin/$(BINARY_NAME) ./cmd/logpile
run: build
@@ -19,6 +20,7 @@ test:
# Cross-platform builds
build-all: clean
@if [ "$(SKIP_PCI_IDS_UPDATE)" != "1" ]; then ./scripts/update-pci-ids.sh --best-effort; fi
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build $(LDFLAGS) -o bin/$(BINARY_NAME)-linux-amd64 ./cmd/logpile
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build $(LDFLAGS) -o bin/$(BINARY_NAME)-linux-arm64 ./cmd/logpile
CGO_ENABLED=0 GOOS=darwin GOARCH=amd64 go build $(LDFLAGS) -o bin/$(BINARY_NAME)-darwin-amd64 ./cmd/logpile
@@ -33,3 +35,6 @@ fmt:
lint:
golangci-lint run
update-pci-ids:
./scripts/update-pci-ids.sh --sync-submodule

README.md (150 lines changed)

@@ -1,151 +1,11 @@
# LOGPile
LOGPile is a standalone Go application for analyzing BMC diagnostic data.
Standalone Go application for BMC diagnostics analysis with an embedded web UI.
Supports two scenarios:
1. Uploading archives/snapshots and offline analysis in the web UI.
2. Live collection via the Redfish API, followed by export and offline re-upload.
## Documentation
## Features
- Architecture and technical documentation (single source of truth): [`docs/bible/README.md`](docs/bible/README.md)
- Standalone binary with embedded UI (no external static files).
- Vendor archive parsing (Supermicro, Inspur/Kaytus, NVIDIA, generic fallback).
- Live collection via Redfish (`/api/collect`) with progress and a step log.
- Extended Redfish snapshot:
  - normalized data (CPU/RAM/Storage/GPU/PSU/NIC/PCIe/Firmware),
  - raw `redfish_tree` for future analysis.
- Re-upload of a JSON snapshot via `/api/upload` for offline work.
- Export to CSV / JSON / TXT.
## License
## Requirements
- Go 1.22+
## Build
```bash
make build
```
The binary will be at `bin/logpile`.
For cross-builds:
```bash
make build-all
```
Artifacts:
- `bin/logpile-linux-amd64`
- `bin/logpile-linux-arm64`
- `bin/logpile-darwin-amd64`
- `bin/logpile-darwin-arm64`
- `bin/logpile-windows-amd64.exe`
## Run
```bash
./bin/logpile
./bin/logpile --port 8082
./bin/logpile --no-browser
./bin/logpile --version
```
Crash debugging (so the console stays open):
```bash
./bin/logpile --hold-on-crash
```
> On Windows, `--hold-on-crash` is enabled by default.
## Upload formats
`POST /api/upload` accepts:
- archives: `.tar`, `.tar.gz`, `.tgz`
- JSON snapshot (`AnalysisResult`)
## Live Redfish
Starting a live collection:
```http
POST /api/collect
```
Example body:
```json
{
"host": "bmc01.example.local",
"protocol": "redfish",
"port": 443,
"username": "admin",
"auth_type": "password",
"password": "secret",
"tls_mode": "insecure"
}
```
Job lifecycle:
`queued -> running -> success|failed|canceled`
Status and progress:
- `GET /api/collect/{id}`
- `POST /api/collect/{id}/cancel`
## Export
- `GET /api/export/csv`: serial numbers
- `GET /api/export/json`: full `AnalysisResult` (including `raw_payloads`)
- `GET /api/export/txt`: tabular report mirroring UI sections
Exported file names:
`YYYY-MM-DD (SERVER MODEL) - SERVER SN.<ext>`
Example:
`2026-02-04 (SYS-421GE-TNHR2) - C8X123456789.json`
## API
```text
POST /api/upload
POST /api/collect
GET /api/collect/{id}
POST /api/collect/{id}/cancel
GET /api/status
GET /api/parsers
GET /api/events
GET /api/sensors
GET /api/config
GET /api/serials
GET /api/firmware
GET /api/export/csv
GET /api/export/json
GET /api/export/txt
DELETE /api/clear
POST /api/shutdown
```
`/api/status` and `/api/config` contain source metadata:
- `source_type`: `archive` | `api`
- `protocol`: `redfish` | `ipmi` (may be empty for archives)
- `target_host`
- `collected_at`
## Structure
```text
cmd/logpile/main.go # entrypoint
internal/collector/ # live collectors (redfish, ipmi mock)
internal/parser/ # archive parsers
internal/server/ # HTTP handlers
internal/exporter/ # CSV/JSON/TXT export
internal/models/ # data contracts
web/ # embedded templates/static
```
## License
MIT, see `LICENSE`.
MIT (see `LICENSE`)

bible (submodule added at 0c829182a1)

bible-local/01-overview.md (new file, 35 lines)

@@ -0,0 +1,35 @@
# 01 — Overview
## What is LOGPile?
LOGPile is a standalone Go application for BMC (Baseboard Management Controller)
diagnostics analysis with an embedded web UI.
It runs as a single binary with no external file dependencies.
## Operating modes
| Mode | Entry point | Description |
|------|-------------|-------------|
| **Offline / archive** | `POST /api/upload` | Upload a vendor diagnostic archive or a JSON snapshot; parse and display in UI |
| **Live / Redfish** | `POST /api/collect` | Connect to a live BMC via Redfish API, collect hardware inventory, display and export |
Both modes produce the same in-memory `AnalysisResult` structure and expose it
through the same API and UI.
## Key capabilities
- Single self-contained binary with embedded HTML/JS/CSS (no static file serving required).
- Vendor archive parsing: Inspur/Kaytus, Dell TSR, NVIDIA HGX Field Diagnostics,
NVIDIA Bug Report, Unraid, XigmaNAS, Generic text fallback.
- Live Redfish collection with async progress tracking.
- Normalized hardware inventory: CPU / RAM / Storage / GPU / PSU / NIC / PCIe / Firmware.
- Raw `redfish_tree` snapshot stored in `RawPayloads` for future offline re-analysis.
- Re-upload of a JSON snapshot for offline work (`/api/upload` accepts `AnalysisResult` JSON).
- Export in CSV, JSON (full `AnalysisResult`), and Reanimator format.
- PCI device model resolution via embedded `pci.ids` (no hardcoded model strings).
## Non-goals (current scope)
- No persistent storage — all state is in-memory per process lifetime.
- IPMI collector is a mock scaffold only; real IPMI support is not implemented.
- No authentication layer on the HTTP server.

bible-local/02-architecture.md (new file, 115 lines)

@@ -0,0 +1,115 @@
# 02 — Architecture
## Runtime stack
| Layer | Technology |
|-------|------------|
| Language | Go 1.22+ |
| HTTP | `net/http`, `http.ServeMux` |
| UI | Embedded via `//go:embed` in `web/embed.go` (templates + static assets) |
| State | In-memory only — no database |
| Build | `CGO_ENABLED=0`, single static binary |
Default port: **8082**
## Directory structure
```
cmd/logpile/main.go # Binary entry point, CLI flag parsing
internal/
collector/ # Live data collectors
registry.go # Collector registration
redfish.go # Redfish connector (real implementation)
ipmi_mock.go # IPMI mock connector (scaffold)
types.go # Connector request/progress contracts
parser/ # Archive parsers
parser.go # BMCParser (dispatcher) + parse orchestration
archive.go # Archive extraction helpers
registry.go # Parser registry + detect/selection
interface.go # VendorParser interface
vendors/ # Vendor-specific parser modules
vendors.go # Import-side-effect registrations
dell/
inspur/
nvidia/
nvidia_bug_report/
unraid/
xigmanas/
generic/
pciids/ # PCI IDs lookup (embedded pci.ids)
server/ # HTTP layer
server.go # Server struct, route registration
handlers.go # All HTTP handler functions
exporter/ # Export formatters
exporter.go # CSV + JSON exporters
reanimator_models.go
reanimator_converter.go
models/ # Shared data contracts
web/
embed.go # go:embed directive
templates/ # HTML templates
static/ # JS / CSS
js/app.js # Frontend — API contract consumer
```
## In-memory state
The `Server` struct in `internal/server/server.go` holds:
| Field | Type | Description |
|-------|------|-------------|
| `result` | `*models.AnalysisResult` | Current parsed/collected dataset |
| `detectedVendor` | `string` | Vendor identifier from last parse |
| `jobManager` | `*JobManager` | Tracks live collect job status/logs |
| `collectors` | `*collector.Registry` | Registered live collection connectors |
State is replaced atomically on successful upload or collect.
On a failed/canceled collect, the previous `result` is preserved unchanged.
## Upload flow (`POST /api/upload`)
```
multipart form field: "archive"
├─ file looks like JSON?
│ └─ parse as models.AnalysisResult snapshot → store in Server.result
└─ otherwise
└─ parser.NewBMCParser().ParseFromReader(...)
├─ try all registered vendor parsers (highest confidence wins)
└─ result → store in Server.result
```
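The "file looks like JSON?" branch above presumably sniffs the first bytes of the upload; a minimal sketch, with the caveat that the real handler's heuristic may differ:

```go
package main

import (
	"bytes"
	"fmt"
)

// looksLikeJSON skips leading whitespace and checks for a '{', which
// is enough to separate an AnalysisResult snapshot from a tarball or
// gzip stream (those start with binary magic bytes).
func looksLikeJSON(head []byte) bool {
	trimmed := bytes.TrimLeft(head, " \t\r\n")
	return len(trimmed) > 0 && trimmed[0] == '{'
}

func main() {
	fmt.Println(looksLikeJSON([]byte("  {\"hardware\":{}}"))) // true
	fmt.Println(looksLikeJSON([]byte("\x1f\x8b")))            // false (gzip magic)
}
```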
## Live collect flow (`POST /api/collect`)
```
validate request (host / protocol / port / username / auth_type / tls_mode)
└─ launch async job
├─ progress callback → job log (queryable via GET /api/collect/{id})
├─ success:
│ set source metadata (source_type=api, protocol, host, date)
│ store result in Server.result
└─ failure / cancel:
previous Server.result unchanged
```
Job lifecycle states: `queued → running → success | failed | canceled`
## PCI IDs lookup
Load/override order (`LOGPILE_PCI_IDS_PATH` has highest priority because it is loaded last):
1. Embedded `internal/parser/vendors/pciids/pci.ids` (base dataset compiled into binary)
2. `./pci.ids`
3. `/usr/share/hwdata/pci.ids`
4. `/usr/share/misc/pci.ids`
5. `/opt/homebrew/share/pciids/pci.ids`
6. Paths from `LOGPILE_PCI_IDS_PATH` (colon-separated on Unix); loaded last, so their entries override earlier sources for the same IDs
This means unknown GPU/NIC model strings can be updated by refreshing `pci.ids`
without any code change.
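The last-loader-wins semantics can be illustrated with a tiny merge sketch (this is not the actual `pciids` package API, and the sample entries are illustrative):

```go
package main

import "fmt"

// pciDB maps "vendor:device" IDs to model names. Sources are loaded
// in order; each load overwrites existing IDs, so the last source
// (LOGPILE_PCI_IDS_PATH) effectively has the highest priority.
type pciDB map[string]string

func (db pciDB) load(source map[string]string) {
	for id, name := range source {
		db[id] = name // last loader wins
	}
}

func main() {
	db := pciDB{}
	db.load(map[string]string{"10de:2330": "GH100"})             // embedded pci.ids
	db.load(map[string]string{"10de:2330": "GH100 [H100 SXM5]"}) // LOGPILE_PCI_IDS_PATH
	fmt.Println(db["10de:2330"]) // GH100 [H100 SXM5]
}
```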

bible-local/03-api.md (new file, 184 lines)

@@ -0,0 +1,184 @@
# 03 — API Reference
## Conventions
- All endpoints under `/api/`.
- Request bodies: `application/json` or `multipart/form-data` where noted.
- Responses: `application/json` unless file download.
- Export filenames follow pattern: `YYYY-MM-DD (SERVER MODEL) - SERVER SN.<ext>`
---
## Upload & Data Input
### `POST /api/upload`
Upload a vendor diagnostic archive or a JSON snapshot.
**Request:** `multipart/form-data`, field name `archive`.
Server-side multipart limit: **100 MiB**.
Accepted inputs:
- `.tar`, `.tar.gz`, `.tgz` — vendor diagnostic archives
- `.txt` — plain text files
- JSON file containing a serialized `AnalysisResult` — re-loaded as-is
**Response:** `200 OK` with parsed result summary, or `4xx`/`5xx` on error.
---
## Live Collection
### `POST /api/collect`
Start a live collection job (`redfish` or `ipmi`).
**Request body:**
```json
{
"host": "bmc01.example.local",
"protocol": "redfish",
"port": 443,
"username": "admin",
"auth_type": "password",
"password": "secret",
"tls_mode": "insecure"
}
```
Supported values:
- `protocol`: `redfish` | `ipmi`
- `auth_type`: `password` | `token`
- `tls_mode`: `strict` | `insecure`
**Response:** `202 Accepted`
```json
{
"job_id": "job_a1b2c3d4e5f6",
"status": "queued",
"message": "Collection job accepted",
"created_at": "2026-02-23T12:00:00Z"
}
```
Validation behavior:
- `400 Bad Request` for invalid JSON
- `422 Unprocessable Entity` for semantic validation errors (missing/invalid fields)
### `GET /api/collect/{id}`
Poll job status and progress log.
**Response:**
```json
{
"job_id": "job_a1b2c3d4e5f6",
"status": "running",
"progress": 55,
"logs": ["..."],
"created_at": "2026-02-23T12:00:00Z",
"updated_at": "2026-02-23T12:00:10Z"
}
```
Status values: `queued` | `running` | `success` | `failed` | `canceled`
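A typical client polls `GET /api/collect/{id}` until one of the terminal statuses above is reached. A sketch of that loop with the HTTP call abstracted into a callback (real callers would sleep or back off between polls):

```go
package main

import (
	"errors"
	"fmt"
)

// pollJob calls fetch (standing in for GET /api/collect/{id}) until
// the job reaches a terminal state, with a hard iteration cap instead
// of polling forever.
func pollJob(fetch func() (string, error)) (string, error) {
	terminal := map[string]bool{"success": true, "failed": true, "canceled": true}
	for i := 0; i < 1000; i++ {
		status, err := fetch()
		if err != nil {
			return "", err
		}
		if terminal[status] {
			return status, nil
		}
	}
	return "", errors.New("job did not reach a terminal state")
}

func main() {
	// Simulated status sequence instead of a live BMC.
	statuses := []string{"queued", "running", "running", "success"}
	i := 0
	final, _ := pollJob(func() (string, error) {
		s := statuses[i]
		if i < len(statuses)-1 {
			i++
		}
		return s, nil
	})
	fmt.Println(final) // success
}
```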
### `POST /api/collect/{id}/cancel`
Cancel a running job.
---
## Data Queries
### `GET /api/status`
Returns source metadata for the current dataset.
```json
{
"loaded": true,
"filename": "redfish://bmc01.example.local",
"vendor": "redfish",
"source_type": "api",
"protocol": "redfish",
"target_host": "bmc01.example.local",
"collected_at": "2026-02-10T15:30:00Z",
"stats": { "events": 0, "sensors": 0, "fru": 0 }
}
```
`source_type`: `archive` | `api`
When no dataset is loaded, response is `{ "loaded": false }`.
### `GET /api/config`
Returns source metadata plus:
- `hardware.board`
- `hardware.firmware`
- canonical `hardware.devices`
- computed `specification` summary lines
### `GET /api/events`
Returns parsed diagnostic events.
### `GET /api/sensors`
Returns sensor readings (temperatures, voltages, fan speeds).
### `GET /api/serials`
Returns serial numbers built from canonical `hardware.devices`.
### `GET /api/firmware`
Returns firmware versions built from canonical `hardware.devices`.
### `GET /api/parsers`
Returns list of registered vendor parsers with their identifiers.
---
## Export
### `GET /api/export/csv`
Download serial numbers as CSV.
### `GET /api/export/json`
Download full `AnalysisResult` as JSON (includes `raw_payloads`).
### `GET /api/export/reanimator`
Download hardware data in Reanimator format for asset tracking integration.
See [`07-exporters.md`](07-exporters.md) for full format spec.
---
## Management
### `DELETE /api/clear`
Clear current in-memory dataset.
### `POST /api/shutdown`
Gracefully shut down the server process.
This endpoint terminates the current process after responding.
---
## Source metadata fields
Fields present in `/api/status` and `/api/config`:
| Field | Values |
|-------|--------|
| `source_type` | `archive` \| `api` |
| `protocol` | `redfish` \| `ipmi` (may be empty for archive uploads) |
| `target_host` | IP or hostname |
| `collected_at` | RFC3339 timestamp |

bible-local/04-data-models.md (new file, 104 lines)

@@ -0,0 +1,104 @@
# 04 — Data Models
## AnalysisResult
`internal/models/` — the central data contract shared by parsers, collectors, exporters, and the HTTP layer.
**Stability rule:** Never break the JSON shape of `AnalysisResult`.
Backward-compatible additions are allowed; removals or renames are not.
Key top-level fields:
| Field | Type | Description |
|-------|------|-------------|
| `filename` | `string` | Uploaded filename or generated live source identifier |
| `source_type` | `string` | `archive` or `api` |
| `protocol` | `string` | `redfish`, `ipmi`, or empty for archive uploads |
| `target_host` | `string` | BMC host for live collection |
| `collected_at` | `time.Time` | Upload/collection timestamp |
| `hardware` | `*HardwareConfig` | All normalized hardware inventory |
| `events` | `[]Event` | Diagnostic events from parsers |
| `fru` | `[]FRUInfo` | FRU/SDR-derived inventory details |
| `sensors` | `[]SensorReading` | Sensor readings |
| `raw_payloads` | `map[string]any` | Raw vendor data (e.g. `redfish_tree`) |
`raw_payloads` is the durable source for offline re-analysis (especially for Redfish).
Normalized fields should be treated as derivable output from raw source data.
### Hardware sub-structure
```
HardwareConfig
├── board            BoardInfo        — server/motherboard identity
├── devices          []HardwareDevice — CANONICAL INVENTORY (see below)
├── cpus             []CPU
├── memory           []MemoryDIMM
├── storage          []Storage
├── volumes          []StorageVolume  — logical RAID/VROC volumes
├── pcie_devices     []PCIeDevice
├── gpus             []GPU
├── network_adapters []NetworkAdapter
├── network_cards    []NIC            — legacy/alternate source field
├── power_supplies   []PSU
└── firmware         []FirmwareInfo
```
---
## Canonical Device Repository (`hardware.devices`)
`hardware.devices` is the **single source of truth** for hardware inventory.
### Rules — must not be violated
1. All UI tabs displaying hardware components **must read from `hardware.devices`**.
2. The Device Inventory tab shows kinds: `pcie`, `storage`, `gpu`, `network`.
3. The Reanimator exporter **must use the same `hardware.devices`** as the UI.
4. Any discrepancy between UI data and Reanimator export data is a **bug**.
5. New hardware attributes must be added to the canonical device schema **first**,
then mapped to Reanimator/UI — never the other way around.
6. The exporter should group/filter canonical records by section, not rebuild data
from multiple sources.
### Deduplication logic (applied once by repository builder)
| Priority | Key used |
|----------|----------|
| 1 | `serial_number` — usable (not empty, not `N/A`, `NA`, `NONE`, `NULL`, `UNKNOWN`, `-`) |
| 2 | `bdf` — PCI Bus:Device.Function address |
| 3 | No merge — records remain distinct if both serial and bdf are absent |
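The priority order above can be sketched as a key-selection helper. This is a minimal sketch, assuming a simplified `Device` record; the real canonical device type and repository builder live in the codebase, not here:

```go
package main

import (
	"fmt"
	"strings"
)

// Device is a hypothetical stand-in for a canonical hardware record.
type Device struct {
	SerialNumber string
	BDF          string // PCI Bus:Device.Function, e.g. "0000:18:00.0"
}

// Unusable serial placeholders from the dedup table.
var badSerials = map[string]bool{
	"": true, "N/A": true, "NA": true, "NONE": true,
	"NULL": true, "UNKNOWN": true, "-": true,
}

// dedupKey returns the merge key for a device, or "" when the record
// must stay distinct (priority 3: no usable serial and no BDF).
func dedupKey(d Device) string {
	s := strings.ToUpper(strings.TrimSpace(d.SerialNumber))
	if !badSerials[s] {
		return "serial:" + s // priority 1
	}
	if b := strings.TrimSpace(d.BDF); b != "" {
		return "bdf:" + b // priority 2
	}
	return "" // priority 3: no merge
}

func main() {
	fmt.Println(dedupKey(Device{SerialNumber: "ABC123"}))
	fmt.Println(dedupKey(Device{SerialNumber: "N/A", BDF: "0000:18:00.0"}))
}
```

Records whose keys match are merged by the repository builder; a record with an empty key is kept as-is.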
### Device schema alignment
Keep `hardware.devices` schema as close as possible to Reanimator JSON field names.
This minimizes translation logic in the exporter and prevents drift.
---
## Source metadata fields (stored directly on `AnalysisResult`)
Carried by both `/api/status` and `/api/config`:
```json
{
  "source_type": "api",
  "protocol": "redfish",
  "target_host": "10.0.0.1",
  "collected_at": "2026-02-10T15:30:00Z"
}
```
Valid `source_type` values: `archive`, `api`
Valid `protocol` values: `redfish`, `ipmi` (empty is allowed for archive uploads)
---
## Raw Export Package (reopenable artifact)
`Export Raw Data` does not merely dump `AnalysisResult`; it emits a reopenable raw package
(JSON or ZIP bundle) that carries source data required for re-analysis.
Design rules:
- raw source is authoritative (`redfish_tree` or original file bytes)
- imports must re-analyze from raw source
- parsed field snapshots included in bundles are diagnostic artifacts, not the source of truth

# 05 — Collectors
Collectors live in `internal/collector/`.
Core files:
- `internal/collector/registry.go` — connector registry (`redfish`, `ipmi`)
- `internal/collector/redfish.go` — real Redfish connector
- `internal/collector/ipmi_mock.go` — IPMI mock connector scaffold
- `internal/collector/types.go` — request/progress contracts
---
## Redfish Collector (`redfish`)
**Status:** Production-ready.
### Request contract (from server)
Passed through from `/api/collect` after validation:
- `host`, `port`, `username`
- `auth_type=password|token` (+ matching credential field)
- `tls_mode=strict|insecure`
### Discovery
Dynamic — does not assume fixed paths. Discovers:
- `Systems` collection → per-system resources
- `Chassis` collection → enclosure/board data
- `Managers` collection → BMC/firmware info
### Collected data
| Category | Notes |
|----------|-------|
| CPU | Model, cores, threads, socket, status |
| Memory | DIMM slot, size, type, speed, serial, manufacturer |
| Storage | Slot, type, model, serial, firmware, interface, status |
| GPU | Detected via PCIe class + NVIDIA vendor ID |
| PSU | Model, serial, wattage, firmware, telemetry (input/output power, voltage) |
| NIC | Model, serial, port count, BDF |
| PCIe | Slot, vendor_id, device_id, BDF, link width/speed |
| Firmware | BIOS, BMC versions |
### Raw snapshot
Full Redfish response tree is stored in `result.RawPayloads["redfish_tree"]`.
This allows future offline re-analysis without re-collecting from a live BMC.
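The stored snapshot can be pulled back out of a result for offline replay. A minimal sketch, assuming only the documented `RawPayloads` map and `redfish_tree` key; the reduced `AnalysisResult` here is not the real struct from `internal/models`:

```go
package main

import "fmt"

// AnalysisResult is reduced to the one field this sketch needs.
type AnalysisResult struct {
	RawPayloads map[string]any
}

// redfishTree extracts the stored Redfish snapshot for offline replay.
// The key "redfish_tree" matches the documented payload key.
func redfishTree(r *AnalysisResult) (map[string]any, bool) {
	raw, ok := r.RawPayloads["redfish_tree"]
	if !ok {
		return nil, false
	}
	tree, ok := raw.(map[string]any)
	return tree, ok
}

func main() {
	r := &AnalysisResult{RawPayloads: map[string]any{
		"redfish_tree": map[string]any{"/redfish/v1": map[string]any{"Name": "Root"}},
	}}
	tree, ok := redfishTree(r)
	fmt.Println(ok, len(tree))
}
```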
### Unified Redfish analysis pipeline (live == replay)
LOGPile uses a **single Redfish analyzer path**:
1. Live collector crawls the Redfish API and builds `raw_payloads.redfish_tree`
2. Parsed result is produced by replaying that tree through the same analyzer used by raw import
This guarantees that live collection and `Export Raw Data` re-open/re-analyze produce the same
normalized output for the same `redfish_tree`.
### Snapshot crawler behavior (important)
The Redfish snapshot crawler is intentionally:
- **bounded** (`LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`)
- **prioritized** (PCIe, Fabrics, FirmwareInventory, Storage, PowerSubsystem, ThermalSubsystem)
- **tolerant** (skips noisy expected failures, strips `#fragment` from `@odata.id`)
Design notes:
- Queue capacity is sized to snapshot cap to avoid worker deadlocks on large trees.
- UI progress is coarse and human-readable; detailed per-request diagnostics are available via debug logs.
- `LOGPILE_REDFISH_DEBUG=1` and `LOGPILE_REDFISH_SNAPSHOT_DEBUG=1` enable console diagnostics.
### Parsing guidelines
When adding Redfish mappings, follow these principles:
- Support alternate collection paths (resources may appear at different odata URLs).
- Follow `@odata.id` references and handle embedded `Members` arrays.
- Prefer **raw-tree replay compatibility**: if live collector adds a fallback/probe, replay analyzer must mirror it.
- Deduplicate by serial / BDF / slot+model (in that priority order).
- Prefer tolerant/fallback parsing — missing fields should be silently skipped,
not cause the whole collection to fail.
### Vendor-specific storage fallbacks (Supermicro and similar)
When standard `Storage/.../Drives` collections are empty, collector/replay may recover drives via:
- `Storage.Links.Enclosures[*] -> .../Drives`
- direct probing of finite `Disk.Bay` candidates (`Disk.Bay.0`, `Disk.Bay0`, `.../0`)
This is required for some BMCs that publish drive inventory in vendor-specific paths while leaving
standard collections empty.
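The finite probe can be sketched as a candidate-path generator; the exact path shapes under the storage resource are assumptions based on the patterns listed above, and `maxBays` bounds the probe so the crawl stays finite:

```go
package main

import "fmt"

// bayCandidates returns a finite set of vendor-specific drive paths to
// probe under a storage resource when the standard Drives collection is
// empty. Path shapes are illustrative, not the collector's actual list.
func bayCandidates(storagePath string, maxBays int) []string {
	var out []string
	for i := 0; i < maxBays; i++ {
		out = append(out,
			fmt.Sprintf("%s/Drives/Disk.Bay.%d", storagePath, i),
			fmt.Sprintf("%s/Drives/Disk.Bay%d", storagePath, i),
			fmt.Sprintf("%s/Drives/%d", storagePath, i),
		)
	}
	return out
}

func main() {
	for _, p := range bayCandidates("/redfish/v1/Systems/1/Storage/HA-RAID", 2) {
		fmt.Println(p)
	}
}
```

Each candidate is fetched with a tolerant GET; 404s are expected and skipped.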
### PSU source preference (newer Redfish)
PSU inventory source order:
1. `Chassis/*/PowerSubsystem/PowerSupplies` (preferred on X14+/newer Redfish)
2. `Chassis/*/Power` (legacy fallback)
### Progress reporting
The collector emits progress log entries at each stage (connecting, enumerating systems,
collecting CPUs, etc.) so the UI can display meaningful status.
Current progress message strings are user-facing and may be localized.
---
## IPMI Collector (`ipmi`)
**Status:** Mock scaffold only — not implemented.
Registered in the collector registry but returns placeholder data.
Real IPMI support is a future work item.

# 06 — Parsers
## Framework
### Registration
Each vendor parser registers itself via Go's `init()` side-effect import pattern.
All registrations are collected in `internal/parser/vendors/vendors.go`:
```go
import (
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/dell"
	// etc.
)
```
### VendorParser interface
```go
type VendorParser interface {
	Name() string    // human-readable name
	Vendor() string  // vendor identifier string
	Version() string // parser version (increment on logic changes)
	Detect(files []ExtractedFile) int // confidence 0–100
	Parse(files []ExtractedFile) (*models.AnalysisResult, error)
}
```
### Selection logic
All registered parsers run `Detect()` against the uploaded archive's file list.
The parser with the **highest confidence score** is selected.
Multiple parsers may return >0; only the top scorer is used.
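The selection loop can be sketched as follows; `VendorParser` is reduced to the two methods selection needs, and the stub parsers are illustrative only:

```go
package main

import "fmt"

// ExtractedFile and VendorParser mirror the documented contracts,
// reduced to what selection needs.
type ExtractedFile struct {
	Path string
}

type VendorParser interface {
	Name() string
	Detect(files []ExtractedFile) int // confidence 0-100
}

// selectParser runs Detect on every registered parser and returns the
// highest scorer, or nil when no parser returns a positive score.
func selectParser(parsers []VendorParser, files []ExtractedFile) VendorParser {
	var best VendorParser
	bestScore := 0
	for _, p := range parsers {
		if score := p.Detect(files); score > bestScore {
			best, bestScore = p, score
		}
	}
	return best
}

// stub parser for the usage example below.
type stub struct {
	name  string
	score int
}

func (s stub) Name() string               { return s.name }
func (s stub) Detect([]ExtractedFile) int { return s.score }

func main() {
	winner := selectParser([]VendorParser{stub{"generic", 15}, stub{"dell", 90}}, nil)
	fmt.Println(winner.Name())
}
```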
### Adding a new vendor parser
1. `mkdir -p internal/parser/vendors/VENDORNAME`
2. Copy `internal/parser/vendors/template/parser.go.template` as starting point.
3. Implement `Detect()` and `Parse()`.
4. Add blank import to `vendors/vendors.go`.
`Detect()` tips:
- Look for unique filenames or directory names.
- Check file content for vendor-specific markers.
- Return 70+ only when confident; return 0 if clearly not a match.
### Parser versioning
Each parser file contains a `parserVersion` constant.
Increment the version whenever parsing logic changes — this helps trace which
version produced a given result.
---
## Parser data quality rules
### FirmwareInfo — system-level only
`Hardware.Firmware` must contain **only system-level firmware**: BIOS, BMC/iDRAC,
Lifecycle Controller, CPLD, storage controllers, BOSS adapters.
**Device-bound firmware** (NIC, GPU, PSU, disk, backplane) **must NOT be added to
`Hardware.Firmware`**. It belongs to the device's own `Firmware` field and is already
present there. Duplicating it in `Hardware.Firmware` causes double entries in Reanimator.
The Reanimator exporter filters by `FirmwareInfo.DeviceName` prefix and by
`FirmwareInfo.Description` (FQDD prefix). Parsers must cooperate:
- Store the device's FQDD (or equivalent slot identifier) in `FirmwareInfo.Description`
for all firmware entries that come from a per-device inventory source (e.g. Dell
`DCIM_SoftwareIdentity`).
- FQDD prefixes that are device-bound: `NIC.`, `PSU.`, `Disk.`, `RAID.Backplane.`, `GPU.`
### NIC/device model names — strip embedded MAC addresses
Some vendors (confirmed: Dell TSR) embed the MAC address in the device model name field,
e.g. `ProductName = "NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"`.
**Rule:** Strip any ` - XX:XX:XX:XX:XX:XX` suffix from model/name strings before storing
them in `FirmwareInfo.DeviceName`, `NetworkAdapter.Model`, or any other model field.
Use `nicMACInModelRE` (defined in the Dell parser) or an equivalent regex:
```
\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$
```
This applies to **all** string fields used as device names or model identifiers.
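A minimal sketch of the stripping helper, using the regex shown above (the `nicMACInModelRE` name matches the Dell parser's; everything else here is illustrative):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nicMACInModelRE matches a trailing " - XX:XX:XX:XX:XX:XX" suffix,
// mirroring the regex documented above.
var nicMACInModelRE = regexp.MustCompile(`\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`)

// stripMACSuffix removes an embedded MAC address from a model/name
// string before it is stored in any model or device-name field.
func stripMACSuffix(s string) string {
	return strings.TrimSpace(nicMACInModelRE.ReplaceAllString(s, ""))
}

func main() {
	fmt.Println(stripMACSuffix("NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"))
}
```

Note that hyphens inside model names (for example `ConnectX-6`) are untouched, since the pattern only matches a whitespace-delimited trailing MAC.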
### PCI device name enrichment via pci.ids
If a PCIe device, GPU, NIC, or any hardware component has a `vendor_id` + `device_id`
but its model/name field is **empty or generic** (e.g. blank, equals the description,
or is just a raw hex ID), the parser **must** attempt to resolve the human-readable
model name from the embedded `pci.ids` database before storing the result.
**Rule:** When `Model` (or equivalent name field) is empty and both `VendorID` and
`DeviceID` are non-zero, call the pciids lookup and use the result as the model name.
```go
// Example pattern — use in any parser that handles PCIe/GPU/NIC devices:
if strings.TrimSpace(device.Model) == "" && device.VendorID != 0 && device.DeviceID != 0 {
	if name := pciids.Lookup(device.VendorID, device.DeviceID); name != "" {
		device.Model = name
	}
}
```
This rule applies to all vendor parsers. The pciids package is available at
`internal/parser/vendors/pciids`. See ADL-005 for the rationale.
**Do not hardcode model name strings.** If a device is unknown today, it will be
resolved automatically once `pci.ids` is updated.
---
## Vendor parsers
### Inspur / Kaytus (`inspur`)
**Status:** Ready. Tested on KR4268X2 (onekeylog format).
**Archive format:** `.tar.gz` onekeylog
**Primary source files:**
| File | Content |
|------|---------|
| `asset.json` | Base hardware inventory |
| `component.log` | Component list |
| `devicefrusdr.log` | FRU and SDR data |
| `onekeylog/runningdata/redis-dump.rdb` | Runtime enrichment (optional) |
**Redis RDB enrichment** (applied conservatively — fills missing fields only):
- GPU: `serial_number`, `firmware` (VBIOS/FW), runtime telemetry
- NIC: firmware, serial, part number (when text logs leave fields empty)
**Module structure:**
```
inspur/
  parser.go — main parser + registration
  sdr.go    — sensor/SDR parsing
  fru.go    — FRU serial parsing
  asset.go  — asset.json parsing
  syslog.go — syslog parsing
```
---
### Dell TSR (`dell`)
**Status:** Ready (v3.0). Tested on nested TSR archives with embedded `*.pl.zip`.
**Archive format:** `.zip` (outer archive + nested `*.pl.zip`)
**Primary source files:**
- `tsr/metadata.json`
- `tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml`
- `tsr/hardware/sysinfo/inventory/sysinfo_DCIM_SoftwareIdentity.xml`
- `tsr/hardware/sysinfo/inventory/sysinfo_CIM_Sensor.xml`
- `tsr/hardware/sysinfo/lcfiles/curr_lclog.xml`
**Extracted data:**
- Board/system identity and BIOS/iDRAC firmware
- CPU, memory, physical disks, virtual disks, PSU, NIC, PCIe
- GPU inventory (`DCIM_VideoView`) + GPU sensor enrichment (`DCIM_GPUSensor`)
- Controller/backplane inventory (`DCIM_ControllerView`, `DCIM_EnclosureView`)
- Sensor readings (temperature/voltage/current/power/fan/utilization)
- Lifecycle events (`curr_lclog.xml`)
---
### NVIDIA HGX Field Diagnostics (`nvidia`)
**Status:** Ready (v1.1.0). Works with any server vendor.
**Archive format:** `.tar` / `.tar.gz`
**Confidence scoring:**
| File | Score |
|------|-------|
| `unified_summary.json` with "HGX Field Diag" marker | +40 |
| `summary.json` | +20 |
| `summary.csv` | +15 |
| `gpu_fieldiag/` directory | +15 |
**Source files:**
| File | Content |
|------|---------|
| `output.log` | dmidecode — server manufacturer, model, serial number |
| `unified_summary.json` | GPU details, NVSwitch devices, PCI addresses |
| `summary.json` | Diagnostic test results and error codes |
| `summary.csv` | Alternative test results format |
**Extracted data:**
- GPUs: slot, model, manufacturer, firmware (VBIOS), BDF
- NVSwitch devices: slot, device_class, vendor_id, device_id, BDF, link speed/width
- Events: diagnostic test failures (connectivity, gpumem, gpustress, pcie, nvlink, nvswitch, power)
**Severity mapping:**
- `info` — tests passed
- `warning` — e.g. "Row remapping failed"
- `critical` — error codes 300+
**Known limitations:**
- Detailed logs in `gpu_fieldiag/*.log` are not parsed.
- No CPU, memory, or storage extraction (not present in field diag archives).
---
### NVIDIA Bug Report (`nvidia_bug_report`)
**Status:** Ready (v1.0.0).
**File format:** `nvidia-bug-report-*.log.gz` (gzip-compressed text)
**Confidence:** 85 (high priority for matching filename pattern)
**Source sections parsed:**
| dmidecode section | Extracts |
|-------------------|---------|
| System Information | server serial, UUID, manufacturer, product name |
| Processor Information | CPU model, serial, core/thread count, frequency |
| Memory Device | DIMM slot, size, type, manufacturer, serial, part number, speed |
| System Power Supply | PSU location, manufacturer, model, serial, wattage, firmware, status |

| Other source | Extracts |
|--------------|---------|
| `lspci -vvv` (Ethernet/Network/IB) | NIC model (from VPD), BDF, slot, P/N, S/N, port count, port type |
| `/proc/driver/nvidia/gpus/*/information` | GPU model, BDF, UUID, VBIOS version, IRQ |
| NVRM version line | NVIDIA driver version |
**Known limitations:**
- Driver error/warning log lines not yet extracted.
- GPU temperature/utilization metrics require additional parsing sections.
---
### XigmaNAS (`xigmanas`)
**Status:** Ready.
**Archive format:** Plain log files (FreeBSD-based NAS system)
**Detection:** Files named `xigmanas`, `system`, or `dmesg`; content containing "XigmaNAS" or "FreeBSD"; SMART data presence.
**Extracted data:**
- System: firmware version, uptime, CPU model, memory configuration, hardware platform
- Storage: disk models, serial numbers, capacity, health, SMART temperatures
- Populates: `Hardware.Firmware`, `Hardware.CPUs`, `Hardware.Memory`, `Hardware.Storage`, `Sensors`
---
### Unraid (`unraid`)
**Status:** Ready (v1.0.0).
**Archive format:** Unraid diagnostics archive contents (text-heavy diagnostics directories).
**Detection:** Combines filename/path markers (`diagnostics-*`, `unraid-*.txt`, `vars.txt`)
with content markers (e.g. `Unraid kernel build`, parity data markers).
**Extracted data (current):**
- Board / BIOS metadata (from motherboard/system files)
- CPU summary (from `lscpu.txt`)
- Memory modules (from diagnostics memory file)
- Storage devices (from `vars.txt` + SMART files)
- Syslog events
---
### H3C SDS G5 (`h3c_g5`)
**Status:** Ready (v1.0.0). Tested on H3C UniServer R4900 G5 SDS archives.
**Archive format:** `.sds` (tar archive)
**Detection:** `hardware_info.ini`, `hardware.info`, `firmware_version.ini`, `user/test*.csv`, plus H3C markers.
**Extracted data (current):**
- Board/FRU inventory (`FRUInfo.ini`, `board_info.ini`)
- Firmware list (`firmware_version.ini`)
- CPU inventory (`hardware_info.ini`)
- Memory DIMM inventory (`hardware_info.ini`)
- Storage inventory (`hardware.info`, `storage_disk.ini`, `NVMe_info.txt`, RAID text enrichments)
- Logical RAID volumes (`raid.json`, `Storage_RAID-*.txt`)
- Sensor snapshot (`sensor_info.ini`)
- SEL events (`user/test.csv`, `user/test1.csv`, fallback `Sel.json` / `sel_list.txt`)
---
### H3C SDS G6 (`h3c_g6`)
**Status:** Ready (v1.0.0). Tested on H3C UniServer R4700 G6 SDS archives.
**Archive format:** `.sds` (tar archive)
**Detection:** `CPUDetailInfo.xml`, `MemoryDetailInfo.xml`, `firmware_version.json`, `Sel.json`, plus H3C markers.
**Extracted data (current):**
- Board/FRU inventory (`FRUInfo.ini`, `board_info.ini`)
- Firmware list (`firmware_version.json`)
- CPU inventory (`CPUDetailInfo.xml`)
- Memory DIMM inventory (`MemoryDetailInfo.xml`)
- Storage inventory + capacity/model/interface (`storage_disk.ini`, `Storage_RAID-*.txt`, `NVMe_info.txt`)
- Logical RAID volumes (`raid.json`, fallback from `Storage_RAID-*.txt` when available)
- Sensor snapshot (`sensor_info.ini`)
- SEL events (`user/Sel.json`, fallback `user/sel_list.txt`)
---
### Generic text fallback (`generic`)
**Status:** Ready (v1.0.0).
**Confidence:** 15 (lowest — only matches if no other parser scores higher)
**Purpose:** Fallback for any text file or single `.gz` file not matching a specific vendor.
**Behavior:**
- If filename matches `nvidia-bug-report-*.log.gz`: extracts driver version and GPU list.
- Otherwise: confirms file is text (not binary) and records a basic "Text File" event.
---
## Supported vendor matrix
| Vendor | ID | Status | Tested on |
|--------|----|--------|-----------|
| Dell TSR | `dell` | Ready | TSR nested zip archives |
| Inspur / Kaytus | `inspur` | Ready | KR4268X2 onekeylog |
| NVIDIA HGX Field Diag | `nvidia` | Ready | Various HGX servers |
| NVIDIA Bug Report | `nvidia_bug_report` | Ready | H100 systems |
| Unraid | `unraid` | Ready | Unraid diagnostics archives |
| XigmaNAS | `xigmanas` | Ready | FreeBSD NAS logs |
| H3C SDS G5 | `h3c_g5` | Ready | H3C UniServer R4900 G5 SDS archives |
| H3C SDS G6 | `h3c_g6` | Ready | H3C UniServer R4700 G6 SDS archives |
| Generic fallback | `generic` | Ready | Any text file |

# 07 — Exporters & Reanimator Integration
## Export endpoints summary
| Endpoint | Format | Filename pattern |
|----------|--------|-----------------|
| `GET /api/export/csv` | CSV — serial numbers | `YYYY-MM-DD (MODEL) - SN.csv` |
| `GET /api/export/json` | **Raw export package** (JSON or ZIP bundle) for reopen/re-analysis | `YYYY-MM-DD (MODEL) - SN.(json|zip)` |
| `GET /api/export/reanimator` | Reanimator hardware JSON | `YYYY-MM-DD (MODEL) - SN.json` |
---
## Raw Export (`Export Raw Data`)
### Purpose
Preserve enough source data to reproduce parsing later after parser fixes, without requiring
another live collection from the target system.
### Format
`/api/export/json` returns a **raw export package**:
- JSON package (machine-readable), or
- ZIP bundle containing:
- `raw_export.json` — machine-readable package
- `collect.log` — human-readable collection + parsing summary
- `parser_fields.json` — structured parsed field snapshot for diffs between parser versions
### Import / reopen behavior
When a raw export package is uploaded back into LOGPile:
- the app **re-analyzes from raw source**
- it does **not** trust embedded parsed output as source of truth
For Redfish, this means replay from `raw_payloads.redfish_tree`.
### Design rule
Raw export is a **re-analysis artifact**, not a final report dump. Keep it self-contained and
forward-compatible where possible (versioned package format, additive fields only).
---
## Reanimator Export
### Purpose
Exports hardware inventory data in the format expected by the Reanimator asset tracking
system. Enables one-click push from LOGPile to an external asset management platform.
### Implementation files
| File | Role |
|------|------|
| `internal/exporter/reanimator_models.go` | Go structs for Reanimator JSON |
| `internal/exporter/reanimator_converter.go` | `ConvertToReanimator()` and helpers |
| `internal/server/handlers.go` | `handleExportReanimator()` HTTP handler |
### Conversion rules
- Source: canonical `hardware.devices` repository (see [`04-data-models.md`](04-data-models.md))
- CPU manufacturer inferred from model string (Intel / AMD / ARM / Ampere)
- PCIe serial number generated when absent: `{board_serial}-PCIE-{slot}`
- Status values normalized to: `OK`, `Warning`, `Critical`, `Unknown` (`Empty` only for memory slots)
- Timestamps in RFC3339 format
- `target_host` derived from `filename` field (`redfish://…`, `ipmi://…`) if not in source; omitted if undeterminable
- `board.manufacturer` and `board.product_name` values of `"NULL"` treated as absent
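The `target_host` derivation can be sketched as a prefix check on the `filename` field (a sketch, not the exporter's actual helper):

```go
package main

import (
	"fmt"
	"strings"
)

// targetHostFromFilename recovers the BMC host from live-source
// filenames such as "redfish://10.10.10.103" or "ipmi://bmc-7".
// Returns "" when the host cannot be determined, in which case the
// exporter omits target_host entirely.
func targetHostFromFilename(filename string) string {
	for _, prefix := range []string{"redfish://", "ipmi://"} {
		if strings.HasPrefix(filename, prefix) {
			return strings.TrimPrefix(filename, prefix)
		}
	}
	return ""
}

func main() {
	fmt.Println(targetHostFromFilename("redfish://10.10.10.103"))
}
```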
### LOGPile → Reanimator field mapping
| LOGPile type | Reanimator section | Notes |
|---|---|---|
| `BoardInfo` | `board` | Direct mapping |
| `CPU` | `cpus` | + manufacturer (inferred) |
| `MemoryDIMM` | `memory` | Direct; empty slots included (`present=false`) |
| `Storage` | `storage` | Excluded if no `serial_number` |
| `PCIeDevice` | `pcie_devices` | Serial generated if missing |
| `GPU` | `pcie_devices` | `device_class=DisplayController` |
| `NetworkAdapter` | `pcie_devices` | `device_class=NetworkController` |
| `PSU` | `power_supplies` | Excluded if no serial or `present=false` |
| `FirmwareInfo` | `firmware` | Direct mapping |
### Inclusion / exclusion rules
**Included:**
- Memory slots with `present=false` (as Empty slots)
- PCIe devices without serial number (serial is generated)
**Excluded:**
- Storage without `serial_number`
- PSU without `serial_number` or with `present=false`
- NetworkAdapters with `present=false`
---
## Reanimator Integration Guide
This section documents the Reanimator receiver-side JSON format (what the Reanimator
system expects when it ingests a LOGPile export).
> **Important:** The Reanimator endpoint uses a strict JSON decoder (`DisallowUnknownFields`).
> Any unknown field — including nested ones — causes `400 Bad Request`.
> Use only `snake_case` keys listed here.
### Top-level structure
```json
{
  "filename": "redfish://10.10.10.103",
  "source_type": "api",
  "protocol": "redfish",
  "target_host": "10.10.10.103",
  "collected_at": "2026-02-10T15:30:00Z",
  "hardware": {
    "board": {...},
    "firmware": [...],
    "cpus": [...],
    "memory": [...],
    "storage": [...],
    "pcie_devices": [...],
    "power_supplies": [...]
  }
}
```
**Required:** `collected_at`, `hardware.board.serial_number`
**Optional:** `target_host`, `source_type`, `protocol`, `filename`
`source_type` values: `api`, `logfile`, `manual`
`protocol` values: `redfish`, `ipmi`, `snmp`, `ssh`
### Component status fields (all component sections)
Each component may carry:
| Field | Type | Description |
|-------|------|-------------|
| `status` | string | `OK`, `Warning`, `Critical`, `Unknown`, `Empty` |
| `status_checked_at` | RFC3339 | When status was last verified |
| `status_changed_at` | RFC3339 | When status last changed |
| `status_at_collection` | object | `{ "status": "...", "at": "..." }` — snapshot-time status |
| `status_history` | array | `[{ "status": "...", "changed_at": "...", "details": "..." }]` |
| `error_description` | string | Human-readable error for Warning/Critical |
### Board
```json
{
  "board": {
    "manufacturer": "Supermicro",
    "product_name": "X12DPG-QT6",
    "serial_number": "21D634101",
    "part_number": "X12DPG-QT6-REV1.01",
    "uuid": "d7ef2fe5-2fd0-11f0-910a-346f11040868"
  }
}
```
`serial_number` required. `manufacturer` / `product_name` of `"NULL"` treated as absent.
### CPUs
```json
{
  "socket": 0,
  "model": "INTEL(R) XEON(R) GOLD 6530",
  "cores": 32,
  "threads": 64,
  "frequency_mhz": 2100,
  "max_frequency_mhz": 4000,
  "manufacturer": "Intel",
  "status": "OK"
}
```
`socket` (int) and `model` required. Serial generated: `{board_serial}-CPU-{socket}`.
LOT format: `CPU_{VENDOR}_{MODEL_NORMALIZED}` → e.g. `CPU_INTEL_XEON_GOLD_6530`
### Memory
```json
{
  "slot": "CPU0_C0D0",
  "location": "CPU0_C0D0",
  "present": true,
  "size_mb": 32768,
  "type": "DDR5",
  "max_speed_mhz": 4800,
  "current_speed_mhz": 4800,
  "manufacturer": "Hynix",
  "serial_number": "80AD032419E17CEEC1",
  "part_number": "HMCG88AGBRA191N",
  "status": "OK"
}
```
`slot` and `present` required. `serial_number` required when `present=true`.
Empty slots (`present=false`, `status="Empty"`) are included but no component created.
LOT format: `DIMM_{TYPE}_{SIZE_GB}GB` → e.g. `DIMM_DDR5_32GB`
### Storage
```json
{
  "slot": "OB01",
  "type": "NVMe",
  "model": "INTEL SSDPF2KX076T1",
  "size_gb": 7680,
  "serial_number": "BTAX41900GF87P6DGN",
  "manufacturer": "Intel",
  "firmware": "9CV10510",
  "interface": "NVMe",
  "present": true,
  "status": "OK"
}
```
`slot`, `model`, `serial_number`, `present` required.
LOT format: `{TYPE}_{INTERFACE}_{SIZE_TB}TB` → e.g. `SSD_NVME_07.68TB`
### Power Supplies
```json
{
  "slot": "0",
  "present": true,
  "model": "GW-CRPS3000LW",
  "vendor": "Great Wall",
  "wattage_w": 3000,
  "serial_number": "2P06C102610",
  "part_number": "V0310C9000000000",
  "firmware": "00.03.05",
  "status": "OK",
  "input_power_w": 137,
  "output_power_w": 104,
  "input_voltage": 215.25
}
```
`slot`, `present` required. `serial_number` required when `present=true`.
Telemetry fields (`input_power_w`, `output_power_w`, `input_voltage`) stored in observation only.
LOT format: `PSU_{WATTAGE}W_{VENDOR_NORMALIZED}` → e.g. `PSU_3000W_GREAT_WALL`
### PCIe Devices
```json
{
  "slot": "PCIeCard1",
  "vendor_id": 32902,
  "device_id": 2912,
  "bdf": "0000:18:00.0",
  "device_class": "MassStorageController",
  "manufacturer": "Intel",
  "model": "RAID Controller RSP3DD080F",
  "link_width": 8,
  "link_speed": "Gen3",
  "max_link_width": 8,
  "max_link_speed": "Gen3",
  "serial_number": "RAID-001-12345",
  "firmware": "50.9.1-4296",
  "status": "OK"
}
```
`slot` required. Serial generated if absent: `{board_serial}-PCIE-{slot}`.
`device_class` values: `NetworkController`, `MassStorageController`, `DisplayController`, etc.
LOT format: `PCIE_{DEVICE_CLASS}_{MODEL_NORMALIZED}` → e.g. `PCIE_NETWORK_CONNECTX5`
### Firmware
```json
[
  { "device_name": "BIOS", "version": "06.08.05" },
  { "device_name": "BMC", "version": "5.17.00" }
]
```
Both fields required. Changes trigger `FIRMWARE_CHANGED` timeline events.
---
### Import process (Reanimator side)
1. Validate `collected_at` (RFC3339) and `hardware.board.serial_number`.
2. Find or create Asset by `board.serial_number` → `vendor_serial`.
3. For each component: filter `present=false`, auto-determine LOT, find or create Component,
create Observation, update Installations.
4. Detect removed components (present in previous snapshot, absent in current) → close Installation.
5. Generate timeline events: `LOG_COLLECTED`, `INSTALLED`, `REMOVED`, `FIRMWARE_CHANGED`.
**Idempotency:** Repeated import of the same snapshot (same content hash) returns `200 OK`
with `"duplicate": true` and does not create duplicate records.
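One way such a content-hash check can be implemented is a SHA-256 fingerprint over the snapshot bytes; this is a sketch of the idea, not the documented Reanimator hashing scheme:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash fingerprints a snapshot payload so a repeated import of
// identical bytes can be detected and answered with "duplicate": true
// instead of creating duplicate records.
func contentHash(payload []byte) string {
	sum := sha256.Sum256(payload)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := contentHash([]byte(`{"collected_at":"2026-02-10T15:30:00Z"}`))
	b := contentHash([]byte(`{"collected_at":"2026-02-10T15:30:00Z"}`))
	fmt.Println(a == b)
}
```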
### Reanimator API endpoint
```http
POST /ingest/hardware
Content-Type: application/json
```
**Success (201):**
```json
{
  "status": "success",
  "bundle_id": "lb_01J...",
  "asset_id": "mach_01J...",
  "collected_at": "2026-02-10T15:30:00Z",
  "duplicate": false,
  "summary": {
    "parts_observed": 15,
    "parts_created": 2,
    "installations_created": 2,
    "timeline_events_created": 9
  }
}
```
**Duplicate (200):**
```json
{ "status": "success", "duplicate": true, "message": "LogBundle with this content hash already exists" }
```
**Error (400):**
```json
{ "status": "error", "error": "validation_failed", "details": { "field": "...", "message": "..." } }
```
Common `400` causes:
- Unknown JSON field (strict decoder)
- Wrong key name (e.g. `targetHost` instead of `target_host`)
- Invalid `collected_at` format (must be RFC3339)
- Empty `hardware.board.serial_number`
### LOT normalization rules
1. Remove special chars `( ) - ® ™`; replace spaces with `_`
2. Uppercase all
3. Collapse multiple underscores to one
4. Strip common prefixes like `MODEL:`, `PN:`
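A sketch of the four rules as one helper. Treating `(R)`/`(TM)` as spellings of ® and ™ is an assumption here, but it matches the documented `CPU_INTEL_XEON_GOLD_6530` example:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Special characters listed in the normalization rules, plus the
	// (R)/(TM) ASCII spellings of ® and ™.
	lotSpecial    = regexp.MustCompile(`\((?:R|TM)\)|[()\-®™]`)
	lotUnderscore = regexp.MustCompile(`_+`)
	lotPrefixes   = []string{"MODEL:", "PN:"}
)

// normalizeLOT applies the four documented rules to a raw model or
// vendor string.
func normalizeLOT(s string) string {
	s = strings.ToUpper(strings.TrimSpace(s))
	for _, p := range lotPrefixes {
		s = strings.TrimSpace(strings.TrimPrefix(s, p))
	}
	s = lotSpecial.ReplaceAllString(s, "")            // rule 1a: drop special chars
	s = strings.ReplaceAll(strings.TrimSpace(s), " ", "_") // rule 1b: spaces → _
	s = lotUnderscore.ReplaceAllString(s, "_")        // rule 3: collapse underscores
	return strings.Trim(s, "_")
}

func main() {
	fmt.Println(normalizeLOT("Great Wall"))
	fmt.Println(normalizeLOT("Intel(R) Xeon(R) Gold 6530"))
}
```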
### Status values
| Value | Meaning | Action |
|-------|---------|--------|
| `OK` | Normal | — |
| `Warning` | Degraded | Create `COMPONENT_WARNING` event (optional) |
| `Critical` | Failed | Auto-create `failure_event`, create `COMPONENT_FAILED` event |
| `Unknown` | Not determinable | Treat as working |
| `Empty` | Slot unpopulated | No component created (memory/PCIe only) |
### Missing field handling
| Field | Fallback |
|-------|---------|
| CPU serial | Generated: `{board_serial}-CPU-{socket}` |
| PCIe serial | Generated: `{board_serial}-PCIE-{slot}` |
| Other serial | Component skipped if absent |
| manufacturer (PCIe) | Looked up from `vendor_id` (8086→Intel, 10de→NVIDIA, 15b3→Mellanox…) |
| status | Treated as `Unknown` |
| firmware | No `FIRMWARE_CHANGED` event |

# 08 — Build & Release
## CLI flags
Defined in `cmd/logpile/main.go`:
| Flag | Default | Description |
|------|---------|-------------|
| `--port` | `8082` | HTTP server port |
| `--file` | — | Reserved for archive preload (not active) |
| `--version` | — | Print version and exit |
| `--no-browser` | — | Do not open browser on start |
| `--hold-on-crash` | `true` on Windows | Keep console open on fatal crash for debugging |
## Build
```bash
# Local binary (current OS/arch)
make build
# Output: bin/logpile
# Cross-platform binaries
make build-all
# Output:
# bin/logpile-linux-amd64
# bin/logpile-linux-arm64
# bin/logpile-darwin-amd64
# bin/logpile-darwin-arm64
# bin/logpile-windows-amd64.exe
```
Both `make build` and `make build-all` run `scripts/update-pci-ids.sh --best-effort`
before compilation to sync `pci.ids` from the submodule.
To skip PCI IDs update:
```bash
SKIP_PCI_IDS_UPDATE=1 make build
```
Build flags: `CGO_ENABLED=0` — fully static binary, no C runtime dependency.
## PCI IDs submodule
Source: `third_party/pciids` (git submodule → `github.com/pciutils/pciids`)
Local copy embedded at build time: `internal/parser/vendors/pciids/pci.ids`
```bash
# Manual update
make update-pci-ids
# Init submodule after fresh clone
git submodule update --init third_party/pciids
```
## Release process
```bash
scripts/release.sh
```
What it does:
1. Reads version from `git describe --tags`
2. Validates clean working tree (override: `ALLOW_DIRTY=1`)
3. Sets stable `GOPATH` / `GOCACHE` / `GOTOOLCHAIN` env
4. Creates `releases/{VERSION}/` directory
5. Generates `RELEASE_NOTES.md` template if not present
6. Builds `darwin-arm64` and `windows-amd64` binaries
7. Packages all binaries found in `bin/` as `.tar.gz` / `.zip`
8. Generates `SHA256SUMS.txt`
9. Prints next steps (tag, push, create release manually)
Release notes template is created in `releases/{VERSION}/RELEASE_NOTES.md`.
## Running
```bash
./bin/logpile
./bin/logpile --port 9090
./bin/logpile --no-browser
./bin/logpile --version
./bin/logpile --hold-on-crash # keep console open on crash (default on Windows)
```
## macOS Gatekeeper
After downloading a binary, remove the quarantine attribute:
```bash
xattr -d com.apple.quarantine /path/to/logpile-darwin-arm64
```

# 09 — Testing
## Required before merge
```bash
go test ./...
```
All tests must pass before any change is merged.
## Where to add tests
| Change area | Test location |
|-------------|---------------|
| Collectors | `internal/collector/*_test.go` |
| HTTP handlers | `internal/server/*_test.go` |
| Exporters | `internal/exporter/*_test.go` |
| Parsers | `internal/parser/vendors/<vendor>/*_test.go` |
## Exporter tests
The Reanimator exporter has comprehensive coverage:
| Test file | Coverage |
|-----------|----------|
| `reanimator_converter_test.go` | Unit tests per conversion function |
| `reanimator_integration_test.go` | Full export with realistic `AnalysisResult` |
Run exporter tests only:
```bash
go test ./internal/exporter/...
go test ./internal/exporter/... -v -run Reanimator
go test ./internal/exporter/... -cover
```
## Guidelines
- Prefer table-driven tests for parsing logic (multiple input variants).
- Do not rely on network access in unit tests.
- Test both the happy path and edge cases (missing fields, empty collections).
- When adding a new vendor parser, include at minimum:
- `Detect()` test with a positive and a negative sample file list.
- `Parse()` test with a minimal but representative archive.
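A table-driven `Detect()` test might look like the sketch below. The `detectScore` helper and the marker filename are illustrative stand-ins, not the real parser interface; in the repository this would live in a `*_test.go` file under `testing.T`.

```go
package main

import "fmt"

// detectScore stands in for a vendor parser's Detect() scoring function;
// the real interface lives in internal/parser (signature assumed here).
func detectScore(files []string) int {
	for _, f := range files {
		if f == "tsr_report.xml" { // hypothetical vendor marker file
			return 90
		}
	}
	return 0
}

func main() {
	// In a real *_test.go file this loop runs under testing.T;
	// the table-driven shape is the point.
	cases := []struct {
		name  string
		files []string
		want  int
	}{
		{"positive sample list", []string{"tsr_report.xml", "sysinfo.txt"}, 90},
		{"negative sample list", []string{"random.txt"}, 0},
	}
	for _, tc := range cases {
		got := detectScore(tc.files)
		fmt.Printf("%s: got=%d want=%d ok=%t\n", tc.name, got, tc.want, got == tc.want)
	}
}
```

Adding an input variant is then a one-line change to the `cases` slice.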

bible-local/10-decisions.md Normal file

@@ -0,0 +1,256 @@
# 10 — Architectural Decision Log (ADL)
> **Rule:** Every significant architectural decision **must be recorded here** before or alongside
> the code change. This applies to humans and AI assistants alike.
>
> Format: date · title · context · decision · consequences
---
## ADL-001 — In-memory only state (no database)
**Date:** project start
**Context:** LOGPile is designed as a standalone diagnostic tool, not a persistent service.
**Decision:** All parsed/collected data lives in `Server.result` (in-memory). No database, no files written.
**Consequences:**
- Data is lost on process restart — intentional.
- Simple deployment: single binary, no setup required.
- JSON export is the persistence mechanism for users who want to save results.
---
## ADL-002 — Vendor parser auto-registration via init()
**Date:** project start
**Context:** Need an extensible parser registry without a central factory function.
**Decision:** Each vendor parser registers itself in its package's `init()` function.
`vendors/vendors.go` holds blank imports to trigger registration.
**Consequences:**
- Adding a new parser requires only: implement interface + add one blank import.
- No central list to maintain (other than the import file).
- `go test ./...` will include new parsers automatically.
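The registration pattern can be sketched in a few lines. Type and function names below are simplified stand-ins for the real interface in `internal/parser`:

```go
package main

import "fmt"

// Parser is a simplified stand-in for the real parser interface.
type Parser interface {
	Name() string
}

var registry = map[string]Parser{}

// Register is called from each vendor package's init().
func Register(p Parser) { registry[p.Name()] = p }

type dellParser struct{}

func (dellParser) Name() string { return "dell" }

// In the real tree this init() lives in the vendor's own package, and
// vendors/vendors.go blank-imports that package to trigger it:
//
//	import _ "git.mchus.pro/mchus/logpile/internal/parser/vendors/dell"
func init() { Register(dellParser{}) }

func main() {
	fmt.Println(len(registry)) // prints 1: the parser registered itself before main ran
}
```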
---
## ADL-003 — Highest-confidence parser wins
**Date:** project start
**Context:** Multiple parsers may partially match an archive (e.g. generic + specific vendor).
**Decision:** Run all parsers' `Detect()`, select the one returning the highest score (0-100).
**Consequences:**
- Generic fallback (score 15) only activates when no vendor parser scores higher.
- Parsers must be conservative with high scores (70+) to avoid false positives.
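The selection rule reduces to a max-scan over candidates. This is a sketch of the rule, not the real registry code (names assumed):

```go
package main

import "fmt"

// scored pairs a parser name with its Detect() confidence (0-100).
type scored struct {
	name  string
	score int
}

// pickParser returns the highest-scoring candidate; the generic
// fallback (score 15) only wins when nothing else scores above it.
func pickParser(candidates []scored) scored {
	best := scored{}
	for _, c := range candidates {
		if c.score > best.score {
			best = c
		}
	}
	return best
}

func main() {
	got := pickParser([]scored{
		{"generic", 15},
		{"dell", 85},
	})
	fmt.Println(got.name) // dell: the specific vendor parser outscores generic
}
```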
---
## ADL-004 — Canonical hardware.devices as single source of truth
**Date:** v1.5.0
**Context:** UI tabs and Reanimator exporter were reading from different sub-fields of
`AnalysisResult`, causing potential drift.
**Decision:** Introduce `hardware.devices` as the canonical inventory repository.
All UI tabs and all exporters must read exclusively from this repository.
**Consequences:**
- Any UI vs Reanimator discrepancy is classified as a bug, not a "known difference".
- Deduplication logic runs once in the repository builder (serial → bdf → distinct).
- New hardware attributes must be added to canonical schema first, then mapped to consumers.
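The serial → bdf → distinct dedup order can be sketched as a key function. Field names and the sample data below are illustrative; the real logic lives in the repository builder:

```go
package main

import "fmt"

type device struct{ Serial, BDF, Model string }

// dedupKey sketches the dedup priority described above: serial number
// first, then PCI BDF, then the record itself as a last resort.
func dedupKey(d device) string {
	switch {
	case d.Serial != "":
		return "sn:" + d.Serial
	case d.BDF != "":
		return "bdf:" + d.BDF
	default:
		return "rec:" + d.Model
	}
}

func main() {
	devs := []device{
		{Serial: "GPU-001", BDF: "0000:17:00.0", Model: "H100"},
		{Serial: "GPU-001", Model: "H100"}, // same GPU reported by a second source
		{BDF: "0000:18:00.0", Model: "NVSwitch"},
	}
	seen := map[string]bool{}
	var unique []device
	for _, d := range devs {
		if k := dedupKey(d); !seen[k] {
			seen[k] = true
			unique = append(unique, d)
		}
	}
	fmt.Println(len(unique)) // 2: the duplicate serial collapses into one record
}
```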
---
## ADL-005 — No hardcoded PCI model strings; use pci.ids
**Date:** v1.5.0
**Context:** NVIDIA and other vendors release new GPU models frequently; hardcoded maps
required code changes for each new model ID.
**Decision:** Use the `pciutils/pciids` database (git submodule, embedded at build time).
PCI vendor/device ID → human-readable model name via lookup.
**Consequences:**
- New GPU models can be supported by updating `pci.ids` without code changes.
- `make build` auto-syncs `pci.ids` from submodule before compilation.
- External override via `LOGPILE_PCI_IDS_PATH` env var.
---
## ADL-006 — Reanimator export uses canonical hardware.devices (not raw sub-fields)
**Date:** v1.5.0
**Context:** Early Reanimator exporter read from `Hardware.GPUs`, `Hardware.NICs`, etc.
directly, diverging from UI data.
**Decision:** Reanimator exporter must use `hardware.devices` — the same source as the UI.
Exporter groups/filters canonical records by section; does not rebuild from sub-fields.
**Consequences:**
- Guarantees UI and export consistency.
- Exporter code is simpler — mainly a filter+map, not a data reconstruction.
---
## ADL-007 — Documentation language is English
**Date:** 2026-02-20
**Context:** Codebase documentation was mixed Russian/English, reducing clarity for
international contributors and AI assistants.
**Decision:** All maintained project documentation (`docs/bible/`, `README.md`,
`CLAUDE.md`, and new technical docs) must be written in English.
**Consequences:**
- Bible is authoritative in English.
- AI assistants get consistent, unambiguous context.
---
## ADL-008 — Bible is the single source of truth for architecture docs
**Date:** 2026-02-23
**Context:** Architecture information was duplicated across `README.md`, `CLAUDE.md`,
and the Bible, creating drift risk and stale guidance for humans and AI agents.
**Decision:** Keep architecture and technical design documentation only in `docs/bible/`.
Top-level `README.md` and `CLAUDE.md` must remain minimal pointers/instructions.
**Consequences:**
- Reduces documentation drift and duplicate updates.
- AI assistants are directed to one authoritative source before making changes.
- Documentation updates that affect architecture must include Bible changes (and ADL entries when significant).
---
## ADL-009 — Redfish analysis is performed from raw snapshot replay (unified tunnel)
**Date:** 2026-02-24
**Context:** Live Redfish collection and raw export re-analysis used different parsing paths,
which caused drift and made bug fixes difficult to validate consistently.
**Decision:** Redfish live collection must produce a `raw_payloads.redfish_tree` snapshot first,
then run the same replay analyzer used for imported raw exports.
**Consequences:**
- Same `redfish_tree` input produces the same parsed result in live and offline modes.
- Debugging parser issues can be done against exported raw bundles without live BMC access.
- Snapshot completeness becomes critical; collector seeds/limits are part of analyzer correctness.
---
## ADL-010 — Raw export is a self-contained re-analysis package (not a final result dump)
**Date:** 2026-02-24
**Context:** Exporting only normalized `AnalysisResult` loses raw source fidelity and prevents
future parser improvements from being applied to already collected data.
**Decision:** `Export Raw Data` produces a self-contained raw package (JSON or ZIP bundle)
that the application can reopen and re-analyze. Parsed data in the package is optional and not
the source of truth on import.
**Consequences:**
- Re-opening an export always re-runs analysis from raw source (`redfish_tree` or uploaded file bytes).
- Raw bundles include collection context and diagnostics for debugging (`collect.log`, `parser_fields.json`).
- Endpoint compatibility is preserved (`/api/export/json`) while actual payload format may be a bundle.
---
## ADL-011 — Redfish snapshot crawler is bounded, prioritized, and failure-tolerant
**Date:** 2026-02-24
**Context:** Full Redfish trees on modern GPU systems are large, noisy, and contain many
vendor-specific or non-fetchable links. Unbounded crawling and naive queue design caused hangs
and incomplete snapshots.
**Decision:** Use a bounded snapshot crawler with:
- explicit document cap (`LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`)
- priority seed paths (PCIe/Fabrics/Firmware/Storage/PowerSubsystem/ThermalSubsystem)
- normalized `@odata.id` paths (strip `#fragment`)
- noisy expected error filtering (404/405/410/501 hidden from UI)
- queue capacity sized to crawl cap to avoid producer/consumer deadlock
**Consequences:**
- Snapshot collection remains stable on large BMC trees.
- Most high-value inventory paths are reached before the cap.
- UI progress remains useful while debug logs retain low-level fetch failures.
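The bounded, normalized crawl can be sketched as a capped BFS. The link map and cap below are toy values; real seeds and limits live in the collector:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeODataID strips the #fragment and any trailing slash so the
// same resource is never fetched twice under different spellings.
func normalizeODataID(p string) string {
	if i := strings.IndexByte(p, '#'); i >= 0 {
		p = p[:i]
	}
	return strings.TrimRight(p, "/")
}

// crawl visits links breadth-first until maxDocs documents are collected.
func crawl(seeds []string, links map[string][]string, maxDocs int) []string {
	seen := map[string]bool{}
	var order []string
	queue := append([]string{}, seeds...)
	for len(queue) > 0 && len(order) < maxDocs {
		p := normalizeODataID(queue[0])
		queue = queue[1:]
		if seen[p] {
			continue
		}
		seen[p] = true
		order = append(order, p)
		queue = append(queue, links[p]...)
	}
	return order
}

func main() {
	links := map[string][]string{
		"/redfish/v1/Chassis": {"/redfish/v1/Chassis/1#frag", "/redfish/v1/Chassis/2"},
	}
	// Cap of 2: the crawl stops before reaching Chassis/2.
	fmt.Println(crawl([]string{"/redfish/v1/Chassis"}, links, 2))
}
```

The real crawler additionally orders the queue by priority seeds and filters expected fetch errors, which this sketch omits.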
---
## ADL-012 — Vendor-specific storage inventory probing is allowed as fallback
**Date:** 2026-02-24
**Context:** Some Supermicro BMCs expose empty standard `Storage/.../Drives` collections while
real disk inventory exists under vendor-specific `Disk.Bay` endpoints and enclosure links.
**Decision:** When standard drive collections are empty, collector/replay may probe vendor-style
`.../Drives/Disk.Bay.*` endpoints and follow `Storage.Links.Enclosures[*]` to recover physical drives.
**Consequences:**
- Higher storage inventory coverage on Supermicro HBA/HA-RAID/MRVL/NVMe backplane implementations.
- Replay must mirror the same probing behavior to preserve deterministic results.
- Probing remains bounded (finite candidate set) to avoid runaway requests.
---
## ADL-013 — PowerSubsystem is preferred over legacy Power on newer Redfish implementations
**Date:** 2026-02-24
**Context:** X14+/newer Redfish implementations increasingly expose authoritative PSU data in
`PowerSubsystem/PowerSupplies`, while legacy `/Power` may be incomplete or schema-shifted.
**Decision:** Prefer `Chassis/*/PowerSubsystem/PowerSupplies` as the primary PSU source and use
legacy `Chassis/*/Power` as fallback.
**Consequences:**
- Better compatibility with newer BMC firmware generations.
- Legacy systems remain supported without special-case collector selection.
- Snapshot priority seeds must include `PowerSubsystem` resources.
---
## ADL-014 — Threshold logic lives on the server; UI reflects status only
**Date:** 2026-02-24
**Context:** Duplicating threshold math in frontend and backend creates drift and inconsistent
highlighting (e.g. PSU mains voltage range checks).
**Decision:** Business threshold evaluation (e.g. PSU voltage nominal range) must be computed on
the server; frontend only renders status/flags returned by the API.
**Consequences:**
- Single source of truth for threshold policies.
- UI can evolve visually without re-implementing domain logic.
- API payloads may carry richer status semantics over time.
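A server-side threshold check might look like the sketch below. The 180-264 V mains range is an assumed example for illustration, not the project's actual policy; the point is that the API returns a status string and the UI only renders it:

```go
package main

import "fmt"

// psuVoltageStatus evaluates a PSU input voltage against an assumed
// nominal mains range (180-264 V is illustrative, not the real policy).
func psuVoltageStatus(inputV float64) string {
	if inputV < 180 || inputV > 264 {
		return "warning"
	}
	return "ok"
}

func main() {
	fmt.Println(psuVoltageStatus(215.25)) // within range
	fmt.Println(psuVoltageStatus(90))     // out of range: flagged server-side
}
```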
---
## ADL-015 — Supermicro crashdump archive parser removed from active registry
**Date:** 2026-03-01
**Context:** The Supermicro crashdump parser (`SMC Crash Dump Parser`) produced low-value
results for current workflows and was explicitly rejected as a supported archive path.
**Decision:** Remove `supermicro` vendor parser from active registration and project source.
Do not include it in `/api/parsers` output or parser documentation matrix.
**Consequences:**
- Supermicro crashdump archives (`CDump.txt` format) are no longer parsed by a dedicated vendor parser.
- Such archives fall back to other matching parsers (typically `generic`) unless a new replacement parser is added.
- Reintroduction requires a new parser package and an explicit registry import in `vendors/vendors.go`.
---
## ADL-016 — Device-bound firmware must not appear in hardware.firmware
**Date:** 2026-03-01
**Context:** Dell TSR `DCIM_SoftwareIdentity` lists firmware for every component (NICs,
PSUs, disks, backplanes) in addition to system-level firmware. Naively importing all entries
into `Hardware.Firmware` caused device firmware to appear twice in Reanimator: once in the
device's own record and again in the top-level firmware list.
**Decision:**
- `Hardware.Firmware` contains only system-level firmware (BIOS, BMC/iDRAC, CPLD,
Lifecycle Controller, storage controllers, BOSS).
- Device-bound entries (NIC, PSU, Disk, Backplane, GPU) must not be added to
`Hardware.Firmware`.
- Parsers must store the FQDD (or equivalent slot identifier) in `FirmwareInfo.Description`
so the Reanimator exporter can filter by FQDD prefix.
- The exporter's `isDeviceBoundFirmwareFQDD()` function performs this filter.
**Consequences:**
- Any new parser that ingests a per-device firmware inventory must follow the same rule.
- Device firmware is accessible only via the device's own record, not the firmware list.
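The FQDD-prefix filter can be sketched as below. The prefix list is illustrative and not exhaustive; the real `isDeviceBoundFirmwareFQDD()` lives in the exporter:

```go
package main

import (
	"fmt"
	"strings"
)

// deviceBoundPrefixes is an illustrative sample of FQDD prefixes that
// mark device-bound firmware; the real list lives in the exporter.
var deviceBoundPrefixes = []string{"NIC.", "PSU.", "Disk.", "Enclosure.", "GPU."}

func isDeviceBoundFirmwareFQDD(fqdd string) bool {
	for _, p := range deviceBoundPrefixes {
		if strings.HasPrefix(fqdd, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isDeviceBoundFirmwareFQDD("NIC.Slot.1-1-1")) // true: stays on the device record
	fmt.Println(isDeviceBoundFirmwareFQDD("BIOS.Setup.1-1")) // false: system-level, kept in Hardware.Firmware
}
```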
---
## ADL-017 — Vendor-embedded MAC addresses must be stripped from model name fields
**Date:** 2026-03-01
**Context:** Dell TSR embeds MAC addresses directly in `ProductName` and `ElementName`
fields (e.g. `"NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"`).
This caused model names to contain MAC addresses in NIC model, NIC firmware device name,
and potentially other fields.
**Decision:** Strip any ` - XX:XX:XX:XX:XX:XX` suffix from all model/name string fields
at parse time before storing in any model struct. Use the regex
`\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`.
**Consequences:**
- Model names are clean and consistent across all devices.
- All parsers must apply this stripping to any field used as a device name or model.
- Confirmed affected fields in Dell: `DCIM_NICView.ProductName`, `DCIM_SoftwareIdentity.ElementName`.
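Applying the regex from this decision is a one-liner; `stripMACSuffix` is an illustrative helper name:

```go
package main

import (
	"fmt"
	"regexp"
)

// macSuffixRe is the exact pattern from ADL-017: a " - " separator
// followed by a colon-delimited MAC address at the end of the string.
var macSuffixRe = regexp.MustCompile(`\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`)

func stripMACSuffix(s string) string {
	return macSuffixRe.ReplaceAllString(s, "")
}

func main() {
	fmt.Println(stripMACSuffix("NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"))
	// prints "NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF"
}
```

Names without a trailing MAC pass through unchanged, since the pattern is anchored at end-of-string.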
---
<!-- Add new decisions below this line using the format above -->

bible-local/README.md Normal file

@@ -0,0 +1,59 @@
# LOGPile Bible
> **Documentation language:** English only. All maintained project documentation must be written in English.
>
> **Architectural decisions:** Every significant architectural decision **must** be recorded in
> [`10-decisions.md`](10-decisions.md) before or alongside the code change.
>
> **Single source of truth:** Architecture and technical design documentation belongs in `docs/bible/`.
> Keep `README.md` and `CLAUDE.md` minimal to avoid duplicate documentation.
This directory is the single source of truth for LOGPile's architecture, design, and integration contracts.
It is structured so that both humans and AI assistants can navigate it quickly.
---
## Reading Map (Hierarchical)
### 1. Foundations (read first)
| File | What it covers |
|------|----------------|
| [01-overview.md](01-overview.md) | Product purpose, operating modes, scope |
| [02-architecture.md](02-architecture.md) | Runtime structure, control flow, in-memory state |
| [04-data-models.md](04-data-models.md) | Core contracts (`AnalysisResult`, canonical `hardware.devices`) |
### 2. Runtime Interfaces
| File | What it covers |
|------|----------------|
| [03-api.md](03-api.md) | HTTP API contracts and endpoint behavior |
| [05-collectors.md](05-collectors.md) | Live collection connectors (Redfish, IPMI mock) |
| [06-parsers.md](06-parsers.md) | Archive parser framework and vendor parsers |
| [07-exporters.md](07-exporters.md) | CSV / JSON / Reanimator exports and integration mapping |
### 3. Delivery & Quality
| File | What it covers |
|------|----------------|
| [08-build-release.md](08-build-release.md) | Build, packaging, release workflow |
| [09-testing.md](09-testing.md) | Testing expectations and verification guidance |
### 4. Governance (always current)
| File | What it covers |
|------|----------------|
| [10-decisions.md](10-decisions.md) | Architectural Decision Log (ADL) |
---
## Quick orientation for AI assistants
- Read order for most changes: `01` → `02` → `04` → relevant interface doc(s) → `10`
- Entry point: `cmd/logpile/main.go`
- HTTP server: `internal/server/` — handlers in `handlers.go`, routes in `server.go`
- Data contracts: `internal/models/` — never break `AnalysisResult` JSON shape
- Frontend contract: `web/static/js/app.js` — keep API responses stable
- Canonical inventory: `hardware.devices` in `AnalysisResult` — source of truth for UI and exports
- Parser registry: `internal/parser/vendors/` — `init()` auto-registration pattern
- Collector registry: `internal/collector/registry.go`


@@ -40,6 +40,8 @@ func main() {
cfg := server.Config{
Port: *port,
PreloadFile: *file,
AppVersion: version,
AppCommit: commit,
}
srv := server.New(cfg)


@@ -0,0 +1,28 @@
# Test Server Collection Memory
Keep this table updated after each test-server run.
Definition:
- `Collection Time` = total Redfish collection duration from `collect.log`.
- `Speed` = `Documents / seconds`.
- `Metrics Collected` = sum of `Counts` fields (`cpus + memory + storage + pcie + gpus + nics + psus + firmware`).
- `n/a` means the log does not contain enough timestamp metadata to calculate duration/speed.
## Server Model: `NF5688M7`
| Date (UTC) | App Version | Collection Time | Documents | Speed | Metrics Collected | Notes |
|---|---|---:|---:|---:|---:|---|
| 2026-02-28 | `v1.7.1-12-g612058e` (`612058e`) | 10m10s (610s) | 228 | 0.37 docs/s | 98 | 2026-02-28 (SERVER MODEL) - 23E100043.zip |
| 2026-02-28 | `v1.7.1-11-ge0146ad` (`e0146ad`) | 9m36s (576s) | 138 | 0.24 docs/s | 110 | 2026-02-28 (SERVER MODEL) - 23E100042.zip |
| 2026-02-28 | `v1.7.1-10-g9a30705` (`9a30705`) | 20m47s (1247s) | 106 | 0.09 docs/s | 97 | 2026-02-28 (SERVER MODEL) - 23E100053.zip |
| 2026-02-28 | `v1.7.1` (`6c19a58`) | 15m08s (908s) | 184 | 0.20 docs/s | 96 | 2026-02-28 (DDR5 DIMM) - 23E100051.zip |
| 2026-02-28 | `v1.7.0` (`ddab93a`) | n/a | 193 | n/a | 61 | 2026-02-28 (NULL) - 23E100051.zip |
| 2026-02-28 | `v1.7.0` (`ddab93a`) | n/a | 291 | n/a | 61 | 2026-02-28 (NULL) - 23E100206.zip |
## Server Model: `KR1280-X2-A0-R0-00`
| Date (UTC) | App Version | Collection Time | Documents | Speed | Metrics Collected | Notes |
|---|---|---:|---:|---:|---:|---|
| 2026-02-28 | `v1.7.1-12-g612058e` (`612058e`) | 6m15s (375s) | 185 | 0.49 docs/s | 46 | 2026-02-28 (KR1280-X2-A0-R0-00) - 23D401657.zip |
| 2026-02-28 | `v1.7.1-9-g8dbbec3-dirty` (`8dbbec3`) | 6m16s (376s) | 165 | 0.44 docs/s | 46 | 2026-02-28 (KR1280-X2-A0-R0-00) - 23D401657-2.zip |
| 2026-02-28 | `v1.7.1-7-gc52fea2` (`c52fea2`) | 10m51s (651s) | 227 | 0.35 docs/s | 40 | 2026-02-28 (KR1280-X2-A0-R0-00) - 23D401657 copy.zip |

File diff suppressed because it is too large


@@ -0,0 +1,40 @@
package collector
import (
"strings"
"testing"
)
func TestParseNIC_ResolvesModelFromPCIIDs(t *testing.T) {
doc := map[string]interface{}{
"Id": "NIC1",
"VendorId": "0x8086",
"DeviceId": "0x1521",
"Model": "0x1521",
}
nic := parseNIC(doc)
if nic.Model == "" {
t.Fatalf("expected model resolved from pci.ids")
}
if !strings.Contains(strings.ToUpper(nic.Model), "I350") {
t.Fatalf("expected I350 in model, got %q", nic.Model)
}
}
func TestParsePCIeFunction_ResolvesDeviceClassFromPCIIDs(t *testing.T) {
doc := map[string]interface{}{
"Id": "PCIE1",
"VendorId": "0x9005",
"DeviceId": "0x028f",
"ClassCode": "0x010700",
}
dev := parsePCIeFunction(doc, 0)
if dev.DeviceClass == "" || strings.EqualFold(dev.DeviceClass, "PCIe device") {
t.Fatalf("expected device class resolved from pci.ids, got %q", dev.DeviceClass)
}
if strings.HasPrefix(strings.ToLower(strings.TrimSpace(dev.DeviceClass)), "0x") {
t.Fatalf("expected resolved name instead of raw hex, got %q", dev.DeviceClass)
}
}

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -3,9 +3,8 @@ package exporter
import (
"encoding/csv"
"encoding/json"
"fmt"
"io"
"text/tabwriter"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
)
@@ -36,7 +35,7 @@ func (e *Exporter) ExportCSV(w io.Writer) error {
// FRU data
for _, fru := range e.result.FRU {
if fru.SerialNumber == "" {
if !hasUsableSerial(fru.SerialNumber) {
continue
}
name := fru.ProductName
@@ -55,9 +54,36 @@ func (e *Exporter) ExportCSV(w io.Writer) error {
// Hardware data
if e.result.Hardware != nil {
// Board
if hasUsableSerial(e.result.Hardware.BoardInfo.SerialNumber) {
if err := writer.Write([]string{
e.result.Hardware.BoardInfo.ProductName,
strings.TrimSpace(e.result.Hardware.BoardInfo.SerialNumber),
e.result.Hardware.BoardInfo.Manufacturer,
"Board",
}); err != nil {
return err
}
}
// CPUs
for _, cpu := range e.result.Hardware.CPUs {
if !hasUsableSerial(cpu.SerialNumber) {
continue
}
if err := writer.Write([]string{
cpu.Model,
strings.TrimSpace(cpu.SerialNumber),
"",
"CPU",
}); err != nil {
return err
}
}
// Memory
for _, mem := range e.result.Hardware.Memory {
if mem.SerialNumber == "" {
if !hasUsableSerial(mem.SerialNumber) {
continue
}
location := mem.Location
@@ -66,7 +92,7 @@ func (e *Exporter) ExportCSV(w io.Writer) error {
}
if err := writer.Write([]string{
mem.PartNumber,
mem.SerialNumber,
strings.TrimSpace(mem.SerialNumber),
mem.Manufacturer,
location,
}); err != nil {
@@ -76,12 +102,12 @@ func (e *Exporter) ExportCSV(w io.Writer) error {
// Storage
for _, stor := range e.result.Hardware.Storage {
if stor.SerialNumber == "" {
if !hasUsableSerial(stor.SerialNumber) {
continue
}
if err := writer.Write([]string{
stor.Model,
stor.SerialNumber,
strings.TrimSpace(stor.SerialNumber),
stor.Manufacturer,
stor.Slot,
}); err != nil {
@@ -89,20 +115,88 @@ func (e *Exporter) ExportCSV(w io.Writer) error {
}
}
// GPUs
for _, gpu := range e.result.Hardware.GPUs {
if !hasUsableSerial(gpu.SerialNumber) {
continue
}
component := gpu.Model
if component == "" {
component = "GPU"
}
if err := writer.Write([]string{
component,
strings.TrimSpace(gpu.SerialNumber),
gpu.Manufacturer,
gpu.Slot,
}); err != nil {
return err
}
}
// PCIe devices
for _, pcie := range e.result.Hardware.PCIeDevices {
if pcie.SerialNumber == "" {
if !hasUsableSerial(pcie.SerialNumber) {
continue
}
if err := writer.Write([]string{
pcie.DeviceClass,
pcie.SerialNumber,
strings.TrimSpace(pcie.SerialNumber),
pcie.Manufacturer,
pcie.Slot,
}); err != nil {
return err
}
}
// Network adapters
for _, nic := range e.result.Hardware.NetworkAdapters {
if !hasUsableSerial(nic.SerialNumber) {
continue
}
location := nic.Location
if location == "" {
location = nic.Slot
}
if err := writer.Write([]string{
nic.Model,
strings.TrimSpace(nic.SerialNumber),
nic.Vendor,
location,
}); err != nil {
return err
}
}
// Legacy network cards
for _, nic := range e.result.Hardware.NetworkCards {
if !hasUsableSerial(nic.SerialNumber) {
continue
}
if err := writer.Write([]string{
nic.Model,
strings.TrimSpace(nic.SerialNumber),
"",
"Network",
}); err != nil {
return err
}
}
// Power supplies
for _, psu := range e.result.Hardware.PowerSupply {
if !hasUsableSerial(psu.SerialNumber) {
continue
}
if err := writer.Write([]string{
psu.Model,
strings.TrimSpace(psu.SerialNumber),
psu.Vendor,
psu.Slot,
}); err != nil {
return err
}
}
}
return nil
@@ -115,220 +209,15 @@ func (e *Exporter) ExportJSON(w io.Writer) error {
return encoder.Encode(e.result)
}
// ExportTXT exports a human-readable text report
func (e *Exporter) ExportTXT(w io.Writer) error {
fmt.Fprintln(w, "LOGPile Analysis Report - mchus.pro")
fmt.Fprintln(w, "====================================")
fmt.Fprintln(w)
if e.result == nil {
fmt.Fprintln(w, "No data loaded.")
return nil
func hasUsableSerial(serial string) bool {
s := strings.TrimSpace(serial)
if s == "" {
return false
}
fmt.Fprintf(w, "File:\t%s\n", e.result.Filename)
fmt.Fprintf(w, "Source:\t%s\n", e.result.SourceType)
fmt.Fprintf(w, "Protocol:\t%s\n", e.result.Protocol)
fmt.Fprintf(w, "Target:\t%s\n", e.result.TargetHost)
fmt.Fprintln(w)
// Server model and serial number
if e.result.Hardware != nil && e.result.Hardware.BoardInfo.ProductName != "" {
fmt.Fprintf(w, "Server Model:\t%s\n", e.result.Hardware.BoardInfo.ProductName)
fmt.Fprintf(w, "Serial Number:\t%s\n", e.result.Hardware.BoardInfo.SerialNumber)
switch strings.ToUpper(s) {
case "N/A", "NA", "NONE", "NULL", "UNKNOWN", "-":
return false
default:
return true
}
fmt.Fprintln(w)
// Hardware summary
if e.result.Hardware != nil {
hw := e.result.Hardware
// Firmware tab
if len(hw.Firmware) > 0 {
fmt.Fprintln(w, "FIRMWARE VERSIONS")
fmt.Fprintln(w, "-----------------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Component\tVersion\tBuild Time")
for _, fw := range hw.Firmware {
fmt.Fprintf(tw, "%s\t%s\t%s\n", fw.DeviceName, fw.Version, fw.BuildTime)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// CPU tab
if len(hw.CPUs) > 0 {
fmt.Fprintln(w, "PROCESSORS")
fmt.Fprintln(w, "----------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Socket\tModel\tCores\tThreads\tFreq MHz\tTurbo MHz\tTDP W\tPPIN/SN")
for _, cpu := range hw.CPUs {
id := cpu.SerialNumber
if id == "" {
id = cpu.PPIN
}
fmt.Fprintf(tw, "CPU%d\t%s\t%d\t%d\t%d\t%d\t%d\t%s\n",
cpu.Socket, cpu.Model, cpu.Cores, cpu.Threads, cpu.FrequencyMHz, cpu.MaxFreqMHz, cpu.TDP, id)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Memory tab
if len(hw.Memory) > 0 {
fmt.Fprintln(w, "MEMORY")
fmt.Fprintln(w, "------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Slot\tPresent\tSize MB\tType\tSpeed MHz\tVendor\tModel/PN\tSerial\tStatus")
for _, mem := range hw.Memory {
location := mem.Location
if location == "" {
location = mem.Slot
}
fmt.Fprintf(tw, "%s\t%t\t%d\t%s\t%d\t%s\t%s\t%s\t%s\n",
location, mem.Present, mem.SizeMB, mem.Type, mem.CurrentSpeedMHz, mem.Manufacturer, mem.PartNumber, mem.SerialNumber, mem.Status)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Power tab
if len(hw.PowerSupply) > 0 {
fmt.Fprintln(w, "POWER SUPPLIES")
fmt.Fprintln(w, "--------------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Slot\tPresent\tVendor\tModel\tWattage W\tInput W\tOutput W\tInput V\tTemp C\tStatus\tSerial")
for _, psu := range hw.PowerSupply {
fmt.Fprintf(tw, "%s\t%t\t%s\t%s\t%d\t%d\t%d\t%.0f\t%d\t%s\t%s\n",
psu.Slot, psu.Present, psu.Vendor, psu.Model, psu.WattageW, psu.InputPowerW, psu.OutputPowerW, psu.InputVoltage, psu.TemperatureC, psu.Status, psu.SerialNumber)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Storage tab
if len(hw.Storage) > 0 {
fmt.Fprintln(w, "STORAGE")
fmt.Fprintln(w, "-------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Slot\tPresent\tType\tInterface\tModel\tSize GB\tVendor\tFirmware\tSerial")
for _, stor := range hw.Storage {
fmt.Fprintf(tw, "%s\t%t\t%s\t%s\t%s\t%d\t%s\t%s\t%s\n",
stor.Slot, stor.Present, stor.Type, stor.Interface, stor.Model, stor.SizeGB, stor.Manufacturer, stor.Firmware, stor.SerialNumber)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// GPU tab
if len(hw.GPUs) > 0 {
fmt.Fprintln(w, "GPUS")
fmt.Fprintln(w, "----")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Slot\tModel\tVendor\tBDF\tPCIe\tSerial\tStatus")
for _, gpu := range hw.GPUs {
link := fmt.Sprintf("x%d %s", gpu.CurrentLinkWidth, gpu.CurrentLinkSpeed)
if gpu.MaxLinkWidth > 0 || gpu.MaxLinkSpeed != "" {
link = fmt.Sprintf("%s / x%d %s", link, gpu.MaxLinkWidth, gpu.MaxLinkSpeed)
}
fmt.Fprintf(tw, "%s\t%s\t%s\t%s\t%s\t%s\t%s\n",
gpu.Slot, gpu.Model, gpu.Manufacturer, gpu.BDF, link, gpu.SerialNumber, gpu.Status)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Network tab
if len(hw.NetworkAdapters) > 0 {
fmt.Fprintln(w, "NETWORK ADAPTERS")
fmt.Fprintln(w, "----------------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Slot\tLocation\tModel\tVendor\tPorts\tType\tStatus\tSerial")
for _, nic := range hw.NetworkAdapters {
fmt.Fprintf(tw, "%s\t%s\t%s\t%s\t%d\t%s\t%s\t%s\n",
nic.Slot, nic.Location, nic.Model, nic.Vendor, nic.PortCount, nic.PortType, nic.Status, nic.SerialNumber)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Device inventory tab
if len(hw.PCIeDevices) > 0 {
fmt.Fprintln(w, "PCIE DEVICES")
fmt.Fprintln(w, "------------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Slot\tBDF\tClass\tVendor\tVID:DID\tLink\tSerial")
for _, pcie := range hw.PCIeDevices {
fmt.Fprintf(tw, "%s\t%s\t%s\t%s\t%04x:%04x\tx%d %s / x%d %s\t%s\n",
pcie.Slot, pcie.BDF, pcie.DeviceClass, pcie.Manufacturer, pcie.VendorID, pcie.DeviceID,
pcie.LinkWidth, pcie.LinkSpeed, pcie.MaxLinkWidth, pcie.MaxLinkSpeed, pcie.SerialNumber)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
}
// Sensors tab
if len(e.result.Sensors) > 0 {
fmt.Fprintln(w, "SENSOR READINGS")
fmt.Fprintln(w, "---------------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Type\tName\tValue\tUnit\tRaw\tStatus")
for _, s := range e.result.Sensors {
fmt.Fprintf(tw, "%s\t%s\t%.0f\t%s\t%s\t%s\n", s.Type, s.Name, s.Value, s.Unit, s.RawValue, s.Status)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Serials/FRU tab
if len(e.result.FRU) > 0 {
fmt.Fprintln(w, "FRU COMPONENTS")
fmt.Fprintln(w, "--------------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Description\tManufacturer\tProduct\tSerial\tPart Number")
for _, fru := range e.result.FRU {
name := fru.ProductName
if name == "" {
name = fru.Description
}
fmt.Fprintf(tw, "%s\t%s\t%s\t%s\t%s\n", fru.Description, fru.Manufacturer, name, fru.SerialNumber, fru.PartNumber)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Events tab
fmt.Fprintf(w, "EVENTS: %d total\n", len(e.result.Events))
if len(e.result.Events) > 0 {
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Time\tSeverity\tSource\tType\tName\tDescription")
for _, ev := range e.result.Events {
fmt.Fprintf(tw, "%s\t%s\t%s\t%s\t%s\t%s\n",
ev.Timestamp.Format("2006-01-02 15:04:05"), ev.Severity, ev.Source, ev.SensorType, ev.SensorName, ev.Description)
}
_ = tw.Flush()
}
var critical, warning, info int
for _, ev := range e.result.Events {
switch ev.Severity {
case models.SeverityCritical:
critical++
case models.SeverityWarning:
warning++
case models.SeverityInfo:
info++
}
}
fmt.Fprintf(w, " Critical: %d\n", critical)
fmt.Fprintf(w, " Warning: %d\n", warning)
fmt.Fprintf(w, " Info: %d\n", info)
// Footer
fmt.Fprintln(w)
fmt.Fprintln(w, "------------------------------------")
fmt.Fprintln(w, "Generated by LOGPile - mchus.pro")
fmt.Fprintln(w, "https://git.mchus.pro/mchus/logpile")
return nil
}


@@ -0,0 +1,79 @@
package exporter
import (
"bytes"
"encoding/csv"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestExportCSV_IncludesAllComponentTypesWithUsableSerials(t *testing.T) {
result := &models.AnalysisResult{
FRU: []models.FRUInfo{
{ProductName: "FRU Board", SerialNumber: "FRU-001", Manufacturer: "ACME"},
},
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{
ProductName: "X12",
SerialNumber: "BOARD-001",
Manufacturer: "Supermicro",
},
CPUs: []models.CPU{
{Socket: 0, Model: "Xeon", SerialNumber: "CPU-001"},
},
Memory: []models.MemoryDIMM{
{Slot: "DIMM0", PartNumber: "MEM-PN", SerialNumber: "MEM-001", Manufacturer: "Samsung"},
},
Storage: []models.Storage{
{Slot: "U.2-1", Model: "PM9A3", SerialNumber: "SSD-001", Manufacturer: "Samsung"},
},
GPUs: []models.GPU{
{Slot: "GPU1", Model: "H200", SerialNumber: "GPU-001", Manufacturer: "NVIDIA"},
},
PCIeDevices: []models.PCIeDevice{
{Slot: "PCIe1", DeviceClass: "NVSwitch", SerialNumber: "PCIE-001", Manufacturer: "NVIDIA"},
},
NetworkAdapters: []models.NetworkAdapter{
{Slot: "Slot 17", Location: "#CPU0_PCIE4", Model: "I350", SerialNumber: "NIC-001", Vendor: "Intel"},
{Slot: "Slot 18", Model: "skip-na", SerialNumber: "N/A", Vendor: "Intel"},
},
NetworkCards: []models.NIC{
{Model: "Legacy NIC", SerialNumber: "LNIC-001"},
},
PowerSupply: []models.PSU{
{Slot: "PSU0", Model: "GW-CRPS3000LW", SerialNumber: "PSU-001", Vendor: "Great Wall"},
},
},
}
var buf bytes.Buffer
if err := New(result).ExportCSV(&buf); err != nil {
t.Fatalf("ExportCSV failed: %v", err)
}
rows, err := csv.NewReader(bytes.NewReader(buf.Bytes())).ReadAll()
if err != nil {
t.Fatalf("read csv: %v", err)
}
if len(rows) < 2 {
t.Fatalf("expected data rows, got %d", len(rows))
}
serials := make(map[string]bool)
for _, row := range rows[1:] {
if len(row) > 1 {
serials[row[1]] = true
}
}
want := []string{"FRU-001", "BOARD-001", "CPU-001", "MEM-001", "SSD-001", "GPU-001", "PCIE-001", "NIC-001", "LNIC-001", "PSU-001"}
for _, sn := range want {
if !serials[sn] {
t.Fatalf("expected serial %s in csv export", sn)
}
}
if serials["N/A"] {
t.Fatalf("did not expect unusable serial N/A in export")
}
}


@@ -0,0 +1,164 @@
package exporter
import (
"encoding/json"
"os"
"path/filepath"
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
// TestGenerateReanimatorExample generates an example reanimator.json file
// This test is marked as skipped by default - run with: go test -v -run TestGenerateReanimatorExample
func TestGenerateReanimatorExample(t *testing.T) {
t.Skip("Skip by default - run manually to generate example")
// Create realistic test data matching import-example-full.json structure
result := &models.AnalysisResult{
Filename: "redfish://10.10.10.103",
SourceType: "api",
Protocol: "redfish",
TargetHost: "10.10.10.103",
CollectedAt: time.Date(2026, 2, 10, 15, 30, 0, 0, time.UTC),
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{
Manufacturer: "Supermicro",
ProductName: "X12DPG-QT6",
SerialNumber: "21D634101",
PartNumber: "X12DPG-QT6-REV1.01",
UUID: "d7ef2fe5-2fd0-11f0-910a-346f11040868",
},
Firmware: []models.FirmwareInfo{
{DeviceName: "BIOS", Version: "06.08.05"},
{DeviceName: "BMC", Version: "5.17.00"},
{DeviceName: "CPLD", Version: "01.02.03"},
},
CPUs: []models.CPU{
{
Socket: 0,
Model: "INTEL(R) XEON(R) GOLD 6530",
Cores: 32,
Threads: 64,
FrequencyMHz: 2100,
MaxFreqMHz: 4000,
},
{
Socket: 1,
Model: "INTEL(R) XEON(R) GOLD 6530",
Cores: 32,
Threads: 64,
FrequencyMHz: 2100,
MaxFreqMHz: 4000,
},
},
Memory: []models.MemoryDIMM{
{
Slot: "CPU0_C0D0",
Location: "CPU0_C0D0",
Present: true,
SizeMB: 32768,
Type: "DDR5",
MaxSpeedMHz: 4800,
CurrentSpeedMHz: 4800,
Manufacturer: "Hynix",
SerialNumber: "80AD032419E17CEEC1",
PartNumber: "HMCG88AGBRA191N",
Status: "OK",
},
{
Slot: "CPU1_C0D0",
Location: "CPU1_C0D0",
Present: true,
SizeMB: 32768,
Type: "DDR5",
MaxSpeedMHz: 4800,
CurrentSpeedMHz: 4800,
Manufacturer: "Hynix",
SerialNumber: "80AD032419E17D6FBA",
PartNumber: "HMCG88AGBRA191N",
Status: "OK",
},
},
Storage: []models.Storage{
{
Slot: "OB01",
Type: "NVMe",
Model: "INTEL SSDPF2KX076T1",
SizeGB: 7680,
SerialNumber: "BTAX41900GF87P6DGN",
Manufacturer: "Intel",
Firmware: "9CV10510",
Interface: "NVMe",
Present: true,
},
{
Slot: "OB02",
Type: "NVMe",
Model: "INTEL SSDPF2KX076T1",
SizeGB: 7680,
SerialNumber: "BTAX41900BEG7P6DGN",
Manufacturer: "Intel",
Firmware: "9CV10510",
Interface: "NVMe",
Present: true,
},
},
PCIeDevices: []models.PCIeDevice{
{
Slot: "PCIeCard1",
VendorID: 32902,
DeviceID: 2912,
BDF: "0000:18:00.0",
DeviceClass: "MassStorageController",
Manufacturer: "Intel",
PartNumber: "RAID Controller",
SerialNumber: "RAID-001-12345",
LinkWidth: 8,
LinkSpeed: "Gen3",
MaxLinkWidth: 8,
MaxLinkSpeed: "Gen3",
},
},
PowerSupply: []models.PSU{
{
Slot: "0",
Present: true,
Model: "GW-CRPS3000LW",
Vendor: "Great Wall",
WattageW: 3000,
SerialNumber: "2P06C102610",
PartNumber: "V0310C9000000000",
Firmware: "00.03.05",
Status: "OK",
InputType: "ACWideRange",
InputPowerW: 137,
OutputPowerW: 104,
InputVoltage: 215.25,
},
},
},
}
// Convert to Reanimator format
reanimator, err := ConvertToReanimator(result)
if err != nil {
t.Fatalf("ConvertToReanimator failed: %v", err)
}
// Marshal to JSON with indentation
jsonData, err := json.MarshalIndent(reanimator, "", " ")
if err != nil {
t.Fatalf("Failed to marshal JSON: %v", err)
}
// Write to example file
examplePath := filepath.Join("../../example/docs", "export-example-logpile.json")
if err := os.WriteFile(examplePath, jsonData, 0644); err != nil {
t.Fatalf("Failed to write example file: %v", err)
}
t.Logf("Generated example file: %s", examplePath)
t.Logf("JSON length: %d bytes", len(jsonData))
}
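The Xeon entries in the example data above carry no Manufacturer field; the converter infers it from the model string (see TestInferCPUManufacturer further down). A hedged sketch of that inference — the name `vendorFromModel` and the exact match order are assumptions, not the package's actual `inferCPUManufacturer`:

```go
package main

import (
	"fmt"
	"strings"
)

// vendorFromModel guesses a CPU vendor from its model string when the BMC
// omits the manufacturer. Sketch only; the real matcher may differ.
func vendorFromModel(model string) string {
	m := strings.ToUpper(model)
	switch {
	case strings.Contains(m, "INTEL") || strings.Contains(m, "XEON"):
		return "Intel"
	case strings.Contains(m, "AMD") || strings.Contains(m, "EPYC") || strings.Contains(m, "RYZEN"):
		return "AMD"
	case strings.Contains(m, "AMPERE"):
		return "Ampere"
	case strings.Contains(m, "ARM") || strings.Contains(m, "CORTEX"):
		return "ARM"
	}
	return "" // unrecognized models stay vendorless rather than guessing
}

func main() {
	fmt.Println(vendorFromModel("INTEL(R) XEON(R) GOLD 6530")) // Intel
	fmt.Println(vendorFromModel("AMD EPYC 7763"))              // AMD
}
```

Checking "AMPERE" before "ARM" matters: vendor substrings overlap in practice, so more specific names must win.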

File diff suppressed because it is too large


@@ -0,0 +1,883 @@
package exporter
import (
"encoding/json"
"strings"
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestConvertToReanimator(t *testing.T) {
tests := []struct {
name string
input *models.AnalysisResult
wantErr bool
errMsg string
}{
{
name: "nil result",
input: nil,
wantErr: true,
errMsg: "no data available",
},
{
name: "no hardware",
input: &models.AnalysisResult{
Filename: "test.json",
},
wantErr: true,
errMsg: "no hardware data available",
},
{
name: "no board serial",
input: &models.AnalysisResult{
Filename: "test.json",
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{},
},
},
wantErr: true,
errMsg: "board serial_number is required",
},
{
name: "valid minimal data",
input: &models.AnalysisResult{
Filename: "test.json",
SourceType: "api",
Protocol: "redfish",
TargetHost: "10.10.10.10",
CollectedAt: time.Date(2026, 2, 10, 15, 30, 0, 0, time.UTC),
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{
Manufacturer: "Supermicro",
ProductName: "X12DPG-QT6",
SerialNumber: "TEST123",
},
},
},
wantErr: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result, err := ConvertToReanimator(tt.input)
if tt.wantErr {
if err == nil {
t.Errorf("expected error containing %q, got nil", tt.errMsg)
}
return
}
if err != nil {
t.Errorf("unexpected error: %v", err)
return
}
if result == nil {
t.Error("expected non-nil result")
return
}
if result.Hardware.Board.SerialNumber != tt.input.Hardware.BoardInfo.SerialNumber {
t.Errorf("board serial mismatch: got %q, want %q",
result.Hardware.Board.SerialNumber,
tt.input.Hardware.BoardInfo.SerialNumber)
}
})
}
}
func TestInferCPUManufacturer(t *testing.T) {
tests := []struct {
model string
want string
}{
{"INTEL(R) XEON(R) GOLD 6530", "Intel"},
{"Intel Core i9-12900K", "Intel"},
{"AMD EPYC 7763", "AMD"},
{"AMD Ryzen 9 5950X", "AMD"},
{"ARM Cortex-A78", "ARM"},
{"Ampere Altra Max", "Ampere"},
{"Unknown CPU Model", ""},
}
for _, tt := range tests {
t.Run(tt.model, func(t *testing.T) {
got := inferCPUManufacturer(tt.model)
if got != tt.want {
t.Errorf("inferCPUManufacturer(%q) = %q, want %q", tt.model, got, tt.want)
}
})
}
}
func TestNormalizedSerial(t *testing.T) {
tests := []struct {
name string
in string
want string
}{
{
name: "empty",
in: "",
want: "",
},
{
name: "n_a",
in: "N/A",
want: "",
},
{
name: "unknown",
in: "unknown",
want: "",
},
{
name: "normal",
in: "SN123",
want: "SN123",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := normalizedSerial(tt.in)
if got != tt.want {
t.Errorf("normalizedSerial() = %q, want %q", got, tt.want)
}
})
}
}
func TestInferStorageStatus(t *testing.T) {
tests := []struct {
name string
stor models.Storage
want string
}{
{
name: "present",
stor: models.Storage{
Present: true,
},
want: "Unknown",
},
{
name: "not present",
stor: models.Storage{
Present: false,
},
want: "Unknown",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := inferStorageStatus(tt.stor)
if got != tt.want {
t.Errorf("inferStorageStatus() = %q, want %q", got, tt.want)
}
})
}
}
func TestNormalizeStatus_PassFail(t *testing.T) {
if got := normalizeStatus("PASS", false); got != "OK" {
t.Fatalf("expected PASS -> OK, got %q", got)
}
if got := normalizeStatus("FAIL", false); got != "Critical" {
t.Fatalf("expected FAIL -> Critical, got %q", got)
}
}
func TestConvertCPUs(t *testing.T) {
cpus := []models.CPU{
{
Socket: 0,
Model: "INTEL(R) XEON(R) GOLD 6530",
Cores: 32,
Threads: 64,
FrequencyMHz: 2100,
MaxFreqMHz: 4000,
},
{
Socket: 1,
Model: "AMD EPYC 7763",
Cores: 64,
Threads: 128,
FrequencyMHz: 2450,
MaxFreqMHz: 3500,
},
}
result := convertCPUs(cpus, "2026-02-10T15:30:00Z")
if len(result) != 2 {
t.Fatalf("expected 2 CPUs, got %d", len(result))
}
if result[0].Manufacturer != "Intel" {
t.Errorf("expected Intel manufacturer for first CPU, got %q", result[0].Manufacturer)
}
if result[1].Manufacturer != "AMD" {
t.Errorf("expected AMD manufacturer for second CPU, got %q", result[1].Manufacturer)
}
if result[0].Status != "Unknown" {
t.Errorf("expected Unknown status, got %q", result[0].Status)
}
}
func TestConvertMemory(t *testing.T) {
memory := []models.MemoryDIMM{
{
Slot: "CPU0_C0D0",
Present: true,
SizeMB: 32768,
Type: "DDR5",
SerialNumber: "TEST-MEM-001",
Status: "OK",
},
{
Slot: "CPU0_C1D0",
Present: false,
},
}
result := convertMemory(memory, "2026-02-10T15:30:00Z")
if len(result) != 2 {
t.Fatalf("expected 2 memory modules, got %d", len(result))
}
if result[0].Status != "OK" {
t.Errorf("expected OK status for first module, got %q", result[0].Status)
}
if result[1].Status != "Empty" {
t.Errorf("expected Empty status for second module, got %q", result[1].Status)
}
}
func TestConvertStorage(t *testing.T) {
storage := []models.Storage{
{
Slot: "OB01",
Type: "NVMe",
Model: "INTEL SSDPF2KX076T1",
SerialNumber: "BTAX41900GF87P6DGN",
Present: true,
},
{
Slot: "OB02",
Type: "NVMe",
Model: "INTEL SSDPF2KX076T1",
SerialNumber: "", // No serial - should be skipped
Present: true,
},
}
result := convertStorage(storage, "2026-02-10T15:30:00Z")
if len(result) != 1 {
t.Fatalf("expected 1 storage device (skipped one without serial), got %d", len(result))
}
if result[0].Status != "Unknown" {
t.Errorf("expected Unknown status, got %q", result[0].Status)
}
}
func TestConvertPCIeDevices(t *testing.T) {
hw := &models.HardwareConfig{
PCIeDevices: []models.PCIeDevice{
{
Slot: "PCIeCard1",
VendorID: 32902,
DeviceID: 2912,
BDF: "0000:18:00.0",
DeviceClass: "MassStorageController",
Manufacturer: "Intel",
PartNumber: "RSP3DD080F",
SerialNumber: "RAID-001",
},
{
Slot: "PCIeCard2",
DeviceClass: "NetworkController",
Manufacturer: "Mellanox",
SerialNumber: "", // Missing on purpose; must stay empty (no auto-generation)
},
},
GPUs: []models.GPU{
{
Slot: "GPU1",
Model: "NVIDIA A100",
Manufacturer: "NVIDIA",
SerialNumber: "GPU-001",
Status: "OK",
},
},
NetworkAdapters: []models.NetworkAdapter{
{
Slot: "NIC1",
Model: "ConnectX-6",
Vendor: "Mellanox",
Present: true,
SerialNumber: "NIC-001",
},
},
}
result := convertPCIeDevices(hw, "2026-02-10T15:30:00Z")
// Should have: 2 PCIe devices + 1 GPU + 1 NIC = 4 total
if len(result) != 4 {
t.Fatalf("expected 4 PCIe devices total, got %d", len(result))
}
// Check that serial is empty for second PCIe device (no auto-generation)
if result[1].SerialNumber != "" {
t.Errorf("expected empty serial for missing device serial, got %q", result[1].SerialNumber)
}
// Check GPU was included
foundGPU := false
for _, dev := range result {
if dev.SerialNumber == "GPU-001" {
foundGPU = true
if dev.DeviceClass != "DisplayController" {
t.Errorf("expected GPU device_class DisplayController, got %q", dev.DeviceClass)
}
break
}
}
if !foundGPU {
t.Error("expected GPU to be included in PCIe devices")
}
}
func TestConvertPCIeDevices_NVSwitchWithoutSerialRemainsEmpty(t *testing.T) {
hw := &models.HardwareConfig{
Firmware: []models.FirmwareInfo{
{
DeviceName: "NVSwitch NVSWITCH1 (965-25612-0002-000)",
Version: "96.10.6D.00.01",
},
},
PCIeDevices: []models.PCIeDevice{
{
Slot: "NVSWITCH1",
DeviceClass: "NVSwitch",
BDF: "0000:06:00.0",
// SerialNumber empty on purpose; should remain empty.
},
},
}
result := convertPCIeDevices(hw, "2026-02-10T15:30:00Z")
if len(result) != 1 {
t.Fatalf("expected 1 PCIe device, got %d", len(result))
}
if result[0].SerialNumber != "" {
t.Fatalf("expected empty NVSwitch serial, got %q", result[0].SerialNumber)
}
if result[0].Firmware != "96.10.6D.00.01" {
t.Fatalf("expected NVSwitch firmware 96.10.6D.00.01, got %q", result[0].Firmware)
}
}
func TestConvertPCIeDevices_SkipsDisplayControllerDuplicates(t *testing.T) {
hw := &models.HardwareConfig{
PCIeDevices: []models.PCIeDevice{
{
Slot: "#GPU0",
DeviceClass: "3D Controller",
},
},
GPUs: []models.GPU{
{
Slot: "#GPU0",
Model: "B200 180GB HBM3e",
Manufacturer: "NVIDIA",
SerialNumber: "1655024043371",
Status: "OK",
},
},
}
result := convertPCIeDevices(hw, "2026-02-10T15:30:00Z")
if len(result) != 1 {
t.Fatalf("expected only dedicated GPU record without duplicate display PCIe, got %d", len(result))
}
if result[0].DeviceClass != "DisplayController" {
t.Fatalf("expected GPU record with DisplayController class, got %q", result[0].DeviceClass)
}
if result[0].Status != "OK" {
t.Fatalf("expected GPU status OK, got %q", result[0].Status)
}
}
func TestConvertPCIeDevices_MapsGPUStatusHistory(t *testing.T) {
hw := &models.HardwareConfig{
GPUs: []models.GPU{
{
Slot: "#GPU6",
Model: "B200 180GB HBM3e",
Manufacturer: "NVIDIA",
SerialNumber: "1655024043204",
Status: "Critical",
StatusHistory: []models.StatusHistoryEntry{
{
Status: "Critical",
ChangedAt: time.Date(2026, 1, 12, 15, 5, 18, 0, time.UTC),
Details: "BIOS miss F_GPU6",
},
},
ErrorDescription: "BIOS miss F_GPU6",
},
},
}
result := convertPCIeDevices(hw, "2026-02-10T15:30:00Z")
if len(result) != 1 {
t.Fatalf("expected 1 converted GPU, got %d", len(result))
}
if len(result[0].StatusHistory) != 1 {
t.Fatalf("expected 1 history entry, got %d", len(result[0].StatusHistory))
}
if result[0].StatusHistory[0].ChangedAt != "2026-01-12T15:05:18Z" {
t.Fatalf("unexpected history changed_at: %q", result[0].StatusHistory[0].ChangedAt)
}
if result[0].StatusAtCollect == nil || result[0].StatusAtCollect.At != "2026-02-10T15:30:00Z" {
t.Fatalf("expected status_at_collection to be populated from collected_at")
}
}
func TestConvertPowerSupplies(t *testing.T) {
psus := []models.PSU{
{
Slot: "0",
Present: true,
Model: "GW-CRPS3000LW",
Vendor: "Great Wall",
WattageW: 3000,
SerialNumber: "PSU-001",
Status: "OK",
},
{
Slot: "1",
Present: false,
SerialNumber: "", // Not present, should be skipped
},
}
result := convertPowerSupplies(psus, "2026-02-10T15:30:00Z")
if len(result) != 1 {
t.Fatalf("expected 1 PSU (skipped empty), got %d", len(result))
}
if result[0].Status != "OK" {
t.Errorf("expected OK status, got %q", result[0].Status)
}
}
func TestConvertBoardNormalizesNULL(t *testing.T) {
board := convertBoard(models.BoardInfo{
Manufacturer: " NULL ",
ProductName: "null",
SerialNumber: "TEST123",
})
if board.Manufacturer != "" {
t.Fatalf("expected empty manufacturer, got %q", board.Manufacturer)
}
if board.ProductName != "" {
t.Fatalf("expected empty product_name, got %q", board.ProductName)
}
}
func TestSourceTypeOmittedWhenInvalidOrEmpty(t *testing.T) {
result, err := ConvertToReanimator(&models.AnalysisResult{
Filename: "redfish://10.0.0.1",
SourceType: "archive",
TargetHost: "10.0.0.1",
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{SerialNumber: "TEST123"},
},
})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
payload, err := json.Marshal(result)
if err != nil {
t.Fatalf("marshal failed: %v", err)
}
if strings.Contains(string(payload), `"source_type"`) {
t.Fatalf("expected source_type to be omitted for invalid value, got %s", string(payload))
}
}
func TestTargetHostOmittedWhenUnavailable(t *testing.T) {
result, err := ConvertToReanimator(&models.AnalysisResult{
Filename: "test.json",
SourceType: "api",
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{SerialNumber: "TEST123"},
},
})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
payload, err := json.Marshal(result)
if err != nil {
t.Fatalf("marshal failed: %v", err)
}
if strings.Contains(string(payload), `"target_host"`) {
t.Fatalf("expected target_host to be omitted when unavailable, got %s", string(payload))
}
}
func TestInferTargetHost(t *testing.T) {
tests := []struct {
name string
targetHost string
filename string
want string
}{
{
name: "explicit target host wins",
targetHost: "10.0.0.10",
filename: "redfish://10.0.0.20",
want: "10.0.0.10",
},
{
name: "hostname from URL",
filename: "redfish://10.10.10.103",
want: "10.10.10.103",
},
{
name: "ip extracted from archive name",
filename: "nvidia_bug_report_192.168.12.34.tar.gz",
want: "192.168.12.34",
},
{
name: "no host available",
filename: "test.json",
want: "",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := inferTargetHost(tt.targetHost, tt.filename)
if got != tt.want {
t.Fatalf("inferTargetHost() = %q, want %q", got, tt.want)
}
})
}
}
func TestConvertToReanimator_DeduplicatesAllSections(t *testing.T) {
input := &models.AnalysisResult{
Filename: "dup-test.json",
CollectedAt: time.Date(2026, 2, 10, 15, 30, 0, 0, time.UTC),
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{SerialNumber: "BOARD-001"},
Firmware: []models.FirmwareInfo{
{DeviceName: "BMC", Version: "1.0"},
{DeviceName: "BMC", Version: "1.1"},
},
CPUs: []models.CPU{
{Socket: 0, Model: "CPU-A"},
{Socket: 0, Model: "CPU-A-DUP"},
},
Memory: []models.MemoryDIMM{
{Slot: "DIMM_A1", Present: true, SizeMB: 32768, SerialNumber: "MEM-1", Status: "OK"},
{Slot: "DIMM_A1", Present: true, SizeMB: 32768, SerialNumber: "MEM-1-DUP", Status: "OK"},
},
Storage: []models.Storage{
{Slot: "U.2-1", SerialNumber: "SSD-1", Model: "Disk1", Present: true},
{Slot: "U.2-2", SerialNumber: "SSD-1", Model: "Disk1-dup", Present: true},
},
PCIeDevices: []models.PCIeDevice{
{Slot: "#GPU0", DeviceClass: "3D Controller", BDF: "17:00.0"},
{Slot: "SLOT-NIC1", DeviceClass: "NetworkController", BDF: "18:00.0"},
{Slot: "SLOT-NIC1", DeviceClass: "NetworkController", BDF: "18:00.1"},
},
GPUs: []models.GPU{
{Slot: "#GPU0", Model: "B200 180GB HBM3e", SerialNumber: "GPU-1", Status: "OK"},
},
PowerSupply: []models.PSU{
{Slot: "0", Present: true, SerialNumber: "PSU-1", Status: "OK"},
{Slot: "1", Present: true, SerialNumber: "PSU-1", Status: "OK"},
},
},
}
out, err := ConvertToReanimator(input)
if err != nil {
t.Fatalf("ConvertToReanimator() failed: %v", err)
}
if len(out.Hardware.Firmware) != 1 {
t.Fatalf("expected deduped firmware len=1, got %d", len(out.Hardware.Firmware))
}
if len(out.Hardware.CPUs) != 2 {
t.Fatalf("expected cpus len=2 (no serial/bdf dedupe), got %d", len(out.Hardware.CPUs))
}
if len(out.Hardware.Memory) != 2 {
t.Fatalf("expected memory len=2 (different serials), got %d", len(out.Hardware.Memory))
}
if len(out.Hardware.Storage) != 1 {
t.Fatalf("expected deduped storage len=1, got %d", len(out.Hardware.Storage))
}
if len(out.Hardware.PowerSupplies) != 1 {
t.Fatalf("expected deduped psu len=1, got %d", len(out.Hardware.PowerSupplies))
}
if len(out.Hardware.PCIeDevices) != 4 {
t.Fatalf("expected pcie len=4 with serial->bdf dedupe, got %d", len(out.Hardware.PCIeDevices))
}
gpuCount := 0
for _, dev := range out.Hardware.PCIeDevices {
if dev.Slot == "#GPU0" {
gpuCount++
}
}
if gpuCount != 2 {
t.Fatalf("expected two #GPU0 records (pcie+gpu kinds), got %d", gpuCount)
}
}
func TestConvertToReanimator_StatusFallbackUsesCollectedAt(t *testing.T) {
collectedAt := time.Date(2026, 2, 10, 15, 30, 0, 0, time.UTC)
input := &models.AnalysisResult{
Filename: "status-fallback.json",
CollectedAt: collectedAt,
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{SerialNumber: "BOARD-001"},
Storage: []models.Storage{
{
Slot: "U.2-1",
Model: "PM9A3",
SerialNumber: "SSD-001",
Present: true,
Status: "OK",
},
},
},
}
out, err := ConvertToReanimator(input)
if err != nil {
t.Fatalf("ConvertToReanimator() failed: %v", err)
}
if len(out.Hardware.Storage) != 1 {
t.Fatalf("expected 1 storage entry, got %d", len(out.Hardware.Storage))
}
wantTs := collectedAt.UTC().Format(time.RFC3339)
got := out.Hardware.Storage[0]
if got.StatusCheckedAt != wantTs {
t.Fatalf("expected status_checked_at=%q, got %q", wantTs, got.StatusCheckedAt)
}
if got.StatusAtCollect == nil || got.StatusAtCollect.At != wantTs {
t.Fatalf("expected status_at_collection.at=%q, got %#v", wantTs, got.StatusAtCollect)
}
}
func TestConvertToReanimator_FirmwareExcludesDeviceBoundEntries(t *testing.T) {
input := &models.AnalysisResult{
Filename: "fw-filter-test.json",
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{SerialNumber: "BOARD-001"},
Firmware: []models.FirmwareInfo{
{DeviceName: "BIOS", Version: "1.0.0"},
{DeviceName: "BMC", Version: "2.0.0"},
{DeviceName: "GPU GPUSXM1 (692-2G520-0280-501)", Version: "96.00.D0.00.03"},
{DeviceName: "NVSwitch NVSWITCH0 (965-25612-0002-000)", Version: "96.10.6D.00.01"},
{DeviceName: "NIC #CPU1_PCIE9 (MCX512A-ACAT)", Version: "28.38.1900"},
{DeviceName: "CPU0 Microcode", Version: "0x2b000643"},
},
},
}
out, err := ConvertToReanimator(input)
if err != nil {
t.Fatalf("ConvertToReanimator() failed: %v", err)
}
if len(out.Hardware.Firmware) != 2 {
t.Fatalf("expected only machine-level firmware entries, got %d", len(out.Hardware.Firmware))
}
got := map[string]string{}
for _, fw := range out.Hardware.Firmware {
got[fw.DeviceName] = fw.Version
}
if got["BIOS"] != "1.0.0" {
t.Fatalf("expected BIOS firmware to be kept")
}
if got["BMC"] != "2.0.0" {
t.Fatalf("expected BMC firmware to be kept")
}
if _, exists := got["GPU GPUSXM1 (692-2G520-0280-501)"]; exists {
t.Fatalf("expected GPU firmware to be excluded from hardware.firmware")
}
if _, exists := got["NVSwitch NVSWITCH0 (965-25612-0002-000)"]; exists {
t.Fatalf("expected NVSwitch firmware to be excluded from hardware.firmware")
}
}
func TestConvertToReanimator_UsesCanonicalDevices(t *testing.T) {
input := &models.AnalysisResult{
Filename: "canonical.json",
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{SerialNumber: "BOARD-001"},
Devices: []models.HardwareDevice{
{
Kind: models.DeviceKindCPU,
Slot: "CPU0",
Model: "INTEL(R) XEON(R)",
Cores: 32,
Threads: 64,
FrequencyMHz: 2100,
},
{
Kind: models.DeviceKindStorage,
Slot: "U.2-1",
Model: "Disk1",
SerialNumber: "SSD-1",
Present: boolPtr(true),
},
},
},
}
out, err := ConvertToReanimator(input)
if err != nil {
t.Fatalf("ConvertToReanimator() failed: %v", err)
}
if len(out.Hardware.CPUs) != 1 {
t.Fatalf("expected cpu from hardware.devices, got %d", len(out.Hardware.CPUs))
}
if len(out.Hardware.Storage) != 1 {
t.Fatalf("expected storage from hardware.devices, got %d", len(out.Hardware.Storage))
}
}
func TestConvertToReanimator_BindsDeviceVitals(t *testing.T) {
input := &models.AnalysisResult{
Filename: "vitals.json",
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{SerialNumber: "BOARD-001"},
Devices: []models.HardwareDevice{
{
Kind: models.DeviceKindGPU,
Slot: "#GPU0",
Model: "B200 180GB HBM3e",
SerialNumber: "GPU-001",
BDF: "0000:17:00.0",
Details: map[string]any{
"temperature": 71,
"power": 350,
"voltage": 12.2,
},
},
{
Kind: models.DeviceKindPSU,
Slot: "PSU0",
SerialNumber: "PSU-001",
Present: boolPtr(true),
InputPowerW: 1400,
OutputPowerW: 1300,
InputVoltage: 229.5,
TemperatureC: 44,
},
},
},
}
out, err := ConvertToReanimator(input)
if err != nil {
t.Fatalf("ConvertToReanimator() failed: %v", err)
}
if len(out.Hardware.PCIeDevices) != 1 {
t.Fatalf("expected one pcie device, got %d", len(out.Hardware.PCIeDevices))
}
pcie := out.Hardware.PCIeDevices[0]
if pcie.TemperatureC != 71 {
t.Fatalf("expected GPU temperature 71C, got %d", pcie.TemperatureC)
}
if pcie.PowerW != 350 {
t.Fatalf("expected GPU power 350W, got %d", pcie.PowerW)
}
if pcie.VoltageV != 12.2 {
t.Fatalf("expected device voltage 12.2V, got %.2f", pcie.VoltageV)
}
if len(out.Hardware.PowerSupplies) != 1 {
t.Fatalf("expected one PSU, got %d", len(out.Hardware.PowerSupplies))
}
psu := out.Hardware.PowerSupplies[0]
if psu.TemperatureC != 44 {
t.Fatalf("expected PSU temperature 44C, got %d", psu.TemperatureC)
}
}
func TestConvertToReanimator_PreservesVitalsAcrossCanonicalDedup(t *testing.T) {
input := &models.AnalysisResult{
Filename: "dedup-vitals.json",
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{SerialNumber: "BOARD-001"},
PCIeDevices: []models.PCIeDevice{
{
Slot: "#GPU0",
BDF: "0000:17:00.0",
DeviceClass: "3D Controller",
PartNumber: "Generic Display",
Manufacturer: "NVIDIA",
SerialNumber: "GPU-SN-001",
},
},
GPUs: []models.GPU{
{
Slot: "#GPU0",
BDF: "0000:17:00.0",
Model: "B200 180GB HBM3e",
Manufacturer: "NVIDIA",
SerialNumber: "GPU-SN-001",
Temperature: 67,
Power: 330,
Status: "OK",
},
},
},
}
out, err := ConvertToReanimator(input)
if err != nil {
t.Fatalf("ConvertToReanimator() failed: %v", err)
}
if len(out.Hardware.PCIeDevices) != 1 {
t.Fatalf("expected deduped one pcie entry, got %d", len(out.Hardware.PCIeDevices))
}
got := out.Hardware.PCIeDevices[0]
if got.TemperatureC != 67 {
t.Fatalf("expected deduped GPU temperature 67C, got %d", got.TemperatureC)
}
if got.PowerW != 330 {
t.Fatalf("expected deduped GPU power 330W, got %d", got.PowerW)
}
}
func boolPtr(v bool) *bool { return &v }
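The status rules these tests pin down (PASS maps to OK, FAIL to Critical, absent devices to Empty, present-but-unreported to Unknown) can be sketched as a single normalizer. Names and the exact set of recognized vendor spellings are assumptions; the package's actual `normalizeStatus` may accept more:

```go
package main

import (
	"fmt"
	"strings"
)

// normStatus folds vendor status strings into the four values the tests
// above assert on. Hedged sketch of the converter's status handling.
func normStatus(raw string, present bool) string {
	switch strings.ToUpper(strings.TrimSpace(raw)) {
	case "OK", "PASS", "GOOD":
		return "OK"
	case "FAIL", "FAILED", "CRITICAL":
		return "Critical"
	case "":
		if !present {
			return "Empty" // e.g. an unpopulated DIMM slot
		}
		return "Unknown" // present, but the BMC reported no status
	}
	return raw // pass through values already in canonical form
}

func main() {
	fmt.Println(normStatus("PASS", true)) // OK
	fmt.Println(normStatus("FAIL", true)) // Critical
	fmt.Println(normStatus("", false))    // Empty
	fmt.Println(normStatus("", true))     // Unknown
}
```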


@@ -0,0 +1,289 @@
package exporter
import (
"encoding/json"
"strings"
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
// TestFullReanimatorExport tests complete export with realistic data
func TestFullReanimatorExport(t *testing.T) {
// Create a realistic AnalysisResult similar to import-example-full.json
result := &models.AnalysisResult{
Filename: "redfish://10.10.10.103",
SourceType: "api",
Protocol: "redfish",
TargetHost: "10.10.10.103",
CollectedAt: time.Date(2026, 2, 10, 15, 30, 0, 0, time.UTC),
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{
Manufacturer: "Supermicro",
ProductName: "X12DPG-QT6",
SerialNumber: "21D634101",
PartNumber: "X12DPG-QT6-REV1.01",
UUID: "d7ef2fe5-2fd0-11f0-910a-346f11040868",
},
Firmware: []models.FirmwareInfo{
{DeviceName: "BIOS", Version: "06.08.05"},
{DeviceName: "BMC", Version: "5.17.00"},
{DeviceName: "CPLD", Version: "01.02.03"},
},
CPUs: []models.CPU{
{
Socket: 0,
Model: "INTEL(R) XEON(R) GOLD 6530",
Cores: 32,
Threads: 64,
FrequencyMHz: 2100,
MaxFreqMHz: 4000,
},
{
Socket: 1,
Model: "INTEL(R) XEON(R) GOLD 6530",
Cores: 32,
Threads: 64,
FrequencyMHz: 2100,
MaxFreqMHz: 4000,
},
},
Memory: []models.MemoryDIMM{
{
Slot: "CPU0_C0D0",
Location: "CPU0_C0D0",
Present: true,
SizeMB: 32768,
Type: "DDR5",
MaxSpeedMHz: 4800,
CurrentSpeedMHz: 4800,
Manufacturer: "Hynix",
SerialNumber: "80AD032419E17CEEC1",
PartNumber: "HMCG88AGBRA191N",
Status: "OK",
},
{
Slot: "CPU0_C1D0",
Location: "CPU0_C1D0",
Present: false,
SizeMB: 0,
Type: "",
MaxSpeedMHz: 0,
CurrentSpeedMHz: 0,
Status: "Empty",
},
},
Storage: []models.Storage{
{
Slot: "OB01",
Type: "NVMe",
Model: "INTEL SSDPF2KX076T1",
SizeGB: 7680,
SerialNumber: "BTAX41900GF87P6DGN",
Manufacturer: "Intel",
Firmware: "9CV10510",
Interface: "NVMe",
Present: true,
},
{
Slot: "FP00HDD00",
Type: "HDD",
Model: "ST12000NM0008",
SizeGB: 12000,
SerialNumber: "ZJV01234ABC",
Manufacturer: "Seagate",
Firmware: "SN03",
Interface: "SATA",
Present: true,
},
},
PCIeDevices: []models.PCIeDevice{
{
Slot: "PCIeCard1",
VendorID: 32902,
DeviceID: 2912,
BDF: "0000:18:00.0",
DeviceClass: "MassStorageController",
Manufacturer: "Intel",
PartNumber: "RAID Controller RSP3DD080F",
LinkWidth: 8,
LinkSpeed: "Gen3",
MaxLinkWidth: 8,
MaxLinkSpeed: "Gen3",
SerialNumber: "RAID-001-12345",
},
{
Slot: "PCIeCard2",
VendorID: 5555,
DeviceID: 4401,
BDF: "0000:3b:00.0",
DeviceClass: "NetworkController",
Manufacturer: "Mellanox",
PartNumber: "ConnectX-5",
LinkWidth: 16,
LinkSpeed: "Gen3",
MaxLinkWidth: 16,
MaxLinkSpeed: "Gen3",
SerialNumber: "MT2892012345",
},
},
PowerSupply: []models.PSU{
{
Slot: "0",
Present: true,
Model: "GW-CRPS3000LW",
Vendor: "Great Wall",
WattageW: 3000,
SerialNumber: "2P06C102610",
PartNumber: "V0310C9000000000",
Firmware: "00.03.05",
Status: "OK",
InputType: "ACWideRange",
InputPowerW: 137,
OutputPowerW: 104,
InputVoltage: 215.25,
},
},
},
}
// Convert to Reanimator format
reanimator, err := ConvertToReanimator(result)
if err != nil {
t.Fatalf("ConvertToReanimator failed: %v", err)
}
// Verify top-level fields
if reanimator.Filename != "redfish://10.10.10.103" {
t.Errorf("Filename mismatch: got %q", reanimator.Filename)
}
if reanimator.SourceType != "api" {
t.Errorf("SourceType mismatch: got %q", reanimator.SourceType)
}
if reanimator.Protocol != "redfish" {
t.Errorf("Protocol mismatch: got %q", reanimator.Protocol)
}
if reanimator.TargetHost != "10.10.10.103" {
t.Errorf("TargetHost mismatch: got %q", reanimator.TargetHost)
}
if reanimator.CollectedAt != "2026-02-10T15:30:00Z" {
t.Errorf("CollectedAt mismatch: got %q", reanimator.CollectedAt)
}
// Verify hardware sections
hw := reanimator.Hardware
// Board
if hw.Board.SerialNumber != "21D634101" {
t.Errorf("Board serial mismatch: got %q", hw.Board.SerialNumber)
}
// Firmware
if len(hw.Firmware) != 3 {
t.Errorf("Expected 3 firmware entries, got %d", len(hw.Firmware))
}
// CPUs
if len(hw.CPUs) != 2 {
t.Fatalf("Expected 2 CPUs, got %d", len(hw.CPUs))
}
if hw.CPUs[0].Manufacturer != "Intel" {
t.Errorf("CPU manufacturer not inferred: got %q", hw.CPUs[0].Manufacturer)
}
if hw.CPUs[0].Status != "Unknown" {
t.Errorf("CPU status mismatch: got %q", hw.CPUs[0].Status)
}
// Memory (empty slots are excluded)
if len(hw.Memory) != 1 {
t.Errorf("Expected 1 memory entry (installed only), got %d", len(hw.Memory))
}
// Storage
if len(hw.Storage) != 2 {
t.Errorf("Expected 2 storage devices, got %d", len(hw.Storage))
}
if hw.Storage[0].Status != "Unknown" {
t.Errorf("Storage status mismatch: got %q", hw.Storage[0].Status)
}
// PCIe devices
if len(hw.PCIeDevices) != 2 {
t.Errorf("Expected 2 PCIe devices, got %d", len(hw.PCIeDevices))
}
if hw.PCIeDevices[0].Model == "" {
t.Error("PCIe model should be populated from PartNumber")
}
// Power supplies
if len(hw.PowerSupplies) != 1 {
t.Errorf("Expected 1 PSU, got %d", len(hw.PowerSupplies))
}
// Verify JSON marshaling works
jsonData, err := json.MarshalIndent(reanimator, "", " ")
if err != nil {
t.Fatalf("Failed to marshal to JSON: %v", err)
}
// Check that JSON contains expected fields
jsonStr := string(jsonData)
expectedFields := []string{
`"filename"`,
`"source_type"`,
`"protocol"`,
`"target_host"`,
`"collected_at"`,
`"hardware"`,
`"board"`,
`"cpus"`,
`"memory"`,
`"storage"`,
`"pcie_devices"`,
`"power_supplies"`,
`"firmware"`,
}
for _, field := range expectedFields {
if !strings.Contains(jsonStr, field) {
t.Errorf("JSON missing expected field: %s", field)
}
}
// Optional: print JSON for manual inspection (commented out for normal test runs)
// t.Logf("Generated Reanimator JSON:\n%s", string(jsonData))
}
// TestReanimatorExportWithoutTargetHost tests that target_host is inferred from filename
func TestReanimatorExportWithoutTargetHost(t *testing.T) {
result := &models.AnalysisResult{
Filename: "redfish://192.168.1.100",
SourceType: "api",
Protocol: "redfish",
TargetHost: "", // Empty - should be inferred
CollectedAt: time.Now(),
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{
SerialNumber: "TEST123",
},
},
}
reanimator, err := ConvertToReanimator(result)
if err != nil {
t.Fatalf("ConvertToReanimator failed: %v", err)
}
if reanimator.TargetHost != "192.168.1.100" {
t.Errorf("Expected target_host to be inferred from filename, got %q", reanimator.TargetHost)
}
}
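TestInferTargetHost and TestReanimatorExportWithoutTargetHost together fix a precedence: an explicit target host wins, then the host part of a URL-style filename, then the first IPv4 address embedded in an archive name. A sketch of that fallback chain (the name `guessHost` is hypothetical; the package's `inferTargetHost` may validate octets more strictly):

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// ipRe matches the first dotted-quad in a string; it does not range-check
// octets, which the real implementation might.
var ipRe = regexp.MustCompile(`(\d{1,3}\.){3}\d{1,3}`)

// guessHost mirrors the precedence the tests assert on.
func guessHost(targetHost, filename string) string {
	if targetHost != "" {
		return targetHost // explicit host always wins
	}
	if u, err := url.Parse(filename); err == nil && u.Host != "" {
		return u.Host // e.g. "redfish://10.10.10.103"
	}
	return ipRe.FindString(filename) // e.g. IP inside an archive name; "" if none
}

func main() {
	fmt.Println(guessHost("10.0.0.10", "redfish://10.0.0.20"))           // 10.0.0.10
	fmt.Println(guessHost("", "redfish://10.10.10.103"))                 // 10.10.10.103
	fmt.Println(guessHost("", "nvidia_bug_report_192.168.12.34.tar.gz")) // 192.168.12.34
}
```

Returning "" for the no-host case lets the exporter omit `target_host` entirely, matching TestTargetHostOmittedWhenUnavailable.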


@@ -0,0 +1,153 @@
package exporter
// ReanimatorExport represents the top-level structure for Reanimator format export
type ReanimatorExport struct {
Filename string `json:"filename"`
SourceType string `json:"source_type,omitempty"`
Protocol string `json:"protocol,omitempty"`
TargetHost string `json:"target_host,omitempty"`
CollectedAt string `json:"collected_at"` // RFC3339 format
Hardware ReanimatorHardware `json:"hardware"`
}
// ReanimatorHardware contains all hardware components
type ReanimatorHardware struct {
Board ReanimatorBoard `json:"board"`
Firmware []ReanimatorFirmware `json:"firmware,omitempty"`
CPUs []ReanimatorCPU `json:"cpus,omitempty"`
Memory []ReanimatorMemory `json:"memory,omitempty"`
Storage []ReanimatorStorage `json:"storage,omitempty"`
PCIeDevices []ReanimatorPCIe `json:"pcie_devices,omitempty"`
PowerSupplies []ReanimatorPSU `json:"power_supplies,omitempty"`
}
// ReanimatorBoard represents motherboard/server information
type ReanimatorBoard struct {
Manufacturer string `json:"manufacturer,omitempty"`
ProductName string `json:"product_name,omitempty"`
SerialNumber string `json:"serial_number"`
PartNumber string `json:"part_number,omitempty"`
UUID string `json:"uuid,omitempty"`
}
// ReanimatorFirmware represents firmware version information
type ReanimatorFirmware struct {
DeviceName string `json:"device_name"`
Version string `json:"version"`
}
type ReanimatorStatusAtCollection struct {
Status string `json:"status"`
At string `json:"at"`
}
type ReanimatorStatusHistoryEntry struct {
Status string `json:"status"`
ChangedAt string `json:"changed_at"`
Details string `json:"details,omitempty"`
}
// ReanimatorCPU represents processor information
type ReanimatorCPU struct {
Socket int `json:"socket"`
Model string `json:"model"`
Cores int `json:"cores,omitempty"`
Threads int `json:"threads,omitempty"`
FrequencyMHz int `json:"frequency_mhz,omitempty"`
MaxFrequencyMHz int `json:"max_frequency_mhz,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt string `json:"status_checked_at,omitempty"`
StatusChangedAt string `json:"status_changed_at,omitempty"`
StatusAtCollect *ReanimatorStatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// ReanimatorMemory represents a memory module (DIMM)
type ReanimatorMemory struct {
Slot string `json:"slot"`
Location string `json:"location,omitempty"`
Present bool `json:"present"`
SizeMB int `json:"size_mb,omitempty"`
Type string `json:"type,omitempty"`
MaxSpeedMHz int `json:"max_speed_mhz,omitempty"`
CurrentSpeedMHz int `json:"current_speed_mhz,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt string `json:"status_checked_at,omitempty"`
StatusChangedAt string `json:"status_changed_at,omitempty"`
StatusAtCollect *ReanimatorStatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// ReanimatorStorage represents a storage device
type ReanimatorStorage struct {
Slot string `json:"slot"`
Type string `json:"type,omitempty"`
Model string `json:"model"`
SizeGB int `json:"size_gb,omitempty"`
SerialNumber string `json:"serial_number"`
Manufacturer string `json:"manufacturer,omitempty"`
Firmware string `json:"firmware,omitempty"`
Interface string `json:"interface,omitempty"`
Present bool `json:"present"`
Status string `json:"status,omitempty"`
StatusCheckedAt string `json:"status_checked_at,omitempty"`
StatusChangedAt string `json:"status_changed_at,omitempty"`
StatusAtCollect *ReanimatorStatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// ReanimatorPCIe represents a PCIe device
type ReanimatorPCIe struct {
Slot string `json:"slot"`
VendorID int `json:"vendor_id,omitempty"`
DeviceID int `json:"device_id,omitempty"`
BDF string `json:"bdf,omitempty"`
DeviceClass string `json:"device_class,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
Model string `json:"model,omitempty"`
LinkWidth int `json:"link_width,omitempty"`
LinkSpeed string `json:"link_speed,omitempty"`
MaxLinkWidth int `json:"max_link_width,omitempty"`
MaxLinkSpeed string `json:"max_link_speed,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
TemperatureC int `json:"temperature_c,omitempty"`
PowerW int `json:"power_w,omitempty"`
VoltageV float64 `json:"voltage_v,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt string `json:"status_checked_at,omitempty"`
StatusChangedAt string `json:"status_changed_at,omitempty"`
StatusAtCollect *ReanimatorStatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// ReanimatorPSU represents a power supply unit
type ReanimatorPSU struct {
Slot string `json:"slot"`
Present bool `json:"present"`
Model string `json:"model,omitempty"`
Vendor string `json:"vendor,omitempty"`
WattageW int `json:"wattage_w,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
Status string `json:"status,omitempty"`
InputType string `json:"input_type,omitempty"`
InputPowerW int `json:"input_power_w,omitempty"`
OutputPowerW int `json:"output_power_w,omitempty"`
InputVoltage float64 `json:"input_voltage,omitempty"`
TemperatureC int `json:"temperature_c,omitempty"`
StatusCheckedAt string `json:"status_checked_at,omitempty"`
StatusChangedAt string `json:"status_changed_at,omitempty"`
StatusAtCollect *ReanimatorStatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}


@@ -13,6 +13,7 @@ type AnalysisResult struct {
SourceType string `json:"source_type,omitempty"` // archive | api
Protocol string `json:"protocol,omitempty"` // redfish | ipmi
TargetHost string `json:"target_host,omitempty"` // BMC host for live collect
SourceTimezone string `json:"source_timezone,omitempty"` // Source timezone/offset used during collection (e.g. +08:00)
CollectedAt time.Time `json:"collected_at,omitempty"` // Collection/upload timestamp
RawPayloads map[string]any `json:"raw_payloads,omitempty"` // Additional source payloads (e.g. Redfish tree)
Events []Event `json:"events"`
@@ -43,6 +44,19 @@ const (
SeverityInfo Severity = "info"
)
// StatusAtCollection captures component status at a specific timestamp.
type StatusAtCollection struct {
Status string `json:"status"`
At time.Time `json:"at"`
}
// StatusHistoryEntry represents a status transition point.
type StatusHistoryEntry struct {
Status string `json:"status"`
ChangedAt time.Time `json:"changed_at"`
Details string `json:"details,omitempty"`
}
// SensorReading represents a single sensor reading
type SensorReading struct {
Name string `json:"name"`
@@ -71,9 +85,11 @@ type FRUInfo struct {
type HardwareConfig struct {
Firmware []FirmwareInfo `json:"firmware,omitempty"`
BoardInfo BoardInfo `json:"board,omitempty"`
Devices []HardwareDevice `json:"devices,omitempty"`
CPUs []CPU `json:"cpus,omitempty"`
Memory []MemoryDIMM `json:"memory,omitempty"`
Storage []Storage `json:"storage,omitempty"`
Volumes []StorageVolume `json:"volumes,omitempty"`
PCIeDevices []PCIeDevice `json:"pcie_devices,omitempty"`
GPUs []GPU `json:"gpus,omitempty"`
NetworkCards []NIC `json:"network_cards,omitempty"`
@@ -81,27 +97,91 @@ type HardwareConfig struct {
PowerSupply []PSU `json:"power_supplies,omitempty"`
}
const (
DeviceKindBoard = "board"
DeviceKindCPU = "cpu"
DeviceKindMemory = "memory"
DeviceKindStorage = "storage"
DeviceKindPCIe = "pcie"
DeviceKindGPU = "gpu"
DeviceKindNetwork = "network"
DeviceKindPSU = "psu"
)
// HardwareDevice is canonical device inventory used across UI and exports.
type HardwareDevice struct {
ID string `json:"id"`
Kind string `json:"kind"`
Source string `json:"source,omitempty"`
Slot string `json:"slot,omitempty"`
Location string `json:"location,omitempty"`
BDF string `json:"bdf,omitempty"`
DeviceClass string `json:"device_class,omitempty"`
VendorID int `json:"vendor_id,omitempty"`
DeviceID int `json:"device_id,omitempty"`
Model string `json:"model,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
Type string `json:"type,omitempty"`
Interface string `json:"interface,omitempty"`
Present *bool `json:"present,omitempty"`
SizeMB int `json:"size_mb,omitempty"`
SizeGB int `json:"size_gb,omitempty"`
Cores int `json:"cores,omitempty"`
Threads int `json:"threads,omitempty"`
FrequencyMHz int `json:"frequency_mhz,omitempty"`
MaxFreqMHz int `json:"max_frequency_mhz,omitempty"`
PortCount int `json:"port_count,omitempty"`
PortType string `json:"port_type,omitempty"`
MACAddresses []string `json:"mac_addresses,omitempty"`
LinkWidth int `json:"link_width,omitempty"`
LinkSpeed string `json:"link_speed,omitempty"`
MaxLinkWidth int `json:"max_link_width,omitempty"`
MaxLinkSpeed string `json:"max_link_speed,omitempty"`
WattageW int `json:"wattage_w,omitempty"`
InputType string `json:"input_type,omitempty"`
InputPowerW int `json:"input_power_w,omitempty"`
OutputPowerW int `json:"output_power_w,omitempty"`
InputVoltage float64 `json:"input_voltage,omitempty"`
TemperatureC int `json:"temperature_c,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
Details map[string]any `json:"details,omitempty"`
}
// FirmwareInfo represents firmware version information
type FirmwareInfo struct {
DeviceName string `json:"device_name"`
Version string `json:"version"`
BuildTime string `json:"build_time,omitempty"`
DeviceName string `json:"device_name"`
Description string `json:"description,omitempty"`
Version string `json:"version"`
BuildTime string `json:"build_time,omitempty"`
}
// BoardInfo represents motherboard/system information
type BoardInfo struct {
Manufacturer string `json:"manufacturer,omitempty"`
ProductName string `json:"product_name,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Version string `json:"version,omitempty"`
UUID string `json:"uuid,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
ProductName string `json:"product_name,omitempty"`
Description string `json:"description,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Version string `json:"version,omitempty"`
UUID string `json:"uuid,omitempty"`
BMCMACAddress string `json:"bmc_mac_address,omitempty"`
}
// CPU represents processor information
type CPU struct {
Socket int `json:"socket"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
Cores int `json:"cores"`
Threads int `json:"threads"`
FrequencyMHz int `json:"frequency_mhz"`
@@ -112,12 +192,20 @@ type CPU struct {
TDP int `json:"tdp_w,omitempty"`
PPIN string `json:"ppin,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// MemoryDIMM represents a memory module
type MemoryDIMM struct {
Slot string `json:"slot"`
Location string `json:"location"`
Description string `json:"description,omitempty"`
Present bool `json:"present"`
SizeMB int `json:"size_mb"`
Type string `json:"type"`
@@ -129,6 +217,12 @@ type MemoryDIMM struct {
PartNumber string `json:"part_number,omitempty"`
Status string `json:"status,omitempty"`
Ranks int `json:"ranks,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// Storage represents a storage device
@@ -136,6 +230,7 @@ type Storage struct {
Slot string `json:"slot"`
Type string `json:"type"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
SizeGB int `json:"size_gb"`
SerialNumber string `json:"serial_number,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
@@ -144,11 +239,32 @@ type Storage struct {
Present bool `json:"present"`
Location string `json:"location,omitempty"` // Front/Rear
BackplaneID int `json:"backplane_id,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// StorageVolume represents a logical storage volume (RAID/VROC/etc.).
type StorageVolume struct {
ID string `json:"id,omitempty"`
Name string `json:"name,omitempty"`
Controller string `json:"controller,omitempty"`
RAIDLevel string `json:"raid_level,omitempty"`
SizeGB int `json:"size_gb,omitempty"`
CapacityBytes int64 `json:"capacity_bytes,omitempty"`
Status string `json:"status,omitempty"`
Bootable bool `json:"bootable,omitempty"`
Encrypted bool `json:"encrypted,omitempty"`
}
// PCIeDevice represents a PCIe device
type PCIeDevice struct {
Slot string `json:"slot"`
Description string `json:"description,omitempty"`
VendorID int `json:"vendor_id"`
DeviceID int `json:"device_id"`
BDF string `json:"bdf"`
@@ -161,12 +277,20 @@ type PCIeDevice struct {
PartNumber string `json:"part_number,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
MACAddresses []string `json:"mac_addresses,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// NIC represents a network interface card
type NIC struct {
Name string `json:"name"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
MACAddress string `json:"mac_address"`
SpeedMbps int `json:"speed_mbps,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
@@ -177,6 +301,7 @@ type PSU struct {
Slot string `json:"slot"`
Present bool `json:"present"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
Vendor string `json:"vendor,omitempty"`
WattageW int `json:"wattage_w,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
@@ -189,6 +314,12 @@ type PSU struct {
InputVoltage float64 `json:"input_voltage,omitempty"`
OutputVoltage float64 `json:"output_voltage,omitempty"`
TemperatureC int `json:"temperature_c,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// GPU represents a graphics processing unit
@@ -196,6 +327,7 @@ type GPU struct {
Slot string `json:"slot"`
Location string `json:"location,omitempty"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
VendorID int `json:"vendor_id,omitempty"`
DeviceID int `json:"device_id,omitempty"`
@@ -220,6 +352,12 @@ type GPU struct {
CurrentLinkWidth int `json:"current_link_width,omitempty"`
CurrentLinkSpeed string `json:"current_link_speed,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// NetworkAdapter represents a network adapter with detailed info
@@ -228,6 +366,7 @@ type NetworkAdapter struct {
Location string `json:"location"`
Present bool `json:"present"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
Vendor string `json:"vendor,omitempty"`
VendorID int `json:"vendor_id,omitempty"`
DeviceID int `json:"device_id,omitempty"`
@@ -238,4 +377,10 @@ type NetworkAdapter struct {
PortType string `json:"port_type,omitempty"`
MACAddresses []string `json:"mac_addresses,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}


@@ -9,25 +9,45 @@ import (
"io"
"os"
"path/filepath"
"sort"
"strings"
"time"
)
const maxSingleFileSize = 10 * 1024 * 1024
const maxZipArchiveSize = 50 * 1024 * 1024
const maxGzipDecompressedSize = 50 * 1024 * 1024
var supportedArchiveExt = map[string]struct{}{
".gz": {},
".tgz": {},
".tar": {},
".sds": {},
".zip": {},
".txt": {},
".log": {},
}
// ExtractedFile represents a file extracted from an archive
type ExtractedFile struct {
Path string
Content []byte
Path string
Content []byte
ModTime time.Time
Truncated bool
TruncatedMessage string
}
// ExtractArchive extracts a supported archive and returns the extracted file contents
func ExtractArchive(archivePath string) ([]ExtractedFile, error) {
if !IsSupportedArchiveFilename(archivePath) {
return nil, fmt.Errorf("unsupported archive format: %s", strings.ToLower(filepath.Ext(archivePath)))
}
ext := strings.ToLower(filepath.Ext(archivePath))
switch ext {
case ".gz", ".tgz":
return extractTarGz(archivePath)
case ".tar":
case ".tar", ".sds":
return extractTar(archivePath)
case ".zip":
return extractZip(archivePath)
@@ -40,13 +60,18 @@ func ExtractArchive(archivePath string) ([]ExtractedFile, error) {
// ExtractArchiveFromReader extracts archive from reader
func ExtractArchiveFromReader(r io.Reader, filename string) ([]ExtractedFile, error) {
if !IsSupportedArchiveFilename(filename) {
return nil, fmt.Errorf("unsupported archive format: %s", strings.ToLower(filepath.Ext(filename)))
}
ext := strings.ToLower(filepath.Ext(filename))
switch ext {
case ".gz", ".tgz":
return extractTarGzFromReader(r, filename)
case ".tar":
case ".tar", ".sds":
return extractTarFromReader(r)
case ".zip":
return extractZipFromReader(r)
case ".txt", ".log":
return extractSingleFileFromReader(r, filename)
default:
@@ -54,6 +79,27 @@ func ExtractArchiveFromReader(r io.Reader, filename string) ([]ExtractedFile, er
}
}
// IsSupportedArchiveFilename reports whether the filename's extension is supported by the archive extractor.
func IsSupportedArchiveFilename(filename string) bool {
ext := strings.ToLower(strings.TrimSpace(filepath.Ext(filename)))
if ext == "" {
return false
}
_, ok := supportedArchiveExt[ext]
return ok
}
// SupportedArchiveExtensions returns a sorted list of the archive/file extensions
// accepted by the archive extractor.
func SupportedArchiveExtensions() []string {
out := make([]string, 0, len(supportedArchiveExt))
for ext := range supportedArchiveExt {
out = append(out, ext)
}
sort.Strings(out)
return out
}
func extractTarGz(archivePath string) ([]ExtractedFile, error) {
f, err := os.Open(archivePath)
if err != nil {
@@ -105,6 +151,7 @@ func extractTarFromReader(r io.Reader) ([]ExtractedFile, error) {
files = append(files, ExtractedFile{
Path: header.Name,
Content: content,
ModTime: header.ModTime,
})
}
@@ -118,12 +165,16 @@ func extractTarGzFromReader(r io.Reader, filename string) ([]ExtractedFile, erro
}
defer gzr.Close()
// Read all decompressed content into buffer
// Limit to 50MB for plain gzip files, 10MB per file for tar.gz
decompressed, err := io.ReadAll(io.LimitReader(gzr, 50*1024*1024))
// Read decompressed content with a hard cap.
// When the payload exceeds the cap, keep the first chunk and mark it as truncated.
decompressed, err := io.ReadAll(io.LimitReader(gzr, maxGzipDecompressedSize+1))
if err != nil {
return nil, fmt.Errorf("read gzip content: %w", err)
}
gzipTruncated := len(decompressed) > maxGzipDecompressedSize
if gzipTruncated {
decompressed = decompressed[:maxGzipDecompressedSize]
}
// Try to read as tar archive
tr := tar.NewReader(bytes.NewReader(decompressed))
@@ -139,12 +190,20 @@ func extractTarGzFromReader(r io.Reader, filename string) ([]ExtractedFile, erro
baseName = gzr.Name
}
return []ExtractedFile{
{
Path: baseName,
Content: decompressed,
},
}, nil
file := ExtractedFile{
Path: baseName,
Content: decompressed,
ModTime: gzr.ModTime,
}
if gzipTruncated {
file.Truncated = true
file.TruncatedMessage = fmt.Sprintf(
"decompressed gzip content exceeded %d bytes and was truncated",
maxGzipDecompressedSize,
)
}
return []ExtractedFile{file}, nil
}
return nil, fmt.Errorf("tar read: %w", err)
}
@@ -163,6 +222,7 @@ func extractTarGzFromReader(r io.Reader, filename string) ([]ExtractedFile, erro
files = append(files, ExtractedFile{
Path: header.Name,
Content: content,
ModTime: header.ModTime,
})
}
}
@@ -213,6 +273,59 @@ func extractZip(archivePath string) ([]ExtractedFile, error) {
files = append(files, ExtractedFile{
Path: f.Name,
Content: content,
ModTime: f.Modified,
})
}
return files, nil
}
func extractZipFromReader(r io.Reader) ([]ExtractedFile, error) {
// Read all data into memory with a hard cap
data, err := io.ReadAll(io.LimitReader(r, maxZipArchiveSize+1))
if err != nil {
return nil, fmt.Errorf("read zip data: %w", err)
}
if len(data) > maxZipArchiveSize {
return nil, fmt.Errorf("zip too large: max %d bytes", maxZipArchiveSize)
}
// Create a ReaderAt from the byte slice
readerAt := bytes.NewReader(data)
// Open the zip archive
zipReader, err := zip.NewReader(readerAt, int64(len(data)))
if err != nil {
return nil, fmt.Errorf("open zip: %w", err)
}
var files []ExtractedFile
for _, f := range zipReader.File {
if f.FileInfo().IsDir() {
continue
}
// Skip large files (>10MB)
if f.FileInfo().Size() > 10*1024*1024 {
continue
}
rc, err := f.Open()
if err != nil {
return nil, fmt.Errorf("open file %s: %w", f.Name, err)
}
content, err := io.ReadAll(rc)
rc.Close()
if err != nil {
return nil, fmt.Errorf("read file %s: %w", f.Name, err)
}
files = append(files, ExtractedFile{
Path: f.Name,
Content: content,
ModTime: f.Modified,
})
}
@@ -220,13 +333,24 @@ func extractZip(archivePath string) ([]ExtractedFile, error) {
}
func extractSingleFile(path string) ([]ExtractedFile, error) {
info, err := os.Stat(path)
if err != nil {
return nil, fmt.Errorf("stat file: %w", err)
}
f, err := os.Open(path)
if err != nil {
return nil, fmt.Errorf("open file: %w", err)
}
defer f.Close()
return extractSingleFileFromReader(f, filepath.Base(path))
files, err := extractSingleFileFromReader(f, filepath.Base(path))
if err != nil {
return nil, err
}
if len(files) > 0 {
files[0].ModTime = info.ModTime()
}
return files, nil
}
func extractSingleFileFromReader(r io.Reader, filename string) ([]ExtractedFile, error) {
@@ -234,16 +358,24 @@ func extractSingleFileFromReader(r io.Reader, filename string) ([]ExtractedFile,
if err != nil {
return nil, fmt.Errorf("read file content: %w", err)
}
if len(content) > maxSingleFileSize {
return nil, fmt.Errorf("file too large: max %d bytes", maxSingleFileSize)
truncated := len(content) > maxSingleFileSize
if truncated {
content = content[:maxSingleFileSize]
}
return []ExtractedFile{
{
Path: filepath.Base(filename),
Content: content,
},
}, nil
file := ExtractedFile{
Path: filepath.Base(filename),
Content: content,
}
if truncated {
file.Truncated = true
file.TruncatedMessage = fmt.Sprintf(
"file exceeded %d bytes and was truncated",
maxSingleFileSize,
)
}
return []ExtractedFile{file}, nil
}
// FindFileByPattern finds files matching pattern in extracted files


@@ -1,6 +1,8 @@
package parser
import (
"archive/tar"
"bytes"
"os"
"path/filepath"
"strings"
@@ -46,3 +48,79 @@ func TestExtractArchiveTXT(t *testing.T) {
t.Fatalf("content mismatch")
}
}
func TestExtractArchiveFromReaderTXT_TruncatedWhenTooLarge(t *testing.T) {
large := bytes.Repeat([]byte("a"), maxSingleFileSize+1024)
files, err := ExtractArchiveFromReader(bytes.NewReader(large), "huge.log")
if err != nil {
t.Fatalf("extract huge txt from reader: %v", err)
}
if len(files) != 1 {
t.Fatalf("expected 1 file, got %d", len(files))
}
f := files[0]
if !f.Truncated {
t.Fatalf("expected file to be marked as truncated")
}
if got := len(f.Content); got != maxSingleFileSize {
t.Fatalf("expected truncated size %d, got %d", maxSingleFileSize, got)
}
if f.TruncatedMessage == "" {
t.Fatalf("expected truncation message")
}
}
func TestIsSupportedArchiveFilename(t *testing.T) {
cases := []struct {
name string
want bool
}{
{name: "dump.tar.gz", want: true},
{name: "nvidia-bug-report-1651124000923.log.gz", want: true},
{name: "snapshot.zip", want: true},
{name: "h3c_20250819.sds", want: true},
{name: "report.log", want: true},
{name: "xigmanas.txt", want: true},
{name: "raw_export.json", want: false},
{name: "archive.bin", want: false},
}
for _, tc := range cases {
got := IsSupportedArchiveFilename(tc.name)
if got != tc.want {
t.Fatalf("IsSupportedArchiveFilename(%q)=%v, want %v", tc.name, got, tc.want)
}
}
}
func TestExtractArchiveFromReaderSDS(t *testing.T) {
var buf bytes.Buffer
tw := tar.NewWriter(&buf)
payload := []byte("STARTTIME:0\nENDTIME:0\n")
if err := tw.WriteHeader(&tar.Header{
Name: "bmc/pack.info",
Mode: 0o600,
Size: int64(len(payload)),
}); err != nil {
t.Fatalf("write tar header: %v", err)
}
if _, err := tw.Write(payload); err != nil {
t.Fatalf("write tar payload: %v", err)
}
if err := tw.Close(); err != nil {
t.Fatalf("close tar writer: %v", err)
}
files, err := ExtractArchiveFromReader(bytes.NewReader(buf.Bytes()), "sample.sds")
if err != nil {
t.Fatalf("extract sds from reader: %v", err)
}
if len(files) != 1 {
t.Fatalf("expected 1 extracted file, got %d", len(files))
}
if files[0].Path != "bmc/pack.info" {
t.Fatalf("expected bmc/pack.info, got %q", files[0].Path)
}
}


@@ -9,7 +9,7 @@ type VendorParser interface {
// Name returns human-readable parser name
Name() string
// Vendor returns vendor identifier (e.g., "inspur", "supermicro", "dell")
// Vendor returns vendor identifier (e.g., "inspur", "dell", "h3c_g6")
Vendor() string
// Version returns parser version string


@@ -3,6 +3,8 @@ package parser
import (
"fmt"
"io"
"strings"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
@@ -62,11 +64,74 @@ func (p *BMCParser) parseFiles() error {
// Preserve filename
result.Filename = p.result.Filename
appendExtractionWarnings(result, p.files)
if result.CollectedAt.IsZero() {
if ts := inferCollectedAtFromExtractedFiles(p.files); !ts.IsZero() {
result.CollectedAt = ts.UTC()
}
}
p.result = result
return nil
}
func inferCollectedAtFromExtractedFiles(files []ExtractedFile) time.Time {
var latestReliable time.Time
var latestAny time.Time
for _, f := range files {
ts := f.ModTime
if ts.IsZero() {
continue
}
if latestAny.IsZero() || ts.After(latestAny) {
latestAny = ts
}
// Ignore placeholder archive mtimes like 1980-01-01.
if ts.Year() < 2000 {
continue
}
if latestReliable.IsZero() || ts.After(latestReliable) {
latestReliable = ts
}
}
if !latestReliable.IsZero() {
return latestReliable
}
return latestAny
}
func appendExtractionWarnings(result *models.AnalysisResult, files []ExtractedFile) {
if result == nil {
return
}
truncated := make([]string, 0)
for _, f := range files {
if !f.Truncated {
continue
}
if f.TruncatedMessage != "" {
truncated = append(truncated, fmt.Sprintf("%s: %s", f.Path, f.TruncatedMessage))
continue
}
truncated = append(truncated, fmt.Sprintf("%s: content was truncated due to size limit", f.Path))
}
if len(truncated) == 0 {
return
}
result.Events = append(result.Events, models.Event{
Timestamp: time.Now(),
Source: "LOGPile",
EventType: "Analysis Warning",
Severity: models.SeverityWarning,
Description: "Input data was too large; analysis is partial and may be incomplete",
RawData: strings.Join(truncated, "; "),
})
}
// Result returns the analysis result
func (p *BMCParser) Result() *models.AnalysisResult {
return p.result


@@ -0,0 +1,62 @@
package parser
import (
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestAppendExtractionWarnings(t *testing.T) {
result := &models.AnalysisResult{
Events: make([]models.Event, 0),
}
files := []ExtractedFile{
{Path: "ok.log", Content: []byte("ok")},
{Path: "big.log", Truncated: true, TruncatedMessage: "file exceeded size limit and was truncated"},
}
appendExtractionWarnings(result, files)
if len(result.Events) != 1 {
t.Fatalf("expected 1 warning event, got %d", len(result.Events))
}
ev := result.Events[0]
if ev.Severity != models.SeverityWarning {
t.Fatalf("expected warning severity, got %q", ev.Severity)
}
if ev.EventType != "Analysis Warning" {
t.Fatalf("unexpected event type: %q", ev.EventType)
}
if ev.RawData == "" {
t.Fatalf("expected warning details in RawData")
}
}
func TestInferCollectedAtFromExtractedFiles_PrefersReliableMTime(t *testing.T) {
files := []ExtractedFile{
{Path: "a.log", ModTime: time.Date(1980, 1, 1, 0, 0, 0, 0, time.UTC)},
{Path: "b.log", ModTime: time.Date(2025, 12, 12, 10, 14, 49, 0, time.FixedZone("EST", -5*3600))},
{Path: "c.log", ModTime: time.Date(2026, 2, 28, 4, 18, 18, 0, time.FixedZone("UTC+8", 8*3600))},
}
got := inferCollectedAtFromExtractedFiles(files)
want := files[2].ModTime
if !got.Equal(want) {
t.Fatalf("expected %s, got %s", want, got)
}
}
func TestInferCollectedAtFromExtractedFiles_FallsBackToAnyMTime(t *testing.T) {
files := []ExtractedFile{
{Path: "a.log", ModTime: time.Date(1980, 1, 1, 0, 0, 0, 0, time.UTC)},
{Path: "b.log", ModTime: time.Date(1970, 1, 2, 0, 0, 0, 0, time.UTC)},
}
got := inferCollectedAtFromExtractedFiles(files)
want := files[0].ModTime
if !got.Equal(want) {
t.Fatalf("expected fallback %s, got %s", want, got)
}
}


@@ -0,0 +1,33 @@
package parser
import (
"sync"
"time"
)
const fallbackTimezoneName = "Europe/Moscow"
var (
fallbackTimezoneOnce sync.Once
fallbackTimezone *time.Location
)
// DefaultArchiveLocation returns the timezone used for source timestamps
// that do not contain an explicit offset.
func DefaultArchiveLocation() *time.Location {
fallbackTimezoneOnce.Do(func() {
loc, err := time.LoadLocation(fallbackTimezoneName)
if err != nil {
fallbackTimezone = time.FixedZone("MSK", 3*60*60)
return
}
fallbackTimezone = loc
})
return fallbackTimezone
}
// ParseInDefaultArchiveLocation parses timestamps without timezone information
// using Europe/Moscow as the assumed source timezone.
func ParseInDefaultArchiveLocation(layout, value string) (time.Time, error) {
return time.ParseInLocation(layout, value, DefaultArchiveLocation())
}


@@ -1,96 +0,0 @@
# Vendor Parser Modules
Each server vendor has its own BMC diagnostic archive format.
This directory contains parser modules for the different vendors.
## Module layout
```
vendors/
├── vendors.go # Imports of all modules (add new ones here)
├── README.md # This documentation
├── template/ # Template for a new module
│ └── parser.go.template
├── inspur/ # Inspur/Kaytus module
│ ├── parser.go # Main parser + registration
│ ├── sdr.go # SDR parsing (sensors)
│ ├── fru.go # FRU parsing (serial numbers)
│ ├── asset.go # asset.json parsing
│ └── syslog.go # syslog parsing
├── supermicro/ # Future Supermicro module
├── dell/ # Future Dell iDRAC module
└── hpe/ # Future HPE iLO module
```
## How to add a new module
### 1. Create the module directory
```bash
mkdir -p internal/parser/vendors/VENDORNAME
```
### 2. Copy the template
```bash
cp internal/parser/vendors/template/parser.go.template \
internal/parser/vendors/VENDORNAME/parser.go
```
### 3. Edit parser.go
- Replace `VENDORNAME` with the vendor identifier (e.g., `supermicro`)
- Replace `VENDOR_DESCRIPTION` with a description (e.g., `Supermicro`)
- Implement `Detect()` to recognize the format
- Implement `Parse()` to parse the data
### 4. Register the module
Add an import to `vendors/vendors.go`:
```go
import (
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/VENDORNAME" // New module
)
```
### 5. Done!
The module registers itself automatically at application startup via `init()`.
## The VendorParser interface
```go
type VendorParser interface {
// Name returns the human-readable parser name
Name() string
// Vendor returns the vendor identifier
Vendor() string
// Detect checks whether this parser matches the given files.
// Returns a confidence of 0-100 (0 = no match, 100 = definitely this format)
Detect(files []ExtractedFile) int
// Parse parses the extracted files
Parse(files []ExtractedFile) (*models.AnalysisResult, error)
}
```
## Tips for implementing Detect()
- Look for files/directories unique to the vendor
- Check file contents for characteristic markers
- Return a high confidence (70+) only on a certain match
- Several parsers may return >0; the one with the highest confidence is chosen
## Supported vendors
| Vendor | Identifier | Status | Tested on |
|--------|------------|--------|-----------|
| Inspur/Kaytus | `inspur` | ✅ Ready | KR4268X2 (onekeylog) |
| Supermicro | `supermicro` | ⏳ Planned | - |
| Dell iDRAC | `dell` | ⏳ Planned | - |
| HPE iLO | `hpe` | ⏳ Planned | - |
| Lenovo XCC | `lenovo` | ⏳ Planned | - |

internal/parser/vendors/dell/parser.go (vendored, new file, 1493 lines): diff suppressed because it is too large


@@ -0,0 +1,224 @@
package dell

import (
	"archive/zip"
	"bytes"
	"testing"

	"git.mchus.pro/mchus/logpile/internal/parser"
)

func TestDetectNestedTSRZip(t *testing.T) {
	inner := makeZipArchive(t, map[string][]byte{
		"tsr/metadata.json":                                    []byte(`{"Make":"Dell Inc.","Model":"PowerEdge R750","ServiceTag":"G37Q064"}`),
		"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml": []byte(`<CIM><MESSAGE><SIMPLEREQ/></MESSAGE></CIM>`),
	})
	p := &Parser{}
	score := p.Detect([]parser.ExtractedFile{
		{Path: "signature", Content: []byte("ok")},
		{Path: "TSR20241119143901_G37Q064.pl.zip", Content: inner},
	})
	if score < 80 {
		t.Fatalf("expected high detect score for nested TSR zip, got %d", score)
	}
}

func TestParseNestedTSRZip(t *testing.T) {
	const viewXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SystemView">
<PROPERTY NAME="Manufacturer"><VALUE>Dell Inc.</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>PowerEdge R750</VALUE></PROPERTY>
<PROPERTY NAME="ServiceTag"><VALUE>G37Q064</VALUE></PROPERTY>
<PROPERTY NAME="BIOSVersionString"><VALUE>2.19.1</VALUE></PROPERTY>
<PROPERTY NAME="LifecycleControllerVersion"><VALUE>7.00.30.00</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_CPUView">
<PROPERTY NAME="FQDD"><VALUE>CPU.Socket.1</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>Intel(R) Xeon(R) Gold 6330</VALUE></PROPERTY>
<PROPERTY NAME="Manufacturer"><VALUE>Intel</VALUE></PROPERTY>
<PROPERTY NAME="NumberOfEnabledCores"><VALUE>28</VALUE></PROPERTY>
<PROPERTY NAME="NumberOfEnabledThreads"><VALUE>56</VALUE></PROPERTY>
<PROPERTY NAME="CurrentClockSpeed"><VALUE>2000</VALUE></PROPERTY>
<PROPERTY NAME="MaxClockSpeed"><VALUE>3100</VALUE></PROPERTY>
<PROPERTY NAME="PPIN"><VALUE>ABCD</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_NICView">
<PROPERTY NAME="FQDD"><VALUE>NIC.Slot.1-1-1</VALUE></PROPERTY>
<PROPERTY NAME="ProductName"><VALUE>Broadcom 57414 Dual Port 10/25GbE SFP28 Adapter</VALUE></PROPERTY>
<PROPERTY NAME="VendorName"><VALUE>Broadcom</VALUE></PROPERTY>
<PROPERTY NAME="CurrentMACAddress"><VALUE>00:11:22:33:44:55</VALUE></PROPERTY>
<PROPERTY NAME="SerialNumber"><VALUE>NICSERIAL1</VALUE></PROPERTY>
<PROPERTY NAME="FamilyVersion"><VALUE>22.80.17</VALUE></PROPERTY>
<PROPERTY NAME="PCIVendorID"><VALUE>0x14e4</VALUE></PROPERTY>
<PROPERTY NAME="PCIDeviceID"><VALUE>0x16d7</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_PowerSupplyView">
<PROPERTY NAME="FQDD"><VALUE>PSU.Slot.1</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>D1400E-S0</VALUE></PROPERTY>
<PROPERTY NAME="Manufacturer"><VALUE>Dell</VALUE></PROPERTY>
<PROPERTY NAME="SerialNumber"><VALUE>PSUSERIAL1</VALUE></PROPERTY>
<PROPERTY NAME="FirmwareVersion"><VALUE>00.1A</VALUE></PROPERTY>
<PROPERTY NAME="TotalOutputPower"><VALUE>1400</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_VideoView">
<PROPERTY NAME="FQDD"><VALUE>Video.Slot.38-1</VALUE></PROPERTY>
<PROPERTY NAME="MarketingName"><VALUE>NVIDIA H100 PCIe</VALUE></PROPERTY>
<PROPERTY NAME="Description"><VALUE>GH100 [H100 PCIe]</VALUE></PROPERTY>
<PROPERTY NAME="Manufacturer"><VALUE>NVIDIA Corporation</VALUE></PROPERTY>
<PROPERTY NAME="PCIVendorID"><VALUE>10DE</VALUE></PROPERTY>
<PROPERTY NAME="PCIDeviceID"><VALUE>2331</VALUE></PROPERTY>
<PROPERTY NAME="BusNumber"><VALUE>74</VALUE></PROPERTY>
<PROPERTY NAME="DeviceNumber"><VALUE>0</VALUE></PROPERTY>
<PROPERTY NAME="FunctionNumber"><VALUE>0</VALUE></PROPERTY>
<PROPERTY NAME="SerialNumber"><VALUE>1793924039808</VALUE></PROPERTY>
<PROPERTY NAME="FirmwareVersion"><VALUE>96.00.AF.00.01</VALUE></PROPERTY>
<PROPERTY NAME="GPUGUID"><VALUE>bc681a6d4785dde08c21f49c46c05cc3</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
	const swXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SoftwareIdentity">
<PROPERTY NAME="ElementName"><VALUE>NIC.Slot.1-1-1</VALUE></PROPERTY>
<PROPERTY NAME="VersionString"><VALUE>22.80.17</VALUE></PROPERTY>
<PROPERTY NAME="ComponentType"><VALUE>Network</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
	const eventsXML = `<Log>
<Event AgentID="Lifecycle Controller" Category="System Health" Severity="Warning" Timestamp="2024-11-19T14:39:01-0800">
<MessageID>SYS1001</MessageID>
<Message>Link is down</Message>
<FQDD>NIC.Slot.1-1-1</FQDD>
</Event>
</Log>`
	const cimSensorXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_GPUSensor">
<PROPERTY NAME="DeviceID"><VALUE>Video.Slot.38-1</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryGPUTemperature"><VALUE>290</VALUE></PROPERTY>
<PROPERTY NAME="MemoryTemperature"><VALUE>440</VALUE></PROPERTY>
<PROPERTY NAME="PowerConsumption"><VALUE>295</VALUE></PROPERTY>
<PROPERTY NAME="ThermalAlertStatus"><VALUE>5</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="CIM_NumericSensor">
<PROPERTY NAME="ElementName"><VALUE>PS1 Voltage 1</VALUE></PROPERTY>
<PROPERTY NAME="CurrentReading"><VALUE>224.0</VALUE></PROPERTY>
<PROPERTY NAME="BaseUnits"><VALUE>5</VALUE></PROPERTY>
<PROPERTY NAME="UnitModifier"><VALUE>0</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>5</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
	inner := makeZipArchive(t, map[string][]byte{
		"tsr/metadata.json": []byte(`{
"Make":"Dell Inc.",
"Model":"PowerEdge R750",
"ServiceTag":"G37Q064",
"FirmwareVersion":"7.00.30.00",
"CollectionDateTime":"2024-11-19 14:39:01.000-0800"
}`),
		"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml":             []byte(viewXML),
		"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_SoftwareIdentity.xml": []byte(swXML),
		"tsr/hardware/sysinfo/inventory/sysinfo_CIM_Sensor.xml":            []byte(cimSensorXML),
		"tsr/hardware/sysinfo/lcfiles/curr_lclog.xml":                      []byte(eventsXML),
	})
	p := &Parser{}
	result, err := p.Parse([]parser.ExtractedFile{
		{Path: "signature", Content: []byte("ok")},
		{Path: "TSR20241119143901_G37Q064.pl.zip", Content: inner},
	})
	if err != nil {
		t.Fatalf("parse failed: %v", err)
	}
	if result.Hardware == nil {
		t.Fatalf("expected hardware section")
	}
	if got := result.Hardware.BoardInfo.Manufacturer; got != "Dell Inc." {
		t.Fatalf("unexpected board manufacturer: %q", got)
	}
	if got := result.Hardware.BoardInfo.ProductName; got != "PowerEdge R750" {
		t.Fatalf("unexpected board product: %q", got)
	}
	if got := result.Hardware.BoardInfo.SerialNumber; got != "G37Q064" {
		t.Fatalf("unexpected service tag: %q", got)
	}
	if len(result.Hardware.CPUs) != 1 {
		t.Fatalf("expected 1 cpu, got %d", len(result.Hardware.CPUs))
	}
	if got := result.Hardware.CPUs[0].Model; got != "Intel(R) Xeon(R) Gold 6330" {
		t.Fatalf("unexpected cpu model: %q", got)
	}
	if len(result.Hardware.NetworkAdapters) != 1 {
		t.Fatalf("expected 1 network adapter, got %d", len(result.Hardware.NetworkAdapters))
	}
	adapter := result.Hardware.NetworkAdapters[0]
	if adapter.Vendor != "Broadcom" {
		t.Fatalf("unexpected nic vendor: %q", adapter.Vendor)
	}
	if adapter.Firmware != "22.80.17" {
		t.Fatalf("unexpected nic firmware: %q", adapter.Firmware)
	}
	if adapter.SerialNumber != "NICSERIAL1" {
		t.Fatalf("unexpected nic serial: %q", adapter.SerialNumber)
	}
	if len(result.Hardware.PowerSupply) != 1 {
		t.Fatalf("expected 1 psu, got %d", len(result.Hardware.PowerSupply))
	}
	psu := result.Hardware.PowerSupply[0]
	if psu.Model != "D1400E-S0" {
		t.Fatalf("unexpected psu model: %q", psu.Model)
	}
	if psu.Firmware != "00.1A" {
		t.Fatalf("unexpected psu firmware: %q", psu.Firmware)
	}
	if len(result.Hardware.Firmware) == 0 {
		t.Fatalf("expected firmware entries")
	}
	if len(result.Hardware.GPUs) != 1 {
		t.Fatalf("expected 1 gpu, got %d", len(result.Hardware.GPUs))
	}
	if got := result.Hardware.GPUs[0].Model; got != "NVIDIA H100 PCIe" {
		t.Fatalf("unexpected gpu model: %q", got)
	}
	if got := result.Hardware.GPUs[0].SerialNumber; got != "1793924039808" {
		t.Fatalf("unexpected gpu serial: %q", got)
	}
	if got := result.Hardware.GPUs[0].Temperature; got != 29 {
		t.Fatalf("unexpected gpu temperature: %d", got)
	}
	if len(result.Sensors) == 0 {
		t.Fatalf("expected sensors from CIM_Sensor")
	}
	if len(result.Events) != 1 {
		t.Fatalf("expected one lifecycle event, got %d", len(result.Events))
	}
	if got := string(result.Events[0].Severity); got != "warning" {
		t.Fatalf("unexpected event severity: %q", got)
	}
}

func makeZipArchive(t *testing.T, files map[string][]byte) []byte {
	t.Helper()
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, content := range files {
		w, err := zw.Create(name)
		if err != nil {
			t.Fatalf("create zip entry %s: %v", name, err)
		}
		if _, err := w.Write(content); err != nil {
			t.Fatalf("write zip entry %s: %v", name, err)
		}
	}
	if err := zw.Close(); err != nil {
		t.Fatalf("close zip: %v", err)
	}
	return buf.Bytes()
}


@@ -1,72 +0,0 @@
# Generic Text File Parser
Fallback parser for text files not recognized by any other parser.
## Purpose
This parser handles any text files that:
- Are not vendor-specific archives
- Contain textual data (not binary data)
- Are standalone .gz files or plain text files
## Priority
**Confidence score: 15** (low priority)
This parser fires only if no other parser matched with a higher confidence.
## Supported Files
### Automatically recognized types
1. **NVIDIA Bug Report** (`nvidia-bug-report-*.log.gz`)
   - Extracts NVIDIA driver information
   - Finds GPU devices
   - Shows the driver version
2. **Any text files**
   - Verifies that the content is text (not binary data)
   - Shows basic information about the file
## Extracted Data
### Events
- **Text File**: basic information about the uploaded file
- **Driver Info**: NVIDIA driver information (for nvidia-bug-report)
- **GPU Device**: detected GPU devices (for nvidia-bug-report)
## Usage Example
```bash
# Run with an nvidia-bug-report
./logpile --file nvidia-bug-report-*.log.gz
# Run with any text file
./logpile --file system.log.gz
```
## Versioning
**Current parser version:** 1.0.0
## Limitations
1. This parser provides only basic information
2. It performs no deep analysis of the content
3. For detailed analysis of specific logs, writing a dedicated parser is recommended
## Extending
To add support for a new file type:
1. Add a check to the `Parse()` function
2. Create a `parseXXX()` function to extract the type-specific information
3. Bump the parser version
Example:
```go
if strings.Contains(strings.ToLower(file.Path), "custom-log") {
	parseCustomLog(content, result)
}
```
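The priority rule above (this parser's fixed score of 15 loses to any vendor parser that matched with higher confidence) amounts to a highest-score selection over all registered parsers. A minimal sketch of that dispatch, using local stand-in types rather than the project's actual API:

```go
package main

import "fmt"

// scored pairs a parser name with its Detect() confidence.
// Both the type and the names below are hypothetical stand-ins.
type scored struct {
	name  string
	score int
}

// bestParser returns the candidate with the highest positive score,
// and false when no parser matched at all.
func bestParser(candidates []scored) (string, bool) {
	best, bestScore := "", 0
	for _, c := range candidates {
		if c.score > bestScore {
			best, bestScore = c.name, c.score
		}
	}
	return best, bestScore > 0
}

func main() {
	// A vendor parser that matched (85) beats the generic fallback (15).
	name, ok := bestParser([]scored{{"generic", 15}, {"dell", 85}})
	fmt.Println(name, ok) // dell true

	// With only the fallback scoring >0, it is selected.
	name, ok = bestParser([]scored{{"generic", 15}})
	fmt.Println(name, ok) // generic true
}
```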


@@ -10,7 +10,7 @@ import (
 )
 // parserVersion - version of this parser module
-const parserVersion = "1.0.0"
+const parserVersion = "1.1"
 func init() {
 	parser.Register(&Parser{})

internal/parser/vendors/h3c/parser.go (vendored, new file, 3516 lines; diff suppressed because it is too large)


@@ -0,0 +1,962 @@
package h3c

import (
	"strings"
	"testing"

	"git.mchus.pro/mchus/logpile/internal/models"
	"git.mchus.pro/mchus/logpile/internal/parser"
)

func TestDetectH3C_GenerationRouting(t *testing.T) {
	g5 := &G5Parser{}
	g6 := &G6Parser{}
	g5Files := []parser.ExtractedFile{
		{Path: "bmc/pack.info", Content: []byte("STARTTIME:0")},
		{Path: "static/FRUInfo.ini", Content: []byte("[Baseboard]\nBoard Manufacturer=H3C\n")},
		{Path: "static/hardware_info.ini", Content: []byte("[Processors: Processor 1]\nModel: Intel Xeon\n")},
		{Path: "static/hardware.info", Content: []byte("[Disk_0_Front_NA]\nSerialNumber=DISK-0\n")},
		{Path: "static/firmware_version.ini", Content: []byte("[System board]\nBIOS Version: 5.59\n")},
		{Path: "user/test1.csv", Content: []byte("Record Time Stamp,DescInfo\n2025-01-01 00:00:00,foo\n")},
	}
	if gotG5, gotG6 := g5.Detect(g5Files), g6.Detect(g5Files); gotG5 <= gotG6 {
		t.Fatalf("expected G5 confidence > G6 for G5 sample, got g5=%d g6=%d", gotG5, gotG6)
	}
	g6Files := []parser.ExtractedFile{
		{Path: "bmc/pack.info", Content: []byte("STARTTIME:0")},
		{Path: "static/FRUInfo.ini", Content: []byte("[Baseboard]\nBoard Manufacturer=H3C\n")},
		{Path: "static/board_info.ini", Content: []byte("[System board]\nBoardMfr=H3C\n")},
		{Path: "static/firmware_version.json", Content: []byte(`{"BIOS":{"Firmware Name":"BIOS","Firmware Version":"6.10"}}`)},
		{Path: "static/CPUDetailInfo.xml", Content: []byte("<Root><CPU1><Model>X</Model></CPU1></Root>")},
		{Path: "static/MemoryDetailInfo.xml", Content: []byte("<Root><DIMM1><Name>A0</Name></DIMM1></Root>")},
		{Path: "user/Sel.json", Content: []byte(`{"Id":1}`)},
	}
	if gotG5, gotG6 := g5.Detect(g6Files), g6.Detect(g6Files); gotG6 <= gotG5 {
		t.Fatalf("expected G6 confidence > G5 for G6 sample, got g5=%d g6=%d", gotG5, gotG6)
	}
}
func TestParseH3CG6_RaidAndNVMeEnrichment(t *testing.T) {
p := &G6Parser{}
files := []parser.ExtractedFile{
{
Path: "static/storage_disk.ini",
Content: []byte(`[Disk_000]
DiskSlotDesc=Front0
Present=YES
SerialNumber=SER-0
`),
},
{
Path: "static/raid.json",
Content: []byte(`{
"RaidConfig": {
"CtrlInfo": [
{
"CtrlSlot": 1,
"CtrlName": "RAID-LSI-9560",
"LDInfo": [
{
"LDID": "0",
"LDName": "VD0",
"RAIDLevel": "1",
"CapacityBytes": 1000000000,
"Status": "Optimal"
}
]
}
]
}
}`),
},
{
Path: "static/Storage_RAID-LSI-9560-LP-8i-4GB[1].txt",
Content: []byte(`Controller Information
------------------------------------------------------------------------
AssetTag : RAID-LSI-9560
Logical Device Information
------------------------------------------------------------------------
LDID : 0
Name : VD0
RAID Level : 1
CapacityBytes : 1000000000
Status : Optimal
Physical Device Information
------------------------------------------------------------------------
ConnectionID : 0
Position : Front0
StatusIndicator : OK
Protocol : SATA
MediaType : SSD
Manufacturer : Samsung
Model : PM893
Revision : GDC1
SerialNumber : SER-0
CapacityBytes : 480000000000
ConnectionID : 1
Position : Front1
StatusIndicator : OK
Protocol : SATA
MediaType : SSD
Manufacturer : Samsung
Model : PM893
Revision : GDC1
SerialNumber : SER-1
CapacityBytes : 480000000000
`),
},
{
Path: "static/NVMe_info.txt",
Content: []byte(`[NVMe_0]
Present=YES
DiskSlotDesc=Front2
Model=INTEL SSDPE2KX010T8
SerialNumber=NVME-1
Firmware=V100
CapacityBytes=1000204886016
Interface=NVMe
Status=OK
`),
},
}
	result, err := p.Parse(files)
	if err != nil {
		t.Fatalf("parse failed: %v", err)
	}
	if result.Hardware == nil {
		t.Fatalf("expected hardware section")
	}
	if len(result.Hardware.Volumes) != 1 {
		t.Fatalf("expected 1 volume, got %d", len(result.Hardware.Volumes))
	}
	vol := result.Hardware.Volumes[0]
	if vol.RAIDLevel != "RAID1" {
		t.Fatalf("expected RAID1 level, got %q", vol.RAIDLevel)
	}
	if vol.SizeGB != 1 {
		t.Fatalf("expected 1GB logical volume, got %d", vol.SizeGB)
	}
	if len(result.Hardware.Storage) != 3 {
		t.Fatalf("expected 3 unique storage devices, got %d", len(result.Hardware.Storage))
	}
	var front0 *models.Storage
	var nvme *models.Storage
	for i := range result.Hardware.Storage {
		s := &result.Hardware.Storage[i]
		if strings.EqualFold(s.SerialNumber, "SER-0") {
			front0 = s
		}
		if strings.EqualFold(s.SerialNumber, "NVME-1") {
			nvme = s
		}
	}
	if front0 == nil {
		t.Fatalf("expected merged Front0 disk by serial SER-0")
	}
	if front0.Model != "PM893" {
		t.Fatalf("expected Front0 model PM893, got %q", front0.Model)
	}
	if front0.SizeGB != 480 {
		t.Fatalf("expected Front0 size 480GB, got %d", front0.SizeGB)
	}
	if nvme == nil {
		t.Fatalf("expected NVMe disk by serial NVME-1")
	}
	if nvme.Type != "nvme" {
		t.Fatalf("expected nvme type, got %q", nvme.Type)
	}
}
func TestParseH3CG6(t *testing.T) {
p := &G6Parser{}
files := []parser.ExtractedFile{
{
Path: "static/FRUInfo.ini",
Content: []byte(`[Baseboard]
Board Manufacturer=H3C
Board Product Name=RS36M2C6SB
Product Product Name=H3C UniServer R4700 G6
Product Serial Number=210235A4FYH257000010
Product Part Number=0235A4FY
`),
},
{
Path: "static/firmware_version.json",
Content: []byte(`{
"BMCP": {"Firmware Name":"HDM","Firmware Version":"1.83","Location":"bmc card","Part Model":"-"},
"BIOS": {"Firmware Name":"BIOS","Firmware Version":"6.10.53","Location":"system board","Part Model":"-"}
}`),
},
{
Path: "static/CPUDetailInfo.xml",
Content: []byte(`<Root>
<CPU1>
<Status>Presence</Status>
<Model>INTEL(R) XEON(R) GOLD 6542Y</Model>
<ProcessorSpeed>0xb54</ProcessorSpeed>
<ProcessorMaxSpeed>0x1004</ProcessorMaxSpeed>
<TotalCores>0x18</TotalCores>
<TotalThreads>0x30</TotalThreads>
<SerialNumber>68-5C-81-C1-0E-A3-4E-40</SerialNumber>
<PPIN>68-5C-81-C1-0E-A3-4E-40</PPIN>
</CPU1>
</Root>`),
},
{
Path: "static/MemoryDetailInfo.xml",
Content: []byte(`<Root>
<DIMM1>
<Status>Presence</Status>
<Name>CPU1_CH1_D0 (A0)</Name>
<PartNumber>M321R8GA0PB0-CWMXJ</PartNumber>
<DIMMTech>RDIMM</DIMMTech>
<SerialNumber>80CE032519135C82ED</SerialNumber>
<DIMMRanks>0x2</DIMMRanks>
<DIMMSize>0x10000</DIMMSize>
<CurFreq>0x1130</CurFreq>
<MaxFreq>0x15e0</MaxFreq>
<DIMMSilk>A0</DIMMSilk>
</DIMM1>
</Root>`),
},
{
Path: "static/storage_disk.ini",
Content: []byte(`[Disk_000]
SerialNumber=S6KLNN0Y516813
DiskSlotDesc=Front0
Present=YES
`),
},
{
Path: "static/net_cfg.ini",
Content: []byte(`[Network Configuration]
eth0 Link encap:Ethernet HWaddr 30:C6:D7:94:54:F6
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth0.2 Link encap:Ethernet HWaddr 30:C6:D7:94:54:F6
inet6 addr: fe80::32c6:d7ff:fe94:54f6/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1496 Metric:1
eth1 Link encap:Ethernet HWaddr 30:C6:D7:94:54:F5
inet addr:10.201.129.0 Bcast:10.201.143.255 Mask:255.255.240.0
inet6 addr: fe80::32c6:d7ff:fe94:54f5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
`),
},
{
Path: "static/psu_cfg.ini",
Content: []byte(`[Psu0]
SN=210231AGUNH257001569
Max_Power(W)=1600
Manufacturer=Great Wall
Power Status=Input Normal, Output Normal
Present_Status=Present
Power_ID=1
Model=GW-CRPS1600D2
Version=03.02.00
[Psu1]
Manufacturer=Great Wall
Power_ID=2
Version=03.02.00
Power Status=Input Normal, Output Normal
SN=210231AGUNH257001570
Model=GW-CRPS1600D2
Present_Status=Present
Max_Power(W)=1600
`),
},
{
Path: "static/hardware_info.ini",
Content: []byte(`[Ethernet adapters: Port 1]
Device Type : NIC
Network Port : Port 1
Location : PCIE-[1]
MAC Address : E4:3D:1A:6F:B0:30
Speed : 8.0GT/s
Product Name : NIC-BCM957414-F-B-25Gb-2P
[Ethernet adapters: Port 2]
Device Type : NIC
Network Port : Port 2
Location : PCIE-[1]
MAC Address : E4:3D:1A:6F:B0:31
Speed : 8.0GT/s
Product Name : NIC-BCM957414-F-B-25Gb-2P
[PCIe Card: PCIe 1]
Location : 1
Product Name : NIC-BCM957414-F-B-25Gb-2P
Status : Normal
Vendor ID : 0x14E4
Device ID : 0x16D7
Serial Number : NICSN-G6-001
Part Number : NICPN-G6-001
Firmware Version : 22.35.1010
`),
},
{
Path: "static/sensor_info.ini",
Content: []byte(`Sensor Name | Reading | Unit | Status| Crit low
Inlet_Temp | 20.000 | degrees C | ok | na
CPU1_Status | 0x0 | discrete | 0x8080| na
`),
},
{
Path: "user/Sel.json",
Content: []byte(`
{
"Created": "2025-07-14 03:34:18 UTC+08:00",
"Severity": "Info",
"EntryCode": "Asserted",
"EntryType": "Event",
"Id": 1,
"Level": "Info",
"Message": "Processor Presence detected",
"SensorName": "CPU1_Status",
"SensorType": "Processor"
},
{
"Created": "2025-07-14 20:56:45 UTC+08:00",
"Severity": "Critical",
"EntryCode": "Asserted",
"EntryType": "Event",
"Id": 2,
"Level": "Critical",
"Message": "Power Supply AC lost",
"SensorName": "PSU1_Status",
"SensorType": "Power Supply"
}
`),
},
}
	result, err := p.Parse(files)
	if err != nil {
		t.Fatalf("parse failed: %v", err)
	}
	if result.Hardware == nil {
		t.Fatalf("expected hardware section")
	}
	if result.Hardware.BoardInfo.Manufacturer != "H3C" {
		t.Fatalf("unexpected board manufacturer: %q", result.Hardware.BoardInfo.Manufacturer)
	}
	if result.Hardware.BoardInfo.ProductName != "H3C UniServer R4700 G6" {
		t.Fatalf("unexpected board product: %q", result.Hardware.BoardInfo.ProductName)
	}
	if result.Hardware.BoardInfo.SerialNumber != "210235A4FYH257000010" {
		t.Fatalf("unexpected board serial: %q", result.Hardware.BoardInfo.SerialNumber)
	}
	if len(result.Hardware.Firmware) < 2 {
		t.Fatalf("expected firmware entries, got %d", len(result.Hardware.Firmware))
	}
	if len(result.Hardware.CPUs) != 1 {
		t.Fatalf("expected 1 cpu, got %d", len(result.Hardware.CPUs))
	}
	if result.Hardware.CPUs[0].Cores != 24 {
		t.Fatalf("expected 24 cores, got %d", result.Hardware.CPUs[0].Cores)
	}
	if len(result.Hardware.Memory) != 1 {
		t.Fatalf("expected 1 dimm, got %d", len(result.Hardware.Memory))
	}
	if result.Hardware.Memory[0].SizeMB != 65536 {
		t.Fatalf("expected 65536MB, got %d", result.Hardware.Memory[0].SizeMB)
	}
	if len(result.Hardware.Storage) != 1 {
		t.Fatalf("expected 1 disk, got %d", len(result.Hardware.Storage))
	}
	if result.Hardware.Storage[0].SerialNumber != "S6KLNN0Y516813" {
		t.Fatalf("unexpected disk serial: %q", result.Hardware.Storage[0].SerialNumber)
	}
	if len(result.Hardware.PowerSupply) != 2 {
		t.Fatalf("expected 2 PSUs from psu_cfg.ini, got %d", len(result.Hardware.PowerSupply))
	}
	if result.Hardware.PowerSupply[0].WattageW == 0 {
		t.Fatalf("expected PSU wattage parsed, got 0")
	}
	if len(result.Hardware.NetworkAdapters) != 1 {
		t.Fatalf("expected 1 host network adapter from hardware_info.ini, got %d", len(result.Hardware.NetworkAdapters))
	}
	macs := make(map[string]struct{})
	var hostNIC models.NetworkAdapter
	var hostNICFound bool
	for _, nic := range result.Hardware.NetworkAdapters {
		if len(nic.MACAddresses) == 0 {
			t.Fatalf("expected MAC on network adapter %+v", nic)
		}
		for _, mac := range nic.MACAddresses {
			macs[strings.ToLower(mac)] = struct{}{}
		}
		if strings.EqualFold(nic.Slot, "PCIe 1") && strings.Contains(strings.ToLower(nic.Model), "bcm957414") {
			hostNIC = nic
			hostNICFound = true
		}
	}
	if !hostNICFound {
		t.Fatalf("expected host NIC from hardware_info.ini, got %+v", result.Hardware.NetworkAdapters)
	}
	if _, ok := macs["e4:3d:1a:6f:b0:30"]; !ok {
		t.Fatalf("expected host NIC MAC e4:3d:1a:6f:b0:30 in adapters, got %+v", result.Hardware.NetworkAdapters)
	}
	if _, ok := macs["e4:3d:1a:6f:b0:31"]; !ok {
		t.Fatalf("expected host NIC MAC e4:3d:1a:6f:b0:31 in adapters, got %+v", result.Hardware.NetworkAdapters)
	}
	if !strings.Contains(strings.ToLower(hostNIC.Vendor), "broadcom") {
		t.Fatalf("expected host NIC vendor enrichment from Vendor ID, got %q", hostNIC.Vendor)
	}
	if hostNIC.SerialNumber != "NICSN-G6-001" {
		t.Fatalf("expected host NIC serial from PCIe card section, got %q", hostNIC.SerialNumber)
	}
	if hostNIC.PartNumber != "NICPN-G6-001" {
		t.Fatalf("expected host NIC part number from PCIe card section, got %q", hostNIC.PartNumber)
	}
	if hostNIC.Firmware != "22.35.1010" {
		t.Fatalf("expected host NIC firmware from PCIe card section, got %q", hostNIC.Firmware)
	}
	if len(result.Sensors) != 2 {
		t.Fatalf("expected 2 sensors, got %d", len(result.Sensors))
	}
	if result.Sensors[0].Name != "Inlet_Temp" {
		t.Fatalf("unexpected first sensor: %q", result.Sensors[0].Name)
	}
	if len(result.Events) != 2 {
		t.Fatalf("expected 2 events, got %d", len(result.Events))
	}
	if result.Events[0].Timestamp.Year() != 2025 || result.Events[0].Timestamp.Month() != 7 {
		t.Fatalf("expected SEL timestamp from payload, got %s", result.Events[0].Timestamp)
	}
	if result.Events[1].Severity != models.SeverityCritical {
		t.Fatalf("expected critical severity for AC lost event, got %q", result.Events[1].Severity)
	}
}
func TestParseH3CG5_PCIeArgumentsEnrichesNonNVMeStorage(t *testing.T) {
p := &G5Parser{}
files := []parser.ExtractedFile{
{
Path: "static/storage_disk.ini",
Content: []byte(`[Disk_000]
DiskSlotDesc=Front slot 3
Present=YES
SerialNumber=SAT-03
`),
},
{
Path: "static/NVMe_info.txt",
Content: []byte(`[NVMe_0]
Present=YES
DiskSlotDesc=Front slot 108
SerialNumber=NVME-108
`),
},
{
Path: "static/PCIe_arguments_table.xml",
Content: []byte(`<root>
<PCIE100>
<base_args>
<type>SSD</type>
<name>SSD-SATA-960G</name>
</base_args>
<type_get_args>
<bios_args>
<vendor_id>0x144D</vendor_id>
</bios_args>
</type_get_args>
</PCIE100>
<PCIE200>
<base_args>
<type>SSD</type>
<name>SSD-3.84T-NVMe-SFF</name>
</base_args>
<type_get_args>
<bios_args>
<vendor_id>0x144D</vendor_id>
</bios_args>
</type_get_args>
</PCIE200>
</root>`),
},
}
	result, err := p.Parse(files)
	if err != nil {
		t.Fatalf("parse failed: %v", err)
	}
	if result.Hardware == nil {
		t.Fatalf("expected hardware section")
	}
	if len(result.Hardware.Storage) != 2 {
		t.Fatalf("expected 2 storage devices, got %d", len(result.Hardware.Storage))
	}
	var sata *models.Storage
	var nvme *models.Storage
	for i := range result.Hardware.Storage {
		s := &result.Hardware.Storage[i]
		switch s.SerialNumber {
		case "SAT-03":
			sata = s
		case "NVME-108":
			nvme = s
		}
	}
	if sata == nil {
		t.Fatalf("expected SATA storage SAT-03")
	}
	if sata.Model != "SSD-SATA-960G" {
		t.Fatalf("expected SATA model enrichment from PCIe table, got %q", sata.Model)
	}
	if !strings.Contains(strings.ToLower(sata.Manufacturer), "samsung") {
		t.Fatalf("expected SATA vendor enrichment to Samsung, got %q", sata.Manufacturer)
	}
	if nvme == nil {
		t.Fatalf("expected NVMe storage NVME-108")
	}
	if nvme.Model != "SSD-3.84T-NVMe-SFF" {
		t.Fatalf("expected NVMe model enrichment from PCIe table, got %q", nvme.Model)
	}
	if !strings.Contains(strings.ToLower(nvme.Manufacturer), "samsung") {
		t.Fatalf("expected NVMe vendor enrichment to Samsung, got %q", nvme.Manufacturer)
	}
}
func TestParseH3CG5_VariantLayout(t *testing.T) {
p := &G5Parser{}
files := []parser.ExtractedFile{
{
Path: "static/FRUInfo.ini",
Content: []byte(`[Baseboard]
Board Manufacturer=H3C
Product Product Name=H3C UniServer R4900 G5
Product Serial Number=02A6AX5231C003VM
`),
},
{
Path: "static/firmware_version.ini",
Content: []byte(`[System board]
BIOS Version : 5.59 V100R001B05D078
ME Version : 4.4.4.202
HDM Version : 3.34.01 HDM V100R001B05D078SP01
CPLD Version : V00C
`),
},
{
Path: "static/board_cfg.ini",
Content: []byte(`[Board Type]
Board Type : R4900 G5
[Board Version]
Board Version : VER.D
[Customer ID]
CustomerID : 255
[OEM ID]
OEM Flag : 1
`),
},
{
Path: "static/hardware_info.ini",
Content: []byte(`[Processors: Processor 1]
Model : Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz
Status : Normal
Frequency : 2800 MHz
Cores : 24
Threads : 48
L1 Cache : 1920 KB
L2 Cache : 30720 KB
L3 Cache : 36864 KB
CPU PPIN : 49-A9-50-C0-15-9F-2D-DC
[Processors: Processor 2]
Model : Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz
Status : Normal
Frequency : 2800 MHz
Cores : 24
Threads : 48
CPU PPIN : 49-AC-3D-BF-85-7F-17-58
[Memory Details: Dimm Index 0]
Location : Processor 1
Channel : 1
Socket ID : A0
Status : Normal
Size : 65536 MB
Maximum Frequency : 3200 MHz
Type : DDR4
Ranks : 2R DIMM
Technology : RDIMM
Part Number : M393A8G40AB2-CWE
Manufacture : Samsung
Serial Number : S02K0D0243351D7079
[Memory Details: Dimm Index 16]
Location : Processor 2
Channel : 1
Socket ID : A0
Status : Normal
Size : 65536 MB
Maximum Frequency : 3200 MHz
Type : DDR4
Ranks : 2R DIMM
Technology : RDIMM
Part Number : M393A8G40AB2-CWE
Manufacture : Samsung
Serial Number : S02K0D0243351D73F0
[Ethernet adapters: Port 1]
Device Type : NIC
Network Port : Port 1
Location : PCIE-[1]
MAC Address : E4:3D:1A:6F:B0:30
Speed : 8.0GT/s
Product Name : NIC-BCM957414-F-B-25Gb-2P
[Ethernet adapters: Port 2]
Device Type : NIC
Network Port : Port 2
Location : PCIE-[1]
MAC Address : E4:3D:1A:6F:B0:31
Speed : 8.0GT/s
Product Name : NIC-BCM957414-F-B-25Gb-2P
[Ethernet adapters: Port 1]
Device Type : NIC
Network Port : Port 1
Location : PCIE-[4]
MAC Address : E8:EB:D3:4F:2E:90
Speed : 8.0GT/s
Product Name : NIC-MCX512A-ACAT-2*25Gb-F
[Ethernet adapters: Port 2]
Device Type : NIC
Network Port : Port 2
Location : PCIE-[4]
MAC Address : E8:EB:D3:4F:2E:91
Speed : 8.0GT/s
Product Name : NIC-MCX512A-ACAT-2*25Gb-F
[PCIe Card: PCIe 1]
Location : 1
Product Name : NIC-BCM957414-F-B-25Gb-2P
Status : Normal
Vendor ID : 0x14E4
Device ID : 0x16D7
Serial Number : NICSN-G5-001
Part Number : NICPN-G5-001
Firmware Version : 21.80.1
[PCIe Card: PCIe 4]
Location : 4
Product Name : NIC-MCX512A-ACAT-2*25Gb-F
Status : Normal
Vendor ID : 0x15B3
Device ID : 0x1017
Serial Number : NICSN-G5-004
Part Number : NICPN-G5-004
Firmware Version : 28.33.15
`),
},
{
Path: "static/hardware.info",
Content: []byte(`[Disk_0_Front_NA]
Present=YES
SlotNum=0
FrontOrRear=Front
SerialNumber=22443C4EE184
[Nvme_Front slot 21]
Present=YES
NvmePhySlot=Front slot 21
SlotNum=121
SerialNumber=NVME-21
[Nvme_255_121]
Present=YES
SlotNum=121
SerialNumber=NVME-21
`),
},
{
Path: "static/raid.json",
Content: []byte(`{
"RAIDCONFIG": {
"Ctrl info": [
{
"CtrlDevice Slot": 3,
"CtrlDevice Name": "AVAGO MegaRAID SAS 9460-8i",
"LDInfo": [
{
"LD ID": 0,
"LD_name": "SystemRAID",
"RAID_level(RAID 0,RAID 1,RAID 5,RAID 6,RAID 00,RAID 10,RAID 50,RAID 60)": "RAID1",
"Logical_capicity(per 512byte)": 936640512
}
]
},
{
"CtrlDevice Slot": 6,
"CtrlDevice Name": "MegaRAID 9560-16i 8GB",
"LDInfo": [
{
"LD ID": 0,
"LD_name": "DataRAID",
"RAID_level(RAID 0,RAID 1,RAID 5,RAID 6,RAID 00,RAID 10,RAID 50,RAID 60)": "RAID50",
"Logical_capicity(per 512byte)": 90004783104
}
]
}
]
}
}`),
},
{
Path: "static/Raid_BP_Conf_Info.ini",
Content: []byte(`[BP Information]
Description | BP TYPE | I2cPort | BpConnectorNum | FrontOrRear | Node Num | DiskSlotRange |
8SFF SAS/SATA | BP_G5_8SFF | AUX_1 | ~ | ~ | ~ | ~ |
8SFF SAS/SATA | BP_G5_8SFF | AUX_2 | ~ | ~ | ~ | ~ |
8SFF SAS/SATA | BP_G5_8SFF | AUX_3 | ~ | ~ | ~ | ~ |
[RAID Information]
PCIE SLOT | RAID SAS_NUM |
3 | 2 |
6 | 4 |
`),
},
{
Path: "static/PCIe_arguments_table.xml",
Content: []byte(`<root>
<PCIE100>
<base_args>
<type>SSD</type>
<name>SSD-1.92T/3.84T-NVMe-EV-SFF-sa</name>
</base_args>
<type_get_args>
<bios_args>
<vendor_id>0x144D</vendor_id>
</bios_args>
</type_get_args>
</PCIE100>
</root>`),
},
{
Path: "static/psu_cfg.ini",
Content: []byte(`[Active / Standby configuration]
Power ID : 1
Present Status : Present
Cold Status : Active Power
Model : DPS-1300AB-6 R
SN : 210231ACT9H232000080
Max Power(W) : 1300
Power ID : 2
Present Status : Present
Cold Status : Active Power
Model : DPS-1300AB-6 R
SN : 210231ACT9H232000079
Max Power(W) : 1300
`),
},
{
Path: "static/net_cfg.ini",
Content: []byte(`[Network Configuration]
eth0 Link encap:Ethernet HWaddr 30:C6:D7:94:54:F6
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth0.2 Link encap:Ethernet HWaddr 30:C6:D7:94:54:F6
inet6 addr: fe80::32c6:d7ff:fe94:54f6/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1496 Metric:1
eth1 Link encap:Ethernet HWaddr 30:C6:D7:94:54:F5
inet addr:10.201.129.0 Bcast:10.201.143.255 Mask:255.255.240.0
inet6 addr: fe80::32c6:d7ff:fe94:54f5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
`),
},
{
Path: "static/smartdata/Front0/first_date_analysis.txt",
Content: []byte(`The Current System Time Is 2023_09_22_14_19_39
Model Info: ATA Micron_5300_MTFD
Serial Number: 22443C4EE184
`),
},
{
Path: "user/test1.csv",
Content: []byte(`Record Time Stamp,Severity Level,Severity Level ID,SensorTypeStr,SensorName,Event Dir,Event Occurred Time,DescInfo,Explanation,Suggestion
2025-04-01 08:50:13,Minor,0x1,NA,NA,NA,2025-04-01 08:50:13,"SSH login failed from IP: 10.200.10.121 user: admin"," "," "
Pre-Init,Info,0x0,Management Subsystem Health,Health,Assertion event,Pre-Init,"Management controller off-line"," "," "
2025-04-01 08:51:10,Major,0x2,Power Supply,PSU1_Status,Assertion event,2025-04-01 08:51:10,"Power Supply AC lost"," "," "
`),
},
}
	result, err := p.Parse(files)
	if err != nil {
		t.Fatalf("parse failed: %v", err)
	}
	if result.Hardware == nil {
		t.Fatalf("expected hardware section")
	}
	if len(result.Hardware.CPUs) != 2 {
		t.Fatalf("expected 2 CPUs from hardware_info.ini, got %d", len(result.Hardware.CPUs))
	}
	if result.Hardware.CPUs[0].FrequencyMHz != 2800 {
		t.Fatalf("expected CPU frequency 2800MHz, got %d", result.Hardware.CPUs[0].FrequencyMHz)
	}
	if len(result.Hardware.Memory) != 2 {
		t.Fatalf("expected 2 DIMMs from hardware_info.ini, got %d", len(result.Hardware.Memory))
	}
	if result.Hardware.Memory[0].SizeMB != 65536 {
		t.Fatalf("expected DIMM size 65536MB, got %d", result.Hardware.Memory[0].SizeMB)
	}
	if len(result.Hardware.Firmware) < 4 {
		t.Fatalf("expected firmware entries from firmware_version.ini, got %d", len(result.Hardware.Firmware))
	}
	if result.Hardware.BoardInfo.Version == "" {
		t.Fatalf("expected board version from board_cfg.ini")
	}
	if !strings.Contains(result.Hardware.BoardInfo.Description, "CustomerID: 255") {
		t.Fatalf("expected board description enrichment from board_cfg.ini, got %q", result.Hardware.BoardInfo.Description)
	}
	if len(result.Hardware.Storage) != 2 {
		t.Fatalf("expected 2 unique storage devices from hardware.info, got %d", len(result.Hardware.Storage))
	}
	var nvmeFound bool
	var diskModelEnriched bool
	for _, s := range result.Hardware.Storage {
		if s.SerialNumber == "NVME-21" {
			nvmeFound = true
			if s.Type != "nvme" {
				t.Fatalf("expected NVME-21 type nvme, got %q", s.Type)
			}
			if !strings.Contains(strings.ToLower(s.Manufacturer), "samsung") {
				t.Fatalf("expected NVME vendor enrichment to Samsung, got %q", s.Manufacturer)
			}
			if s.Model != "SSD-1.92T/3.84T-NVMe-EV-SFF-sa" {
				t.Fatalf("expected NVME model enrichment from PCIe table, got %q", s.Model)
			}
		}
		if s.SerialNumber == "22443C4EE184" && strings.Contains(s.Model, "Micron") {
			diskModelEnriched = true
		}
	}
	if !nvmeFound {
		t.Fatalf("expected deduped NVME storage by serial NVME-21")
	}
	if !diskModelEnriched {
		t.Fatalf("expected disk model enrichment from smartdata by serial")
	}
	if len(result.Hardware.PowerSupply) != 2 {
		t.Fatalf("expected 2 PSUs from psu_cfg.ini, got %d", len(result.Hardware.PowerSupply))
	}
	if result.Hardware.PowerSupply[0].WattageW == 0 {
		t.Fatalf("expected PSU wattage parsed, got 0")
	}
	if len(result.Hardware.NetworkAdapters) != 2 {
		t.Fatalf("expected 2 host network adapters from hardware_info.ini, got %d", len(result.Hardware.NetworkAdapters))
	}
	if len(result.Hardware.NetworkCards) != 2 {
		t.Fatalf("expected 2 network cards synthesized from adapters, got %d", len(result.Hardware.NetworkCards))
}
var g5NIC models.NetworkAdapter
var g5NICFound bool
for _, nic := range result.Hardware.NetworkAdapters {
if strings.EqualFold(nic.Slot, "PCIe 1") && strings.Contains(strings.ToLower(nic.Model), "bcm957414") {
g5NIC = nic
g5NICFound = true
break
}
}
if !g5NICFound {
t.Fatalf("expected host NIC PCIe 1 from hardware_info.ini, got %+v", result.Hardware.NetworkAdapters)
}
if !strings.Contains(strings.ToLower(g5NIC.Vendor), "broadcom") {
t.Fatalf("expected G5 NIC vendor from Vendor ID, got %q", g5NIC.Vendor)
}
if g5NIC.SerialNumber != "NICSN-G5-001" {
t.Fatalf("expected G5 NIC serial from PCIe card section, got %q", g5NIC.SerialNumber)
}
if g5NIC.PartNumber != "NICPN-G5-001" {
t.Fatalf("expected G5 NIC part number from PCIe card section, got %q", g5NIC.PartNumber)
}
if g5NIC.Firmware != "21.80.1" {
t.Fatalf("expected G5 NIC firmware from PCIe card section, got %q", g5NIC.Firmware)
}
if len(result.Hardware.Devices) != 5 {
t.Fatalf("expected 5 topology devices from Raid_BP_Conf_Info.ini (3 BP + 2 RAID), got %d", len(result.Hardware.Devices))
}
var bpFound bool
var raidFound bool
for _, d := range result.Hardware.Devices {
if strings.Contains(d.ID, "h3c-bp-") && strings.Contains(d.Model, "BP_G5_8SFF") {
bpFound = true
}
desc, _ := d.Details["description"].(string)
if strings.Contains(d.ID, "h3c-raid-slot-3") && strings.Contains(desc, "SAS ports: 2") {
raidFound = true
}
}
if !bpFound || !raidFound {
t.Fatalf("expected parsed backplane and RAID topology devices, got %+v", result.Hardware.Devices)
}
if len(result.Hardware.Volumes) != 2 {
t.Fatalf("expected 2 RAID volumes (same LD ID on different controllers), got %d", len(result.Hardware.Volumes))
}
var raid1Found bool
var raid50Found bool
for _, v := range result.Hardware.Volumes {
if strings.Contains(v.Controller, "slot 3") {
raid1Found = v.RAIDLevel == "RAID1" && v.CapacityBytes > 0
}
if strings.Contains(v.Controller, "slot 6") {
raid50Found = v.RAIDLevel == "RAID50" && v.CapacityBytes > 0
}
}
if !raid1Found || !raid50Found {
t.Fatalf("expected RAID1 and RAID50 volumes with parsed capacities, got %+v", result.Hardware.Volumes)
}
if len(result.Events) != 2 {
t.Fatalf("expected 2 CSV events (Pre-Init skipped), got %d", len(result.Events))
}
if result.Events[0].Severity != models.SeverityWarning {
t.Fatalf("expected Minor CSV severity mapped to warning, got %q", result.Events[0].Severity)
}
if result.Events[1].Severity != models.SeverityCritical {
t.Fatalf("expected Major CSV severity mapped to critical, got %q", result.Events[1].Severity)
}
}
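The severity mapping the test above asserts (Minor → warning, Major → critical, "Pre-Init" rows skipped) can be sketched standalone. `mapCSVSeverity`, `skipCSVRow`, and the severity constants are hypothetical stand-ins for the parser's internals, not the real API:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for the models.Severity* values used by the real parser.
const (
	severityInfo     = "info"
	severityWarning  = "warning"
	severityCritical = "critical"
)

// mapCSVSeverity sketches how the CSV "Severity Level" column could map onto
// internal severities (assumption: Minor → warning, Major → critical).
func mapCSVSeverity(level string) string {
	switch strings.ToLower(strings.TrimSpace(level)) {
	case "minor":
		return severityWarning
	case "major", "critical":
		return severityCritical
	default:
		return severityInfo
	}
}

// skipCSVRow mirrors the test expectation that "Pre-Init" rows carry no
// usable timestamp and are dropped from the event list.
func skipCSVRow(timestamp string) bool {
	return strings.EqualFold(strings.TrimSpace(timestamp), "Pre-Init")
}

func main() {
	fmt.Println(mapCSVSeverity("Minor")) // warning
	fmt.Println(mapCSVSeverity("Major")) // critical
	fmt.Println(skipCSVRow("Pre-Init"))  // true
}
```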


@@ -3,12 +3,15 @@ package inspur
import (
"encoding/json"
"fmt"
"regexp"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser/vendors/pciids"
)
var rawHexPCIDeviceRegex = regexp.MustCompile(`(?i)^0x[0-9a-f]+$`)
// AssetJSON represents the structure of Inspur asset.json file
type AssetJSON struct {
VersionInfo []struct {
@@ -55,6 +58,7 @@ type AssetJSON struct {
} `json:"MemInfo"`
HddInfo []struct {
PresentBitmap []int `json:"PresentBitmap"`
SerialNumber string `json:"SerialNumber"`
Manufacturer string `json:"Manufacturer"`
ModelName string `json:"ModelName"`
@@ -158,8 +162,19 @@ func ParseAssetJSON(content []byte) (*models.HardwareConfig, error) {
}
// Parse storage info
seenHDDFW := make(map[string]bool)
for _, hdd := range asset.HddInfo {
slot := normalizeAssetHDDSlot(hdd.LocationString, hdd.Location, hdd.DiskInterfaceType)
modelName := strings.TrimSpace(hdd.ModelName)
serial := normalizeRedisValue(hdd.SerialNumber)
present := bitmapHasAnyValue(hdd.PresentBitmap)
if !present && (slot != "" || modelName != "" || serial != "" || hdd.Capacity > 0) {
present = true
}
if !present && slot == "" && modelName == "" && serial == "" && hdd.Capacity == 0 {
continue
}
storageType := "HDD"
if hdd.DiskInterfaceType == 5 {
storageType = "NVMe"
@@ -168,35 +183,21 @@ func ParseAssetJSON(content []byte) (*models.HardwareConfig, error) {
}
// Resolve manufacturer: try vendor ID first, then model name extraction
modelName := strings.TrimSpace(hdd.ModelName)
manufacturer := resolveManufacturer(hdd.Manufacturer, modelName)
config.Storage = append(config.Storage, models.Storage{
Slot: hdd.LocationString,
Slot: slot,
Type: storageType,
Model: modelName,
SizeGB: hdd.Capacity,
SerialNumber: hdd.SerialNumber,
SerialNumber: serial,
Manufacturer: manufacturer,
Firmware: hdd.FirmwareVersion,
Interface: diskInterfaceToString(hdd.DiskInterfaceType),
Present: present,
})
// Add HDD firmware to firmware list (deduplicated by model+version)
if hdd.FirmwareVersion != "" {
fwKey := modelName + ":" + hdd.FirmwareVersion
if !seenHDDFW[fwKey] {
slot := hdd.LocationString
if slot == "" {
slot = fmt.Sprintf("%s %dGB", storageType, hdd.Capacity)
}
config.Firmware = append(config.Firmware, models.FirmwareInfo{
DeviceName: fmt.Sprintf("%s (%s)", modelName, slot),
Version: hdd.FirmwareVersion,
})
seenHDDFW[fwKey] = true
}
}
// Disk firmware is already stored in Storage.Firmware — do not duplicate in Hardware.Firmware.
}
// Parse PCIe info
@@ -207,8 +208,8 @@ func ParseAssetJSON(content []byte) (*models.HardwareConfig, error) {
VendorID: pcie.VendorId,
DeviceID: pcie.DeviceId,
BDF: formatBDF(pcie.BusNumber, pcie.DeviceNumber, pcie.FunctionNumber),
LinkWidth: pcie.NegotiatedLinkWidth,
LinkSpeed: pcieLinkSpeedToString(pcie.CurrentLinkSpeed),
LinkWidth: pcie.NegotiatedLinkWidth,
LinkSpeed: pcieLinkSpeedToString(pcie.CurrentLinkSpeed),
MaxLinkWidth: pcie.MaxLinkWidth,
MaxLinkSpeed: pcieLinkSpeedToString(pcie.MaxLinkSpeed),
DeviceClass: pcieClassToString(pcie.ClassCode, pcie.SubClassCode),
@@ -225,25 +226,22 @@ func ParseAssetJSON(content []byte) (*models.HardwareConfig, error) {
}
// Use device name from PCI IDs database if available
if deviceName != "" {
device.DeviceClass = deviceName
device.DeviceClass = normalizeModelLabel(deviceName)
}
config.PCIeDevices = append(config.PCIeDevices, device)
// Extract GPUs (class 3 = display controller)
if pcie.ClassCode == 3 {
gpuModel := deviceName
if gpuModel == "" {
gpuModel = pcieClassToString(pcie.ClassCode, pcie.SubClassCode)
}
gpuModel := normalizeGPUModel(pcie.VendorId, pcie.DeviceId, deviceName, pcie.ClassCode, pcie.SubClassCode)
gpu := models.GPU{
Slot: pcie.LocString,
Model: gpuModel,
Manufacturer: vendor,
VendorID: pcie.VendorId,
DeviceID: pcie.DeviceId,
BDF: formatBDF(pcie.BusNumber, pcie.DeviceNumber, pcie.FunctionNumber),
CurrentLinkWidth: pcie.NegotiatedLinkWidth,
CurrentLinkSpeed: pcieLinkSpeedToString(pcie.CurrentLinkSpeed),
Slot: pcie.LocString,
Model: gpuModel,
Manufacturer: vendor,
VendorID: pcie.VendorId,
DeviceID: pcie.DeviceId,
BDF: formatBDF(pcie.BusNumber, pcie.DeviceNumber, pcie.FunctionNumber),
CurrentLinkWidth: pcie.NegotiatedLinkWidth,
CurrentLinkSpeed: pcieLinkSpeedToString(pcie.CurrentLinkSpeed),
MaxLinkWidth: pcie.MaxLinkWidth,
MaxLinkSpeed: pcieLinkSpeedToString(pcie.MaxLinkSpeed),
}
@@ -260,6 +258,45 @@ func ParseAssetJSON(content []byte) (*models.HardwareConfig, error) {
return config, nil
}
func normalizeModelLabel(v string) string {
v = strings.TrimSpace(v)
if v == "" {
return ""
}
return strings.Join(strings.Fields(v), " ")
}
func normalizeGPUModel(vendorID, deviceID int, model string, classCode, subClass int) string {
model = normalizeModelLabel(model)
if model == "" || rawHexPCIDeviceRegex.MatchString(model) || isGenericGPUModelLabel(model) {
if pciModel := normalizeModelLabel(pciids.DeviceName(vendorID, deviceID)); pciModel != "" {
model = pciModel
}
}
if model == "" || isGenericGPUModelLabel(model) {
model = pcieClassToString(classCode, subClass)
}
// Last fallback for unknown NVIDIA display devices: expose PCI DeviceID
// instead of generic "3D Controller".
if (model == "" || strings.EqualFold(model, "3D Controller")) && vendorID == 0x10de && deviceID > 0 {
return fmt.Sprintf("0x%04X", deviceID)
}
return model
}
func isGenericGPUModelLabel(model string) bool {
switch strings.ToLower(strings.TrimSpace(model)) {
case "", "gpu", "display", "display controller", "vga", "3d controller", "other", "unknown":
return true
default:
return false
}
}
func memoryTypeToString(memType int) string {
switch memType {
case 26:
@@ -284,6 +321,29 @@ func diskInterfaceToString(ifType int) string {
}
}
func normalizeAssetHDDSlot(locationString string, location int, diskInterfaceType int) string {
slot := strings.TrimSpace(locationString)
if slot != "" {
return slot
}
if location < 0 {
return ""
}
if diskInterfaceType == 5 {
return fmt.Sprintf("OB%02d", location+1)
}
return fmt.Sprintf("%d", location)
}
func bitmapHasAnyValue(values []int) bool {
for _, v := range values {
if v != 0 {
return true
}
}
return false
}
func pcieLinkSpeedToString(speed int) string {
switch speed {
case 1:


@@ -0,0 +1,48 @@
package inspur
import "testing"
func TestParseAssetJSON_NVIDIAGPUModelFromPCIIDs(t *testing.T) {
raw := []byte(`{
"VersionInfo": [],
"CpuInfo": [],
"MemInfo": {"MemCommonInfo": [], "DimmInfo": []},
"HddInfo": [],
"PcieInfo": [{
"VendorId": 4318,
"DeviceId": 9019,
"BusNumber": 12,
"DeviceNumber": 0,
"FunctionNumber": 0,
"MaxLinkWidth": 16,
"MaxLinkSpeed": 5,
"NegotiatedLinkWidth": 16,
"CurrentLinkSpeed": 5,
"ClassCode": 3,
"SubClassCode": 2,
"PcieSlot": 11,
"LocString": "#CPU0_PCIE2",
"PartNumber": null,
"SerialNumber": null,
"Mac": []
}]
}`)
hw, err := ParseAssetJSON(raw)
if err != nil {
t.Fatalf("ParseAssetJSON failed: %v", err)
}
if len(hw.GPUs) != 1 {
t.Fatalf("expected 1 GPU, got %d", len(hw.GPUs))
}
if hw.GPUs[0].Model != "GH100 [H200 NVL]" {
t.Fatalf("expected model GH100 [H200 NVL], got %q", hw.GPUs[0].Model)
}
}
func TestNormalizeGPUModel_FallbackToDeviceIDForUnknownNVIDIA(t *testing.T) {
got := normalizeGPUModel(0x10de, 0xbeef, "0xBEEF\t", 3, 2)
if got != "0xBEEF" {
t.Fatalf("expected 0xBEEF, got %q", got)
}
}
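The fallback chain exercised by these tests can be reproduced self-contained. This is a sketch: `deviceName` is a stub with one illustrative entry standing in for the real `pciids.DeviceName` database lookup, and the class string is passed pre-resolved instead of calling `pcieClassToString`:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var rawHexPCIDeviceRegex = regexp.MustCompile(`(?i)^0x[0-9a-f]+$`)

// deviceName stubs pciids.DeviceName; the real lookup reads the pci.ids database.
func deviceName(vendorID, deviceID int) string {
	if vendorID == 0x10de && deviceID == 0x233B {
		return "GH100 [H200 NVL]" // illustrative entry, not live pci.ids data
	}
	return ""
}

func normalizeModelLabel(v string) string {
	return strings.Join(strings.Fields(strings.TrimSpace(v)), " ")
}

func isGenericGPUModelLabel(model string) bool {
	switch strings.ToLower(strings.TrimSpace(model)) {
	case "", "gpu", "display", "display controller", "vga", "3d controller", "other", "unknown":
		return true
	default:
		return false
	}
}

// normalizeGPUModel mirrors the parser's chain: raw hex or generic label →
// pci.ids lookup → PCI class string → 0xDEVID for unknown NVIDIA devices.
func normalizeGPUModel(vendorID, deviceID int, model, classString string) string {
	model = normalizeModelLabel(model)
	if model == "" || rawHexPCIDeviceRegex.MatchString(model) || isGenericGPUModelLabel(model) {
		if pciModel := normalizeModelLabel(deviceName(vendorID, deviceID)); pciModel != "" {
			model = pciModel
		}
	}
	if model == "" || isGenericGPUModelLabel(model) {
		model = classString
	}
	if (model == "" || strings.EqualFold(model, "3D Controller")) && vendorID == 0x10de && deviceID > 0 {
		return fmt.Sprintf("0x%04X", deviceID)
	}
	return model
}

func main() {
	fmt.Println(normalizeGPUModel(0x10de, 0x233B, "0x233b", "3D Controller")) // GH100 [H200 NVL]
	fmt.Println(normalizeGPUModel(0x10de, 0xBEEF, "", "3D Controller"))       // 0xBEEF
}
```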


@@ -8,6 +8,7 @@ import (
"time"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser/vendors/pciids"
)
// ParseComponentLog parses component.log file and extracts detailed hardware info
@@ -45,27 +46,38 @@ func ParseComponentLogEvents(content []byte) []models.Event {
// Parse RESTful Memory info for Warning/Error status
memEvents := parseMemoryEvents(text)
events = append(events, memEvents...)
events = append(events, parseFanEvents(text)...)
return events
}
// ParseComponentLogSensors extracts sensor readings from component.log JSON sections.
func ParseComponentLogSensors(content []byte) []models.SensorReading {
text := string(content)
var out []models.SensorReading
out = append(out, parseFanSensors(text)...)
out = append(out, parseDiskBackplaneSensors(text)...)
out = append(out, parsePSUSummarySensors(text)...)
return out
}
// MemoryRESTInfo represents the RESTful Memory info structure
type MemoryRESTInfo struct {
MemModules []struct {
MemModID int `json:"mem_mod_id"`
ConfigStatus int `json:"config_status"`
MemModSlot string `json:"mem_mod_slot"`
MemModStatus int `json:"mem_mod_status"`
MemModSize int `json:"mem_mod_size"`
MemModType string `json:"mem_mod_type"`
MemModTechnology string `json:"mem_mod_technology"`
MemModFrequency int `json:"mem_mod_frequency"`
MemModCurrentFreq int `json:"mem_mod_current_frequency"`
MemModVendor string `json:"mem_mod_vendor"`
MemModPartNum string `json:"mem_mod_part_num"`
MemModSerial string `json:"mem_mod_serial_num"`
MemModRanks int `json:"mem_mod_ranks"`
Status string `json:"status"`
MemModID int `json:"mem_mod_id"`
ConfigStatus int `json:"config_status"`
MemModSlot string `json:"mem_mod_slot"`
MemModStatus int `json:"mem_mod_status"`
MemModSize int `json:"mem_mod_size"`
MemModType string `json:"mem_mod_type"`
MemModTechnology string `json:"mem_mod_technology"`
MemModFrequency int `json:"mem_mod_frequency"`
MemModCurrentFreq int `json:"mem_mod_current_frequency"`
MemModVendor string `json:"mem_mod_vendor"`
MemModPartNum string `json:"mem_mod_part_num"`
MemModSerial string `json:"mem_mod_serial_num"`
MemModRanks int `json:"mem_mod_ranks"`
Status string `json:"status"`
} `json:"mem_modules"`
TotalMemoryCount int `json:"total_memory_count"`
PresentMemoryCount int `json:"present_memory_count"`
@@ -112,21 +124,21 @@ func parseMemoryInfo(text string, hw *models.HardwareConfig) {
// PSURESTInfo represents the RESTful PSU info structure
type PSURESTInfo struct {
PowerSupplies []struct {
ID int `json:"id"`
Present int `json:"present"`
VendorID string `json:"vendor_id"`
Model string `json:"model"`
SerialNum string `json:"serial_num"`
PartNum string `json:"part_num"`
FwVer string `json:"fw_ver"`
InputType string `json:"input_type"`
Status string `json:"status"`
RatedPower int `json:"rated_power"`
PSInPower int `json:"ps_in_power"`
PSOutPower int `json:"ps_out_power"`
PSInVolt float64 `json:"ps_in_volt"`
PSOutVolt float64 `json:"ps_out_volt"`
PSUMaxTemp int `json:"psu_max_temperature"`
ID int `json:"id"`
Present int `json:"present"`
VendorID string `json:"vendor_id"`
Model string `json:"model"`
SerialNum string `json:"serial_num"`
PartNum string `json:"part_num"`
FwVer string `json:"fw_ver"`
InputType string `json:"input_type"`
Status string `json:"status"`
RatedPower int `json:"rated_power"`
PSInPower int `json:"ps_in_power"`
PSOutPower int `json:"ps_out_power"`
PSInVolt float64 `json:"ps_in_volt"`
PSOutVolt float64 `json:"ps_out_volt"`
PSUMaxTemp int `json:"psu_max_temperature"`
} `json:"power_supplies"`
PresentPowerReading int `json:"present_power_reading"`
}
@@ -209,20 +221,49 @@ func parseHDDInfo(text string, hw *models.HardwareConfig) {
})
for _, hdd := range hddInfo {
if hdd.Present == 1 {
hddMap[hdd.LocationString] = struct {
slot := strings.TrimSpace(hdd.LocationString)
if slot == "" {
slot = fmt.Sprintf("HDD%d", hdd.ID)
}
hddMap[slot] = struct {
SN string
Model string
Firmware string
Mfr string
}{
SN: strings.TrimSpace(hdd.SN),
SN: normalizeRedisValue(hdd.SN),
Model: strings.TrimSpace(hdd.Model),
Firmware: strings.TrimSpace(hdd.Firmware),
Firmware: normalizeRedisValue(hdd.Firmware),
Mfr: strings.TrimSpace(hdd.Manufacture),
}
}
}
// Merge into existing inventory first (asset/other sections).
for i := range hw.Storage {
slot := strings.TrimSpace(hw.Storage[i].Slot)
if slot == "" {
continue
}
detail, ok := hddMap[slot]
if !ok {
continue
}
if normalizeRedisValue(hw.Storage[i].SerialNumber) == "" {
hw.Storage[i].SerialNumber = detail.SN
}
if hw.Storage[i].Model == "" {
hw.Storage[i].Model = detail.Model
}
if normalizeRedisValue(hw.Storage[i].Firmware) == "" {
hw.Storage[i].Firmware = detail.Firmware
}
if hw.Storage[i].Manufacturer == "" {
hw.Storage[i].Manufacturer = detail.Mfr
}
hw.Storage[i].Present = true
}
// If storage is empty, populate from HDD info
if len(hw.Storage) == 0 {
for _, hdd := range hddInfo {
@@ -239,21 +280,42 @@ func parseHDDInfo(text string, hw *models.HardwareConfig) {
if hdd.CapableSpeed == 12 {
iface = "SAS"
}
slot := strings.TrimSpace(hdd.LocationString)
if slot == "" {
slot = fmt.Sprintf("HDD%d", hdd.ID)
}
hw.Storage = append(hw.Storage, models.Storage{
Slot: hdd.LocationString,
Slot: slot,
Type: storType,
Model: model,
SizeGB: hdd.Capacity,
SerialNumber: strings.TrimSpace(hdd.SN),
SerialNumber: normalizeRedisValue(hdd.SN),
Manufacturer: extractStorageManufacturer(model),
Firmware: strings.TrimSpace(hdd.Firmware),
Firmware: normalizeRedisValue(hdd.Firmware),
Interface: iface,
Present: true,
})
}
}
}
// FanRESTInfo represents the RESTful fan info structure.
type FanRESTInfo struct {
Fans []struct {
ID int `json:"id"`
FanName string `json:"fan_name"`
Present string `json:"present"`
Status string `json:"status"`
StatusStr string `json:"status_str"`
SpeedRPM int `json:"speed_rpm"`
SpeedPercent int `json:"speed_percent"`
MaxSpeedRPM int `json:"max_speed_rpm"`
FanModel string `json:"fan_model"`
} `json:"fans"`
FansPower int `json:"fans_power"`
}
// NetworkAdapterRESTInfo represents the RESTful Network Adapter info structure
type NetworkAdapterRESTInfo struct {
SysAdapters []struct {
@@ -304,17 +366,28 @@ func parseNetworkAdapterInfo(text string, hw *models.HardwareConfig) {
}
}
model := normalizeModelLabel(adapter.Model)
if model == "" || looksLikeRawDeviceID(model) {
if resolved := normalizeModelLabel(pciids.DeviceName(adapter.VendorID, adapter.DeviceID)); resolved != "" {
model = resolved
}
}
vendor := normalizeModelLabel(adapter.Vendor)
if vendor == "" {
vendor = normalizeModelLabel(pciids.VendorName(adapter.VendorID))
}
hw.NetworkAdapters = append(hw.NetworkAdapters, models.NetworkAdapter{
Slot: fmt.Sprintf("Slot %d", adapter.Slot),
Location: adapter.Location,
Present: adapter.Present == 1,
Model: strings.TrimSpace(adapter.Model),
Vendor: strings.TrimSpace(adapter.Vendor),
Model: model,
Vendor: vendor,
VendorID: adapter.VendorID,
DeviceID: adapter.DeviceID,
SerialNumber: strings.TrimSpace(adapter.SN),
PartNumber: strings.TrimSpace(adapter.PN),
Firmware: adapter.FwVer,
SerialNumber: normalizeRedisValue(adapter.SN),
PartNumber: normalizeRedisValue(adapter.PN),
Firmware: normalizeRedisValue(adapter.FwVer),
PortCount: adapter.PortNum,
PortType: adapter.PortType,
MACAddresses: macs,
@@ -323,6 +396,223 @@ func parseNetworkAdapterInfo(text string, hw *models.HardwareConfig) {
}
}
func parseFanSensors(text string) []models.SensorReading {
re := regexp.MustCompile(`RESTful fan info:\s*(\{[\s\S]*?\})\s*RESTful diskbackplane`)
match := re.FindStringSubmatch(text)
if match == nil {
return nil
}
jsonStr := strings.ReplaceAll(match[1], "\n", "")
var fanInfo FanRESTInfo
if err := json.Unmarshal([]byte(jsonStr), &fanInfo); err != nil {
return nil
}
out := make([]models.SensorReading, 0, len(fanInfo.Fans)+1)
for _, fan := range fanInfo.Fans {
name := strings.TrimSpace(fan.FanName)
if name == "" {
name = fmt.Sprintf("FAN%d", fan.ID)
}
status := normalizeComponentStatus(fan.StatusStr, fan.Status, fan.Present)
raw := fmt.Sprintf("rpm=%d pct=%d model=%s max_rpm=%d", fan.SpeedRPM, fan.SpeedPercent, fan.FanModel, fan.MaxSpeedRPM)
out = append(out, models.SensorReading{
Name: name,
Type: "fan_speed",
Value: float64(fan.SpeedRPM),
Unit: "RPM",
RawValue: raw,
Status: status,
})
}
if fanInfo.FansPower > 0 {
out = append(out, models.SensorReading{
Name: "Fans_Power",
Type: "power",
Value: float64(fanInfo.FansPower),
Unit: "W",
RawValue: fmt.Sprintf("%d", fanInfo.FansPower),
Status: "OK",
})
}
return out
}
func parseFanEvents(text string) []models.Event {
re := regexp.MustCompile(`RESTful fan info:\s*(\{[\s\S]*?\})\s*RESTful diskbackplane`)
match := re.FindStringSubmatch(text)
if match == nil {
return nil
}
jsonStr := strings.ReplaceAll(match[1], "\n", "")
var fanInfo FanRESTInfo
if err := json.Unmarshal([]byte(jsonStr), &fanInfo); err != nil {
return nil
}
var events []models.Event
for _, fan := range fanInfo.Fans {
status := normalizeComponentStatus(fan.StatusStr, fan.Status, fan.Present)
if isHealthyComponentStatus(status) {
continue
}
name := strings.TrimSpace(fan.FanName)
if name == "" {
name = fmt.Sprintf("FAN%d", fan.ID)
}
severity := models.SeverityWarning
lowStatus := strings.ToLower(status)
if strings.Contains(lowStatus, "critical") || strings.Contains(lowStatus, "fail") || strings.Contains(lowStatus, "error") {
severity = models.SeverityCritical
}
events = append(events, models.Event{
ID: fmt.Sprintf("fan_%d_status", fan.ID),
Timestamp: time.Now(),
Source: "Fan",
SensorType: "fan",
SensorName: name,
EventType: "Fan Status",
Severity: severity,
Description: fmt.Sprintf("%s reports %s", name, status),
RawData: fmt.Sprintf("rpm=%d pct=%d model=%s", fan.SpeedRPM, fan.SpeedPercent, fan.FanModel),
})
}
return events
}
func parseDiskBackplaneSensors(text string) []models.SensorReading {
re := regexp.MustCompile(`RESTful diskbackplane info:\s*(\[[\s\S]*?\])\s*BMC`)
match := re.FindStringSubmatch(text)
if match == nil {
return nil
}
jsonStr := strings.ReplaceAll(match[1], "\n", "")
var backplaneInfo DiskBackplaneRESTInfo
if err := json.Unmarshal([]byte(jsonStr), &backplaneInfo); err != nil {
return nil
}
out := make([]models.SensorReading, 0, len(backplaneInfo))
for _, bp := range backplaneInfo {
if bp.Present != 1 {
continue
}
name := fmt.Sprintf("Backplane%d_Temp", bp.BackplaneIndex)
status := "OK"
if bp.Temperature <= 0 {
status = "unknown"
}
raw := fmt.Sprintf("front=%d ports=%d drives=%d cpld=%s", bp.Front, bp.PortCount, bp.DriverCount, bp.CPLDVersion)
out = append(out, models.SensorReading{
Name: name,
Type: "temperature",
Value: float64(bp.Temperature),
Unit: "C",
RawValue: raw,
Status: status,
})
}
return out
}
func parsePSUSummarySensors(text string) []models.SensorReading {
re := regexp.MustCompile(`RESTful PSU info:\s*(\{[\s\S]*?\})\s*RESTful Network`)
match := re.FindStringSubmatch(text)
if match == nil {
return nil
}
jsonStr := strings.ReplaceAll(match[1], "\n", "")
var psuInfo PSURESTInfo
if err := json.Unmarshal([]byte(jsonStr), &psuInfo); err != nil {
return nil
}
out := make([]models.SensorReading, 0, len(psuInfo.PowerSupplies)*3+1)
if psuInfo.PresentPowerReading > 0 {
out = append(out, models.SensorReading{
Name: "PSU_Present_Power_Reading",
Type: "power",
Value: float64(psuInfo.PresentPowerReading),
Unit: "W",
RawValue: fmt.Sprintf("%d", psuInfo.PresentPowerReading),
Status: "OK",
})
}
for _, psu := range psuInfo.PowerSupplies {
if psu.Present != 1 {
continue
}
status := normalizeComponentStatus(psu.Status)
out = append(out, models.SensorReading{
Name: fmt.Sprintf("PSU%d_InputPower", psu.ID),
Type: "power",
Value: float64(psu.PSInPower),
Unit: "W",
RawValue: fmt.Sprintf("%d", psu.PSInPower),
Status: status,
})
out = append(out, models.SensorReading{
Name: fmt.Sprintf("PSU%d_OutputPower", psu.ID),
Type: "power",
Value: float64(psu.PSOutPower),
Unit: "W",
RawValue: fmt.Sprintf("%d", psu.PSOutPower),
Status: status,
})
out = append(out, models.SensorReading{
Name: fmt.Sprintf("PSU%d_Temp", psu.ID),
Type: "temperature",
Value: float64(psu.PSUMaxTemp),
Unit: "C",
RawValue: fmt.Sprintf("%d", psu.PSUMaxTemp),
Status: status,
})
}
return out
}
func normalizeComponentStatus(values ...string) string {
for _, v := range values {
s := strings.TrimSpace(v)
if s == "" {
continue
}
return s
}
return "unknown"
}
func isHealthyComponentStatus(status string) bool {
switch strings.ToLower(strings.TrimSpace(status)) {
case "", "ok", "normal", "present", "enabled":
return true
default:
return false
}
}
var rawDeviceIDLikeRegex = regexp.MustCompile(`(?i)^(?:0x)?[0-9a-f]{3,4}$`)
func looksLikeRawDeviceID(v string) bool {
v = strings.TrimSpace(v)
if v == "" {
return true
}
return rawDeviceIDLikeRegex.MatchString(v)
}
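The two helpers above gate the fan/PSU event logic: the first non-empty status string wins, and only unhealthy statuses produce events. A minimal sketch reusing the same logic:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeComponentStatus returns the first non-empty status string,
// falling back to "unknown" — same logic as the parser above.
func normalizeComponentStatus(values ...string) string {
	for _, v := range values {
		if s := strings.TrimSpace(v); s != "" {
			return s
		}
	}
	return "unknown"
}

func isHealthyComponentStatus(status string) bool {
	switch strings.ToLower(strings.TrimSpace(status)) {
	case "", "ok", "normal", "present", "enabled":
		return true
	default:
		return false
	}
}

func main() {
	// A fan record carries status_str, status, and present; the first
	// non-empty value shadows the rest.
	s := normalizeComponentStatus("", "Critical", "OK")
	fmt.Println(s)                                    // Critical
	fmt.Println(isHealthyComponentStatus(s))          // false → an event is emitted
	fmt.Println(normalizeComponentStatus("", "", "")) // unknown
}
```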
func parseMemoryEvents(text string) []models.Event {
var events []models.Event
@@ -452,28 +742,88 @@ func parseDiskBackplaneInfo(text string, hw *models.HardwareConfig) {
return
}
// Create storage entries based on backplane info
presentByBackplane := make(map[int]int)
totalPresent := 0
for _, bp := range backplaneInfo {
if bp.Present != 1 {
continue
}
if bp.DriverCount <= 0 {
continue
}
limit := bp.DriverCount
if bp.PortCount > 0 && limit > bp.PortCount {
limit = bp.PortCount
}
presentByBackplane[bp.BackplaneIndex] = limit
totalPresent += limit
}
if totalPresent == 0 {
return
}
existingPresent := countPresentStorage(hw.Storage)
remaining := totalPresent - existingPresent
if remaining <= 0 {
return
}
for _, bp := range backplaneInfo {
if bp.Present != 1 || remaining <= 0 {
continue
}
driveCount := presentByBackplane[bp.BackplaneIndex]
if driveCount <= 0 {
continue
}
location := "Rear"
if bp.Front == 1 {
location = "Front"
}
// Create one entry per present drive (bounded by the remaining budget)
for i := 0; i < bp.PortCount; i++ {
isPresent := i < bp.DriverCount
for i := 0; i < driveCount && remaining > 0; i++ {
slot := fmt.Sprintf("BP%d:%d", bp.BackplaneIndex, i)
if hasStorageSlot(hw.Storage, slot) {
continue
}
hw.Storage = append(hw.Storage, models.Storage{
Slot: fmt.Sprintf("%d", i),
Present: isPresent,
Slot: slot,
Present: true,
Location: location,
BackplaneID: bp.BackplaneIndex,
Type: "HDD",
})
remaining--
}
}
}
func countPresentStorage(storage []models.Storage) int {
count := 0
for _, dev := range storage {
if dev.Present {
count++
continue
}
if strings.TrimSpace(dev.Slot) != "" && (normalizeRedisValue(dev.Model) != "" || normalizeRedisValue(dev.SerialNumber) != "" || dev.SizeGB > 0) {
count++
}
}
return count
}
func hasStorageSlot(storage []models.Storage, slot string) bool {
slot = strings.ToLower(strings.TrimSpace(slot))
if slot == "" {
return false
}
for _, dev := range storage {
if strings.ToLower(strings.TrimSpace(dev.Slot)) == slot {
return true
}
}
return false
}
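The backplane synthesis above caps new entries at totalPresent minus the drives already in inventory and skips slots that already exist. A reduced sketch of that budget-and-dedup loop, with a simplified `storage` type standing in for `models.Storage`:

```go
package main

import (
	"fmt"
	"strings"
)

type storage struct {
	Slot    string
	Present bool
}

// hasStorageSlot matches the parser's case-insensitive slot lookup.
func hasStorageSlot(devs []storage, slot string) bool {
	slot = strings.ToLower(strings.TrimSpace(slot))
	if slot == "" {
		return false
	}
	for _, d := range devs {
		if strings.ToLower(strings.TrimSpace(d.Slot)) == slot {
			return true
		}
	}
	return false
}

func main() {
	// One drive is already known from the asset/HDD sections; the backplane
	// reports 3 present drives, so only 2 synthetic BP slots are added.
	devs := []storage{{Slot: "BP0:0", Present: true}}
	remaining := 3 - 1 // totalPresent - existingPresent
	for i := 0; i < 3 && remaining > 0; i++ {
		slot := fmt.Sprintf("BP%d:%d", 0, i)
		if hasStorageSlot(devs, slot) {
			continue // never duplicate a slot already in inventory
		}
		devs = append(devs, storage{Slot: slot, Present: true})
		remaining--
	}
	fmt.Println(len(devs)) // 3
}
```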


@@ -0,0 +1,166 @@
package inspur
import (
"strings"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestParseNetworkAdapterInfo_ResolvesModelFromPCIIDsForRawHexModel(t *testing.T) {
text := `RESTful Network Adapter info:
{
"sys_adapters": [
{
"id": 1,
"name": "NIC1",
"Location": "#CPU0_PCIE4",
"present": 1,
"slot": 4,
"vendor_id": 32902,
"device_id": 5409,
"vendor": "",
"model": "0x1521",
"fw_ver": "",
"status": "OK",
"sn": "",
"pn": "",
"port_num": 4,
"port_type": "Base-T",
"ports": []
}
]
}
RESTful fan`
hw := &models.HardwareConfig{}
parseNetworkAdapterInfo(text, hw)
if len(hw.NetworkAdapters) != 1 {
t.Fatalf("expected 1 network adapter, got %d", len(hw.NetworkAdapters))
}
got := hw.NetworkAdapters[0]
if got.Model == "" {
t.Fatalf("expected NIC model resolved from pci.ids, got empty")
}
if !strings.Contains(strings.ToUpper(got.Model), "I350") {
t.Fatalf("expected I350 in model, got %q", got.Model)
}
if got.Vendor == "" {
t.Fatalf("expected NIC vendor resolved from pci.ids")
}
}
func TestParseComponentLogSensors_ExtractsFanBackplaneAndPSUSummary(t *testing.T) {
text := `RESTful PSU info:
{
"power_supplies": [
{ "id": 0, "present": 1, "status": "OK", "ps_in_power": 123, "ps_out_power": 110, "psu_max_temperature": 41 }
],
"present_power_reading": 999
}
RESTful Network Adapter info:
{ "sys_adapters": [] }
RESTful fan info:
{
"fans": [
{ "id": 1, "fan_name": "FAN0_F_Speed", "present": "OK", "status": "OK", "status_str": "OK", "speed_rpm": 9200, "speed_percent": 35, "max_speed_rpm": 20000, "fan_model": "6056" }
],
"fans_power": 33
}
RESTful diskbackplane info:
[
{ "port_count": 8, "driver_count": 4, "front": 1, "backplane_index": 0, "present": 1, "cpld_version": "3.1", "temperature": 18 }
]
BMC`
sensors := ParseComponentLogSensors([]byte(text))
if len(sensors) == 0 {
t.Fatalf("expected sensors from component.log, got none")
}
has := func(name string) bool {
for _, s := range sensors {
if s.Name == name {
return true
}
}
return false
}
if !has("FAN0_F_Speed") {
t.Fatalf("expected FAN0_F_Speed sensor in parsed output")
}
if !has("Backplane0_Temp") {
t.Fatalf("expected Backplane0_Temp sensor in parsed output")
}
if !has("PSU_Present_Power_Reading") {
t.Fatalf("expected PSU_Present_Power_Reading sensor in parsed output")
}
}
func TestParseComponentLogEvents_FanCriticalStatus(t *testing.T) {
text := `RESTful fan info:
{
"fans": [
{ "id": 7, "fan_name": "FAN3_R_Speed", "present": "OK", "status": "Critical", "status_str": "Critical", "speed_rpm": 0, "speed_percent": 0, "max_speed_rpm": 20000, "fan_model": "6056" }
],
"fans_power": 0
}
RESTful diskbackplane info:
[]
BMC`
events := ParseComponentLogEvents([]byte(text))
if len(events) != 1 {
t.Fatalf("expected 1 fan event, got %d", len(events))
}
if events[0].EventType != "Fan Status" {
t.Fatalf("expected Fan Status event type, got %q", events[0].EventType)
}
if events[0].Severity != models.SeverityCritical {
t.Fatalf("expected critical severity, got %q", events[0].Severity)
}
}
func TestParseHDDInfo_MergesIntoExistingStorage(t *testing.T) {
text := `RESTful HDD info:
[
{
"id": 1,
"present": 1,
"enable": 1,
"SN": "SER123",
"model": "Sample SSD",
"capacity": 1024,
"manufacture": "ACME",
"firmware": "1.0.0",
"locationstring": "OB01",
"capablespeed": 6
}
]
RESTful PSU`
hw := &models.HardwareConfig{
Storage: []models.Storage{
{
Slot: "OB01",
Type: "SSD",
},
},
}
parseHDDInfo(text, hw)
if len(hw.Storage) != 1 {
t.Fatalf("expected 1 storage item, got %d", len(hw.Storage))
}
if hw.Storage[0].SerialNumber != "SER123" {
t.Fatalf("expected serial from HDD section, got %q", hw.Storage[0].SerialNumber)
}
if hw.Storage[0].Model != "Sample SSD" {
t.Fatalf("expected model from HDD section, got %q", hw.Storage[0].Model)
}
if hw.Storage[0].Firmware != "1.0.0" {
t.Fatalf("expected firmware from HDD section, got %q", hw.Storage[0].Firmware)
}
}


@@ -103,8 +103,9 @@ func extractBoardInfo(fruList []models.FRUInfo, hw *models.HardwareConfig) {
return
}
// Look for the main board/chassis FRU entry
// Usually it's the first entry or one with "Builtin FRU" or containing board info
// Look for the main board/chassis FRU entry.
// Keep the first non-empty serial as the server serial and avoid overwriting it
// with module-specific serials (e.g., SCM_FRU).
for _, fru := range fruList {
// Skip empty entries
if fru.ProductName == "" && fru.SerialNumber == "" {
@@ -118,25 +119,23 @@ func extractBoardInfo(fruList []models.FRUInfo, hw *models.HardwareConfig) {
strings.Contains(desc, "chassis") ||
strings.Contains(desc, "board")
// If we haven't set board info yet, or this is a main board entry
if hw.BoardInfo.ProductName == "" || isMainBoard {
if fru.ProductName != "" {
hw.BoardInfo.ProductName = fru.ProductName
}
if fru.SerialNumber != "" {
hw.BoardInfo.SerialNumber = fru.SerialNumber
}
if fru.Manufacturer != "" {
hw.BoardInfo.Manufacturer = fru.Manufacturer
}
if fru.PartNumber != "" {
hw.BoardInfo.PartNumber = fru.PartNumber
}
if fru.SerialNumber != "" && hw.BoardInfo.SerialNumber == "" {
hw.BoardInfo.SerialNumber = fru.SerialNumber
}
if fru.ProductName != "" && (hw.BoardInfo.ProductName == "" || isMainBoard) {
hw.BoardInfo.ProductName = fru.ProductName
}
// Manufacturer from non-main FRU entries (e.g. PSU vendor) should not become server vendor.
if fru.Manufacturer != "" && isMainBoard && hw.BoardInfo.Manufacturer == "" {
hw.BoardInfo.Manufacturer = fru.Manufacturer
}
if fru.PartNumber != "" && (hw.BoardInfo.PartNumber == "" || isMainBoard) {
hw.BoardInfo.PartNumber = fru.PartNumber
}
// If we found a main board entry, stop searching
if isMainBoard && fru.ProductName != "" && fru.SerialNumber != "" {
break
}
// Main board entry with complete data is good enough to stop.
if isMainBoard && hw.BoardInfo.ProductName != "" && hw.BoardInfo.SerialNumber != "" {
break
}
}
}

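The merge rules in the hunk above (first non-empty serial wins, manufacturer taken only from a main-board FRU, stop once a main-board entry has filled product and serial) can be sketched in isolation. `fru` and `board` below are stand-in structs, not the real `models` types:

```go
package main

import (
	"fmt"
	"strings"
)

// Stand-ins for models.FRUInfo / hw.BoardInfo (illustrative only).
type fru struct{ Desc, Product, Serial, Manufacturer string }
type board struct{ Product, Serial, Manufacturer string }

// mergeBoardInfo mirrors the diff's rules: keep the first non-empty serial,
// take manufacturer only from a main-board entry, and stop once a main-board
// entry has filled both product and serial.
func mergeBoardInfo(frus []fru) board {
	var b board
	for _, f := range frus {
		if f.Product == "" && f.Serial == "" {
			continue // skip empty entries
		}
		desc := strings.ToLower(f.Desc)
		isMain := strings.Contains(desc, "builtin") ||
			strings.Contains(desc, "chassis") ||
			strings.Contains(desc, "board")
		if f.Serial != "" && b.Serial == "" {
			b.Serial = f.Serial
		}
		if f.Product != "" && (b.Product == "" || isMain) {
			b.Product = f.Product
		}
		if f.Manufacturer != "" && isMain && b.Manufacturer == "" {
			b.Manufacturer = f.Manufacturer
		}
		if isMain && b.Product != "" && b.Serial != "" {
			break
		}
	}
	return b
}

func main() {
	got := mergeBoardInfo([]fru{
		{Desc: "Builtin FRU Device (ID 0)", Serial: "21D634101"},
		{Desc: "SCM_FRU (ID 8)", Serial: "CAR509K10613C10", Product: "CA", Manufacturer: "inagile"},
	})
	fmt.Println(got.Serial, got.Product, got.Manufacturer)
}
```

With the Builtin/SCM input from the test below, the builtin serial survives and the SCM manufacturer is not promoted to the board.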
View File

@@ -0,0 +1,59 @@
package inspur
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestExtractBoardInfo_PreservesBuiltinSerial(t *testing.T) {
hw := &models.HardwareConfig{}
fruList := []models.FRUInfo{
{
Description: "Builtin FRU Device (ID 0)",
SerialNumber: "21D634101",
},
{
Description: "SCM_FRU (ID 8)",
SerialNumber: "CAR509K10613C10",
ProductName: "CA",
Manufacturer: "inagile",
PartNumber: "YZCA-02758-105",
},
}
extractBoardInfo(fruList, hw)
if hw.BoardInfo.SerialNumber != "21D634101" {
t.Fatalf("expected board serial 21D634101, got %q", hw.BoardInfo.SerialNumber)
}
if hw.BoardInfo.ProductName != "CA" {
t.Fatalf("expected product name CA, got %q", hw.BoardInfo.ProductName)
}
}
func TestExtractBoardInfo_DoesNotUsePSUVendorAsBoardManufacturer(t *testing.T) {
hw := &models.HardwareConfig{}
fruList := []models.FRUInfo{
{
Description: "Builtin FRU Device (ID 0)",
SerialNumber: "2KD605238",
},
{
Description: "PSU0_FRU (ID 30)",
SerialNumber: "PMR315HS10F1A",
ProductName: "AP-CR3000F12BY",
Manufacturer: "APLUSPOWER",
PartNumber: "18XA1M43400C2",
},
}
extractBoardInfo(fruList, hw)
if hw.BoardInfo.SerialNumber != "2KD605238" {
t.Fatalf("expected board serial 2KD605238, got %q", hw.BoardInfo.SerialNumber)
}
if hw.BoardInfo.Manufacturer != "" {
t.Fatalf("expected empty board manufacturer, got %q", hw.BoardInfo.Manufacturer)
}
}

View File

@@ -0,0 +1,117 @@
package inspur
import (
"regexp"
"sort"
"strconv"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
)
var reFaultGPU = regexp.MustCompile(`\bF_GPU(\d+)\b`)
func applyGPUStatusFromEvents(hw *models.HardwareConfig, events []models.Event) {
if hw == nil || len(hw.GPUs) == 0 {
return
}
gpuByIndex := make(map[int]*models.GPU)
for i := range hw.GPUs {
gpu := &hw.GPUs[i]
idx, ok := extractLogicalGPUIndex(gpu.Slot)
if !ok {
continue
}
gpuByIndex[idx] = gpu
gpu.StatusHistory = nil
gpu.ErrorDescription = ""
}
relevantEvents := make([]models.Event, 0)
for _, e := range events {
if !isGPUFaultEvent(e) || len(extractFaultyGPUSet(e.Description)) == 0 {
continue
}
relevantEvents = append(relevantEvents, e)
}
if len(relevantEvents) == 0 {
for _, gpu := range gpuByIndex {
if strings.TrimSpace(gpu.Status) == "" {
gpu.Status = "OK"
}
}
return
}
sort.Slice(relevantEvents, func(i, j int) bool {
return relevantEvents[i].Timestamp.Before(relevantEvents[j].Timestamp)
})
currentStatus := make(map[int]string, len(gpuByIndex))
lastCriticalDetails := make(map[int]string, len(gpuByIndex))
for idx := range gpuByIndex {
currentStatus[idx] = "OK"
}
for _, e := range relevantEvents {
faultySet := extractFaultyGPUSet(e.Description)
for idx, gpu := range gpuByIndex {
newStatus := "OK"
if faultySet[idx] {
newStatus = "Critical"
lastCriticalDetails[idx] = strings.TrimSpace(e.Description)
}
if currentStatus[idx] != newStatus {
gpu.StatusHistory = append(gpu.StatusHistory, models.StatusHistoryEntry{
Status: newStatus,
ChangedAt: e.Timestamp,
Details: strings.TrimSpace(e.Description),
})
ts := e.Timestamp
gpu.StatusChangedAt = &ts
currentStatus[idx] = newStatus
}
ts := e.Timestamp
gpu.StatusCheckedAt = &ts
}
}
for idx, gpu := range gpuByIndex {
gpu.Status = currentStatus[idx]
if gpu.Status == "Critical" {
gpu.ErrorDescription = lastCriticalDetails[idx]
} else {
gpu.ErrorDescription = ""
}
if gpu.StatusCheckedAt == nil && strings.TrimSpace(gpu.Status) == "" {
gpu.Status = "OK"
}
}
}
func extractFaultyGPUSet(description string) map[int]bool {
faulty := make(map[int]bool)
matches := reFaultGPU.FindAllStringSubmatch(description, -1)
for _, m := range matches {
if len(m) < 2 {
continue
}
idx, err := strconv.Atoi(m[1])
if err == nil && idx >= 0 {
faulty[idx] = true
}
}
return faulty
}
func isGPUFaultEvent(e models.Event) bool {
desc := strings.ToLower(e.Description)
if strings.Contains(desc, "bios miss f_gpu") {
return true
}
return strings.EqualFold(strings.TrimSpace(e.ID), "17FFB002")
}

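The `F_GPU<n>` extraction above is a plain regex scan over the event description. A standalone sketch with the same pattern (helper name is illustrative):

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var reFaultGPU = regexp.MustCompile(`\bF_GPU(\d+)\b`)

// faultyGPUs returns the set of GPU indices named in an IDL fault message,
// e.g. "PCIe Present mismatch BIOS miss F_GPU1 F_GPU3 F_GPU6" -> {1, 3, 6}.
func faultyGPUs(desc string) map[int]bool {
	out := make(map[int]bool)
	for _, m := range reFaultGPU.FindAllStringSubmatch(desc, -1) {
		if idx, err := strconv.Atoi(m[1]); err == nil && idx >= 0 {
			out[idx] = true
		}
	}
	return out
}

func main() {
	fmt.Println(faultyGPUs("PCIe Present mismatch BIOS miss F_GPU1 F_GPU3 F_GPU6"))
}
```

Because each event carries a complete set, a later event that names only `F_GPU6` implicitly recovers GPUs 1 and 3, which is what the status-history loop relies on.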
View File

@@ -0,0 +1,69 @@
package inspur
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestAppendHGXFirmwareFromHWInfo_AppendsInventoryEntries(t *testing.T) {
hw := &models.HardwareConfig{
Firmware: []models.FirmwareInfo{
{DeviceName: "BIOS", Version: "1.0.0"},
},
}
content := []byte(`
{
"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/HGX_FW_BMC_0",
"Id": "HGX_FW_BMC_0",
"Oem": {
"Nvidia": {
"ActiveFirmwareSlot": {"Version": "25.05-A"},
"InactiveFirmwareSlot": {"Version": "25.04-B"}
}
},
"Version": "25.05-A",
"WriteProtected": false
}
{
"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/HGX_FW_GPU_SXM_1",
"Id": "HGX_FW_GPU_SXM_1",
"Version": "97.00.C5.00.0E",
"WriteProtected": false
}
{
"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/HGX_Driver_GPU_SXM_1",
"Id": "HGX_Driver_GPU_SXM_1",
"Version": "",
"WriteProtected": false
}
`)
appendHGXFirmwareFromHWInfo(content, hw)
if len(hw.Firmware) != 5 {
t.Fatalf("expected 5 firmware entries after append, got %d", len(hw.Firmware))
}
seen := make(map[string]string)
for _, fw := range hw.Firmware {
seen[fw.DeviceName] = fw.Version
}
if seen["HGX_FW_BMC_0"] != "25.05-A" {
t.Fatalf("expected HGX_FW_BMC_0 version 25.05-A, got %q", seen["HGX_FW_BMC_0"])
}
if seen["HGX_FW_BMC_0 Active Slot"] != "25.05-A" {
t.Fatalf("expected active slot version, got %q", seen["HGX_FW_BMC_0 Active Slot"])
}
if seen["HGX_FW_BMC_0 Inactive Slot"] != "25.04-B" {
t.Fatalf("expected inactive slot version, got %q", seen["HGX_FW_BMC_0 Inactive Slot"])
}
if seen["HGX_FW_GPU_SXM_1"] != "97.00.C5.00.0E" {
t.Fatalf("expected GPU FW entry, got %q", seen["HGX_FW_GPU_SXM_1"])
}
if _, ok := seen["HGX_Driver_GPU_SXM_1"]; ok {
t.Fatalf("did not expect empty version driver entry")
}
}

View File

@@ -0,0 +1,171 @@
package inspur
import (
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestEnrichGPUsFromHGXHWInfo_UsesHGXLogicalMapping(t *testing.T) {
hw := &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "#GPU6"},
{Slot: "#GPU7"},
{Slot: "#GPU0"},
{Slot: "#CPU0_PE1_E_BMC", Model: "AST2500 VGA"},
},
}
content := []byte(`
# curl -X GET http://127.0.0.1/redfish/v1/Chassis/HGX_GPU_SXM_1/Assembly
{"Name":"GPU Board Assembly","Model":"B200 180GB HBM3e","PartNumber":"PN1","SerialNumber":"SXM1SN"}
# curl -X GET http://127.0.0.1/redfish/v1/Chassis/HGX_GPU_SXM_3/Assembly
{"Name":"GPU Board Assembly","Model":"B200 180GB HBM3e","PartNumber":"PN3","SerialNumber":"SXM3SN"}
# curl -X GET http://127.0.0.1/redfish/v1/Chassis/HGX_GPU_SXM_5/Assembly
{"Name":"GPU Board Assembly","Model":"B200 180GB HBM3e","PartNumber":"PN5","SerialNumber":"SXM5SN"}
{"Id":"HGX_FW_GPU_SXM_1","Version":"FW1"}
{"Id":"HGX_FW_GPU_SXM_3","Version":"FW3"}
{"Id":"HGX_FW_GPU_SXM_5","Version":"FW5"}
{"Id":"HGX_InfoROM_GPU_SXM_3","Version":"IR3"}
`)
enrichGPUsFromHGXHWInfo(content, hw)
if hw.GPUs[0].SerialNumber != "SXM3SN" {
t.Fatalf("expected #GPU6 to map to SXM3 serial, got %q", hw.GPUs[0].SerialNumber)
}
if hw.GPUs[1].SerialNumber != "SXM1SN" {
t.Fatalf("expected #GPU7 to map to SXM1 serial, got %q", hw.GPUs[1].SerialNumber)
}
if hw.GPUs[2].SerialNumber != "SXM5SN" {
t.Fatalf("expected #GPU0 to map to SXM5 serial, got %q", hw.GPUs[2].SerialNumber)
}
if hw.GPUs[0].Firmware != "FW3" {
t.Fatalf("expected #GPU6 firmware FW3, got %q", hw.GPUs[0].Firmware)
}
if hw.GPUs[0].VideoBIOS != "IR3" {
t.Fatalf("expected #GPU6 InfoROM in VideoBIOS IR3, got %q", hw.GPUs[0].VideoBIOS)
}
if hw.GPUs[2].Firmware != "FW5" {
t.Fatalf("expected #GPU0 firmware FW5, got %q", hw.GPUs[2].Firmware)
}
for _, g := range hw.GPUs {
if g.Slot == "#CPU0_PE1_E_BMC" {
t.Fatalf("expected non-HGX BMC VGA entry to be filtered out")
}
}
}
func TestEnrichGPUsFromHGXHWInfo_AddsMissingLogicalGPU(t *testing.T) {
hw := &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "#GPU0"},
{Slot: "#GPU1"},
{Slot: "#GPU2"},
{Slot: "#GPU3"},
{Slot: "#GPU4"},
{Slot: "#GPU5"},
{Slot: "#GPU7"},
},
}
content := []byte(`
# curl -X GET http://127.0.0.1/redfish/v1/Chassis/HGX_GPU_SXM_3/Assembly
{"Name":"GPU Board Assembly","Model":"B200 180GB HBM3e","PartNumber":"PN3","SerialNumber":"SXM3SN"}
`)
enrichGPUsFromHGXHWInfo(content, hw)
found := false
for _, g := range hw.GPUs {
if g.Slot == "#GPU6" {
found = true
if g.SerialNumber != "SXM3SN" {
t.Fatalf("expected synthesized #GPU6 serial SXM3SN, got %q", g.SerialNumber)
}
}
}
if !found {
t.Fatalf("expected synthesized #GPU6 entry")
}
}
func TestApplyGPUStatusFromEvents_MarksFaultedGPU(t *testing.T) {
hw := &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "#GPU6"},
{Slot: "#GPU5"},
},
}
events := []models.Event{
{
ID: "17FFB002",
Timestamp: time.Now(),
Description: "PCIe Present mismatch BIOS miss F_GPU6",
},
}
applyGPUStatusFromEvents(hw, events)
if hw.GPUs[0].Status != "Critical" {
t.Fatalf("expected #GPU6 status Critical, got %q", hw.GPUs[0].Status)
}
if hw.GPUs[1].Status != "OK" {
t.Fatalf("expected healthy GPU status OK, got %q", hw.GPUs[1].Status)
}
}
func TestApplyGPUStatusFromEvents_UsesLatestEventAsCurrentStatusAndKeepsHistory(t *testing.T) {
hw := &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "#GPU1"},
{Slot: "#GPU3"},
{Slot: "#GPU6"},
},
}
events := []models.Event{
{
ID: "17FFB002",
Timestamp: time.Date(2026, 1, 12, 22, 51, 16, 0, time.FixedZone("UTC+8", 8*3600)),
Description: "PCIe Present mismatch BIOS miss F_GPU1 F_GPU3 F_GPU6",
},
{
ID: "17FFB002",
Timestamp: time.Date(2026, 1, 12, 23, 5, 18, 0, time.FixedZone("UTC+8", 8*3600)),
Description: "PCIe Present mismatch BIOS miss F_GPU6",
},
}
applyGPUStatusFromEvents(hw, events)
if hw.GPUs[0].Status != "OK" {
t.Fatalf("expected #GPU1 to recover to OK on latest event, got %q", hw.GPUs[0].Status)
}
if hw.GPUs[1].Status != "OK" {
t.Fatalf("expected #GPU3 to recover to OK on latest event, got %q", hw.GPUs[1].Status)
}
if hw.GPUs[2].Status != "Critical" {
t.Fatalf("expected #GPU6 to remain Critical, got %q", hw.GPUs[2].Status)
}
if len(hw.GPUs[0].StatusHistory) == 0 {
t.Fatalf("expected #GPU1 status history to be populated")
}
}
func TestParseIDLLog_ParsesStructuredJSONLine(t *testing.T) {
content := []byte(`{ "MESSAGE": "|2026-01-12T23:05:18+08:00|PCIE|Assert|Critical|17FFB002|PCIe Present mismatch BIOS miss F_GPU6 - Assert|" }`)
events := ParseIDLLog(content)
if len(events) != 1 {
t.Fatalf("expected 1 event from JSON line, got %d", len(events))
}
if events[0].ID != "17FFB002" {
t.Fatalf("expected event ID 17FFB002, got %q", events[0].ID)
}
if events[0].Source != "PCIE" {
t.Fatalf("expected source PCIE, got %q", events[0].Source)
}
}

View File

@@ -0,0 +1,360 @@
package inspur
import (
"fmt"
"regexp"
"strconv"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
)
type hgxGPUAssemblyInfo struct {
Model string
Part string
Serial string
}
type hgxGPUFirmwareInfo struct {
Firmware string
InfoROM string
}
type hgxFirmwareInventoryEntry struct {
ID string
Version string
ActiveVersion string
InactiveVersion string
}
// Logical GPU index mapping used by HGX B200 UI ordering.
// Example from real logs/UI:
// GPU0->SXM5, GPU1->SXM7, GPU2->SXM6, GPU3->SXM8, GPU4->SXM2, GPU5->SXM4, GPU6->SXM3, GPU7->SXM1.
var hgxLogicalToSXM = map[int]int{
0: 5,
1: 7,
2: 6,
3: 8,
4: 2,
5: 4,
6: 3,
7: 1,
}
var (
reHGXGPUBlock = regexp.MustCompile(`(?s)/redfish/v1/Chassis/HGX_GPU_SXM_(\d+)/Assembly.*?"Name":\s*"GPU Board Assembly".*?"Model":\s*"([^"]+)".*?"PartNumber":\s*"([^"]+)".*?"SerialNumber":\s*"([^"]+)"`)
reHGXFWBlock = regexp.MustCompile(`(?s)"Id":\s*"HGX_FW_GPU_SXM_(\d+)".*?"Version":\s*"([^"]*)"`)
reHGXInfoROM = regexp.MustCompile(`(?s)"Id":\s*"HGX_InfoROM_GPU_SXM_(\d+)".*?"Version":\s*"([^"]*)"`)
reIDLine = regexp.MustCompile(`"Id":\s*"([^"]+)"`)
reVersion = regexp.MustCompile(`"Version":\s*"([^"]*)"`)
reSlotGPU = regexp.MustCompile(`(?i)gpu\s*#?\s*(\d+)`)
)
func enrichGPUsFromHGXHWInfo(content []byte, hw *models.HardwareConfig) {
if hw == nil || len(hw.GPUs) == 0 || len(content) == 0 {
return
}
bySXM := parseHGXGPUAssembly(content)
if len(bySXM) == 0 {
return
}
fwBySXM := parseHGXGPUFirmware(content)
normalizeHGXGPUInventory(hw, bySXM)
for i := range hw.GPUs {
gpu := &hw.GPUs[i]
logicalIdx, ok := extractLogicalGPUIndex(gpu.Slot)
if !ok {
// Keep existing info if slot index cannot be determined.
continue
}
sxm := resolveSXMIndex(logicalIdx, bySXM)
info, found := bySXM[sxm]
if !found {
continue
}
if strings.TrimSpace(gpu.SerialNumber) == "" {
gpu.SerialNumber = info.Serial
}
if shouldReplaceGPUModel(gpu.Model) {
gpu.Model = info.Model
}
if strings.TrimSpace(gpu.PartNumber) == "" {
gpu.PartNumber = info.Part
}
if strings.TrimSpace(gpu.Manufacturer) == "" {
gpu.Manufacturer = "NVIDIA"
}
if fw, ok := fwBySXM[sxm]; ok {
if strings.TrimSpace(gpu.Firmware) == "" && strings.TrimSpace(fw.Firmware) != "" {
gpu.Firmware = fw.Firmware
}
if strings.TrimSpace(gpu.VideoBIOS) == "" && strings.TrimSpace(fw.InfoROM) != "" {
gpu.VideoBIOS = fw.InfoROM
}
}
}
}
func appendHGXFirmwareFromHWInfo(content []byte, hw *models.HardwareConfig) {
if hw == nil || len(content) == 0 {
return
}
entries := parseHGXFirmwareInventory(content)
if len(entries) == 0 {
return
}
existing := make(map[string]bool, len(hw.Firmware))
for _, fw := range hw.Firmware {
key := strings.ToLower(strings.TrimSpace(fw.DeviceName) + "|" + strings.TrimSpace(fw.Version))
existing[key] = true
}
appendFW := func(name, version string) {
name = strings.TrimSpace(name)
version = strings.TrimSpace(version)
if name == "" || version == "" {
return
}
key := strings.ToLower(name + "|" + version)
if existing[key] {
return
}
existing[key] = true
hw.Firmware = append(hw.Firmware, models.FirmwareInfo{
DeviceName: name,
Version: version,
})
}
for _, e := range entries {
appendFW(e.ID, e.Version)
if e.ActiveVersion != "" && e.InactiveVersion != "" && e.ActiveVersion != e.InactiveVersion {
appendFW(e.ID+" Active Slot", e.ActiveVersion)
appendFW(e.ID+" Inactive Slot", e.InactiveVersion)
}
}
}
func parseHGXGPUAssembly(content []byte) map[int]hgxGPUAssemblyInfo {
result := make(map[int]hgxGPUAssemblyInfo)
matches := reHGXGPUBlock.FindAllSubmatch(content, -1)
for _, m := range matches {
if len(m) != 5 {
continue
}
sxmIdx, err := strconv.Atoi(string(m[1]))
if err != nil || sxmIdx <= 0 {
continue
}
result[sxmIdx] = hgxGPUAssemblyInfo{
Model: strings.TrimSpace(string(m[2])),
Part: strings.TrimSpace(string(m[3])),
Serial: strings.TrimSpace(string(m[4])),
}
}
return result
}
func parseHGXGPUFirmware(content []byte) map[int]hgxGPUFirmwareInfo {
result := make(map[int]hgxGPUFirmwareInfo)
matchesFW := reHGXFWBlock.FindAllSubmatch(content, -1)
for _, m := range matchesFW {
if len(m) != 3 {
continue
}
sxmIdx, err := strconv.Atoi(string(m[1]))
if err != nil || sxmIdx <= 0 {
continue
}
version := strings.TrimSpace(string(m[2]))
if version == "" {
continue
}
current := result[sxmIdx]
if current.Firmware == "" {
current.Firmware = version
}
result[sxmIdx] = current
}
matchesInfoROM := reHGXInfoROM.FindAllSubmatch(content, -1)
for _, m := range matchesInfoROM {
if len(m) != 3 {
continue
}
sxmIdx, err := strconv.Atoi(string(m[1]))
if err != nil || sxmIdx <= 0 {
continue
}
version := strings.TrimSpace(string(m[2]))
if version == "" {
continue
}
current := result[sxmIdx]
if current.InfoROM == "" {
current.InfoROM = version
}
result[sxmIdx] = current
}
return result
}
func parseHGXFirmwareInventory(content []byte) []hgxFirmwareInventoryEntry {
lines := strings.Split(string(content), "\n")
result := make([]hgxFirmwareInventoryEntry, 0)
var current *hgxFirmwareInventoryEntry
section := ""
flush := func() {
if current == nil {
return
}
if current.Version == "" && current.ActiveVersion == "" && current.InactiveVersion == "" {
current = nil
section = ""
return
}
result = append(result, *current)
current = nil
section = ""
}
for _, line := range lines {
if m := reIDLine.FindStringSubmatch(line); len(m) > 1 {
flush()
id := strings.TrimSpace(m[1])
if strings.HasPrefix(id, "HGX_") {
current = &hgxFirmwareInventoryEntry{ID: id}
}
continue
}
if current == nil {
continue
}
if strings.Contains(line, `"ActiveFirmwareSlot"`) {
section = "active"
}
if strings.Contains(line, `"InactiveFirmwareSlot"`) {
section = "inactive"
}
if m := reVersion.FindStringSubmatch(line); len(m) > 1 {
version := strings.TrimSpace(m[1])
if version == "" {
section = ""
continue
}
switch section {
case "active":
if current.ActiveVersion == "" {
current.ActiveVersion = version
}
case "inactive":
if current.InactiveVersion == "" {
current.InactiveVersion = version
}
default:

// Outside a slot section, the last plain "Version" line becomes the entry's top-level version.
current.Version = version
}
section = ""
}
}
flush()
return result
}
func extractLogicalGPUIndex(slot string) (int, bool) {
m := reSlotGPU.FindStringSubmatch(slot)
if len(m) < 2 {
return 0, false
}
idx, err := strconv.Atoi(m[1])
if err != nil || idx < 0 {
return 0, false
}
return idx, true
}
func resolveSXMIndex(logicalIdx int, bySXM map[int]hgxGPUAssemblyInfo) int {
if sxm, ok := hgxLogicalToSXM[logicalIdx]; ok {
if _, exists := bySXM[sxm]; exists {
return sxm
}
}
// Fall back to the identity mapping (logical N -> SXM N+1).
return logicalIdx + 1
}
func shouldReplaceGPUModel(model string) bool {
trimmed := strings.TrimSpace(model)
if trimmed == "" {
return true
}
switch strings.ToLower(trimmed) {
case "vga", "3d controller", "display controller", "unknown":
return true
default:
return false
}
}
func normalizeHGXGPUInventory(hw *models.HardwareConfig, bySXM map[int]hgxGPUAssemblyInfo) {
// Keep only logical HGX GPUs (#GPU0..#GPU7) and remove BMC VGA entries.
filtered := make([]models.GPU, 0, len(hw.GPUs))
present := make(map[int]bool)
for _, gpu := range hw.GPUs {
idx, ok := extractLogicalGPUIndex(gpu.Slot)
if !ok || idx < 0 || idx > 7 {
continue
}
present[idx] = true
filtered = append(filtered, gpu)
}
// If some logical GPUs are missing in asset.json, add placeholders from HGX Redfish assembly.
for logicalIdx := 0; logicalIdx <= 7; logicalIdx++ {
if present[logicalIdx] {
continue
}
sxm := resolveSXMIndex(logicalIdx, bySXM)
info, ok := bySXM[sxm]
if !ok {
continue
}
filtered = append(filtered, models.GPU{
Slot: fmt.Sprintf("#GPU%d", logicalIdx),
Model: info.Model,
Manufacturer: "NVIDIA",
SerialNumber: info.Serial,
PartNumber: info.Part,
})
}
hw.GPUs = filtered
}

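The logical-to-SXM lookup with identity fallback can be exercised on its own. The map values are copied from the mapping comment in the file; the helper name is illustrative:

```go
package main

import "fmt"

// Logical GPU index -> SXM slot, per the HGX B200 UI ordering noted in the parser.
var logicalToSXM = map[int]int{0: 5, 1: 7, 2: 6, 3: 8, 4: 2, 5: 4, 6: 3, 7: 1}

// sxmFor returns the mapped SXM index for a logical GPU when that SXM slot was
// actually present in the Redfish snapshot, else falls back to identity (N -> N+1).
func sxmFor(logical int, seen map[int]bool) int {
	if sxm, ok := logicalToSXM[logical]; ok && seen[sxm] {
		return sxm
	}
	return logical + 1
}

func main() {
	// Snapshot where only SXM 1, 3, 5 appeared.
	seen := map[int]bool{1: true, 3: true, 5: true}
	fmt.Println(sxmFor(6, seen), sxmFor(2, seen), sxmFor(0, seen))
}
```

This reproduces the mapping asserted in the tests: `#GPU6` resolves to SXM3 and `#GPU0` to SXM5, while an unseen mapped slot degrades to identity.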
View File

@@ -8,8 +8,10 @@ import (
"git.mchus.pro/mchus/logpile/internal/models"
)
// ParseIDLLog parses the IDL (Inspur Diagnostic Log) file for BMC alarms
// Format: |timestamp|component|type|severity|eventID|description|
// ParseIDLLog parses IDL-style entries for BMC alarms.
// Works for both plain idl.log lines and JSON structured logs (idl_json/run_json)
// where MESSAGE/LOG2_FMTMSG contains:
// |timestamp|component|type|severity|eventID|description|
func ParseIDLLog(content []byte) []models.Event {
var events []models.Event
@@ -21,10 +23,6 @@ func ParseIDLLog(content []byte) []models.Event {
seenEvents := make(map[string]bool) // Deduplicate events
for _, line := range lines {
if !strings.Contains(line, "CommerDiagnose") {
continue
}
matches := re.FindStringSubmatch(line)
if matches == nil {
continue

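The pipe-delimited IDL message layout documented above can be shown with a minimal field splitter. The real parser uses a regex plus deduplication; this sketch only demonstrates the six-field shape, and `idlFields` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"strings"
)

// idlFields splits one IDL message of the form
// |timestamp|component|type|severity|eventID|description|
// into its six fields, returning ok=false when the shape doesn't match.
func idlFields(msg string) (fields [6]string, ok bool) {
	parts := strings.Split(strings.Trim(msg, "|"), "|")
	if len(parts) != 6 {
		return fields, false
	}
	copy(fields[:], parts)
	return fields, true
}

func main() {
	f, ok := idlFields("|2026-01-12T23:05:18+08:00|PCIE|Assert|Critical|17FFB002|PCIe Present mismatch BIOS miss F_GPU6 - Assert|")
	fmt.Println(ok, f[4], f[1])
}
```

The same message string works whether it arrives as a raw idl.log line or embedded in a JSON `MESSAGE` field, which is why dropping the `CommerDiagnose` filter broadens coverage.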
View File

@@ -8,6 +8,7 @@ package inspur
import (
"fmt"
"strings"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
@@ -15,7 +16,7 @@ import (
// parserVersion - version of this parser module
// IMPORTANT: Increment this version when making changes to parser logic!
const parserVersion = "1.0.0"
const parserVersion = "1.5"
func init() {
parser.Register(&Parser{})
@@ -86,6 +87,8 @@ func containsInspurMarkers(content []byte) bool {
// Parse parses Inspur/Kaytus archive
func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, error) {
selLocation := inferInspurArchiveLocation(files)
result := &models.AnalysisResult{
Events: make([]models.Event, 0),
FRU: make([]models.FRUInfo, 0),
@@ -123,17 +126,29 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
// Extract events from component.log (memory errors, etc.)
componentEvents := ParseComponentLogEvents(f.Content)
result.Events = append(result.Events, componentEvents...)
// Extract additional telemetry sensors from component.log sections
// (fan RPM, backplane temperature, PSU summary power, etc.).
componentSensors := ParseComponentLogSensors(f.Content)
result.Sensors = mergeSensorReadings(result.Sensors, componentSensors)
}
// Parse IDL log (BMC alarms/diagnose events)
if f := parser.FindFileByName(files, "idl.log"); f != nil {
// Enrich runtime component data from the Redis snapshot (serials, FW, telemetry)
// when text logs lack these fields.
if f := parser.FindFileByName(files, "redis-dump.rdb"); f != nil && result.Hardware != nil {
enrichFromRedisDump(f.Content, result.Hardware)
}
// Parse IDL-like logs (plain and structured JSON logs with embedded IDL messages)
idlFiles := parser.FindFileByPattern(files, "/idl.log", "idl_json.log", "run_json.log")
for _, f := range idlFiles {
idlEvents := ParseIDLLog(f.Content)
result.Events = append(result.Events, idlEvents...)
}
// Parse SEL list (selelist.csv)
if f := parser.FindFileByName(files, "selelist.csv"); f != nil {
selEvents := ParseSELList(f.Content)
selEvents := ParseSELListWithLocation(f.Content, selLocation)
result.Events = append(result.Events, selEvents...)
}
@@ -144,9 +159,71 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
result.Events = append(result.Events, events...)
}
// Fallback for archives where board serial is missing in parsed FRU/asset data:
// recover it from log content, never from archive filename.
if strings.TrimSpace(result.Hardware.BoardInfo.SerialNumber) == "" {
if serial := inferBoardSerialFromFallbackLogs(files); serial != "" {
result.Hardware.BoardInfo.SerialNumber = serial
}
}
if strings.TrimSpace(result.Hardware.BoardInfo.ProductName) == "" {
if model := inferBoardModelFromFallbackLogs(files); model != "" {
result.Hardware.BoardInfo.ProductName = model
}
}
// Enrich GPU inventory from HGX Redfish snapshot (serial/model/part mapping).
if f := parser.FindFileByName(files, "HGX_HWInfo_FWVersion.log"); f != nil && result.Hardware != nil {
enrichGPUsFromHGXHWInfo(f.Content, result.Hardware)
appendHGXFirmwareFromHWInfo(f.Content, result.Hardware)
}
// Mark problematic GPUs from IDL errors like "BIOS miss F_GPU6".
if result.Hardware != nil {
applyGPUStatusFromEvents(result.Hardware, result.Events)
enrichStorageFromSerialFallbackFiles(files, result.Hardware)
}
return result, nil
}
func inferInspurArchiveLocation(files []parser.ExtractedFile) *time.Location {
fallback := parser.DefaultArchiveLocation()
f := parser.FindFileByName(files, "timezone.conf")
if f == nil {
return fallback
}
locName := parseTimezoneConfigLocation(f.Content)
if strings.TrimSpace(locName) == "" {
return fallback
}
loc, err := time.LoadLocation(locName)
if err != nil {
return fallback
}
return loc
}
func parseTimezoneConfigLocation(content []byte) string {
lines := strings.Split(string(content), "\n")
for _, line := range lines {
line = strings.TrimSpace(line)
if line == "" || strings.HasPrefix(line, "[") || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
continue
}
parts := strings.SplitN(line, "=", 2)
if len(parts) != 2 {
continue
}
key := strings.ToLower(strings.TrimSpace(parts[0]))
val := strings.TrimSpace(parts[1])
if key == "timezone" && val != "" {
return val
}
}
return ""
}
func (p *Parser) parseDeviceFruSDR(content []byte, result *models.AnalysisResult) {
lines := string(content)
@@ -174,14 +251,9 @@ func (p *Parser) parseDeviceFruSDR(content []byte, result *models.AnalysisResult
// This supplements data from asset.json with serial numbers, firmware, etc.
pcieDevicesFromREST := ParsePCIeDevices(content)
// Merge PCIe data: keep asset.json data but add RESTful data if available
// Merge PCIe data: asset.json is the base inventory, RESTful data enriches names/links/serials.
if result.Hardware != nil {
// If asset.json didn't have PCIe devices, use RESTful data
if len(result.Hardware.PCIeDevices) == 0 && len(pcieDevicesFromREST) > 0 {
result.Hardware.PCIeDevices = pcieDevicesFromREST
}
// If we have both, merge them (RESTful data takes precedence for detailed info)
// For now, we keep asset.json data which has more details
result.Hardware.PCIeDevices = MergePCIeDevices(result.Hardware.PCIeDevices, pcieDevicesFromREST)
}
// Parse GPU devices and add temperature data from sensors
@@ -236,3 +308,38 @@ func extractSlotNumberFromGPU(slot string) int {
}
return 0
}
func mergeSensorReadings(base, extra []models.SensorReading) []models.SensorReading {
if len(extra) == 0 {
return base
}
out := append([]models.SensorReading{}, base...)
seen := make(map[string]struct{}, len(out))
for _, s := range out {
if key := sensorMergeKey(s); key != "" {
seen[key] = struct{}{}
}
}
for _, s := range extra {
key := sensorMergeKey(s)
if key != "" {
if _, ok := seen[key]; ok {
continue
}
seen[key] = struct{}{}
}
out = append(out, s)
}
return out
}
func sensorMergeKey(s models.SensorReading) string {
name := strings.ToLower(strings.TrimSpace(s.Name))
if name == "" {
return ""
}
return name
}

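The timezone.conf scan above follows a simple INI convention: skip blanks, section headers, and comments, then take the first non-empty `timezone = <name>` value. A standalone sketch (helper name is illustrative, `strings.Cut` stands in for the `SplitN` in the diff):

```go
package main

import (
	"fmt"
	"strings"
)

// timezoneFrom scans INI-style content for a "timezone = <name>" key,
// skipping [section] headers and #/; comment lines, and returns "" when
// nothing usable is found (the caller then falls back to a default location).
func timezoneFrom(content string) string {
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "[") ||
			strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue
		}
		k, v, found := strings.Cut(line, "=")
		if !found {
			continue
		}
		if strings.EqualFold(strings.TrimSpace(k), "timezone") {
			if val := strings.TrimSpace(v); val != "" {
				return val
			}
		}
	}
	return ""
}

func main() {
	conf := "[time]\n# local settings\ntimezone = Asia/Shanghai\n"
	fmt.Println(timezoneFrom(conf))
}
```

The returned name is then validated with `time.LoadLocation`, so a typo in the config degrades gracefully to the default archive location.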
View File

@@ -3,36 +3,38 @@ package inspur
import (
"encoding/json"
"fmt"
"regexp"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser/vendors/pciids"
)
// PCIeRESTInfo represents the RESTful PCIE Device info structure
type PCIeRESTInfo []struct {
ID int `json:"id"`
Present int `json:"present"`
Enable int `json:"enable"`
Status int `json:"status"`
VendorID int `json:"vendor_id"`
VendorName string `json:"vendor_name"`
DeviceID int `json:"device_id"`
DeviceName string `json:"device_name"`
BusNum int `json:"bus_num"`
DevNum int `json:"dev_num"`
FuncNum int `json:"func_num"`
MaxLinkWidth int `json:"max_link_width"`
MaxLinkSpeed int `json:"max_link_speed"`
CurrentLinkWidth int `json:"current_link_width"`
CurrentLinkSpeed int `json:"current_link_speed"`
Slot int `json:"slot"`
Location string `json:"location"`
DeviceLocator string `json:"DeviceLocator"`
DevType int `json:"dev_type"`
DevSubtype int `json:"dev_subtype"`
PartNum string `json:"part_num"`
SerialNum string `json:"serial_num"`
FwVer string `json:"fw_ver"`
ID int `json:"id"`
Present int `json:"present"`
Enable int `json:"enable"`
Status int `json:"status"`
VendorID int `json:"vendor_id"`
VendorName string `json:"vendor_name"`
DeviceID int `json:"device_id"`
DeviceName string `json:"device_name"`
BusNum int `json:"bus_num"`
DevNum int `json:"dev_num"`
FuncNum int `json:"func_num"`
MaxLinkWidth int `json:"max_link_width"`
MaxLinkSpeed int `json:"max_link_speed"`
CurrentLinkWidth int `json:"current_link_width"`
CurrentLinkSpeed int `json:"current_link_speed"`
Slot int `json:"slot"`
Location string `json:"location"`
DeviceLocator string `json:"DeviceLocator"`
DevType int `json:"dev_type"`
DevSubtype int `json:"dev_subtype"`
PartNum string `json:"part_num"`
SerialNum string `json:"serial_num"`
FwVer string `json:"fw_ver"`
}
// ParsePCIeDevices parses RESTful PCIE Device info from devicefrusdr.log
@@ -73,9 +75,27 @@ func ParsePCIeDevices(content []byte) []models.PCIeDevice {
// Determine device class based on dev_type
deviceClass := determineDeviceClass(pcie.DevType, pcie.DevSubtype, pcie.DeviceName)
_, pciDeviceName := pciids.DeviceInfo(pcie.VendorID, pcie.DeviceID)
// Build BDF string
bdf := fmt.Sprintf("%04x/%02x/%02x/%02x", 0, pcie.BusNum, pcie.DevNum, pcie.FuncNum)
// Build BDF string in canonical form (bb:dd.f)
bdf := formatBDF(pcie.BusNum, pcie.DevNum, pcie.FuncNum)
partNumber := strings.TrimSpace(pcie.PartNum)
if partNumber == "" {
partNumber = sanitizePCIeDeviceName(pcie.DeviceName)
}
if partNumber == "" {
partNumber = normalizeModelLabel(pciDeviceName)
}
if isGenericPCIeClass(deviceClass) {
if resolved := normalizeModelLabel(pciDeviceName); resolved != "" {
deviceClass = resolved
}
}
manufacturer := strings.TrimSpace(pcie.VendorName)
if manufacturer == "" {
manufacturer = normalizeModelLabel(pciids.VendorName(pcie.VendorID))
}
device := models.PCIeDevice{
Slot: pcie.Location,
@@ -83,12 +103,12 @@ func ParsePCIeDevices(content []byte) []models.PCIeDevice {
DeviceID: pcie.DeviceID,
BDF: bdf,
DeviceClass: deviceClass,
Manufacturer: pcie.VendorName,
Manufacturer: manufacturer,
LinkWidth: pcie.CurrentLinkWidth,
LinkSpeed: currentSpeed,
MaxLinkWidth: pcie.MaxLinkWidth,
MaxLinkSpeed: maxSpeed,
PartNumber: strings.TrimSpace(pcie.PartNum),
PartNumber: partNumber,
SerialNumber: strings.TrimSpace(pcie.SerialNum),
}
@@ -98,6 +118,149 @@ func ParsePCIeDevices(content []byte) []models.PCIeDevice {
return devices
}
var rawHexDeviceNameRegex = regexp.MustCompile(`(?i)^0x[0-9a-f]+$`)
func sanitizePCIeDeviceName(name string) string {
name = strings.TrimSpace(name)
if name == "" {
return ""
}
if strings.EqualFold(name, "N/A") {
return ""
}
if rawHexDeviceNameRegex.MatchString(name) {
return ""
}
return name
}
// MergePCIeDevices enriches base devices (from asset.json) with detailed RESTful PCIe data.
// Matching is done by BDF first, then by slot fallback.
func MergePCIeDevices(base []models.PCIeDevice, rest []models.PCIeDevice) []models.PCIeDevice {
if len(rest) == 0 {
return base
}
if len(base) == 0 {
return append([]models.PCIeDevice(nil), rest...)
}
type ref struct {
index int
}
byBDF := make(map[string]ref, len(base))
bySlot := make(map[string]ref, len(base))
for i := range base {
bdf := normalizePCIeBDF(base[i].BDF)
if bdf != "" {
byBDF[bdf] = ref{index: i}
}
slot := strings.ToLower(strings.TrimSpace(base[i].Slot))
if slot != "" {
bySlot[slot] = ref{index: i}
}
}
for _, detailed := range rest {
idx := -1
if bdf := normalizePCIeBDF(detailed.BDF); bdf != "" {
if found, ok := byBDF[bdf]; ok {
idx = found.index
}
}
if idx == -1 {
slot := strings.ToLower(strings.TrimSpace(detailed.Slot))
if slot != "" {
if found, ok := bySlot[slot]; ok {
idx = found.index
}
}
}
if idx == -1 {
base = append(base, detailed)
newIdx := len(base) - 1
if bdf := normalizePCIeBDF(detailed.BDF); bdf != "" {
byBDF[bdf] = ref{index: newIdx}
}
if slot := strings.ToLower(strings.TrimSpace(detailed.Slot)); slot != "" {
bySlot[slot] = ref{index: newIdx}
}
continue
}
enrichPCIeDevice(&base[idx], detailed)
}
return base
}
func enrichPCIeDevice(dst *models.PCIeDevice, src models.PCIeDevice) {
if dst == nil {
return
}
if strings.TrimSpace(dst.Slot) == "" {
dst.Slot = src.Slot
}
if strings.TrimSpace(dst.BDF) == "" {
dst.BDF = src.BDF
}
if dst.VendorID == 0 {
dst.VendorID = src.VendorID
}
if dst.DeviceID == 0 {
dst.DeviceID = src.DeviceID
}
if strings.TrimSpace(dst.Manufacturer) == "" {
dst.Manufacturer = src.Manufacturer
}
if strings.TrimSpace(dst.SerialNumber) == "" {
dst.SerialNumber = src.SerialNumber
}
if strings.TrimSpace(dst.PartNumber) == "" {
dst.PartNumber = src.PartNumber
}
if strings.TrimSpace(dst.LinkSpeed) == "" || strings.EqualFold(strings.TrimSpace(dst.LinkSpeed), "unknown") {
dst.LinkSpeed = src.LinkSpeed
}
if strings.TrimSpace(dst.MaxLinkSpeed) == "" || strings.EqualFold(strings.TrimSpace(dst.MaxLinkSpeed), "unknown") {
dst.MaxLinkSpeed = src.MaxLinkSpeed
}
if dst.LinkWidth == 0 {
dst.LinkWidth = src.LinkWidth
}
if dst.MaxLinkWidth == 0 {
dst.MaxLinkWidth = src.MaxLinkWidth
}
if isGenericPCIeClass(dst.DeviceClass) && !isGenericPCIeClass(src.DeviceClass) {
dst.DeviceClass = src.DeviceClass
}
}
func normalizePCIeBDF(bdf string) string {
bdf = strings.TrimSpace(strings.ToLower(bdf))
if bdf == "" {
return ""
}
if strings.Contains(bdf, "/") {
parts := strings.Split(bdf, "/")
if len(parts) == 4 {
return fmt.Sprintf("%s:%s.%s", parts[1], parts[2], parts[3])
}
}
return bdf
}
func isGenericPCIeClass(class string) bool {
switch strings.ToLower(strings.TrimSpace(class)) {
case "", "unknown", "other", "bridge", "network", "storage", "sas", "sata", "display", "vga", "3d controller", "serial bus":
return true
default:
return false
}
}
// determineDeviceClass maps device type to human-readable class
func determineDeviceClass(devType, devSubtype int, deviceName string) string {
// dev_type mapping:

@@ -0,0 +1,77 @@
package inspur
import (
"strings"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestParsePCIeDevices_UsesDeviceNameAsModelWhenPartNumberMissing(t *testing.T) {
content := []byte(`RESTful PCIE Device info:
[{"id":1,"present":1,"vendor_id":32902,"vendor_name":"Intel","device_id":5409,"device_name":"I350T4V2","bus_num":69,"dev_num":0,"func_num":0,"max_link_width":4,"max_link_speed":2,"current_link_width":4,"current_link_speed":2,"location":"#CPU0_PCIE4","dev_type":2,"dev_subtype":0,"part_num":"","serial_num":"","fw_ver":""}]
BMC sdr Info:`)
devices := ParsePCIeDevices(content)
if len(devices) != 1 {
t.Fatalf("expected 1 device, got %d", len(devices))
}
if devices[0].PartNumber != "I350T4V2" {
t.Fatalf("expected part/model I350T4V2, got %q", devices[0].PartNumber)
}
if devices[0].BDF != "45:00.0" {
t.Fatalf("expected BDF 45:00.0, got %q", devices[0].BDF)
}
}
func TestMergePCIeDevices_EnrichesGenericAssetEntry(t *testing.T) {
base := []models.PCIeDevice{
{
Slot: "#CPU1_PCIE9",
BDF: "98:00.0",
VendorID: 0x9005,
DeviceID: 0x028f,
DeviceClass: "SAS",
Manufacturer: "Adaptec / Microsemi",
},
}
rest := []models.PCIeDevice{
{
Slot: "#CPU1_PCIE9",
BDF: "98:00.0",
VendorID: 0x9005,
DeviceID: 0x028f,
DeviceClass: "Storage Controller",
Manufacturer: "Microchip",
PartNumber: "PM8222-SHBA",
},
}
got := MergePCIeDevices(base, rest)
if len(got) != 1 {
t.Fatalf("expected 1 merged device, got %d", len(got))
}
if got[0].PartNumber != "PM8222-SHBA" {
t.Fatalf("expected merged part number PM8222-SHBA, got %q", got[0].PartNumber)
}
}
func TestParsePCIeDevices_ResolvesModelFromPCIIDsWhenDeviceNameIsRawHex(t *testing.T) {
content := []byte(`RESTful PCIE Device info:
[{"id":5,"present":1,"vendor_id":36869,"vendor_name":"","device_id":655,"device_name":"0x028F","bus_num":152,"dev_num":0,"func_num":0,"max_link_width":8,"max_link_speed":3,"current_link_width":8,"current_link_speed":3,"location":"#CPU1_PCIE9","dev_type":1,"dev_subtype":7,"part_num":"","serial_num":"","fw_ver":""}]
BMC sdr Info:`)
devices := ParsePCIeDevices(content)
if len(devices) != 1 {
t.Fatalf("expected 1 device, got %d", len(devices))
}
if devices[0].PartNumber == "" {
t.Fatalf("expected part number resolved from pci.ids, got empty")
}
if strings.HasPrefix(strings.ToLower(strings.TrimSpace(devices[0].PartNumber)), "0x") {
t.Fatalf("expected resolved name instead of raw hex, got %q", devices[0].PartNumber)
}
if devices[0].Manufacturer == "" {
t.Fatalf("expected manufacturer resolved from pci.ids")
}
}

@@ -0,0 +1,559 @@
package inspur
import (
"encoding/hex"
"regexp"
"sort"
"strconv"
"strings"
"unicode"
"git.mchus.pro/mchus/logpile/internal/models"
)
var (
reRedisGPUKey = regexp.MustCompile(`GPUInfo:REDIS_GPUINFO_T([0-9]+):([A-Za-z0-9_]+)`)
reRedisNICKey = regexp.MustCompile(`RedisNicInfo:redis_nic_info_t:stNicDeviceInfo([0-9]+):([A-Za-z0-9_]+)`)
reRedisRAIDSerial = regexp.MustCompile(`RAIDMSCCInfo:redis_pcie_mscc_raid_info_t([0-9]+):RAIDInfo:SerialNum`)
reRedisPCIESNPN = regexp.MustCompile(`AssetInfoPCIE:SNPN([0-9]+):(SN|PN)`)
)
type redisGPUSnapshot struct {
ByIndex map[int]map[string]string
}
type redisNICSnapshot struct {
ByIndex map[int]map[string]string
}
type redisPCIESerialSnapshot struct {
ByPart map[string]string
}
func enrichFromRedisDump(content []byte, hw *models.HardwareConfig) {
if hw == nil || len(content) == 0 {
return
}
gpuSnap := parseRedisGPUSnapshot(content)
nicSnap := parseRedisNICSnapshot(content)
raidSerials := parseRedisRAIDSerials(content)
pcieSnap := parseRedisPCIESerialSnapshot(content)
applyRedisGPUEnrichment(hw, gpuSnap)
applyRedisNICEnrichment(hw, nicSnap)
applyRedisPCIESNPNEnrichment(hw, pcieSnap)
applyRedisPCIeEnrichment(hw, raidSerials)
}
func parseRedisRAIDSerials(content []byte) []string {
matches := reRedisRAIDSerial.FindAllSubmatchIndex(content, -1)
if len(matches) == 0 {
return nil
}
seen := make(map[string]bool, len(matches))
serials := make([]string, 0, len(matches))
for _, m := range matches {
if len(m) < 4 {
continue
}
value := normalizeRedisValue(extractRedisCandidateValue(content, m[1]))
if value == "" || seen[value] {
continue
}
seen[value] = true
serials = append(serials, value)
}
return serials
}
func parseRedisPCIESerialSnapshot(content []byte) redisPCIESerialSnapshot {
type rec struct {
PN string
SN string
}
tmp := make(map[int]rec)
matches := reRedisPCIESNPN.FindAllSubmatchIndex(content, -1)
for _, m := range matches {
if len(m) < 6 {
continue
}
idxStr := string(content[m[2]:m[3]])
field := string(content[m[4]:m[5]])
idx, err := strconv.Atoi(idxStr)
if err != nil {
continue
}
value := normalizeRedisValue(extractRedisCandidateValue(content, m[1]))
if value == "" {
continue
}
r := tmp[idx]
if field == "PN" {
r.PN = value
} else if field == "SN" {
r.SN = value
}
tmp[idx] = r
}
out := redisPCIESerialSnapshot{ByPart: make(map[string]string)}
for _, r := range tmp {
pn := normalizeRedisValue(r.PN)
sn := normalizeRedisValue(r.SN)
if pn == "" || sn == "" {
continue
}
out.ByPart[strings.ToLower(strings.TrimSpace(pn))] = sn
}
return out
}
func parseRedisGPUSnapshot(content []byte) redisGPUSnapshot {
snap := redisGPUSnapshot{ByIndex: make(map[int]map[string]string)}
matches := reRedisGPUKey.FindAllSubmatchIndex(content, -1)
for _, m := range matches {
if len(m) < 6 {
continue
}
idxStr := string(content[m[2]:m[3]])
field := string(content[m[4]:m[5]])
idx, err := strconv.Atoi(idxStr)
if err != nil {
continue
}
value := extractRedisInlineValue(content, m[1])
if value == "" {
continue
}
byField, ok := snap.ByIndex[idx]
if !ok {
byField = make(map[string]string)
snap.ByIndex[idx] = byField
}
byField[field] = value
}
return snap
}
func parseRedisNICSnapshot(content []byte) redisNICSnapshot {
snap := redisNICSnapshot{ByIndex: make(map[int]map[string]string)}
matches := reRedisNICKey.FindAllSubmatchIndex(content, -1)
for _, m := range matches {
if len(m) < 6 {
continue
}
idxStr := string(content[m[2]:m[3]])
field := string(content[m[4]:m[5]])
idx, err := strconv.Atoi(idxStr)
if err != nil {
continue
}
value := extractRedisInlineValue(content, m[1])
if value == "" {
continue
}
byField, ok := snap.ByIndex[idx]
if !ok {
byField = make(map[string]string)
snap.ByIndex[idx] = byField
}
byField[field] = value
}
return snap
}
func extractRedisInlineValue(content []byte, start int) string {
if start < 0 || start >= len(content) {
return ""
}
i := start
for i < len(content) && content[i] <= 0x20 {
i++
}
if i >= len(content) {
return ""
}
j := i
for j < len(content) {
c := content[j]
if c == 0 || c < 0x20 || c > 0x7e {
break
}
j++
}
if j <= i {
return ""
}
raw := strings.TrimSpace(string(content[i:j]))
if raw == "" {
return ""
}
decoded := maybeDecodeHexString(raw)
if decoded != "" {
return decoded
}
return raw
}
func extractRedisCandidateValue(content []byte, start int) string {
// Fast-path for simple inline string values.
if v := extractRedisInlineValue(content, start); normalizeRedisValue(v) != "" {
return v
}
if start < 0 || start >= len(content) {
return ""
}
end := start + 256
if end > len(content) {
end = len(content)
}
window := content[start:end]
for _, token := range splitAlphaNumTokens(window) {
if len(token) < 6 {
continue
}
lower := strings.ToLower(token)
if strings.Contains(lower, "redis") || strings.Contains(lower, "sensor") || strings.Contains(lower, "fullsdr") {
continue
}
if decoded := maybeDecodeHexString(token); normalizeRedisValue(decoded) != "" {
return decoded
}
if normalizeRedisValue(token) != "" {
return token
}
}
return ""
}
func splitAlphaNumTokens(b []byte) []string {
var out []string
start := -1
for i := 0; i < len(b); i++ {
c := rune(b[i])
if unicode.IsLetter(c) || unicode.IsDigit(c) {
if start == -1 {
start = i
}
continue
}
if start != -1 {
out = append(out, string(b[start:i]))
start = -1
}
}
if start != -1 {
out = append(out, string(b[start:]))
}
return out
}
func maybeDecodeHexString(s string) string {
if len(s) < 8 || len(s)%2 != 0 {
return ""
}
for _, c := range s {
if (c < '0' || c > '9') && (c < 'a' || c > 'f') && (c < 'A' || c > 'F') {
return ""
}
}
b, err := hex.DecodeString(s)
if err != nil {
return ""
}
decoded := strings.TrimSpace(strings.TrimRight(string(b), "\x00"))
if decoded == "" {
return ""
}
for _, c := range decoded {
if c < 0x20 || c > 0x7e {
return ""
}
}
return decoded
}
func applyRedisGPUEnrichment(hw *models.HardwareConfig, snap redisGPUSnapshot) {
if len(hw.GPUs) == 0 || len(snap.ByIndex) == 0 {
return
}
type redisGPU struct {
Index int
Data map[string]string
}
redisGPUs := make([]redisGPU, 0, len(snap.ByIndex))
for idx, data := range snap.ByIndex {
if data == nil {
continue
}
if data["NV_GPU_SerialNumber"] == "" && data["NV_GPU_FWVersion"] == "" && data["NV_GPU_UUID"] == "" {
continue
}
redisGPUs = append(redisGPUs, redisGPU{Index: idx, Data: data})
}
if len(redisGPUs) == 0 {
return
}
sort.Slice(redisGPUs, func(i, j int) bool { return redisGPUs[i].Index < redisGPUs[j].Index })
target := make([]*models.GPU, 0, len(hw.GPUs))
for i := range hw.GPUs {
gpu := &hw.GPUs[i]
if isNVIDIAGPU(gpu) {
target = append(target, gpu)
}
}
if len(target) == 0 || len(target) != len(redisGPUs) {
return
}
sort.Slice(target, func(i, j int) bool {
left := strings.TrimSpace(target[i].BDF)
right := strings.TrimSpace(target[j].BDF)
if left != "" && right != "" {
return left < right
}
return strings.TrimSpace(target[i].Slot) < strings.TrimSpace(target[j].Slot)
})
for i := range target {
applyRedisGPUFields(target[i], redisGPUs[i].Data)
}
}
func isNVIDIAGPU(gpu *models.GPU) bool {
if gpu == nil {
return false
}
if gpu.VendorID == 0x10de {
return true
}
man := strings.ToLower(strings.TrimSpace(gpu.Manufacturer))
return strings.Contains(man, "nvidia")
}
func applyRedisGPUFields(gpu *models.GPU, fields map[string]string) {
if gpu == nil || fields == nil {
return
}
if serial := normalizeRedisValue(fields["NV_GPU_SerialNumber"]); serial != "" && isMissingGPUField(gpu.SerialNumber) {
gpu.SerialNumber = serial
}
if fw := normalizeRedisValue(fields["NV_GPU_FWVersion"]); fw != "" && isMissingGPUField(gpu.Firmware) {
gpu.Firmware = fw
}
if uuid := normalizeRedisValue(fields["NV_GPU_UUID"]); uuid != "" && isMissingGPUField(gpu.UUID) {
gpu.UUID = uuid
}
if part := normalizeRedisValue(fields["NVGPUPartNumber"]); part != "" && isMissingGPUField(gpu.PartNumber) {
gpu.PartNumber = part
}
if model := normalizeRedisValue(fields["NVGPUMarketingName"]); model != "" && isGenericGPUModel(gpu.Model) {
gpu.Model = model
}
if gpu.ClockSpeed == 0 {
if mhz, ok := parseIntField(fields["OperatingSpeedMHz"]); ok {
gpu.ClockSpeed = mhz
}
}
if gpu.Power == 0 {
if pwr, ok := parseIntField(fields["GPUTotalPower"]); ok {
gpu.Power = pwr
}
}
if gpu.Temperature == 0 {
if temp, ok := parseIntField(fields["Temp"]); ok {
gpu.Temperature = temp
}
}
if gpu.MemTemperature == 0 {
if temp, ok := parseIntField(fields["MemTemp"]); ok {
gpu.MemTemperature = temp
}
}
}
func parseIntField(v string) (int, bool) {
v = normalizeRedisValue(v)
if v == "" {
return 0, false
}
n, err := strconv.Atoi(v)
if err != nil {
return 0, false
}
return n, true
}
func normalizeRedisValue(v string) string {
v = strings.TrimSpace(v)
if v == "" {
return ""
}
l := strings.ToLower(v)
if l == "n/a" || l == "na" || l == "null" || l == "unknown" {
return ""
}
return v
}
func isMissingGPUField(v string) bool {
return normalizeRedisValue(v) == ""
}
func isGenericGPUModel(model string) bool {
m := strings.ToLower(strings.TrimSpace(model))
switch m {
case "", "unknown", "display", "display controller", "3d controller", "vga", "gpu":
return true
default:
return false
}
}
func applyRedisNICEnrichment(hw *models.HardwareConfig, snap redisNICSnapshot) {
if len(hw.NetworkAdapters) == 0 || len(snap.ByIndex) == 0 {
return
}
type redisNIC struct {
Index int
Data map[string]string
}
redisNICs := make([]redisNIC, 0, len(snap.ByIndex))
for idx, data := range snap.ByIndex {
if data == nil {
continue
}
if normalizeRedisValue(data["FWVersion"]) == "" {
continue
}
redisNICs = append(redisNICs, redisNIC{Index: idx, Data: data})
}
if len(redisNICs) == 0 {
return
}
sort.Slice(redisNICs, func(i, j int) bool { return redisNICs[i].Index < redisNICs[j].Index })
target := make([]*models.NetworkAdapter, 0, len(hw.NetworkAdapters))
for i := range hw.NetworkAdapters {
nic := &hw.NetworkAdapters[i]
if nic.Present {
target = append(target, nic)
}
}
if len(target) == 0 {
return
}
sort.Slice(target, func(i, j int) bool {
left := strings.TrimSpace(target[i].Location)
right := strings.TrimSpace(target[j].Location)
if left != "" && right != "" {
return left < right
}
return strings.TrimSpace(target[i].Slot) < strings.TrimSpace(target[j].Slot)
})
limit := len(target)
if len(redisNICs) < limit {
limit = len(redisNICs)
}
for i := 0; i < limit; i++ {
nic := target[i]
data := redisNICs[i].Data
if fw := normalizeRedisValue(data["FWVersion"]); fw != "" && normalizeRedisValue(nic.Firmware) == "" {
nic.Firmware = fw
}
if serial := normalizeRedisValue(data["SerialNum"]); serial != "" && normalizeRedisValue(nic.SerialNumber) == "" {
nic.SerialNumber = serial
}
if part := normalizeRedisValue(data["PartNum"]); part != "" && normalizeRedisValue(nic.PartNumber) == "" {
nic.PartNumber = part
}
}
}
func applyRedisPCIeEnrichment(hw *models.HardwareConfig, raidSerials []string) {
if hw == nil || len(hw.PCIeDevices) == 0 || len(raidSerials) == 0 {
return
}
target := make([]*models.PCIeDevice, 0, len(hw.PCIeDevices))
for i := range hw.PCIeDevices {
dev := &hw.PCIeDevices[i]
if normalizeRedisValue(dev.SerialNumber) != "" {
continue
}
class := strings.ToLower(strings.TrimSpace(dev.DeviceClass))
part := strings.ToLower(strings.TrimSpace(dev.PartNumber))
if strings.Contains(class, "raid") || strings.Contains(class, "sas") || strings.Contains(class, "storage") ||
strings.Contains(part, "raid") || strings.Contains(part, "sas") || strings.Contains(part, "hba") {
target = append(target, dev)
}
}
if len(target) == 0 {
return
}
sort.Slice(target, func(i, j int) bool {
left := strings.TrimSpace(target[i].BDF)
right := strings.TrimSpace(target[j].BDF)
if left != "" && right != "" {
return left < right
}
return strings.TrimSpace(target[i].Slot) < strings.TrimSpace(target[j].Slot)
})
limit := len(target)
if len(raidSerials) < limit {
limit = len(raidSerials)
}
for i := 0; i < limit; i++ {
target[i].SerialNumber = raidSerials[i]
}
}
func applyRedisPCIESNPNEnrichment(hw *models.HardwareConfig, snap redisPCIESerialSnapshot) {
if hw == nil || len(hw.PCIeDevices) == 0 || len(snap.ByPart) == 0 {
return
}
for i := range hw.PCIeDevices {
dev := &hw.PCIeDevices[i]
if normalizeRedisValue(dev.SerialNumber) != "" {
continue
}
part := strings.ToLower(strings.TrimSpace(dev.PartNumber))
if part == "" {
continue
}
if serial := normalizeRedisValue(snap.ByPart[part]); serial != "" {
dev.SerialNumber = serial
}
}
}

@@ -0,0 +1,144 @@
package inspur
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestExtractRedisInlineValue_DecodesHexEncodedString(t *testing.T) {
data := []byte("RedisNicInfo:redis_nic_info_t:stNicDeviceInfo0:FWVersion 32362e34332e32353636000000000000\x00tail")
key := []byte("RedisNicInfo:redis_nic_info_t:stNicDeviceInfo0:FWVersion")
pos := indexBytes(data, key)
if pos < 0 {
t.Fatal("key not found")
}
got := extractRedisInlineValue(data, pos+len(key))
if got != "26.43.2566" {
t.Fatalf("expected decoded fw 26.43.2566, got %q", got)
}
}
func TestApplyRedisGPUEnrichment_FillsSerialFirmwareUUID(t *testing.T) {
hw := &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "#CPU0_PCIE2", BDF: "0c:00.0", VendorID: 0x10de, Model: "3D Controller"},
{Slot: "#CPU0_PCIE1", BDF: "58:00.0", VendorID: 0x10de, Model: "3D Controller"},
},
}
snap := redisGPUSnapshot{
ByIndex: map[int]map[string]string{
1: {
"NV_GPU_SerialNumber": "1321125009572",
"NV_GPU_FWVersion": "96.00.B7.00.02",
"NV_GPU_UUID": "GPU-AAA",
},
2: {
"NV_GPU_SerialNumber": "1321125010420",
"NV_GPU_FWVersion": "96.00.B7.00.02",
"NV_GPU_UUID": "GPU-BBB",
},
},
}
applyRedisGPUEnrichment(hw, snap)
if hw.GPUs[0].SerialNumber != "1321125009572" || hw.GPUs[0].Firmware != "96.00.B7.00.02" || hw.GPUs[0].UUID != "GPU-AAA" {
t.Fatalf("unexpected gpu0 enrichment: %+v", hw.GPUs[0])
}
if hw.GPUs[1].SerialNumber != "1321125010420" || hw.GPUs[1].Firmware != "96.00.B7.00.02" || hw.GPUs[1].UUID != "GPU-BBB" {
t.Fatalf("unexpected gpu1 enrichment: %+v", hw.GPUs[1])
}
}
func TestApplyRedisGPUEnrichment_SkipsOnCountMismatch(t *testing.T) {
hw := &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "#CPU0_PCIE2", BDF: "0c:00.0", VendorID: 0x10de, Model: "3D Controller"},
},
}
snap := redisGPUSnapshot{
ByIndex: map[int]map[string]string{
1: {"NV_GPU_SerialNumber": "1321125009572"},
2: {"NV_GPU_SerialNumber": "1321125010420"},
},
}
applyRedisGPUEnrichment(hw, snap)
if hw.GPUs[0].SerialNumber != "" {
t.Fatalf("expected no enrichment on count mismatch, got %q", hw.GPUs[0].SerialNumber)
}
}
func TestParseRedisRAIDSerials_DecodesHexSerial(t *testing.T) {
raw := []byte("RAIDMSCCInfo:redis_pcie_mscc_raid_info_t0:RAIDInfo:SerialNum\x80%@`5341523531314532 \x00tail")
got := parseRedisRAIDSerials(raw)
if len(got) != 1 {
t.Fatalf("expected 1 raid serial, got %d", len(got))
}
if got[0] != "SAR511E2" {
t.Fatalf("expected decoded serial SAR511E2, got %q", got[0])
}
}
func TestApplyRedisPCIeEnrichment_FillsStorageControllerSerial(t *testing.T) {
hw := &models.HardwareConfig{
PCIeDevices: []models.PCIeDevice{
{Slot: "#CPU1_PCIE9", BDF: "98:00.0", DeviceClass: "Smart Storage PQI SAS", PartNumber: "PM8222-SHBA"},
{Slot: "#CPU0_PCIE3", BDF: "32:00.0", DeviceClass: "Fibre Channel", PartNumber: "LPE32002"},
},
}
applyRedisPCIeEnrichment(hw, []string{"SAR511E2"})
if hw.PCIeDevices[0].SerialNumber != "SAR511E2" {
t.Fatalf("expected PM8222 serial SAR511E2, got %q", hw.PCIeDevices[0].SerialNumber)
}
if hw.PCIeDevices[1].SerialNumber != "" {
t.Fatalf("expected non-storage device serial untouched, got %q", hw.PCIeDevices[1].SerialNumber)
}
}
func TestParseRedisPCIESerialSnapshot_MapsPNToSN(t *testing.T) {
raw := []byte("" +
"AssetInfoPCIE:SNPN9:PN PM8222-SHBA\x00" +
"AssetInfoPCIE:SNPN9:SN SAR511E2\x00")
snap := parseRedisPCIESerialSnapshot(raw)
got := snap.ByPart["pm8222-shba"]
if got != "SAR511E2" {
t.Fatalf("expected SN SAR511E2 for PM8222-SHBA, got %q", got)
}
}
func TestApplyRedisPCIESNPNEnrichment_FillsByPartNumber(t *testing.T) {
hw := &models.HardwareConfig{
PCIeDevices: []models.PCIeDevice{
{Slot: "#CPU1_PCIE9", PartNumber: "PM8222-SHBA"},
},
}
snap := redisPCIESerialSnapshot{ByPart: map[string]string{"pm8222-shba": "SAR511E2"}}
applyRedisPCIESNPNEnrichment(hw, snap)
if hw.PCIeDevices[0].SerialNumber != "SAR511E2" {
t.Fatalf("expected serial SAR511E2, got %q", hw.PCIeDevices[0].SerialNumber)
}
}
func indexBytes(haystack, needle []byte) int {
for i := 0; i+len(needle) <= len(haystack); i++ {
match := true
for j := 0; j < len(needle); j++ {
if haystack[i+j] != needle[j] {
match = false
break
}
}
if match {
return i
}
}
return -1
}

@@ -6,12 +6,19 @@ import (
"time"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
// ParseSELList parses selelist.csv file with SEL events
// Format: ID, Date (MM/DD/YYYY), Time (HH:MM:SS), Sensor, Event, Status
// Example: 1,04/18/2025,09:31:18,Event Logging Disabled SEL_Status,Log area reset/cleared,Asserted
func ParseSELList(content []byte) []models.Event {
return ParseSELListWithLocation(content, parser.DefaultArchiveLocation())
}
// ParseSELListWithLocation parses selelist.csv using provided source timezone
// for timestamps that don't contain an explicit offset.
func ParseSELListWithLocation(content []byte, location *time.Location) []models.Event {
var events []models.Event
text := string(content)
@@ -48,7 +55,7 @@ func ParseSELList(content []byte) []models.Event {
status := strings.TrimSpace(records[5])
// Parse timestamp: MM/DD/YYYY HH:MM:SS
-	timestamp := parseSELTimestamp(dateStr, timeStr)
+	timestamp := parseSELTimestamp(dateStr, timeStr, location)
// Extract sensor type and name
sensorType, sensorName := parseSensorInfo(sensorStr)
@@ -76,12 +83,16 @@ func ParseSELList(content []byte) []models.Event {
}
// parseSELTimestamp parses MM/DD/YYYY and HH:MM:SS into time.Time
-func parseSELTimestamp(dateStr, timeStr string) time.Time {
+func parseSELTimestamp(dateStr, timeStr string, location *time.Location) time.Time {
// Combine date and time: MM/DD/YYYY HH:MM:SS
timestampStr := dateStr + " " + timeStr
if location == nil {
location = parser.DefaultArchiveLocation()
}
// Try parsing with MM/DD/YYYY format
-t, err := time.Parse("01/02/2006 15:04:05", timestampStr)
+t, err := time.ParseInLocation("01/02/2006 15:04:05", timestampStr, location)
if err != nil {
// Fallback to current time
return time.Now()

@@ -0,0 +1,33 @@
package inspur
import (
"testing"
"time"
)
func TestParseSELListWithLocation_UsesProvidedTimezone(t *testing.T) {
content := []byte("sel elist:\n1,02/28/2026,04:18:18,Sensor X,Event,Asserted\n")
shanghai, err := time.LoadLocation("Asia/Shanghai")
if err != nil {
t.Fatalf("load location: %v", err)
}
events := ParseSELListWithLocation(content, shanghai)
if len(events) != 1 {
t.Fatalf("expected 1 event, got %d", len(events))
}
// 04:18:18 +08:00 == 20:18:18Z (previous day)
want := time.Date(2026, 2, 27, 20, 18, 18, 0, time.UTC)
if !events[0].Timestamp.UTC().Equal(want) {
t.Fatalf("unexpected timestamp: got %s want %s", events[0].Timestamp.UTC(), want)
}
}
func TestParseTimezoneConfigLocation(t *testing.T) {
content := []byte("[TimeZoneConfig]\ntimezone=Asia/Shanghai\n")
got := parseTimezoneConfigLocation(content)
if got != "Asia/Shanghai" {
t.Fatalf("unexpected timezone: %q", got)
}
}

@@ -0,0 +1,92 @@
package inspur
import (
"regexp"
"strings"
"git.mchus.pro/mchus/logpile/internal/parser"
)
var (
hostnameJSONRegex = regexp.MustCompile(`"_HOSTNAME"\s*:\s*"([^"]+)"`)
)
func inferBoardSerialFromFallbackLogs(files []parser.ExtractedFile) string {
// Prefer FRU dump when present.
if f := parser.FindFileByName(files, "fru.txt"); f != nil {
fruList := ParseFRU(f.Content)
for _, fru := range fruList {
serial := strings.TrimSpace(fru.SerialNumber)
if serial == "" || serial == "0" {
continue
}
desc := strings.ToLower(strings.TrimSpace(fru.Description))
if strings.Contains(desc, "builtin") || strings.Contains(desc, "fru device") {
return serial
}
}
}
// Fallback to explicit hostname file.
if f := parser.FindFileByName(files, "hostname"); f != nil {
if serial := sanitizeCandidateSerial(firstNonEmptyLine(string(f.Content))); serial != "" {
return serial
}
}
// Last-resort fallback from structured journal logs.
if f := parser.FindFileByName(files, "maintenance_json.log"); f != nil {
if m := hostnameJSONRegex.FindSubmatch(f.Content); len(m) == 2 {
if serial := sanitizeCandidateSerial(string(m[1])); serial != "" {
return serial
}
}
}
return ""
}
func inferBoardModelFromFallbackLogs(files []parser.ExtractedFile) string {
// Prefer FRU dump when present.
if f := parser.FindFileByName(files, "fru.txt"); f != nil {
fruList := ParseFRU(f.Content)
for _, fru := range fruList {
model := sanitizeCandidateModel(fru.ProductName)
if model == "" {
continue
}
desc := strings.ToLower(strings.TrimSpace(fru.Description))
if strings.Contains(desc, "builtin") || strings.Contains(desc, "fru device") {
return model
}
}
}
return ""
}
func firstNonEmptyLine(s string) string {
for _, line := range strings.Split(s, "\n") {
line = strings.TrimSpace(line)
if line != "" {
return line
}
}
return ""
}
func sanitizeCandidateSerial(s string) string {
s = strings.TrimSpace(s)
if s == "" || strings.EqualFold(s, "localhost") || strings.ContainsAny(s, " \t") {
return ""
}
return s
}
func sanitizeCandidateModel(s string) string {
s = strings.TrimSpace(s)
if s == "" || strings.EqualFold(s, "null") || s == "0" {
return ""
}
return s
}

@@ -0,0 +1,76 @@
package inspur
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestInferBoardSerialFromFallbackLogs_PrefersFRU(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "component/fru.txt",
Content: []byte(`FRU Device Description : Builtin FRU Device (ID 0)
Product Serial : 23DB01639
`),
},
{
Path: "runningdata/RTOSDump/hostname",
Content: []byte("HOSTNAME-FALLBACK\n"),
},
{
Path: "log/bmc/struct-log/maintenance_json.log",
Content: []byte(`{ "_HOSTNAME": "JSON-FALLBACK" }`),
},
}
got := inferBoardSerialFromFallbackLogs(files)
if got != "23DB01639" {
t.Fatalf("expected FRU serial 23DB01639, got %q", got)
}
}
func TestInferBoardSerialFromFallbackLogs_UsesHostnameFile(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "runningdata/RTOSDump/hostname",
Content: []byte("23DB01639\n"),
},
}
got := inferBoardSerialFromFallbackLogs(files)
if got != "23DB01639" {
t.Fatalf("expected hostname serial 23DB01639, got %q", got)
}
}
func TestInferBoardSerialFromFallbackLogs_UsesMaintenanceJSON(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "log/bmc/struct-log/maintenance_json.log",
Content: []byte(`{ "_HOSTNAME": "23DB01639", "MESSAGE": "ok" }`),
},
}
got := inferBoardSerialFromFallbackLogs(files)
if got != "23DB01639" {
t.Fatalf("expected JSON hostname serial 23DB01639, got %q", got)
}
}
func TestInferBoardModelFromFallbackLogs_PrefersFRU(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "component/fru.txt",
Content: []byte(`FRU Device Description : Builtin FRU Device (ID 0)
Board Product : KR9288-X3-A0-F0-00
Product Name : KR9288-X3-A0-F0-00
`),
},
}
got := inferBoardModelFromFallbackLogs(files)
if got != "KR9288-X3-A0-F0-00" {
t.Fatalf("expected board model KR9288-X3-A0-F0-00, got %q", got)
}
}

@@ -0,0 +1,148 @@
package inspur
import (
"regexp"
"sort"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
var bpHDDSerialTokenRegex = regexp.MustCompile(`[A-Za-z0-9]{8,32}`)
func enrichStorageFromSerialFallbackFiles(files []parser.ExtractedFile, hw *models.HardwareConfig) {
if hw == nil {
return
}
f := parser.FindFileByName(files, "BpHDDSerialNumber.info")
if f == nil {
return
}
serials := extractBPHDDSerials(f.Content)
if len(serials) == 0 {
return
}
applyStorageSerialFallback(hw, serials)
}
func extractBPHDDSerials(content []byte) []string {
if len(content) == 0 {
return nil
}
matches := bpHDDSerialTokenRegex.FindAllString(string(content), -1)
if len(matches) == 0 {
return nil
}
out := make([]string, 0, len(matches))
seen := make(map[string]struct{}, len(matches))
for _, m := range matches {
v := normalizeRedisValue(m)
if !looksLikeStorageSerial(v) {
continue
}
key := strings.ToLower(v)
if _, ok := seen[key]; ok {
continue
}
seen[key] = struct{}{}
out = append(out, v)
}
return out
}
func looksLikeStorageSerial(v string) bool {
if len(v) < 8 {
return false
}
hasLetter := false
hasDigit := false
for _, r := range v {
switch {
case r >= 'A' && r <= 'Z':
hasLetter = true
case r >= 'a' && r <= 'z':
hasLetter = true
case r >= '0' && r <= '9':
hasDigit = true
default:
return false
}
}
return hasLetter && hasDigit
}
func applyStorageSerialFallback(hw *models.HardwareConfig, serials []string) {
if hw == nil || len(hw.Storage) == 0 || len(serials) == 0 {
return
}
existing := make(map[string]struct{}, len(hw.Storage))
for _, dev := range hw.Storage {
if sn := normalizeRedisValue(dev.SerialNumber); sn != "" {
existing[strings.ToLower(sn)] = struct{}{}
}
}
filtered := make([]string, 0, len(serials))
for _, sn := range serials {
key := strings.ToLower(sn)
if _, ok := existing[key]; ok {
continue
}
filtered = append(filtered, sn)
}
if len(filtered) == 0 {
return
}
type target struct {
index int
rank int
slot string
}
targets := make([]target, 0, len(hw.Storage))
for i := range hw.Storage {
dev := hw.Storage[i]
if normalizeRedisValue(dev.SerialNumber) != "" {
continue
}
if !dev.Present && strings.TrimSpace(dev.Slot) == "" {
continue
}
rank := 0
if !dev.Present {
rank += 10
}
if strings.EqualFold(strings.TrimSpace(dev.Type), "NVMe") {
rank += 5
}
if strings.TrimSpace(dev.Slot) == "" {
rank += 4
}
targets = append(targets, target{
index: i,
rank: rank,
slot: strings.ToLower(strings.TrimSpace(dev.Slot)),
})
}
if len(targets) == 0 {
return
}
sort.Slice(targets, func(i, j int) bool {
if targets[i].rank != targets[j].rank {
return targets[i].rank < targets[j].rank
}
return targets[i].slot < targets[j].slot
})
for i := 0; i < len(targets) && i < len(filtered); i++ {
dev := &hw.Storage[targets[i].index]
dev.SerialNumber = filtered[i]
if !dev.Present {
dev.Present = true
}
}
}

@@ -0,0 +1,106 @@
package inspur
import (
"strings"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestParseAssetJSON_HddSlotFallbackAndPresence(t *testing.T) {
content := []byte(`{
"HddInfo": [
{
"PresentBitmap": [1],
"SerialNumber": "",
"Manufacturer": "",
"ModelName": "",
"FirmwareVersion": "",
"Capacity": 0,
"Location": 2,
"DiskInterfaceType": 5,
"MediaType": 1,
"LocationString": ""
}
]
}`)
hw, err := ParseAssetJSON(content)
if err != nil {
t.Fatalf("ParseAssetJSON failed: %v", err)
}
if len(hw.Storage) != 1 {
t.Fatalf("expected 1 storage entry, got %d", len(hw.Storage))
}
if hw.Storage[0].Slot != "OB03" {
t.Fatalf("expected OB03 slot fallback, got %q", hw.Storage[0].Slot)
}
if !hw.Storage[0].Present {
t.Fatalf("expected fallback storage entry marked present")
}
if hw.Storage[0].Type != "NVMe" {
t.Fatalf("expected NVMe type, got %q", hw.Storage[0].Type)
}
}
func TestParseDiskBackplaneInfo_PopulatesOnlyMissingPresentDrives(t *testing.T) {
text := `RESTful diskbackplane info:
[
{ "port_count": 8, "driver_count": 4, "front": 1, "backplane_index": 0, "present": 1, "cpld_version": "3.1", "temperature": 18 },
{ "port_count": 8, "driver_count": 3, "front": 1, "backplane_index": 1, "present": 1, "cpld_version": "3.1", "temperature": 17 }
]
BMC`
hw := &models.HardwareConfig{
Storage: []models.Storage{
{Slot: "OB01", Type: "NVMe", Present: true},
{Slot: "OB02", Type: "NVMe", Present: true},
{Slot: "OB03", Type: "NVMe", Present: true},
{Slot: "OB04", Type: "NVMe", Present: true},
},
}
parseDiskBackplaneInfo(text, hw)
if len(hw.Storage) != 7 {
t.Fatalf("expected total storage count 7 after backplane merge, got %d", len(hw.Storage))
}
bpCount := 0
for _, dev := range hw.Storage {
if strings.HasPrefix(dev.Slot, "BP0:") || strings.HasPrefix(dev.Slot, "BP1:") {
bpCount++
}
}
if bpCount != 3 {
t.Fatalf("expected 3 synthetic backplane rows, got %d", bpCount)
}
}
func TestEnrichStorageFromSerialFallbackFiles_AssignsSerials(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "onekeylog/configuration/conf/BpHDDSerialNumber.info",
Content: []byte{
0xA0, 0xA1, 0xA2, 0xA3,
'S', '6', 'K', 'N', 'N', 'G', '0', 'W', '4', '2', '8', '5', '5', '2',
0x00,
'P', 'H', 'Y', 'I', '5', '2', '7', '1', '0', '0', '4', 'B', '1', 'P', '9', 'D', 'G', 'N',
},
},
}
hw := &models.HardwareConfig{
Storage: []models.Storage{
{Slot: "BP0:0", Type: "HDD", Present: true},
{Slot: "BP0:1", Type: "HDD", Present: true},
{Slot: "OB01", Type: "NVMe", Present: true},
},
}
enrichStorageFromSerialFallbackFiles(files, hw)
if hw.Storage[0].SerialNumber == "" || hw.Storage[1].SerialNumber == "" {
t.Fatalf("expected serials assigned to present storage entries, got %#v", hw.Storage)
}
}

@@ -1,175 +0,0 @@
# NVIDIA Field Diagnostics Parser
Парсер для диагностических архивов NVIDIA HGX Field Diagnostics.
Универсальный парсер, не привязанный к конкретному производителю серверов.
## Поддерживаемые архивы
- NVIDIA HGX Field Diag (работает с любыми серверами: Supermicro, Dell, HPE, и т.д.)
- Архивы с результатами GPU диагностики NVIDIA
## Формат архива
Парсер работает с архивами в формате:
- `.tar` (несжатый tar)
- `.tar.gz` (сжатый gzip)
## Recognized Files
### Core Files
1. **output.log** - dmidecode output with system information
- Server manufacturer (Manufacturer)
- Server model (Product Name), e.g. SYS-821GE-TNHR
- Server serial number (Serial Number), e.g. A514359X5A07900
- UUID, SKU Number, Family
2. **unified_summary.json** - detailed system and component information
- GPU details (model, manufacturer, VBIOS, PCI addresses)
- NVSwitch details (VendorID, DeviceID, link speed/width)
- Server manufacturer and model
3. **summary.json** - diagnostic test results
- GPU test results (inforom, checkinforom, gpumem, gpustress, pcie, nvlink, nvswitch, power)
- Error codes and test statuses
4. **summary.csv** - alternative test-result format
### Additional Files
- `gpu_fieldiag/*.log` - detailed per-GPU diagnostic logs
- `inventory/*.json` - additional configuration information
## Extracted Data
### Hardware Configuration
#### GPUs
```json
{
"slot": "GPUSXM1",
"model": "NVIDIA Device 2335",
"manufacturer": "NVIDIA Corporation",
"firmware": "96.00.D0.00.03",
"bdf": "0000:3a:00.0"
}
```
#### NVSwitch (as PCIe devices)
```json
{
"slot": "NVSWITCHNVSWITCH0",
"device_class": "NVSwitch",
"manufacturer": "NVIDIA Corporation",
"vendor_id": 4318,
"device_id": 8867,
"bdf": "0000:05:00.0",
"link_speed": "16GT/s",
"link_width": 2
}
```
### Events
Events are created for:
- **Warnings and errors** reported by diagnostic tests
- Example events:
- `Row remapping failed` - GPU memory error (Warning)
- Various tests: connectivity, gpumem, gpustress, pcie, nvlink, nvswitch, power
Severity levels:
- `info` - informational events (tests passed successfully)
- `warning` - warnings (e.g. Row remapping failed)
- `critical` - critical errors (error codes 300+)
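The severity thresholds above can be sketched as a small mapping function (an illustrative sketch only; the parser's actual event logic may differ):

```go
package main

import "fmt"

// severityFor maps a diagnostic result to an event severity using the
// documented thresholds: codes 300+ are critical, results flagged as
// warnings are warnings, everything else is informational.
func severityFor(errorCode int, isWarning bool) string {
	switch {
	case errorCode >= 300:
		return "critical"
	case isWarning:
		return "warning"
	default:
		return "info"
	}
}

func main() {
	fmt.Println(severityFor(363, false)) // e.g. gpumem error code 363
	fmt.Println(severityFor(0, true))    // flagged warning
	fmt.Println(severityFor(0, false))   // clean pass
}
```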
## Usage Example
```bash
# Start the web interface
./logpile --file /path/to/A514359X5A07900_logs-20260122-074208.tar
# The web interface will be available at http://localhost:8082
```
## Auto-Detection
The parser automatically identifies NVIDIA Field Diag archives by the presence of:
- `unified_summary.json` containing the "HGX Field Diag" marker
- `summary.json` and `summary.csv` with test results
- A `gpu_fieldiag/` directory
Confidence score:
- `unified_summary.json` with the "HGX Field Diag" marker: +40
- `summary.json`: +20
- `summary.csv`: +15
- `gpu_fieldiag/` directory: +15
## Versioning
**Current parser version:** 1.1.0
When modifying the parser logic, bump the version in the `parserVersion` constant in `parser.go`.
### Version History
- **1.1.0** - Added output.log (dmidecode) parsing to extract the server model and serial number
- **1.0.0** - Initial version parsing unified_summary.json and summary.json/csv
## Data Samples
### Sample unified_summary.json
```json
{
"runInfo": {
"diagVersion": "24287-XXXX-FLD-42658",
"diagName": "HGX Field Diag",
"finalResult": "FAIL",
"errorCode": 363
},
"tests": [{
"virtualId": "inventory",
"components": [{
"componentId": "GPUSXM1",
"properties": [
{"id": "Manufacturer", "value": "Any Server Vendor"},
{"id": "VendorID", "value": "10de"},
{"id": "DeviceID", "value": "2335"}
]
}]
}]
}
```
### Sample summary.json
```json
[
{
"Error Code": "005-000-1-000000000363",
"Test": "gpumem",
"Component ID": "SXM5_SN_1653925025497",
"Notes": "Row remapping failed",
"Virtual ID": "gpumem"
}
]
```
## Known Limitations
1. The parser focuses on data from `unified_summary.json` and `summary.json`
2. Detailed logs in `gpu_fieldiag/*.log` are not parsed yet
3. CPU, memory, and disk information is not extracted (absent from the archive)
## Development
### Adding New Fields
1. Study the JSON structure in the archive
2. Add the fields to the `Component` or `Property` structs
3. Update `parseGPUComponent` or `parseNVSwitchComponent`
4. Bump the parser version
### Adding New File Types
1. Create a new file with its parser (e.g. `gpu_logs.go`)
2. Hook the parsing into the `Parse()` function in `parser.go`
3. Update the documentation

View File

@@ -0,0 +1,274 @@
package nvidia
import (
"regexp"
"strconv"
"strings"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
var verboseRunTestingLineRegex = regexp.MustCompile(`^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+-\s+Testing\s+([a-zA-Z0-9_]+)\s*$`)
var runLogStartTimeRegex = regexp.MustCompile(`^Start time\s+([A-Za-z]{3}, \d{2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2})\s*$`)
var runLogTestDurationRegex = regexp.MustCompile(`^Testing\s+([a-zA-Z0-9_]+)\s+\S+\s+\[\s*([0-9]+):([0-9]{2})s\s*\]\s*$`)
var modsStartLineRegex = regexp.MustCompile(`(?m)^MODS start:\s+([A-Za-z]{3}\s+[A-Za-z]{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s*$`)
var gpuFieldiagOutputPathRegex = regexp.MustCompile(`(?i)gpu_fieldiag[\\/]+sxm(\d+)_sn_([^\\/]+)[\\/]+output\.log$`)
var nvswitchDevnameRegex = regexp.MustCompile(`devname=[^,\s]+,(NVSWITCH\d+)`)
type componentCheckTimes struct {
GPUDefault time.Time
NVSwitchDefault time.Time
GPUBySerial map[string]time.Time // key: GPU serial
GPUBySlot map[string]time.Time // key: GPUSXM<idx>
NVSwitchBySlot map[string]time.Time // key: NVSWITCH<idx>
}
// CollectGPUAndNVSwitchCheckTimes extracts GPU/NVSwitch check timestamps from NVIDIA logs.
// Priority:
// 1) verbose_run.log "Testing <test>" timestamps
// 2) run.log start time + cumulative durations
func CollectGPUAndNVSwitchCheckTimes(files []parser.ExtractedFile) componentCheckTimes {
gpuBySerial := make(map[string]time.Time)
gpuBySlot := make(map[string]time.Time)
nvsBySlot := make(map[string]time.Time)
for _, f := range files {
path := strings.TrimSpace(f.Path)
pathLower := strings.ToLower(path)
// Per-GPU timestamp from gpu_fieldiag/<SXMx_SN_serial>/output.log
if strings.HasSuffix(pathLower, "output.log") && strings.Contains(pathLower, "gpu_fieldiag/") {
ts := parseModsStartTime(f.Content)
if ts.IsZero() {
continue
}
matches := gpuFieldiagOutputPathRegex.FindStringSubmatch(path)
if len(matches) == 3 {
slot := "GPUSXM" + strings.TrimSpace(matches[1])
serial := strings.TrimSpace(matches[2])
if slot != "" {
gpuBySlot[slot] = ts
}
if serial != "" {
gpuBySerial[serial] = ts
}
}
}
// Per-NVSwitch timestamp and slot list from nvswitch/output.log
if strings.HasSuffix(pathLower, "nvswitch/output.log") || strings.HasSuffix(pathLower, "nvswitch\\output.log") {
ts := parseModsStartTime(f.Content)
if ts.IsZero() {
continue
}
for _, slot := range parseNVSwitchSlotsFromOutput(f.Content) {
nvsBySlot[slot] = ts
}
}
}
testStarts := make(map[string]time.Time)
if f := parser.FindFileByName(files, "verbose_run.log"); f != nil {
for testName, ts := range parseVerboseRunTestStartTimes(f.Content) {
testStarts[strings.ToLower(strings.TrimSpace(testName))] = ts
}
}
if len(testStarts) == 0 {
if f := parser.FindFileByName(files, "run.log"); f != nil {
for testName, ts := range parseRunLogTestStartTimes(f.Content) {
testStarts[strings.ToLower(strings.TrimSpace(testName))] = ts
}
}
}
return componentCheckTimes{
GPUDefault: pickFirstTestTime(testStarts, "gpu_fieldiag", "gpumem", "gpustress", "pcie", "inventory"),
NVSwitchDefault: pickFirstTestTime(testStarts, "nvswitch", "inventory"),
GPUBySerial: gpuBySerial,
GPUBySlot: gpuBySlot,
NVSwitchBySlot: nvsBySlot,
}
}
func pickFirstTestTime(testStarts map[string]time.Time, names ...string) time.Time {
for _, name := range names {
if ts := testStarts[strings.ToLower(strings.TrimSpace(name))]; !ts.IsZero() {
return ts
}
}
return time.Time{}
}
func parseVerboseRunTestStartTimes(content []byte) map[string]time.Time {
result := make(map[string]time.Time)
lines := strings.Split(string(content), "\n")
for _, line := range lines {
matches := verboseRunTestingLineRegex.FindStringSubmatch(strings.TrimSpace(line))
if len(matches) != 3 {
continue
}
ts, err := parser.ParseInDefaultArchiveLocation("2006-01-02 15:04:05", strings.TrimSpace(matches[1]))
if err != nil {
continue
}
testName := strings.ToLower(strings.TrimSpace(matches[2]))
if testName == "" {
continue
}
if _, exists := result[testName]; !exists {
result[testName] = ts
}
}
return result
}
func parseRunLogTestStartTimes(content []byte) map[string]time.Time {
lines := strings.Split(string(content), "\n")
start := time.Time{}
for _, line := range lines {
matches := runLogStartTimeRegex.FindStringSubmatch(strings.TrimSpace(line))
if len(matches) != 2 {
continue
}
parsed, err := parser.ParseInDefaultArchiveLocation("Mon, 02 Jan 2006 15:04:05", strings.TrimSpace(matches[1]))
if err != nil {
continue
}
start = parsed
break
}
if start.IsZero() {
return nil
}
result := make(map[string]time.Time)
cursor := start
for _, line := range lines {
matches := runLogTestDurationRegex.FindStringSubmatch(strings.TrimSpace(line))
if len(matches) != 4 {
continue
}
testName := strings.ToLower(strings.TrimSpace(matches[1]))
minutes, errMin := strconv.Atoi(strings.TrimSpace(matches[2]))
seconds, errSec := strconv.Atoi(strings.TrimSpace(matches[3]))
if errMin != nil || errSec != nil {
continue
}
if _, exists := result[testName]; !exists {
result[testName] = cursor
}
cursor = cursor.Add(time.Duration(minutes)*time.Minute + time.Duration(seconds)*time.Second)
}
return result
}
func parseModsStartTime(content []byte) time.Time {
matches := modsStartLineRegex.FindSubmatch(content)
if len(matches) != 2 {
return time.Time{}
}
tsRaw := strings.TrimSpace(string(matches[1]))
if tsRaw == "" {
return time.Time{}
}
ts, err := parser.ParseInDefaultArchiveLocation("Mon Jan 2 15:04:05 2006", tsRaw)
if err != nil {
return time.Time{}
}
return ts
}
func parseNVSwitchSlotsFromOutput(content []byte) []string {
matches := nvswitchDevnameRegex.FindAllSubmatch(content, -1)
if len(matches) == 0 {
return nil
}
seen := make(map[string]struct{})
out := make([]string, 0, len(matches))
for _, m := range matches {
if len(m) != 2 {
continue
}
slot := strings.ToUpper(strings.TrimSpace(string(m[1])))
if slot == "" {
continue
}
if _, exists := seen[slot]; exists {
continue
}
seen[slot] = struct{}{}
out = append(out, slot)
}
return out
}
// ApplyGPUAndNVSwitchCheckTimes writes parsed check timestamps to component status metadata.
func ApplyGPUAndNVSwitchCheckTimes(result *models.AnalysisResult, times componentCheckTimes) {
if result == nil || result.Hardware == nil {
return
}
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
ts := time.Time{}
if serial := strings.TrimSpace(gpu.SerialNumber); serial != "" {
ts = times.GPUBySerial[serial]
}
if ts.IsZero() {
ts = times.GPUBySlot[strings.ToUpper(strings.TrimSpace(gpu.Slot))]
}
if ts.IsZero() {
ts = times.GPUDefault
}
if ts.IsZero() {
continue
}
gpu.StatusCheckedAt = &ts
status := strings.TrimSpace(gpu.Status)
if status == "" {
status = "Unknown"
}
gpu.StatusAtCollect = &models.StatusAtCollection{
Status: status,
At: ts,
}
}
for i := range result.Hardware.PCIeDevices {
dev := &result.Hardware.PCIeDevices[i]
slot := normalizeNVSwitchSlot(strings.TrimSpace(dev.Slot))
if slot == "" {
continue
}
slot = strings.ToUpper(slot)
if !strings.EqualFold(strings.TrimSpace(dev.DeviceClass), "NVSwitch") &&
!strings.HasPrefix(slot, "NVSWITCH") {
continue
}
ts := times.NVSwitchBySlot[slot]
if ts.IsZero() {
ts = times.NVSwitchDefault
}
if ts.IsZero() {
continue
}
dev.StatusCheckedAt = &ts
status := strings.TrimSpace(dev.Status)
if status == "" {
status = "Unknown"
}
dev.StatusAtCollect = &models.StatusAtCollection{
Status: status,
At: ts,
}
}
}

View File

@@ -0,0 +1,143 @@
package nvidia
import (
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestParseVerboseRunTestStartTimes(t *testing.T) {
content := []byte(`
2026-01-22 09:11:32,458 - Testing nvswitch
2026-01-22 09:45:36,016 - Testing gpu_fieldiag
`)
got := parseVerboseRunTestStartTimes(content)
nvs := got["nvswitch"]
if nvs.IsZero() {
t.Fatalf("expected nvswitch timestamp")
}
gpu := got["gpu_fieldiag"]
if gpu.IsZero() {
t.Fatalf("expected gpu_fieldiag timestamp")
}
if nvs.UTC().Format(time.RFC3339) != "2026-01-22T06:11:32Z" {
t.Fatalf("unexpected nvswitch timestamp: %s", nvs.Format(time.RFC3339))
}
if gpu.UTC().Format(time.RFC3339) != "2026-01-22T06:45:36Z" {
t.Fatalf("unexpected gpu_fieldiag timestamp: %s", gpu.Format(time.RFC3339))
}
}
func TestParseRunLogTestStartTimes(t *testing.T) {
content := []byte(`
Start time Thu, 22 Jan 2026 07:42:26
Testing gpumem FAILED [ 26:12s ]
Testing gpustress OK [ 7:10s ]
Testing nvswitch OK [ 9:25s ]
`)
got := parseRunLogTestStartTimes(content)
if got["gpumem"].UTC().Format(time.RFC3339) != "2026-01-22T04:42:26Z" {
t.Fatalf("unexpected gpumem start: %s", got["gpumem"].Format(time.RFC3339))
}
if got["gpustress"].UTC().Format(time.RFC3339) != "2026-01-22T05:08:38Z" {
t.Fatalf("unexpected gpustress start: %s", got["gpustress"].Format(time.RFC3339))
}
if got["nvswitch"].UTC().Format(time.RFC3339) != "2026-01-22T05:15:48Z" {
t.Fatalf("unexpected nvswitch start: %s", got["nvswitch"].Format(time.RFC3339))
}
}
func TestApplyGPUAndNVSwitchCheckTimes(t *testing.T) {
gpuTs := time.Date(2026, 1, 22, 9, 45, 36, 0, time.UTC)
nvsTs := time.Date(2026, 1, 22, 9, 11, 32, 0, time.UTC)
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "GPUSXM5", Status: "FAIL"},
},
PCIeDevices: []models.PCIeDevice{
{Slot: "NVSWITCH0", DeviceClass: "NVSwitch", Status: "PASS"},
{Slot: "NIC0", DeviceClass: "NetworkController", Status: "PASS"},
},
},
}
ApplyGPUAndNVSwitchCheckTimes(result, componentCheckTimes{
GPUBySlot: map[string]time.Time{"GPUSXM5": gpuTs},
NVSwitchBySlot: map[string]time.Time{"NVSWITCH0": nvsTs},
})
if got := result.Hardware.GPUs[0].StatusCheckedAt; got == nil || !got.Equal(gpuTs) {
t.Fatalf("expected gpu status_checked_at %s, got %v", gpuTs.Format(time.RFC3339), got)
}
if result.Hardware.GPUs[0].StatusAtCollect == nil || !result.Hardware.GPUs[0].StatusAtCollect.At.Equal(gpuTs) {
t.Fatalf("expected gpu status_at_collection.at %s", gpuTs.Format(time.RFC3339))
}
if got := result.Hardware.PCIeDevices[0].StatusCheckedAt; got == nil || !got.Equal(nvsTs) {
t.Fatalf("expected nvswitch status_checked_at %s, got %v", nvsTs.Format(time.RFC3339), got)
}
if result.Hardware.PCIeDevices[0].StatusAtCollect == nil || !result.Hardware.PCIeDevices[0].StatusAtCollect.At.Equal(nvsTs) {
t.Fatalf("expected nvswitch status_at_collection.at %s", nvsTs.Format(time.RFC3339))
}
if result.Hardware.PCIeDevices[1].StatusCheckedAt != nil {
t.Fatalf("expected non-nvswitch device status_checked_at to stay nil")
}
}
func TestCollectGPUAndNVSwitchCheckTimes_FromVerboseRun(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "verbose_run.log",
Content: []byte(`
2026-01-22 09:11:32,458 - Testing nvswitch
2026-01-22 09:45:36,016 - Testing gpu_fieldiag
`),
},
}
got := CollectGPUAndNVSwitchCheckTimes(files)
if got.GPUDefault.UTC().Format(time.RFC3339) != "2026-01-22T06:45:36Z" {
t.Fatalf("unexpected GPU check time: %s", got.GPUDefault.Format(time.RFC3339))
}
if got.NVSwitchDefault.UTC().Format(time.RFC3339) != "2026-01-22T06:11:32Z" {
t.Fatalf("unexpected NVSwitch check time: %s", got.NVSwitchDefault.Format(time.RFC3339))
}
}
func TestCollectGPUAndNVSwitchCheckTimes_FromComponentOutputLogs(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "gpu_fieldiag/SXM5_SN_1653925025497/output.log",
Content: []byte(`
$ some command
MODS start: Thu Jan 22 09:45:36 2026
`),
},
{
Path: "nvswitch/output.log",
Content: []byte(`
$ cmd devname=0000:08:00.0,NVSWITCH3 devname=0000:07:00.0,NVSWITCH2 devname=0000:06:00.0,NVSWITCH1 devname=0000:05:00.0,NVSWITCH0
MODS start: Thu Jan 22 09:11:32 2026
`),
},
}
got := CollectGPUAndNVSwitchCheckTimes(files)
if got.GPUBySerial["1653925025497"].UTC().Format(time.RFC3339) != "2026-01-22T06:45:36Z" {
t.Fatalf("unexpected GPU serial check time: %s", got.GPUBySerial["1653925025497"].Format(time.RFC3339))
}
if got.GPUBySlot["GPUSXM5"].UTC().Format(time.RFC3339) != "2026-01-22T06:45:36Z" {
t.Fatalf("unexpected GPU slot check time: %s", got.GPUBySlot["GPUSXM5"].Format(time.RFC3339))
}
if got.NVSwitchBySlot["NVSWITCH0"].UTC().Format(time.RFC3339) != "2026-01-22T06:11:32Z" {
t.Fatalf("unexpected NVSwitch0 check time: %s", got.NVSwitchBySlot["NVSWITCH0"].Format(time.RFC3339))
}
if got.NVSwitchBySlot["NVSWITCH3"].UTC().Format(time.RFC3339) != "2026-01-22T06:11:32Z" {
t.Fatalf("unexpected NVSwitch3 check time: %s", got.NVSwitchBySlot["NVSWITCH3"].Format(time.RFC3339))
}
}

View File

@@ -0,0 +1,374 @@
package nvidia
import (
"encoding/json"
"regexp"
"strconv"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
var (
gpuNameWithSerialRegex = regexp.MustCompile(`^SXM(\d+)_SN_(.+)$`)
gpuNameSlotOnlyRegex = regexp.MustCompile(`^SXM(\d+)$`)
skuCodeRegex = regexp.MustCompile(`^(G\d{3})[.-](\d{4})`)
skuCodeInsideRegex = regexp.MustCompile(`(?:^|[^A-Z0-9])(?:\d)?(G\d{3})[.-](\d{4})(?:[^A-Z0-9]|$)`)
inforomPathRegex = regexp.MustCompile(`(?i)(?:^|[\\/])(checkinforom|inforom)[\\/](SXM(\d+))(?:_SN_([^\\/]+))?[\\/]fieldiag\.jso$`)
inforomProductPNRegex = regexp.MustCompile(`"product_part_num"\s*:\s*"([^"]+)"`)
inforomSerialRegex = regexp.MustCompile(`"serial_number"\s*:\s*"([^"]+)"`)
)
type testSpecData struct {
Actions []struct {
VirtualID string `json:"virtual_id"`
Args struct {
SKUToFile map[string]string `json:"sku_to_sku_json_file_map"`
ModsMapping map[string]json.RawMessage `json:"mods_mapping"`
} `json:"args"`
} `json:"actions"`
}
type inventoryFieldDiagSummary struct {
ModsRuns []struct {
ModsHeader []struct {
GPUName string `json:"GpuName"`
BoardInfo string `json:"BoardInfo"`
} `json:"ModsHeader"`
} `json:"ModsRuns"`
}
var hardcodedSKUToFileMap = map[string]string{
"G520-0200": "sku_hgx-h100-8-gpu_80g_aircooled_field.json",
"G520-0201": "sku_hgx-h100-8-gpu_80g_aircooled_field.json",
"G520-0202": "sku_hgx-h100-8-gpu_80g_tpol_field.json",
"G520-0203": "sku_hgx-h100-8-gpu_80g_tpol_field.json",
"G520-0205": "sku_hgx-h800-8-gpu_80g_aircooled_field.json",
"G520-0207": "sku_hgx-h800-8-gpu_80g_tpol_field.json",
"G520-0221": "sku_hgx-h100-8-gpu_96g_aircooled_field.json",
"G520-0236": "sku_hgx-h20-8-gpu_96g_aircooled_field.json",
"G520-0238": "sku_hgx-h20-8-gpu_96g_tpol_field.json",
"G520-0266": "sku_hgx-h20-8-gpu_141g_aircooled_field.json",
"G520-0280": "sku_hgx-h200-8-gpu_141g_aircooled_field.json",
"G520-0282": "sku_hgx-h200-8-gpu_141g_tpol_field.json",
"G520-0292": "sku_hgx-h100-8-gpu_sku_292_field.json",
}
// ApplyGPUModelsFromSKU updates GPU model names using SKU mapping from testspec.json.
// Mapping source:
// - inventory/fieldiag_summary.json: GPUName -> BoardInfo(SKU)
// - hardcoded SKU mapping
// - testspec.json: SKU -> sku_hgx-... filename (fallback for unknown hardcoded SKU)
// - inforom/*/fieldiag.jso: product_part_num (full P/N with embedded SKU)
// - testspec.json gpu_fieldiag.mods_mapping: DeviceID -> GPU generation (last fallback for description)
func ApplyGPUModelsFromSKU(files []parser.ExtractedFile, result *models.AnalysisResult) {
if result == nil || result.Hardware == nil || len(result.Hardware.GPUs) == 0 {
return
}
skuToFile := parseSKUToFileMap(files)
generationByDeviceID := parseGenerationByDeviceID(files)
serialToSKU, slotToSKU, serialToPartNumber, slotToPartNumber := parseGPUSKUMapping(files)
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
slot := strings.TrimSpace(gpu.Slot)
serial := strings.TrimSpace(gpu.SerialNumber)
if gpu.PartNumber == "" && serial != "" {
if pn := strings.TrimSpace(serialToPartNumber[serial]); pn != "" {
gpu.PartNumber = pn
}
}
if gpu.PartNumber == "" {
if pn := strings.TrimSpace(slotToPartNumber[slot]); pn != "" {
gpu.PartNumber = pn
}
}
if partNumber := strings.TrimSpace(gpu.PartNumber); partNumber != "" {
gpu.Model = partNumber
}
sku := extractSKUFromPartNumber(gpu.PartNumber)
if sku == "" && serial != "" {
sku = serialToSKU[serial]
}
if sku == "" {
sku = slotToSKU[slot]
}
if sku != "" {
if desc := resolveDescriptionFromSKU(sku, skuToFile); desc != "" {
gpu.Description = desc
continue
}
}
if gen := resolveGenerationDescription(gpu.DeviceID, generationByDeviceID); gen != "" {
gpu.Description = gen
}
}
}
func parseSKUToFileMap(files []parser.ExtractedFile) map[string]string {
result := make(map[string]string, len(hardcodedSKUToFileMap))
for sku, file := range hardcodedSKUToFileMap {
result[normalizeSKUCode(sku)] = strings.TrimSpace(file)
}
specFile := parser.FindFileByName(files, "testspec.json")
if specFile == nil {
return result
}
var spec testSpecData
if err := json.Unmarshal(specFile.Content, &spec); err != nil {
return result
}
for _, action := range spec.Actions {
for sku, file := range action.Args.SKUToFile {
normSKU := normalizeSKUCode(sku)
if normSKU == "" {
continue
}
// Priority: hardcoded mapping wins, testspec extends unknown SKU list.
if _, exists := result[normSKU]; !exists {
result[normSKU] = strings.TrimSpace(file)
}
}
}
return result
}
func parseGenerationByDeviceID(files []parser.ExtractedFile) map[string]string {
specFile := parser.FindFileByName(files, "testspec.json")
if specFile == nil {
return nil
}
var spec testSpecData
if err := json.Unmarshal(specFile.Content, &spec); err != nil {
return nil
}
familyToGeneration := make(map[string]string)
deviceToGeneration := make(map[string]string)
for _, action := range spec.Actions {
if strings.TrimSpace(strings.ToLower(action.VirtualID)) != "gpu_fieldiag" {
continue
}
for key, raw := range action.Args.ModsMapping {
if strings.HasPrefix(key, "#mods.") {
family := strings.TrimSpace(strings.TrimPrefix(key, "#mods."))
if family == "" {
continue
}
var generation string
if err := json.Unmarshal(raw, &generation); err == nil {
generation = strings.TrimSpace(generation)
if generation != "" {
familyToGeneration[family] = generation
}
}
}
}
for key, raw := range action.Args.ModsMapping {
family := strings.TrimSpace(key)
if family == "" || strings.HasPrefix(family, "#") {
continue
}
generation := strings.TrimSpace(familyToGeneration[family])
if generation == "" {
continue
}
var deviceIDs []string
if err := json.Unmarshal(raw, &deviceIDs); err != nil {
continue
}
for _, id := range deviceIDs {
norm := normalizeDeviceIDHex(id)
if norm != "" {
deviceToGeneration[norm] = generation
}
}
}
}
return deviceToGeneration
}
func parseGPUSKUMapping(files []parser.ExtractedFile) (map[string]string, map[string]string, map[string]string, map[string]string) {
serialToSKU := make(map[string]string)
slotToSKU := make(map[string]string)
serialToPartNumber := make(map[string]string)
slotToPartNumber := make(map[string]string)
// 1) inventory/fieldiag_summary.json mapping (GPUName/BoardInfo).
var summaryFile *parser.ExtractedFile
for _, f := range files {
path := strings.ToLower(f.Path)
if strings.Contains(path, "inventory/fieldiag_summary.json") ||
strings.Contains(path, "inventory\\fieldiag_summary.json") {
summaryFile = &f
break
}
}
// If the summary file is missing we simply continue: inforom may still contain usable part numbers.
if summaryFile != nil {
var summaries []inventoryFieldDiagSummary
if err := json.Unmarshal(summaryFile.Content, &summaries); err == nil {
for _, summary := range summaries {
addSummaryMapping(summary, serialToSKU, slotToSKU)
}
} else {
var summary inventoryFieldDiagSummary
if err := json.Unmarshal(summaryFile.Content, &summary); err == nil {
addSummaryMapping(summary, serialToSKU, slotToSKU)
}
}
}
// 2) inforom/checkinforom fieldiag.jso mapping (full product_part_num).
for _, f := range files {
path := strings.TrimSpace(f.Path)
m := inforomPathRegex.FindStringSubmatch(path)
if len(m) == 0 {
continue
}
slot := "GPU" + strings.ToUpper(strings.TrimSpace(m[2])) // SXM7 -> GPUSXM7
serialFromPath := strings.TrimSpace(m[4])
productPNMatch := inforomProductPNRegex.FindSubmatch(f.Content)
if len(productPNMatch) == 2 {
partNumber := strings.TrimSpace(string(productPNMatch[1]))
if partNumber != "" {
slotToPartNumber[slot] = partNumber
if serialFromPath != "" {
serialToPartNumber[serialFromPath] = partNumber
}
if sku := extractSKUFromPartNumber(partNumber); sku != "" {
slotToSKU[slot] = sku
if serialFromPath != "" {
serialToSKU[serialFromPath] = sku
}
}
}
}
serialMatch := inforomSerialRegex.FindSubmatch(f.Content)
if len(serialMatch) == 2 {
serial := strings.TrimSpace(string(serialMatch[1]))
if serial != "" {
if sku := slotToSKU[slot]; sku != "" {
serialToSKU[serial] = sku
}
if pn := slotToPartNumber[slot]; pn != "" {
serialToPartNumber[serial] = pn
}
}
}
}
return serialToSKU, slotToSKU, serialToPartNumber, slotToPartNumber
}
func addSummaryMapping(summary inventoryFieldDiagSummary, serialToSKU map[string]string, slotToSKU map[string]string) {
for _, run := range summary.ModsRuns {
for _, h := range run.ModsHeader {
sku := normalizeSKUCode(h.BoardInfo)
if sku == "" {
continue
}
gpuName := strings.TrimSpace(h.GPUName)
if matches := gpuNameWithSerialRegex.FindStringSubmatch(gpuName); len(matches) == 3 {
slotToSKU["GPUSXM"+matches[1]] = sku
serialToSKU[strings.TrimSpace(matches[2])] = sku
continue
}
if matches := gpuNameSlotOnlyRegex.FindStringSubmatch(gpuName); len(matches) == 2 {
slotToSKU["GPUSXM"+matches[1]] = sku
}
}
}
}
func resolveDescriptionFromSKU(sku string, skuToFile map[string]string) string {
file := strings.ToLower(strings.TrimSpace(skuToFile[normalizeSKUCode(sku)]))
if file == "" {
return ""
}
return skuFilenameToDescription(file)
}
func normalizeSKUCode(v string) string {
s := strings.TrimSpace(strings.ToUpper(v))
if s == "" {
return ""
}
if m := skuCodeRegex.FindStringSubmatch(s); len(m) == 3 {
return m[1] + "-" + m[2]
}
return s
}
func extractSKUFromPartNumber(partNumber string) string {
s := strings.TrimSpace(strings.ToUpper(partNumber))
if s == "" {
return ""
}
if m := skuCodeInsideRegex.FindStringSubmatch(s); len(m) == 3 {
return m[1] + "-" + m[2]
}
return ""
}
func skuFilenameToDescription(file string) string {
s := strings.TrimSpace(strings.ToLower(file))
if s == "" {
return ""
}
s = strings.TrimSuffix(s, ".json")
s = strings.TrimSuffix(s, "_field")
s = strings.TrimPrefix(s, "sku_")
s = strings.ReplaceAll(s, "-", " ")
s = strings.ReplaceAll(s, "_", " ")
s = strings.Join(strings.Fields(s), " ")
return strings.TrimSpace(s)
}
func resolveGenerationDescription(deviceID int, deviceToGeneration map[string]string) string {
if deviceID <= 0 || len(deviceToGeneration) == 0 {
return ""
}
return strings.TrimSpace(deviceToGeneration[normalizeDeviceIDHex(strconv.FormatInt(int64(deviceID), 16))])
}
func normalizeDeviceIDHex(v string) string {
s := strings.TrimSpace(strings.ToLower(v))
s = strings.TrimPrefix(s, "0x")
if s == "" {
return ""
}
n, err := strconv.ParseUint(s, 16, 32)
if err != nil {
return ""
}
return "0x" + strings.ToLower(strconv.FormatUint(n, 16))
}

View File

@@ -0,0 +1,207 @@
package nvidia
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestApplyGPUModelsFromSKU(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "inventory/fieldiag_summary.json",
Content: []byte(`{
"ModsRuns":[
{"ModsHeader":[
{"GpuName":"SXM5_SN_1653925025497","BoardInfo":"G520-0280"}
]}
]
}`),
},
{
Path: "testspec.json",
Content: []byte(`{
"actions":[
{
"virtual_id":"inventory",
"args":{
"sku_to_sku_json_file_map":{
"G520-0280":"sku_hgx-h200-8-gpu_141g_aircooled_field.json"
}
}
}
]
}`),
},
}
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{
Slot: "GPUSXM5",
SerialNumber: "1653925025497",
Model: "NVIDIA Device 2335",
},
},
},
}
ApplyGPUModelsFromSKU(files, result)
if got := result.Hardware.GPUs[0].Model; got != "NVIDIA Device 2335" {
t.Fatalf("expected model NVIDIA Device 2335, got %q", got)
}
if got := result.Hardware.GPUs[0].Description; got != "hgx h200 8 gpu 141g aircooled" {
t.Fatalf("expected description hgx h200 8 gpu 141g aircooled, got %q", got)
}
}
func TestApplyGPUModelsFromSKU_FromPartNumber(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "inforom/SXM5/fieldiag.jso",
Content: []byte(`[
[
{
"__tag__":"inforom",
"serial_number":"1653925025497",
"product_part_num":"692-2G520-0280-501"
}
]
]`),
},
{
Path: "testspec.json",
Content: []byte(`{
"actions":[
{
"virtual_id":"inventory",
"args":{
"sku_to_sku_json_file_map":{
"G520-0280":"sku_hgx-h200-8-gpu_141g_aircooled_field.json"
}
}
}
]
}`),
},
}
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{
Slot: "GPUSXM5",
SerialNumber: "1653925025497",
Model: "NVIDIA Device 2335",
},
},
},
}
ApplyGPUModelsFromSKU(files, result)
if got := result.Hardware.GPUs[0].Model; got != "692-2G520-0280-501" {
t.Fatalf("expected model 692-2G520-0280-501, got %q", got)
}
if got := result.Hardware.GPUs[0].PartNumber; got != "692-2G520-0280-501" {
t.Fatalf("expected part number 692-2G520-0280-501, got %q", got)
}
if got := result.Hardware.GPUs[0].Description; got != "hgx h200 8 gpu 141g aircooled" {
t.Fatalf("expected description hgx h200 8 gpu 141g aircooled, got %q", got)
}
}
func TestApplyGPUModelsFromSKU_FieldDiagSummaryArrayFormat(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "inventory/fieldiag_summary.json",
Content: []byte(`[
{
"ModsRuns":[
{"ModsHeader":[
{"GpuName":"SXM5_SN_1653925025497","BoardInfo":"G520-0280"}
]}
]
}
]`),
},
{
Path: "testspec.json",
Content: []byte(`{
"actions":[
{
"virtual_id":"inventory",
"args":{
"sku_to_sku_json_file_map":{
"G520-0280":"sku_hgx-h200-8-gpu_141g_aircooled_field.json"
}
}
}
]
}`),
},
}
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{
Slot: "GPUSXM5",
SerialNumber: "1653925025497",
Model: "NVIDIA Device 2335",
},
},
},
}
ApplyGPUModelsFromSKU(files, result)
if got := result.Hardware.GPUs[0].Model; got != "NVIDIA Device 2335" {
t.Fatalf("expected model NVIDIA Device 2335, got %q", got)
}
if got := result.Hardware.GPUs[0].Description; got != "hgx h200 8 gpu 141g aircooled" {
t.Fatalf("expected description hgx h200 8 gpu 141g aircooled, got %q", got)
}
}
func TestApplyGPUModelsFromSKU_FallbackToGenerationFromModsMapping(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "testspec.json",
Content: []byte(`{
"actions":[
{
"virtual_id":"gpu_fieldiag",
"args":{
"mods_mapping":{
"#mods.525":"Hopper",
"525":["0x2335"]
}
}
}
]
}`),
},
}
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{
Slot: "GPUSXM5",
Model: "NVIDIA Device 2335",
DeviceID: 0x2335,
},
},
},
}
ApplyGPUModelsFromSKU(files, result)
if got := result.Hardware.GPUs[0].Description; got != "Hopper" {
t.Fatalf("expected description Hopper, got %q", got)
}
}

View File

@@ -0,0 +1,155 @@
package nvidia
import (
"bufio"
"regexp"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
var (
// Regex to extract devname mappings from fieldiag command line
// Example: "devname=0000:ba:00.0,SXM5_SN_1653925027099"
devnameRegex = regexp.MustCompile(`devname=([\da-fA-F:\.]+),(\w+)`)
// Regex to capture BDF from commands like:
// "$ lspci -vvvs 0000:05:00.0" or "$ lspci -vvs 0000:05:00.0"
lspciBDFRegex = regexp.MustCompile(`^\$\s+lspci\s+-[^\s]*\s+([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s*$`)
// Example: "Capabilities: [2f0 v1] Device Serial Number 99-d3-61-c8-ac-2d-b0-48"
deviceSerialRegex = regexp.MustCompile(`Device Serial Number\s+([0-9a-fA-F\-:]+)`)
)
// ParseInventoryLog parses inventory/output.log to extract GPU serial numbers
// from fieldiag devname parameters (e.g., "SXM5_SN_1653925027099")
func ParseInventoryLog(content []byte, result *models.AnalysisResult) error {
if result.Hardware == nil || len(result.Hardware.GPUs) == 0 {
// No GPUs to update
return nil
}
scanner := bufio.NewScanner(strings.NewReader(string(content)))
// First pass: build mapping of PCI BDF -> Slot name and serial number from fieldiag command line
pciToSlot := make(map[string]string)
pciToSerial := make(map[string]string)
for scanner.Scan() {
line := scanner.Text()
// Look for fieldiag command with devname parameters
if strings.Contains(line, "devname=") && strings.Contains(line, "fieldiag") {
matches := devnameRegex.FindAllStringSubmatch(line, -1)
for _, match := range matches {
if len(match) == 3 {
pciBDF := match[1]
slotName := match[2]
// Extract slot number and serial from name like "SXM5_SN_1653925027099"
if strings.HasPrefix(slotName, "SXM") {
parts := strings.Split(slotName, "_")
// strings.Split always returns at least one element, so parts[0] is safe.
// Convert "SXM5" to "GPUSXM5".
pciToSlot[pciBDF] = "GPU" + parts[0]
// Extract serial number from "SXM5_SN_1653925027099"
if len(parts) == 3 && parts[1] == "SN" {
serial := parts[2]
pciToSerial[pciBDF] = serial
}
}
}
}
}
}
// Second pass: assign serial numbers to GPUs based on slot mapping
for i := range result.Hardware.GPUs {
slot := result.Hardware.GPUs[i].Slot
// Find the PCI BDF for this slot
var foundSerial string
for pciBDF, mappedSlot := range pciToSlot {
if mappedSlot == slot {
// Found matching slot, get serial number
if serial, ok := pciToSerial[pciBDF]; ok {
foundSerial = serial
break
}
}
}
if foundSerial != "" {
result.Hardware.GPUs[i].SerialNumber = foundSerial
}
}
// Third pass: parse lspci "Device Serial Number" by BDF (useful for NVSwitch serials).
bdfToDeviceSerial := make(map[string]string)
currentBDF := ""
scanner = bufio.NewScanner(strings.NewReader(string(content)))
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if line == "" {
continue
}
if m := lspciBDFRegex.FindStringSubmatch(line); len(m) == 2 {
currentBDF = strings.ToLower(strings.TrimSpace(m[1]))
continue
}
if currentBDF == "" {
continue
}
if m := deviceSerialRegex.FindStringSubmatch(line); len(m) == 2 {
serial := strings.TrimSpace(m[1])
if serial != "" {
bdfToDeviceSerial[currentBDF] = serial
}
currentBDF = ""
}
}
// Apply to PCIe devices first (includes NVSwitch).
for i := range result.Hardware.PCIeDevices {
dev := &result.Hardware.PCIeDevices[i]
if strings.TrimSpace(dev.SerialNumber) != "" {
continue
}
bdf := strings.ToLower(strings.TrimSpace(dev.BDF))
if bdf == "" {
continue
}
if serial := bdfToDeviceSerial[bdf]; serial != "" {
dev.SerialNumber = serial
}
}
// Apply to GPUs only if GPU serial is still empty (do not overwrite prod serial from devname).
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
if strings.TrimSpace(gpu.SerialNumber) != "" {
continue
}
bdf := strings.ToLower(strings.TrimSpace(gpu.BDF))
if bdf == "" {
continue
}
if serial := bdfToDeviceSerial[bdf]; serial != "" {
gpu.SerialNumber = serial
}
}
return scanner.Err()
}
// findInventoryOutputLog finds the inventory/output.log file
func findInventoryOutputLog(files []parser.ExtractedFile) *parser.ExtractedFile {
for _, f := range files {
// Look for inventory/output.log
path := strings.ToLower(f.Path)
if strings.Contains(path, "inventory/output.log") ||
strings.Contains(path, "inventory\\output.log") {
return &f
}
}
return nil
}

View File

@@ -0,0 +1,126 @@
package nvidia
import (
"os"
"path/filepath"
"strings"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestParseInventoryLog(t *testing.T) {
// Test with the real archive
archivePath := filepath.Join("../../../../example", "A514359X5A09844_logs-20260115-151707.tar")
// Check if file exists
if _, err := os.Stat(archivePath); os.IsNotExist(err) {
t.Skip("Test archive not found, skipping test")
}
// Extract files from archive
files, err := parser.ExtractArchive(archivePath)
if err != nil {
t.Fatalf("Failed to extract archive: %v", err)
}
// Find inventory/output.log
var inventoryLog *parser.ExtractedFile
for _, f := range files {
if strings.Contains(f.Path, "inventory/output.log") {
inventoryLog = &f
break
}
}
if inventoryLog == nil {
t.Fatal("inventory/output.log not found")
}
content := string(inventoryLog.Content)
// Test devname regex - this extracts both slot mapping and serial numbers
t.Log("Testing devname extraction:")
lines := strings.Split(content, "\n")
serialCount := 0
for i, line := range lines {
if strings.Contains(line, "devname=") && strings.Contains(line, "fieldiag") {
t.Logf("Line %d: Found fieldiag command", i)
matches := devnameRegex.FindAllStringSubmatch(line, -1)
t.Logf(" Found %d devname matches", len(matches))
for _, match := range matches {
if len(match) == 3 {
pciBDF := match[1]
slotName := match[2]
t.Logf(" PCI: %s -> Slot: %s", pciBDF, slotName)
// Extract serial number from slot name
if strings.HasPrefix(slotName, "SXM") {
parts := strings.Split(slotName, "_")
if len(parts) == 3 && parts[1] == "SN" {
serial := parts[2]
t.Logf(" Serial: %s", serial)
serialCount++
}
}
}
}
break
}
}
t.Logf("\nTotal GPU serials extracted: %d", serialCount)
if serialCount == 0 {
t.Error("Expected to find GPU serial numbers, but found none")
}
}
func min(a, b int) int {
if a < b {
return a
}
return b
}
func TestParseInventoryLog_AssignsNVSwitchSerialByBDF(t *testing.T) {
content := []byte(`
$ lspci -vvvs 0000:05:00.0
05:00.0 Bridge: NVIDIA Corporation Device 22a3 (rev a1)
Capabilities: [2f0 v1] Device Serial Number 99-d3-61-c8-ac-2d-b0-48
/tmp/fieldiag devname=0000:ba:00.0,SXM5_SN_1653925025497 fieldiag
`)
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{
Slot: "GPUSXM5",
BDF: "0000:ba:00.0",
SerialNumber: "",
},
},
PCIeDevices: []models.PCIeDevice{
{
Slot: "NVSWITCH0",
BDF: "0000:05:00.0",
SerialNumber: "",
},
},
},
}
if err := ParseInventoryLog(content, result); err != nil {
t.Fatalf("ParseInventoryLog failed: %v", err)
}
if got := result.Hardware.PCIeDevices[0].SerialNumber; got != "99-d3-61-c8-ac-2d-b0-48" {
t.Fatalf("expected NVSwitch serial 99-d3-61-c8-ac-2d-b0-48, got %q", got)
}
// GPU serial should come from fieldiag devname mapping.
if got := result.Hardware.GPUs[0].SerialNumber; got != "1653925025497" {
t.Fatalf("expected GPU serial 1653925025497, got %q", got)
}
}

View File

@@ -0,0 +1,370 @@
package nvidia
import (
"bufio"
"fmt"
"regexp"
"strconv"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
var (
nvflashAdapterRegex = regexp.MustCompile(`^Adapter:\s+.+\(([\da-fA-F]+),([\da-fA-F]+),([\da-fA-F]+),([\da-fA-F]+)\)\s+S:([0-9A-Fa-f]{2}),B:([0-9A-Fa-f]{2}),D:([0-9A-Fa-f]{2}),F:([0-9A-Fa-f])`)
gpuPCIIDRegex = regexp.MustCompile(`^GPU_SXM(\d+)_PCIID:\s*(\S+)$`)
nvsPCIIDRegex = regexp.MustCompile(`^NVSWITCH_NVSWITCH(\d+)_PCIID:\s*(\S+)$`)
)
var nvswitchProjectToPartNumber = map[string]string{
"5612-0002": "965-25612-0002-000",
}
type nvflashDeviceRecord struct {
BDF string
VendorID int
DeviceID int
SSVendorID int
SSDeviceID int
Version string
BoardID string
HierarchyID string
ChipSKU string
Project string
}
// ParseNVFlashVerboseLog parses inventory/nvflash_verbose.log and applies firmware versions
// to already discovered devices using PCI BDF with optional ID checks.
func ParseNVFlashVerboseLog(content []byte, result *models.AnalysisResult) error {
if result == nil || result.Hardware == nil {
return nil
}
records := parseNVFlashRecords(content)
if len(records) == 0 {
return nil
}
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
bdf := normalizePCIBDF(gpu.BDF)
if bdf == "" {
continue
}
rec, ok := records[bdf]
if !ok {
continue
}
if gpu.DeviceID != 0 && rec.DeviceID != 0 && gpu.DeviceID != rec.DeviceID {
continue
}
if gpu.VendorID != 0 && rec.VendorID != 0 && gpu.VendorID != rec.VendorID {
continue
}
if strings.TrimSpace(rec.Version) != "" {
gpu.Firmware = strings.TrimSpace(rec.Version)
}
}
for i := range result.Hardware.PCIeDevices {
dev := &result.Hardware.PCIeDevices[i]
bdf := normalizePCIBDF(dev.BDF)
if bdf == "" {
continue
}
rec, ok := records[bdf]
if !ok {
continue
}
if dev.DeviceID != 0 && rec.DeviceID != 0 && dev.DeviceID != rec.DeviceID {
continue
}
if dev.VendorID != 0 && rec.VendorID != 0 && dev.VendorID != rec.VendorID {
continue
}
if strings.EqualFold(strings.TrimSpace(dev.DeviceClass), "NVSwitch") || strings.HasPrefix(strings.ToUpper(strings.TrimSpace(dev.Slot)), "NVSWITCH") {
if mappedPN := mapNVSwitchPartNumberByProject(rec.Project); mappedPN != "" {
dev.PartNumber = mappedPN
}
}
if strings.TrimSpace(rec.Version) != "" && strings.TrimSpace(dev.PartNumber) == "" {
// Fallback for non-NVSwitch devices where part number is unknown.
dev.PartNumber = strings.TrimSpace(rec.Version)
}
}
appendNVFlashFirmwareEntries(result, records)
return nil
}
// ApplyInventoryPCIIDs enriches devices with PCI BDFs from inventory/inventory.log.
func ApplyInventoryPCIIDs(content []byte, result *models.AnalysisResult) error {
if result == nil || result.Hardware == nil {
return nil
}
slotToBDF := parseInventoryPCIIDs(content)
if len(slotToBDF) == 0 {
return nil
}
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
if strings.TrimSpace(gpu.BDF) != "" {
continue
}
if bdf := slotToBDF[strings.TrimSpace(gpu.Slot)]; bdf != "" {
gpu.BDF = bdf
}
}
for i := range result.Hardware.PCIeDevices {
dev := &result.Hardware.PCIeDevices[i]
if strings.TrimSpace(dev.BDF) != "" {
continue
}
if bdf := slotToBDF[normalizeNVSwitchSlot(strings.TrimSpace(dev.Slot))]; bdf != "" {
dev.BDF = bdf
}
}
return nil
}
func parseNVFlashRecords(content []byte) map[string]nvflashDeviceRecord {
scanner := bufio.NewScanner(strings.NewReader(string(content)))
records := make(map[string]nvflashDeviceRecord)
var current *nvflashDeviceRecord
commit := func() {
if current == nil {
return
}
if current.BDF == "" || strings.TrimSpace(current.Version) == "" {
return
}
records[current.BDF] = *current
}
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if line == "" {
continue
}
if m := nvflashAdapterRegex.FindStringSubmatch(line); len(m) == 9 {
commit()
vendorID, _ := parseHexInt(m[1])
deviceID, _ := parseHexInt(m[2])
ssVendorID, _ := parseHexInt(m[3])
ssDeviceID, _ := parseHexInt(m[4])
current = &nvflashDeviceRecord{
BDF: fmt.Sprintf("0000:%s:%s.%s", strings.ToLower(m[6]), strings.ToLower(m[7]), strings.ToLower(m[8])),
VendorID: vendorID,
DeviceID: deviceID,
SSVendorID: ssVendorID,
SSDeviceID: ssDeviceID,
}
continue
}
if current == nil {
continue
}
if !strings.Contains(line, ":") {
continue
}
parts := strings.SplitN(line, ":", 2)
key := strings.TrimSpace(parts[0])
val := strings.TrimSpace(parts[1])
if key == "" || val == "" {
continue
}
switch key {
case "Version":
current.Version = val
case "Board ID":
current.BoardID = strings.ToLower(strings.TrimPrefix(val, "0x"))
case "Vendor ID":
if v, err := parseHexInt(val); err == nil {
current.VendorID = v
}
case "Device ID":
if v, err := parseHexInt(val); err == nil {
current.DeviceID = v
}
case "Hierarchy ID":
current.HierarchyID = val
case "Chip SKU":
current.ChipSKU = val
case "Project":
current.Project = val
}
}
commit()
return records
}
func parseInventoryPCIIDs(content []byte) map[string]string {
scanner := bufio.NewScanner(strings.NewReader(string(content)))
slotToBDF := make(map[string]string)
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if line == "" {
continue
}
if m := gpuPCIIDRegex.FindStringSubmatch(line); len(m) == 3 {
slotToBDF["GPUSXM"+m[1]] = normalizePCIBDF(m[2])
continue
}
if m := nvsPCIIDRegex.FindStringSubmatch(line); len(m) == 3 {
slotToBDF["NVSWITCH"+m[1]] = normalizePCIBDF(m[2])
}
}
return slotToBDF
}
func normalizePCIBDF(v string) string {
s := strings.TrimSpace(strings.ToLower(v))
if s == "" {
return ""
}
// bus:device.func -> 0000:bus:device.func
short := regexp.MustCompile(`^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])$`)
if m := short.FindStringSubmatch(s); len(m) == 2 {
return "0000:" + m[1]
}
full := regexp.MustCompile(`^([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])$`)
if m := full.FindStringSubmatch(s); len(m) == 2 {
return m[1]
}
return s
}
func parseHexInt(v string) (int, error) {
s := strings.TrimSpace(strings.ToLower(v))
s = strings.TrimPrefix(s, "0x")
if s == "" {
return 0, fmt.Errorf("empty hex value")
}
n, err := strconv.ParseInt(s, 16, 32)
if err != nil {
return 0, err
}
return int(n), nil
}
func findNVFlashVerboseLog(files []parser.ExtractedFile) *parser.ExtractedFile {
for _, f := range files {
path := strings.ToLower(f.Path)
if strings.Contains(path, "inventory/nvflash_verbose.log") ||
strings.Contains(path, "inventory\\nvflash_verbose.log") {
return &f
}
}
return nil
}
func findInventoryInfoLog(files []parser.ExtractedFile) *parser.ExtractedFile {
for _, f := range files {
path := strings.ToLower(f.Path)
if strings.Contains(path, "inventory/inventory.log") ||
strings.Contains(path, "inventory\\inventory.log") {
return &f
}
}
return nil
}
func appendNVFlashFirmwareEntries(result *models.AnalysisResult, records map[string]nvflashDeviceRecord) {
if result == nil || result.Hardware == nil {
return
}
if result.Hardware.Firmware == nil {
result.Hardware.Firmware = make([]models.FirmwareInfo, 0)
}
seen := make(map[string]struct{})
for _, fw := range result.Hardware.Firmware {
key := strings.ToLower(strings.TrimSpace(fw.DeviceName)) + "|" + strings.TrimSpace(fw.Version)
seen[key] = struct{}{}
}
for _, gpu := range result.Hardware.GPUs {
version := strings.TrimSpace(gpu.Firmware)
if version == "" {
continue
}
model := strings.TrimSpace(gpu.PartNumber)
if model == "" {
model = strings.TrimSpace(gpu.Model)
}
if model == "" {
model = strings.TrimSpace(gpu.Slot)
}
deviceName := fmt.Sprintf("GPU %s (%s)", strings.TrimSpace(gpu.Slot), model)
key := strings.ToLower(deviceName) + "|" + version
if _, ok := seen[key]; ok {
continue
}
seen[key] = struct{}{}
result.Hardware.Firmware = append(result.Hardware.Firmware, models.FirmwareInfo{
DeviceName: deviceName,
Version: version,
})
}
for _, dev := range result.Hardware.PCIeDevices {
bdf := normalizePCIBDF(dev.BDF)
rec, ok := records[bdf]
if !ok {
continue
}
version := strings.TrimSpace(rec.Version)
if version == "" {
continue
}
slot := strings.TrimSpace(dev.Slot)
deviceClass := strings.TrimSpace(dev.DeviceClass)
if strings.EqualFold(deviceClass, "NVSwitch") || strings.HasPrefix(strings.ToUpper(slot), "NVSWITCH") {
model := slot
if pn := strings.TrimSpace(dev.PartNumber); pn != "" {
model = pn
}
deviceName := fmt.Sprintf("NVSwitch %s (%s)", slot, model)
key := strings.ToLower(deviceName) + "|" + version
if _, ok := seen[key]; ok {
continue
}
seen[key] = struct{}{}
result.Hardware.Firmware = append(result.Hardware.Firmware, models.FirmwareInfo{
DeviceName: deviceName,
Version: version,
})
}
}
}
func mapNVSwitchPartNumberByProject(project string) string {
key := strings.TrimSpace(strings.ToLower(project))
if key == "" {
return ""
}
return strings.TrimSpace(nvswitchProjectToPartNumber[key])
}

View File

@@ -0,0 +1,93 @@
package nvidia
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestApplyInventoryPCIIDsAndNVFlashFirmware(t *testing.T) {
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{
Slot: "GPUSXM5",
DeviceID: 0x2335,
},
},
PCIeDevices: []models.PCIeDevice{
{
Slot: "NVSWITCHNVSWITCH2",
DeviceID: 0x22a3,
},
},
},
}
inventoryLog := []byte(`
GPU_SXM5_PCIID: 0000:ba:00.0
NVSWITCH_NVSWITCH2_PCIID: 0000:07:00.0
`)
nvflashLog := []byte(`
Adapter: Graphics Device (10DE,2335,10DE,18BE) S:00,B:BA,D:00,F:00
Version : 96.00.D0.00.03
Board ID : 0x053C
Vendor ID : 0x10DE
Device ID : 0x2335
Hierarchy ID : Normal Board
Chip SKU : 895-0
Project : G520-0280
Adapter: Graphics Device (10DE,22A3,10DE,1796) S:00,B:07,D:00,F:00
Version : 96.10.6D.00.01
Board ID : 0x03B7
Vendor ID : 0x10DE
Device ID : 0x22A3
Hierarchy ID : Normal Board
Chip SKU : 890-0
Project : 5612-0002
`)
if err := ApplyInventoryPCIIDs(inventoryLog, result); err != nil {
t.Fatalf("ApplyInventoryPCIIDs failed: %v", err)
}
if err := ParseNVFlashVerboseLog(nvflashLog, result); err != nil {
t.Fatalf("ParseNVFlashVerboseLog failed: %v", err)
}
if got := result.Hardware.GPUs[0].BDF; got != "0000:ba:00.0" {
t.Fatalf("expected GPU BDF 0000:ba:00.0, got %q", got)
}
if got := result.Hardware.GPUs[0].Firmware; got != "96.00.D0.00.03" {
t.Fatalf("expected GPU firmware 96.00.D0.00.03, got %q", got)
}
if got := result.Hardware.PCIeDevices[0].BDF; got != "0000:07:00.0" {
t.Fatalf("expected NVSwitch BDF 0000:07:00.0, got %q", got)
}
if got := result.Hardware.PCIeDevices[0].PartNumber; got != "965-25612-0002-000" {
t.Fatalf("expected NVSwitch part number 965-25612-0002-000, got %q", got)
}
if len(result.Hardware.Firmware) == 0 {
t.Fatalf("expected firmware entries to be populated from nvflash log")
}
hasGPUFW := false
hasNVSwitchFW := false
for _, fw := range result.Hardware.Firmware {
if fw.Version == "96.00.D0.00.03" {
hasGPUFW = true
}
if fw.Version == "96.10.6D.00.01" {
hasNVSwitchFW = true
}
}
if !hasGPUFW {
t.Fatalf("expected GPU firmware version 96.00.D0.00.03 in hardware firmware list")
}
if !hasNVSwitchFW {
t.Fatalf("expected NVSwitch firmware version 96.10.6D.00.01 in hardware firmware list")
}
}

View File

@@ -14,7 +14,7 @@ import (
// parserVersion - version of this parser module
// IMPORTANT: Increment this version when making changes to parser logic!
-const parserVersion = "1.1.0"
+const parserVersion = "1.4"
func init() {
parser.Register(&Parser{})
@@ -70,7 +70,7 @@ func (p *Parser) Detect(files []parser.ExtractedFile) int {
if strings.HasSuffix(path, "output.log") {
// Check if it contains dmidecode output
if strings.Contains(string(f.Content), "dmidecode") ||
strings.Contains(string(f.Content), "System Information") {
confidence += 10
}
}
@@ -105,6 +105,9 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
result.Hardware = &models.HardwareConfig{
GPUs: make([]models.GPU, 0),
}
gpuStatuses := make(map[string]string)
gpuFailureDetails := make(map[string]string)
nvswitchStatuses := make(map[string]string)
// Parse output.log first (contains dmidecode system info)
// Find the output.log file that contains dmidecode output
@@ -124,18 +127,75 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
}
}
// Parse inventory/output.log (GPU serials from fieldiag devname; NVSwitch/device serials from lspci)
inventoryLogFile := findInventoryOutputLog(files)
if inventoryLogFile != nil {
if err := ParseInventoryLog(inventoryLogFile.Content, result); err != nil {
// Non-fatal: ignore the error and continue parsing other files.
_ = err
}
}
// Parse inventory/inventory.log to enrich PCI BDF mapping for components.
inventoryInfoLog := findInventoryInfoLog(files)
if inventoryInfoLog != nil {
if err := ApplyInventoryPCIIDs(inventoryInfoLog.Content, result); err != nil {
_ = err
}
}
// Enhance GPU model names using SKU mapping from testspec + inventory summary.
ApplyGPUModelsFromSKU(files, result)
// Parse inventory/nvflash_verbose.log and apply firmware versions by BDF + IDs.
// This runs after GPU model/part-number enrichment so firmware tab uses final model labels.
nvflashVerbose := findNVFlashVerboseLog(files)
if nvflashVerbose != nil {
if err := ParseNVFlashVerboseLog(nvflashVerbose.Content, result); err != nil {
_ = err
}
}
// Parse summary.json (test results summary)
if f := parser.FindFileByName(files, "summary.json"); f != nil {
events := ParseSummaryJSON(f.Content)
result.Events = append(result.Events, events...)
for componentID, status := range CollectGPUStatusesFromSummaryJSON(f.Content) {
gpuStatuses[componentID] = mergeGPUStatus(gpuStatuses[componentID], status)
}
for slot, status := range CollectNVSwitchStatusesFromSummaryJSON(f.Content) {
nvswitchStatuses[slot] = mergeGPUStatus(nvswitchStatuses[slot], status)
}
for componentID, detail := range CollectGPUFailureDetailsFromSummaryJSON(f.Content) {
if _, exists := gpuFailureDetails[componentID]; !exists && strings.TrimSpace(detail) != "" {
gpuFailureDetails[componentID] = strings.TrimSpace(detail)
}
}
}
// Parse summary.csv (alternative format)
if f := parser.FindFileByName(files, "summary.csv"); f != nil {
csvEvents := ParseSummaryCSV(f.Content)
result.Events = append(result.Events, csvEvents...)
for componentID, status := range CollectGPUStatusesFromSummaryCSV(f.Content) {
gpuStatuses[componentID] = mergeGPUStatus(gpuStatuses[componentID], status)
}
for slot, status := range CollectNVSwitchStatusesFromSummaryCSV(f.Content) {
nvswitchStatuses[slot] = mergeGPUStatus(nvswitchStatuses[slot], status)
}
for componentID, detail := range CollectGPUFailureDetailsFromSummaryCSV(f.Content) {
if _, exists := gpuFailureDetails[componentID]; !exists && strings.TrimSpace(detail) != "" {
gpuFailureDetails[componentID] = strings.TrimSpace(detail)
}
}
}
// Apply per-GPU PASS/FAIL status derived from summary files.
ApplyGPUStatuses(result, gpuStatuses)
ApplyGPUFailureDetails(result, gpuFailureDetails)
ApplyNVSwitchStatuses(result, nvswitchStatuses)
ApplyGPUAndNVSwitchCheckTimes(result, CollectGPUAndNVSwitchCheckTimes(files))
// Parse GPU field diagnostics logs
gpuFieldiagFiles := parser.FindFileByPattern(files, "gpu_fieldiag/", ".log")
for _, f := range gpuFieldiagFiles {
@@ -158,7 +218,7 @@ func findDmidecodeOutputLog(files []parser.ExtractedFile) *parser.ExtractedFile
// Check if it contains dmidecode output
content := string(f.Content)
if strings.Contains(content, "dmidecode") &&
strings.Contains(content, "System Information") {
return &f
}
}

View File

@@ -0,0 +1,291 @@
package nvidia
import (
"os"
"path/filepath"
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestNVIDIAParser_RealArchive(t *testing.T) {
// Test with the real archive that was reported as problematic
archivePath := filepath.Join("../../../../example", "A514359X5A09844_logs-20260115-151707.tar")
// Check if file exists
if _, err := os.Stat(archivePath); os.IsNotExist(err) {
t.Skip("Test archive not found, skipping test")
}
// Extract files from archive
files, err := parser.ExtractArchive(archivePath)
if err != nil {
t.Fatalf("Failed to extract archive: %v", err)
}
// Check if inventory/output.log exists
hasInventoryLog := false
for _, f := range files {
if filepath.Base(f.Path) == "output.log" {
t.Logf("Found file: %s", f.Path)
}
if f.Path == "./inventory/output.log" || f.Path == "inventory/output.log" {
hasInventoryLog = true
t.Logf("Found inventory/output.log with %d bytes", len(f.Content))
}
}
if !hasInventoryLog {
t.Error("inventory/output.log not found in extracted files")
}
// Create parser and parse
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Failed to parse archive: %v", err)
}
// Verify basic system info
if result.Hardware.BoardInfo.Manufacturer == "" {
t.Error("Expected Manufacturer to be set")
}
if result.Hardware.BoardInfo.ProductName == "" {
t.Error("Expected ProductName to be set")
}
if result.Hardware.BoardInfo.SerialNumber == "" {
t.Error("Expected SerialNumber to be set")
}
t.Logf("System Info:")
t.Logf(" Manufacturer: %s", result.Hardware.BoardInfo.Manufacturer)
t.Logf(" Product: %s", result.Hardware.BoardInfo.ProductName)
t.Logf(" Serial: %s", result.Hardware.BoardInfo.SerialNumber)
// Verify GPUs were found
if len(result.Hardware.GPUs) == 0 {
t.Error("Expected to find GPUs")
}
t.Logf("\nFound %d GPUs:", len(result.Hardware.GPUs))
gpusWithSerials := 0
for _, gpu := range result.Hardware.GPUs {
t.Logf(" %s: %s (Firmware: %s, Serial: %s, BDF: %s)",
gpu.Slot, gpu.Model, gpu.Firmware, gpu.SerialNumber, gpu.BDF)
if gpu.SerialNumber != "" {
gpusWithSerials++
}
}
// Verify that GPU serial numbers were extracted
if gpusWithSerials == 0 {
t.Error("Expected at least some GPUs to have serial numbers")
}
t.Logf("\nGPUs with serial numbers: %d/%d", gpusWithSerials, len(result.Hardware.GPUs))
// Check events for SXM2 failures
t.Logf("\nTotal events: %d", len(result.Events))
// Look for the specific serial or SXM2
sxm2Events := 0
for _, event := range result.Events {
desc := event.Description + " " + event.RawData + " " + event.EventType
if contains(desc, "SXM2") || contains(desc, "1653925025827") {
t.Logf(" SXM2 Event: [%s] %s (Severity: %s)", event.EventType, event.Description, event.Severity)
sxm2Events++
}
}
if sxm2Events == 0 {
t.Error("Expected to find events for SXM2 (faulty GPU 1653925025827)")
}
t.Logf("\nSXM2 failure events: %d", sxm2Events)
}
func TestNVIDIAParser_GPUStatusFromSummary_RealArchive07900(t *testing.T) {
archivePath := filepath.Join("../../../../example", "A514359X5A07900_logs-20260122-074208.tar")
if _, err := os.Stat(archivePath); os.IsNotExist(err) {
t.Skip("Test archive not found, skipping test")
}
files, err := parser.ExtractArchive(archivePath)
if err != nil {
t.Fatalf("Failed to extract archive: %v", err)
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Failed to parse archive: %v", err)
}
if result.Hardware == nil || len(result.Hardware.GPUs) == 0 {
t.Fatalf("expected GPUs in parsed result")
}
statusBySerial := make(map[string]string, len(result.Hardware.GPUs))
for _, gpu := range result.Hardware.GPUs {
if gpu.SerialNumber != "" {
statusBySerial[gpu.SerialNumber] = gpu.Status
}
}
if got := statusBySerial["1653925025497"]; got != "FAIL" {
t.Fatalf("expected GPU serial 1653925025497 status FAIL, got %q", got)
}
for serial, st := range statusBySerial {
if serial == "1653925025497" {
continue
}
if st != "PASS" {
t.Fatalf("expected non-failing GPU serial %s status PASS, got %q", serial, st)
}
}
}
func TestNVIDIAParser_GPUErrorDetailsFromSummary_RealArchive07900(t *testing.T) {
archivePath := filepath.Join("../../../../example", "A514359X5A07900_logs-20260122-074208.tar")
if _, err := os.Stat(archivePath); os.IsNotExist(err) {
t.Skip("Test archive not found, skipping test")
}
files, err := parser.ExtractArchive(archivePath)
if err != nil {
t.Fatalf("Failed to extract archive: %v", err)
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Failed to parse archive: %v", err)
}
if result.Hardware == nil || len(result.Hardware.GPUs) == 0 {
t.Fatalf("expected GPUs in parsed result")
}
errBySerial := make(map[string]string, len(result.Hardware.GPUs))
for _, gpu := range result.Hardware.GPUs {
if gpu.SerialNumber != "" {
errBySerial[gpu.SerialNumber] = gpu.ErrorDescription
}
}
if got := errBySerial["1653925025497"]; got != "Row remapping failed" {
t.Fatalf("expected GPU serial 1653925025497 error Row remapping failed, got %q", got)
}
}
func TestNVIDIAParser_GPUModelFromSKU_RealArchive07900(t *testing.T) {
archivePath := filepath.Join("../../../../example", "A514359X5A07900_logs-20260122-074208.tar")
if _, err := os.Stat(archivePath); os.IsNotExist(err) {
t.Skip("Test archive not found, skipping test")
}
files, err := parser.ExtractArchive(archivePath)
if err != nil {
t.Fatalf("Failed to extract archive: %v", err)
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Failed to parse archive: %v", err)
}
if result.Hardware == nil || len(result.Hardware.GPUs) == 0 {
t.Fatalf("expected GPUs in parsed result")
}
found := false
for _, gpu := range result.Hardware.GPUs {
if gpu.Model == "692-2G520-0280-501" && gpu.Description == "hgx h200 8 gpu 141g aircooled" {
found = true
break
}
}
if !found {
t.Fatalf("expected at least one GPU with model 692-2G520-0280-501 and description hgx h200 8 gpu 141g aircooled")
}
}
func TestNVIDIAParser_ComponentCheckTimes_RealArchive07900(t *testing.T) {
archivePath := filepath.Join("../../../../example", "A514359X5A07900_logs-20260122-074208.tar")
if _, err := os.Stat(archivePath); os.IsNotExist(err) {
t.Skip("Test archive not found, skipping test")
}
files, err := parser.ExtractArchive(archivePath)
if err != nil {
t.Fatalf("Failed to extract archive: %v", err)
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Failed to parse archive: %v", err)
}
if result.Hardware == nil {
t.Fatalf("expected hardware in parsed result")
}
expectedGPU := time.Date(2026, 1, 22, 6, 45, 36, 0, time.UTC)
expectedNVSwitch := time.Date(2026, 1, 22, 6, 11, 32, 0, time.UTC)
if len(result.Hardware.GPUs) == 0 {
t.Fatalf("expected GPUs in parsed result")
}
for _, gpu := range result.Hardware.GPUs {
if !gpu.StatusCheckedAt.Equal(expectedGPU) {
t.Fatalf("expected GPU %s status_checked_at %s, got %s", gpu.Slot, expectedGPU.Format(time.RFC3339), gpu.StatusCheckedAt.Format(time.RFC3339))
}
if gpu.StatusAtCollect == nil || !gpu.StatusAtCollect.At.Equal(expectedGPU) {
t.Fatalf("expected GPU %s status_at_collection.at %s", gpu.Slot, expectedGPU.Format(time.RFC3339))
}
}
nvsCount := 0
for _, dev := range result.Hardware.PCIeDevices {
slot := normalizeNVSwitchSlot(dev.Slot)
if slot == "" {
continue
}
// Skip devices that are neither classed as NVSwitch nor slot-named NVSWITCH*.
if dev.DeviceClass != "NVSwitch" && !(len(slot) >= len("NVSWITCH") && slot[:len("NVSWITCH")] == "NVSWITCH") {
continue
}
nvsCount++
if !dev.StatusCheckedAt.Equal(expectedNVSwitch) {
t.Fatalf("expected NVSwitch %s status_checked_at %s, got %s", dev.Slot, expectedNVSwitch.Format(time.RFC3339), dev.StatusCheckedAt.Format(time.RFC3339))
}
if dev.StatusAtCollect == nil || !dev.StatusAtCollect.At.Equal(expectedNVSwitch) {
t.Fatalf("expected NVSwitch %s status_at_collection.at %s", dev.Slot, expectedNVSwitch.Format(time.RFC3339))
}
}
if nvsCount == 0 {
t.Fatalf("expected NVSwitch devices in parsed result")
}
}
// contains reports whether substr occurs in s; equivalent to strings.Contains,
// kept local to avoid an extra import in this test file.
func contains(s, substr string) bool {
return findSubstring(s, substr)
}
func findSubstring(s, substr string) bool {
for i := 0; i <= len(s)-len(substr); i++ {
if s[i:i+len(substr)] == substr {
return true
}
}
return false
}

View File

@@ -4,6 +4,7 @@ import (
"encoding/csv"
"encoding/json"
"fmt"
"regexp"
"strings"
"time"
@@ -20,6 +21,9 @@ type SummaryEntry struct {
IgnoreError string `json:"Ignore Error"`
}
var gpuComponentIDRegex = regexp.MustCompile(`^SXM(\d+)_SN_(.+)$`)
var nvswitchInventoryComponentRegex = regexp.MustCompile(`^NVSWITCH_(NVSWITCH\d+)_`)
// ParseSummaryJSON parses summary.json file and returns events
func ParseSummaryJSON(content []byte) []models.Event {
var entries []SummaryEntry
@@ -92,6 +96,340 @@ func ParseSummaryCSV(content []byte) []models.Event {
return events
}
// CollectGPUStatusesFromSummaryJSON extracts per-GPU PASS/FAIL status from summary.json.
// Key format in returned map is component ID from summary (e.g. "SXM5_SN_1653925025497").
func CollectGPUStatusesFromSummaryJSON(content []byte) map[string]string {
var entries []SummaryEntry
if err := json.Unmarshal(content, &entries); err != nil {
return nil
}
statuses := make(map[string]string)
for _, entry := range entries {
component := strings.TrimSpace(entry.ComponentID)
if component == "" || !gpuComponentIDRegex.MatchString(component) {
continue
}
current := statuses[component]
next := "PASS"
if !isSummaryJSONRecordPassing(entry.ErrorCode, entry.Notes) {
next = "FAIL"
}
statuses[component] = mergeGPUStatus(current, next)
}
return statuses
}
// CollectGPUFailureDetailsFromSummaryJSON extracts per-GPU failure details from summary.json.
// Key format in returned map is component ID from summary (e.g. "SXM5_SN_1653925025497").
func CollectGPUFailureDetailsFromSummaryJSON(content []byte) map[string]string {
var entries []SummaryEntry
if err := json.Unmarshal(content, &entries); err != nil {
return nil
}
details := make(map[string]string)
for _, entry := range entries {
component := strings.TrimSpace(entry.ComponentID)
if component == "" || !gpuComponentIDRegex.MatchString(component) {
continue
}
if isSummaryJSONRecordPassing(entry.ErrorCode, entry.Notes) {
continue
}
note := strings.TrimSpace(entry.Notes)
if note == "" || strings.EqualFold(note, "OK") {
note = strings.TrimSpace(entry.ErrorCode)
}
if note == "" {
continue
}
// Keep first non-empty detail to avoid noisy overrides.
if _, exists := details[component]; !exists {
details[component] = note
}
}
return details
}
// CollectGPUStatusesFromSummaryCSV extracts per-GPU PASS/FAIL status from summary.csv.
// Key format in returned map is component ID from summary (e.g. "SXM5_SN_1653925025497").
func CollectGPUStatusesFromSummaryCSV(content []byte) map[string]string {
reader := csv.NewReader(strings.NewReader(string(content)))
records, err := reader.ReadAll()
if err != nil {
return nil
}
statuses := make(map[string]string)
for i, record := range records {
if i == 0 || len(record) < 7 {
continue
}
component := strings.TrimSpace(record[5])
if component == "" || !gpuComponentIDRegex.MatchString(component) {
continue
}
errorCode := strings.TrimSpace(record[0])
notes := strings.TrimSpace(record[6])
current := statuses[component]
next := "PASS"
if !isSummaryCSVRecordPassing(errorCode, notes) {
next = "FAIL"
}
statuses[component] = mergeGPUStatus(current, next)
}
return statuses
}
// CollectNVSwitchStatusesFromSummaryJSON extracts per-NVSwitch PASS/FAIL status from summary.json.
// Key format in returned map is normalized switch slot (e.g. "NVSWITCH0").
func CollectNVSwitchStatusesFromSummaryJSON(content []byte) map[string]string {
var entries []SummaryEntry
if err := json.Unmarshal(content, &entries); err != nil {
return nil
}
statuses := make(map[string]string)
for _, entry := range entries {
component := strings.TrimSpace(entry.ComponentID)
matches := nvswitchInventoryComponentRegex.FindStringSubmatch(component)
if len(matches) != 2 {
continue
}
slot := strings.TrimSpace(matches[1])
if slot == "" {
continue
}
current := statuses[slot]
next := "PASS"
if !isSummaryJSONRecordPassing(entry.ErrorCode, entry.Notes) {
next = "FAIL"
}
statuses[slot] = mergeGPUStatus(current, next)
}
return statuses
}
// CollectNVSwitchStatusesFromSummaryCSV extracts per-NVSwitch PASS/FAIL status from summary.csv.
// Key format in returned map is normalized switch slot (e.g. "NVSWITCH0").
func CollectNVSwitchStatusesFromSummaryCSV(content []byte) map[string]string {
reader := csv.NewReader(strings.NewReader(string(content)))
records, err := reader.ReadAll()
if err != nil {
return nil
}
statuses := make(map[string]string)
for i, record := range records {
if i == 0 || len(record) < 7 {
continue
}
component := strings.TrimSpace(record[5])
matches := nvswitchInventoryComponentRegex.FindStringSubmatch(component)
if len(matches) != 2 {
continue
}
slot := strings.TrimSpace(matches[1])
if slot == "" {
continue
}
errorCode := strings.TrimSpace(record[0])
notes := strings.TrimSpace(record[6])
current := statuses[slot]
next := "PASS"
if !isSummaryCSVRecordPassing(errorCode, notes) {
next = "FAIL"
}
statuses[slot] = mergeGPUStatus(current, next)
}
return statuses
}
// CollectGPUFailureDetailsFromSummaryCSV extracts per-GPU failure details from summary.csv.
// Key format in returned map is component ID from summary (e.g. "SXM5_SN_1653925025497").
func CollectGPUFailureDetailsFromSummaryCSV(content []byte) map[string]string {
reader := csv.NewReader(strings.NewReader(string(content)))
records, err := reader.ReadAll()
if err != nil {
return nil
}
details := make(map[string]string)
for i, record := range records {
if i == 0 || len(record) < 7 {
continue
}
component := strings.TrimSpace(record[5])
if component == "" || !gpuComponentIDRegex.MatchString(component) {
continue
}
errorCode := strings.TrimSpace(record[0])
notes := strings.TrimSpace(record[6])
if isSummaryCSVRecordPassing(errorCode, notes) {
continue
}
note := notes
if note == "" || strings.EqualFold(note, "OK") {
note = errorCode
}
if note == "" {
continue
}
if _, exists := details[component]; !exists {
details[component] = note
}
}
return details
}
func isSummaryJSONRecordPassing(errorCode, notes string) bool {
_ = errorCode
return strings.TrimSpace(notes) == "OK"
}
func isSummaryCSVRecordPassing(errorCode, notes string) bool {
_ = errorCode
return strings.TrimSpace(notes) == "OK"
}
func mergeGPUStatus(current, next string) string {
// FAIL has highest priority.
if current == "FAIL" || next == "FAIL" {
return "FAIL"
}
if current == "PASS" || next == "PASS" {
return "PASS"
}
return ""
}
// ApplyGPUStatuses applies aggregated PASS/FAIL statuses from summary components to parsed GPUs.
func ApplyGPUStatuses(result *models.AnalysisResult, componentStatuses map[string]string) {
if result == nil || result.Hardware == nil || len(result.Hardware.GPUs) == 0 || len(componentStatuses) == 0 {
return
}
slotStatus := make(map[string]string) // key: GPUSXM<idx>
serialStatus := make(map[string]string) // key: GPU serial
for componentID, status := range componentStatuses {
matches := gpuComponentIDRegex.FindStringSubmatch(strings.TrimSpace(componentID))
if len(matches) != 3 {
continue
}
slotKey := "GPUSXM" + matches[1]
serialKey := strings.TrimSpace(matches[2])
slotStatus[slotKey] = mergeGPUStatus(slotStatus[slotKey], status)
if serialKey != "" {
serialStatus[serialKey] = mergeGPUStatus(serialStatus[serialKey], status)
}
}
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
next := ""
if serial := strings.TrimSpace(gpu.SerialNumber); serial != "" {
next = serialStatus[serial]
}
if next == "" {
next = slotStatus[strings.TrimSpace(gpu.Slot)]
}
if next != "" {
gpu.Status = next
}
}
}
// ApplyNVSwitchStatuses applies aggregated PASS/FAIL statuses from summary components to parsed NVSwitch devices.
func ApplyNVSwitchStatuses(result *models.AnalysisResult, switchStatuses map[string]string) {
if result == nil || result.Hardware == nil || len(result.Hardware.PCIeDevices) == 0 || len(switchStatuses) == 0 {
return
}
for i := range result.Hardware.PCIeDevices {
dev := &result.Hardware.PCIeDevices[i]
slot := normalizeNVSwitchSlot(strings.TrimSpace(dev.Slot))
if slot == "" {
continue
}
if !strings.HasPrefix(strings.ToUpper(slot), "NVSWITCH") {
continue
}
if st := switchStatuses[slot]; st != "" {
dev.Status = st
}
}
}
// ApplyGPUFailureDetails maps parsed failure details from summary components to GPUs.
func ApplyGPUFailureDetails(result *models.AnalysisResult, componentDetails map[string]string) {
if result == nil || result.Hardware == nil || len(result.Hardware.GPUs) == 0 || len(componentDetails) == 0 {
return
}
slotDetails := make(map[string]string) // key: GPUSXM<idx>
serialDetails := make(map[string]string) // key: GPU serial
for componentID, detail := range componentDetails {
matches := gpuComponentIDRegex.FindStringSubmatch(strings.TrimSpace(componentID))
if len(matches) != 3 {
continue
}
detail = strings.TrimSpace(detail)
if detail == "" {
continue
}
slotKey := "GPUSXM" + matches[1]
serialKey := strings.TrimSpace(matches[2])
if _, exists := slotDetails[slotKey]; !exists {
slotDetails[slotKey] = detail
}
if serialKey != "" {
if _, exists := serialDetails[serialKey]; !exists {
serialDetails[serialKey] = detail
}
}
}
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
detail := ""
if serial := strings.TrimSpace(gpu.SerialNumber); serial != "" {
detail = serialDetails[serial]
}
if detail == "" {
detail = slotDetails[strings.TrimSpace(gpu.Slot)]
}
if detail != "" {
gpu.ErrorDescription = detail
}
}
}
// formatSummaryDescription creates a human-readable description from summary entry
func formatSummaryDescription(entry SummaryEntry) string {
component := entry.ComponentID

View File

@@ -0,0 +1,122 @@
package nvidia
import (
"strings"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestApplyGPUStatuses_FromSummaryCSV_FailAndPass(t *testing.T) {
csvData := strings.Join([]string{
"ErrorCode,Test,VirtualID,SubTest,Type,ComponentID,Notes,Level,,,IgnoreError",
"0,gpumem,gpumem,,GPU,SXM1_SN_111,OK,1,,,False",
"363,gpumem,gpumem,,GPU,SXM5_SN_1653925025497,Row remapping failed,1,,,False",
"0,gpu_fieldiag,gpu_fieldiag,,GPU,SXM1_SN_111,OK,1,,,False",
"0,gpu_fieldiag,gpu_fieldiag,,GPU,SXM2_SN_222,OK,1,,,False",
}, "\n")
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "GPUSXM1", SerialNumber: "111"},
{Slot: "GPUSXM2", SerialNumber: "222"},
{Slot: "GPUSXM5", SerialNumber: "1653925025497"},
},
},
}
statuses := CollectGPUStatusesFromSummaryCSV([]byte(csvData))
ApplyGPUStatuses(result, statuses)
bySerial := map[string]string{}
for _, gpu := range result.Hardware.GPUs {
bySerial[gpu.SerialNumber] = gpu.Status
}
if bySerial["1653925025497"] != "FAIL" {
t.Fatalf("expected serial 1653925025497 status FAIL, got %q", bySerial["1653925025497"])
}
if bySerial["111"] != "PASS" {
t.Fatalf("expected serial 111 status PASS, got %q", bySerial["111"])
}
if bySerial["222"] != "PASS" {
t.Fatalf("expected serial 222 status PASS, got %q", bySerial["222"])
}
}
func TestApplyGPUFailureDetails_FromSummaryJSON_BySerial(t *testing.T) {
jsonData := []byte(`[
{
"Error Code": "005-000-1-000000000363",
"Test": "gpumem",
"Component ID": "SXM5_SN_1653925025497",
"Notes": "Row remapping failed",
"Virtual ID": "gpumem",
"Ignore Error": "False"
}
]`)
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "GPUSXM5", SerialNumber: "1653925025497"},
{Slot: "GPUSXM2", SerialNumber: "1653925024190"},
},
},
}
details := CollectGPUFailureDetailsFromSummaryJSON(jsonData)
ApplyGPUFailureDetails(result, details)
if got := result.Hardware.GPUs[0].ErrorDescription; got != "Row remapping failed" {
t.Fatalf("expected serial 1653925025497 error Row remapping failed, got %q", got)
}
if got := result.Hardware.GPUs[1].ErrorDescription; got != "" {
t.Fatalf("expected no error description for healthy GPU, got %q", got)
}
}
func TestApplyNVSwitchStatuses_FromSummaryJSON(t *testing.T) {
jsonData := []byte(`[
{
"Error Code": "0",
"Test": "inventory",
"Component ID": "NVSWITCH_NVSWITCH0_VendorID",
"Notes": "OK",
"Virtual ID": "inventory",
"Ignore Error": "False"
},
{
"Error Code": "1",
"Test": "inventory",
"Component ID": "NVSWITCH_NVSWITCH1_LinkState",
"Notes": "Link down",
"Virtual ID": "inventory",
"Ignore Error": "False"
}
]`)
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
PCIeDevices: []models.PCIeDevice{
{Slot: "NVSWITCH0", Status: "Unknown"},
{Slot: "NVSWITCH1", Status: "Unknown"},
{Slot: "NVSWITCH2", Status: "Unknown"},
},
},
}
statuses := CollectNVSwitchStatusesFromSummaryJSON(jsonData)
ApplyNVSwitchStatuses(result, statuses)
if got := result.Hardware.PCIeDevices[0].Status; got != "PASS" {
t.Fatalf("expected NVSWITCH0 status PASS, got %q", got)
}
if got := result.Hardware.PCIeDevices[1].Status; got != "FAIL" {
t.Fatalf("expected NVSWITCH1 status FAIL, got %q", got)
}
if got := result.Hardware.PCIeDevices[2].Status; got != "Unknown" {
t.Fatalf("expected NVSWITCH2 status unchanged Unknown, got %q", got)
}
}

View File

@@ -3,6 +3,7 @@ package nvidia
import (
"encoding/json"
"fmt"
"regexp"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
@@ -53,6 +54,8 @@ type Property struct {
Value interface{} `json:"value"` // Can be string or number
}
var nvswitchComponentIDRegex = regexp.MustCompile(`^(NVSWITCH\d+|NVSWITCHNVSWITCH\d+)$`)
// GetValueAsString returns the value as a string
func (p *Property) GetValueAsString() string {
switch v := p.Value.(type) {
@@ -107,7 +110,7 @@ func parseInventoryComponents(components []Component, result *models.AnalysisRes
}
// Parse NVSwitch components
if strings.HasPrefix(comp.ComponentID, "NVSWITCHNVSWITCH") {
if isNVSwitchComponentID(comp.ComponentID) {
nvswitch := parseNVSwitchComponent(comp)
if nvswitch != nil {
// Add as PCIe device for now
@@ -152,7 +155,7 @@ func parseSystemInfo(comp Component, result *models.AnalysisResult) bool {
// Don't overwrite real data from output.log with generic data
// Only set if empty or still has the default placeholder value
if result.Hardware.BoardInfo.ProductName == "" ||
result.Hardware.BoardInfo.ProductName == "GPU Server (Field Diag)" {
result.Hardware.BoardInfo.ProductName == "GPU Server (Field Diag)" {
result.Hardware.BoardInfo.ProductName = value
}
case "SerialNumber", "Serial", "BoardSerial", "SystemSerial":
@@ -183,6 +186,9 @@ func parseGPUComponent(comp Component) *models.GPU {
switch prop.ID {
case "DeviceID":
deviceID = prop.GetValueAsString()
if deviceID != "" {
fmt.Sscanf(deviceID, "%x", &gpu.DeviceID)
}
case "Vendor":
gpu.Manufacturer = prop.GetValueAsString()
case "DeviceName":
@@ -217,7 +223,7 @@ func parseGPUComponent(comp Component) *models.GPU {
// parseNVSwitchComponent parses NVSwitch component information
func parseNVSwitchComponent(comp Component) *models.PCIeDevice {
device := &models.PCIeDevice{
Slot: comp.ComponentID, // e.g., "NVSWITCHNVSWITCH0"
Slot: normalizeNVSwitchSlot(comp.ComponentID),
}
var vendorIDStr, deviceIDStr, vbios, pciID string
@@ -279,3 +285,15 @@ func parseNVSwitchComponent(comp Component) *models.PCIeDevice {
return device
}
func normalizeNVSwitchSlot(componentID string) string {
slot := strings.TrimSpace(componentID)
if strings.HasPrefix(slot, "NVSWITCHNVSWITCH") {
return strings.Replace(slot, "NVSWITCHNVSWITCH", "NVSWITCH", 1)
}
return slot
}
func isNVSwitchComponentID(componentID string) bool {
return nvswitchComponentIDRegex.MatchString(strings.TrimSpace(componentID))
}

View File

@@ -0,0 +1,46 @@
package nvidia
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestParseInventoryComponents_IgnoresNVSwitchPropertyChecks(t *testing.T) {
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{},
}
components := []Component{
{
ComponentID: "NVSWITCHNVSWITCH1",
Properties: []Property{
{ID: "VendorID", Value: "10de"},
{ID: "DeviceID", Value: "22a3"},
{ID: "PCIID", Value: "0000:06:00.0"},
},
},
{
ComponentID: "NVSWITCHNum",
Properties: []Property{
{ID: "NVSWITCHNum", Value: 4},
},
},
{
ComponentID: "NVSWITCH_NVSWITCH1_VendorID",
Properties: []Property{
{ID: "NVSWITCH_NVSWITCH1_VendorID", Value: "10de"},
},
},
}
parseInventoryComponents(components, result)
if got := len(result.Hardware.PCIeDevices); got != 1 {
t.Fatalf("expected exactly 1 parsed NVSwitch device, got %d", got)
}
if result.Hardware.PCIeDevices[0].Slot != "NVSWITCH1" {
t.Fatalf("expected slot NVSWITCH1, got %q", result.Hardware.PCIeDevices[0].Slot)
}
}

View File

@@ -0,0 +1,35 @@
package nvidia
import "testing"
func TestParseNVSwitchComponent_NormalizesDuplicatedPrefixInSlot(t *testing.T) {
comp := Component{
ComponentID: "NVSWITCHNVSWITCH1",
Properties: []Property{
{ID: "VendorID", Value: "10de"},
{ID: "DeviceID", Value: "22a3"},
{ID: "Vendor", Value: "NVIDIA Corporation"},
{ID: "PCIID", Value: "0000:06:00.0"},
{ID: "PCISpeed", Value: "16GT/s"},
{ID: "PCIWidth", Value: "x2"},
{ID: "VBIOS_version", Value: "96.10.6D.00.01"},
},
}
device := parseNVSwitchComponent(comp)
if device == nil {
t.Fatal("expected non-nil NVSwitch device")
}
if device.Slot != "NVSWITCH1" {
t.Fatalf("expected normalized slot NVSWITCH1, got %q", device.Slot)
}
if device.BDF != "0000:06:00.0" {
t.Fatalf("expected BDF 0000:06:00.0, got %q", device.BDF)
}
if device.DeviceClass != "NVSwitch" {
t.Fatalf("expected device class NVSwitch, got %q", device.DeviceClass)
}
}

View File

@@ -1,275 +0,0 @@
# NVIDIA Bug Report Parser
Parser for nvidia-bug-report files generated by the `nvidia-bug-report.sh` script.
## Purpose
This parser processes NVIDIA driver diagnostic logs and extracts:
- Memory module information (from dmidecode)
- GPU device information
- The NVIDIA driver version
## File Format
- Filename: `nvidia-bug-report-*.log.gz`
- Format: gzip-compressed text file
- Generated by: the `nvidia-bug-report.sh` script
## Confidence Score
**85** - high priority for nvidia-bug-report files
## Extracted Data
### 1. System Information (from dmidecode)
Server information:
- **Serial Number**: server serial number (e.g. 2KD501412)
- **UUID**: unique system identifier (e.g. 2e4054bc-1dd2-11b2-0284-6b0a21737950)
- **Manufacturer**: server manufacturer
- **Product Name**: server model
- **Version**: system version
### 2. CPU Information (from dmidecode)
For each processor, the parser extracts:
- **Model**: processor model (e.g. Intel(R) Xeon(R) Platinum 8480+)
- **Serial Number**: serial number (e.g. 5DB0D6C0DD30ABD8)
- **Core Count**: number of cores (e.g. 56)
- **Thread Count**: number of threads (e.g. 112)
- **Max Speed**: maximum frequency (e.g. 3800 MHz)
- **Current Speed**: current frequency (e.g. 2000 MHz)
Example:
```
Socket 0: Intel(R) Xeon(R) Platinum 8480+
Serial Number: 5DB0D6C0DD30ABD8
Cores: 56, Threads: 112
Frequency: 2000 MHz (Max: 3800 MHz)
```
### 3. Memory Modules (from dmidecode)
For each memory module, the parser extracts:
- **Slot/Location**: e.g. CPU0_C0D0
- **Size**: size in GB (e.g. 64 GB)
- **Type**: memory type (DDR5, DDR4, etc.)
- **Manufacturer**: manufacturer (Hynix, Samsung, Micron, etc.)
- **Part Number**: module P/N (e.g. HMCG94AGBRA179N)
- **Serial Number**: module S/N (e.g. 80AD0224322B3834E6)
- **Speed**: max/current speed (e.g. 5600/4400 MHz)
- **Ranks**: number of ranks
Example:
```
Slot: CPU0_C0D0
Size: 64 GB
Type: DDR5
Manufacturer: Hynix
Part Number: HMCG94AGBRA179N
Serial Number: 80AD0224322B3834E6
Speed: 5600 MT/s (configured: 4400 MT/s)
Ranks: 2
```
### 4. Power Supplies (from dmidecode)
For each power supply, the parser extracts:
- **Location**: position (e.g. PSU0, PSU1)
- **Manufacturer**: manufacturer (e.g. DELTA, Great Wall)
- **Model Part Number**: PSU model (e.g. V0310DT000000000)
- **Serial Number**: serial number (e.g. DGPLV251500LZ)
- **Max Power Capacity**: maximum power (e.g. 2700 W)
- **Revision**: firmware revision (e.g. 00.01.04)
- **Status**: status (e.g. Present, OK)
Example:
```
PSU0: V0310DT000000000 (DELTA)
Serial Number: DGPLV251500LZ
Power: 2700 W, Revision: 00.01.04
Status: Present, OK
```
### 5. Network Adapters (from lspci)
For each network adapter (Ethernet, Network, InfiniBand), the parser extracts:
- **Model**: full model name from VPD (e.g. "NVIDIA ConnectX-7 HHHL Adapter card, 400GbE / NDR IB (default mode), Single-port OSFP, PCIe 5.0 x16")
- **Location**: PCI BDF address (e.g. 0000:0e:00.0)
- **Slot**: physical slot (e.g. 108)
- **Part Number**: adapter P/N (e.g. MCX75310AAS-NEAT)
- **Serial Number**: adapter S/N (e.g. MT2430600249)
- **Vendor**: manufacturer (Mellanox, NVIDIA)
- **Vendor ID / Device ID**: PCI identifiers (e.g. 15b3:1021)
- **Port Count**: number of ports (derived from the model name: Dual-port = 2, Single-port = 1)
- **Port Type**: port type (QSFP56, OSFP, SFP+)
Example:
```
0000:0e:00.0: NVIDIA ConnectX-7 HHHL Adapter card, 400GbE / NDR IB (default mode), Single-port OSFP
Slot: 108
P/N: MCX75310AAS-NEAT
S/N: MT2430600249
Ports: 1 x OSFP
```
### 6. GPU Devices
For each GPU, the parser extracts:
- **Model**: GPU model (e.g. NVIDIA H100 80GB HBM3)
- **BDF (Bus:Device.Function)**: PCI address (e.g. 0000:0f:00.0)
- **UUID**: unique GPU identifier (e.g. GPU-64674e47-e036-c12a-3e8d-55a2a9ac8db3)
- **Video BIOS**: video BIOS version (e.g. 96.00.99.00.01)
- **IRQ**: interrupt line (e.g. 17)
- **Bus Type**: bus type (PCIe)
- **DMA Size**: DMA size (e.g. 52 bits)
- **DMA Mask**: DMA mask (e.g. 0xfffffffffffff)
- **Device Minor**: device number (e.g. 0)
- **Manufacturer**: NVIDIA
Example:
```
0000:0f:00.0: NVIDIA H100 80GB HBM3
UUID: GPU-64674e47-e036-c12a-3e8d-55a2a9ac8db3
Video BIOS: 96.00.99.00.01
IRQ: 17
```
### 7. Events
- **Memory Configuration**: memory module summary (count, manufacturers, total size)
- **GPU Detection**: detected GPU devices
- **Driver Version**: NVIDIA driver version
## Usage Example
```bash
# Run against an nvidia-bug-report file
./logpile --file nvidia-bug-report-2KD501412.log.gz
# The web interface will be available at http://localhost:8082
```
## Example Output
```
✓ Detected vendor: NVIDIA Bug Report Parser
✓ CPUs: 2
✓ Memory: 32 modules
✓ Power Supplies: 8
✓ GPUs: 8
✓ Network Adapters: 12
System Information:
Serial Number: 2KD501412
UUID: 2e4054bc-1dd2-11b2-0284-6b0a21737950
Version: 0
CPU Information:
Socket 0: Intel(R) Xeon(R) Platinum 8480+
S/N: 5DB0D6C0DD30ABD8, Cores: 56, Threads: 112
Socket 1: Intel(R) Xeon(R) Platinum 8480+
S/N: 5DB017C05685B3ED, Cores: 56, Threads: 112
Power Supplies:
PSU0: V0310DT000000000 (DELTA)
S/N: DGPLV251500LZ
Power: 2700 W, Revision: 00.01.04
Status: Present, OK
PSU1: V0310DT000000000 (DELTA)
S/N: DGPLV251500GY
Power: 2700 W, Revision: 00.01.04
Status: Present, OK
[... 6 more PSUs ...]
Memory Modules:
CPU0_C0D0: 64 GB, Hynix
P/N: HMCG94AGBRA179N, S/N: 80AD0224322B3834E6
Type: DDR5, Speed: 4400/5600 MHz
[... 31 more modules ...]
Network Adapters: 12 devices
0000:0e:00.0: NVIDIA ConnectX-7 HHHL Adapter card, 400GbE / NDR IB (default mode), Single-port OSFP
Slot: 108
P/N: MCX75310AAS-NEAT
S/N: MT2430600249
Ports: 1 x OSFP
0000:1f:00.0: ConnectX-6 Dx EN adapter card, 100GbE, Dual-port QSFP56
Slot: 12
P/N: MCX623106AN-CDAT
S/N: MT2434J00PCD
Ports: 2 x QSFP56
[... 10 more adapters ...]
GPUs: 8 devices
0000:0f:00.0: NVIDIA H100 80GB HBM3
UUID: GPU-64674e47-e036-c12a-3e8d-55a2a9ac8db3
Video BIOS: 96.00.99.00.01
IRQ: 17
0000:34:00.0: NVIDIA H100 80GB HBM3
UUID: GPU-fa796345-c23a-54aa-1b67-709ac2542852
Video BIOS: 96.00.99.00.01
IRQ: 16
[... 6 more GPUs ...]
```
## Versioning
**Current parser version:** 1.0.0
### Version History
- **1.0.0** - initial version with parsing of System Info, CPU, Memory, PSU, GPU, Network Adapters, and Driver
## Data Structure
The parser reads the following sections of the bug report:
1. **dmidecode output (System Information)** - server information
2. **dmidecode output (Processor Information)** - CPU information
3. **dmidecode output (Memory Device)** - memory module information
4. **dmidecode output (System Power Supply)** - power supply information
5. **lspci -vvv output (Ethernet/Network/Infiniband controller)** - network adapter information
6. **lspci VPD (Vital Product Data)** - P/N, S/N, and model of network adapters
7. **/proc/driver/nvidia/gpus/.../information** - detailed GPU information
8. **NVRM version** - driver version
## Known Limitations
1. Errors and warnings from the logs are not yet extracted
2. Some GPU-specific metrics (temperature, utilization) are not parsed
3. GPU performance information and metrics require parsing other sections
## Extending
To add new capabilities:
1. **Driver errors**: parse sections containing NVIDIA driver errors
2. **nvidia-smi output**: extract detailed information from nvidia-smi output (temperature, utilization)
3. **GPU performance**: parse GPU performance and memory-usage metrics
4. **PCIe information**: extract PCIe configuration details (link speed, width)
## Example File Structure
```
Start of NVIDIA bug report log file
nvidia-bug-report.sh Version: 34275561
Date: Thu Jul 17 18:18:18 EDT 2025
[... system info ...]
Memory Device
Data Width: 64 bits
Size: 64 GB
Form Factor: DIMM
Locator: CPU0_C0D0
Type: DDR5
Speed: 5600 MT/s
Manufacturer: Hynix
Serial Number: 80AD0224322B3834E6
Part Number: HMCG94AGBRA179N
[... more memory modules ...]
*** /proc/driver/nvidia/./gpus/0000:0f:00.0/power
[... GPU info ...]
```

View File

@@ -106,6 +106,8 @@ func parseGPUInfo(content string, result *models.AnalysisResult) {
result.Hardware.GPUs = append(result.Hardware.GPUs, *currentGPU)
}
applyGPUSerialNumbers(content, result.Hardware.GPUs)
// Create event for GPU summary
if len(result.Hardware.GPUs) > 0 {
result.Events = append(result.Events, models.Event{
@@ -168,3 +170,138 @@ func formatGPUSummary(gpus []models.GPU) string {
return summary.String()
}
func applyGPUSerialNumbers(content string, gpus []models.GPU) {
if len(gpus) == 0 {
return
}
serialByBDF := parseGPUSerialsFromNvidiaSMI(content)
if len(serialByBDF) == 0 {
serialByBDF = parseGPUSerialsFromSummary(content)
}
if len(serialByBDF) == 0 {
return
}
for i := range gpus {
bdf := normalizeGPUAddress(gpus[i].BDF)
if bdf == "" {
continue
}
if serial, ok := serialByBDF[bdf]; ok && serial != "" {
gpus[i].SerialNumber = serial
}
}
}
func parseGPUSerialsFromNvidiaSMI(content string) map[string]string {
scanner := bufio.NewScanner(strings.NewReader(content))
reGPU := regexp.MustCompile(`^GPU\s+([0-9A-F]{8}:[0-9A-F]{2}:[0-9A-F]{2}\.[0-9A-F])$`)
serialByBDF := make(map[string]string)
currentBDF := ""
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if line == "" {
continue
}
if matches := reGPU.FindStringSubmatch(line); len(matches) == 2 {
currentBDF = normalizeGPUAddress(matches[1])
continue
}
if currentBDF == "" {
continue
}
if strings.HasPrefix(line, "Serial Number") {
parts := strings.SplitN(line, ":", 2)
if len(parts) != 2 {
continue
}
serial := strings.TrimSpace(parts[1])
if serial != "" && !strings.EqualFold(serial, "N/A") {
serialByBDF[currentBDF] = serial
}
}
}
return serialByBDF
}
func parseGPUSerialsFromSummary(content string) map[string]string {
scanner := bufio.NewScanner(strings.NewReader(content))
serialByBDF := make(map[string]string)
inGPUDetails := false
for scanner.Scan() {
line := scanner.Text()
trimmed := strings.TrimSpace(line)
if strings.HasPrefix(trimmed, "NVIDIA GPU Details") {
inGPUDetails = true
}
if !inGPUDetails {
continue
}
if strings.HasPrefix(trimmed, "NVIDIA Switch Details") {
break
}
parts := strings.Split(line, "|")
if len(parts) < 2 {
continue
}
payload := strings.TrimSpace(parts[len(parts)-1])
if payload == "" {
continue
}
fields := strings.Split(payload, ",")
if len(fields) < 6 {
continue
}
bdf := normalizeGPUAddress(strings.TrimSpace(fields[4]))
serial := strings.TrimSpace(fields[5])
if bdf == "" || serial == "" || strings.EqualFold(serial, "N/A") {
continue
}
serialByBDF[bdf] = serial
}
return serialByBDF
}
func normalizeGPUAddress(addr string) string {
addr = strings.TrimSpace(addr)
if addr == "" {
return ""
}
parts := strings.Split(addr, ":")
if len(parts) != 3 {
return strings.ToLower(addr)
}
domain := parts[0]
bus := parts[1]
devFn := parts[2]
devFnParts := strings.Split(devFn, ".")
if len(devFnParts) != 2 {
return strings.ToLower(addr)
}
device := devFnParts[0]
fn := devFnParts[1]
if len(domain) == 8 {
domain = domain[4:]
}
return strings.ToLower(domain + ":" + bus + ":" + device + "." + fn)
}

View File

@@ -0,0 +1,54 @@
package nvidia_bug_report
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestApplyGPUSerialNumbers_FromNvidiaSMI(t *testing.T) {
content := `
/usr/bin/nvidia-smi --query
GPU 00000000:18:00.0
Serial Number : 1653925025827
GPU 00000000:2A:00.0
Serial Number : 1653925050608
`
gpus := []models.GPU{
{BDF: "0000:18:00.0"},
{BDF: "0000:2a:00.0"},
}
applyGPUSerialNumbers(content, gpus)
if gpus[0].SerialNumber != "1653925025827" {
t.Fatalf("unexpected serial for gpu0: %q", gpus[0].SerialNumber)
}
if gpus[1].SerialNumber != "1653925050608" {
t.Fatalf("unexpected serial for gpu1: %q", gpus[1].SerialNumber)
}
}
func TestApplyGPUSerialNumbers_FromSummaryFallback(t *testing.T) {
content := `
NVIDIA GPU Details | NVIDIA H200, 570.172.08, 143771 MiB, 96.00.D0.00.03, 00000000:18:00.0, 1653925025827
| NVIDIA H200, 570.172.08, 143771 MiB, 96.00.D0.00.03, 00000000:2A:00.0, 1653925050608
NVIDIA Switch Details | No devices matching query 'Quantum'
`
gpus := []models.GPU{
{BDF: "0000:18:00.0"},
{BDF: "0000:2a:00.0"},
}
applyGPUSerialNumbers(content, gpus)
if gpus[0].SerialNumber != "1653925025827" {
t.Fatalf("unexpected serial for gpu0: %q", gpus[0].SerialNumber)
}
if gpus[1].SerialNumber != "1653925050608" {
t.Fatalf("unexpected serial for gpu1: %q", gpus[1].SerialNumber)
}
}

View File

@@ -3,14 +3,33 @@
package nvidia_bug_report
import (
"fmt"
"regexp"
"strings"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
// parserVersion - version of this parser module
const parserVersion = "1.0.0"
const parserVersion = "1.2"
var bugReportDateLineRegex = regexp.MustCompile(`(?m)^Date:\s+(.+?)\s*$`)
var dateWithTZAbbrevRegex = regexp.MustCompile(`^([A-Za-z]{3}\s+[A-Za-z]{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+([A-Za-z]{2,5})\s+(\d{4})$`)
var timezoneAbbrevToOffset = map[string]string{
"UTC": "+00:00",
"GMT": "+00:00",
"EST": "-05:00",
"EDT": "-04:00",
"CST": "-06:00",
"CDT": "-05:00",
"MST": "-07:00",
"MDT": "-06:00",
"PST": "-08:00",
"PDT": "-07:00",
}
func init() {
parser.Register(&Parser{})
@@ -81,6 +100,10 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
}
content := string(files[0].Content)
if collectedAt, tzOffset, ok := parseBugReportCollectedAt(content); ok {
result.CollectedAt = collectedAt.UTC()
result.SourceTimezone = tzOffset
}
// Parse system information
parseSystemInfo(content, result)
@@ -105,3 +128,49 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
return result, nil
}
func parseBugReportCollectedAt(content string) (time.Time, string, bool) {
matches := bugReportDateLineRegex.FindStringSubmatch(content)
if len(matches) != 2 {
return time.Time{}, "", false
}
raw := strings.TrimSpace(matches[1])
if raw == "" {
return time.Time{}, "", false
}
if m := dateWithTZAbbrevRegex.FindStringSubmatch(raw); len(m) == 4 {
if offset, ok := timezoneAbbrevToOffset[strings.ToUpper(strings.TrimSpace(m[2]))]; ok {
layout := "Mon Jan 2 15:04:05 -07:00 2006"
normalized := strings.TrimSpace(m[1]) + " " + offset + " " + strings.TrimSpace(m[3])
if ts, err := time.Parse(layout, normalized); err == nil {
return ts, offset, true
}
}
}
layouts := []string{
"Mon Jan 2 15:04:05 MST 2006",
"Mon Jan 2 15:04:05 2006",
}
for _, layout := range layouts {
ts, err := time.Parse(layout, raw)
if err != nil {
continue
}
return ts, formatOffset(ts), true
}
return time.Time{}, "", false
}
func formatOffset(t time.Time) string {
_, sec := t.Zone()
sign := '+'
if sec < 0 {
sign = '-'
sec = -sec
}
h := sec / 3600
m := (sec % 3600) / 60
return fmt.Sprintf("%c%02d:%02d", sign, h, m)
}

View File

@@ -0,0 +1,54 @@
package nvidia_bug_report
import (
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestParseBugReportCollectedAt(t *testing.T) {
content := `
Start of NVIDIA bug report log file
Date: Fri Dec 12 10:14:49 EST 2025
`
got, tz, ok := parseBugReportCollectedAt(content)
if !ok {
t.Fatalf("expected collected_at to be parsed")
}
if tz != "-05:00" {
t.Fatalf("expected tz offset -05:00, got %q", tz)
}
wantUTC := time.Date(2025, 12, 12, 15, 14, 49, 0, time.UTC)
if !got.UTC().Equal(wantUTC) {
t.Fatalf("expected %s, got %s", wantUTC, got.UTC())
}
}
func TestNvidiaBugReportParser_SetsCollectedAtAndTimezone(t *testing.T) {
p := &Parser{}
files := []parser.ExtractedFile{
{
Path: "nvidia-bug-report-1653925023938.log",
Content: []byte(`
Start of NVIDIA bug report log file
nvidia-bug-report.sh Version: 34275561
Date: Fri Dec 12 10:14:49 EST 2025
`),
},
}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("parse failed: %v", err)
}
if result.SourceTimezone != "-05:00" {
t.Fatalf("expected source timezone -05:00, got %q", result.SourceTimezone)
}
wantUTC := time.Date(2025, 12, 12, 15, 14, 49, 0, time.UTC)
if !result.CollectedAt.Equal(wantUTC) {
t.Fatalf("expected collected_at %s, got %s", wantUTC, result.CollectedAt)
}
}

internal/parser/vendors/pciids/pci.ids (vendored, new file, 41507 lines)

File diff suppressed because it is too large

View File

@@ -1,12 +1,27 @@
package pciids
import (
"bufio"
_ "embed"
"fmt"
"os"
"strconv"
"strings"
"sync"
)
var (
//go:embed pci.ids
embeddedPCIIDs string
loadOnce sync.Once
vendors map[int]string
devices map[string]string
)
// VendorName returns vendor name by PCI Vendor ID
func VendorName(vendorID int) string {
loadPCIIDs()
if name, ok := vendors[vendorID]; ok {
return name
}
@@ -15,6 +30,7 @@ func VendorName(vendorID int) string {
// DeviceName returns device name by Vendor ID and Device ID
func DeviceName(vendorID, deviceID int) string {
loadPCIIDs()
key := fmt.Sprintf("%04x:%04x", vendorID, deviceID)
if name, ok := devices[key]; ok {
return name
@@ -46,7 +62,6 @@ func VendorNameFromString(s string) string {
} else if c >= 'a' && c <= 'f' {
id = id*16 + int(c-'a'+10)
} else {
// Not a valid hex string, return original
return ""
}
}
@@ -54,124 +69,99 @@ func VendorNameFromString(s string) string {
return VendorName(id)
}
-// Common PCI Vendor IDs
-// Source: https://pci-ids.ucw.cz/
-var vendors = map[int]string{
-	// Storage controllers and SSDs
-	0x1E0F: "KIOXIA",
-	0x144D: "Samsung Electronics",
-	0x1C5C: "SK Hynix",
-	0x15B7: "SanDisk (Western Digital)",
-	0x1179: "Toshiba",
-	0x8086: "Intel",
-	0x1344: "Micron Technology",
-	0x126F: "Silicon Motion",
-	0x1987: "Phison Electronics",
-	0x1CC1: "ADATA Technology",
-	0x2646: "Kingston Technology",
-	0x1E95: "Solid State Storage Technology",
-	0x025E: "Solidigm",
-	0x1D97: "Shenzhen Longsys Electronics",
-	0x1E4B: "MAXIO Technology",
-	// Network adapters
-	0x15B3: "Mellanox Technologies",
-	0x14E4: "Broadcom",
-	0x10EC: "Realtek Semiconductor",
-	0x1077: "QLogic",
-	0x19A2: "Emulex",
-	0x1137: "Cisco Systems",
-	0x1924: "Solarflare Communications",
-	0x177D: "Cavium",
-	0x1D6A: "Aquantia",
-	0x1FC9: "Tehuti Networks",
-	0x18D4: "Chelsio Communications",
-	// GPU / Graphics
-	0x10DE: "NVIDIA",
-	0x1002: "AMD/ATI",
-	0x102B: "Matrox Electronics",
-	0x1A03: "ASPEED Technology",
-	// Storage controllers (RAID/HBA)
-	0x1000: "LSI Logic / Broadcom",
-	0x9005: "Adaptec / Microsemi",
-	0x1028: "Dell",
-	0x103C: "Hewlett-Packard",
-	0x17D3: "Areca Technology",
-	0x1CC4: "Union Memory",
-	// Server vendors
-	0x1014: "IBM",
-	0x15D9: "Supermicro",
-	0x8088: "Inspur",
-	// Other common
-	0x1022: "AMD",
-	0x1106: "VIA Technologies",
-	0x10B5: "PLX Technology",
-	0x1B21: "ASMedia Technology",
-	0x1B4B: "Marvell Technology",
-	0x197B: "JMicron Technology",
-}
-// Device IDs (vendor:device -> name)
-var devices = map[string]string{
-	// NVIDIA GPUs (0x10DE)
-	"10de:26b9": "L40S 48GB",
-	"10de:26b1": "L40 48GB",
-	"10de:2684": "RTX 4090",
-	"10de:2704": "RTX 4080",
-	"10de:2782": "RTX 4070 Ti",
-	"10de:2786": "RTX 4070",
-	"10de:27b8": "RTX 4060 Ti",
-	"10de:2882": "RTX 4060",
-	"10de:2204": "RTX 3090",
-	"10de:2208": "RTX 3080 Ti",
-	"10de:2206": "RTX 3080",
-	"10de:2484": "RTX 3070",
-	"10de:2503": "RTX 3060",
-	"10de:20b0": "A100 80GB",
-	"10de:20b2": "A100 40GB",
-	"10de:20f1": "A10",
-	"10de:2236": "A10G",
-	"10de:25b6": "A16",
-	"10de:20b5": "A30",
-	"10de:20b7": "A30X",
-	"10de:1db4": "V100 32GB",
-	"10de:1db1": "V100 16GB",
-	"10de:1e04": "RTX 2080 Ti",
-	"10de:1e07": "RTX 2080",
-	"10de:1f02": "RTX 2070",
-	"10de:26ba": "L40S-PCIE-48G",
-	"10de:2330": "H100 80GB PCIe",
-	"10de:2331": "H100 80GB SXM5",
-	"10de:2322": "H100 NVL",
-	"10de:2324": "H200",
-	// AMD GPUs (0x1002)
-	"1002:744c": "Instinct MI250X",
-	"1002:7408": "Instinct MI100",
-	"1002:73a5": "RX 6950 XT",
-	"1002:73bf": "RX 6900 XT",
-	"1002:73df": "RX 6700 XT",
-	"1002:7480": "RX 7900 XTX",
-	"1002:7483": "RX 7900 XT",
-	// ASPEED (0x1A03) - BMC VGA
-	"1a03:2000": "AST2500 VGA",
-	"1a03:1150": "AST2600 VGA",
-	// Intel GPUs
-	"8086:56c0": "Data Center GPU Flex 170",
-	"8086:56c1": "Data Center GPU Flex 140",
-	// Mellanox/NVIDIA NICs (0x15B3)
-	"15b3:1017": "ConnectX-5 100GbE",
-	"15b3:1019": "ConnectX-5 Ex",
-	"15b3:101b": "ConnectX-6",
-	"15b3:101d": "ConnectX-6 Dx",
-	"15b3:101f": "ConnectX-6 Lx",
-	"15b3:1021": "ConnectX-7",
-	"15b3:a2d6": "ConnectX-4 Lx",
-}
+func loadPCIIDs() {
+	loadOnce.Do(func() {
+		vendors = make(map[int]string)
+		devices = make(map[string]string)
+		parsePCIIDs(strings.NewReader(embeddedPCIIDs), vendors, devices)
+		for _, path := range candidatePCIIDsPaths() {
+			f, err := os.Open(path)
+			if err != nil {
+				continue
+			}
+			parsePCIIDs(f, vendors, devices)
+			_ = f.Close()
+		}
+	})
+}
+func candidatePCIIDsPaths() []string {
+	paths := []string{
+		"pci.ids",
+		"/usr/share/hwdata/pci.ids",
+		"/usr/share/misc/pci.ids",
+		"/opt/homebrew/share/pciids/pci.ids",
+	}
+	// Env paths have highest priority, so they are applied last.
+	if env := strings.TrimSpace(os.Getenv("LOGPILE_PCI_IDS_PATH")); env != "" {
+		for _, p := range strings.Split(env, string(os.PathListSeparator)) {
+			p = strings.TrimSpace(p)
+			if p != "" {
+				paths = append(paths, p)
+			}
+		}
+	}
+	return paths
+}
func parsePCIIDs(r interface{ Read([]byte) (int, error) }, outVendors map[int]string, outDevices map[string]string) {
scanner := bufio.NewScanner(r)
currentVendor := -1
for scanner.Scan() {
line := scanner.Text()
if line == "" || strings.HasPrefix(line, "#") {
continue
}
// Subdevice line (tab-tab) - ignored for now
if strings.HasPrefix(line, "\t\t") {
continue
}
// Device line
if strings.HasPrefix(line, "\t") {
if currentVendor < 0 {
continue
}
trimmed := strings.TrimLeft(line, "\t")
fields := strings.Fields(trimmed)
if len(fields) < 2 {
continue
}
deviceID, err := strconv.ParseInt(fields[0], 16, 32)
if err != nil {
continue
}
name := strings.TrimSpace(trimmed[len(fields[0]):])
if name == "" {
continue
}
key := fmt.Sprintf("%04x:%04x", currentVendor, int(deviceID))
outDevices[key] = name
continue
}
// Vendor line
fields := strings.Fields(line)
if len(fields) < 2 {
currentVendor = -1
continue
}
vendorID, err := strconv.ParseInt(fields[0], 16, 32)
if err != nil {
currentVendor = -1
continue
}
name := strings.TrimSpace(line[len(fields[0]):])
if name == "" {
currentVendor = -1
continue
}
currentVendor = int(vendorID)
outVendors[currentVendor] = name
}
}

View File

@@ -0,0 +1,38 @@
package pciids
import (
"os"
"path/filepath"
"sync"
"testing"
)
func TestExternalPCIIDsLookup(t *testing.T) {
dir := t.TempDir()
idsPath := filepath.Join(dir, "pci.ids")
content := "" +
"# sample\n" +
"10de NVIDIA Corporation\n" +
"\t233b NVIDIA H200 SXM\n" +
"8086 Intel Corporation\n" +
"\t1521 I350 Gigabit Network Connection\n"
if err := os.WriteFile(idsPath, []byte(content), 0o644); err != nil {
t.Fatalf("write pci.ids: %v", err)
}
t.Setenv("LOGPILE_PCI_IDS_PATH", idsPath)
loadOnce = sync.Once{}
vendors = nil
devices = nil
if got := DeviceName(0x10de, 0x233b); got != "NVIDIA H200 SXM" {
t.Fatalf("expected external device name, got %q", got)
}
if got := VendorName(0x10de); got != "NVIDIA Corporation" {
t.Fatalf("expected external vendor name, got %q", got)
}
if got := DeviceName(0x8086, 0x1521); got != "I350 Gigabit Network Connection" {
t.Fatalf("expected external intel device name, got %q", got)
}
}

View File

@@ -1,133 +0,0 @@
# SMC Crash Dump Parser
Parser for Supermicro (SMC) BMC Crash Dump archives.
## Supported Servers
- Supermicro SYS-821GE-TNHR
- Other Supermicro servers with BMC crashdump functionality
## Archive Format
The parser handles archives in the following formats:
- `.tgz` / `.tar.gz` (compressed tar)
- `.tar` (uncompressed tar)
## Recognized Files
### Primary Files
1. **CDump.txt** - JSON file with crashdump data
- Metadata (BMC, BIOS, ME firmware versions)
- CPU information (CPUID, core count, microcode version, PPIN)
- MCA (Machine Check Architecture) data - processor errors
## Extracted Data
### Hardware Configuration
#### CPUs
```json
{
"slot": "CPU0",
"model": "CPUID: 0xc06f2",
"cores": 56,
"manufacturer": "Intel",
"firmware": "Microcode: 0x210002b3"
}
```
### FRU Information
- BMC firmware version
- BIOS version
- ME firmware version
- CPU PPIN (Protected Processor Inventory Number)
### Events
Events are generated for:
- **Crashdump collection** - when the crashdump was collected
- **MCA errors** - Machine Check Architecture errors
- Corrected errors (Warning severity)
- Uncorrected errors (Critical severity)
Severity levels:
- `info` - informational events (on-demand crashdump)
- `warning` - warnings (corrected MCA errors, reset detected)
- `critical` - critical errors (uncorrected MCA errors)
## Usage Example
```bash
# Start the web interface
./logpile --file /path/to/CDump_090859_01302026.tgz
# The web interface will be available at http://localhost:8082
```
## Auto-Detection
The parser automatically detects SMC Crash Dump archives by the presence of:
- `CDump.txt` containing the markers "crash_data", "METADATA", "bmc_fw_ver"
Confidence score:
- `CDump.txt` with crashdump markers: +80
## Versioning
**Current parser version:** 1.0.0
When modifying the parser logic, increment the version in the `parserVersion` constant in `parser.go`.
## Data Examples
### Sample CDump.txt (metadata)
```json
{
"crash_data": {
"METADATA": {
"cpu0": {
"cpuid": "0xc06f2",
"core_count": "0x38",
"ppin": "0xa3ccbe7d45026592",
"ucode_patch_ver": "0x210002b3"
},
"bmc_fw_ver": "01.03.18",
"bios_id": "BIOS Date: 08/04/2025 Rev 2.7",
"me_fw_ver": "6.1.4.204",
"timestamp": "2026-01-30T09:06:52Z",
"trigger_type": "On-Demand"
}
}
}
```
### MCA Error Detection
The parser checks the MCA status registers for errors:
- Bit 63 (Valid) - valid-error indicator
- Bit 61 (UC) - uncorrected error
- Bit 60 (EN) - error reporting enabled
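These bit checks reduce to a few mask tests on the 64-bit status value. A standalone sketch of the classification step — the severity strings follow the parser's info/warning/critical convention, but the helper name is illustrative:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	mcaValid = uint64(1) << 63 // VAL: the bank holds a valid error
	mcaUC    = uint64(1) << 61 // UC: the error was not corrected
	mcaEN    = uint64(1) << 60 // EN: error reporting was enabled
)

// classify maps a raw "0x..." MCA status register value to a severity.
func classify(statusHex string) string {
	status, err := strconv.ParseUint(strings.TrimPrefix(statusHex, "0x"), 16, 64)
	if err != nil || status&mcaValid == 0 {
		return "none" // unparsable or no valid error logged
	}
	if status&mcaUC != 0 {
		return "critical" // uncorrected error
	}
	return "warning" // corrected error
}

func main() {
	fmt.Println(classify("0xa000000000000000")) // VAL+UC set -> critical
	fmt.Println(classify("0x8000000000000000")) // VAL only -> warning
	fmt.Println(classify("0x0"))                // no valid error -> none
}
```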
## Known Limitations
1. The parser focuses on data from `CDump.txt`
2. Detailed MCA error analysis is still simplified (only the status registers are checked)
3. TOR dumps and other extended data are not parsed yet
## Development
### Adding New Fields
1. Study the JSON structure in CDump.txt
2. Add fields to the `Metadata`, `CPUMetadata`, or `MCAData` structures
3. Update the parsing functions
4. Increment the parser version
### Extending MCA Analysis
For more detailed MCA error analysis one could:
1. Add decoding of MCA error codes
2. Parse the MISC and ADDR registers
3. Correlate errors across banks

View File

@@ -1,261 +0,0 @@
package supermicro
import (
"encoding/json"
"fmt"
"strconv"
"strings"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
// CrashDumpData represents the structure of CDump.txt
type CrashDumpData struct {
CrashData struct {
METADATA Metadata `json:"METADATA"`
PROCESSORS ProcessorsData `json:"PROCESSORS"`
} `json:"crash_data"`
}
// ProcessorsData contains processor crash data
type ProcessorsData struct {
Version string `json:"_version"`
CPU0 Processors `json:"cpu0"`
CPU1 Processors `json:"cpu1"`
}
// Metadata contains crashdump metadata
type Metadata struct {
CPU0 CPUMetadata `json:"cpu0"`
CPU1 CPUMetadata `json:"cpu1"`
BMCFWVer string `json:"bmc_fw_ver"`
BIOSId string `json:"bios_id"`
MEFWVer string `json:"me_fw_ver"`
Timestamp string `json:"timestamp"`
TriggerType string `json:"trigger_type"`
PlatformName string `json:"platform_name"`
CrashdumpVer string `json:"crashdump_ver"`
ResetDetected string `json:"_reset_detected"`
}
// CPUMetadata contains CPU metadata
type CPUMetadata struct {
CPUID string `json:"cpuid"`
CoreMask string `json:"core_mask"`
CHACount string `json:"cha_count"`
CoreCount string `json:"core_count"`
PPIN string `json:"ppin"`
UcodePatchVer string `json:"ucode_patch_ver"`
}
// Processors contains processor crash data
type Processors struct {
MCA MCAData `json:"MCA"`
}
// MCAData contains Machine Check Architecture data
type MCAData struct {
Uncore map[string]interface{} `json:"uncore"`
}
// ParseCrashDump parses CDump.txt file
func ParseCrashDump(content []byte, result *models.AnalysisResult) error {
var data CrashDumpData
if err := json.Unmarshal(content, &data); err != nil {
return fmt.Errorf("failed to parse CDump.txt: %w", err)
}
// Initialize Hardware.Firmware slice if nil
if result.Hardware.Firmware == nil {
result.Hardware.Firmware = make([]models.FirmwareInfo, 0)
}
// Parse metadata
parseMetadata(&data.CrashData.METADATA, result)
// Parse CPU information
parseCPUInfo(&data.CrashData.METADATA, result)
// Parse MCA errors
parseMCAErrors(&data.CrashData, result)
return nil
}
// parseMetadata extracts metadata information
func parseMetadata(metadata *Metadata, result *models.AnalysisResult) {
// Store firmware versions in HardwareConfig.Firmware
if metadata.BMCFWVer != "" {
result.Hardware.Firmware = append(result.Hardware.Firmware, models.FirmwareInfo{
DeviceName: "BMC",
Version: metadata.BMCFWVer,
})
}
if metadata.BIOSId != "" {
result.Hardware.Firmware = append(result.Hardware.Firmware, models.FirmwareInfo{
DeviceName: "BIOS",
Version: metadata.BIOSId,
})
}
if metadata.MEFWVer != "" {
result.Hardware.Firmware = append(result.Hardware.Firmware, models.FirmwareInfo{
DeviceName: "ME",
Version: metadata.MEFWVer,
})
}
// Create event for crashdump trigger
timestamp := time.Now()
if metadata.Timestamp != "" {
if t, err := time.Parse(time.RFC3339, metadata.Timestamp); err == nil {
timestamp = t
}
}
triggerType := metadata.TriggerType
if triggerType == "" {
triggerType = "Unknown"
}
severity := models.SeverityInfo
if metadata.ResetDetected != "" && metadata.ResetDetected != "NONE" {
severity = models.SeverityWarning
}
result.Events = append(result.Events, models.Event{
Timestamp: timestamp,
Source: "Crashdump",
EventType: "System Crashdump",
Description: fmt.Sprintf("Crashdump collected (%s)", triggerType),
Severity: severity,
RawData: fmt.Sprintf("Version: %s, Reset: %s", metadata.CrashdumpVer, metadata.ResetDetected),
})
}
// parseCPUInfo extracts CPU information
func parseCPUInfo(metadata *Metadata, result *models.AnalysisResult) {
cpus := []struct {
socket int
data CPUMetadata
}{
{0, metadata.CPU0},
{1, metadata.CPU1},
}
for _, cpu := range cpus {
if cpu.data.CPUID == "" {
continue
}
// Parse core count
coreCount := 0
if cpu.data.CoreCount != "" {
if count, err := strconv.ParseInt(strings.TrimPrefix(cpu.data.CoreCount, "0x"), 16, 64); err == nil {
coreCount = int(count)
}
}
cpuModel := models.CPU{
Socket: cpu.socket,
Model: fmt.Sprintf("Intel CPU (CPUID: %s)", cpu.data.CPUID),
Cores: coreCount,
}
// Add PPIN
if cpu.data.PPIN != "" && cpu.data.PPIN != "0x0" {
cpuModel.PPIN = cpu.data.PPIN
}
result.Hardware.CPUs = append(result.Hardware.CPUs, cpuModel)
// Add microcode version to firmware list
if cpu.data.UcodePatchVer != "" {
result.Hardware.Firmware = append(result.Hardware.Firmware, models.FirmwareInfo{
DeviceName: fmt.Sprintf("CPU%d Microcode", cpu.socket),
Version: cpu.data.UcodePatchVer,
})
}
}
}
// parseMCAErrors extracts Machine Check Architecture errors
func parseMCAErrors(crashData *struct {
METADATA Metadata `json:"METADATA"`
PROCESSORS ProcessorsData `json:"PROCESSORS"`
}, result *models.AnalysisResult) {
timestamp := time.Now()
if crashData.METADATA.Timestamp != "" {
if t, err := time.Parse(time.RFC3339, crashData.METADATA.Timestamp); err == nil {
timestamp = t
}
}
// Parse each CPU's MCA data
cpuProcs := []struct {
name string
data Processors
}{
{"cpu0", crashData.PROCESSORS.CPU0},
{"cpu1", crashData.PROCESSORS.CPU1},
}
for _, cpu := range cpuProcs {
if cpu.data.MCA.Uncore == nil {
continue
}
// Check each MCA bank for errors
for bankName, bankDataRaw := range cpu.data.MCA.Uncore {
bankData, ok := bankDataRaw.(map[string]interface{})
if !ok {
continue
}
// Look for status register
statusKey := strings.ToLower(bankName) + "_status"
statusRaw, ok := bankData[statusKey]
if !ok {
continue
}
statusStr, ok := statusRaw.(string)
if !ok {
continue
}
// Parse status value
status, err := strconv.ParseUint(strings.TrimPrefix(statusStr, "0x"), 16, 64)
if err != nil {
continue
}
// Check if MCA error is valid (bit 63 = Valid)
if status&(1<<63) != 0 {
// MCA error detected
severity := models.SeverityWarning
if status&(1<<61) != 0 { // UC bit = uncorrected error
severity = models.SeverityCritical
}
description := fmt.Sprintf("MCA Error in %s bank %s", cpu.name, bankName)
if status&(1<<61) != 0 {
description += " (Uncorrected)"
} else {
description += " (Corrected)"
}
result.Events = append(result.Events, models.Event{
Timestamp: timestamp,
Source: "MCA",
EventType: "Machine Check",
Description: description,
Severity: severity,
RawData: fmt.Sprintf("Status: %s, CPU: %s, Bank: %s", statusStr, cpu.name, bankName),
})
}
}
}
}

View File

@@ -1,98 +0,0 @@
// Package supermicro provides parser for Supermicro BMC crashdump archives
// Tested with: Supermicro SYS-821GE-TNHR (Crashdump format)
//
// IMPORTANT: Increment parserVersion when modifying parser logic!
// This helps track which version was used to parse specific logs.
package supermicro
import (
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
// parserVersion - version of this parser module
// IMPORTANT: Increment this version when making changes to parser logic!
const parserVersion = "1.0.0"
func init() {
parser.Register(&Parser{})
}
// Parser implements VendorParser for Supermicro servers
type Parser struct{}
// Name returns human-readable parser name
func (p *Parser) Name() string {
return "SMC Crash Dump Parser"
}
// Vendor returns vendor identifier
func (p *Parser) Vendor() string {
return "supermicro"
}
// Version returns parser version
// IMPORTANT: Update parserVersion constant when modifying parser logic!
func (p *Parser) Version() string {
return parserVersion
}
// Detect checks if archive matches Supermicro crashdump format
// Returns confidence 0-100
func (p *Parser) Detect(files []parser.ExtractedFile) int {
confidence := 0
for _, f := range files {
path := strings.ToLower(f.Path)
// Strong indicator for Supermicro Crashdump format
if strings.HasSuffix(path, "cdump.txt") {
// Check if it's really Supermicro crashdump format
if containsCrashdumpMarkers(f.Content) {
confidence += 80
}
}
// Cap at 100
if confidence >= 100 {
return 100
}
}
return confidence
}
// containsCrashdumpMarkers checks if content has Supermicro crashdump markers
func containsCrashdumpMarkers(content []byte) bool {
s := string(content)
// Check for typical Supermicro Crashdump structure
return strings.Contains(s, "crash_data") &&
strings.Contains(s, "METADATA") &&
(strings.Contains(s, "bmc_fw_ver") || strings.Contains(s, "crashdump_ver"))
}
// Parse parses Supermicro crashdump archive
func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, error) {
result := &models.AnalysisResult{
Events: make([]models.Event, 0),
FRU: make([]models.FRUInfo, 0),
Sensors: make([]models.SensorReading, 0),
}
// Initialize hardware config
result.Hardware = &models.HardwareConfig{
CPUs: make([]models.CPU, 0),
}
// Parse CDump.txt (JSON crashdump)
if f := parser.FindFileByName(files, "CDump.txt"); f != nil {
if err := ParseCrashDump(f.Content, result); err != nil {
// Log error but continue parsing other files
_ = err // Ignore error for now
}
}
return result, nil
}

internal/parser/vendors/unraid/parser.go (vendored, new file, 1040 lines)

File diff suppressed because it is too large.

View File

@@ -0,0 +1,393 @@
package unraid
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestDetect(t *testing.T) {
tests := []struct {
name string
files []parser.ExtractedFile
wantMin int
wantMax int
shouldFind bool
}{
{
name: "typical unraid diagnostics",
files: []parser.ExtractedFile{
{
Path: "box3-diagnostics-20260205-2333/unraid-7.2.0.txt",
Content: []byte("7.2.0\n"),
},
{
Path: "box3-diagnostics-20260205-2333/system/vars.txt",
Content: []byte("[parity] => Array\n[disk1] => Array\n"),
},
},
wantMin: 50,
wantMax: 100,
shouldFind: true,
},
{
name: "unraid with kernel marker",
files: []parser.ExtractedFile{
{
Path: "diagnostics/system/lscpu.txt",
Content: []byte("Unraid kernel build 6.12.54"),
},
},
wantMin: 50,
wantMax: 100,
shouldFind: true,
},
{
name: "not unraid",
files: []parser.ExtractedFile{
{
Path: "some/random/file.txt",
Content: []byte("just some random content"),
},
},
wantMin: 0,
wantMax: 0,
shouldFind: false,
},
}
p := &Parser{}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := p.Detect(tt.files)
if tt.shouldFind && got < tt.wantMin {
t.Errorf("Detect() = %v, want at least %v", got, tt.wantMin)
}
if got > tt.wantMax {
t.Errorf("Detect() = %v, want at most %v", got, tt.wantMax)
}
if !tt.shouldFind && got > 0 {
t.Errorf("Detect() = %v, want 0 (should not detect)", got)
}
})
}
}
func TestParse_Version(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "unraid-7.2.0.txt",
Content: []byte("7.2.0\n"),
},
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse() error = %v", err)
}
if len(result.Hardware.Firmware) == 0 {
t.Fatal("expected firmware info")
}
fw := result.Hardware.Firmware[0]
if fw.DeviceName != "Unraid OS" {
t.Errorf("DeviceName = %v, want 'Unraid OS'", fw.DeviceName)
}
if fw.Version != "7.2.0" {
t.Errorf("Version = %v, want '7.2.0'", fw.Version)
}
}
func TestParse_CPU(t *testing.T) {
lscpuContent := `Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 16
Model name: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Core(s) per socket: 8
Socket(s): 1
CPU max MHz: 3400.0000
`
files := []parser.ExtractedFile{
{
Path: "diagnostics/system/lscpu.txt",
Content: []byte(lscpuContent),
},
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse() error = %v", err)
}
if len(result.Hardware.CPUs) == 0 {
t.Fatal("expected CPU info")
}
cpu := result.Hardware.CPUs[0]
if cpu.Model != "Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz" {
t.Errorf("Model = %v", cpu.Model)
}
if cpu.Cores != 8 {
t.Errorf("Cores = %v, want 8", cpu.Cores)
}
if cpu.Threads != 16 {
t.Errorf("Threads = %v, want 16", cpu.Threads)
}
if cpu.FrequencyMHz != 3400 {
t.Errorf("FrequencyMHz = %v, want 3400", cpu.FrequencyMHz)
}
}
func TestParse_Memory(t *testing.T) {
memContent := ` total used free shared buff/cache available
Mem: 50Gi 11Gi 1.4Gi 565Mi 39Gi 39Gi
Swap: 0B 0B 0B
Total: 50Gi 11Gi 1.4Gi
`
files := []parser.ExtractedFile{
{
Path: "diagnostics/system/memory.txt",
Content: []byte(memContent),
},
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse() error = %v", err)
}
if len(result.Hardware.Memory) == 0 {
t.Fatal("expected memory info")
}
mem := result.Hardware.Memory[0]
expectedSizeMB := 50 * 1024 // 50 GiB in MB
if mem.SizeMB != expectedSizeMB {
t.Errorf("SizeMB = %v, want %v", mem.SizeMB, expectedSizeMB)
}
if mem.Type != "DRAM" {
t.Errorf("Type = %v, want 'DRAM'", mem.Type)
}
}
func TestParse_SMART(t *testing.T) {
smartContent := `smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.54-Unraid] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: ST4000NM000B-2TF100
Serial Number: WX103EC9
LU WWN Device Id: 5 000c50 0ed59db60
Firmware Version: TNA1
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
`
files := []parser.ExtractedFile{
{
Path: "diagnostics/smart/ST4000NM000B-2TF100_WX103EC9-20260205-2333 disk1 (sdi).txt",
Content: []byte(smartContent),
},
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse() error = %v", err)
}
if len(result.Hardware.Storage) == 0 {
t.Fatal("expected storage info")
}
disk := result.Hardware.Storage[0]
if disk.Model != "ST4000NM000B-2TF100" {
t.Errorf("Model = %v, want 'ST4000NM000B-2TF100'", disk.Model)
}
if disk.SerialNumber != "WX103EC9" {
t.Errorf("SerialNumber = %v, want 'WX103EC9'", disk.SerialNumber)
}
if disk.Firmware != "TNA1" {
t.Errorf("Firmware = %v, want 'TNA1'", disk.Firmware)
}
if disk.SizeGB != 4000 {
t.Errorf("SizeGB = %v, want 4000", disk.SizeGB)
}
if disk.Type != "hdd" {
t.Errorf("Type = %v, want 'hdd'", disk.Type)
}
// Check that no health warnings were generated (PASSED health)
healthWarnings := 0
for _, event := range result.Events {
if event.EventType == "Disk Health" && event.Severity == "warning" {
healthWarnings++
}
}
if healthWarnings != 0 {
t.Errorf("Expected no health warnings for PASSED disk, got %v", healthWarnings)
}
}
func TestParser_Metadata(t *testing.T) {
p := &Parser{}
if p.Name() != "Unraid Parser" {
t.Errorf("Name() = %v, want 'Unraid Parser'", p.Name())
}
if p.Vendor() != "unraid" {
t.Errorf("Vendor() = %v, want 'unraid'", p.Vendor())
}
if p.Version() == "" {
t.Error("Version() should not be empty")
}
}
func TestParse_MemoryDIMMsFromMeminfo(t *testing.T) {
memInfo := `MemTotal: 53393436 kB
Handle 0x002D, DMI type 17, 34 bytes
Memory Device
Size: 16 GB
Locator: Node0_Dimm1
Bank Locator: Node0_Bank0
Type: DDR3
Speed: 1333 MT/s
Manufacturer: Samsung
Serial Number: 238F7649
Part Number: M393B2G70BH0-
Rank: 4
Configured Memory Speed: 1333 MT/s
Handle 0x002F, DMI type 17, 34 bytes
Memory Device
Size: No Module Installed
Locator: Node0_Dimm2
`
files := []parser.ExtractedFile{
{Path: "diagnostics/system/memory.txt", Content: []byte("Mem: 50Gi")},
{Path: "diagnostics/system/meminfo.txt", Content: []byte(memInfo)},
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse() error = %v", err)
}
if got := len(result.Hardware.Memory); got != 1 {
t.Fatalf("expected only installed DIMM entries, got %d entries", got)
}
dimm := result.Hardware.Memory[0]
if dimm.Slot != "Node0_Dimm1" {
t.Errorf("Slot = %q, want Node0_Dimm1", dimm.Slot)
}
if dimm.SizeMB != 16*1024 {
t.Errorf("SizeMB = %d, want %d", dimm.SizeMB, 16*1024)
}
if dimm.Type != "DDR3" {
t.Errorf("Type = %q, want DDR3", dimm.Type)
}
if dimm.SerialNumber != "238F7649" {
t.Errorf("SerialNumber = %q, want 238F7649", dimm.SerialNumber)
}
}
func TestParse_NetworkAndPCIeFromLSPCIAndEthtool(t *testing.T) {
lspci := `03:00.0 SCSI storage controller [0100]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
07:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 06)
`
ethtool := `Settings for eth0:
Speed: 1000Mb/s
Link detected: yes
driver: r8168
firmware-version:
bus-info: 0000:07:00.0
--------------------------------
`
files := []parser.ExtractedFile{
{Path: "diagnostics/system/lspci.txt", Content: []byte(lspci)},
{Path: "diagnostics/system/ethtool.txt", Content: []byte(ethtool)},
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse() error = %v", err)
}
if len(result.Hardware.NetworkAdapters) != 1 {
t.Fatalf("expected 1 network adapter, got %d", len(result.Hardware.NetworkAdapters))
}
nic := result.Hardware.NetworkAdapters[0]
if nic.Location != "0000:07:00.0" {
t.Errorf("Location = %q, want 0000:07:00.0", nic.Location)
}
if nic.Model == "" {
t.Error("Model should not be empty")
}
if nic.Vendor == "" {
t.Error("Vendor should not be empty")
}
if len(result.Hardware.PCIeDevices) < 2 {
t.Fatalf("expected at least 2 PCIe devices, got %d", len(result.Hardware.PCIeDevices))
}
}
func TestParse_HostSerialFallbackFromVarsUUID(t *testing.T) {
vars := ` [flashGUID] => 1...
[regGUID] => 1...7
[uuid] => 2713440667722491190
`
files := []parser.ExtractedFile{
{Path: "diagnostics/system/vars.txt", Content: []byte(vars)},
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse() error = %v", err)
}
if result.Hardware.BoardInfo.SerialNumber != "2713440667722491190" {
t.Fatalf("BoardInfo.SerialNumber = %q, want vars uuid", result.Hardware.BoardInfo.SerialNumber)
}
if result.Hardware.BoardInfo.UUID != "2713440667722491190" {
t.Fatalf("BoardInfo.UUID = %q, want vars uuid", result.Hardware.BoardInfo.UUID)
}
}

View File

@@ -4,17 +4,17 @@ package vendors
import (
// Import vendor modules to trigger their init() registration
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/dell"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/h3c"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/nvidia"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/nvidia_bug_report"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/supermicro"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/unraid"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/xigmanas"
// Generic fallback parser (must be last for lowest priority)
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/generic"
// Future vendors:
// _ "git.mchus.pro/mchus/logpile/internal/parser/vendors/hpe"
// _ "git.mchus.pro/mchus/logpile/internal/parser/vendors/lenovo"
)

View File

@@ -1,46 +0,0 @@
# Xigmanas Parser
Parser for Xigmanas (FreeBSD-based NAS) system logs.
## Supported Files
- `xigmanas` - Main system log file with configuration and status information
- `dmesg` - Kernel messages and hardware initialization information
- SMART data from disk monitoring
## Features
This parser extracts the following information from Xigmanas logs:
### System Information
- Firmware version
- System uptime
- CPU model and specifications
- Memory configuration
- Hardware platform information
### Storage Information
- Disk models and serial numbers
- Disk capacity and health status
- SMART temperature readings
### Hardware Configuration
- CPU information
- Memory modules
- Storage devices
## Detection Logic
The parser detects Xigmanas format by looking for:
- Files with "xigmanas", "system", or "dmesg" in their names
- Content containing "XigmaNAS" or "FreeBSD" strings
- SMART-related information in log content
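A minimal sketch of that scoring heuristic, with illustrative weights and a stand-in for the repo's parser.ExtractedFile type:

```go
package main

import (
	"fmt"
	"strings"
)

// extractedFile stands in for the repo's parser.ExtractedFile.
type extractedFile struct {
	Path    string
	Content []byte
}

// detectXigmanas returns a 0-100 confidence score from the markers
// listed above: matching file names plus "XigmaNAS"/"FreeBSD" content.
// The weights are made up for illustration.
func detectXigmanas(files []extractedFile) int {
	score := 0
	for _, f := range files {
		name := strings.ToLower(f.Path)
		body := string(f.Content)
		if strings.Contains(name, "xigmanas") || strings.Contains(name, "dmesg") {
			score += 30 // file-name marker
		}
		if strings.Contains(body, "XigmaNAS") || strings.Contains(body, "FreeBSD") {
			score += 40 // content marker
		}
		if score >= 100 {
			return 100 // cap the confidence
		}
	}
	return score
}

func main() {
	files := []extractedFile{{Path: "log/xigmanas", Content: []byte("XigmaNAS 13.1")}}
	fmt.Println(detectXigmanas(files)) // 70
}
```

The framework picks the registered parser with the highest Detect score, which is why each vendor combines several weak markers rather than relying on a single file name.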
## Example Output
The parser populates the following fields in AnalysisResult:
- `Hardware.Firmware` - Firmware versions
- `Hardware.CPUs` - CPU information
- `Hardware.Memory` - Memory configuration
- `Hardware.Storage` - Storage devices with SMART data
- `Sensors` - Temperature readings from SMART data

View File

@@ -12,7 +12,7 @@ import (
)
// parserVersion - increment when parsing logic changes.
-const parserVersion = "2.1.0"
+const parserVersion = "2.2"
func init() {
parser.Register(&Parser{})
@@ -431,7 +431,7 @@ func parseEventTimestamp(line string) time.Time {
prefixRe := regexp.MustCompile(`^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}`)
if prefix := prefixRe.FindString(line); prefix != "" {
year := time.Now().Year()
-if ts, err := time.Parse("Jan 2 15:04:05 2006", prefix+" "+strconv.Itoa(year)); err == nil {
+if ts, err := parser.ParseInDefaultArchiveLocation("Jan 2 15:04:05 2006", prefix+" "+strconv.Itoa(year)); err == nil {
return ts
}
}

View File

@@ -0,0 +1,779 @@
package server
import (
"fmt"
"regexp"
"strconv"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
)
type slotFirmwareInfo struct {
Model string
Version string
Category string
}
var (
psuFirmwareRe = regexp.MustCompile(`(?i)^PSU\s*([0-9A-Za-z_-]+)\s*(?:\(([^)]+)\))?$`)
nicFirmwareRe = regexp.MustCompile(`(?i)^NIC\s+([^()]+?)\s*(?:\(([^)]+)\))?$`)
gpuFirmwareRe = regexp.MustCompile(`(?i)^GPU\s+([^()]+?)\s*(?:\(([^)]+)\))?$`)
nvsFirmwareRe = regexp.MustCompile(`(?i)^NVSwitch\s+([^()]+?)\s*(?:\(([^)]+)\))?$`)
)
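As an illustration of the PSU pattern above: the first capture group takes the slot token and the second the optional parenthesized model. The sample inputs here are made up:

```go
package main

import (
	"fmt"
	"regexp"
)

// psuFirmwareRe matches names like "PSU1 (PS-2122-9S)" or "psu 2":
// group 1 is the slot token, group 2 the optional model in parentheses.
var psuFirmwareRe = regexp.MustCompile(`(?i)^PSU\s*([0-9A-Za-z_-]+)\s*(?:\(([^)]+)\))?$`)

func main() {
	for _, name := range []string{"PSU1 (PS-2122-9S)", "psu 2"} {
		m := psuFirmwareRe.FindStringSubmatch(name)
		if m == nil {
			fmt.Println(name, "-> no match")
			continue
		}
		fmt.Printf("%s -> slot=%q model=%q\n", name, m[1], m[2])
	}
}
```

The NIC/GPU/NVSwitch patterns follow the same shape (label, free-form name, optional parenthesized suffix), so firmware entries can be keyed back to a normalized slot regardless of which subsystem reported them.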
func BuildHardwareDevices(hw *models.HardwareConfig) []models.HardwareDevice {
if hw == nil {
return nil
}
all := make([]models.HardwareDevice, 0, 1+len(hw.CPUs)+len(hw.Memory)+len(hw.Storage)+len(hw.PCIeDevices)+len(hw.GPUs)+len(hw.NetworkAdapters)+len(hw.PowerSupply))
fwBySlot := buildFirmwareBySlot(hw.Firmware)
nextID := 0
add := func(d models.HardwareDevice) {
d.ID = fmt.Sprintf("%s:%d", d.Kind, nextID)
nextID++
all = append(all, d)
}
add(models.HardwareDevice{
Kind: models.DeviceKindBoard,
Source: "board",
Slot: "board",
Model: strings.TrimSpace(hw.BoardInfo.ProductName),
PartNumber: strings.TrimSpace(hw.BoardInfo.PartNumber),
Manufacturer: strings.TrimSpace(hw.BoardInfo.Manufacturer),
SerialNumber: strings.TrimSpace(hw.BoardInfo.SerialNumber),
Details: map[string]any{
"description": strings.TrimSpace(hw.BoardInfo.Description),
"version": strings.TrimSpace(hw.BoardInfo.Version),
"uuid": strings.TrimSpace(hw.BoardInfo.UUID),
},
})
	for _, cpu := range hw.CPUs {
		add(models.HardwareDevice{
			Kind: models.DeviceKindCPU,
			Source: "cpus",
			Slot: fmt.Sprintf("CPU%d", cpu.Socket),
			Model: cpu.Model,
			SerialNumber: cpu.SerialNumber,
			Cores: cpu.Cores,
			Threads: cpu.Threads,
			FrequencyMHz: cpu.FrequencyMHz,
			MaxFreqMHz: cpu.MaxFreqMHz,
			Status: cpu.Status,
			StatusCheckedAt: cpu.StatusCheckedAt,
			StatusChangedAt: cpu.StatusChangedAt,
			StatusAtCollect: cpu.StatusAtCollect,
			StatusHistory: cpu.StatusHistory,
			ErrorDescription: cpu.ErrorDescription,
			Details: map[string]any{
				"description": cpu.Description,
				"socket": cpu.Socket,
				"l1_cache_kb": cpu.L1CacheKB,
				"l2_cache_kb": cpu.L2CacheKB,
				"l3_cache_kb": cpu.L3CacheKB,
				"tdp_w": cpu.TDP,
				"ppin": cpu.PPIN,
			},
		})
	}
	for _, mem := range hw.Memory {
		if !mem.Present || mem.SizeMB == 0 {
			continue
		}
		present := mem.Present
		add(models.HardwareDevice{
			Kind: models.DeviceKindMemory,
			Source: "memory",
			Slot: mem.Slot,
			Location: mem.Location,
			Manufacturer: mem.Manufacturer,
			SerialNumber: mem.SerialNumber,
			PartNumber: mem.PartNumber,
			Type: mem.Type,
			Present: &present,
			SizeMB: mem.SizeMB,
			Status: mem.Status,
			StatusCheckedAt: mem.StatusCheckedAt,
			StatusChangedAt: mem.StatusChangedAt,
			StatusAtCollect: mem.StatusAtCollect,
			StatusHistory: mem.StatusHistory,
			ErrorDescription: mem.ErrorDescription,
			Details: map[string]any{
				"description": mem.Description,
				"technology": mem.Technology,
				"max_speed_mhz": mem.MaxSpeedMHz,
				"current_speed_mhz": mem.CurrentSpeedMHz,
				"ranks": mem.Ranks,
			},
		})
	}
	for _, stor := range hw.Storage {
		if !stor.Present {
			continue
		}
		present := stor.Present
		add(models.HardwareDevice{
			Kind: models.DeviceKindStorage,
			Source: "storage",
			Slot: stor.Slot,
			Location: stor.Location,
			Model: stor.Model,
			Manufacturer: stor.Manufacturer,
			SerialNumber: stor.SerialNumber,
			Firmware: stor.Firmware,
			Type: stor.Type,
			Interface: stor.Interface,
			Present: &present,
			SizeGB: stor.SizeGB,
			Status: stor.Status,
			StatusCheckedAt: stor.StatusCheckedAt,
			StatusChangedAt: stor.StatusChangedAt,
			StatusAtCollect: stor.StatusAtCollect,
			StatusHistory: stor.StatusHistory,
			ErrorDescription: stor.ErrorDescription,
			Details: map[string]any{
				"description": stor.Description,
				"backplane_id": stor.BackplaneID,
			},
		})
	}
	for _, p := range hw.PCIeDevices {
		if isEmptyPCIeDevice(p) {
			continue
		}
		slotKey := normalizeSlotKey(p.Slot)
		fwInfo := fwBySlot[slotKey]
		model := strings.TrimSpace(p.PartNumber)
		if model == "" {
			model = strings.TrimSpace(p.DeviceClass)
		}
		if model == "" {
			model = strings.TrimSpace(p.Description)
		}
		if model == "" && fwInfo.Model != "" {
			model = fwInfo.Model
		}
		add(models.HardwareDevice{
			Kind: models.DeviceKindPCIe,
			Source: "pcie_devices",
			Slot: p.Slot,
			BDF: p.BDF,
			DeviceClass: p.DeviceClass,
			VendorID: p.VendorID,
			DeviceID: p.DeviceID,
			Model: model,
			PartNumber: p.PartNumber,
			Manufacturer: p.Manufacturer,
			SerialNumber: p.SerialNumber,
			Firmware: fwInfo.Version,
			MACAddresses: p.MACAddresses,
			LinkWidth: p.LinkWidth,
			LinkSpeed: p.LinkSpeed,
			MaxLinkWidth: p.MaxLinkWidth,
			MaxLinkSpeed: p.MaxLinkSpeed,
			Status: p.Status,
			StatusCheckedAt: p.StatusCheckedAt,
			StatusChangedAt: p.StatusChangedAt,
			StatusAtCollect: p.StatusAtCollect,
			StatusHistory: p.StatusHistory,
			ErrorDescription: p.ErrorDescription,
			Details: map[string]any{
				"description": p.Description,
				"fw_category": fwInfo.Category,
			},
		})
	}
	for _, gpu := range hw.GPUs {
		add(models.HardwareDevice{
			Kind: models.DeviceKindGPU,
			Source: "gpus",
			Slot: gpu.Slot,
			Location: gpu.Location,
			BDF: gpu.BDF,
			DeviceClass: "DisplayController",
			VendorID: gpu.VendorID,
			DeviceID: gpu.DeviceID,
			Model: gpu.Model,
			PartNumber: gpu.PartNumber,
			Manufacturer: gpu.Manufacturer,
			SerialNumber: gpu.SerialNumber,
			Firmware: gpu.Firmware,
			LinkWidth: gpu.CurrentLinkWidth,
			LinkSpeed: gpu.CurrentLinkSpeed,
			MaxLinkWidth: gpu.MaxLinkWidth,
			MaxLinkSpeed: gpu.MaxLinkSpeed,
			Status: gpu.Status,
			StatusCheckedAt: gpu.StatusCheckedAt,
			StatusChangedAt: gpu.StatusChangedAt,
			StatusAtCollect: gpu.StatusAtCollect,
			StatusHistory: gpu.StatusHistory,
			ErrorDescription: gpu.ErrorDescription,
			Details: map[string]any{
				"description": gpu.Description,
				"uuid": gpu.UUID,
				"video_bios": gpu.VideoBIOS,
				"irq": gpu.IRQ,
				"bus_type": gpu.BusType,
				"dma_size": gpu.DMASize,
				"dma_mask": gpu.DMAMask,
				"device_minor": gpu.DeviceMinor,
				"temperature": gpu.Temperature,
				"mem_temperature": gpu.MemTemperature,
				"power": gpu.Power,
				"max_power": gpu.MaxPower,
				"clock_speed": gpu.ClockSpeed,
			},
		})
	}
	for _, nic := range hw.NetworkAdapters {
		if !nic.Present {
			continue
		}
		present := nic.Present
		add(models.HardwareDevice{
			Kind: models.DeviceKindNetwork,
			Source: "network_adapters",
			Slot: nic.Slot,
			Location: nic.Location,
			VendorID: nic.VendorID,
			DeviceID: nic.DeviceID,
			Model: nic.Model,
			PartNumber: nic.PartNumber,
			Manufacturer: nic.Vendor,
			SerialNumber: nic.SerialNumber,
			Firmware: nic.Firmware,
			PortCount: nic.PortCount,
			PortType: nic.PortType,
			MACAddresses: nic.MACAddresses,
			Present: &present,
			Status: nic.Status,
			StatusCheckedAt: nic.StatusCheckedAt,
			StatusChangedAt: nic.StatusChangedAt,
			StatusAtCollect: nic.StatusAtCollect,
			StatusHistory: nic.StatusHistory,
			ErrorDescription: nic.ErrorDescription,
			Details: map[string]any{
				"description": nic.Description,
			},
		})
	}
	for _, psu := range hw.PowerSupply {
		if !psu.Present {
			continue
		}
		present := psu.Present
		add(models.HardwareDevice{
			Kind: models.DeviceKindPSU,
			Source: "power_supplies",
			Slot: psu.Slot,
			Model: psu.Model,
			PartNumber: psu.PartNumber,
			Manufacturer: psu.Vendor,
			SerialNumber: psu.SerialNumber,
			Firmware: psu.Firmware,
			Present: &present,
			WattageW: psu.WattageW,
			InputType: psu.InputType,
			InputPowerW: psu.InputPowerW,
			OutputPowerW: psu.OutputPowerW,
			InputVoltage: psu.InputVoltage,
			TemperatureC: psu.TemperatureC,
			Status: psu.Status,
			StatusCheckedAt: psu.StatusCheckedAt,
			StatusChangedAt: psu.StatusChangedAt,
			StatusAtCollect: psu.StatusAtCollect,
			StatusHistory: psu.StatusHistory,
			ErrorDescription: psu.ErrorDescription,
			Details: map[string]any{
				"description": psu.Description,
				"output_voltage": psu.OutputVoltage,
			},
		})
	}
	return annotateDuplicateSerials(dedupeDevices(all))
}
// isEmptyPCIeDevice reports whether a PCIe record is an unpopulated slot row
// rather than a real device. A numeric-only slot label with no BDF, PCI IDs,
// serial, part/vendor text, MACs, or link data is rejected outright; otherwise
// the record is considered empty only if every identity field is blank or the
// device class is a placeholder ("unknown", "other", "pcie device").
func isEmptyPCIeDevice(p models.PCIeDevice) bool {
	if isNumericSlot(strings.TrimSpace(p.Slot)) &&
		strings.TrimSpace(p.BDF) == "" &&
		p.VendorID == 0 &&
		p.DeviceID == 0 &&
		normalizedSerial(p.SerialNumber) == "" &&
		!hasMeaningfulText(p.PartNumber) &&
		!hasMeaningfulText(p.Manufacturer) &&
		!hasMeaningfulText(p.Description) &&
		len(p.MACAddresses) == 0 &&
		p.LinkWidth == 0 &&
		p.MaxLinkWidth == 0 {
		return true
	}
	if strings.TrimSpace(p.BDF) != "" {
		return false
	}
	if p.VendorID != 0 || p.DeviceID != 0 {
		return false
	}
	if normalizedSerial(p.SerialNumber) != "" {
		return false
	}
	if hasMeaningfulText(p.PartNumber) {
		return false
	}
	if hasMeaningfulText(p.Manufacturer) {
		return false
	}
	if hasMeaningfulText(p.Description) {
		return false
	}
	if strings.TrimSpace(p.DeviceClass) != "" {
		class := strings.ToLower(strings.TrimSpace(p.DeviceClass))
		if class != "unknown" && class != "other" && class != "pcie device" {
			return false
		}
	}
	return true
}
// isNumericSlot reports whether slot consists solely of ASCII digits.
func isNumericSlot(slot string) bool {
	if slot == "" {
		return false
	}
	for _, r := range slot {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// hasMeaningfulText reports whether v carries real information, treating
// common BMC placeholder strings ("N/A", "none", "unknown", ...) as empty.
func hasMeaningfulText(v string) bool {
	s := strings.ToLower(strings.TrimSpace(v))
	if s == "" {
		return false
	}
	switch s {
	case "-", "n/a", "na", "none", "null", "unknown":
		return false
	default:
		return true
	}
}
// dedupeDevices collapses records that describe the same physical device.
// Every pair approved by shouldMergeDevices is linked in a union-find (with
// path halving); within each group the record with the highest qualityScore
// wins, its blank fields are filled from the rest, and sequential IDs are
// reassigned in first-seen order.
func dedupeDevices(items []models.HardwareDevice) []models.HardwareDevice {
	if len(items) < 2 {
		return items
	}
	parent := make([]int, len(items))
	for i := range parent {
		parent[i] = i
	}
	find := func(x int) int {
		for parent[x] != x {
			parent[x] = parent[parent[x]]
			x = parent[x]
		}
		return x
	}
	union := func(a, b int) {
		ra := find(a)
		rb := find(b)
		if ra != rb {
			parent[rb] = ra
		}
	}
	for i := 0; i < len(items); i++ {
		for j := i + 1; j < len(items); j++ {
			if shouldMergeDevices(items[i], items[j]) {
				union(i, j)
			}
		}
	}
	groups := make(map[int][]int, len(items))
	order := make([]int, 0, len(items))
	for i := range items {
		root := find(i)
		if _, ok := groups[root]; !ok {
			order = append(order, root)
		}
		groups[root] = append(groups[root], i)
	}
	out := make([]models.HardwareDevice, 0, len(order))
	for _, root := range order {
		indices := groups[root]
		bestIdx := indices[0]
		bestScore := qualityScore(items[bestIdx])
		for _, idx := range indices[1:] {
			if s := qualityScore(items[idx]); s > bestScore {
				bestIdx = idx
				bestScore = s
			}
		}
		merged := items[bestIdx]
		for _, idx := range indices {
			if idx == bestIdx {
				continue
			}
			merged = mergeDevices(merged, items[idx])
		}
		out = append(out, merged)
	}
	for i := range out {
		out[i].ID = out[i].Kind + ":" + strconv.Itoa(i)
	}
	return out
}
// shouldMergeDevices decides whether two records describe the same device:
// matching serials or BDFs merge immediately, conflicting ones never do, and
// when only weak signals remain their weighted sum must reach 7.
func shouldMergeDevices(a, b models.HardwareDevice) bool {
	aSN := strings.ToLower(normalizedSerial(a.SerialNumber))
	bSN := strings.ToLower(normalizedSerial(b.SerialNumber))
	aBDF := strings.ToLower(strings.TrimSpace(a.BDF))
	bBDF := strings.ToLower(strings.TrimSpace(b.BDF))
	aSlot := normalizeSlot(a.Slot)
	bSlot := normalizeSlot(b.Slot)
	// Memory DIMMs can legitimately share serial number in some dumps.
	// Never merge DIMMs with different slots.
	if a.Kind == models.DeviceKindMemory && b.Kind == models.DeviceKindMemory {
		if aSlot != "" && bSlot != "" && aSlot != bSlot {
			return false
		}
	}
	// Strong identity: equal serials (DIMMs additionally require equal slots).
	if aSN != "" && bSN != "" && aSN == bSN {
		if a.Kind == models.DeviceKindMemory && b.Kind == models.DeviceKindMemory {
			return aSlot != "" && bSlot != "" && aSlot == bSlot
		}
		return true
	}
	// Hard conflicts.
	if aSN != "" && bSN != "" && aSN != bSN {
		return false
	}
	if aBDF != "" && bBDF != "" && aBDF != bBDF {
		return false
	}
	// Strong identity: equal BDFs.
	if aBDF != "" && aBDF == bBDF {
		return true
	}
	// If both have no strong IDs, be conservative.
	if aSN == "" && bSN == "" && aBDF == "" && bBDF == "" {
		if hasMACOverlap(a.MACAddresses, b.MACAddresses) {
			return true
		}
		if aSlot != "" && aSlot == bSlot {
			return true
		}
		return false
	}
	score := 0
	if samePCIID(a, b) {
		score += 4
	}
	if sameModel(a, b) {
		score += 3
	}
	if sameManufacturer(a, b) {
		score += 2
	}
	if aSlot != "" && aSlot == bSlot {
		score += 2
	}
	if hasMACOverlap(a.MACAddresses, b.MACAddresses) {
		score += 2
	}
	if sameKindFamily(a.Kind, b.Kind) {
		score++
	}
	if samePCIID(a, b) && ((aBDF != "" && bBDF == "") || (aBDF == "" && bBDF != "")) {
		score += 2
	}
	return score >= 7
}
// mergeDevices fills blank fields of the higher-quality primary record from
// secondary; fields already populated on primary always win.
func mergeDevices(primary, secondary models.HardwareDevice) models.HardwareDevice {
	fillString := func(dst *string, src string) {
		if strings.TrimSpace(*dst) == "" && strings.TrimSpace(src) != "" {
			*dst = src
		}
	}
	fillInt := func(dst *int, src int) {
		if *dst == 0 && src != 0 {
			*dst = src
		}
	}
	fillFloat := func(dst *float64, src float64) {
		if *dst == 0 && src != 0 {
			*dst = src
		}
	}
	fillString(&primary.ID, secondary.ID)
	fillString(&primary.Kind, secondary.Kind)
	fillString(&primary.Source, secondary.Source)
	fillString(&primary.Slot, secondary.Slot)
	fillString(&primary.Location, secondary.Location)
	fillString(&primary.BDF, secondary.BDF)
	fillString(&primary.DeviceClass, secondary.DeviceClass)
	fillInt(&primary.VendorID, secondary.VendorID)
	fillInt(&primary.DeviceID, secondary.DeviceID)
	fillString(&primary.Model, secondary.Model)
	fillString(&primary.PartNumber, secondary.PartNumber)
	fillString(&primary.Manufacturer, secondary.Manufacturer)
	fillString(&primary.SerialNumber, secondary.SerialNumber)
	fillString(&primary.Firmware, secondary.Firmware)
	fillString(&primary.Type, secondary.Type)
	fillString(&primary.Interface, secondary.Interface)
	if primary.Present == nil && secondary.Present != nil {
		primary.Present = secondary.Present
	}
	fillInt(&primary.SizeMB, secondary.SizeMB)
	fillInt(&primary.SizeGB, secondary.SizeGB)
	fillInt(&primary.Cores, secondary.Cores)
	fillInt(&primary.Threads, secondary.Threads)
	fillInt(&primary.FrequencyMHz, secondary.FrequencyMHz)
	fillInt(&primary.MaxFreqMHz, secondary.MaxFreqMHz)
	fillInt(&primary.PortCount, secondary.PortCount)
	fillString(&primary.PortType, secondary.PortType)
	if len(primary.MACAddresses) == 0 && len(secondary.MACAddresses) > 0 {
		primary.MACAddresses = secondary.MACAddresses
	}
	fillInt(&primary.LinkWidth, secondary.LinkWidth)
	fillString(&primary.LinkSpeed, secondary.LinkSpeed)
	fillInt(&primary.MaxLinkWidth, secondary.MaxLinkWidth)
	fillString(&primary.MaxLinkSpeed, secondary.MaxLinkSpeed)
	fillInt(&primary.WattageW, secondary.WattageW)
	fillString(&primary.InputType, secondary.InputType)
	fillInt(&primary.InputPowerW, secondary.InputPowerW)
	fillInt(&primary.OutputPowerW, secondary.OutputPowerW)
	fillFloat(&primary.InputVoltage, secondary.InputVoltage)
	fillInt(&primary.TemperatureC, secondary.TemperatureC)
	fillString(&primary.Status, secondary.Status)
	if primary.StatusCheckedAt == nil && secondary.StatusCheckedAt != nil {
		primary.StatusCheckedAt = secondary.StatusCheckedAt
	}
	if primary.StatusChangedAt == nil && secondary.StatusChangedAt != nil {
		primary.StatusChangedAt = secondary.StatusChangedAt
	}
	if primary.StatusAtCollect == nil && secondary.StatusAtCollect != nil {
		primary.StatusAtCollect = secondary.StatusAtCollect
	}
	if len(primary.StatusHistory) == 0 && len(secondary.StatusHistory) > 0 {
		primary.StatusHistory = secondary.StatusHistory
	}
	fillString(&primary.ErrorDescription, secondary.ErrorDescription)
	if primary.Details == nil && secondary.Details != nil {
		primary.Details = secondary.Details
	}
	return primary
}
// samePCIID reports whether both records carry the same non-zero PCI
// vendor/device ID pair.
func samePCIID(a, b models.HardwareDevice) bool {
	if (a.VendorID == 0 && a.DeviceID == 0) || (b.VendorID == 0 && b.DeviceID == 0) {
		return false
	}
	return a.VendorID == b.VendorID && a.DeviceID == b.DeviceID
}

// sameModel compares the first non-empty of model, part number, and device
// class after normalization.
func sameModel(a, b models.HardwareDevice) bool {
	am := normalizeText(coalesce(a.Model, a.PartNumber, a.DeviceClass))
	bm := normalizeText(coalesce(b.Model, b.PartNumber, b.DeviceClass))
	return am != "" && am == bm
}

func sameManufacturer(a, b models.HardwareDevice) bool {
	am := normalizeText(a.Manufacturer)
	bm := normalizeText(b.Manufacturer)
	return am != "" && am == bm
}

// hasMACOverlap reports whether the two MAC lists share at least one address
// after normalization.
func hasMACOverlap(a, b []string) bool {
	if len(a) == 0 || len(b) == 0 {
		return false
	}
	set := make(map[string]struct{}, len(a))
	for _, mac := range a {
		key := normalizeText(mac)
		if key != "" {
			set[key] = struct{}{}
		}
	}
	for _, mac := range b {
		if _, ok := set[normalizeText(mac)]; ok {
			return true
		}
	}
	return false
}

// sameKindFamily treats pcie, gpu, and network as one family, since the same
// physical card can be reported under any of these kinds.
func sameKindFamily(a, b string) bool {
	if a == b {
		return true
	}
	family := map[string]bool{
		models.DeviceKindPCIe: true,
		models.DeviceKindGPU: true,
		models.DeviceKindNetwork: true,
	}
	return family[a] && family[b]
}
func normalizeText(v string) string {
	s := strings.ToLower(strings.TrimSpace(v))
	s = strings.ReplaceAll(s, " ", "")
	s = strings.ReplaceAll(s, "_", "")
	s = strings.ReplaceAll(s, "-", "")
	return s
}

func normalizeSlot(slot string) string {
	return normalizeText(slot)
}
// qualityScore ranks records for merging: serial numbers and BDFs dominate;
// model, firmware, and status break ties.
func qualityScore(d models.HardwareDevice) int {
	score := 0
	if normalizedSerial(d.SerialNumber) != "" {
		score += 6
	}
	if strings.TrimSpace(d.BDF) != "" {
		score += 4
	}
	if strings.TrimSpace(d.Model) != "" {
		score += 3
	}
	if strings.TrimSpace(d.Firmware) != "" {
		score += 2
	}
	if strings.TrimSpace(d.Status) != "" {
		score++
	}
	return score
}

// normalizedSerial trims the serial and maps placeholder values to "".
func normalizedSerial(serial string) string {
	s := strings.TrimSpace(serial)
	if s == "" {
		return ""
	}
	switch strings.ToUpper(s) {
	case "N/A", "NA", "NONE", "NULL", "UNKNOWN", "-":
		return ""
	default:
		return s
	}
}
// buildFirmwareBySlot indexes firmware inventory entries by normalized slot,
// matching PSU/NIC/GPU/NVSwitch device names against their respective
// regexes. For each slot, the first entry that carries a model wins.
func buildFirmwareBySlot(firmware []models.FirmwareInfo) map[string]slotFirmwareInfo {
	out := make(map[string]slotFirmwareInfo)
	add := func(slot, model, version, category string) {
		key := normalizeSlotKey(slot)
		if key == "" || strings.TrimSpace(version) == "" {
			return
		}
		existing, ok := out[key]
		if ok && strings.TrimSpace(existing.Model) != "" {
			return
		}
		out[key] = slotFirmwareInfo{
			Model: strings.TrimSpace(model),
			Version: strings.TrimSpace(version),
			Category: category,
		}
	}
	for _, fw := range firmware {
		name := strings.TrimSpace(fw.DeviceName)
		if name == "" {
			continue
		}
		if m := psuFirmwareRe.FindStringSubmatch(name); len(m) == 3 {
			model := strings.TrimSpace(m[2])
			if model == "" {
				model = "PSU"
			}
			add(m[1], model, fw.Version, "psu")
			continue
		}
		if m := nicFirmwareRe.FindStringSubmatch(name); len(m) == 3 {
			model := strings.TrimSpace(m[2])
			if model == "" {
				model = "NIC"
			}
			add(m[1], model, fw.Version, "nic")
			continue
		}
		if m := gpuFirmwareRe.FindStringSubmatch(name); len(m) == 3 {
			model := strings.TrimSpace(m[2])
			if model == "" {
				model = "GPU"
			}
			add(m[1], model, fw.Version, "gpu")
			continue
		}
		if m := nvsFirmwareRe.FindStringSubmatch(name); len(m) == 3 {
			model := strings.TrimSpace(m[2])
			if model == "" {
				model = "NVSwitch"
			}
			add(m[1], model, fw.Version, "nvswitch")
			continue
		}
	}
	return out
}

// normalizeSlotKey lowercases and trims a slot label for map lookups.
func normalizeSlotKey(slot string) string {
	return strings.ToLower(strings.TrimSpace(slot))
}
// annotateDuplicateSerials appends " (DUP#n)" to serials that repeat within
// the same device kind, so identical copies stay distinguishable. Repeats
// across different kinds are left alone.
func annotateDuplicateSerials(items []models.HardwareDevice) []models.HardwareDevice {
	if len(items) < 2 {
		return items
	}
	countByKindSerial := make(map[string]int)
	for _, d := range items {
		serial := normalizedSerial(d.SerialNumber)
		if serial == "" {
			continue
		}
		key := d.Kind + "|" + strings.ToLower(serial)
		countByKindSerial[key]++
	}
	seenByKindSerial := make(map[string]int)
	for i := range items {
		serial := normalizedSerial(items[i].SerialNumber)
		if serial == "" {
			continue
		}
		key := items[i].Kind + "|" + strings.ToLower(serial)
		if countByKindSerial[key] < 2 {
			continue
		}
		seenByKindSerial[key]++
		items[i].SerialNumber = serial + " (DUP#" + strconv.Itoa(seenByKindSerial[key]) + ")"
	}
	return items
}
