80 Commits

Author SHA1 Message Date
Mikhail Chusavitin
d8d3d8c524 build: use local go toolchain in release script 2026-03-16 00:32:09 +03:00
Mikhail Chusavitin
057a222288 ui: embed reanimator chart viewer 2026-03-16 00:20:11 +03:00
Mikhail Chusavitin
f11a43f690 export: merge inspur psu sensor groups 2026-03-15 23:29:44 +03:00
Mikhail Chusavitin
476630190d export: align reanimator contract v2.7 2026-03-15 23:27:32 +03:00
Mikhail Chusavitin
9007f1b360 export: align reanimator and enrich redfish metrics 2026-03-15 21:38:28 +03:00
Mikhail Chusavitin
0acdc2b202 docs: refresh project documentation 2026-03-15 16:35:16 +03:00
Mikhail Chusavitin
47bb0ee939 docs: document firmware filter regression pattern in bible (ADL-019)
Root cause analysis for device-bound firmware leaking into hardware.firmware
on Supermicro Redfish (SYS-A21GE-NBRT HGX B200):

- collectFirmwareInventory (6c19a58) had no coverage for Supermicro naming.
  isDeviceBoundFirmwareName checked "gpu " / "nic " (space-terminated) while
  Supermicro uses "GPU1 System Slot0" / "NIC1 System Slot0 ..." (digit suffix).

- 9c5512d added _fw_gpu_ / _fw_nvswitch_ / _inforom_gpu_ patterns to fix HGX,
  but checked DeviceName which contains "Software Inventory" (from Redfish Name),
  not the firmware Id. Dead code from day one.

09-testing.md: add firmware filter worked example and rule #4 — verify the
filter checks the field that the collector actually populates.

10-decisions.md: ADL-019 — isDeviceBoundFirmwareName must be extended per
vendor with a test case per vendor format before shipping.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 14:03:47 +03:00
Mikhail Chusavitin
5815100e2f exporter: filter Supermicro Redfish device-bound firmware from hardware.firmware
isDeviceBoundFirmwareName did not catch Supermicro FirmwareInventory naming
conventions where a digit follows the type prefix directly ("GPU1 System Slot0",
"NIC1 System Slot0 AOM-DP805-IO") instead of a space. Also missing: "Power supply N",
"NVMeController N", and "Software Inventory" (generic label for all HGX per-component
firmware slots — GPU, NVSwitch, PCIeRetimer, ERoT, InfoROM, etc.).

On SYS-A21GE-NBRT (HGX B200) this caused 29 device-bound entries to leak into
hardware.firmware: 8 GPU, 9 NIC, 1 NVMe, 6 PSU, 4 PCIeSwitch, 1 Software Inventory.

Fix: extend isDeviceBoundFirmwareName with patterns for all four new cases.
Add TestIsDeviceBoundFirmwareName covering both excluded and kept entries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 13:48:55 +03:00
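A minimal Go sketch of the naming rule from the two commits above, assuming the exporter matches on the firmware inventory name. The pattern list and `isDeviceBoundFirmwareNameSketch` are hypothetical; the real `isDeviceBoundFirmwareName` may be structured differently. It shows the two points from the root-cause analysis: the type prefix may be followed by a digit as well as a space, and the check has to run on the field the collector actually fills.

```go
package main

import (
	"regexp"
	"strings"
)

// deviceBoundNamePatterns is a hypothetical pattern set modelled on the
// Supermicro FirmwareInventory names quoted in the commit message.
var deviceBoundNamePatterns = []*regexp.Regexp{
	// "gpu 1" and "GPU1 System Slot0", "nic 2" and "NIC1 System Slot0 ...":
	// the type prefix may be followed by whitespace OR directly by a digit.
	regexp.MustCompile(`(?i)^(gpu|nic)[\s\d]`),
	regexp.MustCompile(`(?i)^power supply \d+`),    // "Power supply 3"
	regexp.MustCompile(`(?i)^nvmecontroller \d+`),  // "NVMeController 1"
	regexp.MustCompile(`(?i)^software inventory$`), // generic HGX per-component label
}

// isDeviceBoundFirmwareNameSketch reports whether a firmware inventory name
// belongs to a specific device and should stay out of hardware.firmware.
func isDeviceBoundFirmwareNameSketch(name string) bool {
	name = strings.TrimSpace(name)
	for _, re := range deviceBoundNamePatterns {
		if re.MatchString(name) {
			return true
		}
	}
	return false
}
```

In the spirit of ADL-019, each vendor format gets its own positive case in the test table, next to names that must be kept.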
Mikhail Chusavitin
1eb639e6bf redfish: skip NVMe bay probe for non-storage chassis types (Module/Component/Zone)
On Supermicro HGX systems (SYS-A21GE-NBRT) ~35 sub-chassis (GPU, NVSwitch,
PCIeRetimer, ERoT/IRoT, BMC, FPGA) all carry ChassisType=Module/Component/Zone
and expose empty /Drives collections. shouldAdaptiveNVMeProbe returned true for
all of them, triggering 35 × 384 = 13 440 HTTP requests → ~22 min wasted per
collection (more than half of total 35 min collection time).

Fix: chassisTypeCanHaveNVMe returns false for Module, Component, Zone. The
candidate selection loop in collectRawRedfishTree now checks the parent chassis
doc before adding a /Drives path to the probe list. Enclosure (NVMe backplane),
RackMount, and unknown types are unaffected.

Tests:
- TestChassisTypeCanHaveNVMe: table-driven, covers excluded and storage-capable types
- TestNVMePostProbeSkipsNonStorageChassis: topology integration, GPU chassis +
  backplane with empty /Drives → exactly 1 candidate selected (backplane only)

Docs:
- ADL-018 in bible-local/10-decisions.md
- Candidate-selection test matrix in bible-local/09-testing.md
- SYS-A21GE-NBRT baseline row in docs/test_server_collection_memory.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 13:38:29 +03:00
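The chassis-type gate and candidate selection from this commit might look roughly like the Go sketch below. `chassisTypeCanHaveNVMeSketch` and `selectNVMeProbeCandidates` are hypothetical names, and the map from chassis path to `ChassisType` stands in for the parent chassis documents that the real loop in `collectRawRedfishTree` reads.

```go
package main

import (
	"sort"
	"strings"
)

// chassisTypeCanHaveNVMeSketch follows the rule in the commit: Module,
// Component and Zone chassis are never probed for NVMe bays, while
// Enclosure, RackMount and unknown/empty types still are.
func chassisTypeCanHaveNVMeSketch(chassisType string) bool {
	switch strings.ToLower(strings.TrimSpace(chassisType)) {
	case "module", "component", "zone":
		return false
	default:
		return true
	}
}

// selectNVMeProbeCandidates keeps a chassis' /Drives path only when the
// parent chassis type can carry NVMe drives. Keys are chassis paths and
// values are their ChassisType (a hypothetical input shape).
func selectNVMeProbeCandidates(chassis map[string]string) []string {
	var out []string
	for path, chassisType := range chassis {
		if chassisTypeCanHaveNVMeSketch(chassisType) {
			out = append(out, path+"/Drives")
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}
```

With a GPU sub-chassis typed Module and a backplane typed Enclosure, only the backplane's /Drives path survives, which mirrors the scenario TestNVMePostProbeSkipsNonStorageChassis describes.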
Mikhail Chusavitin
a9f58b3cf4 redfish: fix GPU duplication on Supermicro HGX, exclude NVSwitch, restore path dedup
Three bugs, all related to GPU dedup in the Redfish replay pipeline:

1. collectGPUsFromProcessors (redfish_replay.go): GPU-type Processor entries
   (Systems/HGX_Baseboard_0/Processors/GPU_SXM_N) were not deduplicated against
   existing PCIeDevice GPUs on Supermicro HGX. The chassis-ID lookup keyed on
   processor Id ("GPU_SXM_1") but the chassis is named "HGX_GPU_SXM_1" — lookup
   returned nothing, serial stayed empty, UUID was unseen → 8 duplicate GPU rows.
   Fix: read SerialNumber directly from the Processor doc first; chassis lookup
   is now a fallback override (as it was designed for MSI).

2. looksLikeGPU (redfish.go): NVSwitch PCIe devices (Model="NVSwitch",
   Manufacturer="NVIDIA") were classified as GPUs because "nvidia" matched the
   GPU hint list. Fix: early return false when Model contains "nvswitch".

3. gpuDocDedupKey (redfish.go): commit 9df29b1 changed the dedup key to prefer
   slot|model before path, which collapsed two distinct GPUs with identical model
   names in GraphicsControllers into one entry. Fix: only serial and BDF are used
   as cross-path stable dedup keys; fall back to Redfish path when neither is
   present. This also restores TestReplayCollectGPUs_DedupUsesRedfishPathBeforeHeuristics
   which had been broken on main since 9df29b1.

Added tests:
- TestCollectGPUsFromProcessors_SupermicroHGX: Processor GPU dedup when
  chassis-ID naming convention does not match processor Id
- TestReplayCollectGPUs_DedupCrossChassisSerial: same GPU via two Chassis
  PCIeDevice trees with matching serials → collapsed to one
- TestLooksLikeGPU_NVSwitchExcluded: NVSwitch is not a GPU

Added rule to bible-local/09-testing.md: dedup/filter/classify functions must
cover true-positive, true-negative, and the vendor counter-case axes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 15:09:27 +03:00
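A Go sketch of the rules from fixes 2 and 3, under simplified assumptions: `gpuDoc` is a hypothetical trimmed-down view of a Redfish GPU document, and the hint list in `looksLikeGPUSketch` is illustrative rather than the collector's actual list. Model is deliberately left out of the key, so two same-model GPUs without serial or BDF stay separate.

```go
package main

import "strings"

// gpuDoc is a hypothetical, trimmed-down view of a Redfish GPU document.
type gpuDoc struct {
	ODataID string // Redfish @odata.id path
	Serial  string
	BDF     string // PCIe bus:device.function
	Model   string // deliberately NOT part of the key
}

// gpuDedupKeySketch treats only serial and BDF as stable across Redfish
// paths; otherwise the path is the key, so two same-model GPUs without
// serials stay distinct.
func gpuDedupKeySketch(d gpuDoc) string {
	if s := strings.TrimSpace(d.Serial); s != "" {
		return "serial:" + strings.ToUpper(s)
	}
	if b := strings.TrimSpace(d.BDF); b != "" {
		return "bdf:" + strings.ToLower(b)
	}
	return "path:" + d.ODataID
}

// looksLikeGPUSketch returns false for NVSwitch before consulting the hint
// list, so an NVIDIA manufacturer alone cannot make a switch a GPU.
func looksLikeGPUSketch(model, manufacturer string) bool {
	m := strings.ToLower(model)
	if strings.Contains(m, "nvswitch") {
		return false
	}
	hay := m + " " + strings.ToLower(manufacturer)
	for _, hint := range []string{"nvidia", "gpu", "h100", "b200"} { // illustrative hints
		if strings.Contains(hay, hint) {
			return true
		}
	}
	return false
}
```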
Mikhail Chusavitin
d8ffe3d3a5 redfish: add service root to critical endpoints, tolerate missing root in replay
Add /redfish/v1 to redfishCriticalEndpoints so plan-B retries the service
root if it failed during the main crawl. Also downgrade the missing-root
error in ReplayRedfishFromRawPayloads from fatal to a warning so analysis
can complete with defaults when the root doc was not recovered.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 08:31:00 +03:00
Mikhail Chusavitin
9df29b1be9 fix: dedup GPUs across multiple chassis PCIeDevice trees in Redfish collector
Supermicro HGX exposes each GPU under both Chassis/1/PCIeDevices and a
dedicated Chassis/HGX_GPU_SXM_N/PCIeDevices. gpuDocDedupKey was keying
by @odata.id path, so identical GPUs with the same serial were not
deduplicated across sources. Now stable identifiers (serial → BDF →
slot+model) take priority over path.

Also includes Inspur parser improvements: NVMe model/serial enrichment
from devicefrusdr.log and audit.log, RAID drive slot normalization to
BP notation, PSU slot normalization, BMC/CPLD/VR firmware from RESTful
version info section, and parser version bump to 1.8.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 14:44:36 +03:00
Mikhail Chusavitin
62d6ad6f66 ui: deduplicate files by name and SHA-256 hash before batch convert
On folder selection, filter out duplicate files before conversion:
- First pass: same basename → skip (same filename in different subdirs)
- Second pass: same SHA-256 hash → skip (identical content, different path)

Duplicates are excluded from the convert queue and shown as a warning
in the summary with reason (same name / same content).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 12:45:09 +03:00
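The commit implements this filter in the web UI, but the two-pass logic is easy to illustrate in Go. `inputFile`, `skipped` and `dedupeForConvert` are hypothetical names, and the sketch assumes the file contents are already in memory. Note that the name pass runs first, so a file dropped for its name never reaches the hash pass.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"path"
)

// inputFile is a hypothetical stand-in for a file picked from a folder.
type inputFile struct {
	Path    string // relative path inside the selected folder
	Content []byte
}

// skipped records why a file was excluded from the convert queue.
type skipped struct {
	Path   string
	Reason string // "same name" or "same content"
}

// dedupeForConvert applies two passes: first by basename, then by the
// SHA-256 of the remaining files' content. The first occurrence wins.
func dedupeForConvert(files []inputFile) (keep []inputFile, dropped []skipped) {
	seenName := map[string]bool{}
	var byName []inputFile
	for _, f := range files {
		base := path.Base(f.Path)
		if seenName[base] {
			dropped = append(dropped, skipped{f.Path, "same name"})
			continue
		}
		seenName[base] = true
		byName = append(byName, f)
	}
	seenHash := map[string]bool{}
	for _, f := range byName {
		sum := sha256.Sum256(f.Content)
		h := hex.EncodeToString(sum[:])
		if seenHash[h] {
			dropped = append(dropped, skipped{f.Path, "same content"})
			continue
		}
		seenHash[h] = true
		keep = append(keep, f)
	}
	return keep, dropped
}
```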
Mikhail Chusavitin
f09344e288 dell: filter chipset/embedded noise from PCIe device list
Skip FQDD prefixes that are internal AMD EPYC fabric or devices
already captured with richer data from other DCIM views:
- HostBridge/P2PBridge/ISABridge/SMBus.Embedded: AMD internal bus
- AHCI.Embedded: AMD FCH SATA (chipset, not a slot)
- Video.Embedded: BMC Matrox G200eW3, not user-visible
- NIC.Embedded: duplicates DCIM_NICView entries (no model/MAC in PCIe view)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 12:09:40 +03:00
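A minimal prefix-based skip list in Go. The prefixes are reconstructed from the commit message (for example `HostBridge.Embedded` from "HostBridge/P2PBridge/ISABridge/SMBus.Embedded"); the exact FQDD spellings and the name `isDellPCIeNoise` are assumptions.

```go
package main

import "strings"

// dellNoisePCIePrefixes lists FQDD prefixes for chipset/embedded devices
// that the commit excludes from the PCIe device list.
var dellNoisePCIePrefixes = []string{
	"HostBridge.Embedded",
	"P2PBridge.Embedded",
	"ISABridge.Embedded",
	"SMBus.Embedded",
	"AHCI.Embedded",  // AMD FCH SATA
	"Video.Embedded", // BMC video
	"NIC.Embedded",   // already covered by DCIM_NICView
}

// isDellPCIeNoise reports whether a DCIM_PCIDeviceView FQDD should be skipped.
func isDellPCIeNoise(fqdd string) bool {
	for _, p := range dellNoisePCIePrefixes {
		if strings.HasPrefix(fqdd, p) {
			return true
		}
	}
	return false
}
```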
19d857b459 redfish: filter PCIe topology noise, deduplicate GPU/NIC cross-sources
- isUnidentifiablePCIeDevice: skip PCIe entries with generic class
  (SingleFunction/MultiFunction) and no model/serial/VendorID — eliminates
  PCH bridges, root ports and other bus infrastructure that MSI BMC
  enumerates exhaustively (59→9 entries on CG480-S5063)
- collectPCIeDevices: skip entries where looksLikeGPU — prevents GPU
  devices from appearing in both hw.GPUs and hw.PCIeDevices (fixed
  Inspur H100 duplicate)
- dedupeCanonicalDevices: secondary model+manufacturer match for noKey
  items (no serial, no BDF) — merges NetworkAdapter entries into
  matching PCIe device entries; isGenericDeviceClass helper for
  DeviceClass identity check (fixed Inspur ENFI1100-T4 duplicate)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 22:08:02 +03:00
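The identifiability check from the first bullet can be sketched in Go as follows, assuming a hypothetical `pcieDevice` struct that carries only the fields the commit mentions (device class, model, serial, VendorID).

```go
package main

import "strings"

// pcieDevice is a hypothetical subset of the fields the Redfish collector sees.
type pcieDevice struct {
	DeviceClass string
	Model       string
	Serial      string
	VendorID    string
}

// isUnidentifiablePCIeDeviceSketch is true only for a generic class
// (SingleFunction/MultiFunction) with no model, serial or VendorID.
func isUnidentifiablePCIeDeviceSketch(d pcieDevice) bool {
	switch strings.ToLower(strings.TrimSpace(d.DeviceClass)) {
	case "singlefunction", "multifunction":
		// generic class: continue to the identity check below
	default:
		return false // a specific class is enough to keep the entry
	}
	return strings.TrimSpace(d.Model) == "" &&
		strings.TrimSpace(d.Serial) == "" &&
		strings.TrimSpace(d.VendorID) == ""
}
```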
8d80048117 redfish: MSI support, fix zero dates, BMC MAC, Assembly FRU, crawler cleanup
- Add MSI CG480-S5063 (H100 SXM5) support:
  - collectGPUsFromProcessors: find GPUs via Processors/ProcessorType=GPU,
    resolve serials from Chassis/<GpuId>
  - looksLikeGPU: skip Description="Display Device" PCIe sidecars
  - isVirtualStorageDrive: filter AMI virtual USB drives (0-byte)
  - enrichNICMACsFromNetworkDeviceFunctions: pull MACs for MSI NICs
  - parseCPUs: filter by ProcessorType, parse Socket, L1/L2/L3 from ProcessorMemory
  - parseMemory: Location.PartLocation.ServiceLabel slot fallback
  - shouldCrawlPath: block /SubProcessors subtrees
- Fix status_checked_at/status_changed_at serializing as 0001-01-01:
  change all StatusCheckedAt/StatusChangedAt fields to *time.Time
- Redfish crawler cleanup:
  - Block non-inventory branches: AccountService, CertificateService,
    EventService, Registries, SessionService, TaskService, manager config paths,
    OperatingConfigs, BootOptions, HostPostCode, Bios/Settings, OEM KVM paths
  - Add Assembly to critical endpoints (FRU data)
  - Remove BootOptions from priority seeds
- collectBMCMAC: read BMC MAC from Managers/*/EthernetInterfaces
- collectAssemblyFRU: extract FRU serial/part from Chassis/*/Assembly
- Firmware: remove NetworkProtocol noise, fix SecureBoot field,
  filter BMCImageN redundant backup slots

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 08:12:17 +03:00
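Why switching to `*time.Time` removes the 0001-01-01 values: a zero `time.Time` is still a struct value, and `omitempty` does not treat it as empty, so it is always marshalled. A nil pointer can be omitted. The structs below are hypothetical and assume the real fields use `omitempty`; without it, a nil pointer is written as `null`, which still avoids the bogus date.

```go
package main

import (
	"encoding/json"
	"time"
)

// statusOld reproduces the bug: a zero time.Time value is always written.
type statusOld struct {
	StatusCheckedAt time.Time `json:"status_checked_at"`
}

// statusNew uses pointers, so "never checked" is nil and, with omitempty,
// simply left out of the JSON.
type statusNew struct {
	StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
	StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
}

// toJSON marshals v and panics on error (acceptable for a demo).
func toJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}
```

Here `toJSON(statusOld{})` yields `{"status_checked_at":"0001-01-01T00:00:00Z"}`, while `toJSON(statusNew{})` yields `{}`.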
21ea129933 misc: sds format support, convert limits, dell dedup, supermicro removal, bible updates
Parser / archive:
- Add .sds extension as tar-format alias (archive.go)
- Add tests for multipart upload size limits (multipart_limits_test.go)
- Remove supermicro crashdump parser (ADL-015)

Dell parser:
- Remove GPU duplicates from PCIeDevices (DCIM_VideoView vs DCIM_PCIDeviceView
  both list the same GPU; VideoView record is authoritative)

Server:
- Add LOGPILE_CONVERT_MAX_MB env var for independent convert batch size limit
- Improve "file too large" error message with current limit value

Web:
- Add CONVERT_MAX_FILES_PER_BATCH = 1000 cap
- Minor UI copy and CSS fixes

Bible:
- bible-local/06-parsers.md: add pci.ids enrichment rule (enrich model from
  pciids when name is empty but vendor_id+device_id are present)
- Sync bible submodule and local overview/architecture docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 22:23:44 +03:00
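The pci.ids enrichment rule recorded in 06-parsers.md (fill the model from pci.ids only when the name is empty and both vendor_id and device_id are present) can be sketched in Go. The parser covers only the vendor/device part of the pci.ids text format; `pciDB`, `parsePCIIDs` and `enrichModel` are hypothetical names, and any sample data is illustrative.

```go
package main

import (
	"bufio"
	"strings"
)

// pciDB maps "vvvv:dddd" (lower-case hex) to a device name from pci.ids.
type pciDB map[string]string

// parsePCIIDs reads the vendor/device part of the pci.ids text format:
// "vvvv  Vendor" at column 0, "\tdddd  Device" below it. Subsystem lines
// (two tabs), comments and the "C xx" class section are ignored.
func parsePCIIDs(text string) pciDB {
	db := pciDB{}
	vendor := ""
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "" || strings.HasPrefix(line, "#"):
			// blank line or comment
		case strings.HasPrefix(line, "C "):
			vendor = "" // class section: stop attributing devices to a vendor
		case strings.HasPrefix(line, "\t\t"):
			// subsystem entry, not needed for model enrichment
		case strings.HasPrefix(line, "\t"):
			if vendor != "" && len(line) > 7 {
				db[vendor+":"+strings.ToLower(line[1:5])] = strings.TrimSpace(line[5:])
			}
		default:
			if len(line) > 6 {
				vendor = strings.ToLower(line[:4])
			}
		}
	}
	return db
}

// enrichModel fills an empty model from pci.ids when both IDs are present;
// a non-empty model is always kept as-is.
func enrichModel(db pciDB, model, vendorID, deviceID string) string {
	if strings.TrimSpace(model) != "" || vendorID == "" || deviceID == "" {
		return model
	}
	norm := func(s string) string { return strings.TrimPrefix(strings.ToLower(s), "0x") }
	if name, ok := db[norm(vendorID)+":"+norm(deviceID)]; ok {
		return name
	}
	return model
}
```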
9c5512d238 dell: strip MAC from model names; fix device-bound firmware in dell/inspur
- Dell NICView: strip " - XX:XX:XX:XX:XX:XX" suffix from ProductName
  (Dell TSR embeds MAC in this field for every NIC port)
- Dell SoftwareIdentity: same strip applied to ElementName; store FQDD
  in FirmwareInfo.Description so exporter can filter device-bound entries
- Exporter: add isDeviceBoundFirmwareFQDD() to filter firmware entries
  whose Description matches NIC./PSU./Disk./RAID.Backplane./GPU. FQDD
  prefixes (prevents device firmware from appearing in hardware.firmware)
- Exporter: extend isDeviceBoundFirmwareName() to filter HGX GPU/NVSwitch
  firmware inventory IDs (_fw_gpu_, _fw_nvswitch_, _inforom_gpu_)
- Inspur: remove HDD firmware from Hardware.Firmware — already present
  in Storage.Firmware, duplicating it violates ADL-016
- bible-local/06-parsers.md: document firmware and MAC stripping rules
- bible-local/10-decisions.md: add ADL-016 (device-bound firmware) and
  ADL-017 (vendor-embedded MAC in model name fields)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 22:07:53 +03:00
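A Go sketch of two rules from this commit: stripping the trailing MAC that Dell TSR appends to NIC names, and filtering firmware by FQDD prefix. The regular expression and the function names are assumptions; only the suffix shape and the prefix list come from the commit message.

```go
package main

import (
	"regexp"
	"strings"
)

// macSuffix matches a trailing " - XX:XX:XX:XX:XX:XX" on a model name.
var macSuffix = regexp.MustCompile(`\s+-\s+(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\s*$`)

// stripMACSuffix removes the vendor-embedded MAC, leaving other names intact.
func stripMACSuffix(name string) string {
	return strings.TrimSpace(macSuffix.ReplaceAllString(name, ""))
}

// deviceBoundFQDDPrefixes are the prefixes listed in the commit message.
var deviceBoundFQDDPrefixes = []string{"NIC.", "PSU.", "Disk.", "RAID.Backplane.", "GPU."}

// isDeviceBoundFirmwareFQDDSketch reports whether a firmware entry, keyed by
// the FQDD stored in FirmwareInfo.Description, belongs to a specific device.
func isDeviceBoundFirmwareFQDDSketch(fqdd string) bool {
	for _, p := range deviceBoundFQDDPrefixes {
		if strings.HasPrefix(fqdd, p) {
			return true
		}
	}
	return false
}
```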
206496efae unraid: parse dimm/nic/pcie and annotate duplicate serials 2026-03-01 18:14:45 +03:00
7d1a02cb72 Add H3C G5/G6 parsers with PSU and NIC extraction 2026-03-01 17:08:11 +03:00
070971685f Update bible paths kit/ → rules/
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 16:57:52 +03:00
78806f9fa0 Add shared bible submodule, rename local bible to bible-local
- Add bible.git as submodule at bible/
- Move docs/bible/ → bible-local/ (project-specific architecture)
- Update CLAUDE.md to reference both bible/ and bible-local/
- Add AGENTS.md for Codex with same structure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 16:38:57 +03:00
4940cd9645 sync file-type support across upload/convert and fix collected_at timezone handling 2026-02-28 23:27:49 +03:00
736b77f055 server: infer archive collected_at from source events 2026-02-28 22:18:47 +03:00
0252264ddc parser: fallback zone-less source timestamps to Europe/Moscow 2026-02-28 22:17:00 +03:00
25e3b8bb42 Add convert mode batch workflow with full progress 2026-02-28 21:44:36 +03:00
bb4505a249 docs: track collection speed/metrics by server model 2026-02-28 19:27:53 +03:00
2fa4a1235a collector/redfish: make prefetch/post-probe adaptive with metrics 2026-02-28 19:05:34 +03:00
fe5da1dbd7 Fix NIC port count handling and apply pending exporter updates 2026-02-28 18:42:01 +03:00
612058ed16 redfish: optimize snapshot/plan-b crawl and add timing diagnostics 2026-02-28 17:56:04 +03:00
e0146adfff Improve Redfish recovery flow and raw export timing diagnostics 2026-02-28 16:55:58 +03:00
9a30705c9a improve redfish collection progress and robust hardware dedup/serial parsing 2026-02-28 16:07:42 +03:00
8dbbec3610 optimize redfish post-probe and add eta progress 2026-02-28 15:41:44 +03:00
4c60ebbf1d collector/redfish: remove pre-snapshot critical duplicate pass 2026-02-28 15:28:24 +03:00
c52fea2fec collector/redfish: emit critical warmup branch and eta progress 2026-02-28 15:21:49 +03:00
dae4744eb3 ui: show latest collect branch/eta message instead of generic running text 2026-02-28 15:19:36 +03:00
b6ff47fea8 collector/redfish: skip deep DIMM subresources and remove memory from critical warmup 2026-02-28 15:16:04 +03:00
1d282c4196 collector/redfish: collect and parse platform model fallback 2026-02-28 14:54:55 +03:00
f35cabac48 collector/redfish: fix server model fallback and GPU/NVMe regressions 2026-02-28 14:50:02 +03:00
a2c9e9a57f collector/redfish: add ETA estimates to snapshot and plan-B progress 2026-02-28 14:36:18 +03:00
b918363252 collector/redfish: dedupe model-only GPU rows from graphics controllers 2026-02-28 13:04:34 +03:00
6c19a58b24 collector/redfish: expand endpoint coverage and timestamp collect logs 2026-02-28 12:59:57 +03:00
9aadf2f1e9 collector/redfish: improve GPU SN/model fallback and warnings 2026-02-28 12:52:22 +03:00
Mikhail Chusavitin
ddab93a5ee Add release notes for v1.7.0 2026-02-25 13:31:54 +03:00
Mikhail Chusavitin
000199fbdc Add parse errors tab and improve error diagnostics UI 2026-02-25 13:28:19 +03:00
Mikhail Chusavitin
68592da9f5 Harden Redfish collection for slow BMC endpoints 2026-02-25 12:42:43 +03:00
Mikhail Chusavitin
b1dde592ae Expand Redfish best-effort snapshot crawling 2026-02-25 12:24:06 +03:00
Mikhail Chusavitin
693b7346ab Update docs and add release artifacts 2026-02-25 12:17:17 +03:00
Mikhail Chusavitin
a4a1a19a94 Improve Redfish raw replay recovery and GUI diagnostics 2026-02-25 12:16:31 +03:00
Mikhail Chusavitin
66fb90233f Unify Redfish analysis through raw replay and add storage volumes 2026-02-24 18:34:13 +03:00
Mikhail Chusavitin
7a1285db99 Expand Redfish storage fallback for enclosure Disk.Bay paths 2026-02-24 18:25:00 +03:00
Mikhail Chusavitin
144d298efa Show total current PSU power and rely on server voltage status 2026-02-24 18:22:38 +03:00
Mikhail Chusavitin
a6c90b6e77 Probe Supermicro NVMe Disk.Bay endpoints for drive inventory 2026-02-24 18:22:02 +03:00
Mikhail Chusavitin
2e348751f3 Use 230V nominal range for PSU voltage sensor highlighting 2026-02-24 18:07:34 +03:00
Mikhail Chusavitin
15dc86a0e4 Add PSU voltage sensors with 220V range highlighting 2026-02-24 18:05:26 +03:00
Mikhail Chusavitin
752b063613 Increase upload multipart limit for raw export bundles 2026-02-24 17:42:49 +03:00
Mikhail Chusavitin
6f66a8b2a1 Raise Redfish snapshot crawl limit and prioritize PCIe paths 2026-02-24 17:41:37 +03:00
Mikhail Chusavitin
ce30f943df Export raw bundles with collection logs and parser field snapshot 2026-02-24 17:36:44 +03:00
Mikhail Chusavitin
810c4b5ff9 Add raw export reanalyze flow for Redfish snapshots 2026-02-24 17:23:26 +03:00
Mikhail Chusavitin
5d9e9d73de Fix Redfish snapshot crawl deadlock and add debug progress 2026-02-24 16:22:37 +03:00
38cc051f23 docs: consolidate architecture docs into bible 2026-02-23 17:51:25 +03:00
Mikhail Chusavitin
fcd57c1ba9 docs: introduce project Bible and consolidate all architecture documentation
- Create docs/bible/ with 10 structured chapters (overview, architecture,
  API, data models, collectors, parsers, exporters, build, testing, decisions)
- All documentation in English per ADL-007
- Record all existing architectural decisions in docs/bible/10-decisions.md
- Slim README.md to user-facing quick start only
- Replace CLAUDE.md with a single directive to read and follow the Bible
- Remove absorbed files: REANIMATOR_EXPORT.md, docs/INTEGRATION_GUIDE.md,
  and all vendor parser README.md files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 14:15:35 +03:00
Mikhail Chusavitin
82ee513835 Add release build script 2026-02-20 14:04:21 +03:00
de5521a4e5 Introduce canonical hardware.devices repository and align UI/Reanimator exports 2026-02-17 19:07:18 +03:00
a82b55b144 docs: add release notes for v1.5.0 2026-02-17 18:11:18 +03:00
758fa66282 feat: improve inspur parsing and pci.ids integration 2026-02-17 18:09:36 +03:00
b33cca5fcc nvidia: improve component mapping, firmware, statuses and check times 2026-02-16 23:17:13 +03:00
514da76ddb Update Inspur parsing and align release docs 2026-02-15 23:13:47 +03:00
c13788132b Add release script and release notes (no artifacts) 2026-02-15 22:23:53 +03:00
5e49adaf05 Update parser and project changes 2026-02-15 22:02:07 +03:00
c7b2a7ab29 Fix NVIDIA GPU/NVSwitch parsing and Reanimator export statuses 2026-02-15 21:00:30 +03:00
0af3cee9b6 Add integration guide, example generator, and built binary 2026-02-15 20:08:46 +03:00
8715fcace4 Align Reanimator export with updated integration guide 2026-02-15 20:06:36 +03:00
1b1bc74fc7 Add Reanimator format export support
Implement export to Reanimator format for asset tracking integration.

Features:
- New API endpoint: GET /api/export/reanimator
- Web UI button "Экспорт Reanimator" ("Export Reanimator") in the Configuration tab
- Auto-detect CPU manufacturer (Intel/AMD/ARM/Ampere)
- Generate PCIe serial numbers if missing
- Merge GPUs and NetworkAdapters into pcie_devices
- Filter components without serial numbers
- RFC3339 timestamp format
- Full compliance with Reanimator specification

Changes:
- Add reanimator_models.go: data models for Reanimator format
- Add reanimator_converter.go: conversion functions
- Add reanimator_converter_test.go: unit tests
- Add reanimator_integration_test.go: integration tests
- Update handlers.go: add handleExportReanimator
- Update server.go: register /api/export/reanimator route
- Update index.html: add export button
- Update CLAUDE.md: document export behavior
- Add REANIMATOR_EXPORT.md: implementation summary

Tests: All tests passing (15+ new tests)
Format spec: example/docs/INTEGRATION_GUIDE.md

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-12 21:54:37 +03:00
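The commit does not say how the CPU manufacturer is auto-detected. One plausible approach, shown here purely as a hypothetical sketch, is keyword matching on the CPU model string; the keyword list and `detectCPUManufacturer` are assumptions, and plain substring matching is deliberately crude.

```go
package main

import "strings"

// detectCPUManufacturer guesses the vendor from a CPU model string.
// Ampere is checked before the generic ARM keywords on purpose.
func detectCPUManufacturer(model string) string {
	m := strings.ToLower(model)
	switch {
	case strings.Contains(m, "intel"), strings.Contains(m, "xeon"):
		return "Intel"
	case strings.Contains(m, "amd"), strings.Contains(m, "epyc"):
		return "AMD"
	case strings.Contains(m, "ampere"), strings.Contains(m, "altra"):
		return "Ampere"
	case strings.Contains(m, "arm"), strings.Contains(m, "neoverse"):
		return "ARM"
	default:
		return "" // unknown: leave the field empty rather than guess
	}
}
```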
77e25ddc02 Fix NVIDIA GPU serial number format extraction
Extract decimal serial numbers from devname parameters (e.g., "SXM5_SN_1653925027099")
instead of hex PCIe Device Serial Numbers. This provides the correct GPU serial
numbers as they appear in NVIDIA diagnostics tooling.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 22:57:50 +03:00
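A minimal sketch of the extraction, assuming every devname follows the `<form factor>_SN_<digits>` layout of the single example in the commit message. `serialFromDevname` is a hypothetical name; other layouts are not handled.

```go
package main

import "regexp"

// devnameSerial captures the decimal digits after "_SN_".
var devnameSerial = regexp.MustCompile(`_SN_(\d+)`)

// serialFromDevname returns the decimal GPU serial embedded in a devname
// value such as "SXM5_SN_1653925027099", and false when none is present.
func serialFromDevname(devname string) (string, bool) {
	m := devnameSerial.FindStringSubmatch(devname)
	if m == nil {
		return "", false
	}
	return m[1], true
}
```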
bcce975fd6 Add GPU serial number extraction for NVIDIA diagnostics
Parse inventory/output.log to extract GPU serial numbers from lspci output,
expose them via serials API, and add GPU category to web UI.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-10 22:50:46 +03:00
8b065c6cca Harden zip reader and syslog scan 2026-02-06 00:03:25 +03:00
aa22034944 Add Unraid diagnostics parser and fix zip upload support
Implements comprehensive parser for Unraid diagnostics archives with support for:
- System information (OS version, BIOS, motherboard)
- CPU details from lscpu (model, cores, threads, frequency)
- Memory information
- Storage devices with SMART data integration
- Temperature sensors from disk array
- System event logs

Parser intelligently merges data from multiple sources:
- SMART files provide detailed disk information (model, S/N, firmware)
- vars.txt provides disk configuration and filesystem types
- Deduplication ensures clean results

Also fixes critical bug where zip archives could not be uploaded via web interface
due to missing extractZipFromReader implementation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-05 23:54:55 +03:00
Mikhail Chusavitin
7d9135dc63 Merge branch 'main' of https://git.mchus.pro/mchus/logpile 2026-02-05 15:16:36 +03:00
Mikhail Chusavitin
80e726d756 chore: remove unused local test and build artifacts 2026-02-05 15:15:01 +03:00
147 changed files with 79259 additions and 2831 deletions

.gitignore (7 lines changed)

@@ -62,3 +62,10 @@ go.work.sum
# Distribution binaries
dist/
# Release artifacts
release/
releases/
releases/**/SHA256SUMS.txt
releases/**/*.tar.gz
releases/**/*.zip

.gitmodules (new file, 9 lines)

@@ -0,0 +1,9 @@
[submodule "third_party/pciids"]
path = third_party/pciids
url = https://github.com/pciutils/pciids.git
[submodule "bible"]
path = bible
url = https://git.mchus.pro/mchus/bible.git
[submodule "internal/chart"]
path = internal/chart
url = https://git.mchus.pro/reanimator/chart.git

AGENTS.md (new file, 11 lines)

@@ -0,0 +1,11 @@
# LOGPile — Instructions for Codex
## Shared Engineering Rules
Read `bible/` — shared rules for all projects (CSV, logging, DB, tables, background tasks, code style).
Start with `bible/rules/patterns/` for specific contracts.
## Project Architecture
Read `bible-local/` — LOGPile specific architecture.
Read order: `bible-local/README.md` → `01-overview.md` → `02-architecture.md` → `04-data-models.md` → relevant file(s) for the task.
Every architectural decision specific to this project must be recorded in `bible-local/10-decisions.md`.

CLAUDE.md (100 lines changed)

@@ -1,95 +1,11 @@
# LOGPile - Engineering Notes (for Claude/Codex)
# LOGPile — Instructions for Claude
## Project summary
## Shared Engineering Rules
Read `bible/` — shared rules for all projects (CSV, logging, DB, tables, background tasks, code style).
Start with `bible/rules/patterns/` for specific contracts.
LOGPile is a standalone Go app for BMC diagnostics analysis with embedded web UI.
## Project Architecture
Read `bible-local/` — LOGPile specific architecture.
Read order: `bible-local/README.md` → `01-overview.md` → `02-architecture.md` → `04-data-models.md` → relevant file(s) for the task.
Current product modes:
1. Upload and parse vendor archives / JSON snapshots.
2. Collect live data via Redfish and analyze/export it.
## Runtime architecture
- Go + `net/http` (`http.ServeMux`)
- Embedded UI (`web/embed.go`, `//go:embed templates static`)
- In-memory state (`Server.result`, `Server.detectedVendor`)
- Job manager for live collect status/logs
Default port: `8082`.
## Key flows
### Upload flow (`POST /api/upload`)
- Accepts multipart file field `archive`.
- If file looks like JSON, parsed as `models.AnalysisResult` snapshot.
- Otherwise passed to archive parser (`parser.NewBMCParser().ParseFromReader(...)`).
- Result stored in memory and exposed by API/UI.
### Live flow (`POST /api/collect`)
- Validates request (`host/protocol/port/username/auth_type/tls_mode`).
- Runs collector asynchronously with progress callback.
- On success:
- source metadata set (`source_type=api`, protocol/host/date),
- result becomes current in-memory dataset.
- On failed/canceled previous dataset stays unchanged.
## Collectors
Registry: `internal/collector/registry.go`
- `redfish` (real collector):
- dynamic discovery of Systems/Chassis/Managers,
- CPU/RAM/Storage/GPU/PSU/NIC/PCIe/Firmware mapping,
- raw Redfish snapshot (`result.RawPayloads["redfish_tree"]`) for offline future analysis,
- progress logs include active collection stage and snapshot progress.
- `ipmi` is currently a mock collector scaffold.
## Export behavior
Endpoints:
- `/api/export/csv`
- `/api/export/json`
- `/api/export/txt`
Filename pattern for all exports:
`YYYY-MM-DD (SERVER MODEL) - SERVER SN.<ext>`
Notes:
- JSON export contains full `AnalysisResult`, including `raw_payloads`.
- TXT export is tabular and mirrors UI sections (no raw JSON section).
## CLI flags (`cmd/logpile/main.go`)
- `--port`
- `--file` (reserved/preload, not active workflow)
- `--version`
- `--no-browser`
- `--hold-on-crash` (default true on Windows) — keeps console open on fatal crash for debugging.
## Build / release
- `make build` -> single local binary (`CGO_ENABLED=0`).
- `make build-all` -> cross-platform binaries.
- Tags/releases are published with `tea`.
- Release notes live in `docs/releases/<tag>.md`.
## Testing expectations
Before merge:
```bash
go test ./...
```
If touching collectors/handlers, prefer adding or updating tests in:
- `internal/collector/*_test.go`
- `internal/server/*_test.go`
## Practical coding guidance
- Keep API contracts stable with frontend (`web/static/js/app.js`).
- When adding Redfish mappings, prefer tolerant/fallback parsing:
- alternate collection paths,
- `@odata.id` references and embedded members,
- deduping by serial/BDF/slot+model.
- Avoid breaking snapshot backward compatibility (`AnalysisResult` JSON shape).
Every architectural decision specific to this project must be recorded in `bible-local/10-decisions.md`.

@@ -1,4 +1,4 @@
.PHONY: build run clean test build-all
.PHONY: build run clean test build-all update-pci-ids
BINARY_NAME=logpile
VERSION=$(shell git describe --tags --always --dirty 2>/dev/null || echo "dev")
@@ -6,6 +6,7 @@ COMMIT=$(shell git rev-parse --short HEAD 2>/dev/null || echo "none")
LDFLAGS=-ldflags "-X main.version=$(VERSION) -X main.commit=$(COMMIT)"
build:
@if [ "$(SKIP_PCI_IDS_UPDATE)" != "1" ]; then ./scripts/update-pci-ids.sh --best-effort; fi
CGO_ENABLED=0 go build $(LDFLAGS) -o bin/$(BINARY_NAME) ./cmd/logpile
run: build
@@ -19,6 +20,7 @@ test:
# Cross-platform builds
build-all: clean
@if [ "$(SKIP_PCI_IDS_UPDATE)" != "1" ]; then ./scripts/update-pci-ids.sh --best-effort; fi
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build $(LDFLAGS) -o bin/$(BINARY_NAME)-linux-amd64 ./cmd/logpile
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build $(LDFLAGS) -o bin/$(BINARY_NAME)-linux-arm64 ./cmd/logpile
CGO_ENABLED=0 GOOS=darwin GOARCH=amd64 go build $(LDFLAGS) -o bin/$(BINARY_NAME)-darwin-amd64 ./cmd/logpile
@@ -33,3 +35,6 @@ fmt:
lint:
golangci-lint run
update-pci-ids:
./scripts/update-pci-ids.sh --sync-submodule

README.md (150 lines changed)

@@ -1,151 +1,29 @@
# LOGPile
LOGPile is a standalone Go application for analyzing BMC diagnostic data.
Standalone Go application for BMC diagnostics analysis with an embedded web UI.
Supports two workflows:
1. Uploading archives/snapshots and analyzing them offline in the web UI.
2. Live collection via the Redfish API, followed by export and re-upload for offline analysis.
## What it does
## Features
- Parses vendor diagnostic archives into a normalized hardware inventory
- Collects live BMC data via Redfish
- Exports normalized data as CSV, raw re-analysis bundles, and Reanimator JSON
- Runs as a single Go binary with embedded UI assets
- Standalone binary with embedded UI (no external static files).
- Parsing of vendor archives (Supermicro, Inspur/Kaytus, NVIDIA, generic fallback).
- Live Redfish collection (`/api/collect`) with progress reporting and a step log.
- Extended Redfish snapshot:
- normalized data (CPU/RAM/Storage/GPU/PSU/NIC/PCIe/Firmware),
- raw `redfish_tree` for future analysis.
- Re-uploading a JSON snapshot via `/api/upload` for offline work.
- Export to CSV / JSON / TXT.
## Documentation
## Requirements
- Shared engineering rules: [`bible/README.md`](bible/README.md)
- Project architecture and API contracts: [`bible-local/README.md`](bible-local/README.md)
- Agent entrypoints: [`AGENTS.md`](AGENTS.md), [`CLAUDE.md`](CLAUDE.md)
- Go 1.22+
## Build
## Run
```bash
make build
```
The binary is written to `bin/logpile`.
For cross-compilation:
```bash
make build-all
```
Artifacts:
- `bin/logpile-linux-amd64`
- `bin/logpile-linux-arm64`
- `bin/logpile-darwin-amd64`
- `bin/logpile-darwin-arm64`
- `bin/logpile-windows-amd64.exe`
## Running
```bash
./bin/logpile
./bin/logpile --port 8082
./bin/logpile --no-browser
./bin/logpile --version
```
Crash debugging (keeps the console from closing):
Default port: `8082`
```bash
./bin/logpile --hold-on-crash
```
## License
> On Windows, `--hold-on-crash` is enabled by default.
## Upload formats
`POST /api/upload` accepts:
- archives: `.tar`, `.tar.gz`, `.tgz`
- JSON snapshot (`AnalysisResult`)
## Live Redfish
Starting a live collection:
```http
POST /api/collect
```
Example body:
```json
{
"host": "bmc01.example.local",
"protocol": "redfish",
"port": 443,
"username": "admin",
"auth_type": "password",
"password": "secret",
"tls_mode": "insecure"
}
```
Job lifecycle:
`queued -> running -> success|failed|canceled`
Status and progress:
- `GET /api/collect/{id}`
- `POST /api/collect/{id}/cancel`
## Export
- `GET /api/export/csv`: serial numbers
- `GET /api/export/json`: full `AnalysisResult` (including `raw_payloads`)
- `GET /api/export/txt`: tabular report organized by UI section
Exported file names:
`YYYY-MM-DD (SERVER MODEL) - SERVER SN.<ext>`
Example:
`2026-02-04 (SYS-421GE-TNHR2) - C8X123456789.json`
## API
```text
POST /api/upload
POST /api/collect
GET /api/collect/{id}
POST /api/collect/{id}/cancel
GET /api/status
GET /api/parsers
GET /api/events
GET /api/sensors
GET /api/config
GET /api/serials
GET /api/firmware
GET /api/export/csv
GET /api/export/json
GET /api/export/txt
DELETE /api/clear
POST /api/shutdown
```
`/api/status` and `/api/config` include source metadata:
- `source_type`: `archive` | `api`
- `protocol`: `redfish` | `ipmi` (may be empty for archives)
- `target_host`
- `collected_at`
## Structure
```text
cmd/logpile/main.go # entrypoint
internal/collector/ # live collectors (redfish, ipmi mock)
internal/parser/ # archive parsers
internal/server/ # HTTP handlers
internal/exporter/ # CSV/JSON/TXT export
internal/models/ # data contracts
web/ # embedded templates/static
```
## License
MIT; see `LICENSE`.
MIT (see `LICENSE`)

bible (submodule, 1 line changed)

Submodule bible added at 0c829182a1

@@ -0,0 +1,43 @@
# 01 — Overview
## Purpose
LOGPile is a standalone Go application for BMC diagnostics analysis with an embedded web UI.
It runs as a single binary and normalizes hardware data from archives or live Redfish collection.
## Operating modes
| Mode | Entry point | Outcome |
|------|-------------|---------|
| Archive upload | `POST /api/upload` | Parse a supported archive, raw export bundle, or JSON snapshot into `AnalysisResult` |
| Live collection | `POST /api/collect` | Collect from a live BMC via Redfish and store the result in memory |
| Batch convert | `POST /api/convert` | Convert multiple supported input files into Reanimator JSON in a ZIP artifact |
All modes converge on the same normalized hardware model and exporter pipeline.
## In scope
- Single-binary desktop/server utility with embedded UI
- Vendor archive parsing and live Redfish collection
- Canonical hardware inventory across UI and exports
- Reopenable raw export bundles for future re-analysis
- Reanimator export and batch conversion workflows
- Embedded `pci.ids` lookup for vendor/device name enrichment
## Current vendor coverage
- Dell TSR
- H3C SDS G5/G6
- Inspur / Kaytus
- NVIDIA HGX Field Diagnostics
- NVIDIA Bug Report
- Unraid
- XigmaNAS
- Generic fallback parser
## Non-goals
- Persistent storage or multi-user state
- Production IPMI collection
- Authentication/authorization on the built-in HTTP server
- Long-term server-side job history beyond in-memory process lifetime

# 02 — Architecture
## Runtime stack
| Layer | Implementation |
|-------|----------------|
| Language | Go 1.22+ |
| HTTP | `net/http` + `http.ServeMux` |
| UI | Embedded templates and static assets via `go:embed` |
| State | In-memory only |
| Build | `CGO_ENABLED=0`, single binary |
Default port: `8082`
Audit result rendering is delegated to embedded `reanimator/chart`, vendored as git submodule `internal/chart`.
LOGPile remains responsible for upload, collection, parsing, normalization, and Reanimator export generation.
## Code map
```text
cmd/logpile/main.go entrypoint and CLI flags
internal/server/ HTTP handlers, jobs, upload/export flows
internal/collector/ live collection and Redfish replay
internal/analyzer/ shared analysis helpers
internal/parser/ archive extraction and parser dispatch
internal/exporter/ CSV and Reanimator conversion
internal/chart/ vendored `reanimator/chart` viewer submodule
internal/models/ stable data contracts
web/ embedded UI assets
```
## Server state
`internal/server.Server` stores:
| Field | Purpose |
|------|---------|
| `result` | Current `AnalysisResult` shown in UI and used by exports |
| `detectedVendor` | Parser/collector identity for the current dataset |
| `rawExport` | Reopenable raw-export package associated with current result |
| `jobManager` | Shared async job state for collect and convert flows |
| `collectors` | Registered live collectors (`redfish`, `ipmi`) |
| `convertOutput` | Temporary ZIP artifacts for batch convert downloads |
State is replaced only on successful upload or successful live collection.
Failed or canceled jobs do not overwrite the previous dataset.
## Main flows
### Upload
1. `POST /api/upload` receives multipart field `archive`
2. JSON inputs are checked for a raw-export package or an `AnalysisResult` snapshot
3. Non-JSON inputs go through `parser.BMCParser`
4. Archive metadata is normalized onto `AnalysisResult`
5. Result becomes the current in-memory dataset
### Live collect
1. `POST /api/collect` validates request fields
2. Server creates an async job and returns `202 Accepted`
3. Selected collector gathers raw data
4. For Redfish, collector saves `raw_payloads.redfish_tree`
5. Result is normalized, source metadata applied, and state replaced on success
### Batch convert
1. `POST /api/convert` accepts multiple files
2. Each supported file is analyzed independently
3. Successful results are converted to Reanimator JSON
4. Outputs are packaged into a temporary ZIP artifact
5. Client polls job status and downloads the artifact when ready
## Redfish design rule
Live Redfish collection and offline Redfish re-analysis must use the same replay path.
The collector first captures `raw_payloads.redfish_tree`, then the replay logic builds the normalized result.
## PCI IDs lookup
Lookup order:
1. Embedded `internal/parser/vendors/pciids/pci.ids`
2. `./pci.ids`
3. `/usr/share/hwdata/pci.ids`
4. `/usr/share/misc/pci.ids`
5. `/opt/homebrew/share/pciids/pci.ids`
6. Extra paths from `LOGPILE_PCI_IDS_PATH`
Later sources override earlier ones for the same IDs.
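The override rule amounts to an ordered map merge. In this sketch, each map stands in for one already-parsed `pci.ids` source, passed in lookup order. Parsing the file format itself is out of scope. `pciKey` and `MergePCINames` are hypothetical names.

```go
package main

// pciKey identifies a PCI device by vendor and device ID.
type pciKey struct {
	Vendor, Device uint16
}

// MergePCINames merges name tables in lookup order.
// A later source overwrites an earlier entry with the same key.
func MergePCINames(sources ...map[pciKey]string) map[pciKey]string {
	merged := make(map[pciKey]string)
	for _, src := range sources {
		for k, name := range src {
			merged[k] = name
		}
	}
	return merged
}
```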

bible-local/03-api.md
# 03 — API Reference
## Conventions
- All endpoints are under `/api/`
- JSON responses are used unless the endpoint downloads a file
- Async jobs share the same status model: `queued`, `running`, `success`, `failed`, `canceled`
- Export filenames use `YYYY-MM-DD (MODEL) - SERIAL.<ext>` when board metadata exists
- Embedded chart viewer routes live under `/chart/` and return HTML/CSS, not JSON
## Input endpoints
### `POST /api/upload`
Uploads one file in multipart field `archive`.
Accepted inputs:
- supported archive/log formats from the parser registry
- `.json` `AnalysisResult` snapshots
- raw-export JSON packages
- raw-export ZIP bundles
Result:
- parses or replays the input
- stores the result as current in-memory state
- returns parsed summary JSON
Related helper:
- `GET /api/file-types` returns `archive_extensions`, `upload_extensions`, and `convert_extensions`
### `POST /api/collect`
Starts a live collection job.
Request body:
```json
{
"host": "bmc01.example.local",
"protocol": "redfish",
"port": 443,
"username": "admin",
"auth_type": "password",
"password": "secret",
"tls_mode": "insecure"
}
```
Supported values:
- `protocol`: `redfish` or `ipmi`
- `auth_type`: `password` or `token`
- `tls_mode`: `strict` or `insecure`
Responses:
- `202` on accepted job creation
- `400` on malformed JSON
- `422` on validation errors
Optional request field:
- `power_on_if_host_off`: when `true`, Redfish collection may power on the host before collection if preflight found it powered off
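Here is a sketch of decoding and validating this body in Go. It maps failures to the documented `400` and `422` codes. The struct mirrors the example request. The JSON name of the token credential is not documented, so it is omitted. The `host` presence check is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CollectRequest mirrors the documented POST /api/collect body.
type CollectRequest struct {
	Host             string `json:"host"`
	Protocol         string `json:"protocol"`
	Port             int    `json:"port"`
	Username         string `json:"username"`
	AuthType         string `json:"auth_type"`
	Password         string `json:"password,omitempty"`
	TLSMode          string `json:"tls_mode"`
	PowerOnIfHostOff bool   `json:"power_on_if_host_off,omitempty"`
}

// DecodeCollect returns the status a handler would send:
// 400 for malformed JSON, 422 for validation errors, 202 when a job can be created.
func DecodeCollect(body []byte) (CollectRequest, int, error) {
	var req CollectRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return req, 400, err
	}
	if err := validateCollect(req); err != nil {
		return req, 422, err
	}
	return req, 202, nil
}

// validateCollect checks the documented enum values.
func validateCollect(r CollectRequest) error {
	switch {
	case r.Host == "":
		return fmt.Errorf("host is required")
	case r.Protocol != "redfish" && r.Protocol != "ipmi":
		return fmt.Errorf("unsupported protocol %q", r.Protocol)
	case r.AuthType != "password" && r.AuthType != "token":
		return fmt.Errorf("unsupported auth_type %q", r.AuthType)
	case r.TLSMode != "strict" && r.TLSMode != "insecure":
		return fmt.Errorf("unsupported tls_mode %q", r.TLSMode)
	}
	return nil
}
```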
### `POST /api/collect/probe`
Checks that live API connectivity works and returns host power state before collection starts.
Typical request body is the same as `POST /api/collect`.
Typical response fields:
- `reachable`
- `protocol`
- `host_power_state`
- `host_powered_on`
- `power_control_available`
- `message`
### `GET /api/collect/{id}`
Returns async collection job status, progress, timestamps, and accumulated logs.
### `POST /api/collect/{id}/cancel`
Requests cancellation for a running collection job.
### `POST /api/convert`
Starts a batch conversion job that accepts multiple files under `files[]` or `files`.
Each supported file is parsed independently and converted to Reanimator JSON.
Response fields:
- `job_id`
- `status`
- `accepted`
- `skipped`
- `total_files`
### `GET /api/convert/{id}`
Returns batch convert job status using the same async job envelope as collection.
### `GET /api/convert/{id}/download`
Downloads the ZIP artifact produced by a successful convert job.
## Read endpoints
### `GET /api/status`
Returns source metadata for the current dataset.
If nothing is loaded, response is `{ "loaded": false }`.
Typical fields:
- `loaded`
- `filename`
- `vendor`
- `source_type`
- `protocol`
- `target_host`
- `source_timezone`
- `collected_at`
- `stats`
### `GET /api/config`
Returns the main UI configuration payload, including:
- source metadata
- `hardware.board`
- `hardware.firmware`
- canonical `hardware.devices`
- computed specification lines
### `GET /api/events`
Returns events sorted newest first.
### `GET /api/sensors`
Returns parsed sensors plus synthesized PSU voltage sensors when telemetry is available.
### `GET /api/serials`
Returns serial-oriented inventory built from canonical devices.
### `GET /api/firmware`
Returns firmware-oriented inventory built from canonical devices.
### `GET /api/parse-errors`
Returns normalized parse and collection issues combined from:
- Redfish fetch errors in `raw_payloads`
- raw-export collect logs
- derived partial-inventory warnings
### `GET /api/parsers`
Returns registered parser metadata.
### `GET /api/file-types`
Returns supported file extensions for upload and batch convert.
## Viewer endpoints
### `GET /chart/current`
Renders the current in-memory dataset as Reanimator HTML using embedded `reanimator/chart`.
The server first converts the current result to Reanimator JSON, then passes that snapshot to the viewer.
### `GET /chart/static/...`
Serves embedded `reanimator/chart` static assets.
## Export endpoints
### `GET /api/export/csv`
Downloads serial-number CSV.
### `GET /api/export/json`
Downloads a raw-export artifact for reopen and re-analysis.
Current implementation emits a ZIP bundle containing:
- `raw_export.json`
- `collect.log`
- `parser_fields.json`
### `GET /api/export/reanimator`
Downloads Reanimator JSON built from the current normalized result.
## Management endpoints
### `DELETE /api/clear`
Clears current in-memory dataset, raw export state, and temporary convert artifacts.
### `POST /api/shutdown`
Gracefully shuts down the process after responding.

# 04 — Data Models
## Core contract: `AnalysisResult`
`internal/models/models.go` defines the shared result passed between parsers, collectors, server handlers, and exporters.
Stability rule:
- do not rename or remove JSON fields from `AnalysisResult`
- additive fields are allowed
- UI and exporter compatibility depends on this shape remaining stable
Key fields:
| Field | Meaning |
|------|---------|
| `filename` | Original upload name or synthesized live source name |
| `source_type` | `archive` or `api` |
| `protocol` | `redfish`, `ipmi`, or empty for archive uploads |
| `target_host` | Hostname or IP for live collection |
| `source_timezone` | Source timezone/offset if known |
| `collected_at` | Canonical collection/upload time |
| `raw_payloads` | Raw source data used for replay or diagnostics |
| `events` | Parsed event timeline |
| `fru` | FRU-derived inventory details |
| `sensors` | Sensor readings |
| `hardware` | Normalized hardware inventory |
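One way to enforce the stability rule is a test that marshals a result and checks that every documented key is still present. The struct below is a hypothetical mirror of the key fields. Element types such as `[]any` are placeholders, not the real model types.

```go
package main

import (
	"encoding/json"
	"time"
)

// AnalysisResult is a simplified, hypothetical mirror of the documented JSON keys.
type AnalysisResult struct {
	Filename       string         `json:"filename"`
	SourceType     string         `json:"source_type"`
	Protocol       string         `json:"protocol"`
	TargetHost     string         `json:"target_host"`
	SourceTimezone string         `json:"source_timezone"`
	CollectedAt    time.Time      `json:"collected_at"`
	RawPayloads    map[string]any `json:"raw_payloads"`
	Events         []any          `json:"events"`
	FRU            []any          `json:"fru"`
	Sensors        []any          `json:"sensors"`
	Hardware       map[string]any `json:"hardware"`
}

// MissingKeys lists required keys absent from the marshaled result.
// A non-empty result signals an accidental rename or removal.
func MissingKeys(r AnalysisResult, required []string) ([]string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	var m map[string]json.RawMessage
	if err := json.Unmarshal(b, &m); err != nil {
		return nil, err
	}
	var missing []string
	for _, k := range required {
		if _, ok := m[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing, nil
}
```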
## `HardwareConfig`
Main sections:
```text
hardware.board
hardware.devices
hardware.cpus
hardware.memory
hardware.storage
hardware.volumes
hardware.pcie_devices
hardware.gpus
hardware.network_adapters
hardware.network_cards
hardware.power_supplies
hardware.firmware
```
`network_cards` is legacy/alternate source data.
`hardware.devices` is the canonical cross-section inventory.
## Canonical inventory: `hardware.devices`
`hardware.devices` is the single source of truth for device-oriented UI and Reanimator export.
Required rules:
1. UI hardware views must read from `hardware.devices`
2. Reanimator conversion must derive device sections from `hardware.devices`
3. UI/export mismatches are bugs, not accepted divergence
4. New shared device fields belong in `HardwareDevice` first
Deduplication priority:
| Priority | Key |
|----------|-----|
| 1 | usable `serial_number` |
| 2 | `bdf` |
| 3 | keep records separate |
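The priority table can be sketched as a single pass. `Device` is a hypothetical subset of `HardwareDevice`. The placeholder-serial list that defines "usable" is an assumption. This version keeps the first record rather than merging fields.

```go
package main

import "strings"

// Device is a hypothetical subset of HardwareDevice.
type Device struct {
	Serial string
	BDF    string
	Model  string
}

// usableSerial rejects empty and placeholder serials (assumed list).
func usableSerial(s string) bool {
	switch strings.ToUpper(strings.TrimSpace(s)) {
	case "", "N/A", "NA", "NULL", "NONE", "0", "UNKNOWN":
		return false
	}
	return true
}

// Dedupe keys records by usable serial first, then by BDF.
// Records with neither key are kept separate.
func Dedupe(devs []Device) []Device {
	seenSerial := map[string]bool{}
	seenBDF := map[string]bool{}
	var out []Device
	for _, d := range devs {
		if usableSerial(d.Serial) {
			key := strings.TrimSpace(d.Serial)
			if seenSerial[key] {
				continue
			}
			seenSerial[key] = true
		} else if d.BDF != "" && seenBDF[d.BDF] {
			continue
		}
		// Remember the BDF even for serial-keyed records, so a later
		// serial-less record at the same address is treated as a duplicate.
		if d.BDF != "" {
			seenBDF[d.BDF] = true
		}
		out = append(out, d)
	}
	return out
}
```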
## Raw payloads
`raw_payloads` is authoritative for replayable sources.
Current important payloads:
- `redfish_tree`
- `redfish_fetch_errors`
- `source_timezone`
Normalized hardware fields are derived output, not the long-term source of truth.
## Raw export package
`/api/export/json` produces a reopenable raw-export artifact.
Design rules:
- raw source stays authoritative
- uploads of raw-export artifacts must re-analyze from raw source
- parsed snapshots inside the bundle are diagnostic only

# 05 — Collectors
Collectors live in `internal/collector/`.
Core files:
- `registry.go` for protocol registration
- `redfish.go` for live collection
- `redfish_replay.go` for replay from raw payloads
- `ipmi_mock.go` for the placeholder IPMI implementation
- `types.go` for request/progress contracts
## Redfish collector
Status: active production path.
Request fields passed from the server:
- `host`
- `port`
- `username`
- `auth_type`
- credential field (`password` or token)
- `tls_mode`
- optional `power_on_if_host_off`
### Core rule
Live collection and replay must stay behaviorally aligned.
If the collector adds a fallback, probe, or normalization rule, replay must mirror it.
### Preflight and host power
- `Probe()` may be used before collection to verify API connectivity and current host `PowerState`
- if the host is off and the user chose power-on, the collector may issue `ComputerSystem.Reset`
with `ResetType=On`
- power-on attempts are bounded and logged
- after a successful power-on, the collector waits an extra stabilization window, then checks
`PowerState` again and only starts collection if the host is still on
- if the collector powered on the host itself for collection, it must attempt to power it back off
after collection completes
- if the host was already on before collection, the collector must not power it off afterward
- if power-on fails, collection still continues against the powered-off host
- all power-control decisions and attempts must be visible in the collection log so they are
preserved in raw-export bundles
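The power rules above reduce to a small decision function. `DecidePower` and its `powerOn` hook are hypothetical. The hook stands in for three steps: the bounded `ComputerSystem.Reset` (`ResetType=On`) attempt, the stabilization wait, and the `PowerState` re-check. The log lines are illustrative.

```go
package main

// PowerDecision records what happened so it can be written to the collection log.
type PowerDecision struct {
	AttemptPowerOn bool     // host was off and the user opted in
	PoweredOnByUs  bool     // the power-on attempt succeeded
	PowerOffAfter  bool     // restore the original off state after collection
	Log            []string // entries destined for the collection log
}

// DecidePower applies the documented rules. powerOn reports whether the host
// ended up on after the bounded attempt and re-check.
func DecidePower(hostOn, optIn bool, powerOn func() bool) PowerDecision {
	var d PowerDecision
	if hostOn {
		d.Log = append(d.Log, "host already on; will not power off afterward")
		return d
	}
	if !optIn {
		d.Log = append(d.Log, "host off; power-on not requested; collecting anyway")
		return d
	}
	d.AttemptPowerOn = true
	if powerOn() {
		d.PoweredOnByUs = true
		d.PowerOffAfter = true
		d.Log = append(d.Log, "host powered on for collection; will power off afterward")
	} else {
		d.Log = append(d.Log, "power-on failed; collecting against powered-off host")
	}
	return d
}
```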
### Discovery model
The collector does not rely on one fixed vendor tree.
It discovers and follows Redfish resources dynamically from root collections such as:
- `Systems`
- `Chassis`
- `Managers`
### Stored raw data
Important raw payloads:
- `raw_payloads.redfish_tree`
- `raw_payloads.redfish_fetch_errors`
- `raw_payloads.source_timezone` when available
### Snapshot crawler rules
- bounded by `LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`
- prioritized toward high-value inventory paths
- tolerant of expected vendor-specific failures
- normalizes `@odata.id` values before queueing
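Here is a breadth-first sketch of the bounded crawl. The normalization rules are assumptions for illustration: drop any `#fragment` or `?query`, and strip a trailing slash. Path prioritization is omitted. `fetch` is a hypothetical hook that returns the `@odata.id` links in a document.

```go
package main

import "strings"

// normalizeODataID canonicalizes an @odata.id before it is queued (assumed rules).
func normalizeODataID(id string) string {
	if i := strings.IndexAny(id, "#?"); i >= 0 {
		id = id[:i]
	}
	if len(id) > 1 {
		id = strings.TrimRight(id, "/")
	}
	return id
}

// Crawl visits documents breadth-first from root, stopping after maxDocs fetches.
// Links are normalized before the seen-check, so "/x/" and "/x" are fetched once.
func Crawl(root string, maxDocs int, fetch func(path string) []string) []string {
	start := normalizeODataID(root)
	queue := []string{start}
	seen := map[string]bool{start: true}
	var visited []string
	for len(queue) > 0 && len(visited) < maxDocs {
		path := queue[0]
		queue = queue[1:]
		visited = append(visited, path)
		for _, link := range fetch(path) {
			n := normalizeODataID(link)
			if n != "" && !seen[n] {
				seen[n] = true
				queue = append(queue, n)
			}
		}
	}
	return visited
}
```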
### Redfish implementation guidance
When changing collection logic:
1. Prefer alternate-path support over vendor hardcoding
2. Keep expensive probing bounded
3. Deduplicate by serial, then BDF, then location/model fallbacks
4. Preserve replay determinism from saved raw payloads
5. Add tests for both the motivating topology and a negative case
### Known vendor fallbacks
- empty standard drive collections may trigger bounded `Disk.Bay` probing
- `Storage.Links.Enclosures[*]` may be followed to recover physical drives
- `PowerSubsystem/PowerSupplies` is preferred over legacy `Power` when available
## IPMI collector
Status: mock scaffold only.
It remains registered for protocol completeness, but it is not a real collection path.

bible-local/06-parsers.md
# 06 — Parsers
## Framework
Parsers live in `internal/parser/` and vendor implementations live in `internal/parser/vendors/`.
Core behavior:
- registration uses `init()` side effects
- all registered parsers run `Detect()`
- the highest-confidence parser wins
- generic fallback stays last and low-confidence
`VendorParser` contract:
```go
type VendorParser interface {
Name() string
Vendor() string
Version() string
Detect(files []ExtractedFile) int
Parse(files []ExtractedFile) (*models.AnalysisResult, error)
}
```
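Selection by highest `Detect()` score can be sketched as follows. `ExtractedFile`, `AnalysisResult`, `Register`, and `stubParser` are simplified stand-ins. The tie-break (earlier registration wins) is an assumption. This matters because `init()` registration order follows package initialization order.

```go
package main

// ExtractedFile and AnalysisResult are minimal stand-ins for the real types.
type ExtractedFile struct {
	Path    string
	Content []byte
}

type AnalysisResult struct{ Vendor string }

type VendorParser interface {
	Name() string
	Vendor() string
	Version() string
	Detect(files []ExtractedFile) int
	Parse(files []ExtractedFile) (*AnalysisResult, error)
}

var registry []VendorParser

// Register is what a vendor package would call from its init().
func Register(p VendorParser) { registry = append(registry, p) }

// SelectParser runs Detect on every registered parser and returns the
// highest scorer; it returns nil when every parser scores 0.
func SelectParser(files []ExtractedFile) VendorParser {
	var best VendorParser
	bestScore := 0
	for _, p := range registry {
		if s := p.Detect(files); s > bestScore {
			best, bestScore = p, s
		}
	}
	return best
}

// stubParser is a hypothetical fixed-score parser used to illustrate selection.
type stubParser struct {
	name  string
	score int
}

func (s stubParser) Name() string                  { return s.name }
func (s stubParser) Vendor() string                { return s.name }
func (s stubParser) Version() string               { return "0.0.0" }
func (s stubParser) Detect([]ExtractedFile) int    { return s.score }
func (s stubParser) Parse([]ExtractedFile) (*AnalysisResult, error) {
	return &AnalysisResult{Vendor: s.name}, nil
}
```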
## Adding a parser
1. Create `internal/parser/vendors/<vendor>/`
2. Start from `internal/parser/vendors/template/parser.go.template`
3. Implement `Detect()` and `Parse()`
4. Add a blank import in `internal/parser/vendors/vendors.go`
5. Add at least one positive and one negative detection test
## Data quality rules
### System firmware only in `hardware.firmware`
`hardware.firmware` must contain system-level firmware only.
Device-bound firmware belongs on the device record and must not be duplicated at the top level.
### Strip embedded MAC addresses from model names
If a source embeds ` - XX:XX:XX:XX:XX:XX` in a model/name field, remove that suffix before storing it.
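Here is a regexp sketch of the MAC suffix rule. It tolerates slightly more surrounding whitespace than the literal ` - XX:XX:XX:XX:XX:XX` form. That tolerance is a judgment call, not documented behavior.

```go
package main

import "regexp"

// macSuffix matches a trailing " - XX:XX:XX:XX:XX:XX" (hex digits, either case).
var macSuffix = regexp.MustCompile(`\s+-\s+[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}\s*$`)

// StripMACSuffix removes an embedded MAC address suffix from a model/name field.
func StripMACSuffix(name string) string {
	return macSuffix.ReplaceAllString(name, "")
}
```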
### Use `pci.ids` for empty or generic PCI model names
When `vendor_id` and `device_id` are known but the model name is missing or generic, resolve the name via `internal/parser/vendors/pciids`.
## Active vendor coverage
| Vendor ID | Input family | Notes |
|-----------|--------------|-------|
| `dell` | TSR ZIP archives | Broad hardware, firmware, sensors, lifecycle events |
| `h3c_g5` | H3C SDS G5 bundles | INI/XML/CSV-driven hardware and event parsing |
| `h3c_g6` | H3C SDS G6 bundles | Similar flow with G6-specific files |
| `inspur` | onekeylog archives | FRU/SDR plus optional Redis enrichment |
| `nvidia` | HGX Field Diagnostics | GPU- and fabric-heavy diagnostic input |
| `nvidia_bug_report` | `nvidia-bug-report-*.log.gz` | dmidecode, lspci, NVIDIA driver sections |
| `unraid` | Unraid diagnostics/log bundles | Server and storage-focused parsing |
| `xigmanas` | XigmaNAS plain logs | FreeBSD/NAS-oriented inventory |
| `generic` | fallback | Low-confidence text fallback when nothing else matches |
## Practical guidance
- Be conservative with high detect scores
- Prefer filling missing fields over overwriting stronger source data
- Keep parser version constants current when behavior changes
- Any new vendor-specific filtering or dedup logic must ship with tests for that vendor format
---
### Unraid (`unraid`)
**Archive format:** Unraid diagnostics archive contents (text-heavy diagnostics directories).
**Detection:** Combines filename/path markers (`diagnostics-*`, `unraid-*.txt`, `vars.txt`)
with content markers (e.g. `Unraid kernel build`, parity data markers).
**Extracted data (current):**
- Board / BIOS metadata (from motherboard/system files)
- CPU summary (from `lscpu.txt`)
- Memory modules (from diagnostics memory file)
- Storage devices (from `vars.txt` + SMART files)
- Syslog events
---
### H3C SDS G5 (`h3c_g5`)
**Status:** Ready (v1.0.0). Tested on H3C UniServer R4900 G5 SDS archives.
**Archive format:** `.sds` (tar archive)
**Detection:** `hardware_info.ini`, `hardware.info`, `firmware_version.ini`, `user/test*.csv`, plus H3C markers.
**Extracted data (current):**
- Board/FRU inventory (`FRUInfo.ini`, `board_info.ini`)
- Firmware list (`firmware_version.ini`)
- CPU inventory (`hardware_info.ini`)
- Memory DIMM inventory (`hardware_info.ini`)
- Storage inventory (`hardware.info`, `storage_disk.ini`, `NVMe_info.txt`, RAID text enrichments)
- Logical RAID volumes (`raid.json`, `Storage_RAID-*.txt`)
- Sensor snapshot (`sensor_info.ini`)
- SEL events (`user/test.csv`, `user/test1.csv`, fallback `Sel.json` / `sel_list.txt`)
---
### H3C SDS G6 (`h3c_g6`)
**Status:** Ready (v1.0.0). Tested on H3C UniServer R4700 G6 SDS archives.
**Archive format:** `.sds` (tar archive)
**Detection:** `CPUDetailInfo.xml`, `MemoryDetailInfo.xml`, `firmware_version.json`, `Sel.json`, plus H3C markers.
**Extracted data (current):**
- Board/FRU inventory (`FRUInfo.ini`, `board_info.ini`)
- Firmware list (`firmware_version.json`)
- CPU inventory (`CPUDetailInfo.xml`)
- Memory DIMM inventory (`MemoryDetailInfo.xml`)
- Storage inventory + capacity/model/interface (`storage_disk.ini`, `Storage_RAID-*.txt`, `NVMe_info.txt`)
- Logical RAID volumes (`raid.json`, fallback from `Storage_RAID-*.txt` when available)
- Sensor snapshot (`sensor_info.ini`)
- SEL events (`user/Sel.json`, fallback `user/sel_list.txt`)
---
### Generic text fallback (`generic`)
**Status:** Ready (v1.0.0).
**Confidence:** 15 (lowest — only matches if no other parser scores higher)
**Purpose:** Fallback for any text file or single `.gz` file not matching a specific vendor.
**Behavior:**
- If filename matches `nvidia-bug-report-*.log.gz`: extracts driver version and GPU list.
- Otherwise: confirms file is text (not binary) and records a basic "Text File" event.
---
## Supported vendor matrix
| Vendor | ID | Status | Tested on |
|--------|----|--------|-----------|
| Dell TSR | `dell` | Ready | TSR nested zip archives |
| Inspur / Kaytus | `inspur` | Ready | KR4268X2 onekeylog |
| NVIDIA HGX Field Diag | `nvidia` | Ready | Various HGX servers |
| NVIDIA Bug Report | `nvidia_bug_report` | Ready | H100 systems |
| Unraid | `unraid` | Ready | Unraid diagnostics archives |
| XigmaNAS | `xigmanas` | Ready | FreeBSD NAS logs |
| H3C SDS G5 | `h3c_g5` | Ready | H3C UniServer R4900 G5 SDS archives |
| H3C SDS G6 | `h3c_g6` | Ready | H3C UniServer R4700 G6 SDS archives |
| Generic fallback | `generic` | Ready | Any text file |

# 07 — Exporters
## Export surfaces
| Endpoint | Output | Purpose |
|----------|--------|---------|
| `GET /api/export/csv` | CSV | Serial-number export |
| `GET /api/export/json` | raw-export ZIP bundle | Reopen and re-analyze later |
| `GET /api/export/reanimator` | JSON | Reanimator hardware payload |
| `POST /api/convert` | async ZIP artifact | Batch archive-to-Reanimator conversion |
## Raw export
Raw export is not a final report dump.
It is a replayable artifact that preserves enough source data for future parser improvements.
Current bundle contents:
- `raw_export.json`
- `collect.log`
- `parser_fields.json`
Design rules:
- raw source is authoritative
- uploads of raw export must replay from raw source
- parsed snapshots inside the bundle are diagnostic only
## Reanimator export
Implementation files:
- `internal/exporter/reanimator_models.go`
- `internal/exporter/reanimator_converter.go`
- `internal/server/handlers.go`
- `bible-local/docs/hardware-ingest-contract.md`
Conversion rules:
- the canonical source is the merged canonical inventory derived from `hardware.devices` plus legacy hardware slices
- output must conform to the strict Reanimator ingest contract in `docs/hardware-ingest-contract.md`
- local mirror currently tracks upstream contract `v2.7`
- timestamps are RFC3339
- status is normalized to Reanimator-friendly values
- missing component serial numbers must stay absent; LOGPile must not synthesize fake serials for Reanimator export
- CPU `firmware` field means CPU microcode, not generic processor firmware inventory
- `NULL`-style board manufacturer/product values are treated as absent
- optional component telemetry/health fields are exported when LOGPile already has the data
- partial `hardware.devices` must not suppress components still present only in legacy parser/collector fields
- `present` is not serialized for exported components; presence is expressed by the existence of the component record itself
- Reanimator ingest may apply its own server-side fallback serial rules for CPU and PCIe when LOGPile leaves serials absent
## Inclusion rules
Included:
- PCIe-class devices when the component itself is present, even if serial number is missing
- contract `v2.7` component telemetry and health fields when source data exists
- hardware sensors grouped into `fans`, `power`, `temperatures`, `other` only when the sensor has a real numeric reading
- sensor `location` is not exported; LOGPile keeps only sensor `name` plus measured values and status
- Redfish linked metric docs that carry component telemetry: `ProcessorMetrics`, `MemoryMetrics`, `DriveMetrics`, `EnvironmentMetrics`, `Metrics`
- `pcie_devices.slot` is treated as the canonical PCIe address; `bdf` is used only as an internal fallback/dedupe key and is not serialized in the payload
- `event_logs` are exported only from normalized parser/collector events that can be mapped to contract sources `host` / `bmc` / `redfish` without synthesizing content
- `manufactured_year_week` is exported only as a reliable passthrough when the parser/collector already extracted a valid `YYYY-Www` value
Excluded:
- storage endpoints from `pcie_devices`; disks and NVMe drives export only through `hardware.storage`
- fake serial numbers for PCIe-class devices; any fallback serial generation belongs to Reanimator ingest, not LOGPile
- sensors without a real numeric reading
- events with internal-only or unmappable sources such as LOGPile internal warnings
- memory with missing serial number
- memory with `present=false` or `status=Empty`
- CPUs with `present=false`
- storage without `serial_number`
- storage with `present=false`
- power supplies without `serial_number`
- power supplies with `present=false`
- non-present network adapters
- non-present PCIe / GPU devices
- device-bound firmware duplicated at top-level firmware list
- any field not present in the strict ingest contract
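Part of the exclusion list can be expressed as one predicate. `Component` is a hypothetical common shape covering memory, storage, and power supplies. The other rules (sensors, events, and firmware) are not modeled.

```go
package main

import "strings"

// Component is a hypothetical common shape for exported hardware records.
type Component struct {
	Kind    string // e.g. "memory", "storage", "psu", "cpu", "pcie"
	Serial  string
	Present bool
	Status  string
}

// IncludeInReanimator applies a subset of the documented exclusion rules.
func IncludeInReanimator(c Component) bool {
	if !c.Present {
		return false // non-present records are excluded in every listed section
	}
	switch c.Kind {
	case "memory":
		return c.Serial != "" && !strings.EqualFold(c.Status, "Empty")
	case "storage", "psu":
		return c.Serial != ""
	}
	// CPUs and PCIe-class devices: presence is enough, a missing serial is allowed.
	return true
}
```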
## Batch convert
`POST /api/convert` accepts multiple supported files and produces a ZIP with:
- one `*.reanimator.json` file per successful input
- `convert-summary.txt`
Behavior:
- unsupported filenames are skipped
- each file is parsed independently
- one bad file must not fail the whole batch if at least one conversion succeeds
- result artifact is temporary and deleted after download
## CSV export
`GET /api/export/csv` uses the same merged canonical inventory as Reanimator export,
with legacy network-card fallback kept only for records that still have no canonical device match.

# 08 — Build & Release
## CLI flags
Defined in `cmd/logpile/main.go`:
| Flag | Default | Purpose |
|------|---------|---------|
| `--port` | `8082` | HTTP server port |
| `--file` | empty | Preload archive file |
| `--version` | `false` | Print version and exit |
| `--no-browser` | `false` | Do not auto-open browser |
| `--hold-on-crash` | `true` on Windows | Keep console open after fatal crash |
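Here is an illustrative reconstruction of these flags using the standard `flag` package. It is not the actual `cmd/logpile/main.go`, and `Options` and `ParseFlags` are hypothetical names. Go's `flag` package accepts both `-port` and `--port`.

```go
package main

import (
	"flag"
	"runtime"
)

// Options holds the documented CLI settings.
type Options struct {
	Port        int
	File        string
	Version     bool
	NoBrowser   bool
	HoldOnCrash bool
}

// ParseFlags parses args (without the program name) into Options.
func ParseFlags(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("logpile", flag.ContinueOnError)
	fs.IntVar(&o.Port, "port", 8082, "HTTP server port")
	fs.StringVar(&o.File, "file", "", "preload archive file")
	fs.BoolVar(&o.Version, "version", false, "print version and exit")
	fs.BoolVar(&o.NoBrowser, "no-browser", false, "do not auto-open browser")
	// Defaults to true only on Windows, matching the table above.
	fs.BoolVar(&o.HoldOnCrash, "hold-on-crash", runtime.GOOS == "windows", "keep console open after fatal crash")
	err := fs.Parse(args)
	return o, err
}
```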
## Common commands
```bash
make build
make build-all
make test
make fmt
make update-pci-ids
```
Notes:
- `make build` outputs `bin/logpile`
- `make build-all` builds the supported cross-platform binaries
- `make build` and `make build-all` run `scripts/update-pci-ids.sh --best-effort` unless `SKIP_PCI_IDS_UPDATE=1`
## PCI IDs
Source submodule: `third_party/pciids`
Embedded copy: `internal/parser/vendors/pciids/pci.ids`
Typical setup after clone:
```bash
git submodule update --init third_party/pciids
```
## Release script
Run:
```bash
./scripts/release.sh
```
Current behavior:
1. Reads version from `git describe --tags`
2. Refuses a dirty tree unless `ALLOW_DIRTY=1`
3. Sets stable Go cache/toolchain environment
4. Creates `releases/{VERSION}/`
5. Creates a release-notes template if missing
6. Builds `darwin-arm64` and `windows-amd64`
7. Packages any already-present binaries from `bin/`
8. Generates `SHA256SUMS.txt`
Important limitation:
- `scripts/release.sh` does not run `make build-all` for you
- if you want Linux or additional macOS archives in the release directory, build them before running the script
Toolchain note:
- `scripts/release.sh` defaults `GOTOOLCHAIN=local` to use the already installed Go toolchain and avoid implicit network downloads during release builds
- if you intentionally want another toolchain, pass it explicitly, for example `GOTOOLCHAIN=go1.24.0 ./scripts/release.sh`
## Run locally
```bash
./bin/logpile
./bin/logpile --port 9090
./bin/logpile --no-browser
./bin/logpile --version
```
## macOS Gatekeeper
```bash
xattr -d com.apple.quarantine /path/to/logpile-darwin-arm64
```

bible-local/09-testing.md
# 09 — Testing
## Baseline
Required before merge:
```bash
go test ./...
```
## Test locations
| Area | Location |
|------|----------|
| Collectors and replay | `internal/collector/*_test.go` |
| HTTP handlers and jobs | `internal/server/*_test.go` |
| Exporters | `internal/exporter/*_test.go` |
| Vendor parsers | `internal/parser/vendors/<vendor>/*_test.go` |
## General rules
- Prefer table-driven tests
- No network access in unit tests
- Cover happy path and realistic failure/partial-data cases
- New vendor parsers need both detection and parse coverage
## Mandatory coverage for dedup/filter/classify logic
Any new deduplication, filtering, or classification function must have:
1. A true-positive case
2. A true-negative case
3. A regression case for the vendor or topology that motivated the change
This is mandatory for inventory logic, firmware filtering, and similar code paths where silent data drift is likely.
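Here is a table-driven example of the three mandatory cases. The classifier is hypothetical: it matches a device-type prefix, optionally followed directly by an index, as in `GPU1 System Slot0`. In the repository, the test function would live in a `*_test.go` file of the package under test.

```go
package main

import (
	"regexp"
	"testing"
)

// deviceBoundFW matches names starting with a device type, optionally followed
// by an index with or without a space ("GPU 0 ...", "GPU1 ...").
var deviceBoundFW = regexp.MustCompile(`(?i)^(gpu|nic|nvswitch)\s*\d*\s`)

func isDeviceBoundFirmware(name string) bool { return deviceBoundFW.MatchString(name) }

// TestIsDeviceBoundFirmware covers true-positive, true-negative, and regression cases.
func TestIsDeviceBoundFirmware(t *testing.T) {
	tests := []struct {
		name string
		in   string
		want bool
	}{
		{"true positive", "GPU 0 VBIOS", true},
		{"true negative: system firmware", "BIOS", false},
		{"regression: digit directly after prefix", "GPU1 System Slot0", true},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			if got := isDeviceBoundFirmware(tc.in); got != tc.want {
				t.Errorf("isDeviceBoundFirmware(%q) = %v, want %v", tc.in, got, tc.want)
			}
		})
	}
}
```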
## Mandatory coverage for expensive path selection
Any function that decides whether to crawl or probe an expensive path must have:
1. A positive selection case
2. A negative exclusion case
3. A topology-level count/integration case
The goal is to catch runaway I/O regressions before they ship.
## Useful focused commands
```bash
go test ./internal/exporter/...
go test ./internal/collector/...
go test ./internal/server/...
go test ./internal/parser/vendors/...
```

bible-local/10-decisions.md
# 10 — Architectural Decision Log (ADL)
> **Rule:** Every significant architectural decision **must be recorded here** before or alongside
> the code change. This applies to humans and AI assistants alike.
>
> Format: date · title · context · decision · consequences
---
## ADL-001 — In-memory only state (no database)
**Date:** project start
**Context:** LOGPile is designed as a standalone diagnostic tool, not a persistent service.
**Decision:** All parsed/collected data lives in `Server.result` (in-memory). No database, no files written.
**Consequences:**
- Data is lost on process restart — intentional.
- Simple deployment: single binary, no setup required.
- JSON export is the persistence mechanism for users who want to save results.
---
## ADL-002 — Vendor parser auto-registration via init()
**Date:** project start
**Context:** Need an extensible parser registry without a central factory function.
**Decision:** Each vendor parser registers itself in its package's `init()` function.
`vendors/vendors.go` holds blank imports to trigger registration.
**Consequences:**
- Adding a new parser requires only: implement interface + add one blank import.
- No central list to maintain (other than the import file).
- `go test ./...` will include new parsers automatically.
---
## ADL-003 — Highest-confidence parser wins
**Date:** project start
**Context:** Multiple parsers may partially match an archive (e.g. generic + specific vendor).
**Decision:** Run all parsers' `Detect()`, select the one returning the highest score (0-100).
**Consequences:**
- Generic fallback (score 15) only activates when no vendor parser scores higher.
- Parsers must be conservative with high scores (70+) to avoid false positives.
---
## ADL-004 — Canonical hardware.devices as single source of truth
**Date:** v1.5.0
**Context:** UI tabs and Reanimator exporter were reading from different sub-fields of
`AnalysisResult`, causing potential drift.
**Decision:** Introduce `hardware.devices` as the canonical inventory repository.
All UI tabs and all exporters must read exclusively from this repository.
**Consequences:**
- Any UI vs Reanimator discrepancy is classified as a bug, not a "known difference".
- Deduplication logic runs once in the repository builder (serial → bdf → distinct).
- New hardware attributes must be added to canonical schema first, then mapped to consumers.
---
## ADL-005 — No hardcoded PCI model strings; use pci.ids
**Date:** v1.5.0
**Context:** NVIDIA and other vendors release new GPU models frequently; hardcoded maps
required code changes for each new model ID.
**Decision:** Use the `pciutils/pciids` database (git submodule, embedded at build time).
PCI vendor/device ID → human-readable model name via lookup.
**Consequences:**
- New GPU models can be supported by updating `pci.ids` without code changes.
- `make build` auto-syncs `pci.ids` from submodule before compilation.
- External override via `LOGPILE_PCI_IDS_PATH` env var.
---
## ADL-006 — Reanimator export uses canonical hardware.devices (not raw sub-fields)
**Date:** v1.5.0
**Context:** Early Reanimator exporter read from `Hardware.GPUs`, `Hardware.NICs`, etc.
directly, diverging from UI data.
**Decision:** Reanimator exporter must use `hardware.devices` — the same source as the UI.
Exporter groups/filters canonical records by section; does not rebuild from sub-fields.
**Consequences:**
- Guarantees UI and export consistency.
- Exporter code is simpler — mainly a filter+map, not a data reconstruction.
---
## ADL-007 — Documentation language is English
**Date:** 2026-02-20
**Context:** Codebase documentation was mixed Russian/English, reducing clarity for
international contributors and AI assistants.
**Decision:** All maintained project documentation (`docs/bible/`, `README.md`,
`CLAUDE.md`, and new technical docs) must be written in English.
**Consequences:**
- Bible is authoritative in English.
- AI assistants get consistent, unambiguous context.
---
## ADL-008 — Bible is the single source of truth for architecture docs
**Date:** 2026-02-23
**Context:** Architecture information was duplicated across `README.md`, `CLAUDE.md`,
and the Bible, creating drift risk and stale guidance for humans and AI agents.
**Decision:** Keep architecture and technical design documentation only in `docs/bible/`.
Top-level `README.md` and `CLAUDE.md` must remain minimal pointers/instructions.
**Consequences:**
- Reduces documentation drift and duplicate updates.
- AI assistants are directed to one authoritative source before making changes.
- Documentation updates that affect architecture must include Bible changes (and ADL entries when significant).
---
## ADL-009 — Redfish analysis is performed from raw snapshot replay (unified tunnel)
**Date:** 2026-02-24
**Context:** Live Redfish collection and raw export re-analysis used different parsing paths,
which caused drift and made bug fixes difficult to validate consistently.
**Decision:** Redfish live collection must produce a `raw_payloads.redfish_tree` snapshot first,
then run the same replay analyzer used for imported raw exports.
**Consequences:**
- Same `redfish_tree` input produces the same parsed result in live and offline modes.
- Debugging parser issues can be done against exported raw bundles without live BMC access.
- Snapshot completeness becomes critical; collector seeds/limits are part of analyzer correctness.
---
## ADL-010 — Raw export is a self-contained re-analysis package (not a final result dump)
**Date:** 2026-02-24
**Context:** Exporting only normalized `AnalysisResult` loses raw source fidelity and prevents
future parser improvements from being applied to already collected data.
**Decision:** `Export Raw Data` produces a self-contained raw package (JSON or ZIP bundle)
that the application can reopen and re-analyze. Parsed data in the package is optional and not
the source of truth on import.
**Consequences:**
- Re-opening an export always re-runs analysis from raw source (`redfish_tree` or uploaded file bytes).
- Raw bundles include collection context and diagnostics for debugging (`collect.log`, `parser_fields.json`).
- Endpoint compatibility is preserved (`/api/export/json`) while actual payload format may be a bundle.
---
## ADL-011 — Redfish snapshot crawler is bounded, prioritized, and failure-tolerant
**Date:** 2026-02-24
**Context:** Full Redfish trees on modern GPU systems are large, noisy, and contain many
vendor-specific or non-fetchable links. Unbounded crawling and naive queue design caused hangs
and incomplete snapshots.
**Decision:** Use a bounded snapshot crawler with:
- explicit document cap (`LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`)
- priority seed paths (PCIe/Fabrics/Firmware/Storage/PowerSubsystem/ThermalSubsystem)
- normalized `@odata.id` paths (strip `#fragment`)
- filtering of noisy, expected errors (404/405/410/501 are hidden from the UI)
- queue capacity sized to crawl cap to avoid producer/consumer deadlock
**Consequences:**
- Snapshot collection remains stable on large BMC trees.
- Most high-value inventory paths are reached before the cap.
- UI progress remains useful while debug logs retain low-level fetch failures.
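A single-goroutine sketch of these crawl rules follows. `FetchError`, `Fetcher`, `crawlResult`, and `crawlSnapshot` are hypothetical names. Reading `LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS` into `maxDocs` is left to the caller, and the real collector may fetch concurrently. The property it shows is that a queue whose capacity equals the document cap, combined with deduplicated enqueueing, can never block on send.

```go
package main

import (
	"errors"
	"strconv"
	"strings"
)

// FetchError is a hypothetical error type carrying the HTTP status of a failed GET.
type FetchError struct{ Status int }

func (e *FetchError) Error() string { return "HTTP " + strconv.Itoa(e.Status) }

// Fetcher returns the @odata.id links found in the document at path.
type Fetcher func(path string) ([]string, error)

// crawlResult separates operator-visible failures from the full debug log.
type crawlResult struct {
	Fetched  []string // successfully fetched paths, in crawl order
	UIErrors []string // unexpected failures, shown in the UI
	DebugLog []string // every failure, including expected noise
}

// normalizeODataID strips a "#fragment" so "/X#/Power" and "/X" are one document.
func normalizeODataID(id string) string {
	if i := strings.IndexByte(id, '#'); i >= 0 {
		return id[:i]
	}
	return id
}

// isExpectedNoise reports the statuses that are hidden from the UI.
func isExpectedNoise(err error) bool {
	var fe *FetchError
	if !errors.As(err, &fe) {
		return false
	}
	switch fe.Status {
	case 404, 405, 410, 501:
		return true
	}
	return false
}

// crawlSnapshot does a breadth-first crawl capped at maxDocs distinct paths.
// Priority seeds are queued before the root so they are fetched first. The
// queue's capacity equals the cap and each path is queued at most once, so a
// send never blocks even though the same loop produces and consumes.
func crawlSnapshot(root string, seeds []string, maxDocs int, fetch Fetcher) crawlResult {
	var res crawlResult
	if maxDocs <= 0 {
		return res
	}
	queue := make(chan string, maxDocs)
	seen := make(map[string]bool)
	enqueue := func(p string) {
		p = normalizeODataID(p)
		if p == "" || seen[p] || len(seen) >= maxDocs {
			return
		}
		seen[p] = true
		queue <- p
	}
	for _, s := range seeds {
		enqueue(s)
	}
	enqueue(root)
	for len(queue) > 0 {
		p := <-queue
		links, err := fetch(p)
		if err != nil {
			msg := p + ": " + err.Error()
			res.DebugLog = append(res.DebugLog, msg)
			if !isExpectedNoise(err) {
				res.UIErrors = append(res.UIErrors, msg)
			}
			continue // failure-tolerant: other branches are still crawled
		}
		res.Fetched = append(res.Fetched, p)
		for _, l := range links {
			enqueue(l)
		}
	}
	return res
}
```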
---
## ADL-012 — Vendor-specific storage inventory probing is allowed as fallback
**Date:** 2026-02-24
**Context:** Some Supermicro BMCs expose empty standard `Storage/.../Drives` collections while
real disk inventory exists under vendor-specific `Disk.Bay` endpoints and enclosure links.
**Decision:** When standard drive collections are empty, collector/replay may probe vendor-style
`.../Drives/Disk.Bay.*` endpoints and follow `Storage.Links.Enclosures[*]` to recover physical drives.
**Consequences:**
- Higher storage inventory coverage on Supermicro HBA/HA-RAID/MRVL/NVMe backplane implementations.
- Replay must mirror the same probing behavior to preserve deterministic results.
- Probing remains bounded (finite candidate set) to avoid runaway requests.
---
## ADL-013 — PowerSubsystem is preferred over legacy Power on newer Redfish implementations
**Date:** 2026-02-24
**Context:** X14+/newer Redfish implementations increasingly expose authoritative PSU data in
`PowerSubsystem/PowerSupplies`, while legacy `/Power` may be incomplete or schema-shifted.
**Decision:** Prefer `Chassis/*/PowerSubsystem/PowerSupplies` as the primary PSU source and use
legacy `Chassis/*/Power` as fallback.
**Consequences:**
- Better compatibility with newer BMC firmware generations.
- Legacy systems remain supported without special-case collector selection.
- Snapshot priority seeds must include `PowerSubsystem` resources.
---
## ADL-014 — Threshold logic lives on the server; UI reflects status only
**Date:** 2026-02-24
**Context:** Duplicating threshold math in frontend and backend creates drift and inconsistent
highlighting (e.g. PSU mains voltage range checks).
**Decision:** Business threshold evaluation (e.g. PSU voltage nominal range) must be computed on
the server; frontend only renders status/flags returned by the API.
**Consequences:**
- Single source of truth for threshold policies.
- UI can evolve visually without re-implementing domain logic.
- API payloads may carry richer status semantics over time.
---
## ADL-015 — Supermicro crashdump archive parser removed from active registry
**Date:** 2026-03-01
**Context:** The Supermicro crashdump parser (`SMC Crash Dump Parser`) produced low-value
results for current workflows and was explicitly rejected as a supported archive path.
**Decision:** Remove `supermicro` vendor parser from active registration and project source.
Do not include it in `/api/parsers` output or parser documentation matrix.
**Consequences:**
- Supermicro crashdump archives (`CDump.txt` format) are no longer parsed by a dedicated vendor parser.
- Such archives fall back to other matching parsers (typically `generic`) unless a new replacement parser is added.
- Reintroduction requires a new parser package and an explicit registry import in `vendors/vendors.go`.
---
## ADL-016 — Device-bound firmware must not appear in hardware.firmware
**Date:** 2026-03-01
**Context:** Dell TSR `DCIM_SoftwareIdentity` lists firmware for every component (NICs,
PSUs, disks, backplanes) in addition to system-level firmware. Naively importing all entries
into `Hardware.Firmware` caused device firmware to appear twice in Reanimator: once in the
device's own record and again in the top-level firmware list.
**Decision:**
- `Hardware.Firmware` contains only system-level firmware (BIOS, BMC/iDRAC, CPLD,
Lifecycle Controller, storage controllers, BOSS).
- Device-bound entries (NIC, PSU, Disk, Backplane, GPU) must not be added to
`Hardware.Firmware`.
- Parsers must store the FQDD (or equivalent slot identifier) in `FirmwareInfo.Description`
so the Reanimator exporter can filter by FQDD prefix.
- The exporter's `isDeviceBoundFirmwareFQDD()` function performs this filter.
**Consequences:**
- Any new parser that ingests a per-device firmware inventory must follow the same rule.
- Device firmware is accessible only via the device's own record, not the firmware list.
---
## ADL-017 — Vendor-embedded MAC addresses must be stripped from model name fields
**Date:** 2026-03-01
**Context:** Dell TSR embeds MAC addresses directly in `ProductName` and `ElementName`
fields (e.g. `"NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"`).
As a result, MAC addresses leaked into the NIC model, the NIC firmware device name,
and potentially other fields.
**Decision:** Strip any ` - XX:XX:XX:XX:XX:XX` suffix from all model/name string fields
at parse time before storing in any model struct. Use the regex
`\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`.
**Consequences:**
- Model names are clean and consistent across all devices.
- All parsers must apply this stripping to any field used as a device name or model.
- Confirmed affected fields in Dell: `DCIM_NICView.ProductName`, `DCIM_SoftwareIdentity.ElementName`.
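Applying the regex above in Go takes a single call to the standard `regexp` package; RE2 syntax supports the pattern as written. `stripEmbeddedMAC` is a hypothetical helper name, not necessarily the function LOGPile uses.

```go
package main

import "regexp"

// macSuffix matches a trailing " - XX:XX:XX:XX:XX:XX" MAC address using the
// pattern recorded in this decision.
var macSuffix = regexp.MustCompile(`\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`)

// stripEmbeddedMAC removes a vendor-appended MAC suffix from a model/name
// field. Strings without such a suffix are returned unchanged.
func stripEmbeddedMAC(s string) string {
	return macSuffix.ReplaceAllString(s, "")
}
```

For example, `stripEmbeddedMAC("NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08")` yields `"NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF"`.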
---
## ADL-018 — NVMe bay probe must be restricted to storage-capable chassis types
**Date:** 2026-03-12
**Context:** `shouldAdaptiveNVMeProbe` was introduced in `2fa4a12` to recover NVMe drives on
Supermicro BMCs that expose empty `Drives` collections but serve disks at direct `Disk.Bay.N`
paths. The function returns `true` for any chassis with an empty `Members` array. On
Supermicro HGX systems (SYS-A21GE-NBRT and similar) ~35 sub-chassis (GPU, NVSwitch,
PCIeRetimer, ERoT, IRoT, BMC, FPGA) all carry `ChassisType=Module/Component/Zone` and
expose empty `/Drives` collections. Without filtering, each one triggered 384 HTTP requests, for a
total of 13,440 requests (≈ 22 minutes of pure I/O waste) per collection.
**Decision:** Before probing `Disk.Bay.N` candidates for a chassis, check its `ChassisType`
via `chassisTypeCanHaveNVMe`. Skip if type is `Module`, `Component`, or `Zone`. Keep probing
for `Enclosure`, `RackMount`, and any unrecognised type (fail-safe).
**Consequences:**
- On HGX systems, the NVMe post-probe phase drops from ~22 min to effectively zero.
- NVMe backplane recovery (`Enclosure` type) is unaffected.
- Any new chassis type that hosts NVMe storage is covered by the default `true` path.
- `chassisTypeCanHaveNVMe` and the candidate-selection loop must have unit tests covering
both the excluded types and the storage-capable types (see `TestChassisTypeCanHaveNVMe`
and `TestNVMePostProbeSkipsNonStorageChassis`).
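The filter can be sketched as a switch over Redfish `ChassisType` values with a permissive default. `chassisInfo` and `nvmeProbeTargets` are hypothetical, and the sketch assumes `ChassisType` is compared as an exact, case-sensitive enum string.

```go
package main

// chassisTypeCanHaveNVMe skips sub-module chassis types and probes everything
// else, including unknown or empty values (fail-safe default).
func chassisTypeCanHaveNVMe(chassisType string) bool {
	switch chassisType {
	case "Module", "Component", "Zone":
		return false
	default:
		return true
	}
}

// chassisInfo is a hypothetical summary of one Redfish chassis.
type chassisInfo struct {
	Path        string
	ChassisType string
	DriveCount  int // members of the standard Drives collection
}

// nvmeProbeTargets returns chassis that should receive Disk.Bay.N probes:
// an empty Drives collection and a type that can host NVMe storage.
func nvmeProbeTargets(chassis []chassisInfo) []string {
	var out []string
	for _, c := range chassis {
		if c.DriveCount == 0 && chassisTypeCanHaveNVMe(c.ChassisType) {
			out = append(out, c.Path)
		}
	}
	return out
}
```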
---
## ADL-019 — isDeviceBoundFirmwareName must cover vendor-specific naming patterns per vendor
**Date:** 2026-03-12
**Context:** `isDeviceBoundFirmwareName` was written to filter Dell-style device firmware names
(`"GPU SomeDevice"`, `"NIC OnboardLAN"`). When Supermicro Redfish FirmwareInventory was added
(`6c19a58`), no Supermicro-specific patterns were added. Supermicro names a NIC entry
`"NIC1 System Slot0 AOM-DP805-IO"` — a digit follows the type prefix directly, bypassing the
`"nic "` (space-terminated) check. 29 device-bound entries leaked into `hardware.firmware` on
SYS-A21GE-NBRT (HGX B200). Commit `9c5512d` attempted a fix by adding `_fw_gpu_` patterns,
but checked `DeviceName` which contains `"Software Inventory"` (from the Redfish `Name` field),
not the firmware inventory ID. The patterns were dead code from the moment they were committed.
**Decision:**
- `isDeviceBoundFirmwareName` must be extended for each new vendor whose FirmwareInventory
naming convention differs from the existing patterns.
- When adding HGX/Supermicro patterns, check that the pattern matches the field value that
`collectFirmwareInventory` actually stores — trace the data path from Redfish doc to
`FirmwareInfo.DeviceName` before writing the condition.
- `TestIsDeviceBoundFirmwareName` must contain at least one case per vendor format.
**Consequences:**
- New vendors with FirmwareInventory support require a test covering both device-bound names
(must return true) and system-level names (must return false) before the code ships.
- The dead `_fw_gpu_` / `_fw_nvswitch_` / `_inforom_gpu_` patterns were replaced with
correct prefix+digit checks (`"gpu" + digit`, `"nic" + digit`) and explicit string checks
(`"nvmecontroller"`, `"power supply"`, `"software inventory"`).
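A sketch of the corrected name filter, matching on the lowercased `DeviceName` that the collector actually stores. It assumes the explicit strings are matched as prefixes (the entry does not say prefix or substring) and keeps the Dell-style space-terminated checks alongside the Supermicro prefix+digit checks. `hasPrefixDigit` is a hypothetical helper.

```go
package main

import "strings"

// hasPrefixDigit reports whether s starts with prefix immediately followed by
// an ASCII digit, e.g. "nic1 system slot0" for prefix "nic".
func hasPrefixDigit(s, prefix string) bool {
	if !strings.HasPrefix(s, prefix) || len(s) <= len(prefix) {
		return false
	}
	c := s[len(prefix)]
	return c >= '0' && c <= '9'
}

// isDeviceBoundFirmwareName matches case-insensitively on DeviceName.
func isDeviceBoundFirmwareName(name string) bool {
	n := strings.ToLower(strings.TrimSpace(name))
	for _, p := range []string{"gpu ", "nic "} { // Dell style: "GPU SomeDevice"
		if strings.HasPrefix(n, p) {
			return true
		}
	}
	if hasPrefixDigit(n, "gpu") || hasPrefixDigit(n, "nic") { // Supermicro style: "NIC1 ..."
		return true
	}
	for _, p := range []string{"nvmecontroller", "power supply", "software inventory"} {
		if strings.HasPrefix(n, p) {
			return true
		}
	}
	return false
}
```

A table test in the spirit of `TestIsDeviceBoundFirmwareName` would list at least one device-bound and one system-level name per vendor format.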
---
## ADL-020 — Dell TSR device-bound firmware filtered via FQDD; InfiniBand routed to NetworkAdapters
**Date:** 2026-03-15
**Context:** Dell TSR `sysinfo_DCIM_SoftwareIdentity.xml` lists firmware for every installed
component. `parseSoftwareIdentityXML` dumped all of these into `hardware.firmware` without
filtering, so device-bound entries such as `"Mellanox Network Adapter"` (FQDD `InfiniBand.Slot.1-1`)
and `"PERC H755 Front"` (FQDD `RAID.SL.3-1`) appeared in the reanimator export alongside system
firmware like BIOS and iDRAC. Confirmed on PowerEdge R6625 (8VS2LG4).
Additionally, `DCIM_InfiniBandView` was not handled in the parser switch, so Mellanox ConnectX-6
appeared only as a PCIe device with `model: "16x or x16"` (from `DataBusWidth` fallback).
`parseControllerView` called `addFirmware` with description `"storage controller"` instead of the
FQDD, so the FQDD-based filter in the exporter could not remove it.
**Decision:**
1. `isDeviceBoundFirmwareFQDD` extended with `"infiniband."` and `"fc."` prefixes; `"raid.backplane."`
broadened to `"raid."` to cover `RAID.SL.*`, `RAID.Integrated.*`, etc.
2. `DCIM_InfiniBandView` routed to `parseNICView` → device appears as `NetworkAdapter` with correct
firmware, MAC address, and VendorID/DeviceID.
3. `"InfiniBand."` added to `pcieFQDDNoisePrefix` to suppress the duplicate `DCIM_PCIDeviceView`
entry (DataBusWidth-only, no useful data).
4. `parseControllerView` now passes `fqdd` as the `addFirmware` description so the FQDD filter
removes the entry in the exporter.
5. `parsePCIeDeviceView` now prioritises `props["description"]` (chip model, e.g. `"MT28908 Family
[ConnectX-6]"`) over `props["devicedescription"]` (location string) for `pcie.Description`.
6. `convertPCIeDevices` model fallback order: `PartNumber → Description → DeviceClass`.
**Consequences:**
- `hardware.firmware` contains only system-level entries; NIC/RAID/storage-controller firmware
lives on the respective device record.
- `TestParseDellInfiniBandView` and `TestIsDeviceBoundFirmwareFQDD` guard the regression.
- Any future Dell TSR device class whose FQDD prefix is not yet in the prefix list may still leak;
extend `isDeviceBoundFirmwareFQDD` and add a test case when encountered.
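The FQDD filter reduces to a case-insensitive prefix check on `FirmwareInfo.Description`. In this sketch, `infiniband.`, `fc.`, and `raid.` come from this entry. `nic.`, `psu.`, and `disk.` are assumptions based on common Dell FQDD naming, so the list is illustrative rather than LOGPile's complete set.

```go
package main

import "strings"

// deviceBoundFQDDPrefixes is an illustrative list of lowercase FQDD prefixes.
var deviceBoundFQDDPrefixes = []string{
	"infiniband.", "fc.", "raid.", // from this decision
	"nic.", "psu.", "disk.", // assumed Dell naming
}

// isDeviceBoundFirmwareFQDD reports whether a firmware entry's FQDD (stored
// in FirmwareInfo.Description) belongs to a device rather than the system.
func isDeviceBoundFirmwareFQDD(fqdd string) bool {
	f := strings.ToLower(strings.TrimSpace(fqdd))
	for _, p := range deviceBoundFQDDPrefixes {
		if strings.HasPrefix(f, p) {
			return true
		}
	}
	return false
}
```

The old `"storage controller"` description never matches any prefix, which is why `parseControllerView` had to start passing the FQDD.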
---
## ADL-021 — pci.ids enrichment: chip model and vendor resolved from PCI IDs when source data is generic or missing
**Date:** 2026-03-15
**Context:**
Dell TSR `DCIM_InfiniBandView.ProductName` reports a generic marketing name ("Mellanox Network
Adapter") instead of the precise chip identifier ("MT28908 Family [ConnectX-6]"). The actual
chip model is available in `pci.ids` by VendorID:DeviceID (15B3:101B). Vendor name may also be
absent when no `VendorName` / `Manufacturer` property is present.
This led to a general rule: *if the model is not found in source data but PCI IDs are known,
resolve the model from `pci.ids`*. The rule applies across all export paths.
**Decision (two-layer enrichment):**
1. **Parser layer (Dell, `parseNICView`):** When `VendorID != 0 && DeviceID != 0`, prefer
`pciids.DeviceName(vendorID, deviceID)` over the product name from logs. This makes the chip
identifier the primary model for NIC/InfiniBand adapters (more specific than marketing name).
Fill `Vendor` from `pciids.VendorName(vendorID)` when the vendor field is otherwise empty.
Same fallback applied in `parsePCIeDeviceView` for empty `Description`.
2. **Exporter layer (`convertPCIeFromDevices`):** General rule — when `d.Model == ""` after all
legacy fallbacks and `VendorID != 0 && DeviceID != 0`, set `model = pciids.DeviceName(...)`.
Also fill empty `manufacturer` from `pciids.VendorName(...)`. This covers all parsers/sources.
**Consequences:**
- Mellanox InfiniBand slot now reports `model: "MT28908 Family [ConnectX-6]"` and
`manufacturer: "Mellanox Technologies"` in the reanimator export.
- For NICs where pci.ids has no entry, the original product name is kept (pci.ids returns "").
- `TestParseDellInfiniBandView` asserts the model and vendor from pci.ids.
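The exporter-layer rule can be sketched as a fallback chain. This assumes the `PartNumber → Description → DeviceClass` order from ADL-020 runs first and pci.ids is consulted only afterwards. `pciDevice`, `nameLookup`, and `resolveModel` are hypothetical; the lookup functions stand in for `pciids.DeviceName` / `pciids.VendorName`.

```go
package main

// pciDevice is a hypothetical subset of a canonical PCIe device record.
type pciDevice struct {
	VendorID, DeviceID                   uint16
	PartNumber, Description, DeviceClass string
	Manufacturer                         string
}

// nameLookup stands in for the pci.ids helpers; both return "" on a miss.
type nameLookup struct {
	Device func(vendorID, deviceID uint16) string
	Vendor func(vendorID uint16) string
}

// resolveModel applies the legacy fallbacks first, then pci.ids when both IDs
// are known. Empty lookups leave the fields empty.
func resolveModel(d pciDevice, ids nameLookup) (model, manufacturer string) {
	for _, candidate := range []string{d.PartNumber, d.Description, d.DeviceClass} {
		if candidate != "" {
			model = candidate
			break
		}
	}
	manufacturer = d.Manufacturer
	if d.VendorID != 0 && d.DeviceID != 0 {
		if model == "" {
			model = ids.Device(d.VendorID, d.DeviceID)
		}
		if manufacturer == "" {
			manufacturer = ids.Vendor(d.VendorID)
		}
	}
	return model, manufacturer
}
```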
---
## ADL-022 — CPUAffinity parsed into NUMANode for PCIe, NIC, and controller devices
**Date:** 2026-03-15
**Context:**
Dell TSR DCIM view classes report `CPUAffinity` for NIC, InfiniBand, PCIe, and controller
devices. Values are "1", "2" (NUMA node index), or "Not Applicable" (for devices that bridge
both CPUs or have no CPU affinity). This data is needed for topology-aware diagnostics.
**Decision:**
- Add `NUMANode int` (JSON: `"numa_node,omitempty"`) to `models.PCIeDevice`,
`models.NetworkAdapter`, `models.HardwareDevice`, and `ReanimatorPCIe`.
- Parse from `props["cpuaffinity"]` using `parseIntLoose`: numeric values ("1", "2") map
directly; "Not Applicable" returns 0 (omitted via `omitempty`).
- Thread through `buildDevicesFromLegacy` (PCIe and NIC sections) and `convertPCIeFromDevices`.
- `parseControllerView` also parses CPUAffinity since RAID controllers have NUMA affinity.
**Consequences:**
- `numa_node: 1` or `2` appears in reanimator export for devices with known affinity.
- Value 0 / absent means "not reported" — covers both "Not Applicable" and sources that don't
provide CPUAffinity at all.
- `TestParseDellCPUAffinity` verifies numeric values parsed correctly and "Not Applicable"→0.
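A sketch of the parsing and serialization behavior. `parseIntLoose` here is a hypothetical stand-in that returns 0 for any non-integer input; the real helper may accept more formats. The struct shows how `omitempty` makes "Not Applicable" and "not reported" serialize identically.

```go
package main

import (
	"encoding/json"
	"strconv"
	"strings"
)

// parseIntLoose is a hypothetical stand-in: it returns the integer value of s,
// or 0 for anything non-numeric such as "Not Applicable".
func parseIntLoose(s string) int {
	n, err := strconv.Atoi(strings.TrimSpace(s))
	if err != nil {
		return 0
	}
	return n
}

// pcieDevice shows only the fields relevant here. With omitempty, NUMANode 0
// ("Not Applicable" or not reported) is dropped from the JSON output.
type pcieDevice struct {
	Slot     string `json:"slot"`
	NUMANode int    `json:"numa_node,omitempty"`
}

// numaFromProps reads the lowercased Dell DCIM property map.
func numaFromProps(props map[string]string) int {
	return parseIntLoose(props["cpuaffinity"])
}

// encode returns the JSON form; marshaling a string and an int cannot fail,
// so the error is ignored.
func encode(d pcieDevice) string {
	b, _ := json.Marshal(d)
	return string(b)
}
```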
---
## ADL-023 — Reanimator export must match ingest contract exactly
**Date:** 2026-03-15
**Context:**
LOGPile's Reanimator export had drifted from the strict ingest contract. It emitted fields that
Reanimator does not currently accept (`status_at_collection`, `numa_node`),
while missing fields and sections now present in the contract (`hardware.sensors`,
`pcie_devices[].mac_addresses`). Memory export rules also diverged from the ingest side: empty or
serial-less DIMMs were still exported.
**Decision:**
- Treat the Reanimator ingest contract as the authoritative schema for `GET /api/export/reanimator`.
- Emit only fields present in the current upstream contract revision.
- Add `hardware.sensors`, `pcie_devices[].mac_addresses`, `pcie_devices[].numa_node`, and
upstream-approved component telemetry/health fields.
- Leave out fields that are still not part of the upstream contract.
- Map internal `source_type=archive` to external `source_type=logfile`.
- Skip memory entries that are empty, not present, or missing serial numbers.
- Generate CPU and PCIe serials only in the forms allowed by the contract.
- Mirror the applied contract in `bible-local/docs/hardware-ingest-contract.md`.
**Consequences:**
- Some previously exported diagnostic fields are intentionally dropped from the Reanimator payload
until the upstream contract adds them.
- Internal models may retain richer fields than the current export schema.
- `hardware.devices` is canonical only after merge with legacy hardware slices; partial parser-owned
canonical records must not hide CPUs, memory, storage, NICs, or PSUs still stored in legacy
fields.
- CSV and Reanimator exports must use the same merged canonical inventory to avoid divergent export
contents across surfaces.
- Future exporter changes must update both the code and the mirrored contract document together.
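Two of the mapping rules above, sketched in Go. `memoryModule`, `exportSourceType`, and `exportableMemory` are hypothetical names, and treating "empty" as a zero size is an assumption. Other `source_type` values pass through unchanged in this sketch.

```go
package main

// memoryModule is a hypothetical subset of the internal DIMM model.
type memoryModule struct {
	Slot         string
	SizeMB       int
	Present      bool
	SerialNumber string
}

// exportSourceType maps internal source types to the contract vocabulary.
func exportSourceType(internal string) string {
	if internal == "archive" {
		return "logfile"
	}
	return internal
}

// exportableMemory drops DIMMs that are empty, not present, or lack a serial.
func exportableMemory(in []memoryModule) []memoryModule {
	var out []memoryModule
	for _, m := range in {
		if !m.Present || m.SizeMB == 0 || m.SerialNumber == "" {
			continue
		}
		out = append(out, m)
	}
	return out
}
```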
---
## ADL-024 — Component presence is implicit; Redfish linked metrics are part of replay correctness
**Date:** 2026-03-15
**Context:**
The upstream ingest contract allows `present`, but current export semantics do not need to send
`present=true` for populated components. At the same time, several important Redfish component
telemetry fields were only available through linked metric resources such as `ProcessorMetrics`,
`MemoryMetrics`, and `DriveMetrics`. Without collecting and replaying these linked documents,
live collection and raw snapshot replay still underreported component health fields.
**Decision:**
- Do not serialize `present=true` in Reanimator export. Presence is represented by the presence of
the component record itself.
- Do not export component records marked `present=false`.
- Interpret CPU `firmware` in Reanimator payload as CPU microcode.
- Treat Redfish linked metric resources `ProcessorMetrics`, `MemoryMetrics`, `DriveMetrics`,
`EnvironmentMetrics`, and generic `Metrics` as part of analyzer correctness when they are linked
from component resources.
- Replay logic must merge these linked metric resources back into CPU, memory, storage, PCIe, GPU,
NIC, and PSU component `Details` the same way live collection expects them to be used.
**Consequences:**
- Reanimator payloads are smaller and avoid redundant `present=true` noise while still excluding
empty slots and absent components.
- Any future exporter change that reintroduces serialized component presence needs an explicit
contract review.
- Raw Redfish snapshot completeness now includes linked per-component metric resources, not only
top-level inventory members.
- CPU microcode is no longer expected in top-level `hardware.firmware`; it belongs on the CPU
component record.
<!-- Add new decisions below this line using the format above -->
## ADL-025 — Missing serial numbers must remain absent in Reanimator export
**Date:** 2026-03-15
**Context:**
LOGPile previously generated synthetic serial numbers for components that had no real serial in
source data, especially CPUs and PCIe-class devices. This made the payload look richer, but the
serials were not authoritative and could mislead downstream consumers. Reanimator can already
accept missing serials and generate its own internal fallback identifiers when needed.
**Decision:**
- Do not synthesize fake serial numbers in LOGPile's Reanimator export.
- If a component has no real serial in parsed source data, export the serial field as absent.
- This applies to CPUs, PCIe devices, GPUs, NICs, and any other component class unless an
upstream contract explicitly requires a deterministic exporter-generated identifier.
- Any fallback serial generation defined by the upstream contract is ingest-side Reanimator behavior,
not LOGPile exporter behavior.
**Consequences:**
- Exported payloads carry only source-backed serial numbers.
- Fake identifiers such as `BOARD-...-CPU-...` or synthetic PCIe serials are no longer considered
acceptable exporter behavior.
- Any future attempt to reintroduce generated serials requires an explicit contract review and a
new ADL entry.
---
## ADL-026 — Live Redfish collection uses explicit preflight host-power confirmation
**Date:** 2026-03-15
**Context:**
Live Redfish inventory can be incomplete when the managed host is powered off. At the same time,
LOGPile must not silently power on a host without explicit user choice. The collection workflow
therefore needs a preflight step that verifies connectivity, shows current host power state to the
user, and only powers on the host when the user explicitly chose that path.
**Decision:**
- Add a dedicated live preflight API step before collection starts.
- The UI first runs a connectivity and power-state check, then offers:
  - collect as-is
  - power on and collect
- If the host is off and the user does not answer within 5 seconds, default to collecting without
  powering the host on.
- Redfish collection may power on the host only when the request explicitly sets
  `power_on_if_host_off=true`.
- When LOGPile powers on the host for collection, it must try to power the host back off after
  collection completes.
- If LOGPile did not power the host on itself, it must never power the host off.
- All preflight and power-control steps must be logged into the collection log and therefore into
  the raw-export bundle.
**Consequences:**
- Live collection becomes a two-step UX: probe first, collect second.
- Raw bundles preserve operator-visible evidence of power-state decisions and power-control attempts.
- Power-on failures do not block collection entirely; they only downgrade completeness expectations.
---
## ADL-027 — Sensors without numeric readings are not exported
**Date:** 2026-03-15
**Context:**
Some parsed sensor records carry only a name, unit, or status, but no actual numeric reading. Such
records are not useful as telemetry in Reanimator export and create noisy, low-value sensor lists.
**Decision:**
- Do not export temperature, power, fan, or other sensor records unless they carry a real numeric
measurement value.
- Presence of a sensor name or health/status alone is not sufficient for export.
**Consequences:**
- Exported sensor groups contain only actionable telemetry.
- Parsers and collectors may still keep non-numeric sensor artifacts internally for diagnostics, but
Reanimator export must filter them out.
---
## ADL-028 — Reanimator PCIe export excludes storage endpoints and synthetic serials
**Date:** 2026-03-15
**Context:**
Some Redfish and archive sources expose NVMe drives both as storage inventory and as PCIe-visible
endpoints. Exporting such drives in both `hardware.storage` and `hardware.pcie_devices` creates
duplicates without adding useful topology value. At the same time, PCIe-class export still had old
fallback behavior that generated synthetic serial numbers when source serials were absent.
**Decision:**
- Export disks and NVMe drives only through `hardware.storage`.
- Do not export storage endpoints as `hardware.pcie_devices`, even if the source inventory exposes
them as PCIe/NVMe devices.
- Keep real PCIe storage controllers such as RAID and HBA adapters in `hardware.pcie_devices`.
- Do not synthesize PCIe/GPU/NIC serial numbers in LOGPile; missing serials stay absent.
- Treat placeholder names such as `Network Device View` as non-authoritative and prefer resolved
device names when stronger data exists.
**Consequences:**
- Reanimator payloads no longer duplicate NVMe drives between storage and PCIe sections.
- PCIe export remains topology-focused while storage export remains component-focused.
- Missing PCIe-class serials no longer produce fake `BOARD-...-PCIE-...` identifiers.
---
## ADL-029 — Local exporter guidance tracks upstream contract v2.7 terminology
**Date:** 2026-03-15
**Context:**
The upstream Reanimator hardware ingest contract moved to `v2.7` and clarified several points that
matter for LOGPile documentation: ingest-side serial fallback rules, canonical PCIe addressing via
`slot`, the optional `event_logs` section, and the shared `manufactured_year_week` field.
**Decision:**
- Keep the local mirrored contract file as an exact copy of the upstream `v2.7` document.
- Describe CPU/PCIe serial fallback as Reanimator ingest behavior, not LOGPile exporter behavior.
- Treat `pcie_devices.slot` as the canonical address on the LOGPile side as well; `bdf` may remain
an internal fallback/dedupe key but is not serialized in the payload.
- Export `event_logs` only from normalized parser/collector events that can be mapped to contract
sources `host` / `bmc` / `redfish` without synthesizing message content.
- Export `manufactured_year_week` only as a reliable passthrough when a parser/collector already
extracted a valid `YYYY-Www` value.
**Consequences:**
- Local bible wording no longer conflicts with upstream contract terminology.
- Reanimator payloads use contract-native PCIe addressing and no longer expose `bdf` as a parallel
coordinate.
- LOGPile event export remains strictly source-derived; internal warnings such as LOGPile analysis
notes do not leak into Reanimator `event_logs`.
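For `manufactured_year_week`, passthrough-only export amounts to validating the already-extracted string and dropping anything malformed. The week range 01 to 53 follows ISO 8601 week numbering and is an assumption; the contract only specifies the `YYYY-Www` shape. `manufacturedYearWeek` is a hypothetical name.

```go
package main

import "regexp"

// yearWeek matches "YYYY-Www" with weeks 01 through 53.
var yearWeek = regexp.MustCompile(`^\d{4}-W(0[1-9]|[1-4][0-9]|5[0-3])$`)

// manufacturedYearWeek passes a parser-extracted value through only when it
// is already valid; otherwise it returns "" so the field stays absent.
func manufacturedYearWeek(extracted string) string {
	if yearWeek.MatchString(extracted) {
		return extracted
	}
	return ""
}
```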
---
## ADL-030 — Audit result rendering is delegated to embedded reanimator/chart
**Date:** 2026-03-16
**Context:**
LOGPile already owns file upload, Redfish collection, archive parsing, normalization, and
Reanimator export. Maintaining a second host-side audit renderer for the same data created
presentation drift and duplicated UI logic.
**Decision:**
- Use vendored `reanimator/chart` as the only audit result viewer.
- Keep LOGPile responsible for service flows: upload, live collection, batch convert, raw export,
Reanimator export, and parse-error reporting.
- Render the current dataset by converting it to Reanimator JSON and passing that snapshot to
embedded `chart` under `/chart/current`.
**Consequences:**
- Reanimator JSON becomes the single presentation contract for the audit surface.
- The host UI becomes a service shell around the viewer instead of maintaining its own
field-by-field tabs.
- `internal/chart` must be updated explicitly as a git submodule when the viewer changes.

bible-local/README.md
# LOGPile Bible
`bible-local/` is the project-specific source of truth for LOGPile.
Keep top-level docs minimal and put maintained architecture/API contracts here.
## Rules
- Documentation language: English only
- Update relevant bible files in the same change as the code
- Record significant architectural decisions in [`10-decisions.md`](10-decisions.md)
- Do not duplicate shared rules from `bible/`
## Read order
| File | Purpose |
|------|---------|
| [01-overview.md](01-overview.md) | Product scope, modes, non-goals |
| [02-architecture.md](02-architecture.md) | Runtime structure, state, main flows |
| [04-data-models.md](04-data-models.md) | Stable data contracts and canonical inventory |
| [03-api.md](03-api.md) | HTTP endpoints and response contracts |
| [05-collectors.md](05-collectors.md) | Live collection behavior |
| [06-parsers.md](06-parsers.md) | Archive parser framework and vendor coverage |
| [07-exporters.md](07-exporters.md) | Raw export, Reanimator export, batch convert |
| [docs/hardware-ingest-contract.md](docs/hardware-ingest-contract.md) | Reanimator ingest schema mirrored locally |
| [08-build-release.md](08-build-release.md) | Build and release workflow |
| [09-testing.md](09-testing.md) | Test expectations and regression rules |
| [10-decisions.md](10-decisions.md) | Architectural Decision Log |
## Fast orientation
- Entry point: `cmd/logpile/main.go`
- HTTP layer: `internal/server/`
- Core contracts: `internal/models/models.go`
- Live collection: `internal/collector/`
- Archive parsing: `internal/parser/`
- Export conversion: `internal/exporter/`
- Frontend consumer: `web/static/js/app.js`
## Maintenance rule
If a document becomes stale, either fix it immediately or delete it.
Stale docs are worse than missing docs.

---
title: Hardware Ingest JSON Contract
version: "2.7"
updated: "2026-03-15"
maintainer: Reanimator Core
audience: external-integrators, ai-agents
language: ru
---
# Reanimator Integration: Hardware JSON Import Contract
Version: **2.7** · Date: **2026-03-15**
This document describes the JSON format used to submit server hardware data to **Reanimator** (a hardware lifecycle management system).
It is intended for developers of adjacent systems (Redfish collectors, monitoring agents, CMDB exporters) and may be included in the documentation of integrating projects.
> Current version of this document: https://git.mchus.pro/reanimator/core/src/branch/main/bible-local/docs/hardware-ingest-contract.md
---
## Changelog
| Version | Date | Changes |
|---------|------|---------|
| 2.7 | 2026-03-15 | Synthesizing data in `event_logs` is now explicitly forbidden; integrators must not invent component serial numbers when the source does not provide them |
| 2.6 | 2026-03-15 | Added the optional `event_logs` section for dedup/upsert of `host` / `bmc` / `redfish` logs outside the history timeline |
| 2.5 | 2026-03-15 | Added the shared optional `manufactured_year_week` field for component sections (`YYYY-Www`) |
| 2.4 | 2026-03-15 | Added the first wave of component telemetry: health/life fields for `cpus`, `memory`, `storage`, `pcie_devices`, `power_supplies` |
| 2.3 | 2026-03-15 | Added component telemetry fields: `pcie_devices.temperature_c`, `pcie_devices.power_w`, `power_supplies.temperature_c` |
| 2.2 | 2026-03-15 | Added the `numa_node` field to `pcie_devices` for topology/affinity |
| 2.1 | 2026-03-15 | Added the `sensors` section (fans, power, temperatures, other); added the `mac_addresses` field to `pcie_devices`; extended the list of `device_class` values |
| 2.0 | 2026-02-01 | Status history (`status_history`, `status_changed_at`); PSU telemetry fields; async job response |
| 1.0 | 2026-01-01 | Initial version of the contract |
---
## Principles
1. **Snapshot**: the JSON describes the server state at collection time. It may include the history of component status changes.
2. **Idempotency**: resubmitting an identical payload does not create duplicates (deduplication by hash).
3. **Partiality**: you may send only the sections for which data is available. An empty array and an absent section are equivalent.
4. **Strict schema**: the endpoint uses a strict JSON decoder; unknown fields result in `400 Bad Request`.
5. **Event-driven**: an import creates events in the timeline (LOG_COLLECTED, INSTALLED, REMOVED, FIRMWARE_CHANGED, etc.).
6. **No synthesis by the integrator**: the collector submits only values it actually collected. Do not invent `serial_number`, `component_ref`, `message`, `message_id`, or other identifiers/attributes if the source did not provide them or the parser could not reliably extract them.
---
## Endpoint
```
POST /ingest/hardware
Content-Type: application/json
```
**Response on acceptance (202 Accepted):**
```json
{
  "status": "accepted",
  "job_id": "job_01J..."
}
```
Import runs asynchronously. The result is available at:
```
GET /ingest/hardware/jobs/{job_id}
```
**Response when the job succeeds:**
```json
{
"status": "success",
"bundle_id": "lb_01J...",
"asset_id": "mach_01J...",
"collected_at": "2026-02-10T15:30:00Z",
"duplicate": false,
"summary": {
"parts_observed": 15,
"parts_created": 2,
"parts_updated": 13,
"installations_created": 2,
"installations_closed": 1,
"timeline_events_created": 9,
"failure_events_created": 1
}
}
```
**Response for a duplicate:**
```json
{
"status": "success",
"duplicate": true,
"message": "LogBundle with this content hash already exists"
}
```
**Error response (400 Bad Request):**
```json
{
"status": "error",
"error": "validation_failed",
"details": {
"field": "hardware.board.serial_number",
"message": "serial_number is required"
}
}
```
Common causes of `400`:
- Invalid `collected_at` format (RFC3339 is required).
- Empty `hardware.board.serial_number`.
- An unknown JSON field at any level.
- The request body exceeds the allowed size.
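A collector can catch the first two causes before sending. This is a minimal sketch. `precheck` is a hypothetical helper, and treating a whitespace-only serial as empty is a conservative assumption. The body-size limit is not checked because this document does not state its value.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// precheck mirrors two documented 400 causes: collected_at must be RFC3339
// and hardware.board.serial_number must not be empty.
func precheck(collectedAt, boardSerial string) error {
	if _, err := time.Parse(time.RFC3339, collectedAt); err != nil {
		return fmt.Errorf("collected_at: %w", err)
	}
	if strings.TrimSpace(boardSerial) == "" {
		return errors.New("hardware.board.serial_number is required")
	}
	return nil
}

func main() {
	fmt.Println(precheck("2026-02-10T15:30:00Z", "SRV-001"))        // <nil>
	fmt.Println(precheck("2026-02-10 15:30:00", "SRV-001") != nil) // true
	fmt.Println(precheck("2026-02-10T15:30:00Z", "  "))             // hardware.board.serial_number is required
}
```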
---
## Top-level structure
```json
{
"filename": "redfish://10.10.10.103",
"source_type": "api",
"protocol": "redfish",
"target_host": "10.10.10.103",
"collected_at": "2026-02-10T15:30:00Z",
"hardware": {
"board": { ... },
"firmware": [ ... ],
"cpus": [ ... ],
"memory": [ ... ],
"storage": [ ... ],
"pcie_devices": [ ... ],
"power_supplies": [ ... ],
"sensors": { ... },
"event_logs": [ ... ]
}
}
```
### Top-level fields
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `collected_at` | string RFC3339 | **yes** | Data collection time |
| `hardware` | object | **yes** | Hardware snapshot |
| `hardware.board.serial_number` | string | **yes** | Board/server serial number |
| `target_host` | string | no | IP or hostname |
| `source_type` | string | no | Source type: `api`, `logfile`, `manual` |
| `protocol` | string | no | Protocol: `redfish`, `ipmi`, `snmp`, `ssh` |
| `filename` | string | no | Source identifier |
---
## Common component status fields
These apply to all component sections (`cpus`, `memory`, `storage`, `pcie_devices`, `power_supplies`).
| Field | Type | Description |
|------|-----|----------|
| `status` | string | Current status: `OK`, `Warning`, `Critical`, `Unknown`, `Empty` |
| `status_checked_at` | string RFC3339 | Time of the last status check |
| `status_changed_at` | string RFC3339 | Time of the last status change |
| `status_history` | array | History of status transitions (see below) |
| `error_description` | string | Error/diagnostic text |
| `manufactured_year_week` | string | Manufacturing date in `YYYY-Www` format, e.g. `2024-W07` |
**`status_history[]` object:**
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `status` | string | **yes** | Status at that moment |
| `changed_at` | string RFC3339 | **yes** | Transition time (entries without this field are ignored) |
| `details` | string | no | Explanation of the transition |
**Event time priority rules:**
1. `status_changed_at`
2. The last `status_history` entry whose `status` matches the current `status`
3. The last parsable `status_history` entry
4. `status_checked_at`
**Status transmission rules:**
- Send `status` as the component's current state in the snapshot.
- If the source keeps history, send `status_history` sorted by `changed_at` in ascending order.
- Do not include `status_history` entries without `changed_at`.
- All dates are RFC3339; UTC (`Z`) is recommended.
- Use `manufactured_year_week` when the source knows only the year and week of manufacture, not an exact calendar date.
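A collector can apply these rules mechanically before serializing. The steps are: drop entries without a usable `changed_at`, rewrite timestamps in UTC, and sort in ascending order. `normalizeHistory` is a hypothetical name. The sketch also drops entries whose timestamp does not parse, which goes slightly beyond the written rule.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

type HistoryEntry struct {
	Status    string `json:"status"`
	ChangedAt string `json:"changed_at"`
	Details   string `json:"details,omitempty"`
}

// normalizeHistory keeps only entries with a parsable RFC3339 changed_at,
// rewrites the timestamp in UTC, and sorts by time ascending.
func normalizeHistory(in []HistoryEntry) []HistoryEntry {
	type stamped struct {
		e HistoryEntry
		t time.Time
	}
	var kept []stamped
	for _, e := range in {
		t, err := time.Parse(time.RFC3339, e.ChangedAt)
		if err != nil {
			continue // missing or unparsable changed_at: do not send
		}
		e.ChangedAt = t.UTC().Format(time.RFC3339)
		kept = append(kept, stamped{e, t})
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].t.Before(kept[j].t) })
	out := make([]HistoryEntry, len(kept))
	for i, k := range kept {
		out[i] = k.e
	}
	return out
}

func main() {
	h := normalizeHistory([]HistoryEntry{
		{Status: "OK", ChangedAt: "2026-02-10T18:22:00+03:00"},
		{Status: "Warning"}, // no changed_at: dropped
		{Status: "Critical", ChangedAt: "2026-02-10T15:10:00Z"},
	})
	for _, e := range h {
		fmt.Println(e.Status, e.ChangedAt)
	}
	// Critical 2026-02-10T15:10:00Z
	// OK 2026-02-10T15:22:00Z
}
```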
---
## hardware sections
### board
Basic server information. This section is required.
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `serial_number` | string | **yes** | Serial number (Asset identification key) |
| `manufacturer` | string | no | Manufacturer |
| `product_name` | string | no | Model |
| `part_number` | string | no | Part number |
| `uuid` | string | no | System UUID |
`"NULL"` values in string fields are treated as missing data.
```json
"board": {
"manufacturer": "Supermicro",
"product_name": "X12DPG-QT6",
"serial_number": "21D634101",
"part_number": "X12DPG-QT6-REV1.01",
"uuid": "d7ef2fe5-2fd0-11f0-910a-346f11040868"
}
```
---
### firmware
Firmware versions of system components (BIOS, BMC, CPLD, etc.).
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `device_name` | string | **yes** | Device name (`BIOS`, `BMC`, `CPLD`, …) |
| `version` | string | **yes** | Firmware version |
Entries with an empty `device_name` or `version` are ignored.
A version change creates a `FIRMWARE_CHANGED` event for the Asset.
```json
"firmware": [
{ "device_name": "BIOS", "version": "06.08.05" },
{ "device_name": "BMC", "version": "5.17.00" },
{ "device_name": "CPLD", "version": "01.02.03" }
]
```
---
### cpus
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `socket` | int | **yes** | Socket number (used to generate the serial) |
| `model` | string | no | Processor model |
| `manufacturer` | string | no | Manufacturer |
| `cores` | int | no | Number of cores |
| `threads` | int | no | Number of threads |
| `frequency_mhz` | int | no | Current frequency, MHz |
| `max_frequency_mhz` | int | no | Maximum frequency, MHz |
| `temperature_c` | float | no | CPU temperature, °C (telemetry) |
| `power_w` | float | no | Current CPU power, W (telemetry) |
| `throttled` | bool | no | Thermal/power throttling detected |
| `correctable_error_count` | int | no | Number of correctable CPU errors |
| `uncorrectable_error_count` | int | no | Number of uncorrectable CPU errors |
| `life_remaining_pct` | float | no | Remaining life / health, % |
| `life_used_pct` | float | no | Used life / wear, % |
| `serial_number` | string | no | Serial number (if available) |
| `firmware` | string | no | Microcode version; if the logger reports `Microcode level`, pass it here as-is |
| `present` | bool | no | Presence (default `true`) |
| + common status fields | | | see the section above |
**Generated serial_number when missing:** `{board_serial}-CPU-{socket}`
If the source uses a `Microcode level` field or label, pass its value into `cpus[].firmware` without further transformation.
```json
"cpus": [
{
"socket": 0,
"model": "INTEL(R) XEON(R) GOLD 6530",
"cores": 32,
"threads": 64,
"frequency_mhz": 2100,
"max_frequency_mhz": 4000,
"temperature_c": 61.5,
"power_w": 182.0,
"throttled": false,
"manufacturer": "Intel",
"status": "OK",
"status_checked_at": "2026-02-10T15:28:00Z"
}
]
```
---
### memory
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `slot` | string | no | Slot identifier |
| `present` | bool | no | Module presence (default `true`) |
| `serial_number` | string | no | Serial number |
| `part_number` | string | no | Part number (used as the model) |
| `manufacturer` | string | no | Manufacturer |
| `size_mb` | int | no | Capacity, MB |
| `type` | string | no | Type: `DDR3`, `DDR4`, `DDR5`, … |
| `max_speed_mhz` | int | no | Maximum speed, MHz |
| `current_speed_mhz` | int | no | Current speed, MHz |
| `temperature_c` | float | no | DIMM/module temperature, °C (telemetry) |
| `correctable_ecc_error_count` | int | no | Number of correctable ECC errors |
| `uncorrectable_ecc_error_count` | int | no | Number of uncorrectable ECC errors |
| `life_remaining_pct` | float | no | Remaining life / health, % |
| `life_used_pct` | float | no | Used life / wear, % |
| `spare_blocks_remaining_pct` | float | no | Remaining spare blocks, % |
| `performance_degraded` | bool | no | Performance degradation detected |
| `data_loss_detected` | bool | no | The source signals a risk or occurrence of data loss |
| + common status fields | | | see the section above |
A module without `serial_number` is ignored. A module with `present=false` or `status=Empty` is also ignored.
```json
"memory": [
{
"slot": "CPU0_C0D0",
"present": true,
"size_mb": 32768,
"type": "DDR5",
"max_speed_mhz": 4800,
"current_speed_mhz": 4800,
"temperature_c": 43.0,
"correctable_ecc_error_count": 0,
"manufacturer": "Hynix",
"serial_number": "80AD032419E17CEEC1",
"part_number": "HMCG88AGBRA191N",
"status": "OK"
}
]
```
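Because `present` defaults to `true`, a Go collector must distinguish "not sent" from "false". A `*bool` field does that. The sketch reproduces the documented skip rules for `memory`. The `DIMM` type and the `ingestible` helper are illustrative, and trimming the serial before checking it is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type DIMM struct {
	Slot         string `json:"slot,omitempty"`
	Present      *bool  `json:"present,omitempty"` // nil = not sent = true
	SerialNumber string `json:"serial_number,omitempty"`
	Status       string `json:"status,omitempty"`
}

// ingestible reports whether ingest would keep the module: it needs a serial,
// must not be present=false, and must not have status Empty.
func ingestible(d DIMM) bool {
	if strings.TrimSpace(d.SerialNumber) == "" {
		return false
	}
	if d.Present != nil && !*d.Present {
		return false
	}
	return d.Status != "Empty"
}

func main() {
	var dimms []DIMM
	_ = json.Unmarshal([]byte(`[
		{"slot":"CPU0_C0D0","serial_number":"80AD032419E17CEEC1","status":"OK"},
		{"slot":"CPU0_C1D0","present":false,"serial_number":"X1"},
		{"slot":"CPU0_C2D0","status":"Empty","serial_number":"X2"},
		{"slot":"CPU0_C3D0","status":"OK"}
	]`), &dimms)
	for _, d := range dimms {
		fmt.Println(d.Slot, ingestible(d))
	}
	// CPU0_C0D0 true
	// CPU0_C1D0 false
	// CPU0_C2D0 false
	// CPU0_C3D0 false
}
```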
---
### storage
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `slot` | string | no | Slot identifier |
| `serial_number` | string | no | Serial number |
| `model` | string | no | Model |
| `manufacturer` | string | no | Manufacturer |
| `type` | string | no | Type: `NVMe`, `SSD`, `HDD` |
| `interface` | string | no | Interface: `NVMe`, `SATA`, `SAS` |
| `size_gb` | int | no | Size, GB |
| `temperature_c` | float | no | Drive temperature, °C (telemetry) |
| `power_on_hours` | int64 | no | Power-on time, hours |
| `power_cycles` | int64 | no | Number of power cycles |
| `unsafe_shutdowns` | int64 | no | Unsafe shutdowns |
| `media_errors` | int64 | no | Media errors |
| `error_log_entries` | int64 | no | Number of error log entries |
| `written_bytes` | int64 | no | Total bytes written |
| `read_bytes` | int64 | no | Total bytes read |
| `life_used_pct` | float | no | Used life / wear, % |
| `life_remaining_pct` | float | no | Remaining life / health, % |
| `available_spare_pct` | float | no | Available spare, % |
| `reallocated_sectors` | int64 | no | Reallocated sectors |
| `current_pending_sectors` | int64 | no | Sectors pending remap |
| `offline_uncorrectable` | int64 | no | Uncorrectable errors found by the offline scan |
| `firmware` | string | no | Firmware version |
| `present` | bool | no | Presence (default `true`) |
| + common status fields | | | see the section above |
A drive without `serial_number` is ignored. A `firmware` change creates a `FIRMWARE_CHANGED` event.
```json
"storage": [
{
"slot": "OB01",
"type": "NVMe",
"model": "INTEL SSDPF2KX076T1",
"size_gb": 7680,
"temperature_c": 38.5,
"power_on_hours": 12450,
"unsafe_shutdowns": 3,
"written_bytes": 9876543210,
"life_remaining_pct": 91.0,
"serial_number": "BTAX41900GF87P6DGN",
"manufacturer": "Intel",
"firmware": "9CV10510",
"interface": "NVMe",
"present": true,
"status": "OK"
}
]
```
---
### pcie_devices
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `slot` | string | no | Canonical installation address of the PCIe device; pass the BDF (`0000:18:00.0`) |
| `vendor_id` | int | no | PCI Vendor ID (decimal) |
| `device_id` | int | no | PCI Device ID (decimal) |
| `numa_node` | int | no | NUMA node / CPU affinity of the device |
| `temperature_c` | float | no | Device temperature, °C (telemetry) |
| `power_w` | float | no | Current device power draw, W (telemetry) |
| `life_remaining_pct` | float | no | Remaining life / health, % |
| `life_used_pct` | float | no | Used life / wear, % |
| `ecc_corrected_total` | int64 | no | Total correctable ECC errors |
| `ecc_uncorrected_total` | int64 | no | Total uncorrectable ECC errors |
| `hw_slowdown` | bool | no | The device entered hardware slowdown / protective mode |
| `battery_charge_pct` | float | no | Battery / supercap charge, % |
| `battery_health_pct` | float | no | Battery / supercap health, % |
| `battery_temperature_c` | float | no | Battery / supercap temperature, °C |
| `battery_voltage_v` | float | no | Battery / supercap voltage, V |
| `battery_replace_required` | bool | no | Battery / supercap replacement required |
| `sfp_temperature_c` | float | no | SFP/optic temperature, °C |
| `sfp_tx_power_dbm` | float | no | TX optical power, dBm |
| `sfp_rx_power_dbm` | float | no | RX optical power, dBm |
| `sfp_voltage_v` | float | no | SFP voltage, V |
| `sfp_bias_ma` | float | no | SFP bias current, mA |
| `bdf` | string | no | Deprecated alias for `slot`; if present, ingest normalizes it into `slot` |
| `device_class` | string | no | Device class (see the list below) |
| `manufacturer` | string | no | Manufacturer |
| `model` | string | no | Model |
| `serial_number` | string | no | Serial number |
| `firmware` | string | no | Firmware version |
| `link_width` | int | no | Current link width |
| `link_speed` | string | no | Current link speed: `Gen3`, `Gen4`, `Gen5` |
| `max_link_width` | int | no | Maximum link width |
| `max_link_speed` | string | no | Maximum link speed |
| `mac_addresses` | string[] | no | Port MAC addresses (for network devices) |
| `present` | bool | no | Presence (default `true`) |
| + common status fields | | | see the section above |
Send `numa_node` for NIC, InfiniBand, RAID, and GPU devices when the source knows the CPU/NUMA affinity. The field is stored in the PCIe component's snapshot attributes and duplicated into telemetry for topology use cases.
Use `temperature_c` and `power_w` for device-level telemetry of GPUs, accelerators, and smart PCIe devices. They do not affect component identification.
**Generated serial_number when missing or `"N/A"`:** `{board_serial}-PCIE-{slot}`, where `slot` for PCIe is the BDF.
`slot` is the only canonical component address. For PCIe, pass the BDF in `slot`. The `bdf` field is kept only as a transitional input alias and must not be used as a separate coordinate alongside `slot`.
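Two conversions typically happen before these fields are sent. Sources such as Redfish return PCI IDs as hex strings (e.g. `"VendorId": "0x8086"`), while the contract expects decimal `vendor_id`/`device_id`. A legacy BDF value should also end up in `slot`. This is a sketch. `parsePCIID` and `canonicalSlot` are hypothetical helpers. Preferring `slot` when both fields are present, and lowercasing the BDF, are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePCIID converts "0x8086" or "8086" into the decimal value the contract expects.
// PCI vendor/device IDs are 16-bit, so larger values are rejected.
func parsePCIID(s string) (int, error) {
	s = strings.TrimPrefix(strings.ToLower(strings.TrimSpace(s)), "0x")
	v, err := strconv.ParseUint(s, 16, 16)
	if err != nil {
		return 0, err
	}
	return int(v), nil
}

// canonicalSlot prefers slot and falls back to the deprecated bdf alias.
func canonicalSlot(slot, bdf string) string {
	if s := strings.TrimSpace(slot); s != "" {
		return s
	}
	return strings.ToLower(strings.TrimSpace(bdf))
}

func main() {
	vid, _ := parsePCIID("0x8086")
	did, _ := parsePCIID("0x1572")
	fmt.Println(vid, did)                          // 32902 5490
	fmt.Println(canonicalSlot("", "0000:3B:00.0")) // 0000:3b:00.0
}
```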
**`device_class` values:**
| Value | Purpose |
|----------|------------|
| `MassStorageController` | RAID controllers |
| `StorageController` | HBAs, SAS controllers |
| `NetworkController` | Network adapters (InfiniBand, generic) |
| `EthernetController` | Ethernet NICs |
| `FibreChannelController` | Fibre Channel HBAs |
| `VideoController` | GPUs, graphics cards |
| `ProcessingAccelerator` | Compute accelerators (AI/ML) |
| `DisplayController` | Display controllers (BMC VGA) |
The list is open: arbitrary strings are allowed for non-standard classes.
```json
"pcie_devices": [
{
"slot": "0000:3b:00.0",
"vendor_id": 32902,
"device_id": 5490,
"numa_node": 0,
"temperature_c": 48.5,
"power_w": 18.2,
"sfp_temperature_c": 36.2,
"sfp_tx_power_dbm": -1.8,
"sfp_rx_power_dbm": -2.1,
"device_class": "EthernetController",
"manufacturer": "Intel",
"model": "X710 10GbE",
"serial_number": "K65472-003",
"firmware": "9.20 0x8000d4ae",
"mac_addresses": ["3c:fd:fe:aa:bb:cc", "3c:fd:fe:aa:bb:cd"],
"status": "OK"
}
]
```
---
### power_supplies
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `slot` | string | no | Slot identifier |
| `present` | bool | no | Presence (default `true`) |
| `serial_number` | string | no | Serial number |
| `part_number` | string | no | Part number |
| `model` | string | no | Model |
| `vendor` | string | no | Manufacturer |
| `wattage_w` | int | no | Rated power, W |
| `firmware` | string | no | Firmware version |
| `input_type` | string | no | Input type (for example `ACWideRange`) |
| `input_voltage` | float | no | Input voltage, V (telemetry) |
| `input_power_w` | float | no | Input power, W (telemetry) |
| `output_power_w` | float | no | Output power, W (telemetry) |
| `temperature_c` | float | no | PSU temperature, °C (telemetry) |
| `life_remaining_pct` | float | no | Remaining life / health, % |
| `life_used_pct` | float | no | Used life / wear, % |
| + common status fields | | | see the section above |
The telemetry fields (`input_voltage`, `input_power_w`, `output_power_w`, `temperature_c`, `life_remaining_pct`, `life_used_pct`) are stored in the component attributes and do not affect its identification.
A PSU without `serial_number` is ignored.
```json
"power_supplies": [
{
"slot": "0",
"present": true,
"model": "GW-CRPS3000LW",
"vendor": "Great Wall",
"wattage_w": 3000,
"serial_number": "2P06C102610",
"firmware": "00.03.05",
"status": "OK",
"input_type": "ACWideRange",
"input_power_w": 137,
"output_power_w": 104,
"input_voltage": 215.25,
"temperature_c": 39.5,
"life_remaining_pct": 97.0
}
]
```
---
### sensors
Server sensor readings. This section is optional and is not tied to components.
Data is stored as the last known value at the Asset level.
```json
"sensors": {
"fans": [ ... ],
"power": [ ... ],
"temperatures": [ ... ],
"other": [ ... ]
}
```
---
### event_logs
Normalized operational server logs from `host`, `bmc`, or `redfish`.
These entries do not enter the history timeline and do not create history events. They are kept in a separate deduplicated log store and shown in a separate asset logs / host logs UI block.
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `source` | string | **yes** | Log source: `host`, `bmc`, `redfish` |
| `event_time` | string RFC3339 | no | Event time from the source; if absent, the ingest/collection time is used |
| `severity` | string | no | Level: `OK`, `Info`, `Warning`, `Critical`, `Unknown` |
| `message_id` | string | no | Source event identifier/code |
| `message` | string | **yes** | Normalized event text |
| `component_ref` | string | no | Reference to the component/device/slot, if it can be extracted |
| `fingerprint` | string | no | Ready-made external dedup key; if omitted, the system computes its own |
| `is_active` | bool | no | Whether the event is still active (not cleared), if the source tracks lifecycle |
| `raw_payload` | object | no | Raw vendor-specific payload for diagnostics |
**event_logs rules:**
- Logs are deduplicated per asset + source + fingerprint.
- If `fingerprint` is not provided, the system builds it from normalized fields (`source`, `message_id`, `message`, `component_ref`, normalized time).
- The integrator/log collector must not synthesize event content: do not invent `message`, `message_id`, `component_ref`, serial/device identifiers, or other fields if they are absent from the source log or were not reliably extracted.
- Receiving the same event again updates `last_seen_at` and the repeat counter; it must not create a new timeline/history event.
- `event_logs` feed a separate log UI view and, by default, do not change the canonical state of components or the asset.
```json
"event_logs": [
{
"source": "bmc",
"event_time": "2026-03-15T14:03:11Z",
"severity": "Warning",
"message_id": "0x000F",
"message": "Correctable ECC error threshold exceeded",
"component_ref": "CPU0_C0D0",
"raw_payload": {
"sensor": "DIMM_A1",
"sel_record_id": "0042"
}
},
{
"source": "redfish",
"event_time": "2026-03-15T14:03:20Z",
"severity": "Info",
"message_id": "OpenBMC.0.1.SystemReboot",
"message": "System reboot requested by administrator",
"component_ref": "Mainboard"
}
]
```
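When `fingerprint` is omitted, the server derives its own key. This document does not specify the exact recipe, including the time normalization. If a collector prefers to send a stable key itself, one hypothetical recipe hashes the same normalized fields. It truncates `event_time` to the minute in UTC as an assumed normalization. This is an illustration only. It is not the server's algorithm and will not match server-computed fingerprints.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
	"time"
)

// EventLog holds the fields this hypothetical recipe hashes.
type EventLog struct {
	Source       string
	EventTime    string
	MessageID    string
	Message      string
	ComponentRef string
}

// norm lowercases and collapses runs of whitespace.
func norm(s string) string {
	return strings.ToLower(strings.Join(strings.Fields(s), " "))
}

// fingerprint hashes the normalized fields. Each part is length-prefixed, so
// different field splits can never produce the same hashed byte stream.
func fingerprint(e EventLog) string {
	ts := "" // stays empty when event_time is missing or unparsable
	if t, err := time.Parse(time.RFC3339, e.EventTime); err == nil {
		ts = t.UTC().Truncate(time.Minute).Format(time.RFC3339) // assumed normalization
	}
	h := sha256.New()
	for _, p := range []string{norm(e.Source), norm(e.MessageID), norm(e.Message), norm(e.ComponentRef), ts} {
		fmt.Fprintf(h, "%d:%s;", len(p), p)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := EventLog{Source: "bmc", EventTime: "2026-03-15T14:03:11Z", MessageID: "0x000F",
		Message: "Correctable ECC error threshold exceeded", ComponentRef: "CPU0_C0D0"}
	b := a
	b.EventTime = "2026-03-15T17:03:40+03:00" // same minute in UTC
	b.Message = "Correctable  ECC error threshold exceeded "
	fmt.Println(fingerprint(a) == fingerprint(b)) // true
}
```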
#### sensors.fans
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `name` | string | **yes** | Sensor name, unique within the section |
| `location` | string | no | Physical location |
| `rpm` | int | no | Speed, RPM |
| `status` | string | no | Status: `OK`, `Warning`, `Critical`, `Unknown` |
#### sensors.power
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `name` | string | **yes** | Unique sensor name |
| `location` | string | no | Physical location |
| `voltage_v` | float | no | Voltage, V |
| `current_a` | float | no | Current, A |
| `power_w` | float | no | Power, W |
| `status` | string | no | Status |
#### sensors.temperatures
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `name` | string | **yes** | Unique sensor name |
| `location` | string | no | Physical location |
| `celsius` | float | no | Temperature, °C |
| `threshold_warning_celsius` | float | no | Warning threshold, °C |
| `threshold_critical_celsius` | float | no | Critical threshold, °C |
| `status` | string | no | Status |
#### sensors.other
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `name` | string | **yes** | Unique sensor name |
| `location` | string | no | Physical location |
| `value` | float | no | Value |
| `unit` | string | no | Unit of measurement |
| `status` | string | no | Status |
**sensors rules:**
- Sensor identifier: the pair `(sensor_type, name)`. If a key is duplicated within one payload, the first occurrence wins.
- Sensors without `name` are ignored.
- Every import overwrites the stored values (upsert by key).
```json
"sensors": {
"fans": [
{ "name": "FAN1", "location": "Front", "rpm": 4200, "status": "OK" },
{ "name": "FAN_CPU0", "location": "CPU0", "rpm": 5600, "status": "OK" }
],
"power": [
{ "name": "12V Rail", "location": "Mainboard", "voltage_v": 12.06, "status": "OK" },
{ "name": "PSU0 Input", "location": "PSU0", "voltage_v": 215.25, "current_a": 0.64, "power_w": 137.0, "status": "OK" }
],
"temperatures": [
{ "name": "CPU0 Temp", "location": "CPU0", "celsius": 46.0, "threshold_warning_celsius": 80.0, "threshold_critical_celsius": 95.0, "status": "OK" },
{ "name": "Inlet Temp", "location": "Front", "celsius": 22.0, "threshold_warning_celsius": 40.0, "threshold_critical_celsius": 50.0, "status": "OK" }
],
"other": [
{ "name": "System Humidity", "value": 38.5, "unit": "%", "status": "OK" }
]
}
```
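The sensor rules amount to a keyed upsert. This sketch models the ingest behavior for reasoning and tests. It is not the server's code. An in-memory map stands in for the per-Asset last-known-value store. `Reading`, `Key`, and `upsertSensors` are hypothetical names.

```go
package main

import "fmt"

type Reading struct {
	Name  string
	Value float64
}

// Key identifies a sensor within an asset: (sensor_type, name).
type Key struct {
	Type string
	Name string
}

// upsertSensors applies one payload to the store: nameless sensors are skipped,
// only the first occurrence of a key in the payload counts, and that value
// overwrites whatever the store held for the key.
func upsertSensors(store map[Key]Reading, payload map[string][]Reading) {
	seen := make(map[Key]bool)
	for typ, list := range payload {
		for _, r := range list {
			if r.Name == "" {
				continue
			}
			k := Key{typ, r.Name}
			if seen[k] {
				continue
			}
			seen[k] = true
			store[k] = r
		}
	}
}

func main() {
	store := map[Key]Reading{{"fans", "FAN1"}: {"FAN1", 3900}}
	upsertSensors(store, map[string][]Reading{
		"fans":         {{"FAN1", 4200}, {"FAN1", 9999}, {"", 1}},
		"temperatures": {{"Inlet Temp", 22}},
	})
	fmt.Println(len(store), store[Key{"fans", "FAN1"}].Value) // 2 4200
}
```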
---
## Component status handling
| Status | Behavior |
|--------|-----------|
| `OK` | Normal processing |
| `Warning` | A `COMPONENT_WARNING` event is created |
| `Critical` | A `COMPONENT_FAILED` event is created, plus a record in `failure_events` |
| `Unknown` | The component is treated as working; a `COMPONENT_UNKNOWN` event is created |
| `Empty` | The component is not created or updated |
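The table maps directly onto a switch, which is handy when a collector's tests need to predict ingest outcomes. This is a sketch. The `Outcome` type and the `outcomeFor` function are hypothetical. Treating a status string that the table does not list the same way as `OK` is an assumption.

```go
package main

import "fmt"

// Outcome summarizes what ingest does for one component status.
type Outcome struct {
	Event         string // timeline event, "" if none
	FailureRecord bool   // also written to failure_events
	Upsert        bool   // component is created/updated
}

func outcomeFor(status string) Outcome {
	switch status {
	case "OK":
		return Outcome{Upsert: true}
	case "Warning":
		return Outcome{Event: "COMPONENT_WARNING", Upsert: true}
	case "Critical":
		return Outcome{Event: "COMPONENT_FAILED", FailureRecord: true, Upsert: true}
	case "Unknown":
		return Outcome{Event: "COMPONENT_UNKNOWN", Upsert: true} // still treated as working
	case "Empty":
		return Outcome{}
	default:
		return Outcome{Upsert: true} // not in the table; handled like OK (assumption)
	}
}

func main() {
	for _, s := range []string{"OK", "Warning", "Critical", "Unknown", "Empty"} {
		fmt.Printf("%-8s %+v\n", s, outcomeFor(s))
	}
}
```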
---
## Handling missing serial_number
General rule for all sections: if the source did not return a serial number and the collector could not reliably extract one, the integrator must not substitute invented values, hashes, local placeholder identifiers, or "guessed" serial numbers. Only the server-side ingest fallback rules listed below are allowed.
| Type | Behavior |
|-----|-----------|
| CPU | Generated: `{board_serial}-CPU-{socket}` |
| PCIe | Generated: `{board_serial}-PCIE-{slot}` (if the serial is `"N/A"` or empty; `slot` for PCIe is the BDF) |
| Memory | The component is ignored |
| Storage | The component is ignored |
| PSU | The component is ignored |
If `serial_number` is not unique within one payload for the same `model`:
- The first occurrence keeps the original serial number.
- Each subsequent duplicate receives a placeholder: `NO_SN-XXXXXXXX`.
---
## Minimal valid example
```json
{
"collected_at": "2026-02-10T15:30:00Z",
"target_host": "192.168.1.100",
"hardware": {
"board": {
"serial_number": "SRV-001"
}
}
}
```
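In Go, the minimal payload falls out of `omitempty` tags. Optional fields that a collector leaves empty are simply not emitted. The required fields (`collected_at`, `hardware.board.serial_number`) carry no `omitempty`. The types below are a trimmed, hypothetical set and not the project's actual models. Other `hardware` sections would be slices tagged `omitempty`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type Board struct {
	SerialNumber string `json:"serial_number"`
	Manufacturer string `json:"manufacturer,omitempty"`
	ProductName  string `json:"product_name,omitempty"`
}

type Hardware struct {
	Board Board `json:"board"`
}

type Ingest struct {
	Filename    string   `json:"filename,omitempty"`
	SourceType  string   `json:"source_type,omitempty"`
	Protocol    string   `json:"protocol,omitempty"`
	TargetHost  string   `json:"target_host,omitempty"`
	CollectedAt string   `json:"collected_at"`
	Hardware    Hardware `json:"hardware"`
}

// minimalPayload builds the minimal valid example from typed values.
func minimalPayload() ([]byte, error) {
	p := Ingest{
		TargetHost:  "192.168.1.100",
		CollectedAt: time.Date(2026, 2, 10, 15, 30, 0, 0, time.UTC).Format(time.RFC3339),
		Hardware:    Hardware{Board: Board{SerialNumber: "SRV-001"}},
	}
	return json.Marshal(p)
}

func main() {
	out, err := minimalPayload()
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(out))
	// {"target_host":"192.168.1.100","collected_at":"2026-02-10T15:30:00Z","hardware":{"board":{"serial_number":"SRV-001"}}}
}
```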
---
## Full example with status history
```json
{
"filename": "redfish://10.10.10.103",
"source_type": "api",
"protocol": "redfish",
"target_host": "10.10.10.103",
"collected_at": "2026-02-10T15:30:00Z",
"hardware": {
"board": {
"manufacturer": "Supermicro",
"product_name": "X12DPG-QT6",
"serial_number": "21D634101"
},
"firmware": [
{ "device_name": "BIOS", "version": "06.08.05" },
{ "device_name": "BMC", "version": "5.17.00" }
],
"cpus": [
{
"socket": 0,
"model": "INTEL(R) XEON(R) GOLD 6530",
"manufacturer": "Intel",
"cores": 32,
"threads": 64,
"status": "OK"
}
],
"storage": [
{
"slot": "OB01",
"type": "NVMe",
"model": "INTEL SSDPF2KX076T1",
"size_gb": 7680,
"serial_number": "BTAX41900GF87P6DGN",
"manufacturer": "Intel",
"firmware": "9CV10510",
"present": true,
"status": "OK",
"status_changed_at": "2026-02-10T15:22:00Z",
"status_history": [
{ "status": "Critical", "changed_at": "2026-02-10T15:10:00Z", "details": "I/O timeout on NVMe queue 3" },
{ "status": "OK", "changed_at": "2026-02-10T15:22:00Z", "details": "Recovered after controller reset" }
]
}
],
"pcie_devices": [
{
"slot": "0000:18:00.0",
"device_class": "EthernetController",
"manufacturer": "Intel",
"model": "X710 10GbE",
"serial_number": "K65472-003",
"mac_addresses": ["3c:fd:fe:aa:bb:cc", "3c:fd:fe:aa:bb:cd"],
"status": "OK"
}
],
"power_supplies": [
{
"slot": "0",
"present": true,
"model": "GW-CRPS3000LW",
"vendor": "Great Wall",
"wattage_w": 3000,
"serial_number": "2P06C102610",
"firmware": "00.03.05",
"status": "OK",
"input_power_w": 137,
"output_power_w": 104,
"input_voltage": 215.25
}
],
"sensors": {
"fans": [
{ "name": "FAN1", "location": "Front", "rpm": 4200, "status": "OK" }
],
"power": [
{ "name": "12V Rail", "voltage_v": 12.06, "status": "OK" }
],
"temperatures": [
{ "name": "CPU0 Temp", "celsius": 46.0, "threshold_warning_celsius": 80.0, "threshold_critical_celsius": 95.0, "status": "OK" }
],
"other": [
{ "name": "System Humidity", "value": 38.5, "unit": "%" }
]
}
}
}
```


@@ -40,6 +40,8 @@ func main() {
cfg := server.Config{
Port: *port,
PreloadFile: *file,
AppVersion: version,
AppCommit: commit,
}
srv := server.New(cfg)

go.mod

@@ -1,3 +1,7 @@
module git.mchus.pro/mchus/logpile
go 1.22
go 1.24.0
require reanimator/chart v0.0.0
replace reanimator/chart => ./internal/chart

internal/chart Submodule

Submodule internal/chart added at a71f55a6f9


@@ -0,0 +1,40 @@
package collector
import (
"strings"
"testing"
)
func TestParseNIC_ResolvesModelFromPCIIDs(t *testing.T) {
doc := map[string]interface{}{
"Id": "NIC1",
"VendorId": "0x8086",
"DeviceId": "0x1521",
"Model": "0x1521",
}
nic := parseNIC(doc)
if nic.Model == "" {
t.Fatalf("expected model resolved from pci.ids")
}
if !strings.Contains(strings.ToUpper(nic.Model), "I350") {
t.Fatalf("expected I350 in model, got %q", nic.Model)
}
}
func TestParsePCIeFunction_ResolvesDeviceClassFromPCIIDs(t *testing.T) {
doc := map[string]interface{}{
"Id": "PCIE1",
"VendorId": "0x9005",
"DeviceId": "0x028f",
"ClassCode": "0x010700",
}
dev := parsePCIeFunction(doc, 0)
if dev.DeviceClass == "" || strings.EqualFold(dev.DeviceClass, "PCIe device") {
t.Fatalf("expected device class resolved from pci.ids, got %q", dev.DeviceClass)
}
if strings.HasPrefix(strings.ToLower(strings.TrimSpace(dev.DeviceClass)), "0x") {
t.Fatalf("expected resolved name instead of raw hex, got %q", dev.DeviceClass)
}
}


@@ -7,14 +7,15 @@ import (
)
type Request struct {
Host string
Protocol string
Port int
Username string
AuthType string
Password string
Token string
TLSMode string
Host string
Protocol string
Port int
Username string
AuthType string
Password string
Token string
TLSMode string
PowerOnIfHostOff bool
}
type Progress struct {
@@ -25,7 +26,20 @@ type Progress struct {
type ProgressFn func(Progress)
type ProbeResult struct {
Reachable bool
Protocol string
HostPowerState string
HostPoweredOn bool
PowerControlAvailable bool
SystemPath string
}
type Connector interface {
Protocol() string
Collect(ctx context.Context, req Request, emit ProgressFn) (*models.AnalysisResult, error)
}
type Prober interface {
Probe(ctx context.Context, req Request) (*ProbeResult, error)
}


@@ -3,9 +3,8 @@ package exporter
import (
"encoding/csv"
"encoding/json"
"fmt"
"io"
"text/tabwriter"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
)
@@ -36,7 +35,7 @@ func (e *Exporter) ExportCSV(w io.Writer) error {
// FRU data
for _, fru := range e.result.FRU {
if fru.SerialNumber == "" {
if !hasUsableSerial(fru.SerialNumber) {
continue
}
name := fru.ProductName
@@ -55,50 +54,45 @@ func (e *Exporter) ExportCSV(w io.Writer) error {
// Hardware data
if e.result.Hardware != nil {
// Memory
for _, mem := range e.result.Hardware.Memory {
if mem.SerialNumber == "" {
continue
}
location := mem.Location
if location == "" {
location = mem.Slot
}
// Board
if hasUsableSerial(e.result.Hardware.BoardInfo.SerialNumber) {
if err := writer.Write([]string{
mem.PartNumber,
mem.SerialNumber,
mem.Manufacturer,
location,
e.result.Hardware.BoardInfo.ProductName,
strings.TrimSpace(e.result.Hardware.BoardInfo.SerialNumber),
e.result.Hardware.BoardInfo.Manufacturer,
"Board",
}); err != nil {
return err
}
}
// Storage
for _, stor := range e.result.Hardware.Storage {
if stor.SerialNumber == "" {
seenCanonical := make(map[string]struct{})
for _, dev := range canonicalDevicesForExport(e.result.Hardware) {
if !hasUsableSerial(dev.SerialNumber) {
continue
}
if err := writer.Write([]string{
stor.Model,
stor.SerialNumber,
stor.Manufacturer,
stor.Slot,
}); err != nil {
serial := strings.TrimSpace(dev.SerialNumber)
seenCanonical[serial] = struct{}{}
component, manufacturer, location := csvFieldsFromCanonicalDevice(dev)
if err := writer.Write([]string{component, serial, manufacturer, location}); err != nil {
return err
}
}
// PCIe devices
for _, pcie := range e.result.Hardware.PCIeDevices {
if pcie.SerialNumber == "" {
// Legacy network cards
for _, nic := range e.result.Hardware.NetworkCards {
if !hasUsableSerial(nic.SerialNumber) {
continue
}
serial := strings.TrimSpace(nic.SerialNumber)
if _, ok := seenCanonical[serial]; ok {
continue
}
if err := writer.Write([]string{
pcie.DeviceClass,
pcie.SerialNumber,
pcie.Manufacturer,
pcie.Slot,
nic.Model,
serial,
"",
"Network",
}); err != nil {
return err
}
@@ -115,220 +109,64 @@ func (e *Exporter) ExportJSON(w io.Writer) error {
return encoder.Encode(e.result)
}
// ExportTXT exports a human-readable text report
func (e *Exporter) ExportTXT(w io.Writer) error {
fmt.Fprintln(w, "LOGPile Analysis Report - mchus.pro")
fmt.Fprintln(w, "====================================")
fmt.Fprintln(w)
if e.result == nil {
fmt.Fprintln(w, "No data loaded.")
return nil
func hasUsableSerial(serial string) bool {
s := strings.TrimSpace(serial)
if s == "" {
return false
}
fmt.Fprintf(w, "File:\t%s\n", e.result.Filename)
fmt.Fprintf(w, "Source:\t%s\n", e.result.SourceType)
fmt.Fprintf(w, "Protocol:\t%s\n", e.result.Protocol)
fmt.Fprintf(w, "Target:\t%s\n", e.result.TargetHost)
fmt.Fprintln(w)
// Server model and serial number
if e.result.Hardware != nil && e.result.Hardware.BoardInfo.ProductName != "" {
fmt.Fprintf(w, "Server Model:\t%s\n", e.result.Hardware.BoardInfo.ProductName)
fmt.Fprintf(w, "Serial Number:\t%s\n", e.result.Hardware.BoardInfo.SerialNumber)
switch strings.ToUpper(s) {
case "N/A", "NA", "NONE", "NULL", "UNKNOWN", "-":
return false
default:
return true
}
fmt.Fprintln(w)
// Hardware summary
if e.result.Hardware != nil {
hw := e.result.Hardware
// Firmware tab
if len(hw.Firmware) > 0 {
fmt.Fprintln(w, "FIRMWARE VERSIONS")
fmt.Fprintln(w, "-----------------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Component\tVersion\tBuild Time")
for _, fw := range hw.Firmware {
fmt.Fprintf(tw, "%s\t%s\t%s\n", fw.DeviceName, fw.Version, fw.BuildTime)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// CPU tab
if len(hw.CPUs) > 0 {
fmt.Fprintln(w, "PROCESSORS")
fmt.Fprintln(w, "----------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Socket\tModel\tCores\tThreads\tFreq MHz\tTurbo MHz\tTDP W\tPPIN/SN")
for _, cpu := range hw.CPUs {
id := cpu.SerialNumber
if id == "" {
id = cpu.PPIN
}
fmt.Fprintf(tw, "CPU%d\t%s\t%d\t%d\t%d\t%d\t%d\t%s\n",
cpu.Socket, cpu.Model, cpu.Cores, cpu.Threads, cpu.FrequencyMHz, cpu.MaxFreqMHz, cpu.TDP, id)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Memory tab
if len(hw.Memory) > 0 {
fmt.Fprintln(w, "MEMORY")
fmt.Fprintln(w, "------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Slot\tPresent\tSize MB\tType\tSpeed MHz\tVendor\tModel/PN\tSerial\tStatus")
for _, mem := range hw.Memory {
location := mem.Location
if location == "" {
location = mem.Slot
}
fmt.Fprintf(tw, "%s\t%t\t%d\t%s\t%d\t%s\t%s\t%s\t%s\n",
location, mem.Present, mem.SizeMB, mem.Type, mem.CurrentSpeedMHz, mem.Manufacturer, mem.PartNumber, mem.SerialNumber, mem.Status)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Power tab
if len(hw.PowerSupply) > 0 {
fmt.Fprintln(w, "POWER SUPPLIES")
fmt.Fprintln(w, "--------------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Slot\tPresent\tVendor\tModel\tWattage W\tInput W\tOutput W\tInput V\tTemp C\tStatus\tSerial")
for _, psu := range hw.PowerSupply {
fmt.Fprintf(tw, "%s\t%t\t%s\t%s\t%d\t%d\t%d\t%.0f\t%d\t%s\t%s\n",
psu.Slot, psu.Present, psu.Vendor, psu.Model, psu.WattageW, psu.InputPowerW, psu.OutputPowerW, psu.InputVoltage, psu.TemperatureC, psu.Status, psu.SerialNumber)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Storage tab
if len(hw.Storage) > 0 {
fmt.Fprintln(w, "STORAGE")
fmt.Fprintln(w, "-------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Slot\tPresent\tType\tInterface\tModel\tSize GB\tVendor\tFirmware\tSerial")
for _, stor := range hw.Storage {
fmt.Fprintf(tw, "%s\t%t\t%s\t%s\t%s\t%d\t%s\t%s\t%s\n",
stor.Slot, stor.Present, stor.Type, stor.Interface, stor.Model, stor.SizeGB, stor.Manufacturer, stor.Firmware, stor.SerialNumber)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// GPU tab
if len(hw.GPUs) > 0 {
fmt.Fprintln(w, "GPUS")
fmt.Fprintln(w, "----")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Slot\tModel\tVendor\tBDF\tPCIe\tSerial\tStatus")
for _, gpu := range hw.GPUs {
link := fmt.Sprintf("x%d %s", gpu.CurrentLinkWidth, gpu.CurrentLinkSpeed)
if gpu.MaxLinkWidth > 0 || gpu.MaxLinkSpeed != "" {
link = fmt.Sprintf("%s / x%d %s", link, gpu.MaxLinkWidth, gpu.MaxLinkSpeed)
}
fmt.Fprintf(tw, "%s\t%s\t%s\t%s\t%s\t%s\t%s\n",
gpu.Slot, gpu.Model, gpu.Manufacturer, gpu.BDF, link, gpu.SerialNumber, gpu.Status)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Network tab
if len(hw.NetworkAdapters) > 0 {
fmt.Fprintln(w, "NETWORK ADAPTERS")
fmt.Fprintln(w, "----------------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Slot\tLocation\tModel\tVendor\tPorts\tType\tStatus\tSerial")
for _, nic := range hw.NetworkAdapters {
fmt.Fprintf(tw, "%s\t%s\t%s\t%s\t%d\t%s\t%s\t%s\n",
nic.Slot, nic.Location, nic.Model, nic.Vendor, nic.PortCount, nic.PortType, nic.Status, nic.SerialNumber)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Device inventory tab
if len(hw.PCIeDevices) > 0 {
fmt.Fprintln(w, "PCIE DEVICES")
fmt.Fprintln(w, "------------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Slot\tBDF\tClass\tVendor\tVID:DID\tLink\tSerial")
for _, pcie := range hw.PCIeDevices {
fmt.Fprintf(tw, "%s\t%s\t%s\t%s\t%04x:%04x\tx%d %s / x%d %s\t%s\n",
pcie.Slot, pcie.BDF, pcie.DeviceClass, pcie.Manufacturer, pcie.VendorID, pcie.DeviceID,
pcie.LinkWidth, pcie.LinkSpeed, pcie.MaxLinkWidth, pcie.MaxLinkSpeed, pcie.SerialNumber)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
}
// Sensors tab
if len(e.result.Sensors) > 0 {
fmt.Fprintln(w, "SENSOR READINGS")
fmt.Fprintln(w, "---------------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Type\tName\tValue\tUnit\tRaw\tStatus")
for _, s := range e.result.Sensors {
fmt.Fprintf(tw, "%s\t%s\t%.0f\t%s\t%s\t%s\n", s.Type, s.Name, s.Value, s.Unit, s.RawValue, s.Status)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Serials/FRU tab
if len(e.result.FRU) > 0 {
fmt.Fprintln(w, "FRU COMPONENTS")
fmt.Fprintln(w, "--------------")
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Description\tManufacturer\tProduct\tSerial\tPart Number")
for _, fru := range e.result.FRU {
name := fru.ProductName
if name == "" {
name = fru.Description
}
fmt.Fprintf(tw, "%s\t%s\t%s\t%s\t%s\n", fru.Description, fru.Manufacturer, name, fru.SerialNumber, fru.PartNumber)
}
_ = tw.Flush()
fmt.Fprintln(w)
}
// Events tab
fmt.Fprintf(w, "EVENTS: %d total\n", len(e.result.Events))
if len(e.result.Events) > 0 {
tw := tabwriter.NewWriter(w, 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "Time\tSeverity\tSource\tType\tName\tDescription")
for _, ev := range e.result.Events {
fmt.Fprintf(tw, "%s\t%s\t%s\t%s\t%s\t%s\n",
ev.Timestamp.Format("2006-01-02 15:04:05"), ev.Severity, ev.Source, ev.SensorType, ev.SensorName, ev.Description)
}
_ = tw.Flush()
}
var critical, warning, info int
for _, ev := range e.result.Events {
switch ev.Severity {
case models.SeverityCritical:
critical++
case models.SeverityWarning:
warning++
case models.SeverityInfo:
info++
}
}
fmt.Fprintf(w, " Critical: %d\n", critical)
fmt.Fprintf(w, " Warning: %d\n", warning)
fmt.Fprintf(w, " Info: %d\n", info)
// Footer
fmt.Fprintln(w)
fmt.Fprintln(w, "------------------------------------")
fmt.Fprintln(w, "Generated by LOGPile - mchus.pro")
fmt.Fprintln(w, "https://git.mchus.pro/mchus/logpile")
return nil
}
func csvFieldsFromCanonicalDevice(dev models.HardwareDevice) (component, manufacturer, location string) {
component = firstNonEmptyString(
dev.Model,
dev.PartNumber,
dev.DeviceClass,
dev.Kind,
)
manufacturer = firstNonEmptyString(dev.Manufacturer, inferCSVVendor(dev))
location = firstNonEmptyString(dev.Location, dev.Slot, dev.BDF, dev.Kind)
switch dev.Kind {
case models.DeviceKindCPU:
if component == "" {
component = "CPU"
}
if location == "" {
location = "CPU"
}
case models.DeviceKindMemory:
component = firstNonEmptyString(dev.PartNumber, dev.Model, "Memory")
case models.DeviceKindPCIe, models.DeviceKindGPU, models.DeviceKindNetwork:
if location == "" {
location = firstNonEmptyString(dev.Slot, dev.BDF, "PCIe")
}
case models.DeviceKindPSU:
component = firstNonEmptyString(dev.Model, "Power Supply")
}
return component, manufacturer, location
}
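// inferCSVVendor is a hook for deriving a vendor name when a device reports none.
// It currently returns an empty string for every device kind.
func inferCSVVendor(dev models.HardwareDevice) string {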
switch dev.Kind {
case models.DeviceKindCPU:
return ""
default:
return ""
}
}
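// firstNonEmptyString returns the first value that is non-blank after
// trimming surrounding whitespace, or "" if every value is blank.
func firstNonEmptyString(values ...string) string {
for _, value := range values {
if trimmed := strings.TrimSpace(value); trimmed != "" {
return trimmed
}
}
return ""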
}


@@ -0,0 +1,79 @@
package exporter
import (
"bytes"
"encoding/csv"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestExportCSV_IncludesAllComponentTypesWithUsableSerials(t *testing.T) {
result := &models.AnalysisResult{
FRU: []models.FRUInfo{
{ProductName: "FRU Board", SerialNumber: "FRU-001", Manufacturer: "ACME"},
},
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{
ProductName: "X12",
SerialNumber: "BOARD-001",
Manufacturer: "Supermicro",
},
CPUs: []models.CPU{
{Socket: 0, Model: "Xeon", SerialNumber: "CPU-001"},
},
Memory: []models.MemoryDIMM{
{Slot: "DIMM0", PartNumber: "MEM-PN", SerialNumber: "MEM-001", Manufacturer: "Samsung"},
},
Storage: []models.Storage{
{Slot: "U.2-1", Model: "PM9A3", SerialNumber: "SSD-001", Manufacturer: "Samsung"},
},
GPUs: []models.GPU{
{Slot: "GPU1", Model: "H200", SerialNumber: "GPU-001", Manufacturer: "NVIDIA"},
},
PCIeDevices: []models.PCIeDevice{
{Slot: "PCIe1", DeviceClass: "NVSwitch", SerialNumber: "PCIE-001", Manufacturer: "NVIDIA"},
},
NetworkAdapters: []models.NetworkAdapter{
{Slot: "Slot 17", Location: "#CPU0_PCIE4", Model: "I350", SerialNumber: "NIC-001", Vendor: "Intel"},
{Slot: "Slot 18", Model: "skip-na", SerialNumber: "N/A", Vendor: "Intel"},
},
NetworkCards: []models.NIC{
{Model: "Legacy NIC", SerialNumber: "LNIC-001"},
},
PowerSupply: []models.PSU{
{Slot: "PSU0", Model: "GW-CRPS3000LW", SerialNumber: "PSU-001", Vendor: "Great Wall"},
},
},
}
var buf bytes.Buffer
if err := New(result).ExportCSV(&buf); err != nil {
t.Fatalf("ExportCSV failed: %v", err)
}
rows, err := csv.NewReader(bytes.NewReader(buf.Bytes())).ReadAll()
if err != nil {
t.Fatalf("read csv: %v", err)
}
if len(rows) < 2 {
t.Fatalf("expected data rows, got %d", len(rows))
}
serials := make(map[string]bool)
for _, row := range rows[1:] {
if len(row) > 1 {
serials[row[1]] = true
}
}
want := []string{"FRU-001", "BOARD-001", "CPU-001", "MEM-001", "SSD-001", "GPU-001", "PCIE-001", "NIC-001", "LNIC-001", "PSU-001"}
for _, sn := range want {
if !serials[sn] {
t.Fatalf("expected serial %s in csv export", sn)
}
}
if serials["N/A"] {
t.Fatalf("did not expect unusable serial N/A in export")
}
}


@@ -0,0 +1,164 @@
package exporter
import (
"encoding/json"
"os"
"path/filepath"
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
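// TestGenerateReanimatorExample writes an example Reanimator-format JSON file to
// ../../example/docs/export-example-logpile.json (relative to the package directory).
// It skips unconditionally, so `go test -run` alone will not execute it: temporarily
// remove the t.Skip call, then run: go test -v -run TestGenerateReanimatorExample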
func TestGenerateReanimatorExample(t *testing.T) {
t.Skip("Skip by default - run manually to generate example")
// Create realistic test data matching import-example-full.json structure
result := &models.AnalysisResult{
Filename: "redfish://10.10.10.103",
SourceType: "api",
Protocol: "redfish",
TargetHost: "10.10.10.103",
CollectedAt: time.Date(2026, 2, 10, 15, 30, 0, 0, time.UTC),
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{
Manufacturer: "Supermicro",
ProductName: "X12DPG-QT6",
SerialNumber: "21D634101",
PartNumber: "X12DPG-QT6-REV1.01",
UUID: "d7ef2fe5-2fd0-11f0-910a-346f11040868",
},
Firmware: []models.FirmwareInfo{
{DeviceName: "BIOS", Version: "06.08.05"},
{DeviceName: "BMC", Version: "5.17.00"},
{DeviceName: "CPLD", Version: "01.02.03"},
},
CPUs: []models.CPU{
{
Socket: 0,
Model: "INTEL(R) XEON(R) GOLD 6530",
Cores: 32,
Threads: 64,
FrequencyMHz: 2100,
MaxFreqMHz: 4000,
},
{
Socket: 1,
Model: "INTEL(R) XEON(R) GOLD 6530",
Cores: 32,
Threads: 64,
FrequencyMHz: 2100,
MaxFreqMHz: 4000,
},
},
Memory: []models.MemoryDIMM{
{
Slot: "CPU0_C0D0",
Location: "CPU0_C0D0",
Present: true,
SizeMB: 32768,
Type: "DDR5",
MaxSpeedMHz: 4800,
CurrentSpeedMHz: 4800,
Manufacturer: "Hynix",
SerialNumber: "80AD032419E17CEEC1",
PartNumber: "HMCG88AGBRA191N",
Status: "OK",
},
{
Slot: "CPU1_C0D0",
Location: "CPU1_C0D0",
Present: true,
SizeMB: 32768,
Type: "DDR5",
MaxSpeedMHz: 4800,
CurrentSpeedMHz: 4800,
Manufacturer: "Hynix",
SerialNumber: "80AD032419E17D6FBA",
PartNumber: "HMCG88AGBRA191N",
Status: "OK",
},
},
Storage: []models.Storage{
{
Slot: "OB01",
Type: "NVMe",
Model: "INTEL SSDPF2KX076T1",
SizeGB: 7680,
SerialNumber: "BTAX41900GF87P6DGN",
Manufacturer: "Intel",
Firmware: "9CV10510",
Interface: "NVMe",
Present: true,
},
{
Slot: "OB02",
Type: "NVMe",
Model: "INTEL SSDPF2KX076T1",
SizeGB: 7680,
SerialNumber: "BTAX41900BEG7P6DGN",
Manufacturer: "Intel",
Firmware: "9CV10510",
Interface: "NVMe",
Present: true,
},
},
PCIeDevices: []models.PCIeDevice{
{
Slot: "PCIeCard1",
VendorID: 32902,
DeviceID: 2912,
BDF: "0000:18:00.0",
DeviceClass: "MassStorageController",
Manufacturer: "Intel",
PartNumber: "RAID Controller",
SerialNumber: "RAID-001-12345",
LinkWidth: 8,
LinkSpeed: "Gen3",
MaxLinkWidth: 8,
MaxLinkSpeed: "Gen3",
},
},
PowerSupply: []models.PSU{
{
Slot: "0",
Present: true,
Model: "GW-CRPS3000LW",
Vendor: "Great Wall",
WattageW: 3000,
SerialNumber: "2P06C102610",
PartNumber: "V0310C9000000000",
Firmware: "00.03.05",
Status: "OK",
InputType: "ACWideRange",
InputPowerW: 137,
OutputPowerW: 104,
InputVoltage: 215.25,
},
},
},
}
// Convert to Reanimator format
reanimator, err := ConvertToReanimator(result)
if err != nil {
t.Fatalf("ConvertToReanimator failed: %v", err)
}
// Marshal to JSON with indentation
jsonData, err := json.MarshalIndent(reanimator, "", " ")
if err != nil {
t.Fatalf("Failed to marshal JSON: %v", err)
}
// Write to example file
examplePath := filepath.Join("../../example/docs", "export-example-logpile.json")
if err := os.WriteFile(examplePath, jsonData, 0644); err != nil {
t.Fatalf("Failed to write example file: %v", err)
}
t.Logf("Generated example file: %s", examplePath)
t.Logf("JSON length: %d bytes", len(jsonData))
}

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -0,0 +1,289 @@
package exporter
import (
"encoding/json"
"strings"
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
// TestFullReanimatorExport tests complete export with realistic data
func TestFullReanimatorExport(t *testing.T) {
// Create a realistic AnalysisResult similar to import-example-full.json
result := &models.AnalysisResult{
Filename: "redfish://10.10.10.103",
SourceType: "api",
Protocol: "redfish",
TargetHost: "10.10.10.103",
CollectedAt: time.Date(2026, 2, 10, 15, 30, 0, 0, time.UTC),
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{
Manufacturer: "Supermicro",
ProductName: "X12DPG-QT6",
SerialNumber: "21D634101",
PartNumber: "X12DPG-QT6-REV1.01",
UUID: "d7ef2fe5-2fd0-11f0-910a-346f11040868",
},
Firmware: []models.FirmwareInfo{
{DeviceName: "BIOS", Version: "06.08.05"},
{DeviceName: "BMC", Version: "5.17.00"},
{DeviceName: "CPLD", Version: "01.02.03"},
},
CPUs: []models.CPU{
{
Socket: 0,
Model: "INTEL(R) XEON(R) GOLD 6530",
Cores: 32,
Threads: 64,
FrequencyMHz: 2100,
MaxFreqMHz: 4000,
},
{
Socket: 1,
Model: "INTEL(R) XEON(R) GOLD 6530",
Cores: 32,
Threads: 64,
FrequencyMHz: 2100,
MaxFreqMHz: 4000,
},
},
Memory: []models.MemoryDIMM{
{
Slot: "CPU0_C0D0",
Location: "CPU0_C0D0",
Present: true,
SizeMB: 32768,
Type: "DDR5",
MaxSpeedMHz: 4800,
CurrentSpeedMHz: 4800,
Manufacturer: "Hynix",
SerialNumber: "80AD032419E17CEEC1",
PartNumber: "HMCG88AGBRA191N",
Status: "OK",
},
{
Slot: "CPU0_C1D0",
Location: "CPU0_C1D0",
Present: false,
SizeMB: 0,
Type: "",
MaxSpeedMHz: 0,
CurrentSpeedMHz: 0,
Status: "Empty",
},
},
Storage: []models.Storage{
{
Slot: "OB01",
Type: "NVMe",
Model: "INTEL SSDPF2KX076T1",
SizeGB: 7680,
SerialNumber: "BTAX41900GF87P6DGN",
Manufacturer: "Intel",
Firmware: "9CV10510",
Interface: "NVMe",
Present: true,
},
{
Slot: "FP00HDD00",
Type: "HDD",
Model: "ST12000NM0008",
SizeGB: 12000,
SerialNumber: "ZJV01234ABC",
Manufacturer: "Seagate",
Firmware: "SN03",
Interface: "SATA",
Present: true,
},
},
PCIeDevices: []models.PCIeDevice{
{
Slot: "PCIeCard1",
VendorID: 32902,
DeviceID: 2912,
BDF: "0000:18:00.0",
DeviceClass: "MassStorageController",
Manufacturer: "Intel",
PartNumber: "RAID Controller RSP3DD080F",
LinkWidth: 8,
LinkSpeed: "Gen3",
MaxLinkWidth: 8,
MaxLinkSpeed: "Gen3",
SerialNumber: "RAID-001-12345",
},
{
Slot: "PCIeCard2",
VendorID: 5555,
DeviceID: 4401,
BDF: "0000:3b:00.0",
DeviceClass: "NetworkController",
Manufacturer: "Mellanox",
PartNumber: "ConnectX-5",
LinkWidth: 16,
LinkSpeed: "Gen3",
MaxLinkWidth: 16,
MaxLinkSpeed: "Gen3",
SerialNumber: "MT2892012345",
},
},
PowerSupply: []models.PSU{
{
Slot: "0",
Present: true,
Model: "GW-CRPS3000LW",
Vendor: "Great Wall",
WattageW: 3000,
SerialNumber: "2P06C102610",
PartNumber: "V0310C9000000000",
Firmware: "00.03.05",
Status: "OK",
InputType: "ACWideRange",
InputPowerW: 137,
OutputPowerW: 104,
InputVoltage: 215.25,
},
},
},
}
// Convert to Reanimator format
reanimator, err := ConvertToReanimator(result)
if err != nil {
t.Fatalf("ConvertToReanimator failed: %v", err)
}
// Verify top-level fields
if reanimator.Filename != "redfish://10.10.10.103" {
t.Errorf("Filename mismatch: got %q", reanimator.Filename)
}
if reanimator.SourceType != "api" {
t.Errorf("SourceType mismatch: got %q", reanimator.SourceType)
}
if reanimator.Protocol != "redfish" {
t.Errorf("Protocol mismatch: got %q", reanimator.Protocol)
}
if reanimator.TargetHost != "10.10.10.103" {
t.Errorf("TargetHost mismatch: got %q", reanimator.TargetHost)
}
if reanimator.CollectedAt != "2026-02-10T15:30:00Z" {
t.Errorf("CollectedAt mismatch: got %q", reanimator.CollectedAt)
}
// Verify hardware sections
hw := reanimator.Hardware
// Board
if hw.Board.SerialNumber != "21D634101" {
t.Errorf("Board serial mismatch: got %q", hw.Board.SerialNumber)
}
// Firmware
if len(hw.Firmware) != 3 {
t.Errorf("Expected 3 firmware entries, got %d", len(hw.Firmware))
}
// CPUs
if len(hw.CPUs) != 2 {
t.Fatalf("Expected 2 CPUs, got %d", len(hw.CPUs))
}
if hw.CPUs[0].Manufacturer != "Intel" {
t.Errorf("CPU manufacturer not inferred: got %q", hw.CPUs[0].Manufacturer)
}
if hw.CPUs[0].Status != "Unknown" {
t.Errorf("CPU status mismatch: got %q", hw.CPUs[0].Status)
}
// Memory (empty slots are excluded)
if len(hw.Memory) != 1 {
t.Errorf("Expected 1 memory entry (installed only), got %d", len(hw.Memory))
}
// Storage
if len(hw.Storage) != 2 {
t.Errorf("Expected 2 storage devices, got %d", len(hw.Storage))
}
if hw.Storage[0].Status != "Unknown" {
t.Errorf("Storage status mismatch: got %q", hw.Storage[0].Status)
}
// PCIe devices
if len(hw.PCIeDevices) != 2 {
t.Errorf("Expected 2 PCIe devices, got %d", len(hw.PCIeDevices))
}
if hw.PCIeDevices[0].Model == "" {
t.Error("PCIe model should be populated from PartNumber")
}
// Power supplies
if len(hw.PowerSupplies) != 1 {
t.Errorf("Expected 1 PSU, got %d", len(hw.PowerSupplies))
}
// Verify JSON marshaling works
jsonData, err := json.MarshalIndent(reanimator, "", " ")
if err != nil {
t.Fatalf("Failed to marshal to JSON: %v", err)
}
// Check that JSON contains expected fields
jsonStr := string(jsonData)
expectedFields := []string{
`"filename"`,
`"source_type"`,
`"protocol"`,
`"target_host"`,
`"collected_at"`,
`"hardware"`,
`"board"`,
`"cpus"`,
`"memory"`,
`"storage"`,
`"pcie_devices"`,
`"power_supplies"`,
`"firmware"`,
}
for _, field := range expectedFields {
if !strings.Contains(jsonStr, field) {
t.Errorf("JSON missing expected field: %s", field)
}
}
// Optional: print JSON for manual inspection (commented out for normal test runs)
// t.Logf("Generated Reanimator JSON:\n%s", string(jsonData))
}
// TestReanimatorExportWithoutTargetHost tests that target_host is inferred from filename
func TestReanimatorExportWithoutTargetHost(t *testing.T) {
result := &models.AnalysisResult{
Filename: "redfish://192.168.1.100",
SourceType: "api",
Protocol: "redfish",
TargetHost: "", // Empty - should be inferred
CollectedAt: time.Now(),
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{
SerialNumber: "TEST123",
},
},
}
reanimator, err := ConvertToReanimator(result)
if err != nil {
t.Fatalf("ConvertToReanimator failed: %v", err)
}
if reanimator.TargetHost != "192.168.1.100" {
t.Errorf("Expected target_host to be inferred from filename, got %q", reanimator.TargetHost)
}
}
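`TestReanimatorExportWithoutTargetHost` expects `target_host` to be recovered from a `redfish://` filename when it is empty. The sketch below shows one way to do that with the standard `net/url` package. `inferTargetHost` is hypothetical, and the exporter's actual inference logic may differ.

```go
package main

import (
	"fmt"
	"net/url"
)

// inferTargetHost is a hypothetical helper (not the exporter's code) that pulls
// the host out of a live-collection filename such as "redfish://192.168.1.100".
// Archive names like "dump.tar.gz" parse as a bare path with no host, so they
// return "".
func inferTargetHost(filename string) string {
	u, err := url.Parse(filename)
	if err != nil || u.Host == "" {
		return ""
	}
	return u.Hostname() // drops a ":port" suffix and IPv6 brackets
}

func main() {
	for _, name := range []string{"redfish://192.168.1.100", "redfish://10.10.10.103:443", "dump.tar.gz"} {
		fmt.Printf("%s -> %q\n", name, inferTargetHost(name))
	}
}
```

Using `url.Parse` rather than trimming a `redfish://` prefix means the same helper would also cope with other schemes and with an explicit port.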


@@ -0,0 +1,254 @@
package exporter
// ReanimatorExport represents the top-level structure for Reanimator format export
type ReanimatorExport struct {
Filename string `json:"filename"`
SourceType string `json:"source_type,omitempty"`
Protocol string `json:"protocol,omitempty"`
TargetHost string `json:"target_host,omitempty"`
CollectedAt string `json:"collected_at"` // RFC3339 format
Hardware ReanimatorHardware `json:"hardware"`
}
// ReanimatorHardware contains all hardware components
type ReanimatorHardware struct {
Board ReanimatorBoard `json:"board"`
Firmware []ReanimatorFirmware `json:"firmware,omitempty"`
CPUs []ReanimatorCPU `json:"cpus,omitempty"`
Memory []ReanimatorMemory `json:"memory,omitempty"`
Storage []ReanimatorStorage `json:"storage,omitempty"`
PCIeDevices []ReanimatorPCIe `json:"pcie_devices,omitempty"`
PowerSupplies []ReanimatorPSU `json:"power_supplies,omitempty"`
Sensors *ReanimatorSensors `json:"sensors,omitempty"`
EventLogs []ReanimatorEventLog `json:"event_logs,omitempty"`
}
// ReanimatorBoard represents motherboard/server information
type ReanimatorBoard struct {
Manufacturer string `json:"manufacturer,omitempty"`
ProductName string `json:"product_name,omitempty"`
SerialNumber string `json:"serial_number"`
PartNumber string `json:"part_number,omitempty"`
UUID string `json:"uuid,omitempty"`
}
// ReanimatorFirmware represents firmware version information
type ReanimatorFirmware struct {
DeviceName string `json:"device_name"`
Version string `json:"version"`
}
type ReanimatorStatusHistoryEntry struct {
Status string `json:"status"`
ChangedAt string `json:"changed_at"`
Details string `json:"details,omitempty"`
}
// ReanimatorCPU represents processor information
type ReanimatorCPU struct {
Socket int `json:"socket"`
Model string `json:"model,omitempty"`
Cores int `json:"cores,omitempty"`
Threads int `json:"threads,omitempty"`
FrequencyMHz int `json:"frequency_mhz,omitempty"`
MaxFrequencyMHz int `json:"max_frequency_mhz,omitempty"`
TemperatureC float64 `json:"temperature_c,omitempty"`
PowerW float64 `json:"power_w,omitempty"`
Throttled *bool `json:"throttled,omitempty"`
CorrectableErrorCount int64 `json:"correctable_error_count,omitempty"`
UncorrectableErrorCount int64 `json:"uncorrectable_error_count,omitempty"`
LifeRemainingPct float64 `json:"life_remaining_pct,omitempty"`
LifeUsedPct float64 `json:"life_used_pct,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
Present *bool `json:"present,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt string `json:"status_checked_at,omitempty"`
StatusChangedAt string `json:"status_changed_at,omitempty"`
ManufacturedYearWeek string `json:"manufactured_year_week,omitempty"`
StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// ReanimatorMemory represents a memory module (DIMM)
type ReanimatorMemory struct {
Slot string `json:"slot"`
Location string `json:"location,omitempty"`
Present *bool `json:"present,omitempty"`
SizeMB int `json:"size_mb,omitempty"`
Type string `json:"type,omitempty"`
MaxSpeedMHz int `json:"max_speed_mhz,omitempty"`
CurrentSpeedMHz int `json:"current_speed_mhz,omitempty"`
TemperatureC float64 `json:"temperature_c,omitempty"`
CorrectableECCErrorCount int64 `json:"correctable_ecc_error_count,omitempty"`
UncorrectableECCErrorCount int64 `json:"uncorrectable_ecc_error_count,omitempty"`
LifeRemainingPct float64 `json:"life_remaining_pct,omitempty"`
LifeUsedPct float64 `json:"life_used_pct,omitempty"`
SpareBlocksRemainingPct float64 `json:"spare_blocks_remaining_pct,omitempty"`
PerformanceDegraded *bool `json:"performance_degraded,omitempty"`
DataLossDetected *bool `json:"data_loss_detected,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt string `json:"status_checked_at,omitempty"`
StatusChangedAt string `json:"status_changed_at,omitempty"`
ManufacturedYearWeek string `json:"manufactured_year_week,omitempty"`
StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// ReanimatorStorage represents a storage device
type ReanimatorStorage struct {
Slot string `json:"slot"`
Type string `json:"type,omitempty"`
Model string `json:"model"`
SizeGB int `json:"size_gb,omitempty"`
SerialNumber string `json:"serial_number"`
Manufacturer string `json:"manufacturer,omitempty"`
Firmware string `json:"firmware,omitempty"`
Interface string `json:"interface,omitempty"`
Present *bool `json:"present,omitempty"`
TemperatureC float64 `json:"temperature_c,omitempty"`
PowerOnHours int64 `json:"power_on_hours,omitempty"`
PowerCycles int64 `json:"power_cycles,omitempty"`
UnsafeShutdowns int64 `json:"unsafe_shutdowns,omitempty"`
MediaErrors int64 `json:"media_errors,omitempty"`
ErrorLogEntries int64 `json:"error_log_entries,omitempty"`
WrittenBytes int64 `json:"written_bytes,omitempty"`
ReadBytes int64 `json:"read_bytes,omitempty"`
LifeUsedPct float64 `json:"life_used_pct,omitempty"`
RemainingEndurancePct *int `json:"remaining_endurance_pct,omitempty"`
LifeRemainingPct float64 `json:"life_remaining_pct,omitempty"`
AvailableSparePct float64 `json:"available_spare_pct,omitempty"`
ReallocatedSectors int64 `json:"reallocated_sectors,omitempty"`
CurrentPendingSectors int64 `json:"current_pending_sectors,omitempty"`
OfflineUncorrectable int64 `json:"offline_uncorrectable,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt string `json:"status_checked_at,omitempty"`
StatusChangedAt string `json:"status_changed_at,omitempty"`
ManufacturedYearWeek string `json:"manufactured_year_week,omitempty"`
StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// ReanimatorPCIe represents a PCIe device
type ReanimatorPCIe struct {
Slot string `json:"slot"`
VendorID int `json:"vendor_id,omitempty"`
DeviceID int `json:"device_id,omitempty"`
NUMANode int `json:"numa_node,omitempty"`
TemperatureC float64 `json:"temperature_c,omitempty"`
PowerW float64 `json:"power_w,omitempty"`
LifeRemainingPct float64 `json:"life_remaining_pct,omitempty"`
LifeUsedPct float64 `json:"life_used_pct,omitempty"`
ECCCorrectedTotal int64 `json:"ecc_corrected_total,omitempty"`
ECCUncorrectedTotal int64 `json:"ecc_uncorrected_total,omitempty"`
HWSlowdown *bool `json:"hw_slowdown,omitempty"`
BatteryChargePct float64 `json:"battery_charge_pct,omitempty"`
BatteryHealthPct float64 `json:"battery_health_pct,omitempty"`
BatteryTemperatureC float64 `json:"battery_temperature_c,omitempty"`
BatteryVoltageV float64 `json:"battery_voltage_v,omitempty"`
BatteryReplaceRequired *bool `json:"battery_replace_required,omitempty"`
SFPTemperatureC float64 `json:"sfp_temperature_c,omitempty"`
SFPTXPowerDBm float64 `json:"sfp_tx_power_dbm,omitempty"`
SFPRXPowerDBm float64 `json:"sfp_rx_power_dbm,omitempty"`
SFPVoltageV float64 `json:"sfp_voltage_v,omitempty"`
SFPBiasMA float64 `json:"sfp_bias_ma,omitempty"`
BDF string `json:"-"`
DeviceClass string `json:"device_class,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
Model string `json:"model,omitempty"`
LinkWidth int `json:"link_width,omitempty"`
LinkSpeed string `json:"link_speed,omitempty"`
MaxLinkWidth int `json:"max_link_width,omitempty"`
MaxLinkSpeed string `json:"max_link_speed,omitempty"`
MACAddresses []string `json:"mac_addresses,omitempty"`
Present *bool `json:"present,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt string `json:"status_checked_at,omitempty"`
StatusChangedAt string `json:"status_changed_at,omitempty"`
ManufacturedYearWeek string `json:"manufactured_year_week,omitempty"`
StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// ReanimatorPSU represents a power supply unit
type ReanimatorPSU struct {
Slot string `json:"slot"`
Present *bool `json:"present,omitempty"`
Model string `json:"model,omitempty"`
Vendor string `json:"vendor,omitempty"`
WattageW int `json:"wattage_w,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
Status string `json:"status,omitempty"`
InputType string `json:"input_type,omitempty"`
InputPowerW float64 `json:"input_power_w,omitempty"`
OutputPowerW float64 `json:"output_power_w,omitempty"`
InputVoltage float64 `json:"input_voltage,omitempty"`
TemperatureC float64 `json:"temperature_c,omitempty"`
LifeRemainingPct float64 `json:"life_remaining_pct,omitempty"`
LifeUsedPct float64 `json:"life_used_pct,omitempty"`
StatusCheckedAt string `json:"status_checked_at,omitempty"`
StatusChangedAt string `json:"status_changed_at,omitempty"`
ManufacturedYearWeek string `json:"manufactured_year_week,omitempty"`
StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
type ReanimatorEventLog struct {
Source string `json:"source"`
EventTime string `json:"event_time,omitempty"`
Severity string `json:"severity,omitempty"`
MessageID string `json:"message_id,omitempty"`
Message string `json:"message"`
ComponentRef string `json:"component_ref,omitempty"`
Fingerprint string `json:"fingerprint,omitempty"`
IsActive *bool `json:"is_active,omitempty"`
RawPayload map[string]any `json:"raw_payload,omitempty"`
}
type ReanimatorSensors struct {
Fans []ReanimatorFanSensor `json:"fans,omitempty"`
Power []ReanimatorPowerSensor `json:"power,omitempty"`
Temperatures []ReanimatorTemperatureSensor `json:"temperatures,omitempty"`
Other []ReanimatorOtherSensor `json:"other,omitempty"`
}
type ReanimatorFanSensor struct {
Name string `json:"name"`
Location string `json:"location,omitempty"`
RPM int `json:"rpm,omitempty"`
Status string `json:"status,omitempty"`
}
type ReanimatorPowerSensor struct {
Name string `json:"name"`
Location string `json:"location,omitempty"`
VoltageV float64 `json:"voltage_v,omitempty"`
CurrentA float64 `json:"current_a,omitempty"`
PowerW float64 `json:"power_w,omitempty"`
Status string `json:"status,omitempty"`
}
type ReanimatorTemperatureSensor struct {
Name string `json:"name"`
Location string `json:"location,omitempty"`
Celsius float64 `json:"celsius,omitempty"`
ThresholdWarningCelsius float64 `json:"threshold_warning_celsius,omitempty"`
ThresholdCriticalCelsius float64 `json:"threshold_critical_celsius,omitempty"`
Status string `json:"status,omitempty"`
}
type ReanimatorOtherSensor struct {
Name string `json:"name"`
Location string `json:"location,omitempty"`
Value float64 `json:"value,omitempty"`
Unit string `json:"unit,omitempty"`
Status string `json:"status,omitempty"`
}


@@ -9,16 +9,17 @@ const (
// AnalysisResult contains all parsed data from an archive or a live (API) collection
type AnalysisResult struct {
Filename string `json:"filename"`
SourceType string `json:"source_type,omitempty"` // archive | api
Protocol string `json:"protocol,omitempty"` // redfish | ipmi
TargetHost string `json:"target_host,omitempty"` // BMC host for live collect
SourceTimezone string `json:"source_timezone,omitempty"` // Source timezone/offset used during collection (e.g. +08:00)
CollectedAt time.Time `json:"collected_at,omitempty"` // Collection/upload timestamp
RawPayloads map[string]any `json:"raw_payloads,omitempty"` // Additional source payloads (e.g. Redfish tree)
Events []Event `json:"events"`
FRU []FRUInfo `json:"fru"`
Sensors []SensorReading `json:"sensors"`
Hardware *HardwareConfig `json:"hardware"`
}
// Event represents a single log event
@@ -43,6 +44,19 @@ const (
SeverityInfo Severity = "info"
)
// StatusAtCollection captures component status at a specific timestamp.
type StatusAtCollection struct {
Status string `json:"status"`
At time.Time `json:"at"`
}
// StatusHistoryEntry represents a status transition point.
type StatusHistoryEntry struct {
Status string `json:"status"`
ChangedAt time.Time `json:"changed_at"`
Details string `json:"details,omitempty"`
}
// SensorReading represents a single sensor reading
type SensorReading struct {
Name string `json:"name"`
@@ -71,9 +85,11 @@ type FRUInfo struct {
type HardwareConfig struct {
Firmware []FirmwareInfo `json:"firmware,omitempty"`
BoardInfo BoardInfo `json:"board,omitempty"`
Devices []HardwareDevice `json:"devices,omitempty"`
CPUs []CPU `json:"cpus,omitempty"`
Memory []MemoryDIMM `json:"memory,omitempty"`
Storage []Storage `json:"storage,omitempty"`
Volumes []StorageVolume `json:"volumes,omitempty"`
PCIeDevices []PCIeDevice `json:"pcie_devices,omitempty"`
GPUs []GPU `json:"gpus,omitempty"`
NetworkCards []NIC `json:"network_cards,omitempty"`
@@ -81,27 +97,93 @@ type HardwareConfig struct {
PowerSupply []PSU `json:"power_supplies,omitempty"`
}
const (
DeviceKindBoard = "board"
DeviceKindCPU = "cpu"
DeviceKindMemory = "memory"
DeviceKindStorage = "storage"
DeviceKindPCIe = "pcie"
DeviceKindGPU = "gpu"
DeviceKindNetwork = "network"
DeviceKindPSU = "psu"
)
// HardwareDevice is canonical device inventory used across UI and exports.
type HardwareDevice struct {
ID string `json:"id"`
Kind string `json:"kind"`
Source string `json:"source,omitempty"`
Slot string `json:"slot,omitempty"`
Location string `json:"location,omitempty"`
BDF string `json:"bdf,omitempty"`
DeviceClass string `json:"device_class,omitempty"`
VendorID int `json:"vendor_id,omitempty"`
DeviceID int `json:"device_id,omitempty"`
Model string `json:"model,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
Type string `json:"type,omitempty"`
Interface string `json:"interface,omitempty"`
Present *bool `json:"present,omitempty"`
SizeMB int `json:"size_mb,omitempty"`
SizeGB int `json:"size_gb,omitempty"`
Cores int `json:"cores,omitempty"`
Threads int `json:"threads,omitempty"`
FrequencyMHz int `json:"frequency_mhz,omitempty"`
MaxFreqMHz int `json:"max_frequency_mhz,omitempty"`
PortCount int `json:"port_count,omitempty"`
PortType string `json:"port_type,omitempty"`
MACAddresses []string `json:"mac_addresses,omitempty"`
LinkWidth int `json:"link_width,omitempty"`
LinkSpeed string `json:"link_speed,omitempty"`
MaxLinkWidth int `json:"max_link_width,omitempty"`
MaxLinkSpeed string `json:"max_link_speed,omitempty"`
WattageW int `json:"wattage_w,omitempty"`
InputType string `json:"input_type,omitempty"`
InputPowerW int `json:"input_power_w,omitempty"`
OutputPowerW int `json:"output_power_w,omitempty"`
InputVoltage float64 `json:"input_voltage,omitempty"`
TemperatureC int `json:"temperature_c,omitempty"`
RemainingEndurancePct *int `json:"remaining_endurance_pct,omitempty"` // 0-100 %; nil = not reported
NUMANode int `json:"numa_node,omitempty"` // 0 = not reported/N/A
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
Details map[string]any `json:"details,omitempty"`
}
// FirmwareInfo represents firmware version information
type FirmwareInfo struct {
DeviceName string `json:"device_name"`
Description string `json:"description,omitempty"`
Version string `json:"version"`
BuildTime string `json:"build_time,omitempty"`
}
// BoardInfo represents motherboard/system information
type BoardInfo struct {
Manufacturer string `json:"manufacturer,omitempty"`
ProductName string `json:"product_name,omitempty"`
Description string `json:"description,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Version string `json:"version,omitempty"`
UUID string `json:"uuid,omitempty"`
BMCMACAddress string `json:"bmc_mac_address,omitempty"`
}
// CPU represents processor information
type CPU struct {
Socket int `json:"socket"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
Cores int `json:"cores"`
Threads int `json:"threads"`
FrequencyMHz int `json:"frequency_mhz"`
@@ -112,12 +194,21 @@ type CPU struct {
TDP int `json:"tdp_w,omitempty"`
PPIN string `json:"ppin,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
Details map[string]any `json:"details,omitempty"`
}
// MemoryDIMM represents a memory module
type MemoryDIMM struct {
Slot string `json:"slot"`
Location string `json:"location"`
Description string `json:"description,omitempty"`
Present bool `json:"present"`
SizeMB int `json:"size_mb"`
Type string `json:"type"`
@@ -129,26 +220,57 @@ type MemoryDIMM struct {
PartNumber string `json:"part_number,omitempty"`
Status string `json:"status,omitempty"`
Ranks int `json:"ranks,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
Details map[string]any `json:"details,omitempty"`
}
// Storage represents a storage device
type Storage struct {
Slot string `json:"slot"`
Type string `json:"type"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
SizeGB int `json:"size_gb"`
SerialNumber string `json:"serial_number,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
Firmware string `json:"firmware,omitempty"`
Interface string `json:"interface,omitempty"`
Present bool `json:"present"`
Location string `json:"location,omitempty"` // Front/Rear
BackplaneID int `json:"backplane_id,omitempty"`
RemainingEndurancePct *int `json:"remaining_endurance_pct,omitempty"` // 0-100 %; nil = not reported
Status string `json:"status,omitempty"`
Details map[string]any `json:"details,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// StorageVolume represents a logical storage volume (RAID/VROC/etc.).
type StorageVolume struct {
ID string `json:"id,omitempty"`
Name string `json:"name,omitempty"`
Controller string `json:"controller,omitempty"`
RAIDLevel string `json:"raid_level,omitempty"`
SizeGB int `json:"size_gb,omitempty"`
CapacityBytes int64 `json:"capacity_bytes,omitempty"`
Status string `json:"status,omitempty"`
Bootable bool `json:"bootable,omitempty"`
Encrypted bool `json:"encrypted,omitempty"`
}
// PCIeDevice represents a PCIe device
type PCIeDevice struct {
Slot string `json:"slot"`
Description string `json:"description,omitempty"`
VendorID int `json:"vendor_id"`
DeviceID int `json:"device_id"`
BDF string `json:"bdf"`
@@ -161,12 +283,22 @@ type PCIeDevice struct {
PartNumber string `json:"part_number,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
MACAddresses []string `json:"mac_addresses,omitempty"`
NUMANode int `json:"numa_node,omitempty"` // 0 = not reported/N/A
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
Details map[string]any `json:"details,omitempty"`
}
// NIC represents a network interface card
type NIC struct {
Name string `json:"name"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
MACAddress string `json:"mac_address"`
SpeedMbps int `json:"speed_mbps,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
@@ -174,21 +306,29 @@ type NIC struct {
// PSU represents a power supply unit
type PSU struct {
Slot string `json:"slot"`
Present bool `json:"present"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
Vendor string `json:"vendor,omitempty"`
WattageW int `json:"wattage_w,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
Status string `json:"status,omitempty"`
InputType string `json:"input_type,omitempty"`
InputPowerW int `json:"input_power_w,omitempty"`
OutputPowerW int `json:"output_power_w,omitempty"`
InputVoltage float64 `json:"input_voltage,omitempty"`
OutputVoltage float64 `json:"output_voltage,omitempty"`
TemperatureC int `json:"temperature_c,omitempty"`
Details map[string]any `json:"details,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
// GPU represents a graphics processing unit
@@ -196,6 +336,7 @@ type GPU struct {
Slot string `json:"slot"`
Location string `json:"location,omitempty"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
VendorID int `json:"vendor_id,omitempty"`
DeviceID int `json:"device_id,omitempty"`
@@ -220,6 +361,13 @@ type GPU struct {
CurrentLinkWidth int `json:"current_link_width,omitempty"`
CurrentLinkSpeed string `json:"current_link_speed,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
Details map[string]any `json:"details,omitempty"`
}
// NetworkAdapter represents a network adapter with detailed info
@@ -227,7 +375,9 @@ type NetworkAdapter struct {
Slot string `json:"slot"`
Location string `json:"location"`
Present bool `json:"present"`
BDF string `json:"bdf,omitempty"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
Vendor string `json:"vendor,omitempty"`
VendorID int `json:"vendor_id,omitempty"`
DeviceID int `json:"device_id,omitempty"`
@@ -237,5 +387,17 @@ type NetworkAdapter struct {
PortCount int `json:"port_count,omitempty"`
PortType string `json:"port_type,omitempty"`
MACAddresses []string `json:"mac_addresses,omitempty"`
LinkWidth int `json:"link_width,omitempty"`
LinkSpeed string `json:"link_speed,omitempty"`
MaxLinkWidth int `json:"max_link_width,omitempty"`
MaxLinkSpeed string `json:"max_link_speed,omitempty"`
NUMANode int `json:"numa_node,omitempty"` // 0 = not reported/N/A
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
Details map[string]any `json:"details,omitempty"`
}

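The models use two conventions for "not reported". `NUMANode` relies on the zero value (its comment says 0 = not reported/N/A), while `RemainingEndurancePct` is an `*int`, so a genuine 0 % reading survives `omitempty`. The sketch below uses a hypothetical `device` struct that copies only those two fields to show how `encoding/json` treats each one.

```go
package main

import "encoding/json"

// device is a hypothetical, trimmed copy of two HardwareDevice fields.
type device struct {
	NUMANode              int  `json:"numa_node,omitempty"`               // 0 = not reported/N/A
	RemainingEndurancePct *int `json:"remaining_endurance_pct,omitempty"` // nil = not reported
}

// encode marshals d to JSON text; a struct of ints and *ints cannot fail to encode.
func encode(d device) string {
	b, err := json.Marshal(d)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// intPtr returns a pointer to a copy of v.
func intPtr(v int) *int { return &v }
```

With this encoding, a device that really sits on NUMA node 0 serializes the same way as one with no NUMA data; only the pointer field keeps "reported as 0" and "not reported" apart.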

@@ -9,25 +9,45 @@ import (
"io"
"os"
"path/filepath"
"sort"
"strings"
"time"
)
const maxSingleFileSize = 10 * 1024 * 1024
const maxZipArchiveSize = 50 * 1024 * 1024
const maxGzipDecompressedSize = 50 * 1024 * 1024
var supportedArchiveExt = map[string]struct{}{
".gz": {},
".tgz": {},
".tar": {},
".sds": {},
".zip": {},
".txt": {},
".log": {},
}
// ExtractedFile represents a file extracted from archive
type ExtractedFile struct {
Path string
Content []byte
ModTime time.Time
Truncated bool
TruncatedMessage string
}
// ExtractArchive extracts a supported archive (tar.gz/tgz/gz, tar/sds, zip) or reads a single text/log file and returns file contents
func ExtractArchive(archivePath string) ([]ExtractedFile, error) {
if !IsSupportedArchiveFilename(archivePath) {
return nil, fmt.Errorf("unsupported archive format: %s", strings.ToLower(filepath.Ext(archivePath)))
}
ext := strings.ToLower(filepath.Ext(archivePath))
switch ext {
case ".gz", ".tgz":
return extractTarGz(archivePath)
case ".tar", ".sds":
return extractTar(archivePath)
case ".zip":
return extractZip(archivePath)
@@ -40,13 +60,18 @@ func ExtractArchive(archivePath string) ([]ExtractedFile, error) {
// ExtractArchiveFromReader extracts archive from reader
func ExtractArchiveFromReader(r io.Reader, filename string) ([]ExtractedFile, error) {
if !IsSupportedArchiveFilename(filename) {
return nil, fmt.Errorf("unsupported archive format: %s", strings.ToLower(filepath.Ext(filename)))
}
ext := strings.ToLower(filepath.Ext(filename))
switch ext {
case ".gz", ".tgz":
return extractTarGzFromReader(r, filename)
case ".tar", ".sds":
return extractTarFromReader(r)
case ".zip":
return extractZipFromReader(r)
case ".txt", ".log":
return extractSingleFileFromReader(r, filename)
default:
@@ -54,6 +79,27 @@ func ExtractArchiveFromReader(r io.Reader, filename string) ([]ExtractedFile, er
}
}
// IsSupportedArchiveFilename reports whether filename extension is supported by archive extractor.
func IsSupportedArchiveFilename(filename string) bool {
ext := strings.ToLower(strings.TrimSpace(filepath.Ext(filename)))
if ext == "" {
return false
}
_, ok := supportedArchiveExt[ext]
return ok
}
// SupportedArchiveExtensions returns sorted list of archive/file extensions
// accepted by archive extractor.
func SupportedArchiveExtensions() []string {
out := make([]string, 0, len(supportedArchiveExt))
for ext := range supportedArchiveExt {
out = append(out, ext)
}
sort.Strings(out)
return out
}
func extractTarGz(archivePath string) ([]ExtractedFile, error) {
f, err := os.Open(archivePath)
if err != nil {
@@ -105,6 +151,7 @@ func extractTarFromReader(r io.Reader) ([]ExtractedFile, error) {
files = append(files, ExtractedFile{
Path: header.Name,
Content: content,
ModTime: header.ModTime,
})
}
@@ -118,12 +165,16 @@ func extractTarGzFromReader(r io.Reader, filename string) ([]ExtractedFile, erro
}
defer gzr.Close()
// Read decompressed content with a hard cap.
// When the payload exceeds the cap, keep the first chunk and mark it as truncated.
decompressed, err := io.ReadAll(io.LimitReader(gzr, maxGzipDecompressedSize+1))
if err != nil {
return nil, fmt.Errorf("read gzip content: %w", err)
}
gzipTruncated := len(decompressed) > maxGzipDecompressedSize
if gzipTruncated {
decompressed = decompressed[:maxGzipDecompressedSize]
}
// Try to read as tar archive
tr := tar.NewReader(bytes.NewReader(decompressed))
@@ -139,12 +190,20 @@ func extractTarGzFromReader(r io.Reader, filename string) ([]ExtractedFile, erro
baseName = gzr.Name
}
file := ExtractedFile{
Path: baseName,
Content: decompressed,
ModTime: gzr.ModTime,
}
if gzipTruncated {
file.Truncated = true
file.TruncatedMessage = fmt.Sprintf(
"decompressed gzip content exceeded %d bytes and was truncated",
maxGzipDecompressedSize,
)
}
return []ExtractedFile{file}, nil
}
return nil, fmt.Errorf("tar read: %w", err)
}
@@ -163,6 +222,7 @@ func extractTarGzFromReader(r io.Reader, filename string) ([]ExtractedFile, erro
files = append(files, ExtractedFile{
Path: header.Name,
Content: content,
ModTime: header.ModTime,
})
}
}
@@ -213,6 +273,59 @@ func extractZip(archivePath string) ([]ExtractedFile, error) {
files = append(files, ExtractedFile{
Path: f.Name,
Content: content,
ModTime: f.Modified,
})
}
return files, nil
}
func extractZipFromReader(r io.Reader) ([]ExtractedFile, error) {
// Read all data into memory with a hard cap
data, err := io.ReadAll(io.LimitReader(r, maxZipArchiveSize+1))
if err != nil {
return nil, fmt.Errorf("read zip data: %w", err)
}
if len(data) > maxZipArchiveSize {
return nil, fmt.Errorf("zip too large: max %d bytes", maxZipArchiveSize)
}
// Create a ReaderAt from the byte slice
readerAt := bytes.NewReader(data)
// Open the zip archive
zipReader, err := zip.NewReader(readerAt, int64(len(data)))
if err != nil {
return nil, fmt.Errorf("open zip: %w", err)
}
var files []ExtractedFile
for _, f := range zipReader.File {
if f.FileInfo().IsDir() {
continue
}
// Skip large files (>10MB)
if f.FileInfo().Size() > 10*1024*1024 {
continue
}
rc, err := f.Open()
if err != nil {
return nil, fmt.Errorf("open file %s: %w", f.Name, err)
}
content, err := io.ReadAll(rc)
rc.Close()
if err != nil {
return nil, fmt.Errorf("read file %s: %w", f.Name, err)
}
files = append(files, ExtractedFile{
Path: f.Name,
Content: content,
ModTime: f.Modified,
})
}
@@ -220,13 +333,24 @@ func extractZip(archivePath string) ([]ExtractedFile, error) {
}
func extractSingleFile(path string) ([]ExtractedFile, error) {
info, err := os.Stat(path)
if err != nil {
return nil, fmt.Errorf("stat file: %w", err)
}
f, err := os.Open(path)
if err != nil {
return nil, fmt.Errorf("open file: %w", err)
}
defer f.Close()
files, err := extractSingleFileFromReader(f, filepath.Base(path))
if err != nil {
return nil, err
}
if len(files) > 0 {
files[0].ModTime = info.ModTime()
}
return files, nil
}
func extractSingleFileFromReader(r io.Reader, filename string) ([]ExtractedFile, error) {
@@ -234,16 +358,24 @@ func extractSingleFileFromReader(r io.Reader, filename string) ([]ExtractedFile,
if err != nil {
return nil, fmt.Errorf("read file content: %w", err)
}
truncated := len(content) > maxSingleFileSize
if truncated {
content = content[:maxSingleFileSize]
}
file := ExtractedFile{
Path: filepath.Base(filename),
Content: content,
}
if truncated {
file.Truncated = true
file.TruncatedMessage = fmt.Sprintf(
"file exceeded %d bytes and was truncated",
maxSingleFileSize,
)
}
return []ExtractedFile{file}, nil
}
// FindFileByPattern finds files matching pattern in extracted files

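The new size caps in `extractTarGzFromReader`, `extractZipFromReader` and `extractSingleFileFromReader` share one idiom: read `limit+1` bytes through `io.LimitReader`, then compare the length against `limit`. A minimal sketch of that idiom in isolation follows; `readCapped` and `readCappedBytes` are hypothetical helpers, not functions in this package.

```go
package main

import (
	"bytes"
	"io"
)

// readCapped reads at most limit bytes from r. It asks LimitReader for one
// extra byte so that "exactly limit bytes" can be told apart from "more than
// limit bytes"; in the second case the content is cut to limit bytes and
// truncated is reported as true.
func readCapped(r io.Reader, limit int64) (content []byte, truncated bool, err error) {
	content, err = io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, false, err
	}
	if int64(len(content)) > limit {
		return content[:limit], true, nil
	}
	return content, false, nil
}

// readCappedBytes is a convenience wrapper for in-memory payloads (e.g. in tests).
func readCappedBytes(b []byte, limit int64) ([]byte, bool, error) {
	return readCapped(bytes.NewReader(b), limit)
}
```

The zip reader rejects oversized input instead of truncating it, which fits the format: a ZIP's central directory sits at the end of the file, so a truncated prefix generally cannot be opened with `zip.NewReader`.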

@@ -1,6 +1,8 @@
package parser
import (
"archive/tar"
"bytes"
"os"
"path/filepath"
"strings"
@@ -46,3 +48,79 @@ func TestExtractArchiveTXT(t *testing.T) {
t.Fatalf("content mismatch")
}
}
func TestExtractArchiveFromReaderTXT_TruncatedWhenTooLarge(t *testing.T) {
large := bytes.Repeat([]byte("a"), maxSingleFileSize+1024)
files, err := ExtractArchiveFromReader(bytes.NewReader(large), "huge.log")
if err != nil {
t.Fatalf("extract huge txt from reader: %v", err)
}
if len(files) != 1 {
t.Fatalf("expected 1 file, got %d", len(files))
}
f := files[0]
if !f.Truncated {
t.Fatalf("expected file to be marked as truncated")
}
if got := len(f.Content); got != maxSingleFileSize {
t.Fatalf("expected truncated size %d, got %d", maxSingleFileSize, got)
}
if f.TruncatedMessage == "" {
t.Fatalf("expected truncation message")
}
}
func TestIsSupportedArchiveFilename(t *testing.T) {
cases := []struct {
name string
want bool
}{
{name: "dump.tar.gz", want: true},
{name: "nvidia-bug-report-1651124000923.log.gz", want: true},
{name: "snapshot.zip", want: true},
{name: "h3c_20250819.sds", want: true},
{name: "report.log", want: true},
{name: "xigmanas.txt", want: true},
{name: "raw_export.json", want: false},
{name: "archive.bin", want: false},
}
for _, tc := range cases {
got := IsSupportedArchiveFilename(tc.name)
if got != tc.want {
t.Fatalf("IsSupportedArchiveFilename(%q)=%v, want %v", tc.name, got, tc.want)
}
}
}
func TestExtractArchiveFromReaderSDS(t *testing.T) {
var buf bytes.Buffer
tw := tar.NewWriter(&buf)
payload := []byte("STARTTIME:0\nENDTIME:0\n")
if err := tw.WriteHeader(&tar.Header{
Name: "bmc/pack.info",
Mode: 0o600,
Size: int64(len(payload)),
}); err != nil {
t.Fatalf("write tar header: %v", err)
}
if _, err := tw.Write(payload); err != nil {
t.Fatalf("write tar payload: %v", err)
}
if err := tw.Close(); err != nil {
t.Fatalf("close tar writer: %v", err)
}
files, err := ExtractArchiveFromReader(bytes.NewReader(buf.Bytes()), "sample.sds")
if err != nil {
t.Fatalf("extract sds from reader: %v", err)
}
if len(files) != 1 {
t.Fatalf("expected 1 extracted file, got %d", len(files))
}
if files[0].Path != "bmc/pack.info" {
t.Fatalf("expected bmc/pack.info, got %q", files[0].Path)
}
}

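`IsSupportedArchiveFilename` keys its lookup on `filepath.Ext`, which returns only the last extension. That is why `dump.tar.gz` and `nvidia-bug-report-1651124000923.log.gz` both map to `.gz`; `extractTarGzFromReader` then tries a tar read and falls back to returning the decompressed payload as a single file. The hypothetical `extKey` helper below reproduces just the key computation.

```go
package main

import (
	"path/filepath"
	"strings"
)

// extKey mirrors how IsSupportedArchiveFilename builds its lookup key:
// only the last extension, trimmed and lower-cased.
func extKey(name string) string {
	return strings.ToLower(strings.TrimSpace(filepath.Ext(name)))
}
```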

@@ -0,0 +1,135 @@
package parser
import (
"fmt"
"regexp"
"strings"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
var manufacturedYearWeekPattern = regexp.MustCompile(`^\d{4}-W\d{2}$`)
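// NormalizeManufacturedYearWeek converts common FRU manufacturing date formats
// into contract-compatible YYYY-Www values (ISO 8601 week-numbering year and week).
// Day/month-ambiguous forms such as 02/03/2024 are read month-first (US order);
// inputs that match none of the known layouts return "".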
func NormalizeManufacturedYearWeek(raw string) string {
value := strings.TrimSpace(raw)
if value == "" {
return ""
}
upper := strings.ToUpper(value)
if manufacturedYearWeekPattern.MatchString(upper) {
return upper
}
layouts := []string{
time.RFC3339,
"2006-01-02T15:04:05",
"2006-01-02 15:04:05",
"2006-01-02",
"2006/01/02",
"01/02/2006 15:04:05",
"01/02/2006",
"01-02-2006",
"Mon Jan 2 15:04:05 2006",
"Mon Jan _2 15:04:05 2006",
"Jan 2 2006",
"Jan _2 2006",
}
for _, layout := range layouts {
if ts, err := time.Parse(layout, value); err == nil {
year, week := ts.ISOWeek()
return formatYearWeek(year, week)
}
}
return ""
}
func formatYearWeek(year, week int) string {
if year <= 0 || week <= 0 || week > 53 {
return ""
}
return fmt.Sprintf("%04d-W%02d", year, week)
}
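// ApplyManufacturedYearWeekFromFRU attaches normalized manufactured_year_week to
// component details by serial-number match (trimmed, case-insensitive; placeholder
// serials such as N/A are ignored). An existing non-empty value is kept, and
// board-level FRU entries are not expanded to components.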
func ApplyManufacturedYearWeekFromFRU(frus []models.FRUInfo, hw *models.HardwareConfig) {
if hw == nil || len(frus) == 0 {
return
}
bySerial := make(map[string]string, len(frus))
for _, fru := range frus {
serial := normalizeFRUSerial(fru.SerialNumber)
yearWeek := NormalizeManufacturedYearWeek(fru.MfgDate)
if serial == "" || yearWeek == "" {
continue
}
if _, exists := bySerial[serial]; exists {
continue
}
bySerial[serial] = yearWeek
}
if len(bySerial) == 0 {
return
}
for i := range hw.CPUs {
attachYearWeek(&hw.CPUs[i].Details, bySerial[normalizeFRUSerial(hw.CPUs[i].SerialNumber)])
}
for i := range hw.Memory {
attachYearWeek(&hw.Memory[i].Details, bySerial[normalizeFRUSerial(hw.Memory[i].SerialNumber)])
}
for i := range hw.Storage {
attachYearWeek(&hw.Storage[i].Details, bySerial[normalizeFRUSerial(hw.Storage[i].SerialNumber)])
}
for i := range hw.PCIeDevices {
attachYearWeek(&hw.PCIeDevices[i].Details, bySerial[normalizeFRUSerial(hw.PCIeDevices[i].SerialNumber)])
}
for i := range hw.GPUs {
attachYearWeek(&hw.GPUs[i].Details, bySerial[normalizeFRUSerial(hw.GPUs[i].SerialNumber)])
}
for i := range hw.NetworkAdapters {
attachYearWeek(&hw.NetworkAdapters[i].Details, bySerial[normalizeFRUSerial(hw.NetworkAdapters[i].SerialNumber)])
}
for i := range hw.PowerSupply {
attachYearWeek(&hw.PowerSupply[i].Details, bySerial[normalizeFRUSerial(hw.PowerSupply[i].SerialNumber)])
}
}
func attachYearWeek(details *map[string]any, yearWeek string) {
if yearWeek == "" {
return
}
if *details == nil {
*details = map[string]any{}
}
if existing, ok := (*details)["manufactured_year_week"]; ok && strings.TrimSpace(toString(existing)) != "" {
return
}
(*details)["manufactured_year_week"] = yearWeek
}
func normalizeFRUSerial(v string) string {
s := strings.TrimSpace(v)
if s == "" {
return ""
}
switch strings.ToUpper(s) {
case "N/A", "NA", "NULL", "UNKNOWN", "-", "0":
return ""
default:
return strings.ToUpper(s)
}
}
func toString(v any) string {
switch x := v.(type) {
case string:
return x
default:
return strings.TrimSpace(fmt.Sprint(v))
}
}


@@ -0,0 +1,65 @@
package parser
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestNormalizeManufacturedYearWeek(t *testing.T) {
tests := []struct {
in string
want string
}{
{"2024-W07", "2024-W07"},
{"2024-02-13", "2024-W07"},
{"02/13/2024", "2024-W07"},
{"Tue Feb 13 12:00:00 2024", "2024-W07"},
{"", ""},
{"not-a-date", ""},
}
for _, tt := range tests {
if got := NormalizeManufacturedYearWeek(tt.in); got != tt.want {
t.Fatalf("NormalizeManufacturedYearWeek(%q) = %q, want %q", tt.in, got, tt.want)
}
}
}
func TestApplyManufacturedYearWeekFromFRU_AttachesByExactSerial(t *testing.T) {
hw := &models.HardwareConfig{
PowerSupply: []models.PSU{
{
Slot: "PSU0",
SerialNumber: "PSU-SN-001",
},
},
Storage: []models.Storage{
{
Slot: "OB01",
SerialNumber: "DISK-SN-001",
},
},
}
fru := []models.FRUInfo{
{
Description: "PSU0_FRU (ID 30)",
SerialNumber: "PSU-SN-001",
MfgDate: "2024-02-13",
},
{
Description: "Builtin FRU Device (ID 0)",
SerialNumber: "BOARD-SN-001",
MfgDate: "2024-02-01",
},
}
ApplyManufacturedYearWeekFromFRU(fru, hw)
if got := hw.PowerSupply[0].Details["manufactured_year_week"]; got != "2024-W07" {
t.Fatalf("expected PSU year week 2024-W07, got %#v", hw.PowerSupply[0].Details)
}
if hw.Storage[0].Details != nil {
t.Fatalf("expected unmatched storage serial to stay untouched, got %#v", hw.Storage[0].Details)
}
}


@@ -9,7 +9,7 @@ type VendorParser interface {
// Name returns human-readable parser name
Name() string
// Vendor returns vendor identifier (e.g., "inspur", "dell", "h3c_g6")
Vendor() string
// Version returns parser version string


@@ -3,6 +3,8 @@ package parser
import (
"fmt"
"io"
"strings"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
@@ -62,11 +64,74 @@ func (p *BMCParser) parseFiles() error {
// Preserve filename
result.Filename = p.result.Filename
appendExtractionWarnings(result, p.files)
if result.CollectedAt.IsZero() {
if ts := inferCollectedAtFromExtractedFiles(p.files); !ts.IsZero() {
result.CollectedAt = ts.UTC()
}
}
p.result = result
return nil
}
func inferCollectedAtFromExtractedFiles(files []ExtractedFile) time.Time {
var latestReliable time.Time
var latestAny time.Time
for _, f := range files {
ts := f.ModTime
if ts.IsZero() {
continue
}
if latestAny.IsZero() || ts.After(latestAny) {
latestAny = ts
}
// Ignore placeholder archive mtimes like 1980-01-01.
if ts.Year() < 2000 {
continue
}
if latestReliable.IsZero() || ts.After(latestReliable) {
latestReliable = ts
}
}
if !latestReliable.IsZero() {
return latestReliable
}
return latestAny
}
func appendExtractionWarnings(result *models.AnalysisResult, files []ExtractedFile) {
if result == nil {
return
}
truncated := make([]string, 0)
for _, f := range files {
if !f.Truncated {
continue
}
if f.TruncatedMessage != "" {
truncated = append(truncated, fmt.Sprintf("%s: %s", f.Path, f.TruncatedMessage))
continue
}
truncated = append(truncated, fmt.Sprintf("%s: content was truncated due to size limit", f.Path))
}
if len(truncated) == 0 {
return
}
result.Events = append(result.Events, models.Event{
Timestamp: time.Now(),
Source: "LOGPile",
EventType: "Analysis Warning",
Severity: models.SeverityWarning,
Description: "Input data was too large; analysis is partial and may be incomplete",
RawData: strings.Join(truncated, "; "),
})
}
// Result returns the analysis result
func (p *BMCParser) Result() *models.AnalysisResult {
return p.result


@@ -0,0 +1,62 @@
package parser
import (
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestAppendExtractionWarnings(t *testing.T) {
result := &models.AnalysisResult{
Events: make([]models.Event, 0),
}
files := []ExtractedFile{
{Path: "ok.log", Content: []byte("ok")},
{Path: "big.log", Truncated: true, TruncatedMessage: "file exceeded size limit and was truncated"},
}
appendExtractionWarnings(result, files)
if len(result.Events) != 1 {
t.Fatalf("expected 1 warning event, got %d", len(result.Events))
}
ev := result.Events[0]
if ev.Severity != models.SeverityWarning {
t.Fatalf("expected warning severity, got %q", ev.Severity)
}
if ev.EventType != "Analysis Warning" {
t.Fatalf("unexpected event type: %q", ev.EventType)
}
if ev.RawData == "" {
t.Fatalf("expected warning details in RawData")
}
}
func TestInferCollectedAtFromExtractedFiles_PrefersReliableMTime(t *testing.T) {
files := []ExtractedFile{
{Path: "a.log", ModTime: time.Date(1980, 1, 1, 0, 0, 0, 0, time.UTC)},
{Path: "b.log", ModTime: time.Date(2025, 12, 12, 10, 14, 49, 0, time.FixedZone("EST", -5*3600))},
{Path: "c.log", ModTime: time.Date(2026, 2, 28, 4, 18, 18, 0, time.FixedZone("UTC+8", 8*3600))},
}
got := inferCollectedAtFromExtractedFiles(files)
want := files[2].ModTime
if !got.Equal(want) {
t.Fatalf("expected %s, got %s", want, got)
}
}
func TestInferCollectedAtFromExtractedFiles_FallsBackToAnyMTime(t *testing.T) {
files := []ExtractedFile{
{Path: "a.log", ModTime: time.Date(1980, 1, 1, 0, 0, 0, 0, time.UTC)},
{Path: "b.log", ModTime: time.Date(1970, 1, 2, 0, 0, 0, 0, time.UTC)},
}
got := inferCollectedAtFromExtractedFiles(files)
want := files[0].ModTime
if !got.Equal(want) {
t.Fatalf("expected fallback %s, got %s", want, got)
}
}


@@ -0,0 +1,33 @@
package parser
import (
"sync"
"time"
)
const fallbackTimezoneName = "Europe/Moscow"
var (
fallbackTimezoneOnce sync.Once
fallbackTimezone *time.Location
)
// DefaultArchiveLocation returns the timezone used for source timestamps
// that do not contain an explicit offset.
func DefaultArchiveLocation() *time.Location {
fallbackTimezoneOnce.Do(func() {
loc, err := time.LoadLocation(fallbackTimezoneName)
if err != nil {
fallbackTimezone = time.FixedZone("MSK", 3*60*60)
return
}
fallbackTimezone = loc
})
return fallbackTimezone
}
// ParseInDefaultArchiveLocation parses timestamps without timezone information
// using Europe/Moscow as the assumed source timezone.
func ParseInDefaultArchiveLocation(layout, value string) (time.Time, error) {
return time.ParseInLocation(layout, value, DefaultArchiveLocation())
}


@@ -1,96 +0,0 @@
# Vendor Parser Modules
Each server vendor has its own BMC diagnostic archive format.
This directory contains parser modules for different vendors.
## Module Structure
```
vendors/
├── vendors.go # Imports of all modules (add new ones here)
├── README.md # This documentation
├── template/ # Template for a new module
│ └── parser.go.template
├── inspur/ # Inspur/Kaytus module
│ ├── parser.go # Main parser + registration
│ ├── sdr.go # SDR parsing (sensors)
│ ├── fru.go # FRU parsing (serial numbers)
│ ├── asset.go # asset.json parsing
│ └── syslog.go # syslog parsing
├── supermicro/ # Planned Supermicro module
├── dell/ # Planned Dell iDRAC module
└── hpe/ # Planned HPE iLO module
```
## How to Add a New Module
### 1. Create the module directory
```bash
mkdir -p internal/parser/vendors/VENDORNAME
```
### 2. Copy the template
```bash
cp internal/parser/vendors/template/parser.go.template \
internal/parser/vendors/VENDORNAME/parser.go
```
### 3. Edit parser.go
- Replace `VENDORNAME` with the vendor identifier (for example, `supermicro`)
- Replace `VENDOR_DESCRIPTION` with a description (for example, `Supermicro`)
- Implement the `Detect()` method to recognize the format
- Implement the `Parse()` method to parse the data
### 4. Register the module
Add an import to `vendors/vendors.go`:
```go
import (
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/VENDORNAME" // New module
)
```
### 5. Done!
The module registers itself automatically at application startup through its `init()` function.
## The VendorParser interface
```go
type VendorParser interface {
// Name returns the human-readable parser name
Name() string
// Vendor returns the vendor identifier
Vendor() string
// Detect checks whether this parser fits the given files.
// It returns a confidence of 0-100 (0 = no match, 100 = definitely this format)
Detect(files []ExtractedFile) int
// Parse parses the extracted files
Parse(files []ExtractedFile) (*models.AnalysisResult, error)
}
```
## Tips for implementing Detect()
- Look for files or directories that are unique to the vendor
- Check file contents for characteristic markers
- Return a high confidence (70+) only when the match is certain
- Several parsers may return a value above 0; the parser with the highest confidence is selected
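The tips above combine two ideas: each `Detect()` scores the files, and the caller picks the highest score. The sketch below makes this concrete under several assumptions:

- The `asset.json` plus "Inspur" marker rule is a made-up heuristic, not the real Inspur detector.
- `pickParser` is a hypothetical selector; the project's own selection code is not shown here.
- Ties keep the first parser seen.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// ExtractedFile mirrors the shape used in the tests (Path + Content).
type ExtractedFile struct {
	Path    string
	Content []byte
}

type detector interface {
	Vendor() string
	Detect(files []ExtractedFile) int
}

// inspurLike is a hypothetical detector. A unique file name alone gives a
// modest score, and a content marker pushes it into the confident (70+) range.
type inspurLike struct{}

func (inspurLike) Vendor() string { return "inspur" }

func (inspurLike) Detect(files []ExtractedFile) int {
	score := 0
	for _, f := range files {
		if strings.HasSuffix(f.Path, "asset.json") {
			score += 40
			if bytes.Contains(f.Content, []byte("Inspur")) {
				score += 40
			}
		}
	}
	if score > 100 {
		score = 100
	}
	return score
}

// genericText always matches weakly, like a low-priority fallback parser.
type genericText struct{}

func (genericText) Vendor() string { return "generic" }

func (genericText) Detect([]ExtractedFile) int { return 15 }

// pickParser returns the detector with the highest confidence, or nil if every
// detector returned 0. The strict > keeps the first parser on a tie.
func pickParser(parsers []detector, files []ExtractedFile) detector {
	var best detector
	bestScore := 0
	for _, p := range parsers {
		if s := p.Detect(files); s > bestScore {
			best, bestScore = p, s
		}
	}
	return best
}

func main() {
	parsers := []detector{genericText{}, inspurLike{}}
	files := []ExtractedFile{{Path: "onekeylog/asset.json", Content: []byte(`{"vendor":"Inspur"}`)}}
	fmt.Println(pickParser(parsers, files).Vendor())                                 // inspur
	fmt.Println(pickParser(parsers, []ExtractedFile{{Path: "system.log"}}).Vendor()) // generic
}
```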
## Supported vendors
| Vendor | Identifier | Status | Tested on |
|--------|------------|--------|-----------|
| Inspur/Kaytus | `inspur` | ✅ Ready | KR4268X2 (onekeylog) |
| Supermicro | `supermicro` | ⏳ Planned | - |
| Dell iDRAC | `dell` | ⏳ Planned | - |
| HPE iLO | `hpe` | ⏳ Planned | - |
| Lenovo XCC | `lenovo` | ⏳ Planned | - |

internal/parser/vendors/dell/parser.go

File diff suppressed because it is too large.


@@ -0,0 +1,480 @@
package dell
import (
"archive/zip"
"bytes"
"testing"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestDetectNestedTSRZip(t *testing.T) {
inner := makeZipArchive(t, map[string][]byte{
"tsr/metadata.json": []byte(`{"Make":"Dell Inc.","Model":"PowerEdge R750","ServiceTag":"G37Q064"}`),
"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml": []byte(`<CIM><MESSAGE><SIMPLEREQ/></MESSAGE></CIM>`),
})
p := &Parser{}
score := p.Detect([]parser.ExtractedFile{
{Path: "signature", Content: []byte("ok")},
{Path: "TSR20241119143901_G37Q064.pl.zip", Content: inner},
})
if score < 80 {
t.Fatalf("expected high detect score for nested TSR zip, got %d", score)
}
}
func TestParseNestedTSRZip(t *testing.T) {
const viewXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SystemView">
<PROPERTY NAME="Manufacturer"><VALUE>Dell Inc.</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>PowerEdge R750</VALUE></PROPERTY>
<PROPERTY NAME="ServiceTag"><VALUE>G37Q064</VALUE></PROPERTY>
<PROPERTY NAME="BIOSVersionString"><VALUE>2.19.1</VALUE></PROPERTY>
<PROPERTY NAME="LifecycleControllerVersion"><VALUE>7.00.30.00</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_CPUView">
<PROPERTY NAME="FQDD"><VALUE>CPU.Socket.1</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>Intel(R) Xeon(R) Gold 6330</VALUE></PROPERTY>
<PROPERTY NAME="Manufacturer"><VALUE>Intel</VALUE></PROPERTY>
<PROPERTY NAME="NumberOfEnabledCores"><VALUE>28</VALUE></PROPERTY>
<PROPERTY NAME="NumberOfEnabledThreads"><VALUE>56</VALUE></PROPERTY>
<PROPERTY NAME="CurrentClockSpeed"><VALUE>2000</VALUE></PROPERTY>
<PROPERTY NAME="MaxClockSpeed"><VALUE>3100</VALUE></PROPERTY>
<PROPERTY NAME="PPIN"><VALUE>ABCD</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_NICView">
<PROPERTY NAME="FQDD"><VALUE>NIC.Slot.1-1-1</VALUE></PROPERTY>
<PROPERTY NAME="ProductName"><VALUE>Broadcom 57414 Dual Port 10/25GbE SFP28 Adapter</VALUE></PROPERTY>
<PROPERTY NAME="VendorName"><VALUE>Broadcom</VALUE></PROPERTY>
<PROPERTY NAME="CurrentMACAddress"><VALUE>00:11:22:33:44:55</VALUE></PROPERTY>
<PROPERTY NAME="SerialNumber"><VALUE>NICSERIAL1</VALUE></PROPERTY>
<PROPERTY NAME="FamilyVersion"><VALUE>22.80.17</VALUE></PROPERTY>
<PROPERTY NAME="PCIVendorID"><VALUE>0x14e4</VALUE></PROPERTY>
<PROPERTY NAME="PCIDeviceID"><VALUE>0x16d7</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_PowerSupplyView">
<PROPERTY NAME="FQDD"><VALUE>PSU.Slot.1</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>D1400E-S0</VALUE></PROPERTY>
<PROPERTY NAME="Manufacturer"><VALUE>Dell</VALUE></PROPERTY>
<PROPERTY NAME="SerialNumber"><VALUE>PSUSERIAL1</VALUE></PROPERTY>
<PROPERTY NAME="FirmwareVersion"><VALUE>00.1A</VALUE></PROPERTY>
<PROPERTY NAME="TotalOutputPower"><VALUE>1400</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_VideoView">
<PROPERTY NAME="FQDD"><VALUE>Video.Slot.38-1</VALUE></PROPERTY>
<PROPERTY NAME="MarketingName"><VALUE>NVIDIA H100 PCIe</VALUE></PROPERTY>
<PROPERTY NAME="Description"><VALUE>GH100 [H100 PCIe]</VALUE></PROPERTY>
<PROPERTY NAME="Manufacturer"><VALUE>NVIDIA Corporation</VALUE></PROPERTY>
<PROPERTY NAME="PCIVendorID"><VALUE>10DE</VALUE></PROPERTY>
<PROPERTY NAME="PCIDeviceID"><VALUE>2331</VALUE></PROPERTY>
<PROPERTY NAME="BusNumber"><VALUE>74</VALUE></PROPERTY>
<PROPERTY NAME="DeviceNumber"><VALUE>0</VALUE></PROPERTY>
<PROPERTY NAME="FunctionNumber"><VALUE>0</VALUE></PROPERTY>
<PROPERTY NAME="SerialNumber"><VALUE>1793924039808</VALUE></PROPERTY>
<PROPERTY NAME="FirmwareVersion"><VALUE>96.00.AF.00.01</VALUE></PROPERTY>
<PROPERTY NAME="GPUGUID"><VALUE>bc681a6d4785dde08c21f49c46c05cc3</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
const swXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SoftwareIdentity">
<PROPERTY NAME="ElementName"><VALUE>NIC.Slot.1-1-1</VALUE></PROPERTY>
<PROPERTY NAME="VersionString"><VALUE>22.80.17</VALUE></PROPERTY>
<PROPERTY NAME="ComponentType"><VALUE>Network</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
const eventsXML = `<Log>
<Event AgentID="Lifecycle Controller" Category="System Health" Severity="Warning" Timestamp="2024-11-19T14:39:01-0800">
<MessageID>SYS1001</MessageID>
<Message>Link is down</Message>
<FQDD>NIC.Slot.1-1-1</FQDD>
</Event>
</Log>`
const cimSensorXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_GPUSensor">
<PROPERTY NAME="DeviceID"><VALUE>Video.Slot.38-1</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryGPUTemperature"><VALUE>290</VALUE></PROPERTY>
<PROPERTY NAME="MemoryTemperature"><VALUE>440</VALUE></PROPERTY>
<PROPERTY NAME="PowerConsumption"><VALUE>295</VALUE></PROPERTY>
<PROPERTY NAME="ThermalAlertStatus"><VALUE>5</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="CIM_NumericSensor">
<PROPERTY NAME="ElementName"><VALUE>PS1 Voltage 1</VALUE></PROPERTY>
<PROPERTY NAME="CurrentReading"><VALUE>224.0</VALUE></PROPERTY>
<PROPERTY NAME="BaseUnits"><VALUE>5</VALUE></PROPERTY>
<PROPERTY NAME="UnitModifier"><VALUE>0</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>5</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
inner := makeZipArchive(t, map[string][]byte{
"tsr/metadata.json": []byte(`{
"Make":"Dell Inc.",
"Model":"PowerEdge R750",
"ServiceTag":"G37Q064",
"FirmwareVersion":"7.00.30.00",
"CollectionDateTime":"2024-11-19 14:39:01.000-0800"
}`),
"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml": []byte(viewXML),
"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_SoftwareIdentity.xml": []byte(swXML),
"tsr/hardware/sysinfo/inventory/sysinfo_CIM_Sensor.xml": []byte(cimSensorXML),
"tsr/hardware/sysinfo/lcfiles/curr_lclog.xml": []byte(eventsXML),
})
p := &Parser{}
result, err := p.Parse([]parser.ExtractedFile{
{Path: "signature", Content: []byte("ok")},
{Path: "TSR20241119143901_G37Q064.pl.zip", Content: inner},
})
if err != nil {
t.Fatalf("parse failed: %v", err)
}
if result.Hardware == nil {
t.Fatalf("expected hardware section")
}
if got := result.Hardware.BoardInfo.Manufacturer; got != "Dell Inc." {
t.Fatalf("unexpected board manufacturer: %q", got)
}
if got := result.Hardware.BoardInfo.ProductName; got != "PowerEdge R750" {
t.Fatalf("unexpected board product: %q", got)
}
if got := result.Hardware.BoardInfo.SerialNumber; got != "G37Q064" {
t.Fatalf("unexpected service tag: %q", got)
}
if len(result.Hardware.CPUs) != 1 {
t.Fatalf("expected 1 cpu, got %d", len(result.Hardware.CPUs))
}
if got := result.Hardware.CPUs[0].Model; got != "Intel(R) Xeon(R) Gold 6330" {
t.Fatalf("unexpected cpu model: %q", got)
}
if len(result.Hardware.NetworkAdapters) != 1 {
t.Fatalf("expected 1 network adapter, got %d", len(result.Hardware.NetworkAdapters))
}
adapter := result.Hardware.NetworkAdapters[0]
if adapter.Vendor != "Broadcom" {
t.Fatalf("unexpected nic vendor: %q", adapter.Vendor)
}
if adapter.Firmware != "22.80.17" {
t.Fatalf("unexpected nic firmware: %q", adapter.Firmware)
}
if adapter.SerialNumber != "NICSERIAL1" {
t.Fatalf("unexpected nic serial: %q", adapter.SerialNumber)
}
if len(result.Hardware.PowerSupply) != 1 {
t.Fatalf("expected 1 psu, got %d", len(result.Hardware.PowerSupply))
}
psu := result.Hardware.PowerSupply[0]
if psu.Model != "D1400E-S0" {
t.Fatalf("unexpected psu model: %q", psu.Model)
}
if psu.Firmware != "00.1A" {
t.Fatalf("unexpected psu firmware: %q", psu.Firmware)
}
if len(result.Hardware.Firmware) == 0 {
t.Fatalf("expected firmware entries")
}
if len(result.Hardware.GPUs) != 1 {
t.Fatalf("expected 1 gpu, got %d", len(result.Hardware.GPUs))
}
if got := result.Hardware.GPUs[0].Model; got != "NVIDIA H100 PCIe" {
t.Fatalf("unexpected gpu model: %q", got)
}
if got := result.Hardware.GPUs[0].SerialNumber; got != "1793924039808" {
t.Fatalf("unexpected gpu serial: %q", got)
}
if got := result.Hardware.GPUs[0].Temperature; got != 29 {
t.Fatalf("unexpected gpu temperature: %d", got)
}
if len(result.Sensors) == 0 {
t.Fatalf("expected sensors from CIM_Sensor")
}
if len(result.Events) != 1 {
t.Fatalf("expected one lifecycle event, got %d", len(result.Events))
}
if got := string(result.Events[0].Severity); got != "warning" {
t.Fatalf("unexpected event severity: %q", got)
}
}
// TestParseDellPhysicalDiskEndurance verifies that RemainingRatedWriteEndurance from
// DCIM_PhysicalDiskView is parsed into Storage.RemainingEndurancePct.
func TestParseDellPhysicalDiskEndurance(t *testing.T) {
const viewXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SystemView">
<PROPERTY NAME="Manufacturer"><VALUE>Dell Inc.</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>PowerEdge R6625</VALUE></PROPERTY>
<PROPERTY NAME="ServiceTag"><VALUE>8VS2LG4</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_PhysicalDiskView">
<PROPERTY NAME="FQDD"><VALUE>Disk.Bay.0:Enclosure.Internal.0-1:RAID.SL.3-1</VALUE></PROPERTY>
<PROPERTY NAME="Slot"><VALUE>0</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>HFS480G3H2X069N</VALUE></PROPERTY>
<PROPERTY NAME="SerialNumber"><VALUE>ESEAN5254I030B26B</VALUE></PROPERTY>
<PROPERTY NAME="SizeInBytes"><VALUE>479559942144</VALUE></PROPERTY>
<PROPERTY NAME="MediaType"><VALUE>Solid State Drive</VALUE></PROPERTY>
<PROPERTY NAME="BusProtocol"><VALUE>SATA</VALUE></PROPERTY>
<PROPERTY NAME="Revision"><VALUE>DZ03</VALUE></PROPERTY>
<PROPERTY NAME="RemainingRatedWriteEndurance"><VALUE>100</VALUE><DisplayValue>100 %</DisplayValue></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>1</VALUE><DisplayValue>OK</DisplayValue></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_PhysicalDiskView">
<PROPERTY NAME="FQDD"><VALUE>Disk.Bay.1:Enclosure.Internal.0-1:RAID.SL.3-1</VALUE></PROPERTY>
<PROPERTY NAME="Slot"><VALUE>1</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>TOSHIBA MG08ADA800E</VALUE></PROPERTY>
<PROPERTY NAME="SerialNumber"><VALUE>X1G0A0YXFVVG</VALUE></PROPERTY>
<PROPERTY NAME="SizeInBytes"><VALUE>8001563222016</VALUE></PROPERTY>
<PROPERTY NAME="MediaType"><VALUE>Hard Disk Drive</VALUE></PROPERTY>
<PROPERTY NAME="BusProtocol"><VALUE>SAS</VALUE></PROPERTY>
<PROPERTY NAME="Revision"><VALUE>0104</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
inner := makeZipArchive(t, map[string][]byte{
"tsr/metadata.json": []byte(`{"Make":"Dell Inc.","Model":"PowerEdge R6625","ServiceTag":"8VS2LG4"}`),
"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml": []byte(viewXML),
})
p := &Parser{}
result, err := p.Parse([]parser.ExtractedFile{
{Path: "signature", Content: []byte("ok")},
{Path: "TSR20260306141852_8VS2LG4.pl.zip", Content: inner},
})
if err != nil {
t.Fatalf("parse failed: %v", err)
}
if len(result.Hardware.Storage) != 2 {
t.Fatalf("expected 2 storage devices, got %d", len(result.Hardware.Storage))
}
ssd := result.Hardware.Storage[0]
if ssd.RemainingEndurancePct == nil {
t.Fatalf("SSD slot 0: expected RemainingEndurancePct to be set")
}
if *ssd.RemainingEndurancePct != 100 {
t.Errorf("SSD slot 0: expected RemainingEndurancePct=100, got %d", *ssd.RemainingEndurancePct)
}
hdd := result.Hardware.Storage[1]
if hdd.RemainingEndurancePct != nil {
t.Errorf("HDD slot 1: expected RemainingEndurancePct absent, got %d", *hdd.RemainingEndurancePct)
}
}
// TestParseDellInfiniBandView verifies that DCIM_InfiniBandView entries are parsed as
// NetworkAdapters (not PCIe devices) and that the corresponding SoftwareIdentity firmware
// entry with FQDD "InfiniBand.Slot.*" does not leak into hardware.firmware.
//
// Regression guard: PowerEdge R6625 (8VS2LG4) — "Mellanox Network Adapter" version
// "20.39.35.60" appeared in hardware.firmware because DCIM_InfiniBandView was ignored
// (device ended up only in PCIeDevices with model "16x or x16") and SoftwareIdentity
// FQDD "InfiniBand.Slot.1-1" was not filtered. (2026-03-15)
func TestParseDellInfiniBandView(t *testing.T) {
const viewXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SystemView">
<PROPERTY NAME="Manufacturer"><VALUE>Dell Inc.</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>PowerEdge R6625</VALUE></PROPERTY>
<PROPERTY NAME="ServiceTag"><VALUE>8VS2LG4</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_InfiniBandView">
<PROPERTY NAME="FQDD"><VALUE>InfiniBand.Slot.1-1</VALUE></PROPERTY>
<PROPERTY NAME="DeviceDescription"><VALUE>InfiniBand in Slot 1 Port 1</VALUE></PROPERTY>
<PROPERTY NAME="CurrentMACAddress"><VALUE>00:1C:FD:D7:5A:E6</VALUE></PROPERTY>
<PROPERTY NAME="FamilyVersion"><VALUE>20.39.35.60</VALUE></PROPERTY>
<PROPERTY NAME="EFIVersion"><VALUE>14.32.17</VALUE></PROPERTY>
<PROPERTY NAME="PCIVendorID"><VALUE>15B3</VALUE></PROPERTY>
<PROPERTY NAME="PCIDeviceID"><VALUE>101B</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>0</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_PCIDeviceView">
<PROPERTY NAME="FQDD"><VALUE>InfiniBand.Slot.1-1</VALUE></PROPERTY>
<PROPERTY NAME="Description"><VALUE>MT28908 Family [ConnectX-6]</VALUE></PROPERTY>
<PROPERTY NAME="DeviceDescription"><VALUE>InfiniBand in Slot 1 Port 1</VALUE></PROPERTY>
<PROPERTY NAME="Manufacturer"><VALUE>Mellanox Technologies</VALUE></PROPERTY>
<PROPERTY NAME="PCIVendorID"><VALUE>15B3</VALUE></PROPERTY>
<PROPERTY NAME="PCIDeviceID"><VALUE>101B</VALUE></PROPERTY>
<PROPERTY NAME="DataBusWidth"><DisplayValue>16x or x16</DisplayValue></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_ControllerView">
<PROPERTY NAME="FQDD"><VALUE>RAID.SL.3-1</VALUE></PROPERTY>
<PROPERTY NAME="ProductName"><VALUE>PERC H755 Front</VALUE></PROPERTY>
<PROPERTY NAME="ControllerFirmwareVersion"><VALUE>52.30.0-6115</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>0</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
const swXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SoftwareIdentity">
<PROPERTY NAME="ElementName"><VALUE>Mellanox Network Adapter - 00:1C:FD:D7:5A:E6</VALUE></PROPERTY>
<PROPERTY NAME="FQDD"><VALUE>InfiniBand.Slot.1-1</VALUE></PROPERTY>
<PROPERTY NAME="VersionString"><VALUE>20.39.35.60</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SoftwareIdentity">
<PROPERTY NAME="ElementName"><VALUE>PERC H755 Front</VALUE></PROPERTY>
<PROPERTY NAME="FQDD"><VALUE>RAID.SL.3-1</VALUE></PROPERTY>
<PROPERTY NAME="VersionString"><VALUE>52.30.0-6115</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SoftwareIdentity">
<PROPERTY NAME="ElementName"><VALUE>BIOS</VALUE></PROPERTY>
<PROPERTY NAME="FQDD"><VALUE>BIOS.Setup.1-1</VALUE></PROPERTY>
<PROPERTY NAME="VersionString"><VALUE>1.15.3</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
inner := makeZipArchive(t, map[string][]byte{
"tsr/metadata.json": []byte(`{"Make":"Dell Inc.","Model":"PowerEdge R6625","ServiceTag":"8VS2LG4"}`),
"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml": []byte(viewXML),
"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_SoftwareIdentity.xml": []byte(swXML),
})
p := &Parser{}
result, err := p.Parse([]parser.ExtractedFile{
{Path: "signature", Content: []byte("ok")},
{Path: "TSR20260306141852_8VS2LG4.pl.zip", Content: inner},
})
if err != nil {
t.Fatalf("parse failed: %v", err)
}
// InfiniBand adapter must appear as a NetworkAdapter, not a PCIe device.
if len(result.Hardware.NetworkAdapters) != 1 {
t.Fatalf("expected 1 network adapter, got %d", len(result.Hardware.NetworkAdapters))
}
nic := result.Hardware.NetworkAdapters[0]
if nic.Slot != "InfiniBand.Slot.1-1" {
t.Errorf("unexpected NIC slot: %q", nic.Slot)
}
if nic.Firmware != "20.39.35.60" {
t.Errorf("unexpected NIC firmware: %q", nic.Firmware)
}
if len(nic.MACAddresses) == 0 || nic.MACAddresses[0] != "00:1C:FD:D7:5A:E6" {
t.Errorf("unexpected NIC MAC: %v", nic.MACAddresses)
}
// pci.ids enrichment: VendorID=0x15B3, DeviceID=0x101B → chip model + vendor name.
if nic.Model != "MT28908 Family [ConnectX-6]" {
t.Errorf("NIC model = %q, want MT28908 Family [ConnectX-6] (from pci.ids)", nic.Model)
}
if nic.Vendor != "Mellanox Technologies" {
t.Errorf("NIC vendor = %q, want Mellanox Technologies (from pci.ids)", nic.Vendor)
}
// InfiniBand FQDD must NOT appear in PCIe devices.
for _, pcie := range result.Hardware.PCIeDevices {
if pcie.Slot == "InfiniBand.Slot.1-1" {
t.Errorf("InfiniBand.Slot.1-1 must not appear in PCIeDevices")
}
}
// Firmware entries from SoftwareIdentity and parseControllerView must carry the FQDD
// as their Description so the exporter's isDeviceBoundFirmwareFQDD filter can remove them.
fqddByName := make(map[string]string)
for _, fw := range result.Hardware.Firmware {
fqddByName[fw.DeviceName] = fw.Description
}
if desc := fqddByName["Mellanox Network Adapter"]; desc != "InfiniBand.Slot.1-1" {
t.Errorf("Mellanox firmware Description = %q, want InfiniBand.Slot.1-1 for FQDD filter", desc)
}
if desc := fqddByName["PERC H755 Front"]; desc != "RAID.SL.3-1" {
t.Errorf("PERC H755 Front firmware Description = %q, want RAID.SL.3-1 for FQDD filter", desc)
}
}
// TestParseDellCPUAffinity verifies that CPUAffinity is parsed into NUMANode for
// NIC, PCIe, and controller views. "Not Applicable" must result in NUMANode=0.
func TestParseDellCPUAffinity(t *testing.T) {
const viewXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SystemView">
<PROPERTY NAME="Manufacturer"><VALUE>Dell Inc.</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>PowerEdge R750</VALUE></PROPERTY>
<PROPERTY NAME="ServiceTag"><VALUE>TESTST1</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_NICView">
<PROPERTY NAME="FQDD"><VALUE>NIC.Slot.2-1-1</VALUE></PROPERTY>
<PROPERTY NAME="ProductName"><VALUE>Some NIC</VALUE></PROPERTY>
<PROPERTY NAME="CPUAffinity"><VALUE>1</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>0</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_InfiniBandView">
<PROPERTY NAME="FQDD"><VALUE>InfiniBand.Slot.1-1</VALUE></PROPERTY>
<PROPERTY NAME="DeviceDescription"><VALUE>InfiniBand in Slot 1</VALUE></PROPERTY>
<PROPERTY NAME="CPUAffinity"><VALUE>2</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>0</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_ControllerView">
<PROPERTY NAME="FQDD"><VALUE>RAID.Slot.1-1</VALUE></PROPERTY>
<PROPERTY NAME="ProductName"><VALUE>PERC H755</VALUE></PROPERTY>
<PROPERTY NAME="CPUAffinity"><VALUE>Not Applicable</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>0</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_PCIDeviceView">
<PROPERTY NAME="FQDD"><VALUE>Slot.7-1</VALUE></PROPERTY>
<PROPERTY NAME="Description"><VALUE>Some PCIe Card</VALUE></PROPERTY>
<PROPERTY NAME="CPUAffinity"><VALUE>2</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>0</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
inner := makeZipArchive(t, map[string][]byte{
"tsr/metadata.json": []byte(`{"Make":"Dell Inc.","Model":"PowerEdge R750","ServiceTag":"TESTST1"}`),
"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml": []byte(viewXML),
})
p := &Parser{}
result, err := p.Parse([]parser.ExtractedFile{
{Path: "signature", Content: []byte("ok")},
{Path: "TSR_TESTST1.pl.zip", Content: inner},
})
if err != nil {
t.Fatalf("parse failed: %v", err)
}
// NIC CPUAffinity=1 → NUMANode=1
nicBySlot := make(map[string]int)
for _, nic := range result.Hardware.NetworkAdapters {
nicBySlot[nic.Slot] = nic.NUMANode
}
if nicBySlot["NIC.Slot.2-1-1"] != 1 {
t.Errorf("NIC.Slot.2-1-1 NUMANode = %d, want 1", nicBySlot["NIC.Slot.2-1-1"])
}
if nicBySlot["InfiniBand.Slot.1-1"] != 2 {
t.Errorf("InfiniBand.Slot.1-1 NUMANode = %d, want 2", nicBySlot["InfiniBand.Slot.1-1"])
}
// PCIe device CPUAffinity=2 → NUMANode=2; controller CPUAffinity="Not Applicable" → NUMANode=0
pcieBySlot := make(map[string]int)
for _, pcie := range result.Hardware.PCIeDevices {
pcieBySlot[pcie.Slot] = pcie.NUMANode
}
if pcieBySlot["Slot.7-1"] != 2 {
t.Errorf("Slot.7-1 NUMANode = %d, want 2", pcieBySlot["Slot.7-1"])
}
if pcieBySlot["RAID.Slot.1-1"] != 0 {
t.Errorf("RAID.Slot.1-1 NUMANode = %d, want 0 (Not Applicable)", pcieBySlot["RAID.Slot.1-1"])
}
}
func makeZipArchive(t *testing.T, files map[string][]byte) []byte {
t.Helper()
var buf bytes.Buffer
zw := zip.NewWriter(&buf)
for name, content := range files {
w, err := zw.Create(name)
if err != nil {
t.Fatalf("create zip entry %s: %v", name, err)
}
if _, err := w.Write(content); err != nil {
t.Fatalf("write zip entry %s: %v", name, err)
}
}
if err := zw.Close(); err != nil {
t.Fatalf("close zip: %v", err)
}
return buf.Bytes()
}


@@ -1,72 +0,0 @@
# Generic Text File Parser
Fallback parser for text files that no other parser recognizes.
## Purpose
This parser handles any text file that:
- is not a vendor-specific archive
- contains textual information (not binary data)
- is a single .gz file or a plain text file
## Priority
**Confidence score: 15** (low priority)
This parser is selected only when no other parser matches with a higher confidence.
## Supported files
### Automatically recognized types
1. **NVIDIA Bug Report** (`nvidia-bug-report-*.log.gz`)
- Extracts NVIDIA driver information
- Finds GPU devices
- Reports the driver version
2. **Any text file**
- Checks that the content is text (not binary data)
- Shows basic information about the file
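The "is it text?" check can be as simple as the heuristic sketched below: content containing a NUL byte, or content that is not valid UTF-8, is treated as binary. This is an assumed heuristic, not the parser's confirmed rule. Note that it also rejects legacy 8-bit encodings such as Latin-1 or CP1251. `looksLikeText` is a hypothetical name.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// looksLikeText reports whether data is plausibly a text file: it must contain
// no NUL bytes and must be valid UTF-8.
func looksLikeText(data []byte) bool {
	if bytes.IndexByte(data, 0) >= 0 {
		return false
	}
	return utf8.Valid(data)
}

func main() {
	fmt.Println(looksLikeText([]byte("Driver Version: 535.104.05\n")))          // true
	fmt.Println(looksLikeText([]byte{0x7f, 'E', 'L', 'F', 0x02, 0x01, 0x00})) // false
}
```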
## Extracted data
### Events
- **Text File**: basic information about the uploaded file
- **Driver Info**: NVIDIA driver information (for nvidia-bug-report)
- **GPU Device**: detected GPU devices (for nvidia-bug-report)
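For the Driver Info event, the driver version has to be pulled out of the report text. The sketch below assumes the report contains the `/proc/driver/nvidia/version` banner ("NVRM version: … Kernel Module <version> …"). The exact line format and the helper name `driverVersion` are assumptions; the parser's real matching logic is not shown in this diff.

```go
package main

import (
	"fmt"
	"regexp"
)

// nvrmVersionRe matches an assumed banner line such as
// "NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.104.05  <build date>"
// and captures the dotted version number.
var nvrmVersionRe = regexp.MustCompile(`NVRM version:.*Kernel Module\s+([0-9]+(?:\.[0-9]+)+)`)

// driverVersion returns the first driver version found in log, or "" if none.
func driverVersion(log string) string {
	if m := nvrmVersionRe.FindStringSubmatch(log); m != nil {
		return m[1]
	}
	return ""
}

func main() {
	sample := "NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.104.05  Sat Aug 19 01:15:15 UTC 2023\n"
	fmt.Println(driverVersion(sample)) // 535.104.05
}
```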
## Usage example
```bash
# Run with an nvidia-bug-report
./logpile --file nvidia-bug-report-*.log.gz
# Run with any text file
./logpile --file system.log.gz
```
## Versioning
**Current parser version:** 1.0.0
## Limitations
1. This parser provides only basic information
2. It does not analyze the content in depth
3. For detailed analysis of specific logs, write a dedicated parser
## Extending
To add support for a new file type:
1. Add a check to the `Parse()` function
2. Create a `parseXXX()` function that extracts the type-specific information
3. Bump the parser version
Example:
```go
// Inside Parse(): route files whose path contains "custom-log" to a dedicated handler
if strings.Contains(strings.ToLower(file.Path), "custom-log") {
parseCustomLog(content, result)
}
```


@@ -10,7 +10,7 @@ import (
)
// parserVersion - version of this parser module
-const parserVersion = "1.0.0"
+const parserVersion = "1.1"
func init() {
parser.Register(&Parser{})

internal/parser/vendors/h3c/parser.go

File diff suppressed because it is too large.


@@ -0,0 +1,962 @@
package h3c
import (
"strings"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestDetectH3C_GenerationRouting(t *testing.T) {
g5 := &G5Parser{}
g6 := &G6Parser{}
g5Files := []parser.ExtractedFile{
{Path: "bmc/pack.info", Content: []byte("STARTTIME:0")},
{Path: "static/FRUInfo.ini", Content: []byte("[Baseboard]\nBoard Manufacturer=H3C\n")},
{Path: "static/hardware_info.ini", Content: []byte("[Processors: Processor 1]\nModel: Intel Xeon\n")},
{Path: "static/hardware.info", Content: []byte("[Disk_0_Front_NA]\nSerialNumber=DISK-0\n")},
{Path: "static/firmware_version.ini", Content: []byte("[System board]\nBIOS Version: 5.59\n")},
{Path: "user/test1.csv", Content: []byte("Record Time Stamp,DescInfo\n2025-01-01 00:00:00,foo\n")},
}
if gotG5, gotG6 := g5.Detect(g5Files), g6.Detect(g5Files); gotG5 <= gotG6 {
t.Fatalf("expected G5 confidence > G6 for G5 sample, got g5=%d g6=%d", gotG5, gotG6)
}
g6Files := []parser.ExtractedFile{
{Path: "bmc/pack.info", Content: []byte("STARTTIME:0")},
{Path: "static/FRUInfo.ini", Content: []byte("[Baseboard]\nBoard Manufacturer=H3C\n")},
{Path: "static/board_info.ini", Content: []byte("[System board]\nBoardMfr=H3C\n")},
{Path: "static/firmware_version.json", Content: []byte(`{"BIOS":{"Firmware Name":"BIOS","Firmware Version":"6.10"}}`)},
{Path: "static/CPUDetailInfo.xml", Content: []byte("<Root><CPU1><Model>X</Model></CPU1></Root>")},
{Path: "static/MemoryDetailInfo.xml", Content: []byte("<Root><DIMM1><Name>A0</Name></DIMM1></Root>")},
{Path: "user/Sel.json", Content: []byte(`{"Id":1}`)},
}
if gotG5, gotG6 := g5.Detect(g6Files), g6.Detect(g6Files); gotG6 <= gotG5 {
t.Fatalf("expected G6 confidence > G5 for G6 sample, got g5=%d g6=%d", gotG5, gotG6)
}
}
func TestParseH3CG6_RaidAndNVMeEnrichment(t *testing.T) {
p := &G6Parser{}
files := []parser.ExtractedFile{
{
Path: "static/storage_disk.ini",
Content: []byte(`[Disk_000]
DiskSlotDesc=Front0
Present=YES
SerialNumber=SER-0
`),
},
{
Path: "static/raid.json",
Content: []byte(`{
"RaidConfig": {
"CtrlInfo": [
{
"CtrlSlot": 1,
"CtrlName": "RAID-LSI-9560",
"LDInfo": [
{
"LDID": "0",
"LDName": "VD0",
"RAIDLevel": "1",
"CapacityBytes": 1000000000,
"Status": "Optimal"
}
]
}
]
}
}`),
},
{
Path: "static/Storage_RAID-LSI-9560-LP-8i-4GB[1].txt",
Content: []byte(`Controller Information
------------------------------------------------------------------------
AssetTag : RAID-LSI-9560
Logical Device Information
------------------------------------------------------------------------
LDID : 0
Name : VD0
RAID Level : 1
CapacityBytes : 1000000000
Status : Optimal
Physical Device Information
------------------------------------------------------------------------
ConnectionID : 0
Position : Front0
StatusIndicator : OK
Protocol : SATA
MediaType : SSD
Manufacturer : Samsung
Model : PM893
Revision : GDC1
SerialNumber : SER-0
CapacityBytes : 480000000000
ConnectionID : 1
Position : Front1
StatusIndicator : OK
Protocol : SATA
MediaType : SSD
Manufacturer : Samsung
Model : PM893
Revision : GDC1
SerialNumber : SER-1
CapacityBytes : 480000000000
`),
},
{
Path: "static/NVMe_info.txt",
Content: []byte(`[NVMe_0]
Present=YES
DiskSlotDesc=Front2
Model=INTEL SSDPE2KX010T8
SerialNumber=NVME-1
Firmware=V100
CapacityBytes=1000204886016
Interface=NVMe
Status=OK
`),
},
}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("parse failed: %v", err)
}
if result.Hardware == nil {
t.Fatalf("expected hardware section")
}
if len(result.Hardware.Volumes) != 1 {
t.Fatalf("expected 1 volume, got %d", len(result.Hardware.Volumes))
}
vol := result.Hardware.Volumes[0]
if vol.RAIDLevel != "RAID1" {
t.Fatalf("expected RAID1 level, got %q", vol.RAIDLevel)
}
if vol.SizeGB != 1 {
t.Fatalf("expected 1GB logical volume, got %d", vol.SizeGB)
}
if len(result.Hardware.Storage) != 3 {
t.Fatalf("expected 3 unique storage devices, got %d", len(result.Hardware.Storage))
}
var front0 *models.Storage
var nvme *models.Storage
for i := range result.Hardware.Storage {
s := &result.Hardware.Storage[i]
if strings.EqualFold(s.SerialNumber, "SER-0") {
front0 = s
}
if strings.EqualFold(s.SerialNumber, "NVME-1") {
nvme = s
}
}
if front0 == nil {
t.Fatalf("expected merged Front0 disk by serial SER-0")
}
if front0.Model != "PM893" {
t.Fatalf("expected Front0 model PM893, got %q", front0.Model)
}
if front0.SizeGB != 480 {
t.Fatalf("expected Front0 size 480GB, got %d", front0.SizeGB)
}
if nvme == nil {
t.Fatalf("expected NVMe disk by serial NVME-1")
}
if nvme.Type != "nvme" {
t.Fatalf("expected nvme type, got %q", nvme.Type)
}
}
func TestParseH3CG6(t *testing.T) {
p := &G6Parser{}
files := []parser.ExtractedFile{
{
Path: "static/FRUInfo.ini",
Content: []byte(`[Baseboard]
Board Manufacturer=H3C
Board Product Name=RS36M2C6SB
Product Product Name=H3C UniServer R4700 G6
Product Serial Number=210235A4FYH257000010
Product Part Number=0235A4FY
`),
},
{
Path: "static/firmware_version.json",
Content: []byte(`{
"BMCP": {"Firmware Name":"HDM","Firmware Version":"1.83","Location":"bmc card","Part Model":"-"},
"BIOS": {"Firmware Name":"BIOS","Firmware Version":"6.10.53","Location":"system board","Part Model":"-"}
}`),
},
{
Path: "static/CPUDetailInfo.xml",
Content: []byte(`<Root>
<CPU1>
<Status>Presence</Status>
<Model>INTEL(R) XEON(R) GOLD 6542Y</Model>
<ProcessorSpeed>0xb54</ProcessorSpeed>
<ProcessorMaxSpeed>0x1004</ProcessorMaxSpeed>
<TotalCores>0x18</TotalCores>
<TotalThreads>0x30</TotalThreads>
<SerialNumber>68-5C-81-C1-0E-A3-4E-40</SerialNumber>
<PPIN>68-5C-81-C1-0E-A3-4E-40</PPIN>
</CPU1>
</Root>`),
},
{
Path: "static/MemoryDetailInfo.xml",
Content: []byte(`<Root>
<DIMM1>
<Status>Presence</Status>
<Name>CPU1_CH1_D0 (A0)</Name>
<PartNumber>M321R8GA0PB0-CWMXJ</PartNumber>
<DIMMTech>RDIMM</DIMMTech>
<SerialNumber>80CE032519135C82ED</SerialNumber>
<DIMMRanks>0x2</DIMMRanks>
<DIMMSize>0x10000</DIMMSize>
<CurFreq>0x1130</CurFreq>
<MaxFreq>0x15e0</MaxFreq>
<DIMMSilk>A0</DIMMSilk>
</DIMM1>
</Root>`),
},
{
Path: "static/storage_disk.ini",
Content: []byte(`[Disk_000]
SerialNumber=S6KLNN0Y516813
DiskSlotDesc=Front0
Present=YES
`),
},
{
Path: "static/net_cfg.ini",
Content: []byte(`[Network Configuration]
eth0 Link encap:Ethernet HWaddr 30:C6:D7:94:54:F6
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth0.2 Link encap:Ethernet HWaddr 30:C6:D7:94:54:F6
inet6 addr: fe80::32c6:d7ff:fe94:54f6/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1496 Metric:1
eth1 Link encap:Ethernet HWaddr 30:C6:D7:94:54:F5
inet addr:10.201.129.0 Bcast:10.201.143.255 Mask:255.255.240.0
inet6 addr: fe80::32c6:d7ff:fe94:54f5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
`),
},
{
Path: "static/psu_cfg.ini",
Content: []byte(`[Psu0]
SN=210231AGUNH257001569
Max_Power(W)=1600
Manufacturer=Great Wall
Power Status=Input Normal, Output Normal
Present_Status=Present
Power_ID=1
Model=GW-CRPS1600D2
Version=03.02.00
[Psu1]
Manufacturer=Great Wall
Power_ID=2
Version=03.02.00
Power Status=Input Normal, Output Normal
SN=210231AGUNH257001570
Model=GW-CRPS1600D2
Present_Status=Present
Max_Power(W)=1600
`),
},
{
Path: "static/hardware_info.ini",
Content: []byte(`[Ethernet adapters: Port 1]
Device Type : NIC
Network Port : Port 1
Location : PCIE-[1]
MAC Address : E4:3D:1A:6F:B0:30
Speed : 8.0GT/s
Product Name : NIC-BCM957414-F-B-25Gb-2P
[Ethernet adapters: Port 2]
Device Type : NIC
Network Port : Port 2
Location : PCIE-[1]
MAC Address : E4:3D:1A:6F:B0:31
Speed : 8.0GT/s
Product Name : NIC-BCM957414-F-B-25Gb-2P
[PCIe Card: PCIe 1]
Location : 1
Product Name : NIC-BCM957414-F-B-25Gb-2P
Status : Normal
Vendor ID : 0x14E4
Device ID : 0x16D7
Serial Number : NICSN-G6-001
Part Number : NICPN-G6-001
Firmware Version : 22.35.1010
`),
},
{
Path: "static/sensor_info.ini",
Content: []byte(`Sensor Name | Reading | Unit | Status| Crit low
Inlet_Temp | 20.000 | degrees C | ok | na
CPU1_Status | 0x0 | discrete | 0x8080| na
`),
},
{
Path: "user/Sel.json",
Content: []byte(`
{
"Created": "2025-07-14 03:34:18 UTC+08:00",
"Severity": "Info",
"EntryCode": "Asserted",
"EntryType": "Event",
"Id": 1,
"Level": "Info",
"Message": "Processor Presence detected",
"SensorName": "CPU1_Status",
"SensorType": "Processor"
},
{
"Created": "2025-07-14 20:56:45 UTC+08:00",
"Severity": "Critical",
"EntryCode": "Asserted",
"EntryType": "Event",
"Id": 2,
"Level": "Critical",
"Message": "Power Supply AC lost",
"SensorName": "PSU1_Status",
"SensorType": "Power Supply"
}
`),
},
}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("parse failed: %v", err)
}
if result.Hardware == nil {
t.Fatalf("expected hardware section")
}
if result.Hardware.BoardInfo.Manufacturer != "H3C" {
t.Fatalf("unexpected board manufacturer: %q", result.Hardware.BoardInfo.Manufacturer)
}
if result.Hardware.BoardInfo.ProductName != "H3C UniServer R4700 G6" {
t.Fatalf("unexpected board product: %q", result.Hardware.BoardInfo.ProductName)
}
if result.Hardware.BoardInfo.SerialNumber != "210235A4FYH257000010" {
t.Fatalf("unexpected board serial: %q", result.Hardware.BoardInfo.SerialNumber)
}
if len(result.Hardware.Firmware) < 2 {
t.Fatalf("expected firmware entries, got %d", len(result.Hardware.Firmware))
}
if len(result.Hardware.CPUs) != 1 {
t.Fatalf("expected 1 cpu, got %d", len(result.Hardware.CPUs))
}
if result.Hardware.CPUs[0].Cores != 24 {
t.Fatalf("expected 24 cores, got %d", result.Hardware.CPUs[0].Cores)
}
if len(result.Hardware.Memory) != 1 {
t.Fatalf("expected 1 dimm, got %d", len(result.Hardware.Memory))
}
if result.Hardware.Memory[0].SizeMB != 65536 {
t.Fatalf("expected 65536MB, got %d", result.Hardware.Memory[0].SizeMB)
}
if len(result.Hardware.Storage) != 1 {
t.Fatalf("expected 1 disk, got %d", len(result.Hardware.Storage))
}
if result.Hardware.Storage[0].SerialNumber != "S6KLNN0Y516813" {
t.Fatalf("unexpected disk serial: %q", result.Hardware.Storage[0].SerialNumber)
}
if len(result.Hardware.PowerSupply) != 2 {
t.Fatalf("expected 2 PSUs from psu_cfg.ini, got %d", len(result.Hardware.PowerSupply))
}
if result.Hardware.PowerSupply[0].WattageW == 0 {
t.Fatalf("expected PSU wattage parsed, got 0")
}
if len(result.Hardware.NetworkAdapters) != 1 {
t.Fatalf("expected 1 host network adapter from hardware_info.ini, got %d", len(result.Hardware.NetworkAdapters))
}
macs := make(map[string]struct{})
var hostNIC models.NetworkAdapter
var hostNICFound bool
for _, nic := range result.Hardware.NetworkAdapters {
if len(nic.MACAddresses) == 0 {
t.Fatalf("expected MAC on network adapter %+v", nic)
}
for _, mac := range nic.MACAddresses {
macs[strings.ToLower(mac)] = struct{}{}
}
if strings.EqualFold(nic.Slot, "PCIe 1") && strings.Contains(strings.ToLower(nic.Model), "bcm957414") {
hostNIC = nic
hostNICFound = true
}
}
if !hostNICFound {
t.Fatalf("expected host NIC from hardware_info.ini, got %+v", result.Hardware.NetworkAdapters)
}
if _, ok := macs["e4:3d:1a:6f:b0:30"]; !ok {
t.Fatalf("expected host NIC MAC e4:3d:1a:6f:b0:30 in adapters, got %+v", result.Hardware.NetworkAdapters)
}
if _, ok := macs["e4:3d:1a:6f:b0:31"]; !ok {
t.Fatalf("expected host NIC MAC e4:3d:1a:6f:b0:31 in adapters, got %+v", result.Hardware.NetworkAdapters)
}
if !strings.Contains(strings.ToLower(hostNIC.Vendor), "broadcom") {
t.Fatalf("expected host NIC vendor enrichment from Vendor ID, got %q", hostNIC.Vendor)
}
if hostNIC.SerialNumber != "NICSN-G6-001" {
t.Fatalf("expected host NIC serial from PCIe card section, got %q", hostNIC.SerialNumber)
}
if hostNIC.PartNumber != "NICPN-G6-001" {
t.Fatalf("expected host NIC part number from PCIe card section, got %q", hostNIC.PartNumber)
}
if hostNIC.Firmware != "22.35.1010" {
t.Fatalf("expected host NIC firmware from PCIe card section, got %q", hostNIC.Firmware)
}
if len(result.Sensors) != 2 {
t.Fatalf("expected 2 sensors, got %d", len(result.Sensors))
}
if result.Sensors[0].Name != "Inlet_Temp" {
t.Fatalf("unexpected first sensor: %q", result.Sensors[0].Name)
}
if len(result.Events) != 2 {
t.Fatalf("expected 2 events, got %d", len(result.Events))
}
if result.Events[0].Timestamp.Year() != 2025 || result.Events[0].Timestamp.Month() != 7 {
t.Fatalf("expected SEL timestamp from payload, got %s", result.Events[0].Timestamp)
}
if result.Events[1].Severity != models.SeverityCritical {
t.Fatalf("expected critical severity for AC lost event, got %q", result.Events[1].Severity)
}
}
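The G6 fixture's `static/sensor_info.ini` is a pipe-delimited table: a header row (`Sensor Name | Reading | Unit | Status| Crit low`) followed by one row per sensor, with irregular spacing around the `|` separators. The test expects two sensors, the first named `Inlet_Temp`. Below is a minimal sketch of splitting such a table into rows. It assumes the first line is always the header and that only the first four columns matter. `sensorRow` and `parseSensorTable` are hypothetical names, not the parser's actual API.

```go
package main

import (
	"fmt"
	"strings"
)

// sensorRow is a hypothetical record for one data line of sensor_info.ini.
type sensorRow struct {
	Name, Reading, Unit, Status string
}

// parseSensorTable splits a pipe-delimited table into rows. It skips the
// header (first line), blank lines, and lines with fewer than four cells,
// and trims the uneven whitespace around each cell.
func parseSensorTable(text string) []sensorRow {
	var rows []sensorRow
	for i, line := range strings.Split(text, "\n") {
		if i == 0 || strings.TrimSpace(line) == "" {
			continue
		}
		cells := strings.Split(line, "|")
		if len(cells) < 4 {
			continue
		}
		for j := range cells {
			cells[j] = strings.TrimSpace(cells[j])
		}
		rows = append(rows, sensorRow{Name: cells[0], Reading: cells[1], Unit: cells[2], Status: cells[3]})
	}
	return rows
}

func main() {
	text := "Sensor Name | Reading | Unit | Status| Crit low\n" +
		"Inlet_Temp | 20.000 | degrees C | ok | na\n" +
		"CPU1_Status | 0x0 | discrete | 0x8080| na\n"
	rows := parseSensorTable(text)
	fmt.Println(len(rows), rows[0].Name, rows[1].Status)
}
```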
func TestParseH3CG5_PCIeArgumentsEnrichesNonNVMeStorage(t *testing.T) {
p := &G5Parser{}
files := []parser.ExtractedFile{
{
Path: "static/storage_disk.ini",
Content: []byte(`[Disk_000]
DiskSlotDesc=Front slot 3
Present=YES
SerialNumber=SAT-03
`),
},
{
Path: "static/NVMe_info.txt",
Content: []byte(`[NVMe_0]
Present=YES
DiskSlotDesc=Front slot 108
SerialNumber=NVME-108
`),
},
{
Path: "static/PCIe_arguments_table.xml",
Content: []byte(`<root>
<PCIE100>
<base_args>
<type>SSD</type>
<name>SSD-SATA-960G</name>
</base_args>
<type_get_args>
<bios_args>
<vendor_id>0x144D</vendor_id>
</bios_args>
</type_get_args>
</PCIE100>
<PCIE200>
<base_args>
<type>SSD</type>
<name>SSD-3.84T-NVMe-SFF</name>
</base_args>
<type_get_args>
<bios_args>
<vendor_id>0x144D</vendor_id>
</bios_args>
</type_get_args>
</PCIE200>
</root>`),
},
}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("parse failed: %v", err)
}
if result.Hardware == nil {
t.Fatalf("expected hardware section")
}
if len(result.Hardware.Storage) != 2 {
t.Fatalf("expected 2 storage devices, got %d", len(result.Hardware.Storage))
}
var sata *models.Storage
var nvme *models.Storage
for i := range result.Hardware.Storage {
s := &result.Hardware.Storage[i]
switch s.SerialNumber {
case "SAT-03":
sata = s
case "NVME-108":
nvme = s
}
}
if sata == nil {
t.Fatalf("expected SATA storage SAT-03")
}
if sata.Model != "SSD-SATA-960G" {
t.Fatalf("expected SATA model enrichment from PCIe table, got %q", sata.Model)
}
if !strings.Contains(strings.ToLower(sata.Manufacturer), "samsung") {
t.Fatalf("expected SATA vendor enrichment to Samsung, got %q", sata.Manufacturer)
}
if nvme == nil {
t.Fatalf("expected NVMe storage NVME-108")
}
if nvme.Model != "SSD-3.84T-NVMe-SFF" {
t.Fatalf("expected NVMe model enrichment from PCIe table, got %q", nvme.Model)
}
if !strings.Contains(strings.ToLower(nvme.Manufacturer), "samsung") {
t.Fatalf("expected NVMe vendor enrichment to Samsung, got %q", nvme.Manufacturer)
}
}
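In this fixture, `PCIe_arguments_table.xml` carries the vendor as a hex string (`<vendor_id>0x144D</vendor_id>`), and the test expects both drives to resolve to Samsung. Below is a minimal sketch of that resolution step. A small hard-coded table stands in for the PCI IDs lookup the parser actually performs. `vendorNames` and `vendorFromHex` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// vendorNames is a tiny hypothetical stand-in for a PCI IDs vendor table.
var vendorNames = map[uint16]string{
	0x144D: "Samsung Electronics Co Ltd",
	0x14E4: "Broadcom Inc. and subsidiaries",
	0x15B3: "Mellanox Technologies",
}

// vendorFromHex parses strings such as "0x144D" and returns the vendor
// name, or "" if the value is malformed or not in the table.
func vendorFromHex(s string) string {
	// Base 0 makes ParseUint honor the "0x" prefix; bitSize 16 rejects
	// values that cannot be a PCI vendor ID.
	id, err := strconv.ParseUint(strings.TrimSpace(s), 0, 16)
	if err != nil {
		return ""
	}
	return vendorNames[uint16(id)]
}

func main() {
	fmt.Println(vendorFromHex("0x144D"))
}
```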
func TestParseH3CG5_VariantLayout(t *testing.T) {
p := &G5Parser{}
files := []parser.ExtractedFile{
{
Path: "static/FRUInfo.ini",
Content: []byte(`[Baseboard]
Board Manufacturer=H3C
Product Product Name=H3C UniServer R4900 G5
Product Serial Number=02A6AX5231C003VM
`),
},
{
Path: "static/firmware_version.ini",
Content: []byte(`[System board]
BIOS Version : 5.59 V100R001B05D078
ME Version : 4.4.4.202
HDM Version : 3.34.01 HDM V100R001B05D078SP01
CPLD Version : V00C
`),
},
{
Path: "static/board_cfg.ini",
Content: []byte(`[Board Type]
Board Type : R4900 G5
[Board Version]
Board Version : VER.D
[Customer ID]
CustomerID : 255
[OEM ID]
OEM Flag : 1
`),
},
{
Path: "static/hardware_info.ini",
Content: []byte(`[Processors: Processor 1]
Model : Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz
Status : Normal
Frequency : 2800 MHz
Cores : 24
Threads : 48
L1 Cache : 1920 KB
L2 Cache : 30720 KB
L3 Cache : 36864 KB
CPU PPIN : 49-A9-50-C0-15-9F-2D-DC
[Processors: Processor 2]
Model : Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz
Status : Normal
Frequency : 2800 MHz
Cores : 24
Threads : 48
CPU PPIN : 49-AC-3D-BF-85-7F-17-58
[Memory Details: Dimm Index 0]
Location : Processor 1
Channel : 1
Socket ID : A0
Status : Normal
Size : 65536 MB
Maximum Frequency : 3200 MHz
Type : DDR4
Ranks : 2R DIMM
Technology : RDIMM
Part Number : M393A8G40AB2-CWE
Manufacture : Samsung
Serial Number : S02K0D0243351D7079
[Memory Details: Dimm Index 16]
Location : Processor 2
Channel : 1
Socket ID : A0
Status : Normal
Size : 65536 MB
Maximum Frequency : 3200 MHz
Type : DDR4
Ranks : 2R DIMM
Technology : RDIMM
Part Number : M393A8G40AB2-CWE
Manufacture : Samsung
Serial Number : S02K0D0243351D73F0
[Ethernet adapters: Port 1]
Device Type : NIC
Network Port : Port 1
Location : PCIE-[1]
MAC Address : E4:3D:1A:6F:B0:30
Speed : 8.0GT/s
Product Name : NIC-BCM957414-F-B-25Gb-2P
[Ethernet adapters: Port 2]
Device Type : NIC
Network Port : Port 2
Location : PCIE-[1]
MAC Address : E4:3D:1A:6F:B0:31
Speed : 8.0GT/s
Product Name : NIC-BCM957414-F-B-25Gb-2P
[Ethernet adapters: Port 1]
Device Type : NIC
Network Port : Port 1
Location : PCIE-[4]
MAC Address : E8:EB:D3:4F:2E:90
Speed : 8.0GT/s
Product Name : NIC-MCX512A-ACAT-2*25Gb-F
[Ethernet adapters: Port 2]
Device Type : NIC
Network Port : Port 2
Location : PCIE-[4]
MAC Address : E8:EB:D3:4F:2E:91
Speed : 8.0GT/s
Product Name : NIC-MCX512A-ACAT-2*25Gb-F
[PCIe Card: PCIe 1]
Location : 1
Product Name : NIC-BCM957414-F-B-25Gb-2P
Status : Normal
Vendor ID : 0x14E4
Device ID : 0x16D7
Serial Number : NICSN-G5-001
Part Number : NICPN-G5-001
Firmware Version : 21.80.1
[PCIe Card: PCIe 4]
Location : 4
Product Name : NIC-MCX512A-ACAT-2*25Gb-F
Status : Normal
Vendor ID : 0x15B3
Device ID : 0x1017
Serial Number : NICSN-G5-004
Part Number : NICPN-G5-004
Firmware Version : 28.33.15
`),
},
{
Path: "static/hardware.info",
Content: []byte(`[Disk_0_Front_NA]
Present=YES
SlotNum=0
FrontOrRear=Front
SerialNumber=22443C4EE184
[Nvme_Front slot 21]
Present=YES
NvmePhySlot=Front slot 21
SlotNum=121
SerialNumber=NVME-21
[Nvme_255_121]
Present=YES
SlotNum=121
SerialNumber=NVME-21
`),
},
{
Path: "static/raid.json",
Content: []byte(`{
"RAIDCONFIG": {
"Ctrl info": [
{
"CtrlDevice Slot": 3,
"CtrlDevice Name": "AVAGO MegaRAID SAS 9460-8i",
"LDInfo": [
{
"LD ID": 0,
"LD_name": "SystemRAID",
"RAID_level(RAID 0,RAID 1,RAID 5,RAID 6,RAID 00,RAID 10,RAID 50,RAID 60)": "RAID1",
"Logical_capicity(per 512byte)": 936640512
}
]
},
{
"CtrlDevice Slot": 6,
"CtrlDevice Name": "MegaRAID 9560-16i 8GB",
"LDInfo": [
{
"LD ID": 0,
"LD_name": "DataRAID",
"RAID_level(RAID 0,RAID 1,RAID 5,RAID 6,RAID 00,RAID 10,RAID 50,RAID 60)": "RAID50",
"Logical_capicity(per 512byte)": 90004783104
}
]
}
]
}
}`),
},
{
Path: "static/Raid_BP_Conf_Info.ini",
Content: []byte(`[BP Information]
Description | BP TYPE | I2cPort | BpConnectorNum | FrontOrRear | Node Num | DiskSlotRange |
8SFF SAS/SATA | BP_G5_8SFF | AUX_1 | ~ | ~ | ~ | ~ |
8SFF SAS/SATA | BP_G5_8SFF | AUX_2 | ~ | ~ | ~ | ~ |
8SFF SAS/SATA | BP_G5_8SFF | AUX_3 | ~ | ~ | ~ | ~ |
[RAID Information]
PCIE SLOT | RAID SAS_NUM |
3 | 2 |
6 | 4 |
`),
},
{
Path: "static/PCIe_arguments_table.xml",
Content: []byte(`<root>
<PCIE100>
<base_args>
<type>SSD</type>
<name>SSD-1.92T/3.84T-NVMe-EV-SFF-sa</name>
</base_args>
<type_get_args>
<bios_args>
<vendor_id>0x144D</vendor_id>
</bios_args>
</type_get_args>
</PCIE100>
</root>`),
},
{
Path: "static/psu_cfg.ini",
Content: []byte(`[Active / Standby configuration]
Power ID : 1
Present Status : Present
Cold Status : Active Power
Model : DPS-1300AB-6 R
SN : 210231ACT9H232000080
Max Power(W) : 1300
Power ID : 2
Present Status : Present
Cold Status : Active Power
Model : DPS-1300AB-6 R
SN : 210231ACT9H232000079
Max Power(W) : 1300
`),
},
{
Path: "static/net_cfg.ini",
Content: []byte(`[Network Configuration]
eth0 Link encap:Ethernet HWaddr 30:C6:D7:94:54:F6
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth0.2 Link encap:Ethernet HWaddr 30:C6:D7:94:54:F6
inet6 addr: fe80::32c6:d7ff:fe94:54f6/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1496 Metric:1
eth1 Link encap:Ethernet HWaddr 30:C6:D7:94:54:F5
inet addr:10.201.129.0 Bcast:10.201.143.255 Mask:255.255.240.0
inet6 addr: fe80::32c6:d7ff:fe94:54f5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
`),
},
{
Path: "static/smartdata/Front0/first_date_analysis.txt",
Content: []byte(`The Current System Time Is 2023_09_22_14_19_39
Model Info: ATA Micron_5300_MTFD
Serial Number: 22443C4EE184
`),
},
{
Path: "user/test1.csv",
Content: []byte(`Record Time Stamp,Severity Level,Severity Level ID,SensorTypeStr,SensorName,Event Dir,Event Occurred Time,DescInfo,Explanation,Suggestion
2025-04-01 08:50:13,Minor,0x1,NA,NA,NA,2025-04-01 08:50:13,"SSH login failed from IP: 10.200.10.121 user: admin"," "," "
Pre-Init,Info,0x0,Management Subsystem Health,Health,Assertion event,Pre-Init,"Management controller off-line"," "," "
2025-04-01 08:51:10,Major,0x2,Power Supply,PSU1_Status,Assertion event,2025-04-01 08:51:10,"Power Supply AC lost"," "," "
`),
},
}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("parse failed: %v", err)
}
if result.Hardware == nil {
t.Fatalf("expected hardware section")
}
if len(result.Hardware.CPUs) != 2 {
t.Fatalf("expected 2 CPUs from hardware_info.ini, got %d", len(result.Hardware.CPUs))
}
if result.Hardware.CPUs[0].FrequencyMHz != 2800 {
t.Fatalf("expected CPU frequency 2800MHz, got %d", result.Hardware.CPUs[0].FrequencyMHz)
}
if len(result.Hardware.Memory) != 2 {
t.Fatalf("expected 2 DIMMs from hardware_info.ini, got %d", len(result.Hardware.Memory))
}
if result.Hardware.Memory[0].SizeMB != 65536 {
t.Fatalf("expected DIMM size 65536MB, got %d", result.Hardware.Memory[0].SizeMB)
}
if len(result.Hardware.Firmware) < 4 {
t.Fatalf("expected firmware entries from firmware_version.ini, got %d", len(result.Hardware.Firmware))
}
if result.Hardware.BoardInfo.Version == "" {
t.Fatalf("expected board version from board_cfg.ini")
}
if !strings.Contains(result.Hardware.BoardInfo.Description, "CustomerID: 255") {
t.Fatalf("expected board description enrichment from board_cfg.ini, got %q", result.Hardware.BoardInfo.Description)
}
if len(result.Hardware.Storage) != 2 {
t.Fatalf("expected 2 unique storage devices from hardware.info, got %d", len(result.Hardware.Storage))
}
var nvmeFound bool
var diskModelEnriched bool
for _, s := range result.Hardware.Storage {
if s.SerialNumber == "NVME-21" {
nvmeFound = true
if s.Type != "nvme" {
t.Fatalf("expected NVME-21 type nvme, got %q", s.Type)
}
if !strings.Contains(strings.ToLower(s.Manufacturer), "samsung") {
t.Fatalf("expected NVME vendor enrichment to Samsung, got %q", s.Manufacturer)
}
if s.Model != "SSD-1.92T/3.84T-NVMe-EV-SFF-sa" {
t.Fatalf("expected NVME model enrichment from PCIe table, got %q", s.Model)
}
}
if s.SerialNumber == "22443C4EE184" && strings.Contains(s.Model, "Micron") {
diskModelEnriched = true
}
}
if !nvmeFound {
t.Fatalf("expected deduped NVME storage by serial NVME-21")
}
if !diskModelEnriched {
t.Fatalf("expected disk model enrichment from smartdata by serial")
}
if len(result.Hardware.PowerSupply) != 2 {
t.Fatalf("expected 2 PSUs from psu_cfg.ini, got %d", len(result.Hardware.PowerSupply))
}
if result.Hardware.PowerSupply[0].WattageW == 0 {
t.Fatalf("expected PSU wattage parsed, got 0")
}
if len(result.Hardware.NetworkAdapters) != 2 {
t.Fatalf("expected 2 host network adapters from hardware_info.ini, got %d", len(result.Hardware.NetworkAdapters))
}
if len(result.Hardware.NetworkCards) != 2 {
t.Fatalf("expected 2 network cards synthesized from adapters, got %d", len(result.Hardware.NetworkCards))
}
var g5NIC models.NetworkAdapter
var g5NICFound bool
for _, nic := range result.Hardware.NetworkAdapters {
if strings.EqualFold(nic.Slot, "PCIe 1") && strings.Contains(strings.ToLower(nic.Model), "bcm957414") {
g5NIC = nic
g5NICFound = true
break
}
}
if !g5NICFound {
t.Fatalf("expected host NIC PCIe 1 from hardware_info.ini, got %+v", result.Hardware.NetworkAdapters)
}
if !strings.Contains(strings.ToLower(g5NIC.Vendor), "broadcom") {
t.Fatalf("expected G5 NIC vendor from Vendor ID, got %q", g5NIC.Vendor)
}
if g5NIC.SerialNumber != "NICSN-G5-001" {
t.Fatalf("expected G5 NIC serial from PCIe card section, got %q", g5NIC.SerialNumber)
}
if g5NIC.PartNumber != "NICPN-G5-001" {
t.Fatalf("expected G5 NIC part number from PCIe card section, got %q", g5NIC.PartNumber)
}
if g5NIC.Firmware != "21.80.1" {
t.Fatalf("expected G5 NIC firmware from PCIe card section, got %q", g5NIC.Firmware)
}
if len(result.Hardware.Devices) != 5 {
t.Fatalf("expected 5 topology devices from Raid_BP_Conf_Info.ini (3 BP + 2 RAID), got %d", len(result.Hardware.Devices))
}
var bpFound bool
var raidFound bool
for _, d := range result.Hardware.Devices {
if strings.Contains(d.ID, "h3c-bp-") && strings.Contains(d.Model, "BP_G5_8SFF") {
bpFound = true
}
desc, _ := d.Details["description"].(string)
if strings.Contains(d.ID, "h3c-raid-slot-3") && strings.Contains(desc, "SAS ports: 2") {
raidFound = true
}
}
if !bpFound || !raidFound {
t.Fatalf("expected parsed backplane and RAID topology devices, got %+v", result.Hardware.Devices)
}
if len(result.Hardware.Volumes) != 2 {
t.Fatalf("expected 2 RAID volumes (same LD ID on different controllers), got %d", len(result.Hardware.Volumes))
}
var raid1Found bool
var raid50Found bool
for _, v := range result.Hardware.Volumes {
if strings.Contains(v.Controller, "slot 3") {
raid1Found = v.RAIDLevel == "RAID1" && v.CapacityBytes > 0
}
if strings.Contains(v.Controller, "slot 6") {
raid50Found = v.RAIDLevel == "RAID50" && v.CapacityBytes > 0
}
}
if !raid1Found || !raid50Found {
t.Fatalf("expected RAID1 and RAID50 volumes with parsed capacities, got %+v", result.Hardware.Volumes)
}
if len(result.Events) != 2 {
t.Fatalf("expected 2 CSV events (Pre-Init skipped), got %d", len(result.Events))
}
if result.Events[0].Severity != models.SeverityWarning {
t.Fatalf("expected Minor CSV severity mapped to warning, got %q", result.Events[0].Severity)
}
if result.Events[1].Severity != models.SeverityCritical {
t.Fatalf("expected Major CSV severity mapped to critical, got %q", result.Events[1].Severity)
}
}
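The G5 fixture lists four `[Ethernet adapters: Port N]` sections spread across two cards (`Location : PCIE-[1]` and `Location : PCIE-[4]`). The test expects exactly two host adapters, each carrying its ports' MACs. Below is a minimal sketch of that grouping. It assumes ports are keyed on the raw `Location` string and that keys and values are separated by ` : `, since MAC values themselves contain bare colons. `groupEthernetPorts` and `portGroup` are hypothetical names, not the G5 parser's code.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// portGroup collects the ports that share one card Location.
type portGroup struct {
	Location string
	Model    string
	MACs     []string
}

// groupEthernetPorts scans "[Ethernet adapters: ...]" sections of a
// hardware_info.ini-style text and merges ports that share a Location.
// Groups are returned in first-seen order; other sections are ignored.
func groupEthernetPorts(text string) []portGroup {
	var groups []portGroup
	index := make(map[string]int)
	inPort := false
	var loc, model, mac string

	// flush commits the port section that has just ended.
	flush := func() {
		if !inPort || loc == "" {
			return
		}
		i, ok := index[loc]
		if !ok {
			i = len(groups)
			index[loc] = i
			groups = append(groups, portGroup{Location: loc, Model: model})
		}
		if mac != "" {
			groups[i].MACs = append(groups[i].MACs, mac)
		}
	}

	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			flush()
			inPort = strings.HasPrefix(line, "[Ethernet adapters:")
			loc, model, mac = "", "", ""
			continue
		}
		// Split on " : " so that colons inside MAC addresses survive.
		key, val, ok := strings.Cut(line, " : ")
		if !ok {
			continue
		}
		switch strings.TrimSpace(key) {
		case "Location":
			loc = strings.TrimSpace(val)
		case "Product Name":
			model = strings.TrimSpace(val)
		case "MAC Address":
			mac = strings.TrimSpace(val)
		}
	}
	flush()
	return groups
}

func main() {
	text := `[Ethernet adapters: Port 1]
Location : PCIE-[1]
MAC Address : E4:3D:1A:6F:B0:30
[Ethernet adapters: Port 2]
Location : PCIE-[1]
MAC Address : E4:3D:1A:6F:B0:31
[Ethernet adapters: Port 1]
Location : PCIE-[4]
MAC Address : E8:EB:D3:4F:2E:90
[PCIe Card: PCIe 1]
Location : 1
`
	for _, g := range groupEthernetPorts(text) {
		fmt.Println(g.Location, g.MACs)
	}
}
```

Keying on `Location` rather than on the port name matters here because the fixture reuses `Port 1` and `Port 2` on both cards.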


@@ -3,12 +3,15 @@ package inspur
import (
"encoding/json"
"fmt"
"regexp"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser/vendors/pciids"
)
var rawHexPCIDeviceRegex = regexp.MustCompile(`(?i)^0x[0-9a-f]+$`)
// AssetJSON represents the structure of Inspur asset.json file
type AssetJSON struct {
VersionInfo []struct {
@@ -55,6 +58,7 @@ type AssetJSON struct {
} `json:"MemInfo"`
HddInfo []struct {
PresentBitmap []int `json:"PresentBitmap"`
SerialNumber string `json:"SerialNumber"`
Manufacturer string `json:"Manufacturer"`
ModelName string `json:"ModelName"`
@@ -90,8 +94,12 @@ type AssetJSON struct {
} `json:"PcieInfo"`
}
// ParseAssetJSON parses Inspur asset.json content
func ParseAssetJSON(content []byte) (*models.HardwareConfig, error) {
// ParseAssetJSON parses Inspur asset.json content.
// - pcieSlotDeviceNames: optional map from integer PCIe slot ID to device name string,
// sourced from devicefrusdr.log PCIe REST section. Fills missing NVMe model names.
// - pcieSlotSerials: optional map from integer PCIe slot ID to serial number string,
// sourced from audit.log SN-changed events. Fills missing NVMe serial numbers.
func ParseAssetJSON(content []byte, pcieSlotDeviceNames map[int]string, pcieSlotSerials map[int]string) (*models.HardwareConfig, error) {
var asset AssetJSON
if err := json.Unmarshal(content, &asset); err != nil {
return nil, err
@@ -158,8 +166,36 @@ func ParseAssetJSON(content []byte) (*models.HardwareConfig, error) {
}
// Parse storage info
seenHDDFW := make(map[string]bool)
for _, hdd := range asset.HddInfo {
slot := normalizeAssetHDDSlot(hdd.LocationString, hdd.Location, hdd.DiskInterfaceType)
modelName := strings.TrimSpace(hdd.ModelName)
serial := normalizeRedisValue(hdd.SerialNumber)
present := bitmapHasAnyValue(hdd.PresentBitmap)
if !present && (slot != "" || modelName != "" || serial != "" || hdd.Capacity > 0) {
present = true
}
if !present && slot == "" && modelName == "" && serial == "" && hdd.Capacity == 0 {
continue
}
// Enrich model name from PCIe device name (supplied from devicefrusdr.log).
// BMC does not populate HddInfo.ModelName for NVMe drives, but the PCIe REST
// section in devicefrusdr.log carries the drive model as device_name.
if modelName == "" && hdd.PcieSlot > 0 && len(pcieSlotDeviceNames) > 0 {
if devName, ok := pcieSlotDeviceNames[hdd.PcieSlot]; ok && devName != "" {
modelName = devName
}
}
// Enrich serial number from audit.log SN-changed events (supplied via pcieSlotSerials).
// BMC asset.json does not carry NVMe serial numbers; audit.log logs every SN change.
if serial == "" && hdd.PcieSlot > 0 && len(pcieSlotSerials) > 0 {
if sn, ok := pcieSlotSerials[hdd.PcieSlot]; ok && sn != "" {
serial = sn
}
}
storageType := "HDD"
if hdd.DiskInterfaceType == 5 {
storageType = "NVMe"
@@ -168,35 +204,21 @@ func ParseAssetJSON(content []byte) (*models.HardwareConfig, error) {
}
// Resolve manufacturer: try vendor ID first, then model name extraction
modelName := strings.TrimSpace(hdd.ModelName)
manufacturer := resolveManufacturer(hdd.Manufacturer, modelName)
config.Storage = append(config.Storage, models.Storage{
Slot: hdd.LocationString,
Slot: slot,
Type: storageType,
Model: modelName,
SizeGB: hdd.Capacity,
SerialNumber: hdd.SerialNumber,
SerialNumber: serial,
Manufacturer: manufacturer,
Firmware: hdd.FirmwareVersion,
Interface: diskInterfaceToString(hdd.DiskInterfaceType),
Present: present,
})
// Add HDD firmware to firmware list (deduplicated by model+version)
if hdd.FirmwareVersion != "" {
fwKey := modelName + ":" + hdd.FirmwareVersion
if !seenHDDFW[fwKey] {
slot := hdd.LocationString
if slot == "" {
slot = fmt.Sprintf("%s %dGB", storageType, hdd.Capacity)
}
config.Firmware = append(config.Firmware, models.FirmwareInfo{
DeviceName: fmt.Sprintf("%s (%s)", modelName, slot),
Version: hdd.FirmwareVersion,
})
seenHDDFW[fwKey] = true
}
}
// Disk firmware is already stored in Storage.Firmware — do not duplicate in Hardware.Firmware.
}
// Parse PCIe info
@@ -207,8 +229,8 @@ func ParseAssetJSON(content []byte) (*models.HardwareConfig, error) {
VendorID: pcie.VendorId,
DeviceID: pcie.DeviceId,
BDF: formatBDF(pcie.BusNumber, pcie.DeviceNumber, pcie.FunctionNumber),
LinkWidth: pcie.NegotiatedLinkWidth,
LinkSpeed: pcieLinkSpeedToString(pcie.CurrentLinkSpeed),
MaxLinkWidth: pcie.MaxLinkWidth,
MaxLinkSpeed: pcieLinkSpeedToString(pcie.MaxLinkSpeed),
DeviceClass: pcieClassToString(pcie.ClassCode, pcie.SubClassCode),
@@ -225,25 +247,22 @@ func ParseAssetJSON(content []byte) (*models.HardwareConfig, error) {
}
// Use device name from PCI IDs database if available
if deviceName != "" {
device.DeviceClass = deviceName
device.DeviceClass = normalizeModelLabel(deviceName)
}
config.PCIeDevices = append(config.PCIeDevices, device)
// Extract GPUs (class 3 = display controller)
if pcie.ClassCode == 3 {
gpuModel := deviceName
if gpuModel == "" {
gpuModel = pcieClassToString(pcie.ClassCode, pcie.SubClassCode)
}
gpuModel := normalizeGPUModel(pcie.VendorId, pcie.DeviceId, deviceName, pcie.ClassCode, pcie.SubClassCode)
gpu := models.GPU{
Slot: pcie.LocString,
Model: gpuModel,
Manufacturer: vendor,
VendorID: pcie.VendorId,
DeviceID: pcie.DeviceId,
BDF: formatBDF(pcie.BusNumber, pcie.DeviceNumber, pcie.FunctionNumber),
CurrentLinkWidth: pcie.NegotiatedLinkWidth,
CurrentLinkSpeed: pcieLinkSpeedToString(pcie.CurrentLinkSpeed),
MaxLinkWidth: pcie.MaxLinkWidth,
MaxLinkSpeed: pcieLinkSpeedToString(pcie.MaxLinkSpeed),
}
@@ -260,6 +279,45 @@ func ParseAssetJSON(content []byte) (*models.HardwareConfig, error) {
return config, nil
}
func normalizeModelLabel(v string) string {
v = strings.TrimSpace(v)
if v == "" {
return ""
}
return strings.Join(strings.Fields(v), " ")
}
func normalizeGPUModel(vendorID, deviceID int, model string, classCode, subClass int) string {
model = normalizeModelLabel(model)
if model == "" || rawHexPCIDeviceRegex.MatchString(model) || isGenericGPUModelLabel(model) {
if pciModel := normalizeModelLabel(pciids.DeviceName(vendorID, deviceID)); pciModel != "" {
model = pciModel
}
}
if model == "" || isGenericGPUModelLabel(model) {
model = pcieClassToString(classCode, subClass)
}
// Last fallback for unknown NVIDIA display devices: expose PCI DeviceID
// instead of generic "3D Controller".
if (model == "" || strings.EqualFold(model, "3D Controller")) && vendorID == 0x10de && deviceID > 0 {
return fmt.Sprintf("0x%04X", deviceID)
}
return model
}
func isGenericGPUModelLabel(model string) bool {
switch strings.ToLower(strings.TrimSpace(model)) {
case "", "gpu", "display", "display controller", "vga", "3d controller", "other", "unknown":
return true
default:
return false
}
}
func memoryTypeToString(memType int) string {
switch memType {
case 26:
@@ -284,6 +342,29 @@ func diskInterfaceToString(ifType int) string {
}
}
func normalizeAssetHDDSlot(locationString string, location int, diskInterfaceType int) string {
slot := strings.TrimSpace(locationString)
if slot != "" {
return slot
}
if location < 0 {
return ""
}
if diskInterfaceType == 5 {
return fmt.Sprintf("OB%02d", location+1)
}
return fmt.Sprintf("%d", location)
}
func bitmapHasAnyValue(values []int) bool {
for _, v := range values {
if v != 0 {
return true
}
}
return false
}
func pcieLinkSpeedToString(speed int) string {
switch speed {
case 1:


@@ -0,0 +1,48 @@
package inspur
import "testing"
func TestParseAssetJSON_NVIDIAGPUModelFromPCIIDs(t *testing.T) {
raw := []byte(`{
"VersionInfo": [],
"CpuInfo": [],
"MemInfo": {"MemCommonInfo": [], "DimmInfo": []},
"HddInfo": [],
"PcieInfo": [{
"VendorId": 4318,
"DeviceId": 9019,
"BusNumber": 12,
"DeviceNumber": 0,
"FunctionNumber": 0,
"MaxLinkWidth": 16,
"MaxLinkSpeed": 5,
"NegotiatedLinkWidth": 16,
"CurrentLinkSpeed": 5,
"ClassCode": 3,
"SubClassCode": 2,
"PcieSlot": 11,
"LocString": "#CPU0_PCIE2",
"PartNumber": null,
"SerialNumber": null,
"Mac": []
}]
}`)
hw, err := ParseAssetJSON(raw, nil, nil)
if err != nil {
t.Fatalf("ParseAssetJSON failed: %v", err)
}
if len(hw.GPUs) != 1 {
t.Fatalf("expected 1 GPU, got %d", len(hw.GPUs))
}
if hw.GPUs[0].Model != "GH100 [H200 NVL]" {
t.Fatalf("expected model GH100 [H200 NVL], got %q", hw.GPUs[0].Model)
}
}
func TestNormalizeGPUModel_FallbackToDeviceIDForUnknownNVIDIA(t *testing.T) {
got := normalizeGPUModel(0x10de, 0xbeef, "0xBEEF\t", 3, 2)
if got != "0xBEEF" {
t.Fatalf("expected 0xBEEF, got %q", got)
}
}

internal/parser/vendors/inspur/audit.go

@@ -0,0 +1,94 @@
package inspur
import (
"fmt"
"regexp"
"strconv"
"strings"
)
// auditSNChangedNVMeRegex matches:
// "Front Back Plane N NVMe DiskM SN changed from X to Y"
// Captures: disk_num, new_serial
var auditSNChangedNVMeRegex = regexp.MustCompile(`NVMe Disk(\d+)\s+SN changed from \S+\s+to\s+(\S+)`)
// auditSNChangedRAIDRegex matches:
// "Raid(Pcie Slot:N) HDD(enclosure id:E slot:S) SN changed from X to Y"
// Captures: pcie_slot, enclosure_id, slot_num, new_serial
var auditSNChangedRAIDRegex = regexp.MustCompile(`Raid\(Pcie Slot:(\d+)\) HDD\(enclosure id:(\d+) slot:(\d+)\)\s+SN changed from \S+\s+to\s+(\S+)`)
// ParseAuditLogNVMeSerials parses audit.log and returns the final (latest) serial number
// per NVMe disk number. The disk number matches the numeric suffix in PCIe location
// strings like "#NVME0", "#NVME2", etc. from devicefrusdr.log.
// A later change to "NULL" deletes the disk's entry, so only disks whose latest serial is non-NULL are returned.
func ParseAuditLogNVMeSerials(content []byte) map[int]string {
serials := make(map[int]string)
for _, line := range strings.Split(string(content), "\n") {
m := auditSNChangedNVMeRegex.FindStringSubmatch(line)
if m == nil {
continue
}
diskNum, err := strconv.Atoi(m[1])
if err != nil {
continue
}
serial := strings.TrimSpace(m[2])
if strings.EqualFold(serial, "NULL") || serial == "" {
delete(serials, diskNum)
} else {
serials[diskNum] = serial
}
}
if len(serials) == 0 {
return nil
}
return serials
}
// ParseAuditLogRAIDSerials parses audit.log and returns the final (latest) serial number
// per RAID backplane disk. Key format is "BP{enclosure_id-1}:{slot_num}" (e.g. "BP0:0").
//
// Each disk slot is claimed by a specific RAID controller (Pcie Slot:N). NULL events from
// an old controller do not clear serials assigned by a newer controller, preventing stale
// deletions when disks are migrated between RAID arrays.
func ParseAuditLogRAIDSerials(content []byte) map[string]string {
serials := make(map[string]string)
// owner tracks which PCIe RAID controller slot last assigned a serial to each disk key.
owner := make(map[string]int)
for _, line := range strings.Split(string(content), "\n") {
m := auditSNChangedRAIDRegex.FindStringSubmatch(line)
if m == nil {
continue
}
pcieSlot, err := strconv.Atoi(m[1])
if err != nil {
continue
}
enclosureID, err := strconv.Atoi(m[2])
if err != nil {
continue
}
slotNum, err := strconv.Atoi(m[3])
if err != nil {
continue
}
serial := strings.TrimSpace(m[4])
key := fmt.Sprintf("BP%d:%d", enclosureID-1, slotNum)
if strings.EqualFold(serial, "NULL") || serial == "" {
// Only clear if this controller was the last to set the serial.
if owner[key] == pcieSlot {
delete(serials, key)
delete(owner, key)
}
} else {
serials[key] = serial
owner[key] = pcieSlot
}
}
if len(serials) == 0 {
return nil
}
return serials
}
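According to the `ParseAuditLogNVMeSerials` doc comment, its map keys are the numeric suffixes of devicefrusdr.log locations such as `#NVME0` and `#NVME2`. Below is a minimal sketch of recovering that key from a location string so the two sources can be joined. It assumes the case-sensitive `#NVME` prefix is the only form that needs handling. `nvmeDiskNumber` is a hypothetical helper, not part of this commit, and the serial values in `main` are sample data.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nvmeDiskNumber extracts N from a location string like "#NVME2".
// ok is false when the string does not have that shape.
func nvmeDiskNumber(loc string) (n int, ok bool) {
	const prefix = "#NVME"
	s := strings.TrimSpace(loc)
	if !strings.HasPrefix(s, prefix) {
		return 0, false
	}
	n, err := strconv.Atoi(s[len(prefix):])
	if err != nil || n < 0 {
		return 0, false
	}
	return n, true
}

func main() {
	// Sample map in the shape ParseAuditLogNVMeSerials returns.
	serials := map[int]string{0: "SN-A", 2: "SN-B"}
	for _, loc := range []string{"#NVME0", "#NVME2", "#NVME5", "#CPU0_PCIE2"} {
		n, ok := nvmeDiskNumber(loc)
		if !ok {
			continue // not an NVMe location
		}
		if sn, found := serials[n]; found {
			fmt.Println(loc, "->", sn)
		}
	}
}
```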


@@ -8,6 +8,7 @@ import (
"time"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser/vendors/pciids"
)
// ParseComponentLog parses component.log file and extracts detailed hardware info
@@ -45,27 +46,38 @@ func ParseComponentLogEvents(content []byte) []models.Event {
// Parse RESTful Memory info for Warning/Error status
memEvents := parseMemoryEvents(text)
events = append(events, memEvents...)
events = append(events, parseFanEvents(text)...)
return events
}
// ParseComponentLogSensors extracts sensor readings from component.log JSON sections.
func ParseComponentLogSensors(content []byte) []models.SensorReading {
text := string(content)
var out []models.SensorReading
out = append(out, parseFanSensors(text)...)
out = append(out, parseDiskBackplaneSensors(text)...)
out = append(out, parsePSUSummarySensors(text)...)
return out
}
// MemoryRESTInfo represents the RESTful Memory info structure
type MemoryRESTInfo struct {
MemModules []struct {
MemModID int `json:"mem_mod_id"`
ConfigStatus int `json:"config_status"`
MemModSlot string `json:"mem_mod_slot"`
MemModStatus int `json:"mem_mod_status"`
MemModSize int `json:"mem_mod_size"`
MemModType string `json:"mem_mod_type"`
MemModTechnology string `json:"mem_mod_technology"`
MemModFrequency int `json:"mem_mod_frequency"`
MemModCurrentFreq int `json:"mem_mod_current_frequency"`
MemModVendor string `json:"mem_mod_vendor"`
MemModPartNum string `json:"mem_mod_part_num"`
MemModSerial string `json:"mem_mod_serial_num"`
MemModRanks int `json:"mem_mod_ranks"`
Status string `json:"status"`
} `json:"mem_modules"`
TotalMemoryCount int `json:"total_memory_count"`
PresentMemoryCount int `json:"present_memory_count"`
@@ -88,10 +100,18 @@ func parseMemoryInfo(text string, hw *models.HardwareConfig) {
return
}
// Merge detailed memory info from component.log into the existing inventory.
var merged []models.MemoryDIMM
seen := make(map[string]int)
for _, existing := range hw.Memory {
key := inspurMemoryKey(existing)
if key == "" {
continue
}
seen[key] = len(merged)
merged = append(merged, existing)
}
for _, mem := range memInfo.MemModules {
item := models.MemoryDIMM{
Slot: mem.MemModSlot,
Location: mem.MemModSlot,
Present: mem.MemModStatus == 1 && mem.MemModSize > 0,
@@ -105,28 +125,38 @@ func parseMemoryInfo(text string, hw *models.HardwareConfig) {
PartNumber: strings.TrimSpace(mem.MemModPartNum),
Status: mem.Status,
Ranks: mem.MemModRanks,
}
key := inspurMemoryKey(item)
if idx, ok := seen[key]; ok {
mergeInspurMemoryDIMM(&merged[idx], item)
continue
}
if key != "" {
seen[key] = len(merged)
}
merged = append(merged, item)
}
hw.Memory = merged
}
// PSURESTInfo represents the RESTful PSU info structure
type PSURESTInfo struct {
PowerSupplies []struct {
ID int `json:"id"`
Present int `json:"present"`
VendorID string `json:"vendor_id"`
Model string `json:"model"`
SerialNum string `json:"serial_num"`
PartNum string `json:"part_num"`
FwVer string `json:"fw_ver"`
InputType string `json:"input_type"`
Status string `json:"status"`
RatedPower int `json:"rated_power"`
PSInPower int `json:"ps_in_power"`
PSOutPower int `json:"ps_out_power"`
PSInVolt float64 `json:"ps_in_volt"`
PSOutVolt float64 `json:"ps_out_volt"`
PSUMaxTemp int `json:"psu_max_temperature"`
} `json:"power_supplies"`
PresentPowerReading int `json:"present_power_reading"`
}
@@ -147,10 +177,18 @@ func parsePSUInfo(text string, hw *models.HardwareConfig) {
return
}
// Merge RESTful PSU data into the existing inventory.
var merged []models.PSU
seen := make(map[string]int)
for _, existing := range hw.PowerSupply {
key := inspurPSUKey(existing)
if key == "" {
continue
}
seen[key] = len(merged)
merged = append(merged, existing)
}
for _, psu := range psuInfo.PowerSupplies {
item := models.PSU{
Slot: fmt.Sprintf("PSU%d", psu.ID),
Present: psu.Present == 1,
Model: strings.TrimSpace(psu.Model),
@@ -166,8 +204,18 @@ func parsePSUInfo(text string, hw *models.HardwareConfig) {
InputVoltage: psu.PSInVolt,
OutputVoltage: psu.PSOutVolt,
TemperatureC: psu.PSUMaxTemp,
}
key := inspurPSUKey(item)
if idx, ok := seen[key]; ok {
mergeInspurPSU(&merged[idx], item)
continue
}
if key != "" {
seen[key] = len(merged)
}
merged = append(merged, item)
}
hw.PowerSupply = merged
}
// HDDRESTInfo represents the RESTful HDD info structure
@@ -209,20 +257,49 @@ func parseHDDInfo(text string, hw *models.HardwareConfig) {
})
for _, hdd := range hddInfo {
if hdd.Present == 1 {
slot := strings.TrimSpace(hdd.LocationString)
if slot == "" {
slot = fmt.Sprintf("HDD%d", hdd.ID)
}
hddMap[slot] = struct {
SN string
Model string
Firmware string
Mfr string
}{
SN: normalizeRedisValue(hdd.SN),
Model: strings.TrimSpace(hdd.Model),
Firmware: normalizeRedisValue(hdd.Firmware),
Mfr: strings.TrimSpace(hdd.Manufacture),
}
}
}
// Merge into existing inventory first (asset/other sections).
for i := range hw.Storage {
slot := strings.TrimSpace(hw.Storage[i].Slot)
if slot == "" {
continue
}
detail, ok := hddMap[slot]
if !ok {
continue
}
if normalizeRedisValue(hw.Storage[i].SerialNumber) == "" {
hw.Storage[i].SerialNumber = detail.SN
}
if hw.Storage[i].Model == "" {
hw.Storage[i].Model = detail.Model
}
if normalizeRedisValue(hw.Storage[i].Firmware) == "" {
hw.Storage[i].Firmware = detail.Firmware
}
if hw.Storage[i].Manufacturer == "" {
hw.Storage[i].Manufacturer = detail.Mfr
}
hw.Storage[i].Present = true
}
// If storage is empty, populate from HDD info
if len(hw.Storage) == 0 {
for _, hdd := range hddInfo {
@@ -239,21 +316,42 @@ func parseHDDInfo(text string, hw *models.HardwareConfig) {
if hdd.CapableSpeed == 12 {
iface = "SAS"
}
slot := strings.TrimSpace(hdd.LocationString)
if slot == "" {
slot = fmt.Sprintf("HDD%d", hdd.ID)
}
hw.Storage = append(hw.Storage, models.Storage{
Slot: slot,
Type: storType,
Model: model,
SizeGB: hdd.Capacity,
SerialNumber: normalizeRedisValue(hdd.SN),
Manufacturer: extractStorageManufacturer(model),
Firmware: normalizeRedisValue(hdd.Firmware),
Interface: iface,
Present: true,
})
}
}
}
// FanRESTInfo represents the RESTful fan info structure.
type FanRESTInfo struct {
Fans []struct {
ID int `json:"id"`
FanName string `json:"fan_name"`
Present string `json:"present"`
Status string `json:"status"`
StatusStr string `json:"status_str"`
SpeedRPM int `json:"speed_rpm"`
SpeedPercent int `json:"speed_percent"`
MaxSpeedRPM int `json:"max_speed_rpm"`
FanModel string `json:"fan_model"`
} `json:"fans"`
FansPower int `json:"fans_power"`
}
// NetworkAdapterRESTInfo represents the RESTful Network Adapter info structure
type NetworkAdapterRESTInfo struct {
SysAdapters []struct {
@@ -295,7 +393,16 @@ func parseNetworkAdapterInfo(text string, hw *models.HardwareConfig) {
return
}
var merged []models.NetworkAdapter
seen := make(map[string]int)
for _, existing := range hw.NetworkAdapters {
key := inspurNICKey(existing)
if key == "" {
continue
}
seen[key] = len(merged)
merged = append(merged, existing)
}
for _, adapter := range netInfo.SysAdapters {
var macs []string
for _, port := range adapter.Ports {
@@ -304,23 +411,474 @@ func parseNetworkAdapterInfo(text string, hw *models.HardwareConfig) {
}
}
model := normalizeModelLabel(adapter.Model)
if model == "" || looksLikeRawDeviceID(model) {
if resolved := normalizeModelLabel(pciids.DeviceName(adapter.VendorID, adapter.DeviceID)); resolved != "" {
model = resolved
}
}
vendor := normalizeModelLabel(adapter.Vendor)
if vendor == "" {
vendor = normalizeModelLabel(pciids.VendorName(adapter.VendorID))
}
item := models.NetworkAdapter{
Slot: fmt.Sprintf("Slot %d", adapter.Slot),
Location: adapter.Location,
Present: adapter.Present == 1,
Model: model,
Vendor: vendor,
VendorID: adapter.VendorID,
DeviceID: adapter.DeviceID,
SerialNumber: normalizeRedisValue(adapter.SN),
PartNumber: normalizeRedisValue(adapter.PN),
Firmware: normalizeRedisValue(adapter.FwVer),
PortCount: adapter.PortNum,
PortType: adapter.PortType,
MACAddresses: macs,
Status: adapter.Status,
}
key := inspurNICKey(item)
if idx, ok := seen[key]; ok {
mergeInspurNIC(&merged[idx], item)
continue
}
if slotIdx := inspurFindNICBySlot(merged, item.Slot); slotIdx >= 0 {
mergeInspurNIC(&merged[slotIdx], item)
if key != "" {
seen[key] = slotIdx
}
continue
}
if key != "" {
seen[key] = len(merged)
}
merged = append(merged, item)
}
hw.NetworkAdapters = merged
}
func inspurMemoryKey(item models.MemoryDIMM) string {
return strings.ToLower(strings.TrimSpace(inspurFirstNonEmpty(item.SerialNumber, item.Slot, item.Location)))
}
func mergeInspurMemoryDIMM(dst *models.MemoryDIMM, src models.MemoryDIMM) {
if dst == nil {
return
}
if strings.TrimSpace(dst.Slot) == "" {
dst.Slot = src.Slot
}
if strings.TrimSpace(dst.Location) == "" {
dst.Location = src.Location
}
dst.Present = dst.Present || src.Present
if dst.SizeMB == 0 {
dst.SizeMB = src.SizeMB
}
if strings.TrimSpace(dst.Type) == "" {
dst.Type = src.Type
}
if strings.TrimSpace(dst.Technology) == "" {
dst.Technology = src.Technology
}
if dst.MaxSpeedMHz == 0 {
dst.MaxSpeedMHz = src.MaxSpeedMHz
}
if dst.CurrentSpeedMHz == 0 {
dst.CurrentSpeedMHz = src.CurrentSpeedMHz
}
if strings.TrimSpace(dst.Manufacturer) == "" {
dst.Manufacturer = src.Manufacturer
}
if strings.TrimSpace(dst.SerialNumber) == "" {
dst.SerialNumber = src.SerialNumber
}
if strings.TrimSpace(dst.PartNumber) == "" {
dst.PartNumber = src.PartNumber
}
if strings.TrimSpace(dst.Status) == "" {
dst.Status = src.Status
}
if dst.Ranks == 0 {
dst.Ranks = src.Ranks
}
}
func inspurPSUKey(item models.PSU) string {
return strings.ToLower(strings.TrimSpace(inspurFirstNonEmpty(item.SerialNumber, item.Slot, item.Model)))
}
func mergeInspurPSU(dst *models.PSU, src models.PSU) {
if dst == nil {
return
}
if strings.TrimSpace(dst.Slot) == "" {
dst.Slot = src.Slot
}
dst.Present = dst.Present || src.Present
if strings.TrimSpace(dst.Model) == "" {
dst.Model = src.Model
}
if strings.TrimSpace(dst.Vendor) == "" {
dst.Vendor = src.Vendor
}
if dst.WattageW == 0 {
dst.WattageW = src.WattageW
}
if strings.TrimSpace(dst.SerialNumber) == "" {
dst.SerialNumber = src.SerialNumber
}
if strings.TrimSpace(dst.PartNumber) == "" {
dst.PartNumber = src.PartNumber
}
if strings.TrimSpace(dst.Firmware) == "" {
dst.Firmware = src.Firmware
}
if strings.TrimSpace(dst.Status) == "" {
dst.Status = src.Status
}
if strings.TrimSpace(dst.InputType) == "" {
dst.InputType = src.InputType
}
if dst.InputPowerW == 0 {
dst.InputPowerW = src.InputPowerW
}
if dst.OutputPowerW == 0 {
dst.OutputPowerW = src.OutputPowerW
}
if dst.InputVoltage == 0 {
dst.InputVoltage = src.InputVoltage
}
if dst.OutputVoltage == 0 {
dst.OutputVoltage = src.OutputVoltage
}
if dst.TemperatureC == 0 {
dst.TemperatureC = src.TemperatureC
}
}
func inspurNICKey(item models.NetworkAdapter) string {
return strings.ToLower(strings.TrimSpace(inspurFirstNonEmpty(item.SerialNumber, strings.Join(item.MACAddresses, ","), item.Slot, item.Location)))
}
func mergeInspurNIC(dst *models.NetworkAdapter, src models.NetworkAdapter) {
if dst == nil {
return
}
if strings.TrimSpace(dst.Slot) == "" {
dst.Slot = src.Slot
}
if strings.TrimSpace(dst.Location) == "" {
dst.Location = src.Location
}
dst.Present = dst.Present || src.Present
if strings.TrimSpace(dst.BDF) == "" {
dst.BDF = src.BDF
}
if strings.TrimSpace(dst.Model) == "" {
dst.Model = src.Model
}
if strings.TrimSpace(dst.Description) == "" {
dst.Description = src.Description
}
if strings.TrimSpace(dst.Vendor) == "" {
dst.Vendor = src.Vendor
}
if dst.VendorID == 0 {
dst.VendorID = src.VendorID
}
if dst.DeviceID == 0 {
dst.DeviceID = src.DeviceID
}
if strings.TrimSpace(dst.SerialNumber) == "" {
dst.SerialNumber = src.SerialNumber
}
if strings.TrimSpace(dst.PartNumber) == "" {
dst.PartNumber = src.PartNumber
}
if strings.TrimSpace(dst.Firmware) == "" {
dst.Firmware = src.Firmware
}
if dst.PortCount == 0 {
dst.PortCount = src.PortCount
}
if strings.TrimSpace(dst.PortType) == "" {
dst.PortType = src.PortType
}
if dst.LinkWidth == 0 {
dst.LinkWidth = src.LinkWidth
}
if strings.TrimSpace(dst.LinkSpeed) == "" {
dst.LinkSpeed = src.LinkSpeed
}
if dst.MaxLinkWidth == 0 {
dst.MaxLinkWidth = src.MaxLinkWidth
}
if strings.TrimSpace(dst.MaxLinkSpeed) == "" {
dst.MaxLinkSpeed = src.MaxLinkSpeed
}
if dst.NUMANode == 0 {
dst.NUMANode = src.NUMANode
}
if strings.TrimSpace(dst.Status) == "" {
dst.Status = src.Status
}
for _, mac := range src.MACAddresses {
mac = strings.TrimSpace(mac)
if mac == "" {
continue
}
found := false
for _, existing := range dst.MACAddresses {
if strings.EqualFold(strings.TrimSpace(existing), mac) {
found = true
break
}
}
if !found {
dst.MACAddresses = append(dst.MACAddresses, mac)
}
}
}
func inspurFindNICBySlot(items []models.NetworkAdapter, slot string) int {
slot = strings.ToLower(strings.TrimSpace(slot))
if slot == "" {
return -1
}
for i := range items {
if strings.ToLower(strings.TrimSpace(items[i].Slot)) == slot {
return i
}
}
return -1
}
func inspurFirstNonEmpty(values ...string) string {
for _, value := range values {
if strings.TrimSpace(value) != "" {
return strings.TrimSpace(value)
}
}
return ""
}
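The memory, PSU and NIC parsers in this change share one pattern: index the existing inventory by a normalized key, then merge each component.log item into its match or append it. A minimal sketch of that pattern as a single generic function, assuming Go 1.18+ generics; `mergeByKey`, `dimm`, `dimmKey` and `fillEmpty` are hypothetical names, not part of logpile:

```go
package main

import "strings"

// mergeByKey is a hypothetical generic helper mirroring the loops in
// parseMemoryInfo, parsePSUInfo and parseNetworkAdapterInfo: existing items
// are indexed by key first, then each incoming item is merged into its match
// or appended.
func mergeByKey[T any](existing, incoming []T, key func(T) string, merge func(dst *T, src T)) []T {
	var merged []T
	seen := make(map[string]int)
	for _, item := range existing {
		k := key(item)
		if k == "" {
			// Like the original loops, keyless existing items are dropped.
			continue
		}
		seen[k] = len(merged)
		merged = append(merged, item)
	}
	for _, item := range incoming {
		k := key(item)
		if idx, ok := seen[k]; ok {
			merge(&merged[idx], item)
			continue
		}
		if k != "" {
			seen[k] = len(merged)
		}
		merged = append(merged, item)
	}
	return merged
}

// dimm is a trimmed-down, hypothetical stand-in for models.MemoryDIMM.
type dimm struct {
	Slot, Serial, Vendor string
}

// dimmKey follows inspurMemoryKey: serial first, then slot, lower-cased.
func dimmKey(d dimm) string {
	for _, v := range []string{d.Serial, d.Slot} {
		if s := strings.TrimSpace(v); s != "" {
			return strings.ToLower(s)
		}
	}
	return ""
}

// fillEmpty copies src fields only into empty dst fields, the same
// "existing value wins" rule used by mergeInspurMemoryDIMM.
func fillEmpty(dst *dimm, src dimm) {
	if strings.TrimSpace(dst.Slot) == "" {
		dst.Slot = src.Slot
	}
	if strings.TrimSpace(dst.Vendor) == "" {
		dst.Vendor = src.Vendor
	}
}
```

Because inspurMemoryKey prefers the serial number over the slot, an existing DIMM keyed by serial and an incoming DIMM without a serial (keyed by slot) will not match, and both survive. The NIC path adds the inspurFindNICBySlot fallback for exactly this kind of mismatch.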
func parseFanSensors(text string) []models.SensorReading {
re := regexp.MustCompile(`RESTful fan info:\s*(\{[\s\S]*?\})\s*RESTful diskbackplane`)
match := re.FindStringSubmatch(text)
if match == nil {
return nil
}
jsonStr := strings.ReplaceAll(match[1], "\n", "")
var fanInfo FanRESTInfo
if err := json.Unmarshal([]byte(jsonStr), &fanInfo); err != nil {
return nil
}
out := make([]models.SensorReading, 0, len(fanInfo.Fans)+1)
for _, fan := range fanInfo.Fans {
name := strings.TrimSpace(fan.FanName)
if name == "" {
name = fmt.Sprintf("FAN%d", fan.ID)
}
status := normalizeComponentStatus(fan.StatusStr, fan.Status, fan.Present)
raw := fmt.Sprintf("rpm=%d pct=%d model=%s max_rpm=%d", fan.SpeedRPM, fan.SpeedPercent, fan.FanModel, fan.MaxSpeedRPM)
out = append(out, models.SensorReading{
Name: name,
Type: "fan_speed",
Value: float64(fan.SpeedRPM),
Unit: "RPM",
RawValue: raw,
Status: status,
})
}
if fanInfo.FansPower > 0 {
out = append(out, models.SensorReading{
Name: "Fans_Power",
Type: "power",
Value: float64(fanInfo.FansPower),
Unit: "W",
RawValue: fmt.Sprintf("%d", fanInfo.FansPower),
Status: "OK",
})
}
return out
}
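parseFanSensors, parseFanEvents, parseDiskBackplaneSensors and parsePSUSummarySensors each locate a JSON blob between two literal markers with a lazy regex. The sketch below generalizes that step; `extractJSONSection` and `errSectionNotFound` are hypothetical names, and the marker strings in the test are the ones used by the tests in this change:

```go
package main

import (
	"encoding/json"
	"errors"
	"regexp"
)

// errSectionNotFound is returned when the start/end markers are absent.
var errSectionNotFound = errors.New("section not found")

// extractJSONSection is a hypothetical helper generalizing the section regexes.
// It captures the JSON object or array between startMarker and endMarker and
// decodes it into v. The lazy [\s\S]*? yields the shortest capture whose closing
// bracket is followed only by whitespace and endMarker, so nested objects and
// arrays inside the blob are captured whole.
func extractJSONSection(text, startMarker, endMarker string, v any) error {
	re, err := regexp.Compile(regexp.QuoteMeta(startMarker) +
		`\s*([\[{][\s\S]*?[\]}])\s*` + regexp.QuoteMeta(endMarker))
	if err != nil {
		return err
	}
	m := re.FindStringSubmatch(text)
	if m == nil {
		return errSectionNotFound
	}
	// encoding/json treats newlines between tokens as whitespace.
	return json.Unmarshal([]byte(m[1]), v)
}
```

For well-formed JSON, the `strings.ReplaceAll(match[1], "\n", "")` step in the originals is not needed; it does, however, tolerate raw newlines inside string values, which strict JSON rejects.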
func parseFanEvents(text string) []models.Event {
re := regexp.MustCompile(`RESTful fan info:\s*(\{[\s\S]*?\})\s*RESTful diskbackplane`)
match := re.FindStringSubmatch(text)
if match == nil {
return nil
}
jsonStr := strings.ReplaceAll(match[1], "\n", "")
var fanInfo FanRESTInfo
if err := json.Unmarshal([]byte(jsonStr), &fanInfo); err != nil {
return nil
}
var events []models.Event
for _, fan := range fanInfo.Fans {
status := normalizeComponentStatus(fan.StatusStr, fan.Status, fan.Present)
if isHealthyComponentStatus(status) {
continue
}
name := strings.TrimSpace(fan.FanName)
if name == "" {
name = fmt.Sprintf("FAN%d", fan.ID)
}
severity := models.SeverityWarning
lowStatus := strings.ToLower(status)
if strings.Contains(lowStatus, "critical") || strings.Contains(lowStatus, "fail") || strings.Contains(lowStatus, "error") {
severity = models.SeverityCritical
}
events = append(events, models.Event{
ID: fmt.Sprintf("fan_%d_status", fan.ID),
Timestamp: time.Now(),
Source: "Fan",
SensorType: "fan",
SensorName: name,
EventType: "Fan Status",
Severity: severity,
Description: fmt.Sprintf("%s reports %s", name, status),
RawData: fmt.Sprintf("rpm=%d pct=%d model=%s", fan.SpeedRPM, fan.SpeedPercent, fan.FanModel),
})
}
return events
}
func parseDiskBackplaneSensors(text string) []models.SensorReading {
re := regexp.MustCompile(`RESTful diskbackplane info:\s*(\[[\s\S]*?\])\s*BMC`)
match := re.FindStringSubmatch(text)
if match == nil {
return nil
}
jsonStr := strings.ReplaceAll(match[1], "\n", "")
var backplaneInfo DiskBackplaneRESTInfo
if err := json.Unmarshal([]byte(jsonStr), &backplaneInfo); err != nil {
return nil
}
out := make([]models.SensorReading, 0, len(backplaneInfo))
for _, bp := range backplaneInfo {
if bp.Present != 1 {
continue
}
name := fmt.Sprintf("Backplane%d_Temp", bp.BackplaneIndex)
status := "OK"
if bp.Temperature <= 0 {
status = "unknown"
}
raw := fmt.Sprintf("front=%d ports=%d drives=%d cpld=%s", bp.Front, bp.PortCount, bp.DriverCount, bp.CPLDVersion)
out = append(out, models.SensorReading{
Name: name,
Type: "temperature",
Value: float64(bp.Temperature),
Unit: "C",
RawValue: raw,
Status: status,
})
}
return out
}
func parsePSUSummarySensors(text string) []models.SensorReading {
re := regexp.MustCompile(`RESTful PSU info:\s*(\{[\s\S]*?\})\s*RESTful Network`)
match := re.FindStringSubmatch(text)
if match == nil {
return nil
}
jsonStr := strings.ReplaceAll(match[1], "\n", "")
var psuInfo PSURESTInfo
if err := json.Unmarshal([]byte(jsonStr), &psuInfo); err != nil {
return nil
}
out := make([]models.SensorReading, 0, len(psuInfo.PowerSupplies)*3+1)
if psuInfo.PresentPowerReading > 0 {
out = append(out, models.SensorReading{
Name: "PSU_Present_Power_Reading",
Type: "power",
Value: float64(psuInfo.PresentPowerReading),
Unit: "W",
RawValue: fmt.Sprintf("%d", psuInfo.PresentPowerReading),
Status: "OK",
})
}
for _, psu := range psuInfo.PowerSupplies {
if psu.Present != 1 {
continue
}
status := normalizeComponentStatus(psu.Status)
out = append(out, models.SensorReading{
Name: fmt.Sprintf("PSU%d_InputPower", psu.ID),
Type: "power",
Value: float64(psu.PSInPower),
Unit: "W",
RawValue: fmt.Sprintf("%d", psu.PSInPower),
Status: status,
})
out = append(out, models.SensorReading{
Name: fmt.Sprintf("PSU%d_OutputPower", psu.ID),
Type: "power",
Value: float64(psu.PSOutPower),
Unit: "W",
RawValue: fmt.Sprintf("%d", psu.PSOutPower),
Status: status,
})
out = append(out, models.SensorReading{
Name: fmt.Sprintf("PSU%d_Temp", psu.ID),
Type: "temperature",
Value: float64(psu.PSUMaxTemp),
Unit: "C",
RawValue: fmt.Sprintf("%d", psu.PSUMaxTemp),
Status: status,
})
}
return out
}
func normalizeComponentStatus(values ...string) string {
for _, v := range values {
s := strings.TrimSpace(v)
if s == "" {
continue
}
return s
}
return "unknown"
}
func isHealthyComponentStatus(status string) bool {
switch strings.ToLower(strings.TrimSpace(status)) {
case "", "ok", "normal", "present", "enabled":
return true
default:
return false
}
}
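parseFanEvents decides whether to emit an event, and at what severity, from three loosely typed status fields. A condensed sketch of that decision, restating normalizeComponentStatus and isHealthyComponentStatus under hypothetical names (`firstNonEmptyStatus`, `isHealthy`, `fanSeverity`):

```go
package main

import "strings"

// firstNonEmptyStatus mirrors normalizeComponentStatus: the first non-blank
// value wins, otherwise "unknown".
func firstNonEmptyStatus(values ...string) string {
	for _, v := range values {
		if s := strings.TrimSpace(v); s != "" {
			return s
		}
	}
	return "unknown"
}

// isHealthy mirrors isHealthyComponentStatus.
func isHealthy(status string) bool {
	switch strings.ToLower(strings.TrimSpace(status)) {
	case "", "ok", "normal", "present", "enabled":
		return true
	}
	return false
}

// fanSeverity condenses the per-fan logic of parseFanEvents.
// It returns "" when no event would be emitted.
func fanSeverity(statusStr, status, present string) string {
	s := firstNonEmptyStatus(statusStr, status, present)
	if isHealthy(s) {
		return ""
	}
	low := strings.ToLower(s)
	if strings.Contains(low, "critical") || strings.Contains(low, "fail") || strings.Contains(low, "error") {
		return "critical"
	}
	return "warning"
}
```

One consequence worth checking: a fan whose status_str, status and present fields are all blank normalizes to "unknown", which is not in the healthy set, so it yields a warning event. The empty-string case in isHealthyComponentStatus is never reached through normalizeComponentStatus.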
var rawDeviceIDLikeRegex = regexp.MustCompile(`(?i)^(?:0x)?[0-9a-f]{3,4}$`)
func looksLikeRawDeviceID(v string) bool {
v = strings.TrimSpace(v)
if v == "" {
return true
}
return rawDeviceIDLikeRegex.MatchString(v)
}
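looksLikeRawDeviceID decides when a NIC model string is a bare PCI device ID that should be replaced by a pci.ids lookup. A small sketch restating the check, plus a hypothetical `pciIDHex` helper showing how the decimal IDs in the NIC test map to pci.ids notation (32902 is 0x8086, Intel; 5409 is 0x1521, the I350 device the test expects):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rawIDLike uses the same pattern as rawDeviceIDLikeRegex in the diff.
var rawIDLike = regexp.MustCompile(`(?i)^(?:0x)?[0-9a-f]{3,4}$`)

// looksRaw restates looksLikeRawDeviceID: blank or a 3-4 digit hex ID.
func looksRaw(v string) bool {
	v = strings.TrimSpace(v)
	return v == "" || rawIDLike.MatchString(v)
}

// pciIDHex is a hypothetical helper that formats a decimal vendor/device ID
// the way pci.ids lists it: four lower-case hex digits.
func pciIDHex(id int) string {
	return fmt.Sprintf("%04x", id)
}
```

Note that a genuine model label made of three or four hex characters, such as `E810`, also matches; the parser only swaps it out when pci.ids resolves a non-empty name.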
func parseMemoryEvents(text string) []models.Event {
@@ -423,6 +981,63 @@ func extractComponentFirmware(text string, hw *models.HardwareConfig) {
}
}
}
// Extract BMC, CPLD and VR firmware from RESTful version info section.
// The JSON is a flat array: [{"id":N,"dev_name":"...","dev_version":"..."}, ...]
reVer := regexp.MustCompile(`RESTful version info:\s*(\[[\s\S]*?\])\s*RESTful`)
if match := reVer.FindStringSubmatch(text); match != nil {
type verEntry struct {
DevName string `json:"dev_name"`
DevVersion string `json:"dev_version"`
}
var entries []verEntry
if err := json.Unmarshal([]byte(match[1]), &entries); err == nil {
for _, e := range entries {
name := normalizeVersionInfoName(e.DevName)
if name == "" {
continue
}
version := strings.TrimSpace(e.DevVersion)
if version == "" {
continue
}
if existingFW[name] {
continue
}
hw.Firmware = append(hw.Firmware, models.FirmwareInfo{
DeviceName: name,
Version: version,
})
existingFW[name] = true
}
}
}
}
// psuVersionInfoNameRegex matches per-PSU dev_name entries such as "PSU_0".
var psuVersionInfoNameRegex = regexp.MustCompile(`(?i)^PSU_\d+$`)
// normalizeVersionInfoName converts RESTful version info dev_name to a clean label.
// Returns "" for entries that should be skipped (inactive BMC, PSU slots).
func normalizeVersionInfoName(name string) string {
name = strings.TrimSpace(name)
if name == "" {
return ""
}
// Skip PSU_N entries: their firmware is already extracted from the PSU info section.
if psuVersionInfoNameRegex.MatchString(name) {
return ""
}
// Skip the inactive BMC partition.
if strings.HasPrefix(strings.ToLower(name), "inactivate(") {
return ""
}
// Active BMC: "Activate(BMC1)" → "BMC"
if strings.HasPrefix(strings.ToLower(name), "activate(") {
return "BMC"
}
// Strip trailing "Version" suffix (case-insensitive), e.g. "MainBoard0CPLDVersion" → "MainBoard0CPLD"
if strings.HasSuffix(strings.ToLower(name), "version") {
name = name[:len(name)-len("version")]
}
return strings.TrimSpace(name)
}
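The RESTful version info section is a flat JSON array, and normalizeVersionInfoName decides which entries become firmware rows. A self-contained sketch of that flow; `normalizeName`, `parseVersionInfo` and `fwEntry` are hypothetical, and the sample dev_name/dev_version values in the test are illustrative, not taken from a real log:

```go
package main

import (
	"encoding/json"
	"regexp"
	"strings"
)

var psuSlotName = regexp.MustCompile(`(?i)^PSU_\d+$`)

// normalizeName restates normalizeVersionInfoName from the diff.
func normalizeName(name string) string {
	name = strings.TrimSpace(name)
	lower := strings.ToLower(name)
	switch {
	case name == "", psuSlotName.MatchString(name), strings.HasPrefix(lower, "inactivate("):
		return ""
	case strings.HasPrefix(lower, "activate("):
		return "BMC"
	}
	if strings.HasSuffix(lower, "version") {
		name = name[:len(name)-len("version")]
	}
	return strings.TrimSpace(name)
}

type fwEntry struct{ Name, Version string }

// parseVersionInfo decodes the flat array and keeps the first version seen
// per normalized name, like the existingFW check in extractComponentFirmware.
func parseVersionInfo(raw string) ([]fwEntry, error) {
	var entries []struct {
		DevName    string `json:"dev_name"`
		DevVersion string `json:"dev_version"`
	}
	if err := json.Unmarshal([]byte(raw), &entries); err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	var out []fwEntry
	for _, e := range entries {
		name, version := normalizeName(e.DevName), strings.TrimSpace(e.DevVersion)
		if name == "" || version == "" || seen[name] {
			continue
		}
		seen[name] = true
		out = append(out, fwEntry{name, version})
	}
	return out, nil
}
```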
// DiskBackplaneRESTInfo represents the RESTful diskbackplane info structure
@@ -452,28 +1067,88 @@ func parseDiskBackplaneInfo(text string, hw *models.HardwareConfig) {
return
}
// Create storage entries based on backplane info
presentByBackplane := make(map[int]int)
totalPresent := 0
for _, bp := range backplaneInfo {
if bp.Present != 1 {
continue
}
if bp.DriverCount <= 0 {
continue
}
limit := bp.DriverCount
if bp.PortCount > 0 && limit > bp.PortCount {
limit = bp.PortCount
}
presentByBackplane[bp.BackplaneIndex] = limit
totalPresent += limit
}
if totalPresent == 0 {
return
}
existingPresent := countPresentStorage(hw.Storage)
remaining := totalPresent - existingPresent
if remaining <= 0 {
return
}
for _, bp := range backplaneInfo {
if bp.Present != 1 || remaining <= 0 {
continue
}
driveCount := presentByBackplane[bp.BackplaneIndex]
if driveCount <= 0 {
continue
}
location := "Rear"
if bp.Front == 1 {
location = "Front"
}
// Create a placeholder entry for each drive not already in inventory.
for i := 0; i < driveCount && remaining > 0; i++ {
slot := fmt.Sprintf("BP%d:%d", bp.BackplaneIndex, i)
if hasStorageSlot(hw.Storage, slot) {
continue
}
hw.Storage = append(hw.Storage, models.Storage{
Slot: slot,
Present: true,
Location: location,
BackplaneID: bp.BackplaneIndex,
Type: "HDD",
})
remaining--
}
}
}
func countPresentStorage(storage []models.Storage) int {
count := 0
for _, dev := range storage {
if dev.Present {
count++
continue
}
if strings.TrimSpace(dev.Slot) != "" && (normalizeRedisValue(dev.Model) != "" || normalizeRedisValue(dev.SerialNumber) != "" || dev.SizeGB > 0) {
count++
}
}
return count
}
func hasStorageSlot(storage []models.Storage, slot string) bool {
slot = strings.ToLower(strings.TrimSpace(slot))
if slot == "" {
return false
}
for _, dev := range storage {
if strings.ToLower(strings.TrimSpace(dev.Slot)) == slot {
return true
}
}
return false
}
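parseDiskBackplaneInfo now creates placeholders only for drives the inventory does not already account for. A sketch of that arithmetic with a hypothetical trimmed-down `backplane` type, using an exact-match map in place of the case-insensitive hasStorageSlot:

```go
package main

import "fmt"

// backplane holds the subset of DiskBackplaneRESTInfo fields the placeholder
// logic reads (hypothetical stand-in).
type backplane struct {
	Index, PortCount, DriverCount, Present int
}

// placeholderSlots restates the arithmetic in parseDiskBackplaneInfo: each
// present backplane contributes min(DriverCount, PortCount) drives, with
// PortCount capping only when > 0. Drives already present are subtracted, and
// the remainder becomes "BP<index>:<n>" slots, skipping slots already known.
func placeholderSlots(bps []backplane, existingPresent int, known map[string]bool) []string {
	limits := make(map[int]int)
	total := 0
	for _, bp := range bps {
		if bp.Present != 1 || bp.DriverCount <= 0 {
			continue
		}
		limit := bp.DriverCount
		if bp.PortCount > 0 && limit > bp.PortCount {
			limit = bp.PortCount
		}
		limits[bp.Index] = limit
		total += limit
	}
	remaining := total - existingPresent
	var slots []string
	for _, bp := range bps {
		if bp.Present != 1 || remaining <= 0 {
			continue
		}
		for i := 0; i < limits[bp.Index] && remaining > 0; i++ {
			slot := fmt.Sprintf("BP%d:%d", bp.Index, i)
			if known[slot] {
				continue
			}
			slots = append(slots, slot)
			remaining--
		}
	}
	return slots
}
```

A slot skipped because it is already known does not decrement the remaining count, so the loop moves on to the next index on the same backplane.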

@@ -0,0 +1,224 @@
package inspur
import (
"strings"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestParseNetworkAdapterInfo_ResolvesModelFromPCIIDsForRawHexModel(t *testing.T) {
text := `RESTful Network Adapter info:
{
"sys_adapters": [
{
"id": 1,
"name": "NIC1",
"Location": "#CPU0_PCIE4",
"present": 1,
"slot": 4,
"vendor_id": 32902,
"device_id": 5409,
"vendor": "",
"model": "0x1521",
"fw_ver": "",
"status": "OK",
"sn": "",
"pn": "",
"port_num": 4,
"port_type": "Base-T",
"ports": []
}
]
}
RESTful fan`
hw := &models.HardwareConfig{}
parseNetworkAdapterInfo(text, hw)
if len(hw.NetworkAdapters) != 1 {
t.Fatalf("expected 1 network adapter, got %d", len(hw.NetworkAdapters))
}
got := hw.NetworkAdapters[0]
if got.Model == "" {
t.Fatalf("expected NIC model resolved from pci.ids, got empty")
}
if !strings.Contains(strings.ToUpper(got.Model), "I350") {
t.Fatalf("expected I350 in model, got %q", got.Model)
}
if got.Vendor == "" {
t.Fatalf("expected NIC vendor resolved from pci.ids")
}
}
func TestParseNetworkAdapterInfo_MergesIntoExistingInventory(t *testing.T) {
text := `RESTful Network Adapter info:
{
"sys_adapters": [
{
"id": 1,
"name": "NIC1",
"Location": "#CPU0_PCIE4",
"present": 1,
"slot": 4,
"vendor_id": 32902,
"device_id": 5409,
"vendor": "Mellanox",
"model": "ConnectX-6",
"fw_ver": "22.1.0",
"status": "OK",
"sn": "",
"pn": "",
"port_num": 2,
"port_type": "QSFP",
"ports": [
{ "id": 1, "mac_addr": "00:11:22:33:44:55" }
]
}
]
}
RESTful fan`
hw := &models.HardwareConfig{
NetworkAdapters: []models.NetworkAdapter{
{
Slot: "Slot 4",
BDF: "0000:17:00.0",
SerialNumber: "NIC-SN-1",
Present: true,
},
},
}
parseNetworkAdapterInfo(text, hw)
if len(hw.NetworkAdapters) != 1 {
t.Fatalf("expected merged single adapter, got %d", len(hw.NetworkAdapters))
}
got := hw.NetworkAdapters[0]
if got.BDF != "0000:17:00.0" {
t.Fatalf("expected existing BDF to survive merge, got %q", got.BDF)
}
if got.Model != "ConnectX-6" {
t.Fatalf("expected model from component log, got %q", got.Model)
}
if got.SerialNumber != "NIC-SN-1" {
t.Fatalf("expected serial from existing inventory to survive merge, got %q", got.SerialNumber)
}
if len(got.MACAddresses) != 1 || got.MACAddresses[0] != "00:11:22:33:44:55" {
t.Fatalf("expected MAC addresses from component log, got %#v", got.MACAddresses)
}
}
func TestParseComponentLogSensors_ExtractsFanBackplaneAndPSUSummary(t *testing.T) {
text := `RESTful PSU info:
{
"power_supplies": [
{ "id": 0, "present": 1, "status": "OK", "ps_in_power": 123, "ps_out_power": 110, "psu_max_temperature": 41 }
],
"present_power_reading": 999
}
RESTful Network Adapter info:
{ "sys_adapters": [] }
RESTful fan info:
{
"fans": [
{ "id": 1, "fan_name": "FAN0_F_Speed", "present": "OK", "status": "OK", "status_str": "OK", "speed_rpm": 9200, "speed_percent": 35, "max_speed_rpm": 20000, "fan_model": "6056" }
],
"fans_power": 33
}
RESTful diskbackplane info:
[
{ "port_count": 8, "driver_count": 4, "front": 1, "backplane_index": 0, "present": 1, "cpld_version": "3.1", "temperature": 18 }
]
BMC`
sensors := ParseComponentLogSensors([]byte(text))
if len(sensors) == 0 {
t.Fatalf("expected sensors from component.log, got none")
}
has := func(name string) bool {
for _, s := range sensors {
if s.Name == name {
return true
}
}
return false
}
if !has("FAN0_F_Speed") {
t.Fatalf("expected FAN0_F_Speed sensor in parsed output")
}
if !has("Backplane0_Temp") {
t.Fatalf("expected Backplane0_Temp sensor in parsed output")
}
if !has("PSU_Present_Power_Reading") {
t.Fatalf("expected PSU_Present_Power_Reading sensor in parsed output")
}
}
func TestParseComponentLogEvents_FanCriticalStatus(t *testing.T) {
text := `RESTful fan info:
{
"fans": [
{ "id": 7, "fan_name": "FAN3_R_Speed", "present": "OK", "status": "Critical", "status_str": "Critical", "speed_rpm": 0, "speed_percent": 0, "max_speed_rpm": 20000, "fan_model": "6056" }
],
"fans_power": 0
}
RESTful diskbackplane info:
[]
BMC`
events := ParseComponentLogEvents([]byte(text))
if len(events) != 1 {
t.Fatalf("expected 1 fan event, got %d", len(events))
}
if events[0].EventType != "Fan Status" {
t.Fatalf("expected Fan Status event type, got %q", events[0].EventType)
}
if events[0].Severity != models.SeverityCritical {
t.Fatalf("expected critical severity, got %q", events[0].Severity)
}
}
func TestParseHDDInfo_MergesIntoExistingStorage(t *testing.T) {
text := `RESTful HDD info:
[
{
"id": 1,
"present": 1,
"enable": 1,
"SN": "SER123",
"model": "Sample SSD",
"capacity": 1024,
"manufacture": "ACME",
"firmware": "1.0.0",
"locationstring": "OB01",
"capablespeed": 6
}
]
RESTful PSU`
hw := &models.HardwareConfig{
Storage: []models.Storage{
{
Slot: "OB01",
Type: "SSD",
},
},
}
parseHDDInfo(text, hw)
if len(hw.Storage) != 1 {
t.Fatalf("expected 1 storage item, got %d", len(hw.Storage))
}
if hw.Storage[0].SerialNumber != "SER123" {
t.Fatalf("expected serial from HDD section, got %q", hw.Storage[0].SerialNumber)
}
if hw.Storage[0].Model != "Sample SSD" {
t.Fatalf("expected model from HDD section, got %q", hw.Storage[0].Model)
}
if hw.Storage[0].Firmware != "1.0.0" {
t.Fatalf("expected firmware from HDD section, got %q", hw.Storage[0].Firmware)
}
}

@@ -0,0 +1,33 @@
package inspur
import "testing"
func TestParseIDLLog_UsesBMCSourceForEventLogs(t *testing.T) {
content := []byte(`|2025-12-02T17:54:27+08:00|MEMORY|Assert|Warning|0C180401|CPU1_C4D0 Memory Device Disabled - Assert|`)
events := ParseIDLLog(content)
if len(events) != 1 {
t.Fatalf("expected 1 event, got %d", len(events))
}
if events[0].Source != "BMC" {
t.Fatalf("expected IDL events to use BMC source, got %#v", events[0])
}
if events[0].SensorName != "CPU1_C4D0" {
t.Fatalf("expected extracted DIMM component ref, got %#v", events[0])
}
}
func TestParseSyslog_UsesHostSourceAndProcessAsSensorName(t *testing.T) {
content := []byte(`<13>2026-03-15T14:03:11+00:00 host123 systemd[1]: Started Example Service`)
events := ParseSyslog(content, "syslog/info")
if len(events) != 1 {
t.Fatalf("expected 1 event, got %d", len(events))
}
if events[0].Source != "syslog" {
t.Fatalf("expected syslog source, got %#v", events[0])
}
if events[0].SensorName != "systemd[1]" {
t.Fatalf("expected process name in sensor/component slot, got %#v", events[0])
}
}

@@ -103,8 +103,9 @@ func extractBoardInfo(fruList []models.FRUInfo, hw *models.HardwareConfig) {
return
}
// Look for the main board/chassis FRU entry.
// Keep the first non-empty serial as the server serial and avoid overwriting it
// with module-specific serials (e.g., SCM_FRU).
for _, fru := range fruList {
// Skip empty entries
if fru.ProductName == "" && fru.SerialNumber == "" {
@@ -118,25 +119,23 @@ func extractBoardInfo(fruList []models.FRUInfo, hw *models.HardwareConfig) {
strings.Contains(desc, "chassis") ||
strings.Contains(desc, "board")
if fru.SerialNumber != "" && hw.BoardInfo.SerialNumber == "" {
hw.BoardInfo.SerialNumber = fru.SerialNumber
}
if fru.ProductName != "" && (hw.BoardInfo.ProductName == "" || isMainBoard) {
hw.BoardInfo.ProductName = fru.ProductName
}
// Manufacturer from non-main FRU entries (e.g. PSU vendor) should not become server vendor.
if fru.Manufacturer != "" && isMainBoard && hw.BoardInfo.Manufacturer == "" {
hw.BoardInfo.Manufacturer = fru.Manufacturer
}
if fru.PartNumber != "" && (hw.BoardInfo.PartNumber == "" || isMainBoard) {
hw.BoardInfo.PartNumber = fru.PartNumber
}
// Main board entry with complete data is good enough to stop.
if isMainBoard && hw.BoardInfo.ProductName != "" && hw.BoardInfo.SerialNumber != "" {
break
}
}
}

@@ -0,0 +1,59 @@
package inspur
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestExtractBoardInfo_PreservesBuiltinSerial(t *testing.T) {
hw := &models.HardwareConfig{}
fruList := []models.FRUInfo{
{
Description: "Builtin FRU Device (ID 0)",
SerialNumber: "21D634101",
},
{
Description: "SCM_FRU (ID 8)",
SerialNumber: "CAR509K10613C10",
ProductName: "CA",
Manufacturer: "inagile",
PartNumber: "YZCA-02758-105",
},
}
extractBoardInfo(fruList, hw)
if hw.BoardInfo.SerialNumber != "21D634101" {
t.Fatalf("expected board serial 21D634101, got %q", hw.BoardInfo.SerialNumber)
}
if hw.BoardInfo.ProductName != "CA" {
t.Fatalf("expected product name CA, got %q", hw.BoardInfo.ProductName)
}
}
func TestExtractBoardInfo_DoesNotUsePSUVendorAsBoardManufacturer(t *testing.T) {
hw := &models.HardwareConfig{}
fruList := []models.FRUInfo{
{
Description: "Builtin FRU Device (ID 0)",
SerialNumber: "2KD605238",
},
{
Description: "PSU0_FRU (ID 30)",
SerialNumber: "PMR315HS10F1A",
ProductName: "AP-CR3000F12BY",
Manufacturer: "APLUSPOWER",
PartNumber: "18XA1M43400C2",
},
}
extractBoardInfo(fruList, hw)
if hw.BoardInfo.SerialNumber != "2KD605238" {
t.Fatalf("expected board serial 2KD605238, got %q", hw.BoardInfo.SerialNumber)
}
if hw.BoardInfo.Manufacturer != "" {
t.Fatalf("expected empty board manufacturer, got %q", hw.BoardInfo.Manufacturer)
}
}

@@ -0,0 +1,117 @@
package inspur
import (
"regexp"
"sort"
"strconv"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
)
var reFaultGPU = regexp.MustCompile(`\bF_GPU(\d+)\b`)
func applyGPUStatusFromEvents(hw *models.HardwareConfig, events []models.Event) {
if hw == nil || len(hw.GPUs) == 0 {
return
}
gpuByIndex := make(map[int]*models.GPU)
for i := range hw.GPUs {
gpu := &hw.GPUs[i]
idx, ok := extractLogicalGPUIndex(gpu.Slot)
if !ok {
continue
}
gpuByIndex[idx] = gpu
gpu.StatusHistory = nil
gpu.ErrorDescription = ""
}
relevantEvents := make([]models.Event, 0)
for _, e := range events {
if !isGPUFaultEvent(e) || len(extractFaultyGPUSet(e.Description)) == 0 {
continue
}
relevantEvents = append(relevantEvents, e)
}
if len(relevantEvents) == 0 {
for _, gpu := range gpuByIndex {
if strings.TrimSpace(gpu.Status) == "" {
gpu.Status = "OK"
}
}
return
}
sort.Slice(relevantEvents, func(i, j int) bool {
return relevantEvents[i].Timestamp.Before(relevantEvents[j].Timestamp)
})
currentStatus := make(map[int]string, len(gpuByIndex))
lastCriticalDetails := make(map[int]string, len(gpuByIndex))
for idx := range gpuByIndex {
currentStatus[idx] = "OK"
}
for _, e := range relevantEvents {
faultySet := extractFaultyGPUSet(e.Description)
for idx, gpu := range gpuByIndex {
newStatus := "OK"
if faultySet[idx] {
newStatus = "Critical"
lastCriticalDetails[idx] = strings.TrimSpace(e.Description)
}
if currentStatus[idx] != newStatus {
gpu.StatusHistory = append(gpu.StatusHistory, models.StatusHistoryEntry{
Status: newStatus,
ChangedAt: e.Timestamp,
Details: strings.TrimSpace(e.Description),
})
ts := e.Timestamp
gpu.StatusChangedAt = &ts
currentStatus[idx] = newStatus
}
ts := e.Timestamp
gpu.StatusCheckedAt = &ts
}
}
for idx, gpu := range gpuByIndex {
gpu.Status = currentStatus[idx]
if gpu.Status == "Critical" {
gpu.ErrorDescription = lastCriticalDetails[idx]
} else {
gpu.ErrorDescription = ""
}
if gpu.StatusCheckedAt == nil && strings.TrimSpace(gpu.Status) == "" {
gpu.Status = "OK"
}
}
}
func extractFaultyGPUSet(description string) map[int]bool {
faulty := make(map[int]bool)
matches := reFaultGPU.FindAllStringSubmatch(description, -1)
for _, m := range matches {
if len(m) < 2 {
continue
}
idx, err := strconv.Atoi(m[1])
if err == nil && idx >= 0 {
faulty[idx] = true
}
}
return faulty
}
func isGPUFaultEvent(e models.Event) bool {
desc := strings.ToLower(e.Description)
if strings.Contains(desc, "bios miss f_gpu") {
return true
}
return strings.EqualFold(strings.TrimSpace(e.ID), "17FFB002")
}
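`extractFaultyGPUSet` relies on the `\bF_GPU(\d+)\b` pattern. The `\b` anchors (ASCII word boundaries in Go's RE2 syntax) only accept `F_GPUn` as a standalone token. A longer word such as `XF_GPU6` does not match. The sketch below uses the same pattern but returns a sorted, de-duplicated slice so the result is easy to print. The slice form is an illustrative choice; the parser itself uses a `map[int]bool`.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// Same pattern as reFaultGPU: \b keeps "F_GPU6" from matching inside a
// longer token such as "XF_GPU6" or "F_GPU6A".
var reFault = regexp.MustCompile(`\bF_GPU(\d+)\b`)

// faultyIndices returns the sorted, de-duplicated GPU indices named in desc.
func faultyIndices(desc string) []int {
	seen := make(map[int]bool)
	var out []int
	for _, m := range reFault.FindAllStringSubmatch(desc, -1) {
		idx, err := strconv.Atoi(m[1])
		if err != nil || seen[idx] {
			continue
		}
		seen[idx] = true
		out = append(out, idx)
	}
	sort.Ints(out)
	return out
}

func main() {
	fmt.Println(faultyIndices("PCIe Present mismatch BIOS miss F_GPU1 F_GPU3 F_GPU6 - Assert"))
	// prints: [1 3 6]
}
```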


@@ -0,0 +1,69 @@
package inspur
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestAppendHGXFirmwareFromHWInfo_AppendsInventoryEntries(t *testing.T) {
hw := &models.HardwareConfig{
Firmware: []models.FirmwareInfo{
{DeviceName: "BIOS", Version: "1.0.0"},
},
}
content := []byte(`
{
"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/HGX_FW_BMC_0",
"Id": "HGX_FW_BMC_0",
"Oem": {
"Nvidia": {
"ActiveFirmwareSlot": {"Version": "25.05-A"},
"InactiveFirmwareSlot": {"Version": "25.04-B"}
}
},
"Version": "25.05-A",
"WriteProtected": false
}
{
"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/HGX_FW_GPU_SXM_1",
"Id": "HGX_FW_GPU_SXM_1",
"Version": "97.00.C5.00.0E",
"WriteProtected": false
}
{
"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/HGX_Driver_GPU_SXM_1",
"Id": "HGX_Driver_GPU_SXM_1",
"Version": "",
"WriteProtected": false
}
`)
appendHGXFirmwareFromHWInfo(content, hw)
if len(hw.Firmware) != 5 {
t.Fatalf("expected 5 firmware entries after append, got %d", len(hw.Firmware))
}
seen := make(map[string]string)
for _, fw := range hw.Firmware {
seen[fw.DeviceName] = fw.Version
}
if seen["HGX_FW_BMC_0"] != "25.05-A" {
t.Fatalf("expected HGX_FW_BMC_0 version 25.05-A, got %q", seen["HGX_FW_BMC_0"])
}
if seen["HGX_FW_BMC_0 Active Slot"] != "25.05-A" {
t.Fatalf("expected active slot version, got %q", seen["HGX_FW_BMC_0 Active Slot"])
}
if seen["HGX_FW_BMC_0 Inactive Slot"] != "25.04-B" {
t.Fatalf("expected inactive slot version, got %q", seen["HGX_FW_BMC_0 Inactive Slot"])
}
if seen["HGX_FW_GPU_SXM_1"] != "97.00.C5.00.0E" {
t.Fatalf("expected GPU FW entry, got %q", seen["HGX_FW_GPU_SXM_1"])
}
if _, ok := seen["HGX_Driver_GPU_SXM_1"]; ok {
t.Fatalf("did not expect empty version driver entry")
}
}


@@ -0,0 +1,174 @@
package inspur
import (
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestEnrichGPUsFromHGXHWInfo_UsesHGXLogicalMapping(t *testing.T) {
hw := &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "#GPU6"},
{Slot: "#GPU7"},
{Slot: "#GPU0"},
{Slot: "#CPU0_PE1_E_BMC", Model: "AST2500 VGA"},
},
}
content := []byte(`
# curl -X GET http://127.0.0.1/redfish/v1/Chassis/HGX_GPU_SXM_1/Assembly
{"Name":"GPU Board Assembly","Model":"B200 180GB HBM3e","PartNumber":"PN1","SerialNumber":"SXM1SN"}
# curl -X GET http://127.0.0.1/redfish/v1/Chassis/HGX_GPU_SXM_3/Assembly
{"Name":"GPU Board Assembly","Model":"B200 180GB HBM3e","PartNumber":"PN3","SerialNumber":"SXM3SN"}
# curl -X GET http://127.0.0.1/redfish/v1/Chassis/HGX_GPU_SXM_5/Assembly
{"Name":"GPU Board Assembly","Model":"B200 180GB HBM3e","PartNumber":"PN5","SerialNumber":"SXM5SN"}
{"Id":"HGX_FW_GPU_SXM_1","Version":"FW1"}
{"Id":"HGX_FW_GPU_SXM_3","Version":"FW3"}
{"Id":"HGX_FW_GPU_SXM_5","Version":"FW5"}
{"Id":"HGX_InfoROM_GPU_SXM_3","Version":"IR3"}
`)
enrichGPUsFromHGXHWInfo(content, hw)
if hw.GPUs[0].SerialNumber != "SXM3SN" {
t.Fatalf("expected #GPU6 to map to SXM3 serial, got %q", hw.GPUs[0].SerialNumber)
}
if hw.GPUs[1].SerialNumber != "SXM1SN" {
t.Fatalf("expected #GPU7 to map to SXM1 serial, got %q", hw.GPUs[1].SerialNumber)
}
if hw.GPUs[2].SerialNumber != "SXM5SN" {
t.Fatalf("expected #GPU0 to map to SXM5 serial, got %q", hw.GPUs[2].SerialNumber)
}
if hw.GPUs[0].Firmware != "FW3" {
t.Fatalf("expected #GPU6 firmware FW3, got %q", hw.GPUs[0].Firmware)
}
if hw.GPUs[0].VideoBIOS != "IR3" {
t.Fatalf("expected #GPU6 InfoROM in VideoBIOS IR3, got %q", hw.GPUs[0].VideoBIOS)
}
if hw.GPUs[2].Firmware != "FW5" {
t.Fatalf("expected #GPU0 firmware FW5, got %q", hw.GPUs[2].Firmware)
}
for _, g := range hw.GPUs {
if g.Slot == "#CPU0_PE1_E_BMC" {
t.Fatalf("expected non-HGX BMC VGA entry to be filtered out")
}
}
}
func TestEnrichGPUsFromHGXHWInfo_AddsMissingLogicalGPU(t *testing.T) {
hw := &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "#GPU0"},
{Slot: "#GPU1"},
{Slot: "#GPU2"},
{Slot: "#GPU3"},
{Slot: "#GPU4"},
{Slot: "#GPU5"},
{Slot: "#GPU7"},
},
}
content := []byte(`
# curl -X GET http://127.0.0.1/redfish/v1/Chassis/HGX_GPU_SXM_3/Assembly
{"Name":"GPU Board Assembly","Model":"B200 180GB HBM3e","PartNumber":"PN3","SerialNumber":"SXM3SN"}
`)
enrichGPUsFromHGXHWInfo(content, hw)
found := false
for _, g := range hw.GPUs {
if g.Slot == "#GPU6" {
found = true
if g.SerialNumber != "SXM3SN" {
t.Fatalf("expected synthesized #GPU6 serial SXM3SN, got %q", g.SerialNumber)
}
}
}
if !found {
t.Fatalf("expected synthesized #GPU6 entry")
}
}
func TestApplyGPUStatusFromEvents_MarksFaultedGPU(t *testing.T) {
hw := &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "#GPU6"},
{Slot: "#GPU5"},
},
}
events := []models.Event{
{
ID: "17FFB002",
Timestamp: time.Now(),
Description: "PCIe Present mismatch BIOS miss F_GPU6",
},
}
applyGPUStatusFromEvents(hw, events)
if hw.GPUs[0].Status != "Critical" {
t.Fatalf("expected #GPU6 status Critical, got %q", hw.GPUs[0].Status)
}
if hw.GPUs[1].Status != "OK" {
t.Fatalf("expected healthy GPU status OK, got %q", hw.GPUs[1].Status)
}
}
func TestApplyGPUStatusFromEvents_UsesLatestEventAsCurrentStatusAndKeepsHistory(t *testing.T) {
hw := &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "#GPU1"},
{Slot: "#GPU3"},
{Slot: "#GPU6"},
},
}
events := []models.Event{
{
ID: "17FFB002",
Timestamp: time.Date(2026, 1, 12, 22, 51, 16, 0, time.FixedZone("UTC+8", 8*3600)),
Description: "PCIe Present mismatch BIOS miss F_GPU1 F_GPU3 F_GPU6",
},
{
ID: "17FFB002",
Timestamp: time.Date(2026, 1, 12, 23, 5, 18, 0, time.FixedZone("UTC+8", 8*3600)),
Description: "PCIe Present mismatch BIOS miss F_GPU6",
},
}
applyGPUStatusFromEvents(hw, events)
if hw.GPUs[0].Status != "OK" {
t.Fatalf("expected #GPU1 to recover to OK on latest event, got %q", hw.GPUs[0].Status)
}
if hw.GPUs[1].Status != "OK" {
t.Fatalf("expected #GPU3 to recover to OK on latest event, got %q", hw.GPUs[1].Status)
}
if hw.GPUs[2].Status != "Critical" {
t.Fatalf("expected #GPU6 to remain Critical, got %q", hw.GPUs[2].Status)
}
if len(hw.GPUs[0].StatusHistory) == 0 {
t.Fatalf("expected #GPU1 status history to be populated")
}
}
func TestParseIDLLog_ParsesStructuredJSONLine(t *testing.T) {
content := []byte(`{ "MESSAGE": "|2026-01-12T23:05:18+08:00|PCIE|Assert|Critical|17FFB002|PCIe Present mismatch BIOS miss F_GPU6 - Assert|" }`)
events := ParseIDLLog(content)
if len(events) != 1 {
t.Fatalf("expected 1 event from JSON line, got %d", len(events))
}
if events[0].ID != "17FFB002" {
t.Fatalf("expected event ID 17FFB002, got %q", events[0].ID)
}
if events[0].Source != "BMC" {
t.Fatalf("expected BMC source for IDL event, got %q", events[0].Source)
}
if events[0].SensorType != "pcie" {
t.Fatalf("expected component type pcie, got %#v", events[0])
}
}


@@ -0,0 +1,360 @@
package inspur
import (
"fmt"
"regexp"
"strconv"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
)
type hgxGPUAssemblyInfo struct {
Model string
Part string
Serial string
}
type hgxGPUFirmwareInfo struct {
Firmware string
InfoROM string
}
type hgxFirmwareInventoryEntry struct {
ID string
Version string
ActiveVersion string
InactiveVersion string
}
// Logical GPU index mapping used by HGX B200 UI ordering.
// Example from real logs/UI:
// GPU0->SXM5, GPU1->SXM7, GPU2->SXM6, GPU3->SXM8, GPU4->SXM2, GPU5->SXM4, GPU6->SXM3, GPU7->SXM1.
var hgxLogicalToSXM = map[int]int{
0: 5,
1: 7,
2: 6,
3: 8,
4: 2,
5: 4,
6: 3,
7: 1,
}
var (
reHGXGPUBlock = regexp.MustCompile(`(?s)/redfish/v1/Chassis/HGX_GPU_SXM_(\d+)/Assembly.*?"Name":\s*"GPU Board Assembly".*?"Model":\s*"([^"]+)".*?"PartNumber":\s*"([^"]+)".*?"SerialNumber":\s*"([^"]+)"`)
reHGXFWBlock = regexp.MustCompile(`(?s)"Id":\s*"HGX_FW_GPU_SXM_(\d+)".*?"Version":\s*"([^"]*)"`)
reHGXInfoROM = regexp.MustCompile(`(?s)"Id":\s*"HGX_InfoROM_GPU_SXM_(\d+)".*?"Version":\s*"([^"]*)"`)
reIDLine = regexp.MustCompile(`"Id":\s*"([^"]+)"`)
reVersion = regexp.MustCompile(`"Version":\s*"([^"]*)"`)
reSlotGPU = regexp.MustCompile(`(?i)gpu\s*#?\s*(\d+)`)
)
func enrichGPUsFromHGXHWInfo(content []byte, hw *models.HardwareConfig) {
if hw == nil || len(hw.GPUs) == 0 || len(content) == 0 {
return
}
bySXM := parseHGXGPUAssembly(content)
if len(bySXM) == 0 {
return
}
fwBySXM := parseHGXGPUFirmware(content)
normalizeHGXGPUInventory(hw, bySXM)
for i := range hw.GPUs {
gpu := &hw.GPUs[i]
logicalIdx, ok := extractLogicalGPUIndex(gpu.Slot)
if !ok {
// Keep existing info if slot index cannot be determined.
continue
}
sxm := resolveSXMIndex(logicalIdx, bySXM)
info, found := bySXM[sxm]
if !found {
continue
}
if strings.TrimSpace(gpu.SerialNumber) == "" {
gpu.SerialNumber = info.Serial
}
if shouldReplaceGPUModel(gpu.Model) {
gpu.Model = info.Model
}
if strings.TrimSpace(gpu.PartNumber) == "" {
gpu.PartNumber = info.Part
}
if strings.TrimSpace(gpu.Manufacturer) == "" {
gpu.Manufacturer = "NVIDIA"
}
if fw, ok := fwBySXM[sxm]; ok {
if strings.TrimSpace(gpu.Firmware) == "" && strings.TrimSpace(fw.Firmware) != "" {
gpu.Firmware = fw.Firmware
}
if strings.TrimSpace(gpu.VideoBIOS) == "" && strings.TrimSpace(fw.InfoROM) != "" {
gpu.VideoBIOS = fw.InfoROM
}
}
}
}
func appendHGXFirmwareFromHWInfo(content []byte, hw *models.HardwareConfig) {
if hw == nil || len(content) == 0 {
return
}
entries := parseHGXFirmwareInventory(content)
if len(entries) == 0 {
return
}
existing := make(map[string]bool, len(hw.Firmware))
for _, fw := range hw.Firmware {
key := strings.ToLower(strings.TrimSpace(fw.DeviceName) + "|" + strings.TrimSpace(fw.Version))
existing[key] = true
}
appendFW := func(name, version string) {
name = strings.TrimSpace(name)
version = strings.TrimSpace(version)
if name == "" || version == "" {
return
}
key := strings.ToLower(name + "|" + version)
if existing[key] {
return
}
existing[key] = true
hw.Firmware = append(hw.Firmware, models.FirmwareInfo{
DeviceName: name,
Version: version,
})
}
for _, e := range entries {
appendFW(e.ID, e.Version)
if e.ActiveVersion != "" && e.InactiveVersion != "" && e.ActiveVersion != e.InactiveVersion {
appendFW(e.ID+" Active Slot", e.ActiveVersion)
appendFW(e.ID+" Inactive Slot", e.InactiveVersion)
}
}
}
func parseHGXGPUAssembly(content []byte) map[int]hgxGPUAssemblyInfo {
result := make(map[int]hgxGPUAssemblyInfo)
matches := reHGXGPUBlock.FindAllSubmatch(content, -1)
for _, m := range matches {
if len(m) != 5 {
continue
}
sxmIdx, err := strconv.Atoi(string(m[1]))
if err != nil || sxmIdx <= 0 {
continue
}
result[sxmIdx] = hgxGPUAssemblyInfo{
Model: strings.TrimSpace(string(m[2])),
Part: strings.TrimSpace(string(m[3])),
Serial: strings.TrimSpace(string(m[4])),
}
}
return result
}
func parseHGXGPUFirmware(content []byte) map[int]hgxGPUFirmwareInfo {
result := make(map[int]hgxGPUFirmwareInfo)
matchesFW := reHGXFWBlock.FindAllSubmatch(content, -1)
for _, m := range matchesFW {
if len(m) != 3 {
continue
}
sxmIdx, err := strconv.Atoi(string(m[1]))
if err != nil || sxmIdx <= 0 {
continue
}
version := strings.TrimSpace(string(m[2]))
if version == "" {
continue
}
current := result[sxmIdx]
if current.Firmware == "" {
current.Firmware = version
}
result[sxmIdx] = current
}
matchesInfoROM := reHGXInfoROM.FindAllSubmatch(content, -1)
for _, m := range matchesInfoROM {
if len(m) != 3 {
continue
}
sxmIdx, err := strconv.Atoi(string(m[1]))
if err != nil || sxmIdx <= 0 {
continue
}
version := strings.TrimSpace(string(m[2]))
if version == "" {
continue
}
current := result[sxmIdx]
if current.InfoROM == "" {
current.InfoROM = version
}
result[sxmIdx] = current
}
return result
}
func parseHGXFirmwareInventory(content []byte) []hgxFirmwareInventoryEntry {
lines := strings.Split(string(content), "\n")
result := make([]hgxFirmwareInventoryEntry, 0)
var current *hgxFirmwareInventoryEntry
section := ""
flush := func() {
if current == nil {
return
}
if current.Version == "" && current.ActiveVersion == "" && current.InactiveVersion == "" {
current = nil
section = ""
return
}
result = append(result, *current)
current = nil
section = ""
}
for _, line := range lines {
if m := reIDLine.FindStringSubmatch(line); len(m) > 1 {
flush()
id := strings.TrimSpace(m[1])
if strings.HasPrefix(id, "HGX_") {
current = &hgxFirmwareInventoryEntry{ID: id}
}
continue
}
if current == nil {
continue
}
if strings.Contains(line, `"ActiveFirmwareSlot"`) {
section = "active"
}
if strings.Contains(line, `"InactiveFirmwareSlot"`) {
section = "inactive"
}
if m := reVersion.FindStringSubmatch(line); len(m) > 1 {
version := strings.TrimSpace(m[1])
if version == "" {
section = ""
continue
}
switch section {
case "active":
if current.ActiveVersion == "" {
current.ActiveVersion = version
}
case "inactive":
if current.InactiveVersion == "" {
current.InactiveVersion = version
}
default:
// Keep top-level version from the last seen plain "Version" in current entry.
current.Version = version
}
section = ""
}
}
flush()
return result
}
func extractLogicalGPUIndex(slot string) (int, bool) {
m := reSlotGPU.FindStringSubmatch(slot)
if len(m) < 2 {
return 0, false
}
idx, err := strconv.Atoi(m[1])
if err != nil || idx < 0 {
return 0, false
}
return idx, true
}
func resolveSXMIndex(logicalIdx int, bySXM map[int]hgxGPUAssemblyInfo) int {
if sxm, ok := hgxLogicalToSXM[logicalIdx]; ok {
if _, exists := bySXM[sxm]; exists {
return sxm
}
}
identity := logicalIdx + 1
if _, exists := bySXM[identity]; exists {
return identity
}
return identity
}
func shouldReplaceGPUModel(model string) bool {
trimmed := strings.TrimSpace(model)
if trimmed == "" {
return true
}
switch strings.ToLower(trimmed) {
case "vga", "3d controller", "display controller", "unknown":
return true
default:
return false
}
}
func normalizeHGXGPUInventory(hw *models.HardwareConfig, bySXM map[int]hgxGPUAssemblyInfo) {
// Keep only logical HGX GPUs (#GPU0..#GPU7) and remove BMC VGA entries.
filtered := make([]models.GPU, 0, len(hw.GPUs))
present := make(map[int]bool)
for _, gpu := range hw.GPUs {
idx, ok := extractLogicalGPUIndex(gpu.Slot)
if !ok || idx < 0 || idx > 7 {
continue
}
present[idx] = true
filtered = append(filtered, gpu)
}
// If some logical GPUs are missing in asset.json, add placeholders from HGX Redfish assembly.
for logicalIdx := 0; logicalIdx <= 7; logicalIdx++ {
if present[logicalIdx] {
continue
}
sxm := resolveSXMIndex(logicalIdx, bySXM)
info, ok := bySXM[sxm]
if !ok {
continue
}
filtered = append(filtered, models.GPU{
Slot: fmt.Sprintf("#GPU%d", logicalIdx),
Model: info.Model,
Manufacturer: "NVIDIA",
SerialNumber: info.Serial,
PartNumber: info.Part,
})
}
hw.GPUs = filtered
}
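`resolveSXMIndex` prefers the `hgxLogicalToSXM` table, but only when the Redfish snapshot actually contains that SXM board. Otherwise it falls back to the identity mapping `logical + 1`. The minimal sketch below reproduces that rule with the table copied from the HGX B200 ordering documented in the code. It assumes, as the source comment does, that this ordering is specific to that board. The `present` map is a hypothetical stand-in for the parsed assembly map.

```go
package main

import "fmt"

// logicalToSXM copies the HGX B200 UI ordering from hgxLogicalToSXM.
var logicalToSXM = map[int]int{0: 5, 1: 7, 2: 6, 3: 8, 4: 2, 5: 4, 6: 3, 7: 1}

// resolveSXM uses the table entry when that SXM board is present in the
// snapshot, and otherwise falls back to the identity mapping logical+1.
func resolveSXM(logical int, present map[int]bool) int {
	if sxm, ok := logicalToSXM[logical]; ok && present[sxm] {
		return sxm
	}
	return logical + 1
}

func main() {
	present := map[int]bool{1: true, 3: true, 5: true}
	fmt.Println(resolveSXM(6, present), resolveSXM(7, present), resolveSXM(1, present))
	// prints: 3 1 2
}
```

As in the original, the fallback index may itself be absent (SXM2 here). Callers therefore still have to check the assembly map before using the result.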


@@ -8,8 +8,10 @@ import (
"git.mchus.pro/mchus/logpile/internal/models"
)
-// ParseIDLLog parses the IDL (Inspur Diagnostic Log) file for BMC alarms
-// Format: |timestamp|component|type|severity|eventID|description|
+// ParseIDLLog parses IDL-style entries for BMC alarms.
+// Works for both plain idl.log lines and JSON structured logs (idl_json/run_json)
+// where MESSAGE/LOG2_FMTMSG contains:
+// |timestamp|component|type|severity|eventID|description|
func ParseIDLLog(content []byte) []models.Event {
var events []models.Event
@@ -21,10 +23,6 @@ func ParseIDLLog(content []byte) []models.Event {
seenEvents := make(map[string]bool) // Deduplicate events
for _, line := range lines {
-if !strings.Contains(line, "CommerDiagnose") {
-continue
-}
matches := re.FindStringSubmatch(line)
if matches == nil {
continue
@@ -62,7 +60,7 @@ func ParseIDLLog(content []byte) []models.Event {
events = append(events, models.Event{
ID: eventID,
Timestamp: ts,
-Source: component,
+Source: "BMC",
SensorType: strings.ToLower(component),
SensorName: sensorName,
EventType: eventType,
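The updated `ParseIDLLog` accepts the pipe-delimited IDL message either as a plain line or embedded in the `MESSAGE` / `LOG2_FMTMSG` field of a JSON log line. The sketch below shows one way to unwrap and split such a line. It is a hypothetical helper (`parseIDLLine`, `idlRecord`), not the parser's regex-based implementation, which is not shown in this diff. It assumes RFC 3339 timestamps, as in the `TestParseIDLLog_ParsesStructuredJSONLine` sample, and a trailing `|` terminator.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// idlRecord holds the six fields of |timestamp|component|type|severity|eventID|description|.
type idlRecord struct {
	Time        time.Time
	Component   string
	Type        string
	Severity    string
	EventID     string
	Description string
}

// parseIDLLine accepts a plain IDL line or a JSON object whose MESSAGE
// (or LOG2_FMTMSG) field embeds the IDL message.
func parseIDLLine(line string) (idlRecord, bool) {
	msg := strings.TrimSpace(line)
	if strings.HasPrefix(msg, "{") {
		var wrapper struct {
			Message string `json:"MESSAGE"`
			FmtMsg  string `json:"LOG2_FMTMSG"`
		}
		if err := json.Unmarshal([]byte(msg), &wrapper); err != nil {
			return idlRecord{}, false
		}
		msg = wrapper.Message
		if msg == "" {
			msg = wrapper.FmtMsg
		}
	}
	// Keep only the text between the first and last '|'.
	start, end := strings.Index(msg, "|"), strings.LastIndex(msg, "|")
	if start < 0 || end <= start {
		return idlRecord{}, false
	}
	// SplitN keeps any '|' inside the description intact.
	fields := strings.SplitN(msg[start+1:end], "|", 6)
	if len(fields) != 6 {
		return idlRecord{}, false
	}
	ts, err := time.Parse(time.RFC3339, strings.TrimSpace(fields[0]))
	if err != nil {
		return idlRecord{}, false
	}
	return idlRecord{
		Time:        ts,
		Component:   strings.TrimSpace(fields[1]),
		Type:        strings.TrimSpace(fields[2]),
		Severity:    strings.TrimSpace(fields[3]),
		EventID:     strings.TrimSpace(fields[4]),
		Description: strings.TrimSpace(fields[5]),
	}, true
}

func main() {
	rec, ok := parseIDLLine(`{ "MESSAGE": "|2026-01-12T23:05:18+08:00|PCIE|Assert|Critical|17FFB002|PCIe Present mismatch BIOS miss F_GPU6 - Assert|" }`)
	fmt.Println(ok, rec.EventID, strings.ToLower(rec.Component), rec.Time.UTC().Format(time.RFC3339))
	// prints: true 17FFB002 pcie 2026-01-12T15:05:18Z
}
```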


@@ -8,6 +8,7 @@ package inspur
import (
"fmt"
"strings"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
@@ -15,7 +16,7 @@ import (
// parserVersion - version of this parser module
// IMPORTANT: Increment this version when making changes to parser logic!
-const parserVersion = "1.0.0"
+const parserVersion = "1.8"
func init() {
parser.Register(&Parser{})
@@ -86,15 +87,49 @@ func containsInspurMarkers(content []byte) bool {
// Parse parses Inspur/Kaytus archive
func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, error) {
selLocation := inferInspurArchiveLocation(files)
result := &models.AnalysisResult{
Events: make([]models.Event, 0),
FRU: make([]models.FRUInfo, 0),
Sensors: make([]models.SensorReading, 0),
}
// Pre-parse enrichment maps from devicefrusdr.log for use inside ParseAssetJSON.
// BMC does not populate HddInfo.ModelName or SerialNumber for NVMe drives.
var pcieSlotDeviceNames map[int]string
var nvmeLocToSlot map[int]int
if f := parser.FindFileByName(files, "devicefrusdr.log"); f != nil {
pcieSlotDeviceNames = ParsePCIeSlotDeviceNames(f.Content)
nvmeLocToSlot = ParsePCIeNVMeLocToSlot(f.Content)
}
// Parse NVMe serial numbers from audit.log: every disk SN change is logged there.
// Combine with the NVMe loc→slot mapping to build pcieSlot→serial map.
// Also parse RAID disk serials by backplane slot key (e.g. "BP0:0").
var pcieSlotSerials map[int]string
var raidSlotSerials map[string]string
if f := parser.FindFileByName(files, "audit.log"); f != nil {
if len(nvmeLocToSlot) > 0 {
nvmeDiskSerials := ParseAuditLogNVMeSerials(f.Content)
if len(nvmeDiskSerials) > 0 {
pcieSlotSerials = make(map[int]string, len(nvmeDiskSerials))
for diskNum, serial := range nvmeDiskSerials {
if slot, ok := nvmeLocToSlot[diskNum]; ok {
pcieSlotSerials[slot] = serial
}
}
if len(pcieSlotSerials) == 0 {
pcieSlotSerials = nil
}
}
}
raidSlotSerials = ParseAuditLogRAIDSerials(f.Content)
}
// Parse asset.json first (base hardware info)
if f := parser.FindFileByName(files, "asset.json"); f != nil {
-if hw, err := ParseAssetJSON(f.Content); err == nil {
+if hw, err := ParseAssetJSON(f.Content, pcieSlotDeviceNames, pcieSlotSerials); err == nil {
result.Hardware = hw
}
}
@@ -123,17 +158,29 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
// Extract events from component.log (memory errors, etc.)
componentEvents := ParseComponentLogEvents(f.Content)
result.Events = append(result.Events, componentEvents...)
// Extract additional telemetry sensors from component.log sections
// (fan RPM, backplane temperature, PSU summary power, etc.).
componentSensors := ParseComponentLogSensors(f.Content)
result.Sensors = mergeSensorReadings(result.Sensors, componentSensors)
}
-// Parse IDL log (BMC alarms/diagnose events)
-if f := parser.FindFileByName(files, "idl.log"); f != nil {
+// Enrich runtime component data from Redis snapshot (serials, FW, telemetry),
+// when text logs miss these fields.
+if f := parser.FindFileByName(files, "redis-dump.rdb"); f != nil && result.Hardware != nil {
+enrichFromRedisDump(f.Content, result.Hardware)
+}
+// Parse IDL-like logs (plain and structured JSON logs with embedded IDL messages)
+idlFiles := parser.FindFileByPattern(files, "/idl.log", "idl_json.log", "run_json.log")
+for _, f := range idlFiles {
idlEvents := ParseIDLLog(f.Content)
result.Events = append(result.Events, idlEvents...)
}
// Parse SEL list (selelist.csv)
if f := parser.FindFileByName(files, "selelist.csv"); f != nil {
-selEvents := ParseSELList(f.Content)
+selEvents := ParseSELListWithLocation(f.Content, selLocation)
result.Events = append(result.Events, selEvents...)
}
@@ -144,9 +191,75 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
result.Events = append(result.Events, events...)
}
// Fallback for archives where board serial is missing in parsed FRU/asset data:
// recover it from log content, never from archive filename.
if strings.TrimSpace(result.Hardware.BoardInfo.SerialNumber) == "" {
if serial := inferBoardSerialFromFallbackLogs(files); serial != "" {
result.Hardware.BoardInfo.SerialNumber = serial
}
}
if strings.TrimSpace(result.Hardware.BoardInfo.ProductName) == "" {
if model := inferBoardModelFromFallbackLogs(files); model != "" {
result.Hardware.BoardInfo.ProductName = model
}
}
// Enrich GPU inventory from HGX Redfish snapshot (serial/model/part mapping).
if f := parser.FindFileByName(files, "HGX_HWInfo_FWVersion.log"); f != nil && result.Hardware != nil {
enrichGPUsFromHGXHWInfo(f.Content, result.Hardware)
appendHGXFirmwareFromHWInfo(f.Content, result.Hardware)
}
// Mark problematic GPUs from IDL errors like "BIOS miss F_GPU6".
if result.Hardware != nil {
applyGPUStatusFromEvents(result.Hardware, result.Events)
enrichStorageFromSerialFallbackFiles(files, result.Hardware)
// Apply RAID disk serials from audit.log (authoritative: last non-NULL SN change).
// These override redis/component.log serials which may be stale after disk replacement.
applyRAIDSlotSerials(result.Hardware, raidSlotSerials)
parser.ApplyManufacturedYearWeekFromFRU(result.FRU, result.Hardware)
}
return result, nil
}
func inferInspurArchiveLocation(files []parser.ExtractedFile) *time.Location {
fallback := parser.DefaultArchiveLocation()
f := parser.FindFileByName(files, "timezone.conf")
if f == nil {
return fallback
}
locName := parseTimezoneConfigLocation(f.Content)
if strings.TrimSpace(locName) == "" {
return fallback
}
loc, err := time.LoadLocation(locName)
if err != nil {
return fallback
}
return loc
}
func parseTimezoneConfigLocation(content []byte) string {
lines := strings.Split(string(content), "\n")
for _, line := range lines {
line = strings.TrimSpace(line)
if line == "" || strings.HasPrefix(line, "[") || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
continue
}
parts := strings.SplitN(line, "=", 2)
if len(parts) != 2 {
continue
}
key := strings.ToLower(strings.TrimSpace(parts[0]))
val := strings.TrimSpace(parts[1])
if key == "timezone" && val != "" {
return val
}
}
return ""
}
func (p *Parser) parseDeviceFruSDR(content []byte, result *models.AnalysisResult) {
lines := string(content)
@@ -174,14 +287,9 @@ func (p *Parser) parseDeviceFruSDR(content []byte, result *models.AnalysisResult
// This supplements data from asset.json with serial numbers, firmware, etc.
pcieDevicesFromREST := ParsePCIeDevices(content)
-// Merge PCIe data: keep asset.json data but add RESTful data if available
+// Merge PCIe data: asset.json is the base inventory, RESTful data enriches names/links/serials.
if result.Hardware != nil {
-// If asset.json didn't have PCIe devices, use RESTful data
-if len(result.Hardware.PCIeDevices) == 0 && len(pcieDevicesFromREST) > 0 {
-result.Hardware.PCIeDevices = pcieDevicesFromREST
-}
-// If we have both, merge them (RESTful data takes precedence for detailed info)
-// For now, we keep asset.json data which has more details
+result.Hardware.PCIeDevices = MergePCIeDevices(result.Hardware.PCIeDevices, pcieDevicesFromREST)
}
// Parse GPU devices and add temperature data from sensors
@@ -236,3 +344,38 @@ func extractSlotNumberFromGPU(slot string) int {
}
return 0
}
func mergeSensorReadings(base, extra []models.SensorReading) []models.SensorReading {
if len(extra) == 0 {
return base
}
out := append([]models.SensorReading{}, base...)
seen := make(map[string]struct{}, len(out))
for _, s := range out {
if key := sensorMergeKey(s); key != "" {
seen[key] = struct{}{}
}
}
for _, s := range extra {
key := sensorMergeKey(s)
if key != "" {
if _, ok := seen[key]; ok {
continue
}
seen[key] = struct{}{}
}
out = append(out, s)
}
return out
}
func sensorMergeKey(s models.SensorReading) string {
name := strings.ToLower(strings.TrimSpace(s.Name))
if name == "" {
return ""
}
return name
}


@@ -3,36 +3,117 @@ package inspur
import (
"encoding/json"
"fmt"
"regexp"
"strconv"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser/vendors/pciids"
)
// PCIeRESTInfo represents the RESTful PCIE Device info structure
type PCIeRESTInfo []struct {
ID int `json:"id"`
Present int `json:"present"`
Enable int `json:"enable"`
Status int `json:"status"`
VendorID int `json:"vendor_id"`
VendorName string `json:"vendor_name"`
DeviceID int `json:"device_id"`
DeviceName string `json:"device_name"`
BusNum int `json:"bus_num"`
DevNum int `json:"dev_num"`
FuncNum int `json:"func_num"`
MaxLinkWidth int `json:"max_link_width"`
MaxLinkSpeed int `json:"max_link_speed"`
CurrentLinkWidth int `json:"current_link_width"`
CurrentLinkSpeed int `json:"current_link_speed"`
Slot int `json:"slot"`
Location string `json:"location"`
DeviceLocator string `json:"DeviceLocator"`
DevType int `json:"dev_type"`
DevSubtype int `json:"dev_subtype"`
PartNum string `json:"part_num"`
SerialNum string `json:"serial_num"`
FwVer string `json:"fw_ver"`
}
// ParsePCIeSlotDeviceNames parses devicefrusdr.log and returns a map from integer PCIe slot ID
// to device name string. Used to enrich HddInfo entries in asset.json that lack model names.
func ParsePCIeSlotDeviceNames(content []byte) map[int]string {
info, ok := parsePCIeRESTJSON(content)
if !ok {
return nil
}
result := make(map[int]string, len(info))
for _, entry := range info {
if entry.Slot <= 0 {
continue
}
name := sanitizePCIeDeviceName(entry.DeviceName)
if name != "" {
result[entry.Slot] = name
}
}
if len(result) == 0 {
return nil
}
return result
}
// parsePCIeRESTJSON parses the RESTful PCIE Device info JSON from devicefrusdr.log content.
func parsePCIeRESTJSON(content []byte) (PCIeRESTInfo, bool) {
text := string(content)
startMarker := "RESTful PCIE Device info:"
endMarker := "BMC sdr Info:"
startIdx := strings.Index(text, startMarker)
if startIdx == -1 {
return nil, false
}
endIdx := strings.Index(text[startIdx:], endMarker)
if endIdx == -1 {
endIdx = len(text) - startIdx
}
jsonText := strings.TrimSpace(text[startIdx+len(startMarker) : startIdx+endIdx])
var info PCIeRESTInfo
if err := json.Unmarshal([]byte(jsonText), &info); err != nil {
return nil, false
}
return info, true
}
// ParsePCIeNVMeLocToSlot parses devicefrusdr.log and returns a map from NVMe location number
// (the numeric suffix in "#NVME0", "#NVME2", etc.) to the integer PCIe slot ID.
// This is used to correlate audit.log NVMe disk numbers with HddInfo PcieSlot values.
func ParsePCIeNVMeLocToSlot(content []byte) map[int]int {
info, ok := parsePCIeRESTJSON(content)
if !ok {
return nil
}
nvmeLocRegex := regexp.MustCompile(`(?i)^#NVME(\d+)$`)
result := make(map[int]int)
for _, entry := range info {
if entry.Slot <= 0 {
continue
}
loc := strings.TrimSpace(entry.Location)
m := nvmeLocRegex.FindStringSubmatch(loc)
if m == nil {
continue
}
locNum, err := strconv.Atoi(m[1])
if err != nil {
continue
}
result[locNum] = entry.Slot
}
if len(result) == 0 {
return nil
}
return result
}
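`Parse` joins two maps to turn audit.log disk numbers into PCIe slot serials. The first is `nvmeLocToSlot` (the `#NVMEn` number mapped to the PCIe slot, from this function). The second is the disk-number-to-serial map from `ParseAuditLogNVMeSerials`. The minimal sketch below shows that join on its own. It assumes, as the doc comment above does, that the audit.log disk number equals the `#NVMEn` location number. The serial values are made up.

```go
package main

import "fmt"

// slotSerials joins diskNum->serial with nvmeLoc->pcieSlot, assuming the
// audit.log disk number equals the #NVMEn location number. Disks without a
// known slot are dropped; nil is returned when nothing joins.
func slotSerials(locToSlot map[int]int, diskSerials map[int]string) map[int]string {
	out := make(map[int]string, len(diskSerials))
	for disk, serial := range diskSerials {
		if slot, ok := locToSlot[disk]; ok {
			out[slot] = serial
		}
	}
	if len(out) == 0 {
		return nil
	}
	return out
}

func main() {
	locToSlot := map[int]int{0: 12, 2: 14}
	serials := map[int]string{0: "S0SERIAL", 1: "S1SERIAL", 2: "S2SERIAL"}
	fmt.Println(slotSerials(locToSlot, serials))
	// prints: map[12:S0SERIAL 14:S2SERIAL]
}
```

Returning `nil` rather than an empty map matches how `Parse` resets `pcieSlotSerials` when the join produces nothing.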
// ParsePCIeDevices parses RESTful PCIE Device info from devicefrusdr.log
@@ -73,9 +154,27 @@ func ParsePCIeDevices(content []byte) []models.PCIeDevice {
// Determine device class based on dev_type
deviceClass := determineDeviceClass(pcie.DevType, pcie.DevSubtype, pcie.DeviceName)
_, pciDeviceName := pciids.DeviceInfo(pcie.VendorID, pcie.DeviceID)
-// Build BDF string
-bdf := fmt.Sprintf("%04x/%02x/%02x/%02x", 0, pcie.BusNum, pcie.DevNum, pcie.FuncNum)
+// Build BDF string in canonical form (bb:dd.f)
+bdf := formatBDF(pcie.BusNum, pcie.DevNum, pcie.FuncNum)
partNumber := strings.TrimSpace(pcie.PartNum)
if partNumber == "" {
partNumber = sanitizePCIeDeviceName(pcie.DeviceName)
}
if partNumber == "" {
partNumber = normalizeModelLabel(pciDeviceName)
}
if isGenericPCIeClass(deviceClass) {
if resolved := normalizeModelLabel(pciDeviceName); resolved != "" {
deviceClass = resolved
}
}
manufacturer := strings.TrimSpace(pcie.VendorName)
if manufacturer == "" {
manufacturer = normalizeModelLabel(pciids.VendorName(pcie.VendorID))
}
device := models.PCIeDevice{
Slot: pcie.Location,
@@ -83,12 +182,12 @@ func ParsePCIeDevices(content []byte) []models.PCIeDevice {
DeviceID: pcie.DeviceID,
BDF: bdf,
DeviceClass: deviceClass,
-Manufacturer: pcie.VendorName,
+Manufacturer: manufacturer,
LinkWidth: pcie.CurrentLinkWidth,
LinkSpeed: currentSpeed,
MaxLinkWidth: pcie.MaxLinkWidth,
MaxLinkSpeed: maxSpeed,
-PartNumber: strings.TrimSpace(pcie.PartNum),
+PartNumber: partNumber,
SerialNumber: strings.TrimSpace(pcie.SerialNum),
}
@@ -98,6 +197,149 @@ func ParsePCIeDevices(content []byte) []models.PCIeDevice {
return devices
}
var rawHexDeviceNameRegex = regexp.MustCompile(`(?i)^0x[0-9a-f]+$`)
func sanitizePCIeDeviceName(name string) string {
name = strings.TrimSpace(name)
if name == "" {
return ""
}
if strings.EqualFold(name, "N/A") {
return ""
}
if rawHexDeviceNameRegex.MatchString(name) {
return ""
}
return name
}
// MergePCIeDevices enriches base devices (from asset.json) with detailed RESTful PCIe data.
// Matching is done by BDF first, then by slot fallback.
func MergePCIeDevices(base []models.PCIeDevice, rest []models.PCIeDevice) []models.PCIeDevice {
if len(rest) == 0 {
return base
}
if len(base) == 0 {
return append([]models.PCIeDevice(nil), rest...)
}
type ref struct {
index int
}
byBDF := make(map[string]ref, len(base))
bySlot := make(map[string]ref, len(base))
for i := range base {
bdf := normalizePCIeBDF(base[i].BDF)
if bdf != "" {
byBDF[bdf] = ref{index: i}
}
slot := strings.ToLower(strings.TrimSpace(base[i].Slot))
if slot != "" {
bySlot[slot] = ref{index: i}
}
}
for _, detailed := range rest {
idx := -1
if bdf := normalizePCIeBDF(detailed.BDF); bdf != "" {
if found, ok := byBDF[bdf]; ok {
idx = found.index
}
}
if idx == -1 {
slot := strings.ToLower(strings.TrimSpace(detailed.Slot))
if slot != "" {
if found, ok := bySlot[slot]; ok {
idx = found.index
}
}
}
if idx == -1 {
base = append(base, detailed)
newIdx := len(base) - 1
if bdf := normalizePCIeBDF(detailed.BDF); bdf != "" {
byBDF[bdf] = ref{index: newIdx}
}
if slot := strings.ToLower(strings.TrimSpace(detailed.Slot)); slot != "" {
bySlot[slot] = ref{index: newIdx}
}
continue
}
enrichPCIeDevice(&base[idx], detailed)
}
return base
}
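`MergePCIeDevices` looks up each RESTful device by BDF first and falls back to the slot label only when the BDF lookup misses. Unmatched devices are appended and indexed so later entries can match them. Matched ones only fill empty fields of the asset.json entry. The sketch below shows that matching strategy on a hypothetical trimmed-down `dev` type. It omits `normalizePCIeBDF` and fills only BDF and serial, so it illustrates the precedence rather than replicating the function.

```go
package main

import (
	"fmt"
	"strings"
)

// dev is a hypothetical stand-in for models.PCIeDevice.
type dev struct {
	BDF, Slot, Serial string
}

// matchKey lower-cases and trims a BDF or slot label for lookups.
func matchKey(s string) string { return strings.ToLower(strings.TrimSpace(s)) }

// mergeDevs enriches base with rest: BDF match first, slot label second;
// unmatched entries are appended. Only empty base fields are filled.
func mergeDevs(base, rest []dev) []dev {
	byBDF := make(map[string]int)
	bySlot := make(map[string]int)
	index := func(i int) { // closure sees base after later appends
		if k := matchKey(base[i].BDF); k != "" {
			byBDF[k] = i
		}
		if k := matchKey(base[i].Slot); k != "" {
			bySlot[k] = i
		}
	}
	for i := range base {
		index(i)
	}
	for _, d := range rest {
		i, ok := byBDF[matchKey(d.BDF)]
		if !ok {
			i, ok = bySlot[matchKey(d.Slot)]
		}
		if !ok {
			base = append(base, d)
			index(len(base) - 1)
			continue
		}
		if base[i].BDF == "" {
			base[i].BDF = d.BDF
		}
		if base[i].Serial == "" {
			base[i].Serial = d.Serial
		}
	}
	return base
}

func main() {
	merged := mergeDevs(
		[]dev{{BDF: "45:00.0", Slot: "#CPU0_PCIE4"}, {Slot: "#NVME0"}},
		[]dev{
			{BDF: "45:00.0", Serial: "SN-A"},
			{BDF: "c1:00.0", Slot: "#nvme0", Serial: "SN-B"},
			{BDF: "e1:00.0", Slot: "#CPU1_PCIE2", Serial: "SN-C"},
		},
	)
	fmt.Println(len(merged), merged[0].Serial, merged[1].BDF, merged[2].Slot)
	// prints: 3 SN-A c1:00.0 #CPU1_PCIE2
}
```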
func enrichPCIeDevice(dst *models.PCIeDevice, src models.PCIeDevice) {
if dst == nil {
return
}
if strings.TrimSpace(dst.Slot) == "" {
dst.Slot = src.Slot
}
if strings.TrimSpace(dst.BDF) == "" {
dst.BDF = src.BDF
}
if dst.VendorID == 0 {
dst.VendorID = src.VendorID
}
if dst.DeviceID == 0 {
dst.DeviceID = src.DeviceID
}
if strings.TrimSpace(dst.Manufacturer) == "" {
dst.Manufacturer = src.Manufacturer
}
if strings.TrimSpace(dst.SerialNumber) == "" {
dst.SerialNumber = src.SerialNumber
}
if strings.TrimSpace(dst.PartNumber) == "" {
dst.PartNumber = src.PartNumber
}
if strings.TrimSpace(dst.LinkSpeed) == "" || strings.EqualFold(strings.TrimSpace(dst.LinkSpeed), "unknown") {
dst.LinkSpeed = src.LinkSpeed
}
if strings.TrimSpace(dst.MaxLinkSpeed) == "" || strings.EqualFold(strings.TrimSpace(dst.MaxLinkSpeed), "unknown") {
dst.MaxLinkSpeed = src.MaxLinkSpeed
}
if dst.LinkWidth == 0 {
dst.LinkWidth = src.LinkWidth
}
if dst.MaxLinkWidth == 0 {
dst.MaxLinkWidth = src.MaxLinkWidth
}
if isGenericPCIeClass(dst.DeviceClass) && !isGenericPCIeClass(src.DeviceClass) {
dst.DeviceClass = src.DeviceClass
}
}
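// normalizePCIeBDF lower-cases and trims bdf, and rewrites a four-part
// slash-separated value "a/b/c/d" to "b:c.d"; other values are returned as-is.
func normalizePCIeBDF(bdf string) string {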
bdf = strings.TrimSpace(strings.ToLower(bdf))
if bdf == "" {
return ""
}
if strings.Contains(bdf, "/") {
parts := strings.Split(bdf, "/")
if len(parts) == 4 {
return fmt.Sprintf("%s:%s.%s", parts[1], parts[2], parts[3])
}
}
return bdf
}
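// isGenericPCIeClass reports whether class is empty or a coarse category
// (e.g. "storage", "network") that a more specific class should replace.
func isGenericPCIeClass(class string) bool {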
switch strings.ToLower(strings.TrimSpace(class)) {
case "", "unknown", "other", "bridge", "network", "storage", "sas", "sata", "display", "vga", "3d controller", "serial bus":
return true
default:
return false
}
}
// determineDeviceClass maps device type to human-readable class
func determineDeviceClass(devType, devSubtype int, deviceName string) string {
// dev_type mapping:

@@ -0,0 +1,77 @@
package inspur
import (
"strings"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestParsePCIeDevices_UsesDeviceNameAsModelWhenPartNumberMissing(t *testing.T) {
content := []byte(`RESTful PCIE Device info:
[{"id":1,"present":1,"vendor_id":32902,"vendor_name":"Intel","device_id":5409,"device_name":"I350T4V2","bus_num":69,"dev_num":0,"func_num":0,"max_link_width":4,"max_link_speed":2,"current_link_width":4,"current_link_speed":2,"location":"#CPU0_PCIE4","dev_type":2,"dev_subtype":0,"part_num":"","serial_num":"","fw_ver":""}]
BMC sdr Info:`)
devices := ParsePCIeDevices(content)
if len(devices) != 1 {
t.Fatalf("expected 1 device, got %d", len(devices))
}
if devices[0].PartNumber != "I350T4V2" {
t.Fatalf("expected part/model I350T4V2, got %q", devices[0].PartNumber)
}
if devices[0].BDF != "45:00.0" {
t.Fatalf("expected BDF 45:00.0, got %q", devices[0].BDF)
}
}
func TestMergePCIeDevices_EnrichesGenericAssetEntry(t *testing.T) {
base := []models.PCIeDevice{
{
Slot: "#CPU1_PCIE9",
BDF: "98:00.0",
VendorID: 0x9005,
DeviceID: 0x028f,
DeviceClass: "SAS",
Manufacturer: "Adaptec / Microsemi",
},
}
rest := []models.PCIeDevice{
{
Slot: "#CPU1_PCIE9",
BDF: "98:00.0",
VendorID: 0x9005,
DeviceID: 0x028f,
DeviceClass: "Storage Controller",
Manufacturer: "Microchip",
PartNumber: "PM8222-SHBA",
},
}
got := MergePCIeDevices(base, rest)
if len(got) != 1 {
t.Fatalf("expected 1 merged device, got %d", len(got))
}
if got[0].PartNumber != "PM8222-SHBA" {
t.Fatalf("expected merged part number PM8222-SHBA, got %q", got[0].PartNumber)
}
}
func TestParsePCIeDevices_ResolvesModelFromPCIIDsWhenDeviceNameIsRawHex(t *testing.T) {
content := []byte(`RESTful PCIE Device info:
[{"id":5,"present":1,"vendor_id":36869,"vendor_name":"","device_id":655,"device_name":"0x028F","bus_num":152,"dev_num":0,"func_num":0,"max_link_width":8,"max_link_speed":3,"current_link_width":8,"current_link_speed":3,"location":"#CPU1_PCIE9","dev_type":1,"dev_subtype":7,"part_num":"","serial_num":"","fw_ver":""}]
BMC sdr Info:`)
devices := ParsePCIeDevices(content)
if len(devices) != 1 {
t.Fatalf("expected 1 device, got %d", len(devices))
}
if devices[0].PartNumber == "" {
t.Fatalf("expected part number resolved from pci.ids, got empty")
}
if strings.HasPrefix(strings.ToLower(strings.TrimSpace(devices[0].PartNumber)), "0x") {
t.Fatalf("expected resolved name instead of raw hex, got %q", devices[0].PartNumber)
}
if devices[0].Manufacturer == "" {
t.Fatalf("expected manufacturer resolved from pci.ids")
}
}
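The BDF-then-slot lookup in MergePCIeDevices can be sketched on its own. This is a minimal sketch, not the parser's code: the `device` type and the `merge`, `normalizeBDF` and `slotKey` helpers are hypothetical simplifications, and only PartNumber is enriched. The four-part input is assumed to carry a leading domain/segment component that can be dropped, as normalizePCIeBDF does.

```go
package main

import (
	"fmt"
	"strings"
)

// device is a hypothetical, trimmed-down stand-in for models.PCIeDevice.
type device struct {
	Slot       string
	BDF        string
	PartNumber string
}

// normalizeBDF lower-cases and trims a BDF and rewrites a four-part
// slash-separated form ("a/bus/dev/fn") to "bus:dev.fn".
func normalizeBDF(bdf string) string {
	bdf = strings.ToLower(strings.TrimSpace(bdf))
	if parts := strings.Split(bdf, "/"); len(parts) == 4 {
		return fmt.Sprintf("%s:%s.%s", parts[1], parts[2], parts[3])
	}
	return bdf
}

// slotKey makes slot matching case- and whitespace-insensitive.
func slotKey(slot string) string {
	return strings.ToLower(strings.TrimSpace(slot))
}

// merge matches each detailed entry to a base entry by BDF, then by slot,
// fills an empty PartNumber, and appends entries that match nothing.
func merge(base, rest []device) []device {
	byBDF := map[string]int{}
	bySlot := map[string]int{}
	index := func(i int) {
		if k := normalizeBDF(base[i].BDF); k != "" {
			byBDF[k] = i
		}
		if k := slotKey(base[i].Slot); k != "" {
			bySlot[k] = i
		}
	}
	for i := range base {
		index(i)
	}
	for _, d := range rest {
		idx, ok := byBDF[normalizeBDF(d.BDF)]
		if !ok {
			idx, ok = bySlot[slotKey(d.Slot)]
		}
		if !ok {
			base = append(base, d)
			index(len(base) - 1) // later entries can now match the new one
			continue
		}
		if strings.TrimSpace(base[idx].PartNumber) == "" {
			base[idx].PartNumber = d.PartNumber
		}
	}
	return base
}

func main() {
	base := []device{{Slot: "#CPU1_PCIE9", BDF: "98:00.0"}}
	rest := []device{
		{Slot: "#CPU1_PCIE9", BDF: "0000/98/00/0", PartNumber: "PM8222-SHBA"},
		{Slot: "#CPU0_PCIE4", BDF: "45:00.0", PartNumber: "I350T4V2"},
	}
	for _, d := range merge(base, rest) {
		fmt.Println(d.Slot, d.BDF, d.PartNumber)
	}
}
```

Empty keys are never indexed, so a device without a BDF or slot cannot accidentally match another device that also lacks one.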

@@ -0,0 +1,559 @@
package inspur
import (
"encoding/hex"
"regexp"
"sort"
"strconv"
"strings"
"unicode"
"git.mchus.pro/mchus/logpile/internal/models"
)
var (
reRedisGPUKey = regexp.MustCompile(`GPUInfo:REDIS_GPUINFO_T([0-9]+):([A-Za-z0-9_]+)`)
reRedisNICKey = regexp.MustCompile(`RedisNicInfo:redis_nic_info_t:stNicDeviceInfo([0-9]+):([A-Za-z0-9_]+)`)
reRedisRAIDSerial = regexp.MustCompile(`RAIDMSCCInfo:redis_pcie_mscc_raid_info_t([0-9]+):RAIDInfo:SerialNum`)
reRedisPCIESNPN = regexp.MustCompile(`AssetInfoPCIE:SNPN([0-9]+):(SN|PN)`)
)
type redisGPUSnapshot struct {
ByIndex map[int]map[string]string
}
type redisNICSnapshot struct {
ByIndex map[int]map[string]string
}
type redisPCIESerialSnapshot struct {
ByPart map[string]string
}
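// enrichFromRedisDump fills missing or generic GPU, NIC, and PCIe fields in hw from
// a raw Redis dump. The part-number lookup runs before the positional RAID-serial
// fallback, so devices matched by PN keep that serial.
func enrichFromRedisDump(content []byte, hw *models.HardwareConfig) {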
if hw == nil || len(content) == 0 {
return
}
gpuSnap := parseRedisGPUSnapshot(content)
nicSnap := parseRedisNICSnapshot(content)
raidSerials := parseRedisRAIDSerials(content)
pcieSnap := parseRedisPCIESerialSnapshot(content)
applyRedisGPUEnrichment(hw, gpuSnap)
applyRedisNICEnrichment(hw, nicSnap)
applyRedisPCIESNPNEnrichment(hw, pcieSnap)
applyRedisPCIeEnrichment(hw, raidSerials)
}
func parseRedisRAIDSerials(content []byte) []string {
matches := reRedisRAIDSerial.FindAllSubmatchIndex(content, -1)
if len(matches) == 0 {
return nil
}
seen := make(map[string]bool, len(matches))
serials := make([]string, 0, len(matches))
for _, m := range matches {
if len(m) < 4 {
continue
}
value := normalizeRedisValue(extractRedisCandidateValue(content, m[1]))
if value == "" || seen[value] {
continue
}
seen[value] = true
serials = append(serials, value)
}
return serials
}
func parseRedisPCIESerialSnapshot(content []byte) redisPCIESerialSnapshot {
type rec struct {
PN string
SN string
}
tmp := make(map[int]rec)
matches := reRedisPCIESNPN.FindAllSubmatchIndex(content, -1)
for _, m := range matches {
if len(m) < 6 {
continue
}
idxStr := string(content[m[2]:m[3]])
field := string(content[m[4]:m[5]])
idx, err := strconv.Atoi(idxStr)
if err != nil {
continue
}
value := normalizeRedisValue(extractRedisCandidateValue(content, m[1]))
if value == "" {
continue
}
r := tmp[idx]
if field == "PN" {
r.PN = value
} else if field == "SN" {
r.SN = value
}
tmp[idx] = r
}
out := redisPCIESerialSnapshot{ByPart: make(map[string]string)}
for _, r := range tmp {
pn := normalizeRedisValue(r.PN)
sn := normalizeRedisValue(r.SN)
if pn == "" || sn == "" {
continue
}
out.ByPart[strings.ToLower(strings.TrimSpace(pn))] = sn
}
return out
}
func parseRedisGPUSnapshot(content []byte) redisGPUSnapshot {
snap := redisGPUSnapshot{ByIndex: make(map[int]map[string]string)}
matches := reRedisGPUKey.FindAllSubmatchIndex(content, -1)
for _, m := range matches {
if len(m) < 6 {
continue
}
idxStr := string(content[m[2]:m[3]])
field := string(content[m[4]:m[5]])
idx, err := strconv.Atoi(idxStr)
if err != nil {
continue
}
value := extractRedisInlineValue(content, m[1])
if value == "" {
continue
}
byField, ok := snap.ByIndex[idx]
if !ok {
byField = make(map[string]string)
snap.ByIndex[idx] = byField
}
byField[field] = value
}
return snap
}
func parseRedisNICSnapshot(content []byte) redisNICSnapshot {
snap := redisNICSnapshot{ByIndex: make(map[int]map[string]string)}
matches := reRedisNICKey.FindAllSubmatchIndex(content, -1)
for _, m := range matches {
if len(m) < 6 {
continue
}
idxStr := string(content[m[2]:m[3]])
field := string(content[m[4]:m[5]])
idx, err := strconv.Atoi(idxStr)
if err != nil {
continue
}
value := extractRedisInlineValue(content, m[1])
if value == "" {
continue
}
byField, ok := snap.ByIndex[idx]
if !ok {
byField = make(map[string]string)
snap.ByIndex[idx] = byField
}
byField[field] = value
}
return snap
}
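// extractRedisInlineValue skips whitespace and control bytes at start, reads the
// following run of printable ASCII, and returns it hex-decoded when it decodes to
// printable text (see maybeDecodeHexString); otherwise it returns the raw run.
func extractRedisInlineValue(content []byte, start int) string {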
if start < 0 || start >= len(content) {
return ""
}
i := start
for i < len(content) && content[i] <= 0x20 {
i++
}
if i >= len(content) {
return ""
}
j := i
for j < len(content) {
c := content[j]
if c == 0 || c < 0x20 || c > 0x7e {
break
}
j++
}
if j <= i {
return ""
}
raw := strings.TrimSpace(string(content[i:j]))
if raw == "" {
return ""
}
decoded := maybeDecodeHexString(raw)
if decoded != "" {
return decoded
}
return raw
}
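// extractRedisCandidateValue tries the inline value first, then scans the next
// 256 bytes for an alphanumeric token of at least 6 characters that does not
// contain "redis", "sensor", or "fullsdr", preferring its hex-decoded form.
func extractRedisCandidateValue(content []byte, start int) string {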
// Fast-path for simple inline string values.
if v := extractRedisInlineValue(content, start); normalizeRedisValue(v) != "" {
return v
}
if start < 0 || start >= len(content) {
return ""
}
end := start + 256
if end > len(content) {
end = len(content)
}
window := content[start:end]
for _, token := range splitAlphaNumTokens(window) {
if len(token) < 6 {
continue
}
lower := strings.ToLower(token)
if strings.Contains(lower, "redis") || strings.Contains(lower, "sensor") || strings.Contains(lower, "fullsdr") {
continue
}
if decoded := maybeDecodeHexString(token); normalizeRedisValue(decoded) != "" {
return decoded
}
if normalizeRedisValue(token) != "" {
return token
}
}
return ""
}
func splitAlphaNumTokens(b []byte) []string {
var out []string
start := -1
for i := 0; i < len(b); i++ {
c := rune(b[i])
if unicode.IsLetter(c) || unicode.IsDigit(c) {
if start == -1 {
start = i
}
continue
}
if start != -1 {
out = append(out, string(b[start:i]))
start = -1
}
}
if start != -1 {
out = append(out, string(b[start:]))
}
return out
}
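// maybeDecodeHexString decodes an even-length hex string of at least 8 digits and
// returns the result only if it is non-empty printable ASCII after trimming NUL
// padding; otherwise it returns "".
func maybeDecodeHexString(s string) string {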
if len(s) < 8 || len(s)%2 != 0 {
return ""
}
for _, c := range s {
if (c < '0' || c > '9') && (c < 'a' || c > 'f') && (c < 'A' || c > 'F') {
return ""
}
}
b, err := hex.DecodeString(s)
if err != nil {
return ""
}
decoded := strings.TrimSpace(strings.TrimRight(string(b), "\x00"))
if decoded == "" {
return ""
}
for _, c := range decoded {
if c < 0x20 || c > 0x7e {
return ""
}
}
return decoded
}
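// applyRedisGPUEnrichment pairs NVIDIA GPUs (sorted by BDF, falling back to slot)
// with Redis GPU records that carry a serial, firmware, or UUID (sorted by index).
// Pairing is positional, so it is skipped entirely unless both counts match.
func applyRedisGPUEnrichment(hw *models.HardwareConfig, snap redisGPUSnapshot) {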
if len(hw.GPUs) == 0 || len(snap.ByIndex) == 0 {
return
}
type redisGPU struct {
Index int
Data map[string]string
}
redisGPUs := make([]redisGPU, 0, len(snap.ByIndex))
for idx, data := range snap.ByIndex {
if data == nil {
continue
}
if data["NV_GPU_SerialNumber"] == "" && data["NV_GPU_FWVersion"] == "" && data["NV_GPU_UUID"] == "" {
continue
}
redisGPUs = append(redisGPUs, redisGPU{Index: idx, Data: data})
}
if len(redisGPUs) == 0 {
return
}
sort.Slice(redisGPUs, func(i, j int) bool { return redisGPUs[i].Index < redisGPUs[j].Index })
target := make([]*models.GPU, 0, len(hw.GPUs))
for i := range hw.GPUs {
gpu := &hw.GPUs[i]
if isNVIDIAGPU(gpu) {
target = append(target, gpu)
}
}
if len(target) == 0 || len(target) != len(redisGPUs) {
return
}
sort.Slice(target, func(i, j int) bool {
left := strings.TrimSpace(target[i].BDF)
right := strings.TrimSpace(target[j].BDF)
if left != "" && right != "" {
return left < right
}
return strings.TrimSpace(target[i].Slot) < strings.TrimSpace(target[j].Slot)
})
for i := range target {
applyRedisGPUFields(target[i], redisGPUs[i].Data)
}
}
func isNVIDIAGPU(gpu *models.GPU) bool {
if gpu == nil {
return false
}
if gpu.VendorID == 0x10de {
return true
}
man := strings.ToLower(strings.TrimSpace(gpu.Manufacturer))
return strings.Contains(man, "nvidia")
}
func applyRedisGPUFields(gpu *models.GPU, fields map[string]string) {
if gpu == nil || fields == nil {
return
}
if serial := normalizeRedisValue(fields["NV_GPU_SerialNumber"]); serial != "" && isMissingGPUField(gpu.SerialNumber) {
gpu.SerialNumber = serial
}
if fw := normalizeRedisValue(fields["NV_GPU_FWVersion"]); fw != "" && isMissingGPUField(gpu.Firmware) {
gpu.Firmware = fw
}
if uuid := normalizeRedisValue(fields["NV_GPU_UUID"]); uuid != "" && isMissingGPUField(gpu.UUID) {
gpu.UUID = uuid
}
if part := normalizeRedisValue(fields["NVGPUPartNumber"]); part != "" && isMissingGPUField(gpu.PartNumber) {
gpu.PartNumber = part
}
if model := normalizeRedisValue(fields["NVGPUMarketingName"]); model != "" && isGenericGPUModel(gpu.Model) {
gpu.Model = model
}
if gpu.ClockSpeed == 0 {
if mhz, ok := parseIntField(fields["OperatingSpeedMHz"]); ok {
gpu.ClockSpeed = mhz
}
}
if gpu.Power == 0 {
if pwr, ok := parseIntField(fields["GPUTotalPower"]); ok {
gpu.Power = pwr
}
}
if gpu.Temperature == 0 {
if temp, ok := parseIntField(fields["Temp"]); ok {
gpu.Temperature = temp
}
}
if gpu.MemTemperature == 0 {
if temp, ok := parseIntField(fields["MemTemp"]); ok {
gpu.MemTemperature = temp
}
}
}
func parseIntField(v string) (int, bool) {
v = normalizeRedisValue(v)
if v == "" {
return 0, false
}
n, err := strconv.Atoi(v)
if err != nil {
return 0, false
}
return n, true
}
func normalizeRedisValue(v string) string {
v = strings.TrimSpace(v)
if v == "" {
return ""
}
l := strings.ToLower(v)
if l == "n/a" || l == "na" || l == "null" || l == "unknown" {
return ""
}
return v
}
func isMissingGPUField(v string) bool {
return normalizeRedisValue(v) == ""
}
func isGenericGPUModel(model string) bool {
m := strings.ToLower(strings.TrimSpace(model))
switch m {
case "", "unknown", "display", "display controller", "3d controller", "vga", "gpu":
return true
default:
return false
}
}
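// applyRedisNICEnrichment pairs present NICs (sorted by location, falling back to
// slot) with Redis NIC records that report a firmware version (sorted by index),
// positionally up to the shorter list, and fills missing firmware, serial, and part
// numbers. Unlike the GPU path, it does not require equal counts.
func applyRedisNICEnrichment(hw *models.HardwareConfig, snap redisNICSnapshot) {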
if len(hw.NetworkAdapters) == 0 || len(snap.ByIndex) == 0 {
return
}
type redisNIC struct {
Index int
Data map[string]string
}
redisNICs := make([]redisNIC, 0, len(snap.ByIndex))
for idx, data := range snap.ByIndex {
if data == nil {
continue
}
if normalizeRedisValue(data["FWVersion"]) == "" {
continue
}
redisNICs = append(redisNICs, redisNIC{Index: idx, Data: data})
}
if len(redisNICs) == 0 {
return
}
sort.Slice(redisNICs, func(i, j int) bool { return redisNICs[i].Index < redisNICs[j].Index })
target := make([]*models.NetworkAdapter, 0, len(hw.NetworkAdapters))
for i := range hw.NetworkAdapters {
nic := &hw.NetworkAdapters[i]
if nic.Present {
target = append(target, nic)
}
}
if len(target) == 0 {
return
}
sort.Slice(target, func(i, j int) bool {
left := strings.TrimSpace(target[i].Location)
right := strings.TrimSpace(target[j].Location)
if left != "" && right != "" {
return left < right
}
return strings.TrimSpace(target[i].Slot) < strings.TrimSpace(target[j].Slot)
})
limit := len(target)
if len(redisNICs) < limit {
limit = len(redisNICs)
}
for i := 0; i < limit; i++ {
nic := target[i]
data := redisNICs[i].Data
if fw := normalizeRedisValue(data["FWVersion"]); fw != "" && normalizeRedisValue(nic.Firmware) == "" {
nic.Firmware = fw
}
if serial := normalizeRedisValue(data["SerialNum"]); serial != "" && normalizeRedisValue(nic.SerialNumber) == "" {
nic.SerialNumber = serial
}
if part := normalizeRedisValue(data["PartNum"]); part != "" && normalizeRedisValue(nic.PartNumber) == "" {
nic.PartNumber = part
}
}
}
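// applyRedisPCIeEnrichment assigns RAID serials, in dump order, to PCIe devices that
// lack a serial and whose class (raid, sas, storage) or part number (raid, sas, hba)
// looks like a storage controller. Targets are sorted by BDF, falling back to slot.
func applyRedisPCIeEnrichment(hw *models.HardwareConfig, raidSerials []string) {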
if hw == nil || len(hw.PCIeDevices) == 0 || len(raidSerials) == 0 {
return
}
target := make([]*models.PCIeDevice, 0, len(hw.PCIeDevices))
for i := range hw.PCIeDevices {
dev := &hw.PCIeDevices[i]
if normalizeRedisValue(dev.SerialNumber) != "" {
continue
}
class := strings.ToLower(strings.TrimSpace(dev.DeviceClass))
part := strings.ToLower(strings.TrimSpace(dev.PartNumber))
if strings.Contains(class, "raid") || strings.Contains(class, "sas") || strings.Contains(class, "storage") ||
strings.Contains(part, "raid") || strings.Contains(part, "sas") || strings.Contains(part, "hba") {
target = append(target, dev)
}
}
if len(target) == 0 {
return
}
sort.Slice(target, func(i, j int) bool {
left := strings.TrimSpace(target[i].BDF)
right := strings.TrimSpace(target[j].BDF)
if left != "" && right != "" {
return left < right
}
return strings.TrimSpace(target[i].Slot) < strings.TrimSpace(target[j].Slot)
})
limit := len(target)
if len(raidSerials) < limit {
limit = len(raidSerials)
}
for i := 0; i < limit; i++ {
target[i].SerialNumber = raidSerials[i]
}
}
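// applyRedisPCIESNPNEnrichment fills missing PCIe serials from the Redis
// AssetInfoPCIE PN-to-SN table, matching part numbers case-insensitively.
func applyRedisPCIESNPNEnrichment(hw *models.HardwareConfig, snap redisPCIESerialSnapshot) {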
if hw == nil || len(hw.PCIeDevices) == 0 || len(snap.ByPart) == 0 {
return
}
for i := range hw.PCIeDevices {
dev := &hw.PCIeDevices[i]
if normalizeRedisValue(dev.SerialNumber) != "" {
continue
}
part := strings.ToLower(strings.TrimSpace(dev.PartNumber))
if part == "" {
continue
}
if serial := normalizeRedisValue(snap.ByPart[part]); serial != "" {
dev.SerialNumber = serial
}
}
}
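The hex heuristic used by maybeDecodeHexString can be shown in isolation. This is a minimal sketch: `decodeHexASCII` is a hypothetical standalone copy, and the sample inputs come from the Redis tests in this commit.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// decodeHexASCII returns the printable-ASCII text encoded by an even-length
// hex string of at least 8 digits, with trailing NUL padding removed.
// It returns "" when the input is not hex or decodes to non-printable bytes.
func decodeHexASCII(s string) string {
	if len(s) < 8 || len(s)%2 != 0 {
		return ""
	}
	b, err := hex.DecodeString(s) // accepts upper- and lower-case digits
	if err != nil {
		return ""
	}
	decoded := strings.TrimSpace(strings.TrimRight(string(b), "\x00"))
	if decoded == "" {
		return ""
	}
	for i := 0; i < len(decoded); i++ {
		if decoded[i] < 0x20 || decoded[i] > 0x7e {
			return ""
		}
	}
	return decoded
}

func main() {
	fmt.Println(decodeHexASCII("32362e34332e32353636000000000000")) // 26.43.2566
	fmt.Println(decodeHexASCII("5341523531314532"))                 // SAR511E2
	fmt.Printf("%q\n", decodeHexASCII("SAR511E2"))                  // ""
}
```

The heuristic has a real ambiguity. A genuine value made only of hex digits, such as "41424344", also decodes to printable text (here "ABCD"), so it would be replaced.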

@@ -0,0 +1,144 @@
package inspur
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestExtractRedisInlineValue_DecodesHexEncodedString(t *testing.T) {
data := []byte("RedisNicInfo:redis_nic_info_t:stNicDeviceInfo0:FWVersion 32362e34332e32353636000000000000\x00tail")
key := []byte("RedisNicInfo:redis_nic_info_t:stNicDeviceInfo0:FWVersion")
pos := indexBytes(data, key)
if pos < 0 {
t.Fatal("key not found")
}
got := extractRedisInlineValue(data, pos+len(key))
if got != "26.43.2566" {
t.Fatalf("expected decoded fw 26.43.2566, got %q", got)
}
}
func TestApplyRedisGPUEnrichment_FillsSerialFirmwareUUID(t *testing.T) {
hw := &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "#CPU0_PCIE2", BDF: "0c:00.0", VendorID: 0x10de, Model: "3D Controller"},
{Slot: "#CPU0_PCIE1", BDF: "58:00.0", VendorID: 0x10de, Model: "3D Controller"},
},
}
snap := redisGPUSnapshot{
ByIndex: map[int]map[string]string{
1: {
"NV_GPU_SerialNumber": "1321125009572",
"NV_GPU_FWVersion": "96.00.B7.00.02",
"NV_GPU_UUID": "GPU-AAA",
},
2: {
"NV_GPU_SerialNumber": "1321125010420",
"NV_GPU_FWVersion": "96.00.B7.00.02",
"NV_GPU_UUID": "GPU-BBB",
},
},
}
applyRedisGPUEnrichment(hw, snap)
if hw.GPUs[0].SerialNumber != "1321125009572" || hw.GPUs[0].Firmware != "96.00.B7.00.02" || hw.GPUs[0].UUID != "GPU-AAA" {
t.Fatalf("unexpected gpu0 enrichment: %+v", hw.GPUs[0])
}
if hw.GPUs[1].SerialNumber != "1321125010420" || hw.GPUs[1].Firmware != "96.00.B7.00.02" || hw.GPUs[1].UUID != "GPU-BBB" {
t.Fatalf("unexpected gpu1 enrichment: %+v", hw.GPUs[1])
}
}
func TestApplyRedisGPUEnrichment_SkipsOnCountMismatch(t *testing.T) {
hw := &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "#CPU0_PCIE2", BDF: "0c:00.0", VendorID: 0x10de, Model: "3D Controller"},
},
}
snap := redisGPUSnapshot{
ByIndex: map[int]map[string]string{
1: {"NV_GPU_SerialNumber": "1321125009572"},
2: {"NV_GPU_SerialNumber": "1321125010420"},
},
}
applyRedisGPUEnrichment(hw, snap)
if hw.GPUs[0].SerialNumber != "" {
t.Fatalf("expected no enrichment on count mismatch, got %q", hw.GPUs[0].SerialNumber)
}
}
func TestParseRedisRAIDSerials_DecodesHexSerial(t *testing.T) {
raw := []byte("RAIDMSCCInfo:redis_pcie_mscc_raid_info_t0:RAIDInfo:SerialNum\x80%@`5341523531314532 \x00tail")
got := parseRedisRAIDSerials(raw)
if len(got) != 1 {
t.Fatalf("expected 1 raid serial, got %d", len(got))
}
if got[0] != "SAR511E2" {
t.Fatalf("expected decoded serial SAR511E2, got %q", got[0])
}
}
func TestApplyRedisPCIeEnrichment_FillsStorageControllerSerial(t *testing.T) {
hw := &models.HardwareConfig{
PCIeDevices: []models.PCIeDevice{
{Slot: "#CPU1_PCIE9", BDF: "98:00.0", DeviceClass: "Smart Storage PQI SAS", PartNumber: "PM8222-SHBA"},
{Slot: "#CPU0_PCIE3", BDF: "32:00.0", DeviceClass: "Fibre Channel", PartNumber: "LPE32002"},
},
}
applyRedisPCIeEnrichment(hw, []string{"SAR511E2"})
if hw.PCIeDevices[0].SerialNumber != "SAR511E2" {
t.Fatalf("expected PM8222 serial SAR511E2, got %q", hw.PCIeDevices[0].SerialNumber)
}
if hw.PCIeDevices[1].SerialNumber != "" {
t.Fatalf("expected non-storage device serial untouched, got %q", hw.PCIeDevices[1].SerialNumber)
}
}
func TestParseRedisPCIESerialSnapshot_MapsPNToSN(t *testing.T) {
raw := []byte("" +
"AssetInfoPCIE:SNPN9:PN PM8222-SHBA\x00" +
"AssetInfoPCIE:SNPN9:SN SAR511E2\x00")
snap := parseRedisPCIESerialSnapshot(raw)
got := snap.ByPart["pm8222-shba"]
if got != "SAR511E2" {
t.Fatalf("expected SN SAR511E2 for PM8222-SHBA, got %q", got)
}
}
func TestApplyRedisPCIESNPNEnrichment_FillsByPartNumber(t *testing.T) {
hw := &models.HardwareConfig{
PCIeDevices: []models.PCIeDevice{
{Slot: "#CPU1_PCIE9", PartNumber: "PM8222-SHBA"},
},
}
snap := redisPCIESerialSnapshot{ByPart: map[string]string{"pm8222-shba": "SAR511E2"}}
applyRedisPCIESNPNEnrichment(hw, snap)
if hw.PCIeDevices[0].SerialNumber != "SAR511E2" {
t.Fatalf("expected serial SAR511E2, got %q", hw.PCIeDevices[0].SerialNumber)
}
}
func indexBytes(haystack, needle []byte) int {
for i := 0; i+len(needle) <= len(haystack); i++ {
match := true
for j := 0; j < len(needle); j++ {
if haystack[i+j] != needle[j] {
match = false
break
}
}
if match {
return i
}
}
return -1
}
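The count-guarded positional pairing that applyRedisGPUEnrichment relies on can be reduced to a few lines. This is a minimal sketch: the `gpu` type and `pairByOrder` are hypothetical, and the sketch omits NVIDIA detection, the record filtering and the slot fallback used by the real code.

```go
package main

import (
	"fmt"
	"sort"
)

// gpu is a hypothetical stand-in for models.GPU with just the fields used here.
type gpu struct {
	BDF    string
	Serial string
}

// pairByOrder assigns Redis serials to GPUs positionally: records sorted by
// numeric index are zipped with GPUs sorted by BDF. It does nothing unless
// both sides have the same number of entries.
func pairByOrder(gpus []gpu, serialByIndex map[int]string) {
	if len(gpus) == 0 || len(gpus) != len(serialByIndex) {
		return
	}
	keys := make([]int, 0, len(serialByIndex))
	for k := range serialByIndex {
		keys = append(keys, k)
	}
	sort.Ints(keys)

	targets := make([]*gpu, len(gpus))
	for i := range gpus {
		targets[i] = &gpus[i]
	}
	sort.Slice(targets, func(i, j int) bool { return targets[i].BDF < targets[j].BDF })

	for i, t := range targets {
		if t.Serial == "" { // never overwrite an existing value
			t.Serial = serialByIndex[keys[i]]
		}
	}
}

func main() {
	gpus := []gpu{{BDF: "58:00.0"}, {BDF: "0c:00.0"}}
	pairByOrder(gpus, map[int]string{1: "1321125009572", 2: "1321125010420"})
	fmt.Println(gpus) // [{58:00.0 1321125010420} {0c:00.0 1321125009572}]
}
```

Comparing BDFs as strings gives bus order only because the BDFs are fixed-width lower-case hex. The count guard exists because a single missing device would otherwise shift every serial onto the wrong card.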

@@ -6,12 +6,19 @@ import (
"time"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
// ParseSELList parses selelist.csv file with SEL events
// Format: ID, Date (MM/DD/YYYY), Time (HH:MM:SS), Sensor, Event, Status
// Example: 1,04/18/2025,09:31:18,Event Logging Disabled SEL_Status,Log area reset/cleared,Asserted
func ParseSELList(content []byte) []models.Event {
return ParseSELListWithLocation(content, parser.DefaultArchiveLocation())
}
// ParseSELListWithLocation parses selelist.csv using provided source timezone
// for timestamps that don't contain an explicit offset.
func ParseSELListWithLocation(content []byte, location *time.Location) []models.Event {
var events []models.Event
text := string(content)
@@ -48,7 +55,7 @@ func ParseSELList(content []byte) []models.Event {
status := strings.TrimSpace(records[5])
// Parse timestamp: MM/DD/YYYY HH:MM:SS
-timestamp := parseSELTimestamp(dateStr, timeStr)
+timestamp := parseSELTimestamp(dateStr, timeStr, location)
// Extract sensor type and name
sensorType, sensorName := parseSensorInfo(sensorStr)
@@ -76,12 +83,16 @@ func ParseSELList(content []byte) []models.Event {
}
// parseSELTimestamp parses MM/DD/YYYY and HH:MM:SS in location
// (parser.DefaultArchiveLocation() when nil) into time.Time.
-func parseSELTimestamp(dateStr, timeStr string) time.Time {
+func parseSELTimestamp(dateStr, timeStr string, location *time.Location) time.Time {
// Combine date and time: MM/DD/YYYY HH:MM:SS
timestampStr := dateStr + " " + timeStr
if location == nil {
location = parser.DefaultArchiveLocation()
}
// Try parsing with MM/DD/YYYY format
-t, err := time.Parse("01/02/2006 15:04:05", timestampStr)
+t, err := time.ParseInLocation("01/02/2006 15:04:05", timestampStr, location)
if err != nil {
// Fallback to current time
return time.Now()

@@ -0,0 +1,33 @@
package inspur
import (
"testing"
"time"
)
func TestParseSELListWithLocation_UsesProvidedTimezone(t *testing.T) {
content := []byte("sel elist:\n1,02/28/2026,04:18:18,Sensor X,Event,Asserted\n")
shanghai, err := time.LoadLocation("Asia/Shanghai")
if err != nil {
t.Fatalf("load location: %v", err)
}
events := ParseSELListWithLocation(content, shanghai)
if len(events) != 1 {
t.Fatalf("expected 1 event, got %d", len(events))
}
// 04:18:18 +08:00 == 20:18:18Z (previous day)
want := time.Date(2026, 2, 27, 20, 18, 18, 0, time.UTC)
if !events[0].Timestamp.UTC().Equal(want) {
t.Fatalf("unexpected timestamp: got %s want %s", events[0].Timestamp.UTC(), want)
}
}
func TestParseTimezoneConfigLocation(t *testing.T) {
content := []byte("[TimeZoneConfig]\ntimezone=Asia/Shanghai\n")
got := parseTimezoneConfigLocation(content)
if got != "Asia/Shanghai" {
t.Fatalf("unexpected timezone: %q", got)
}
}

@@ -0,0 +1,92 @@
package inspur
import (
"regexp"
"strings"
"git.mchus.pro/mchus/logpile/internal/parser"
)
var (
hostnameJSONRegex = regexp.MustCompile(`"_HOSTNAME"\s*:\s*"([^"]+)"`)
)
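// inferBoardSerialFromFallbackLogs returns the board serial from a builtin FRU
// device record in fru.txt, else the first line of the hostname file, else the
// _HOSTNAME field of maintenance_json.log.
func inferBoardSerialFromFallbackLogs(files []parser.ExtractedFile) string {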
// Prefer FRU dump when present.
if f := parser.FindFileByName(files, "fru.txt"); f != nil {
fruList := ParseFRU(f.Content)
for _, fru := range fruList {
serial := strings.TrimSpace(fru.SerialNumber)
if serial == "" || serial == "0" {
continue
}
desc := strings.ToLower(strings.TrimSpace(fru.Description))
if strings.Contains(desc, "builtin") || strings.Contains(desc, "fru device") {
return serial
}
}
}
// Fallback to explicit hostname file.
if f := parser.FindFileByName(files, "hostname"); f != nil {
if serial := sanitizeCandidateSerial(firstNonEmptyLine(string(f.Content))); serial != "" {
return serial
}
}
// Last-resort fallback from structured journal logs.
if f := parser.FindFileByName(files, "maintenance_json.log"); f != nil {
if m := hostnameJSONRegex.FindSubmatch(f.Content); len(m) == 2 {
if serial := sanitizeCandidateSerial(string(m[1])); serial != "" {
return serial
}
}
}
return ""
}
func inferBoardModelFromFallbackLogs(files []parser.ExtractedFile) string {
// Prefer FRU dump when present.
if f := parser.FindFileByName(files, "fru.txt"); f != nil {
fruList := ParseFRU(f.Content)
for _, fru := range fruList {
model := sanitizeCandidateModel(fru.ProductName)
if model == "" {
continue
}
desc := strings.ToLower(strings.TrimSpace(fru.Description))
if strings.Contains(desc, "builtin") || strings.Contains(desc, "fru device") {
return model
}
}
}
return ""
}
func firstNonEmptyLine(s string) string {
for _, line := range strings.Split(s, "\n") {
line = strings.TrimSpace(line)
if line != "" {
return line
}
}
return ""
}
func sanitizeCandidateSerial(s string) string {
s = strings.TrimSpace(s)
if s == "" || strings.EqualFold(s, "localhost") || strings.ContainsAny(s, " \t") {
return ""
}
return s
}
func sanitizeCandidateModel(s string) string {
s = strings.TrimSpace(s)
if s == "" || strings.EqualFold(s, "null") || s == "0" {
return ""
}
return s
}
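The board-serial lookup is a priority chain: the first source that yields a sane value wins. This is a minimal sketch: `firstSerial` and `sanitize` are hypothetical, the FRU step is omitted, and, unlike the real code, `sanitize` is applied to every source.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hostnameRe mirrors hostnameJSONRegex above.
var hostnameRe = regexp.MustCompile(`"_HOSTNAME"\s*:\s*"([^"]+)"`)

// sanitize mirrors sanitizeCandidateSerial: it rejects empty values,
// "localhost", and anything containing a space or tab.
func sanitize(s string) string {
	s = strings.TrimSpace(s)
	if s == "" || strings.EqualFold(s, "localhost") || strings.ContainsAny(s, " \t") {
		return ""
	}
	return s
}

// firstSerial evaluates sources in priority order and returns the first
// value that survives sanitize; later sources are never evaluated.
func firstSerial(sources ...func() string) string {
	for _, src := range sources {
		if v := sanitize(src()); v != "" {
			return v
		}
	}
	return ""
}

func main() {
	hostnameFile := "localhost\n" // rejected by sanitize
	journal := `{ "_HOSTNAME": "23DB01639", "MESSAGE": "ok" }`
	serial := firstSerial(
		func() string { return strings.TrimSpace(strings.SplitN(hostnameFile, "\n", 2)[0]) },
		func() string {
			if m := hostnameRe.FindStringSubmatch(journal); len(m) == 2 {
				return m[1]
			}
			return ""
		},
	)
	fmt.Println(serial) // 23DB01639
}
```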

@@ -0,0 +1,76 @@
package inspur
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestInferBoardSerialFromFallbackLogs_PrefersFRU(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "component/fru.txt",
Content: []byte(`FRU Device Description : Builtin FRU Device (ID 0)
Product Serial : 23DB01639
`),
},
{
Path: "runningdata/RTOSDump/hostname",
Content: []byte("HOSTNAME-FALLBACK\n"),
},
{
Path: "log/bmc/struct-log/maintenance_json.log",
Content: []byte(`{ "_HOSTNAME": "JSON-FALLBACK" }`),
},
}
got := inferBoardSerialFromFallbackLogs(files)
if got != "23DB01639" {
t.Fatalf("expected FRU serial 23DB01639, got %q", got)
}
}
func TestInferBoardSerialFromFallbackLogs_UsesHostnameFile(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "runningdata/RTOSDump/hostname",
Content: []byte("23DB01639\n"),
},
}
got := inferBoardSerialFromFallbackLogs(files)
if got != "23DB01639" {
t.Fatalf("expected hostname serial 23DB01639, got %q", got)
}
}
func TestInferBoardSerialFromFallbackLogs_UsesMaintenanceJSON(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "log/bmc/struct-log/maintenance_json.log",
Content: []byte(`{ "_HOSTNAME": "23DB01639", "MESSAGE": "ok" }`),
},
}
got := inferBoardSerialFromFallbackLogs(files)
if got != "23DB01639" {
t.Fatalf("expected JSON hostname serial 23DB01639, got %q", got)
}
}
func TestInferBoardModelFromFallbackLogs_PrefersFRU(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "component/fru.txt",
Content: []byte(`FRU Device Description : Builtin FRU Device (ID 0)
Board Product : KR9288-X3-A0-F0-00
Product Name : KR9288-X3-A0-F0-00
`),
},
}
got := inferBoardModelFromFallbackLogs(files)
if got != "KR9288-X3-A0-F0-00" {
t.Fatalf("expected board model KR9288-X3-A0-F0-00, got %q", got)
}
}

@@ -0,0 +1,166 @@
package inspur
import (
"regexp"
"sort"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
var bpHDDSerialTokenRegex = regexp.MustCompile(`[A-Za-z0-9]{8,32}`)
func enrichStorageFromSerialFallbackFiles(files []parser.ExtractedFile, hw *models.HardwareConfig) {
if hw == nil {
return
}
f := parser.FindFileByName(files, "BpHDDSerialNumber.info")
if f == nil {
return
}
serials := extractBPHDDSerials(f.Content)
if len(serials) == 0 {
return
}
applyStorageSerialFallback(hw, serials)
}
func extractBPHDDSerials(content []byte) []string {
if len(content) == 0 {
return nil
}
matches := bpHDDSerialTokenRegex.FindAllString(string(content), -1)
if len(matches) == 0 {
return nil
}
out := make([]string, 0, len(matches))
seen := make(map[string]struct{}, len(matches))
for _, m := range matches {
v := normalizeRedisValue(m)
if !looksLikeStorageSerial(v) {
continue
}
key := strings.ToLower(v)
if _, ok := seen[key]; ok {
continue
}
seen[key] = struct{}{}
out = append(out, v)
}
return out
}
func looksLikeStorageSerial(v string) bool {
if len(v) < 8 {
return false
}
hasLetter := false
hasDigit := false
for _, r := range v {
switch {
case r >= 'A' && r <= 'Z':
hasLetter = true
case r >= 'a' && r <= 'z':
hasLetter = true
case r >= '0' && r <= '9':
hasDigit = true
default:
return false
}
}
return hasLetter && hasDigit
}
// applyRAIDSlotSerials updates storage serial numbers using the slot→serial map
// derived from audit.log RAID SN change events. Overwrites existing serials since
// audit.log represents the authoritative current state after all disk replacements.
func applyRAIDSlotSerials(hw *models.HardwareConfig, serials map[string]string) {
if hw == nil || len(serials) == 0 {
return
}
for i := range hw.Storage {
slot := strings.TrimSpace(hw.Storage[i].Slot)
if slot == "" {
continue
}
if sn, ok := serials[slot]; ok && sn != "" {
hw.Storage[i].SerialNumber = sn
}
}
}
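// applyStorageSerialFallback assigns serials not already used by any drive to drives
// that lack one. Candidates are ranked (absent +10, NVMe +5, no slot +4) and filled
// lowest rank first, ties broken by slot; absent drives without a slot are skipped,
// and every drive that receives a serial is marked present.
func applyStorageSerialFallback(hw *models.HardwareConfig, serials []string) {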
if hw == nil || len(hw.Storage) == 0 || len(serials) == 0 {
return
}
existing := make(map[string]struct{}, len(hw.Storage))
for _, dev := range hw.Storage {
if sn := normalizeRedisValue(dev.SerialNumber); sn != "" {
existing[strings.ToLower(sn)] = struct{}{}
}
}
filtered := make([]string, 0, len(serials))
for _, sn := range serials {
key := strings.ToLower(sn)
if _, ok := existing[key]; ok {
continue
}
filtered = append(filtered, sn)
}
if len(filtered) == 0 {
return
}
type target struct {
index int
rank int
slot string
}
targets := make([]target, 0, len(hw.Storage))
for i := range hw.Storage {
dev := hw.Storage[i]
if normalizeRedisValue(dev.SerialNumber) != "" {
continue
}
if !dev.Present && strings.TrimSpace(dev.Slot) == "" {
continue
}
rank := 0
if !dev.Present {
rank += 10
}
if strings.EqualFold(strings.TrimSpace(dev.Type), "NVMe") {
rank += 5
}
if strings.TrimSpace(dev.Slot) == "" {
rank += 4
}
targets = append(targets, target{
index: i,
rank: rank,
slot: strings.ToLower(strings.TrimSpace(dev.Slot)),
})
}
if len(targets) == 0 {
return
}
sort.Slice(targets, func(i, j int) bool {
if targets[i].rank != targets[j].rank {
return targets[i].rank < targets[j].rank
}
return targets[i].slot < targets[j].slot
})
for i := 0; i < len(targets) && i < len(filtered); i++ {
dev := &hw.Storage[targets[i].index]
dev.SerialNumber = filtered[i]
if !dev.Present {
dev.Present = true
}
}
}

@@ -0,0 +1,106 @@
package inspur
import (
"strings"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestParseAssetJSON_HddSlotFallbackAndPresence(t *testing.T) {
content := []byte(`{
"HddInfo": [
{
"PresentBitmap": [1],
"SerialNumber": "",
"Manufacturer": "",
"ModelName": "",
"FirmwareVersion": "",
"Capacity": 0,
"Location": 2,
"DiskInterfaceType": 5,
"MediaType": 1,
"LocationString": ""
}
]
}`)
hw, err := ParseAssetJSON(content, nil, nil)
if err != nil {
t.Fatalf("ParseAssetJSON failed: %v", err)
}
if len(hw.Storage) != 1 {
t.Fatalf("expected 1 storage entry, got %d", len(hw.Storage))
}
if hw.Storage[0].Slot != "OB03" {
t.Fatalf("expected OB03 slot fallback, got %q", hw.Storage[0].Slot)
}
if !hw.Storage[0].Present {
t.Fatalf("expected fallback storage entry marked present")
}
if hw.Storage[0].Type != "NVMe" {
t.Fatalf("expected NVMe type, got %q", hw.Storage[0].Type)
}
}
func TestParseDiskBackplaneInfo_PopulatesOnlyMissingPresentDrives(t *testing.T) {
text := `RESTful diskbackplane info:
[
{ "port_count": 8, "driver_count": 4, "front": 1, "backplane_index": 0, "present": 1, "cpld_version": "3.1", "temperature": 18 },
{ "port_count": 8, "driver_count": 3, "front": 1, "backplane_index": 1, "present": 1, "cpld_version": "3.1", "temperature": 17 }
]
BMC`
hw := &models.HardwareConfig{
Storage: []models.Storage{
{Slot: "OB01", Type: "NVMe", Present: true},
{Slot: "OB02", Type: "NVMe", Present: true},
{Slot: "OB03", Type: "NVMe", Present: true},
{Slot: "OB04", Type: "NVMe", Present: true},
},
}
parseDiskBackplaneInfo(text, hw)
if len(hw.Storage) != 7 {
t.Fatalf("expected total storage count 7 after backplane merge, got %d", len(hw.Storage))
}
bpCount := 0
for _, dev := range hw.Storage {
if strings.HasPrefix(dev.Slot, "BP0:") || strings.HasPrefix(dev.Slot, "BP1:") {
bpCount++
}
}
if bpCount != 3 {
t.Fatalf("expected 3 synthetic backplane rows, got %d", bpCount)
}
}
func TestEnrichStorageFromSerialFallbackFiles_AssignsSerials(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "onekeylog/configuration/conf/BpHDDSerialNumber.info",
Content: []byte{
0xA0, 0xA1, 0xA2, 0xA3,
'S', '6', 'K', 'N', 'N', 'G', '0', 'W', '4', '2', '8', '5', '5', '2',
0x00,
'P', 'H', 'Y', 'I', '5', '2', '7', '1', '0', '0', '4', 'B', '1', 'P', '9', 'D', 'G', 'N',
},
},
}
hw := &models.HardwareConfig{
Storage: []models.Storage{
{Slot: "BP0:0", Type: "HDD", Present: true},
{Slot: "BP0:1", Type: "HDD", Present: true},
{Slot: "OB01", Type: "NVMe", Present: true},
},
}
enrichStorageFromSerialFallbackFiles(files, hw)
if hw.Storage[0].SerialNumber == "" || hw.Storage[1].SerialNumber == "" {
t.Fatalf("expected serials assigned to present storage entries, got %#v", hw.Storage)
}
}

@@ -48,9 +48,9 @@ func ParseSyslog(content []byte, sourcePath string) []models.Event {
event := models.Event{
ID: generateEventID(sourcePath, lineNum),
Timestamp: timestamp,
-Source: matches[4],
+Source: "syslog",
SensorType: "syslog",
-SensorName: matches[3],
+SensorName: matches[4],
Description: matches[5],
Severity: severity,
RawData: line,

@@ -1,175 +0,0 @@
# NVIDIA Field Diagnostics Parser
Parser for NVIDIA HGX Field Diagnostics archives.
It is vendor-neutral and not tied to any particular server manufacturer.
## Supported archives
- NVIDIA HGX Field Diag (works with servers from any vendor: Supermicro, Dell, HPE, etc.)
- Archives containing NVIDIA GPU diagnostic results
## Archive format
The parser accepts archives in these formats:
- `.tar` (uncompressed tar)
- `.tar.gz` (gzip-compressed tar)
## Recognized files
### Core files
1. **output.log** - dmidecode output with system information
   - Server manufacturer (Manufacturer)
   - Server model (Product Name), e.g. SYS-821GE-TNHR
   - Server serial number (Serial Number), e.g. A514359X5A07900
   - UUID, SKU Number, Family
2. **unified_summary.json** - detailed information about the system and its components
   - GPU information (model, manufacturer, VBIOS, PCI addresses)
   - NVSwitch information (VendorID, DeviceID, link speed/width)
   - Server manufacturer and model
3. **summary.json** - diagnostic test results
   - GPU test results (inforom, checkinforom, gpumem, gpustress, pcie, nvlink, nvswitch, power)
   - Error codes and test statuses
4. **summary.csv** - the same test results in CSV format
### Additional files
- `gpu_fieldiag/*.log` - detailed diagnostic logs for each GPU
- `inventory/*.json` - additional configuration information
## Extracted data
### Hardware Configuration
#### GPUs
```json
{
"slot": "GPUSXM1",
"model": "NVIDIA Device 2335",
"manufacturer": "NVIDIA Corporation",
"firmware": "96.00.D0.00.03",
"bdf": "0000:3a:00.0"
}
```
#### NVSwitch (as PCIe devices)
```json
{
"slot": "NVSWITCHNVSWITCH0",
"device_class": "NVSwitch",
"manufacturer": "NVIDIA Corporation",
"vendor_id": 4318,
"device_id": 8867,
"bdf": "0000:05:00.0",
"link_speed": "16GT/s",
"link_width": 2
}
```
### Events
Events are created for:
- **Warnings and errors** reported by diagnostic tests
- Example events:
  - `Row remapping failed` - GPU memory error (Warning)
  - Results of the various tests: connectivity, gpumem, gpustress, pcie, nvlink, nvswitch, power
Severity levels:
- `info` - informational events (tests passed)
- `warning` - warnings (e.g. Row remapping failed)
- `critical` - critical errors (error codes 300 and above)
## Usage example
```bash
# Start the web interface
./logpile --file /path/to/A514359X5A07900_logs-20260122-074208.tar
# The web interface will be available at http://localhost:8082
```
## Auto-detection
The parser recognizes NVIDIA Field Diag archives automatically by the presence of:
- `unified_summary.json` containing the "HGX Field Diag" marker
- `summary.json` and `summary.csv` with test results
- a `gpu_fieldiag/` directory
Confidence score:
- `unified_summary.json` with the "HGX Field Diag" marker: +40
- `summary.json`: +20
- `summary.csv`: +15
- `gpu_fieldiag/` directory: +15
## Versioning
**Current parser version:** 1.1.0
Whenever the parser logic changes, bump the version in the `parserVersion` constant in `parser.go`.
### Version history
- **1.1.0** - Added parsing of output.log (dmidecode) to extract the server model and serial number
- **1.0.0** - Initial version, parsing unified_summary.json and summary.json/csv
## Data examples
### Example unified_summary.json
```json
{
"runInfo": {
"diagVersion": "24287-XXXX-FLD-42658",
"diagName": "HGX Field Diag",
"finalResult": "FAIL",
"errorCode": 363
},
"tests": [{
"virtualId": "inventory",
"components": [{
"componentId": "GPUSXM1",
"properties": [
{"id": "Manufacturer", "value": "Any Server Vendor"},
{"id": "VendorID", "value": "10de"},
{"id": "DeviceID", "value": "2335"}
]
}]
}]
}
```
### Example summary.json
```json
[
{
"Error Code": "005-000-1-000000000363",
"Test": "gpumem",
"Component ID": "SXM5_SN_1653925025497",
"Notes": "Row remapping failed",
"Virtual ID": "gpumem"
}
]
```
## Known limitations
1. The parser focuses on data from `unified_summary.json` and `summary.json`
2. Detailed logs from `gpu_fieldiag/*.log` are not parsed yet
3. CPU, memory and disk information is not extracted (the archive does not contain it)
## Development
### Adding new fields
1. Study the JSON structure in the archive
2. Add the fields to the `Component` or `Property` structs
3. Update `parseGPUComponent` or `parseNVSwitchComponent`
4. Bump the parser version
### Adding new file types
1. Create a new file for the parser (e.g. `gpu_logs.go`)
2. Call the new parsing from `Parse()` in `parser.go`
3. Update the documentation

@@ -0,0 +1,274 @@
package nvidia
import (
"regexp"
"strconv"
"strings"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
var verboseRunTestingLineRegex = regexp.MustCompile(`^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s+-\s+Testing\s+([a-zA-Z0-9_]+)\s*$`)
var runLogStartTimeRegex = regexp.MustCompile(`^Start time\s+([A-Za-z]{3}, \d{2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2})\s*$`)
var runLogTestDurationRegex = regexp.MustCompile(`^Testing\s+([a-zA-Z0-9_]+)\s+\S+\s+\[\s*([0-9]+):([0-9]{2})s\s*\]\s*$`)
var modsStartLineRegex = regexp.MustCompile(`(?m)^MODS start:\s+([A-Za-z]{3}\s+[A-Za-z]{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s*$`)
var gpuFieldiagOutputPathRegex = regexp.MustCompile(`(?i)gpu_fieldiag[\\/]+sxm(\d+)_sn_([^\\/]+)[\\/]+output\.log$`)
var nvswitchDevnameRegex = regexp.MustCompile(`devname=[^,\s]+,(NVSWITCH\d+)`)
type componentCheckTimes struct {
GPUDefault time.Time
NVSwitchDefault time.Time
GPUBySerial map[string]time.Time // key: GPU serial
GPUBySlot map[string]time.Time // key: GPUSXM<idx>
NVSwitchBySlot map[string]time.Time // key: NVSWITCH<idx>
}
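// CollectGPUAndNVSwitchCheckTimes extracts GPU/NVSwitch check timestamps from NVIDIA logs.
// Per-component times come from the "MODS start" line in gpu_fieldiag/<SXMn_SN_serial>/output.log
// and nvswitch/output.log. Run-wide default times come from, in priority order:
// 1) verbose_run.log "Testing <test>" timestamps
// 2) run.log start time + cumulative durations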
func CollectGPUAndNVSwitchCheckTimes(files []parser.ExtractedFile) componentCheckTimes {
gpuBySerial := make(map[string]time.Time)
gpuBySlot := make(map[string]time.Time)
nvsBySlot := make(map[string]time.Time)
for _, f := range files {
path := strings.TrimSpace(f.Path)
pathLower := strings.ToLower(path)
// Per-GPU timestamp from gpu_fieldiag/<SXMx_SN_serial>/output.log
if strings.HasSuffix(pathLower, "output.log") && (strings.Contains(pathLower, "gpu_fieldiag/") || strings.Contains(pathLower, "gpu_fieldiag\\")) {
ts := parseModsStartTime(f.Content)
if ts.IsZero() {
continue
}
matches := gpuFieldiagOutputPathRegex.FindStringSubmatch(path)
if len(matches) == 3 {
slot := "GPUSXM" + strings.TrimSpace(matches[1])
serial := strings.TrimSpace(matches[2])
if slot != "" {
gpuBySlot[slot] = ts
}
if serial != "" {
gpuBySerial[serial] = ts
}
}
}
// Per-NVSwitch timestamp and slot list from nvswitch/output.log
if strings.HasSuffix(pathLower, "nvswitch/output.log") || strings.HasSuffix(pathLower, "nvswitch\\output.log") {
ts := parseModsStartTime(f.Content)
if ts.IsZero() {
continue
}
for _, slot := range parseNVSwitchSlotsFromOutput(f.Content) {
nvsBySlot[slot] = ts
}
}
}
testStarts := make(map[string]time.Time)
if f := parser.FindFileByName(files, "verbose_run.log"); f != nil {
for testName, ts := range parseVerboseRunTestStartTimes(f.Content) {
testStarts[strings.ToLower(strings.TrimSpace(testName))] = ts
}
}
if len(testStarts) == 0 {
if f := parser.FindFileByName(files, "run.log"); f != nil {
for testName, ts := range parseRunLogTestStartTimes(f.Content) {
testStarts[strings.ToLower(strings.TrimSpace(testName))] = ts
}
}
}
return componentCheckTimes{
GPUDefault: pickFirstTestTime(testStarts, "gpu_fieldiag", "gpumem", "gpustress", "pcie", "inventory"),
NVSwitchDefault: pickFirstTestTime(testStarts, "nvswitch", "inventory"),
GPUBySerial: gpuBySerial,
GPUBySlot: gpuBySlot,
NVSwitchBySlot: nvsBySlot,
}
}
func pickFirstTestTime(testStarts map[string]time.Time, names ...string) time.Time {
for _, name := range names {
if ts := testStarts[strings.ToLower(strings.TrimSpace(name))]; !ts.IsZero() {
return ts
}
}
return time.Time{}
}
func parseVerboseRunTestStartTimes(content []byte) map[string]time.Time {
result := make(map[string]time.Time)
lines := strings.Split(string(content), "\n")
for _, line := range lines {
matches := verboseRunTestingLineRegex.FindStringSubmatch(strings.TrimSpace(line))
if len(matches) != 3 {
continue
}
ts, err := parser.ParseInDefaultArchiveLocation("2006-01-02 15:04:05", strings.TrimSpace(matches[1]))
if err != nil {
continue
}
testName := strings.ToLower(strings.TrimSpace(matches[2]))
if testName == "" {
continue
}
if _, exists := result[testName]; !exists {
result[testName] = ts
}
}
return result
}
func parseRunLogTestStartTimes(content []byte) map[string]time.Time {
lines := strings.Split(string(content), "\n")
start := time.Time{}
for _, line := range lines {
matches := runLogStartTimeRegex.FindStringSubmatch(strings.TrimSpace(line))
if len(matches) != 2 {
continue
}
parsed, err := parser.ParseInDefaultArchiveLocation("Mon, 02 Jan 2006 15:04:05", strings.TrimSpace(matches[1]))
if err != nil {
continue
}
start = parsed
break
}
if start.IsZero() {
return nil
}
result := make(map[string]time.Time)
cursor := start
for _, line := range lines {
matches := runLogTestDurationRegex.FindStringSubmatch(strings.TrimSpace(line))
if len(matches) != 4 {
continue
}
testName := strings.ToLower(strings.TrimSpace(matches[1]))
minutes, errMin := strconv.Atoi(strings.TrimSpace(matches[2]))
seconds, errSec := strconv.Atoi(strings.TrimSpace(matches[3]))
if errMin != nil || errSec != nil {
continue
}
if _, exists := result[testName]; !exists {
result[testName] = cursor
}
cursor = cursor.Add(time.Duration(minutes)*time.Minute + time.Duration(seconds)*time.Second)
}
return result
}
func parseModsStartTime(content []byte) time.Time {
matches := modsStartLineRegex.FindSubmatch(content)
if len(matches) != 2 {
return time.Time{}
}
tsRaw := strings.TrimSpace(string(matches[1]))
if tsRaw == "" {
return time.Time{}
}
ts, err := parser.ParseInDefaultArchiveLocation("Mon Jan 2 15:04:05 2006", tsRaw)
if err != nil {
return time.Time{}
}
return ts
}
func parseNVSwitchSlotsFromOutput(content []byte) []string {
matches := nvswitchDevnameRegex.FindAllSubmatch(content, -1)
if len(matches) == 0 {
return nil
}
seen := make(map[string]struct{})
out := make([]string, 0, len(matches))
for _, m := range matches {
if len(m) != 2 {
continue
}
slot := strings.ToUpper(strings.TrimSpace(string(m[1])))
if slot == "" {
continue
}
if _, exists := seen[slot]; exists {
continue
}
seen[slot] = struct{}{}
out = append(out, slot)
}
return out
}
// ApplyGPUAndNVSwitchCheckTimes writes parsed check timestamps to component status metadata.
func ApplyGPUAndNVSwitchCheckTimes(result *models.AnalysisResult, times componentCheckTimes) {
if result == nil || result.Hardware == nil {
return
}
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
ts := time.Time{}
if serial := strings.TrimSpace(gpu.SerialNumber); serial != "" {
ts = times.GPUBySerial[serial]
}
if ts.IsZero() {
ts = times.GPUBySlot[strings.ToUpper(strings.TrimSpace(gpu.Slot))]
}
if ts.IsZero() {
ts = times.GPUDefault
}
if ts.IsZero() {
continue
}
gpu.StatusCheckedAt = &ts
status := strings.TrimSpace(gpu.Status)
if status == "" {
status = "Unknown"
}
gpu.StatusAtCollect = &models.StatusAtCollection{
Status: status,
At: ts,
}
}
for i := range result.Hardware.PCIeDevices {
dev := &result.Hardware.PCIeDevices[i]
slot := normalizeNVSwitchSlot(strings.TrimSpace(dev.Slot))
if slot == "" {
continue
}
slot = strings.ToUpper(slot)
if !strings.EqualFold(strings.TrimSpace(dev.DeviceClass), "NVSwitch") &&
!strings.HasPrefix(slot, "NVSWITCH") {
continue
}
ts := times.NVSwitchBySlot[slot]
if ts.IsZero() {
ts = times.NVSwitchDefault
}
if ts.IsZero() {
continue
}
dev.StatusCheckedAt = &ts
status := strings.TrimSpace(dev.Status)
if status == "" {
status = "Unknown"
}
dev.StatusAtCollect = &models.StatusAtCollection{
Status: status,
At: ts,
}
}
}

@@ -0,0 +1,143 @@
package nvidia
import (
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestParseVerboseRunTestStartTimes(t *testing.T) {
content := []byte(`
2026-01-22 09:11:32,458 - Testing nvswitch
2026-01-22 09:45:36,016 - Testing gpu_fieldiag
`)
got := parseVerboseRunTestStartTimes(content)
nvs := got["nvswitch"]
if nvs.IsZero() {
t.Fatalf("expected nvswitch timestamp")
}
gpu := got["gpu_fieldiag"]
if gpu.IsZero() {
t.Fatalf("expected gpu_fieldiag timestamp")
}
if nvs.UTC().Format(time.RFC3339) != "2026-01-22T06:11:32Z" {
t.Fatalf("unexpected nvswitch timestamp: %s", nvs.Format(time.RFC3339))
}
if gpu.UTC().Format(time.RFC3339) != "2026-01-22T06:45:36Z" {
t.Fatalf("unexpected gpu_fieldiag timestamp: %s", gpu.Format(time.RFC3339))
}
}
func TestParseRunLogTestStartTimes(t *testing.T) {
content := []byte(`
Start time Thu, 22 Jan 2026 07:42:26
Testing gpumem FAILED [ 26:12s ]
Testing gpustress OK [ 7:10s ]
Testing nvswitch OK [ 9:25s ]
`)
got := parseRunLogTestStartTimes(content)
if got["gpumem"].UTC().Format(time.RFC3339) != "2026-01-22T04:42:26Z" {
t.Fatalf("unexpected gpumem start: %s", got["gpumem"].Format(time.RFC3339))
}
if got["gpustress"].UTC().Format(time.RFC3339) != "2026-01-22T05:08:38Z" {
t.Fatalf("unexpected gpustress start: %s", got["gpustress"].Format(time.RFC3339))
}
if got["nvswitch"].UTC().Format(time.RFC3339) != "2026-01-22T05:15:48Z" {
t.Fatalf("unexpected nvswitch start: %s", got["nvswitch"].Format(time.RFC3339))
}
}
func TestApplyGPUAndNVSwitchCheckTimes(t *testing.T) {
gpuTs := time.Date(2026, 1, 22, 9, 45, 36, 0, time.UTC)
nvsTs := time.Date(2026, 1, 22, 9, 11, 32, 0, time.UTC)
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "GPUSXM5", Status: "FAIL"},
},
PCIeDevices: []models.PCIeDevice{
{Slot: "NVSWITCH0", DeviceClass: "NVSwitch", Status: "PASS"},
{Slot: "NIC0", DeviceClass: "NetworkController", Status: "PASS"},
},
},
}
ApplyGPUAndNVSwitchCheckTimes(result, componentCheckTimes{
GPUBySlot: map[string]time.Time{"GPUSXM5": gpuTs},
NVSwitchBySlot: map[string]time.Time{"NVSWITCH0": nvsTs},
})
if got := result.Hardware.GPUs[0].StatusCheckedAt; got == nil || !got.Equal(gpuTs) {
t.Fatalf("expected gpu status_checked_at %s, got %v", gpuTs.Format(time.RFC3339), got)
}
if result.Hardware.GPUs[0].StatusAtCollect == nil || !result.Hardware.GPUs[0].StatusAtCollect.At.Equal(gpuTs) {
t.Fatalf("expected gpu status_at_collection.at %s", gpuTs.Format(time.RFC3339))
}
if got := result.Hardware.PCIeDevices[0].StatusCheckedAt; got == nil || !got.Equal(nvsTs) {
t.Fatalf("expected nvswitch status_checked_at %s, got %v", nvsTs.Format(time.RFC3339), got)
}
if result.Hardware.PCIeDevices[0].StatusAtCollect == nil || !result.Hardware.PCIeDevices[0].StatusAtCollect.At.Equal(nvsTs) {
t.Fatalf("expected nvswitch status_at_collection.at %s", nvsTs.Format(time.RFC3339))
}
if result.Hardware.PCIeDevices[1].StatusCheckedAt != nil {
t.Fatalf("expected non-nvswitch device status_checked_at to stay nil")
}
}
func TestCollectGPUAndNVSwitchCheckTimes_FromVerboseRun(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "verbose_run.log",
Content: []byte(`
2026-01-22 09:11:32,458 - Testing nvswitch
2026-01-22 09:45:36,016 - Testing gpu_fieldiag
`),
},
}
got := CollectGPUAndNVSwitchCheckTimes(files)
if got.GPUDefault.UTC().Format(time.RFC3339) != "2026-01-22T06:45:36Z" {
t.Fatalf("unexpected GPU check time: %s", got.GPUDefault.Format(time.RFC3339))
}
if got.NVSwitchDefault.UTC().Format(time.RFC3339) != "2026-01-22T06:11:32Z" {
t.Fatalf("unexpected NVSwitch check time: %s", got.NVSwitchDefault.Format(time.RFC3339))
}
}
func TestCollectGPUAndNVSwitchCheckTimes_FromComponentOutputLogs(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "gpu_fieldiag/SXM5_SN_1653925025497/output.log",
Content: []byte(`
$ some command
MODS start: Thu Jan 22 09:45:36 2026
`),
},
{
Path: "nvswitch/output.log",
Content: []byte(`
$ cmd devname=0000:08:00.0,NVSWITCH3 devname=0000:07:00.0,NVSWITCH2 devname=0000:06:00.0,NVSWITCH1 devname=0000:05:00.0,NVSWITCH0
MODS start: Thu Jan 22 09:11:32 2026
`),
},
}
got := CollectGPUAndNVSwitchCheckTimes(files)
if got.GPUBySerial["1653925025497"].UTC().Format(time.RFC3339) != "2026-01-22T06:45:36Z" {
t.Fatalf("unexpected GPU serial check time: %s", got.GPUBySerial["1653925025497"].Format(time.RFC3339))
}
if got.GPUBySlot["GPUSXM5"].UTC().Format(time.RFC3339) != "2026-01-22T06:45:36Z" {
t.Fatalf("unexpected GPU slot check time: %s", got.GPUBySlot["GPUSXM5"].Format(time.RFC3339))
}
if got.NVSwitchBySlot["NVSWITCH0"].UTC().Format(time.RFC3339) != "2026-01-22T06:11:32Z" {
t.Fatalf("unexpected NVSwitch0 check time: %s", got.NVSwitchBySlot["NVSWITCH0"].Format(time.RFC3339))
}
if got.NVSwitchBySlot["NVSWITCH3"].UTC().Format(time.RFC3339) != "2026-01-22T06:11:32Z" {
t.Fatalf("unexpected NVSwitch3 check time: %s", got.NVSwitchBySlot["NVSWITCH3"].Format(time.RFC3339))
}
}

@@ -0,0 +1,374 @@
package nvidia
import (
"encoding/json"
"regexp"
"strconv"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
var (
gpuNameWithSerialRegex = regexp.MustCompile(`^SXM(\d+)_SN_(.+)$`)
gpuNameSlotOnlyRegex = regexp.MustCompile(`^SXM(\d+)$`)
skuCodeRegex = regexp.MustCompile(`^(G\d{3})[.-](\d{4})`)
skuCodeInsideRegex = regexp.MustCompile(`(?:^|[^A-Z0-9])(?:\d)?(G\d{3})[.-](\d{4})(?:[^A-Z0-9]|$)`)
inforomPathRegex = regexp.MustCompile(`(?i)(?:^|[\\/])(checkinforom|inforom)[\\/](SXM(\d+))(?:_SN_([^\\/]+))?[\\/]fieldiag\.jso$`)
inforomProductPNRegex = regexp.MustCompile(`"product_part_num"\s*:\s*"([^"]+)"`)
inforomSerialRegex = regexp.MustCompile(`"serial_number"\s*:\s*"([^"]+)"`)
)
type testSpecData struct {
Actions []struct {
VirtualID string `json:"virtual_id"`
Args struct {
SKUToFile map[string]string `json:"sku_to_sku_json_file_map"`
ModsMapping map[string]json.RawMessage `json:"mods_mapping"`
} `json:"args"`
} `json:"actions"`
}
type inventoryFieldDiagSummary struct {
ModsRuns []struct {
ModsHeader []struct {
GPUName string `json:"GpuName"`
BoardInfo string `json:"BoardInfo"`
} `json:"ModsHeader"`
} `json:"ModsRuns"`
}
var hardcodedSKUToFileMap = map[string]string{
"G520-0200": "sku_hgx-h100-8-gpu_80g_aircooled_field.json",
"G520-0201": "sku_hgx-h100-8-gpu_80g_aircooled_field.json",
"G520-0202": "sku_hgx-h100-8-gpu_80g_tpol_field.json",
"G520-0203": "sku_hgx-h100-8-gpu_80g_tpol_field.json",
"G520-0205": "sku_hgx-h800-8-gpu_80g_aircooled_field.json",
"G520-0207": "sku_hgx-h800-8-gpu_80g_tpol_field.json",
"G520-0221": "sku_hgx-h100-8-gpu_96g_aircooled_field.json",
"G520-0236": "sku_hgx-h20-8-gpu_96g_aircooled_field.json",
"G520-0238": "sku_hgx-h20-8-gpu_96g_tpol_field.json",
"G520-0266": "sku_hgx-h20-8-gpu_141g_aircooled_field.json",
"G520-0280": "sku_hgx-h200-8-gpu_141g_aircooled_field.json",
"G520-0282": "sku_hgx-h200-8-gpu_141g_tpol_field.json",
"G520-0292": "sku_hgx-h100-8-gpu_sku_292_field.json",
}
// ApplyGPUModelsFromSKU fills GPU part numbers, models and descriptions from SKU data.
// Mapping sources:
// - inventory/fieldiag_summary.json: GPUName -> BoardInfo (SKU)
// - inforom/*/fieldiag.jso: product_part_num (full P/N with embedded SKU)
// - hardcoded SKU -> sku_hgx-... filename mapping
// - testspec.json: SKU -> sku_hgx-... filename (used only for SKUs missing from the hardcoded map)
// - testspec.json gpu_fieldiag.mods_mapping: DeviceID -> GPU generation (last-resort description)
func ApplyGPUModelsFromSKU(files []parser.ExtractedFile, result *models.AnalysisResult) {
if result == nil || result.Hardware == nil || len(result.Hardware.GPUs) == 0 {
return
}
skuToFile := parseSKUToFileMap(files)
generationByDeviceID := parseGenerationByDeviceID(files)
serialToSKU, slotToSKU, serialToPartNumber, slotToPartNumber := parseGPUSKUMapping(files)
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
slot := strings.TrimSpace(gpu.Slot)
serial := strings.TrimSpace(gpu.SerialNumber)
if gpu.PartNumber == "" && serial != "" {
if pn := strings.TrimSpace(serialToPartNumber[serial]); pn != "" {
gpu.PartNumber = pn
}
}
if gpu.PartNumber == "" {
if pn := strings.TrimSpace(slotToPartNumber[slot]); pn != "" {
gpu.PartNumber = pn
}
}
if partNumber := strings.TrimSpace(gpu.PartNumber); partNumber != "" {
gpu.Model = partNumber
}
sku := extractSKUFromPartNumber(gpu.PartNumber)
if sku == "" && serial != "" {
sku = serialToSKU[serial]
}
if sku == "" {
sku = slotToSKU[slot]
}
if sku != "" {
if desc := resolveDescriptionFromSKU(sku, skuToFile); desc != "" {
gpu.Description = desc
continue
}
}
if gen := resolveGenerationDescription(gpu.DeviceID, generationByDeviceID); gen != "" {
gpu.Description = gen
}
}
}
func parseSKUToFileMap(files []parser.ExtractedFile) map[string]string {
result := make(map[string]string, len(hardcodedSKUToFileMap))
for sku, file := range hardcodedSKUToFileMap {
result[normalizeSKUCode(sku)] = strings.TrimSpace(file)
}
specFile := parser.FindFileByName(files, "testspec.json")
if specFile == nil {
return result
}
var spec testSpecData
if err := json.Unmarshal(specFile.Content, &spec); err != nil {
return result
}
for _, action := range spec.Actions {
for sku, file := range action.Args.SKUToFile {
normSKU := normalizeSKUCode(sku)
if normSKU == "" {
continue
}
// Priority: hardcoded mapping wins, testspec extends unknown SKU list.
if _, exists := result[normSKU]; !exists {
result[normSKU] = strings.TrimSpace(file)
}
}
}
return result
}
func parseGenerationByDeviceID(files []parser.ExtractedFile) map[string]string {
specFile := parser.FindFileByName(files, "testspec.json")
if specFile == nil {
return nil
}
var spec testSpecData
if err := json.Unmarshal(specFile.Content, &spec); err != nil {
return nil
}
familyToGeneration := make(map[string]string)
deviceToGeneration := make(map[string]string)
for _, action := range spec.Actions {
if strings.TrimSpace(strings.ToLower(action.VirtualID)) != "gpu_fieldiag" {
continue
}
for key, raw := range action.Args.ModsMapping {
if strings.HasPrefix(key, "#mods.") {
family := strings.TrimSpace(strings.TrimPrefix(key, "#mods."))
if family == "" {
continue
}
var generation string
if err := json.Unmarshal(raw, &generation); err == nil {
generation = strings.TrimSpace(generation)
if generation != "" {
familyToGeneration[family] = generation
}
}
}
}
for key, raw := range action.Args.ModsMapping {
family := strings.TrimSpace(key)
if family == "" || strings.HasPrefix(family, "#") {
continue
}
generation := strings.TrimSpace(familyToGeneration[family])
if generation == "" {
continue
}
var deviceIDs []string
if err := json.Unmarshal(raw, &deviceIDs); err != nil {
continue
}
for _, id := range deviceIDs {
norm := normalizeDeviceIDHex(id)
if norm != "" {
deviceToGeneration[norm] = generation
}
}
}
}
return deviceToGeneration
}
func parseGPUSKUMapping(files []parser.ExtractedFile) (map[string]string, map[string]string, map[string]string, map[string]string) {
serialToSKU := make(map[string]string)
slotToSKU := make(map[string]string)
serialToPartNumber := make(map[string]string)
slotToPartNumber := make(map[string]string)
// 1) inventory/fieldiag_summary.json mapping (GPUName/BoardInfo).
var summaryFile *parser.ExtractedFile
for i := range files {
path := strings.ToLower(files[i].Path)
if strings.Contains(path, "inventory/fieldiag_summary.json") ||
strings.Contains(path, "inventory\\fieldiag_summary.json") {
summaryFile = &files[i]
break
}
}
// A missing summary is not fatal: inforom files may still contain usable part numbers.
if summaryFile != nil {
var summaries []inventoryFieldDiagSummary
if err := json.Unmarshal(summaryFile.Content, &summaries); err == nil {
for _, summary := range summaries {
addSummaryMapping(summary, serialToSKU, slotToSKU)
}
} else {
var summary inventoryFieldDiagSummary
if err := json.Unmarshal(summaryFile.Content, &summary); err == nil {
addSummaryMapping(summary, serialToSKU, slotToSKU)
}
}
}
// 2) inforom/checkinforom fieldiag.jso mapping (full product_part_num).
for _, f := range files {
path := strings.TrimSpace(f.Path)
m := inforomPathRegex.FindStringSubmatch(path)
if len(m) == 0 {
continue
}
slot := "GPU" + strings.ToUpper(strings.TrimSpace(m[2])) // SXM7 -> GPUSXM7
serialFromPath := strings.TrimSpace(m[4])
productPNMatch := inforomProductPNRegex.FindSubmatch(f.Content)
if len(productPNMatch) == 2 {
partNumber := strings.TrimSpace(string(productPNMatch[1]))
if partNumber != "" {
slotToPartNumber[slot] = partNumber
if serialFromPath != "" {
serialToPartNumber[serialFromPath] = partNumber
}
if sku := extractSKUFromPartNumber(partNumber); sku != "" {
slotToSKU[slot] = sku
if serialFromPath != "" {
serialToSKU[serialFromPath] = sku
}
}
}
}
serialMatch := inforomSerialRegex.FindSubmatch(f.Content)
if len(serialMatch) == 2 {
serial := strings.TrimSpace(string(serialMatch[1]))
if serial != "" {
if sku := slotToSKU[slot]; sku != "" {
serialToSKU[serial] = sku
}
if pn := slotToPartNumber[slot]; pn != "" {
serialToPartNumber[serial] = pn
}
}
}
}
return serialToSKU, slotToSKU, serialToPartNumber, slotToPartNumber
}
func addSummaryMapping(summary inventoryFieldDiagSummary, serialToSKU map[string]string, slotToSKU map[string]string) {
for _, run := range summary.ModsRuns {
for _, h := range run.ModsHeader {
sku := normalizeSKUCode(h.BoardInfo)
if sku == "" {
continue
}
gpuName := strings.TrimSpace(h.GPUName)
if matches := gpuNameWithSerialRegex.FindStringSubmatch(gpuName); len(matches) == 3 {
slotToSKU["GPUSXM"+matches[1]] = sku
serialToSKU[strings.TrimSpace(matches[2])] = sku
continue
}
if matches := gpuNameSlotOnlyRegex.FindStringSubmatch(gpuName); len(matches) == 2 {
slotToSKU["GPUSXM"+matches[1]] = sku
}
}
}
}
func resolveDescriptionFromSKU(sku string, skuToFile map[string]string) string {
file := strings.ToLower(strings.TrimSpace(skuToFile[normalizeSKUCode(sku)]))
if file == "" {
return ""
}
return skuFilenameToDescription(file)
}
func normalizeSKUCode(v string) string {
s := strings.TrimSpace(strings.ToUpper(v))
if s == "" {
return ""
}
if m := skuCodeRegex.FindStringSubmatch(s); len(m) == 3 {
return m[1] + "-" + m[2]
}
return s
}
func extractSKUFromPartNumber(partNumber string) string {
s := strings.TrimSpace(strings.ToUpper(partNumber))
if s == "" {
return ""
}
if m := skuCodeInsideRegex.FindStringSubmatch(s); len(m) == 3 {
return m[1] + "-" + m[2]
}
return ""
}
func skuFilenameToDescription(file string) string {
s := strings.TrimSpace(strings.ToLower(file))
if s == "" {
return ""
}
s = strings.TrimSuffix(s, ".json")
s = strings.TrimSuffix(s, "_field")
s = strings.TrimPrefix(s, "sku_")
s = strings.ReplaceAll(s, "-", " ")
s = strings.ReplaceAll(s, "_", " ")
s = strings.Join(strings.Fields(s), " ")
return strings.TrimSpace(s)
}
func resolveGenerationDescription(deviceID int, deviceToGeneration map[string]string) string {
if deviceID <= 0 || len(deviceToGeneration) == 0 {
return ""
}
return strings.TrimSpace(deviceToGeneration[normalizeDeviceIDHex(strconv.FormatInt(int64(deviceID), 16))])
}
func normalizeDeviceIDHex(v string) string {
s := strings.TrimSpace(strings.ToLower(v))
s = strings.TrimPrefix(s, "0x")
if s == "" {
return ""
}
n, err := strconv.ParseUint(s, 16, 32)
if err != nil {
return ""
}
return "0x" + strings.ToLower(strconv.FormatUint(n, 16))
}
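Resolving a description from a part number is a two-step lookup. First, extract the `G###-####` board SKU embedded in the part number. Second, map the SKU to a `sku_hgx-...` filename and turn that filename into words. The sketch below reproduces both steps with the sample values from the tests. `skuFromPartNumber` and `describeSKUFile` are hypothetical names, and the one-entry map stands in for the hardcoded and testspec.json mappings.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// skuInPartNumber matches a "G###-####" (or "G###.####") board SKU embedded in a
// longer part number such as "692-2G520-0280-501", allowing one leading digit.
var skuInPartNumber = regexp.MustCompile(`(?:^|[^A-Z0-9])\d?(G\d{3})[.-](\d{4})(?:[^A-Z0-9]|$)`)

// skuFromPartNumber returns the normalized SKU, or "" when none is embedded.
func skuFromPartNumber(pn string) string {
	m := skuInPartNumber.FindStringSubmatch(strings.ToUpper(strings.TrimSpace(pn)))
	if m == nil {
		return ""
	}
	return m[1] + "-" + m[2]
}

// describeSKUFile turns "sku_hgx-h200-8-gpu_141g_aircooled_field.json"
// into "hgx h200 8 gpu 141g aircooled".
func describeSKUFile(file string) string {
	s := strings.ToLower(strings.TrimSpace(file))
	s = strings.TrimSuffix(s, ".json")
	s = strings.TrimSuffix(s, "_field")
	s = strings.TrimPrefix(s, "sku_")
	s = strings.NewReplacer("-", " ", "_", " ").Replace(s)
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	skuToFile := map[string]string{
		"G520-0280": "sku_hgx-h200-8-gpu_141g_aircooled_field.json",
	}
	sku := skuFromPartNumber("692-2G520-0280-501")
	fmt.Println(sku, "->", describeSKUFile(skuToFile[sku])) // G520-0280 -> hgx h200 8 gpu 141g aircooled
}
```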

@@ -0,0 +1,207 @@
package nvidia
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestApplyGPUModelsFromSKU(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "inventory/fieldiag_summary.json",
Content: []byte(`{
"ModsRuns":[
{"ModsHeader":[
{"GpuName":"SXM5_SN_1653925025497","BoardInfo":"G520-0280"}
]}
]
}`),
},
{
Path: "testspec.json",
Content: []byte(`{
"actions":[
{
"virtual_id":"inventory",
"args":{
"sku_to_sku_json_file_map":{
"G520-0280":"sku_hgx-h200-8-gpu_141g_aircooled_field.json"
}
}
}
]
}`),
},
}
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{
Slot: "GPUSXM5",
SerialNumber: "1653925025497",
Model: "NVIDIA Device 2335",
},
},
},
}
ApplyGPUModelsFromSKU(files, result)
if got := result.Hardware.GPUs[0].Model; got != "NVIDIA Device 2335" {
t.Fatalf("expected model NVIDIA Device 2335, got %q", got)
}
if got := result.Hardware.GPUs[0].Description; got != "hgx h200 8 gpu 141g aircooled" {
t.Fatalf("expected description hgx h200 8 gpu 141g aircooled, got %q", got)
}
}
func TestApplyGPUModelsFromSKU_FromPartNumber(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "inforom/SXM5/fieldiag.jso",
Content: []byte(`[
[
{
"__tag__":"inforom",
"serial_number":"1653925025497",
"product_part_num":"692-2G520-0280-501"
}
]
]`),
},
{
Path: "testspec.json",
Content: []byte(`{
"actions":[
{
"virtual_id":"inventory",
"args":{
"sku_to_sku_json_file_map":{
"G520-0280":"sku_hgx-h200-8-gpu_141g_aircooled_field.json"
}
}
}
]
}`),
},
}
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{
Slot: "GPUSXM5",
SerialNumber: "1653925025497",
Model: "NVIDIA Device 2335",
},
},
},
}
ApplyGPUModelsFromSKU(files, result)
if got := result.Hardware.GPUs[0].Model; got != "692-2G520-0280-501" {
t.Fatalf("expected model 692-2G520-0280-501, got %q", got)
}
if got := result.Hardware.GPUs[0].PartNumber; got != "692-2G520-0280-501" {
t.Fatalf("expected part number 692-2G520-0280-501, got %q", got)
}
if got := result.Hardware.GPUs[0].Description; got != "hgx h200 8 gpu 141g aircooled" {
t.Fatalf("expected description hgx h200 8 gpu 141g aircooled, got %q", got)
}
}
func TestApplyGPUModelsFromSKU_FieldDiagSummaryArrayFormat(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "inventory/fieldiag_summary.json",
Content: []byte(`[
{
"ModsRuns":[
{"ModsHeader":[
{"GpuName":"SXM5_SN_1653925025497","BoardInfo":"G520-0280"}
]}
]
}
]`),
},
{
Path: "testspec.json",
Content: []byte(`{
"actions":[
{
"virtual_id":"inventory",
"args":{
"sku_to_sku_json_file_map":{
"G520-0280":"sku_hgx-h200-8-gpu_141g_aircooled_field.json"
}
}
}
]
}`),
},
}
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{
Slot: "GPUSXM5",
SerialNumber: "1653925025497",
Model: "NVIDIA Device 2335",
},
},
},
}
ApplyGPUModelsFromSKU(files, result)
if got := result.Hardware.GPUs[0].Model; got != "NVIDIA Device 2335" {
t.Fatalf("expected model NVIDIA Device 2335, got %q", got)
}
if got := result.Hardware.GPUs[0].Description; got != "hgx h200 8 gpu 141g aircooled" {
t.Fatalf("expected description hgx h200 8 gpu 141g aircooled, got %q", got)
}
}
func TestApplyGPUModelsFromSKU_FallbackToGenerationFromModsMapping(t *testing.T) {
files := []parser.ExtractedFile{
{
Path: "testspec.json",
Content: []byte(`{
"actions":[
{
"virtual_id":"gpu_fieldiag",
"args":{
"mods_mapping":{
"#mods.525":"Hopper",
"525":["0x2335"]
}
}
}
]
}`),
},
}
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{
Slot: "GPUSXM5",
Model: "NVIDIA Device 2335",
DeviceID: 0x2335,
},
},
},
}
ApplyGPUModelsFromSKU(files, result)
if got := result.Hardware.GPUs[0].Description; got != "Hopper" {
t.Fatalf("expected description Hopper, got %q", got)
}
}
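The fallback test relies on the `mods_mapping` object in testspec.json. In that object, `"#mods.525"` names a GPU generation and `"525"` lists hex PCI device IDs. A minimal sketch of turning that layout into a device ID to generation lookup is shown below. The layout is inferred only from this fixture, and the function name is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// generationsFromModsMapping builds a PCI device ID -> generation map.
// Assumed layout: "#mods.<n>" holds the generation name and "<n>" holds a
// JSON array of hex device IDs such as "0x2335".
func generationsFromModsMapping(raw []byte) (map[int]string, error) {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	names := make(map[string]string) // mods index -> generation name
	for k, v := range m {
		if idx, ok := strings.CutPrefix(k, "#mods."); ok {
			var name string
			if err := json.Unmarshal(v, &name); err == nil {
				names[idx] = name
			}
		}
	}
	out := make(map[int]string)
	for idx, name := range names {
		var ids []string
		if err := json.Unmarshal(m[idx], &ids); err != nil {
			continue // no (or malformed) device list for this index
		}
		for _, id := range ids {
			n, err := strconv.ParseInt(strings.TrimPrefix(strings.ToLower(id), "0x"), 16, 32)
			if err == nil {
				out[int(n)] = name
			}
		}
	}
	return out, nil
}

func main() {
	gens, err := generationsFromModsMapping([]byte(`{"#mods.525":"Hopper","525":["0x2335"]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(gens[0x2335])
}
```

Keying the result by device ID means a GPU whose `DeviceID` is `0x2335` can be described as "Hopper" even when no SKU file matches it.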

@@ -0,0 +1,155 @@
package nvidia
import (
"bufio"
"regexp"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
var (
// Regex to extract devname mappings from fieldiag command line
// Example: "devname=0000:ba:00.0,SXM5_SN_1653925027099"
devnameRegex = regexp.MustCompile(`devname=([\da-fA-F:\.]+),(\w+)`)
// Regex to capture BDF from commands like:
// "$ lspci -vvvs 0000:05:00.0" or "$ lspci -vvs 0000:05:00.0"
lspciBDFRegex = regexp.MustCompile(`^\$\s+lspci\s+-[^\s]*\s+([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s*$`)
// Example: "Capabilities: [2f0 v1] Device Serial Number 99-d3-61-c8-ac-2d-b0-48"
deviceSerialRegex = regexp.MustCompile(`Device Serial Number\s+([0-9a-fA-F\-:]+)`)
)
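// ParseInventoryLog parses inventory/output.log. It assigns GPU serial numbers from
// fieldiag devname parameters (e.g., "SXM5_SN_1653925027099") and fills still-empty
// PCIe device and GPU serials from lspci "Device Serial Number" lines, matched by BDF.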
func ParseInventoryLog(content []byte, result *models.AnalysisResult) error {
if result.Hardware == nil || len(result.Hardware.GPUs) == 0 {
// No GPUs to update
return nil
}
scanner := bufio.NewScanner(strings.NewReader(string(content)))
// First pass: build mapping of PCI BDF -> Slot name and serial number from fieldiag command line
pciToSlot := make(map[string]string)
pciToSerial := make(map[string]string)
for scanner.Scan() {
line := scanner.Text()
// Look for fieldiag command with devname parameters
if strings.Contains(line, "devname=") && strings.Contains(line, "fieldiag") {
matches := devnameRegex.FindAllStringSubmatch(line, -1)
for _, match := range matches {
if len(match) == 3 {
pciBDF := match[1]
slotName := match[2]
// Extract slot number and serial from name like "SXM5_SN_1653925027099"
if strings.HasPrefix(slotName, "SXM") {
parts := strings.Split(slotName, "_")
if len(parts) >= 1 {
// Convert "SXM5" to "GPUSXM5"
slot := "GPU" + parts[0]
pciToSlot[pciBDF] = slot
}
// Extract serial number from "SXM5_SN_1653925027099"
if len(parts) == 3 && parts[1] == "SN" {
serial := parts[2]
pciToSerial[pciBDF] = serial
}
}
}
}
}
}
// Second pass: assign serial numbers to GPUs based on slot mapping
for i := range result.Hardware.GPUs {
slot := result.Hardware.GPUs[i].Slot
// Find the PCI BDF for this slot
var foundSerial string
for pciBDF, mappedSlot := range pciToSlot {
if mappedSlot == slot {
// Found matching slot, get serial number
if serial, ok := pciToSerial[pciBDF]; ok {
foundSerial = serial
break
}
}
}
if foundSerial != "" {
result.Hardware.GPUs[i].SerialNumber = foundSerial
}
}
// Third pass: parse lspci "Device Serial Number" by BDF (useful for NVSwitch serials).
bdfToDeviceSerial := make(map[string]string)
currentBDF := ""
scanner = bufio.NewScanner(strings.NewReader(string(content)))
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if line == "" {
continue
}
if m := lspciBDFRegex.FindStringSubmatch(line); len(m) == 2 {
currentBDF = strings.ToLower(strings.TrimSpace(m[1]))
continue
}
if currentBDF == "" {
continue
}
if m := deviceSerialRegex.FindStringSubmatch(line); len(m) == 2 {
serial := strings.TrimSpace(m[1])
if serial != "" {
bdfToDeviceSerial[currentBDF] = serial
}
currentBDF = ""
}
}
// Apply to PCIe devices first (includes NVSwitch).
for i := range result.Hardware.PCIeDevices {
dev := &result.Hardware.PCIeDevices[i]
if strings.TrimSpace(dev.SerialNumber) != "" {
continue
}
bdf := strings.ToLower(strings.TrimSpace(dev.BDF))
if bdf == "" {
continue
}
if serial := bdfToDeviceSerial[bdf]; serial != "" {
dev.SerialNumber = serial
}
}
// Apply to GPUs only if GPU serial is still empty (do not overwrite prod serial from devname).
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
if strings.TrimSpace(gpu.SerialNumber) != "" {
continue
}
bdf := strings.ToLower(strings.TrimSpace(gpu.BDF))
if bdf == "" {
continue
}
if serial := bdfToDeviceSerial[bdf]; serial != "" {
gpu.SerialNumber = serial
}
}
return scanner.Err()
}
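The first two passes above build `pciToSlot` and `pciToSerial`, and then scan `pciToSlot` once per GPU to find a matching slot. A minimal alternative sketch keys the serials by slot directly, so that each GPU needs only a single map lookup. It uses the same pattern as `devnameRegex` and assumes the `SXM<n>_SN_<serial>` naming seen in this file. The function name is hypothetical, and this sketch discards the BDF that the original keeps.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same pattern as devnameRegex: captures "<BDF>" and "<name>" from "devname=<BDF>,<name>".
var devnamePattern = regexp.MustCompile(`devname=([\da-fA-F:\.]+),(\w+)`)

// slotSerialsFromFieldiag maps GPU slot names ("GPUSXM5") straight to serial
// numbers parsed from "SXM<n>_SN_<serial>" devname values on fieldiag lines.
func slotSerialsFromFieldiag(line string) map[string]string {
	out := make(map[string]string)
	if !strings.Contains(line, "fieldiag") {
		return out
	}
	for _, m := range devnamePattern.FindAllStringSubmatch(line, -1) {
		parts := strings.Split(m[2], "_")
		if len(parts) == 3 && strings.HasPrefix(parts[0], "SXM") && parts[1] == "SN" {
			out["GPU"+parts[0]] = parts[2]
		}
	}
	return out
}

func main() {
	fmt.Println(slotSerialsFromFieldiag("/tmp/fieldiag devname=0000:ba:00.0,SXM5_SN_1653925025497 fieldiag"))
}
```

With this layout, the second pass becomes `if s := slotSerials[gpu.Slot]; s != "" { gpu.SerialNumber = s }`. The result is also deterministic, whereas ranging over `pciToSlot` iterates in random map order.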
// findInventoryOutputLog finds the inventory/output.log file
func findInventoryOutputLog(files []parser.ExtractedFile) *parser.ExtractedFile {
for _, f := range files {
// Look for inventory/output.log
path := strings.ToLower(f.Path)
if strings.Contains(path, "inventory/output.log") ||
strings.Contains(path, "inventory\\output.log") {
return &f
}
}
return nil
}

@@ -0,0 +1,126 @@
package nvidia
import (
"os"
"path/filepath"
"strings"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestParseInventoryLog(t *testing.T) {
// Test with the real archive
archivePath := filepath.Join("../../../../example", "A514359X5A09844_logs-20260115-151707.tar")
// Check if file exists
if _, err := os.Stat(archivePath); os.IsNotExist(err) {
t.Skip("Test archive not found, skipping test")
}
// Extract files from archive
files, err := parser.ExtractArchive(archivePath)
if err != nil {
t.Fatalf("Failed to extract archive: %v", err)
}
// Find inventory/output.log
var inventoryLog *parser.ExtractedFile
for _, f := range files {
if strings.Contains(f.Path, "inventory/output.log") {
inventoryLog = &f
break
}
}
if inventoryLog == nil {
t.Fatal("inventory/output.log not found")
}
content := string(inventoryLog.Content)
// Test devname regex - this extracts both slot mapping and serial numbers
t.Log("Testing devname extraction:")
lines := strings.Split(content, "\n")
serialCount := 0
for i, line := range lines {
if strings.Contains(line, "devname=") && strings.Contains(line, "fieldiag") {
t.Logf("Line %d: Found fieldiag command", i)
matches := devnameRegex.FindAllStringSubmatch(line, -1)
t.Logf(" Found %d devname matches", len(matches))
for _, match := range matches {
if len(match) == 3 {
pciBDF := match[1]
slotName := match[2]
t.Logf(" PCI: %s -> Slot: %s", pciBDF, slotName)
// Extract serial number from slot name
if strings.HasPrefix(slotName, "SXM") {
parts := strings.Split(slotName, "_")
if len(parts) == 3 && parts[1] == "SN" {
serial := parts[2]
t.Logf(" Serial: %s", serial)
serialCount++
}
}
}
}
break
}
}
t.Logf("\nTotal GPU serials extracted: %d", serialCount)
if serialCount == 0 {
t.Error("Expected to find GPU serial numbers, but found none")
}
}
func min(a, b int) int {
if a < b {
return a
}
return b
}
func TestParseInventoryLog_AssignsNVSwitchSerialByBDF(t *testing.T) {
content := []byte(`
$ lspci -vvvs 0000:05:00.0
05:00.0 Bridge: NVIDIA Corporation Device 22a3 (rev a1)
Capabilities: [2f0 v1] Device Serial Number 99-d3-61-c8-ac-2d-b0-48
/tmp/fieldiag devname=0000:ba:00.0,SXM5_SN_1653925025497 fieldiag
`)
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{
Slot: "GPUSXM5",
BDF: "0000:ba:00.0",
SerialNumber: "",
},
},
PCIeDevices: []models.PCIeDevice{
{
Slot: "NVSWITCH0",
BDF: "0000:05:00.0",
SerialNumber: "",
},
},
},
}
if err := ParseInventoryLog(content, result); err != nil {
t.Fatalf("ParseInventoryLog failed: %v", err)
}
if got := result.Hardware.PCIeDevices[0].SerialNumber; got != "99-d3-61-c8-ac-2d-b0-48" {
t.Fatalf("expected NVSwitch serial 99-d3-61-c8-ac-2d-b0-48, got %q", got)
}
// GPU serial should come from fieldiag devname mapping.
if got := result.Hardware.GPUs[0].SerialNumber; got != "1653925025497" {
t.Fatalf("expected GPU serial 1653925025497, got %q", got)
}
}
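The NVSwitch test above exercises the third pass of ParseInventoryLog. That pass is a small state machine: it remembers the BDF from the most recent `$ lspci -vvvs <BDF>` command and attaches the first `Device Serial Number` that follows, then forgets the BDF. The standalone sketch below reproduces that logic with the same two patterns (`lspciBDFRegex` and `deviceSerialRegex`). The function and variable names are illustrative only.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Patterns copied from lspciBDFRegex and deviceSerialRegex.
var (
	lspciCmd  = regexp.MustCompile(`^\$\s+lspci\s+-[^\s]*\s+([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s*$`)
	devSerial = regexp.MustCompile(`Device Serial Number\s+([0-9a-fA-F\-:]+)`)
)

// serialsByBDF returns lowercase BDF -> PCIe "Device Serial Number".
func serialsByBDF(log string) (map[string]string, error) {
	out := make(map[string]string)
	current := "" // BDF of the lspci command currently being read
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if m := lspciCmd.FindStringSubmatch(line); m != nil {
			current = strings.ToLower(m[1])
			continue
		}
		if current == "" {
			continue
		}
		if m := devSerial.FindStringSubmatch(line); m != nil {
			out[current] = m[1]
			current = "" // keep only the first serial per lspci command
		}
	}
	return out, sc.Err()
}

func main() {
	serials, err := serialsByBDF("$ lspci -vvvs 0000:05:00.0\nCapabilities: [2f0 v1] Device Serial Number 99-d3-61-c8-ac-2d-b0-48\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(serials)
}
```

Resetting `current` after a match prevents a later, unrelated serial line from being attributed to the same device.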

@@ -0,0 +1,370 @@
package nvidia
import (
"bufio"
"fmt"
"regexp"
"strconv"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
var (
nvflashAdapterRegex = regexp.MustCompile(`^Adapter:\s+.+\(([\da-fA-F]+),([\da-fA-F]+),([\da-fA-F]+),([\da-fA-F]+)\)\s+S:([0-9A-Fa-f]{2}),B:([0-9A-Fa-f]{2}),D:([0-9A-Fa-f]{2}),F:([0-9A-Fa-f])`)
gpuPCIIDRegex = regexp.MustCompile(`^GPU_SXM(\d+)_PCIID:\s*(\S+)$`)
nvsPCIIDRegex = regexp.MustCompile(`^NVSWITCH_NVSWITCH(\d+)_PCIID:\s*(\S+)$`)
)
var nvswitchProjectToPartNumber = map[string]string{
"5612-0002": "965-25612-0002-000",
}
type nvflashDeviceRecord struct {
BDF string
VendorID int
DeviceID int
SSVendorID int
SSDeviceID int
Version string
BoardID string
HierarchyID string
ChipSKU string
Project string
}
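// ParseNVFlashVerboseLog parses inventory/nvflash_verbose.log and applies firmware versions
// to already discovered devices, matched by PCI BDF and, when both sides report them, by
// vendor and device ID. NVSwitch part numbers are mapped from the nvflash Project field.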
func ParseNVFlashVerboseLog(content []byte, result *models.AnalysisResult) error {
if result == nil || result.Hardware == nil {
return nil
}
records := parseNVFlashRecords(content)
if len(records) == 0 {
return nil
}
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
bdf := normalizePCIBDF(gpu.BDF)
if bdf == "" {
continue
}
rec, ok := records[bdf]
if !ok {
continue
}
if gpu.DeviceID != 0 && rec.DeviceID != 0 && gpu.DeviceID != rec.DeviceID {
continue
}
if gpu.VendorID != 0 && rec.VendorID != 0 && gpu.VendorID != rec.VendorID {
continue
}
if strings.TrimSpace(rec.Version) != "" {
gpu.Firmware = strings.TrimSpace(rec.Version)
}
}
for i := range result.Hardware.PCIeDevices {
dev := &result.Hardware.PCIeDevices[i]
bdf := normalizePCIBDF(dev.BDF)
if bdf == "" {
continue
}
rec, ok := records[bdf]
if !ok {
continue
}
if dev.DeviceID != 0 && rec.DeviceID != 0 && dev.DeviceID != rec.DeviceID {
continue
}
if dev.VendorID != 0 && rec.VendorID != 0 && dev.VendorID != rec.VendorID {
continue
}
if strings.EqualFold(strings.TrimSpace(dev.DeviceClass), "NVSwitch") || strings.HasPrefix(strings.ToUpper(strings.TrimSpace(dev.Slot)), "NVSWITCH") {
if mappedPN := mapNVSwitchPartNumberByProject(rec.Project); mappedPN != "" {
dev.PartNumber = mappedPN
}
}
if strings.TrimSpace(rec.Version) != "" && strings.TrimSpace(dev.PartNumber) == "" {
// Fallback for non-NVSwitch devices where part number is unknown.
dev.PartNumber = strings.TrimSpace(rec.Version)
}
}
appendNVFlashFirmwareEntries(result, records)
return nil
}
// ApplyInventoryPCIIDs enriches devices with PCI BDFs from inventory/inventory.log.
func ApplyInventoryPCIIDs(content []byte, result *models.AnalysisResult) error {
if result == nil || result.Hardware == nil {
return nil
}
slotToBDF := parseInventoryPCIIDs(content)
if len(slotToBDF) == 0 {
return nil
}
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
if strings.TrimSpace(gpu.BDF) != "" {
continue
}
if bdf := slotToBDF[strings.TrimSpace(gpu.Slot)]; bdf != "" {
gpu.BDF = bdf
}
}
for i := range result.Hardware.PCIeDevices {
dev := &result.Hardware.PCIeDevices[i]
if strings.TrimSpace(dev.BDF) != "" {
continue
}
if bdf := slotToBDF[normalizeNVSwitchSlot(strings.TrimSpace(dev.Slot))]; bdf != "" {
dev.BDF = bdf
}
}
return nil
}
func parseNVFlashRecords(content []byte) map[string]nvflashDeviceRecord {
scanner := bufio.NewScanner(strings.NewReader(string(content)))
records := make(map[string]nvflashDeviceRecord)
var current *nvflashDeviceRecord
commit := func() {
if current == nil {
return
}
if current.BDF == "" || strings.TrimSpace(current.Version) == "" {
return
}
records[current.BDF] = *current
}
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if line == "" {
continue
}
if m := nvflashAdapterRegex.FindStringSubmatch(line); len(m) == 9 {
commit()
vendorID, _ := parseHexInt(m[1])
deviceID, _ := parseHexInt(m[2])
ssVendorID, _ := parseHexInt(m[3])
ssDeviceID, _ := parseHexInt(m[4])
current = &nvflashDeviceRecord{
BDF: fmt.Sprintf("0000:%s:%s.%s", strings.ToLower(m[6]), strings.ToLower(m[7]), strings.ToLower(m[8])),
VendorID: vendorID,
DeviceID: deviceID,
SSVendorID: ssVendorID,
SSDeviceID: ssDeviceID,
}
continue
}
if current == nil {
continue
}
if !strings.Contains(line, ":") {
continue
}
parts := strings.SplitN(line, ":", 2)
key := strings.TrimSpace(parts[0])
val := strings.TrimSpace(parts[1])
if key == "" || val == "" {
continue
}
switch key {
case "Version":
current.Version = val
case "Board ID":
current.BoardID = strings.ToLower(strings.TrimPrefix(val, "0x"))
case "Vendor ID":
if v, err := parseHexInt(val); err == nil {
current.VendorID = v
}
case "Device ID":
if v, err := parseHexInt(val); err == nil {
current.DeviceID = v
}
case "Hierarchy ID":
current.HierarchyID = val
case "Chip SKU":
current.ChipSKU = val
case "Project":
current.Project = val
}
}
commit()
return records
}
func parseInventoryPCIIDs(content []byte) map[string]string {
scanner := bufio.NewScanner(strings.NewReader(string(content)))
slotToBDF := make(map[string]string)
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if line == "" {
continue
}
if m := gpuPCIIDRegex.FindStringSubmatch(line); len(m) == 3 {
slotToBDF["GPUSXM"+m[1]] = normalizePCIBDF(m[2])
continue
}
if m := nvsPCIIDRegex.FindStringSubmatch(line); len(m) == 3 {
slotToBDF["NVSWITCH"+m[1]] = normalizePCIBDF(m[2])
}
}
return slotToBDF
}
// pciBDFShortRegex matches a bus:device.func address without the PCI domain.
// It is compiled once at package level instead of on every call.
var pciBDFShortRegex = regexp.MustCompile(`^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])$`)
func normalizePCIBDF(v string) string {
s := strings.TrimSpace(strings.ToLower(v))
if s == "" {
return ""
}
// bus:device.func -> 0000:bus:device.func
if m := pciBDFShortRegex.FindStringSubmatch(s); len(m) == 2 {
return "0000:" + m[1]
}
// Full domain:bus:device.func addresses (and anything unrecognized) are
// returned trimmed and lowercased but otherwise unchanged.
return s
}
func parseHexInt(v string) (int, error) {
s := strings.TrimSpace(strings.ToLower(v))
s = strings.TrimPrefix(s, "0x")
if s == "" {
return 0, fmt.Errorf("empty hex value")
}
n, err := strconv.ParseInt(s, 16, 32)
if err != nil {
return 0, err
}
return int(n), nil
}
func findNVFlashVerboseLog(files []parser.ExtractedFile) *parser.ExtractedFile {
for _, f := range files {
path := strings.ToLower(f.Path)
if strings.Contains(path, "inventory/nvflash_verbose.log") ||
strings.Contains(path, "inventory\\nvflash_verbose.log") {
return &f
}
}
return nil
}
func findInventoryInfoLog(files []parser.ExtractedFile) *parser.ExtractedFile {
for _, f := range files {
path := strings.ToLower(f.Path)
if strings.Contains(path, "inventory/inventory.log") ||
strings.Contains(path, "inventory\\inventory.log") {
return &f
}
}
return nil
}
func appendNVFlashFirmwareEntries(result *models.AnalysisResult, records map[string]nvflashDeviceRecord) {
if result == nil || result.Hardware == nil {
return
}
if result.Hardware.Firmware == nil {
result.Hardware.Firmware = make([]models.FirmwareInfo, 0)
}
seen := make(map[string]struct{})
for _, fw := range result.Hardware.Firmware {
key := strings.ToLower(strings.TrimSpace(fw.DeviceName)) + "|" + strings.TrimSpace(fw.Version)
seen[key] = struct{}{}
}
for _, gpu := range result.Hardware.GPUs {
version := strings.TrimSpace(gpu.Firmware)
if version == "" {
continue
}
model := strings.TrimSpace(gpu.PartNumber)
if model == "" {
model = strings.TrimSpace(gpu.Model)
}
if model == "" {
model = strings.TrimSpace(gpu.Slot)
}
deviceName := fmt.Sprintf("GPU %s (%s)", strings.TrimSpace(gpu.Slot), model)
key := strings.ToLower(deviceName) + "|" + version
if _, ok := seen[key]; ok {
continue
}
seen[key] = struct{}{}
result.Hardware.Firmware = append(result.Hardware.Firmware, models.FirmwareInfo{
DeviceName: deviceName,
Version: version,
})
}
for _, dev := range result.Hardware.PCIeDevices {
bdf := normalizePCIBDF(dev.BDF)
rec, ok := records[bdf]
if !ok {
continue
}
version := strings.TrimSpace(rec.Version)
if version == "" {
continue
}
slot := strings.TrimSpace(dev.Slot)
deviceClass := strings.TrimSpace(dev.DeviceClass)
if strings.EqualFold(deviceClass, "NVSwitch") || strings.HasPrefix(strings.ToUpper(slot), "NVSWITCH") {
model := slot
if pn := strings.TrimSpace(dev.PartNumber); pn != "" {
model = pn
}
deviceName := fmt.Sprintf("NVSwitch %s (%s)", slot, model)
key := strings.ToLower(deviceName) + "|" + version
if _, ok := seen[key]; ok {
continue
}
seen[key] = struct{}{}
result.Hardware.Firmware = append(result.Hardware.Firmware, models.FirmwareInfo{
DeviceName: deviceName,
Version: version,
})
}
}
}
func mapNVSwitchPartNumberByProject(project string) string {
key := strings.TrimSpace(strings.ToLower(project))
if key == "" {
return ""
}
return strings.TrimSpace(nvswitchProjectToPartNumber[key])
}
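parseNVFlashRecords groups the `Key : Value` lines under the preceding `Adapter:` header. It commits a record when the next header starts or the input ends, and drops records that have no `Version`. A reduced sketch of that grouping, which keeps only BDF to firmware version, is shown below. The simplified header pattern captures only the `B:`, `D:` and `F:` fields and, like the original, hardcodes the `0000` domain. It is an assumption-level stand-in for `nvflashAdapterRegex`, not a replacement.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// adapterLine: simplified header pattern capturing bus, device and function.
var adapterLine = regexp.MustCompile(`^Adapter:.*\bB:([0-9A-Fa-f]{2}),D:([0-9A-Fa-f]{2}),F:([0-9A-Fa-f])`)

// firmwareByBDF returns BDF -> "Version" for each Adapter block in the log.
func firmwareByBDF(log string) (map[string]string, error) {
	out := make(map[string]string)
	bdf, version := "", ""
	commit := func() { // store the block being read, if it is complete
		if bdf != "" && version != "" {
			out[bdf] = version
		}
		bdf, version = "", ""
	}
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if m := adapterLine.FindStringSubmatch(line); m != nil {
			commit()
			bdf = strings.ToLower(fmt.Sprintf("0000:%s:%s.%s", m[1], m[2], m[3]))
			continue
		}
		// Split on the first ':' only, as strings.SplitN(line, ":", 2) does.
		key, val, ok := strings.Cut(line, ":")
		if ok && bdf != "" && strings.TrimSpace(key) == "Version" {
			version = strings.TrimSpace(val)
		}
	}
	commit()
	return out, sc.Err()
}

func main() {
	fw, err := firmwareByBDF("Adapter: Graphics Device (10DE,2335,10DE,18BE) S:00,B:BA,D:00,F:00\nVersion : 96.00.D0.00.03\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(fw)
}
```

The final `commit()` after the loop matters. Without it, the last adapter in the file would never be recorded.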

@@ -0,0 +1,93 @@
package nvidia
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestApplyInventoryPCIIDsAndNVFlashFirmware(t *testing.T) {
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{
Slot: "GPUSXM5",
DeviceID: 0x2335,
},
},
PCIeDevices: []models.PCIeDevice{
{
Slot: "NVSWITCHNVSWITCH2",
DeviceID: 0x22a3,
},
},
},
}
inventoryLog := []byte(`
GPU_SXM5_PCIID: 0000:ba:00.0
NVSWITCH_NVSWITCH2_PCIID: 0000:07:00.0
`)
nvflashLog := []byte(`
Adapter: Graphics Device (10DE,2335,10DE,18BE) S:00,B:BA,D:00,F:00
Version : 96.00.D0.00.03
Board ID : 0x053C
Vendor ID : 0x10DE
Device ID : 0x2335
Hierarchy ID : Normal Board
Chip SKU : 895-0
Project : G520-0280
Adapter: Graphics Device (10DE,22A3,10DE,1796) S:00,B:07,D:00,F:00
Version : 96.10.6D.00.01
Board ID : 0x03B7
Vendor ID : 0x10DE
Device ID : 0x22A3
Hierarchy ID : Normal Board
Chip SKU : 890-0
Project : 5612-0002
`)
if err := ApplyInventoryPCIIDs(inventoryLog, result); err != nil {
t.Fatalf("ApplyInventoryPCIIDs failed: %v", err)
}
if err := ParseNVFlashVerboseLog(nvflashLog, result); err != nil {
t.Fatalf("ParseNVFlashVerboseLog failed: %v", err)
}
if got := result.Hardware.GPUs[0].BDF; got != "0000:ba:00.0" {
t.Fatalf("expected GPU BDF 0000:ba:00.0, got %q", got)
}
if got := result.Hardware.GPUs[0].Firmware; got != "96.00.D0.00.03" {
t.Fatalf("expected GPU firmware 96.00.D0.00.03, got %q", got)
}
if got := result.Hardware.PCIeDevices[0].BDF; got != "0000:07:00.0" {
t.Fatalf("expected NVSwitch BDF 0000:07:00.0, got %q", got)
}
if got := result.Hardware.PCIeDevices[0].PartNumber; got != "965-25612-0002-000" {
t.Fatalf("expected NVSwitch part number 965-25612-0002-000, got %q", got)
}
if len(result.Hardware.Firmware) == 0 {
t.Fatalf("expected firmware entries to be populated from nvflash log")
}
hasGPUFW := false
hasNVSwitchFW := false
for _, fw := range result.Hardware.Firmware {
if fw.Version == "96.00.D0.00.03" {
hasGPUFW = true
}
if fw.Version == "96.10.6D.00.01" {
hasNVSwitchFW = true
}
}
if !hasGPUFW {
t.Fatalf("expected GPU firmware version 96.00.D0.00.03 in hardware firmware list")
}
if !hasNVSwitchFW {
t.Fatalf("expected NVSwitch firmware version 96.10.6D.00.01 in hardware firmware list")
}
}
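The test above only checks that both nvflash versions reach `Hardware.Firmware`. appendNVFlashFirmwareEntries also avoids duplicates by keying each entry on the lowercased device name plus the version. A standalone sketch of that dedup rule follows. `FirmwareInfo` here is a hypothetical stand-in for `models.FirmwareInfo` that carries only the two fields the key uses.

```go
package main

import (
	"fmt"
	"strings"
)

// FirmwareInfo: hypothetical stand-in for models.FirmwareInfo.
type FirmwareInfo struct {
	DeviceName string
	Version    string
}

// appendUnique appends candidates whose "lower(name)|version" key is new.
func appendUnique(existing []FirmwareInfo, candidates ...FirmwareInfo) []FirmwareInfo {
	key := func(fw FirmwareInfo) string {
		return strings.ToLower(strings.TrimSpace(fw.DeviceName)) + "|" + strings.TrimSpace(fw.Version)
	}
	seen := make(map[string]struct{}, len(existing))
	for _, fw := range existing {
		seen[key(fw)] = struct{}{}
	}
	for _, fw := range candidates {
		k := key(fw)
		if _, dup := seen[k]; dup {
			continue // same device name (any case) and same version
		}
		seen[k] = struct{}{}
		existing = append(existing, fw)
	}
	return existing
}

func main() {
	list := []FirmwareInfo{{DeviceName: "GPU GPUSXM5 (692-2G520-0280-501)", Version: "96.00.D0.00.03"}}
	list = appendUnique(list,
		FirmwareInfo{DeviceName: "gpu GPUSXM5 (692-2G520-0280-501)", Version: "96.00.D0.00.03"},
		FirmwareInfo{DeviceName: "NVSwitch NVSWITCH2 (965-25612-0002-000)", Version: "96.10.6D.00.01"},
	)
	fmt.Println(len(list)) // 2
}
```

Because the key includes the version, a device that genuinely reports two different firmware versions still produces two entries.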

@@ -14,7 +14,7 @@ import (
// parserVersion - version of this parser module
// IMPORTANT: Increment this version when making changes to parser logic!
-const parserVersion = "1.1.0"
+const parserVersion = "1.4"
func init() {
parser.Register(&Parser{})
@@ -70,7 +70,7 @@ func (p *Parser) Detect(files []parser.ExtractedFile) int {
if strings.HasSuffix(path, "output.log") {
// Check if it contains dmidecode output
if strings.Contains(string(f.Content), "dmidecode") ||
strings.Contains(string(f.Content), "System Information") {
confidence += 10
}
}
@@ -105,6 +105,9 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
result.Hardware = &models.HardwareConfig{
GPUs: make([]models.GPU, 0),
}
gpuStatuses := make(map[string]string)
gpuFailureDetails := make(map[string]string)
nvswitchStatuses := make(map[string]string)
// Parse output.log first (contains dmidecode system info)
// Find the output.log file that contains dmidecode output
@@ -124,18 +127,75 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
}
}
// Parse inventory/output.log (GPU serials from fieldiag devname arguments, device serials from lspci)
inventoryLogFile := findInventoryOutputLog(files)
if inventoryLogFile != nil {
if err := ParseInventoryLog(inventoryLogFile.Content, result); err != nil {
// Ignore the error and continue parsing the remaining files.
_ = err
}
}
// Parse inventory/inventory.log to enrich PCI BDF mapping for components.
inventoryInfoLog := findInventoryInfoLog(files)
if inventoryInfoLog != nil {
if err := ApplyInventoryPCIIDs(inventoryInfoLog.Content, result); err != nil {
_ = err
}
}
// Enhance GPU model names using SKU mapping from testspec + inventory summary.
ApplyGPUModelsFromSKU(files, result)
// Parse inventory/nvflash_verbose.log and apply firmware versions by BDF + IDs.
// This runs after GPU model/part-number enrichment so firmware tab uses final model labels.
nvflashVerbose := findNVFlashVerboseLog(files)
if nvflashVerbose != nil {
if err := ParseNVFlashVerboseLog(nvflashVerbose.Content, result); err != nil {
_ = err
}
}
// Parse summary.json (test results summary)
if f := parser.FindFileByName(files, "summary.json"); f != nil {
events := ParseSummaryJSON(f.Content)
result.Events = append(result.Events, events...)
for componentID, status := range CollectGPUStatusesFromSummaryJSON(f.Content) {
gpuStatuses[componentID] = mergeGPUStatus(gpuStatuses[componentID], status)
}
for slot, status := range CollectNVSwitchStatusesFromSummaryJSON(f.Content) {
nvswitchStatuses[slot] = mergeGPUStatus(nvswitchStatuses[slot], status)
}
for componentID, detail := range CollectGPUFailureDetailsFromSummaryJSON(f.Content) {
if _, exists := gpuFailureDetails[componentID]; !exists && strings.TrimSpace(detail) != "" {
gpuFailureDetails[componentID] = strings.TrimSpace(detail)
}
}
}
// Parse summary.csv (alternative format)
if f := parser.FindFileByName(files, "summary.csv"); f != nil {
csvEvents := ParseSummaryCSV(f.Content)
result.Events = append(result.Events, csvEvents...)
for componentID, status := range CollectGPUStatusesFromSummaryCSV(f.Content) {
gpuStatuses[componentID] = mergeGPUStatus(gpuStatuses[componentID], status)
}
for slot, status := range CollectNVSwitchStatusesFromSummaryCSV(f.Content) {
nvswitchStatuses[slot] = mergeGPUStatus(nvswitchStatuses[slot], status)
}
for componentID, detail := range CollectGPUFailureDetailsFromSummaryCSV(f.Content) {
if _, exists := gpuFailureDetails[componentID]; !exists && strings.TrimSpace(detail) != "" {
gpuFailureDetails[componentID] = strings.TrimSpace(detail)
}
}
}
// Apply per-GPU PASS/FAIL status derived from summary files.
ApplyGPUStatuses(result, gpuStatuses)
ApplyGPUFailureDetails(result, gpuFailureDetails)
ApplyNVSwitchStatuses(result, nvswitchStatuses)
ApplyGPUAndNVSwitchCheckTimes(result, CollectGPUAndNVSwitchCheckTimes(files))
// Parse GPU field diagnostics logs
gpuFieldiagFiles := parser.FindFileByPattern(files, "gpu_fieldiag/", ".log")
for _, f := range gpuFieldiagFiles {
@@ -158,7 +218,7 @@ func findDmidecodeOutputLog(files []parser.ExtractedFile) *parser.ExtractedFile
// Check if it contains dmidecode output
content := string(f.Content)
if strings.Contains(content, "dmidecode") &&
strings.Contains(content, "System Information") {
return &f
}
}
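Parse folds statuses from both summary.json and summary.csv into one map through `mergeGPUStatus`, whose body is not part of this diff. The real-archive test expects the failing GPU to end up as `FAIL` and all others as `PASS`, which suggests that a failure is sticky. The sketch below assumes exactly that rule. `mergeStatus` is a hypothetical name, and the real function may treat other status strings differently.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeStatus (hypothetical): FAIL always wins; otherwise keep the first
// non-empty status that was recorded.
func mergeStatus(current, next string) string {
	cur := strings.ToUpper(strings.TrimSpace(current))
	nxt := strings.ToUpper(strings.TrimSpace(next))
	switch {
	case cur == "FAIL" || nxt == "FAIL":
		return "FAIL"
	case cur == "":
		return nxt
	default:
		return cur
	}
}

func main() {
	statuses := map[string]string{}
	// The same component ID can appear in several summary rows or files.
	for _, st := range []string{"PASS", "FAIL", "PASS"} {
		statuses["SXM5_SN_1653925025497"] = mergeStatus(statuses["SXM5_SN_1653925025497"], st)
	}
	fmt.Println(statuses["SXM5_SN_1653925025497"]) // FAIL
}
```

An order-independent rule like this one gives the same result regardless of whether summary.json or summary.csv is processed first.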

@@ -0,0 +1,291 @@
package nvidia
import (
"os"
"path/filepath"
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestNVIDIAParser_RealArchive(t *testing.T) {
// Test with the real archive that was reported as problematic
archivePath := filepath.Join("../../../../example", "A514359X5A09844_logs-20260115-151707.tar")
// Check if file exists
if _, err := os.Stat(archivePath); os.IsNotExist(err) {
t.Skip("Test archive not found, skipping test")
}
// Extract files from archive
files, err := parser.ExtractArchive(archivePath)
if err != nil {
t.Fatalf("Failed to extract archive: %v", err)
}
// Check if inventory/output.log exists
hasInventoryLog := false
for _, f := range files {
if filepath.Base(f.Path) == "output.log" {
t.Logf("Found file: %s", f.Path)
}
if f.Path == "./inventory/output.log" || f.Path == "inventory/output.log" {
hasInventoryLog = true
t.Logf("Found inventory/output.log with %d bytes", len(f.Content))
}
}
if !hasInventoryLog {
t.Error("inventory/output.log not found in extracted files")
}
// Create parser and parse
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Failed to parse archive: %v", err)
}
// Verify basic system info
if result.Hardware.BoardInfo.Manufacturer == "" {
t.Error("Expected Manufacturer to be set")
}
if result.Hardware.BoardInfo.ProductName == "" {
t.Error("Expected ProductName to be set")
}
if result.Hardware.BoardInfo.SerialNumber == "" {
t.Error("Expected SerialNumber to be set")
}
t.Logf("System Info:")
t.Logf(" Manufacturer: %s", result.Hardware.BoardInfo.Manufacturer)
t.Logf(" Product: %s", result.Hardware.BoardInfo.ProductName)
t.Logf(" Serial: %s", result.Hardware.BoardInfo.SerialNumber)
// Verify GPUs were found
if len(result.Hardware.GPUs) == 0 {
t.Error("Expected to find GPUs")
}
t.Logf("\nFound %d GPUs:", len(result.Hardware.GPUs))
gpusWithSerials := 0
for _, gpu := range result.Hardware.GPUs {
t.Logf(" %s: %s (Firmware: %s, Serial: %s, BDF: %s)",
gpu.Slot, gpu.Model, gpu.Firmware, gpu.SerialNumber, gpu.BDF)
if gpu.SerialNumber != "" {
gpusWithSerials++
}
}
// Verify that GPU serial numbers were extracted
if gpusWithSerials == 0 {
t.Error("Expected at least some GPUs to have serial numbers")
}
t.Logf("\nGPUs with serial numbers: %d/%d", gpusWithSerials, len(result.Hardware.GPUs))
// Check events for SXM2 failures
t.Logf("\nTotal events: %d", len(result.Events))
// Look for the specific serial or SXM2
sxm2Events := 0
for _, event := range result.Events {
desc := event.Description + " " + event.RawData + " " + event.EventType
if contains(desc, "SXM2") || contains(desc, "1653925025827") {
t.Logf(" SXM2 Event: [%s] %s (Severity: %s)", event.EventType, event.Description, event.Severity)
sxm2Events++
}
}
if sxm2Events == 0 {
t.Error("Expected to find events for SXM2 (faulty GPU 1653925025827)")
}
t.Logf("\nSXM2 failure events: %d", sxm2Events)
}
func TestNVIDIAParser_GPUStatusFromSummary_RealArchive07900(t *testing.T) {
archivePath := filepath.Join("../../../../example", "A514359X5A07900_logs-20260122-074208.tar")
if _, err := os.Stat(archivePath); os.IsNotExist(err) {
t.Skip("Test archive not found, skipping test")
}
files, err := parser.ExtractArchive(archivePath)
if err != nil {
t.Fatalf("Failed to extract archive: %v", err)
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Failed to parse archive: %v", err)
}
if result.Hardware == nil || len(result.Hardware.GPUs) == 0 {
t.Fatalf("expected GPUs in parsed result")
}
statusBySerial := make(map[string]string, len(result.Hardware.GPUs))
for _, gpu := range result.Hardware.GPUs {
if gpu.SerialNumber != "" {
statusBySerial[gpu.SerialNumber] = gpu.Status
}
}
if got := statusBySerial["1653925025497"]; got != "FAIL" {
t.Fatalf("expected GPU serial 1653925025497 status FAIL, got %q", got)
}
for serial, st := range statusBySerial {
if serial == "1653925025497" {
continue
}
if st != "PASS" {
t.Fatalf("expected non-failing GPU serial %s status PASS, got %q", serial, st)
}
}
}
func TestNVIDIAParser_GPUErrorDetailsFromSummary_RealArchive07900(t *testing.T) {
archivePath := filepath.Join("../../../../example", "A514359X5A07900_logs-20260122-074208.tar")
if _, err := os.Stat(archivePath); os.IsNotExist(err) {
t.Skip("Test archive not found, skipping test")
}
files, err := parser.ExtractArchive(archivePath)
if err != nil {
t.Fatalf("Failed to extract archive: %v", err)
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Failed to parse archive: %v", err)
}
if result.Hardware == nil || len(result.Hardware.GPUs) == 0 {
t.Fatalf("expected GPUs in parsed result")
}
errBySerial := make(map[string]string, len(result.Hardware.GPUs))
for _, gpu := range result.Hardware.GPUs {
if gpu.SerialNumber != "" {
errBySerial[gpu.SerialNumber] = gpu.ErrorDescription
}
}
if got := errBySerial["1653925025497"]; got != "Row remapping failed" {
t.Fatalf("expected GPU serial 1653925025497 error Row remapping failed, got %q", got)
}
}
func TestNVIDIAParser_GPUModelFromSKU_RealArchive07900(t *testing.T) {
archivePath := filepath.Join("../../../../example", "A514359X5A07900_logs-20260122-074208.tar")
if _, err := os.Stat(archivePath); os.IsNotExist(err) {
t.Skip("Test archive not found, skipping test")
}
files, err := parser.ExtractArchive(archivePath)
if err != nil {
t.Fatalf("Failed to extract archive: %v", err)
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Failed to parse archive: %v", err)
}
if result.Hardware == nil || len(result.Hardware.GPUs) == 0 {
t.Fatalf("expected GPUs in parsed result")
}
found := false
for _, gpu := range result.Hardware.GPUs {
if gpu.Model == "692-2G520-0280-501" && gpu.Description == "hgx h200 8 gpu 141g aircooled" {
found = true
break
}
}
if !found {
t.Fatalf("expected at least one GPU with model 692-2G520-0280-501 and description hgx h200 8 gpu 141g aircooled")
}
}
func TestNVIDIAParser_ComponentCheckTimes_RealArchive07900(t *testing.T) {
archivePath := filepath.Join("../../../../example", "A514359X5A07900_logs-20260122-074208.tar")
if _, err := os.Stat(archivePath); os.IsNotExist(err) {
t.Skip("Test archive not found, skipping test")
}
files, err := parser.ExtractArchive(archivePath)
if err != nil {
t.Fatalf("Failed to extract archive: %v", err)
}
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Failed to parse archive: %v", err)
}
if result.Hardware == nil {
t.Fatalf("expected hardware in parsed result")
}
expectedGPU := time.Date(2026, 1, 22, 6, 45, 36, 0, time.UTC)
expectedNVSwitch := time.Date(2026, 1, 22, 6, 11, 32, 0, time.UTC)
if len(result.Hardware.GPUs) == 0 {
t.Fatalf("expected GPUs in parsed result")
}
for _, gpu := range result.Hardware.GPUs {
if !gpu.StatusCheckedAt.Equal(expectedGPU) {
t.Fatalf("expected GPU %s status_checked_at %s, got %s", gpu.Slot, expectedGPU.Format(time.RFC3339), gpu.StatusCheckedAt.Format(time.RFC3339))
}
if gpu.StatusAtCollect == nil || !gpu.StatusAtCollect.At.Equal(expectedGPU) {
t.Fatalf("expected GPU %s status_at_collection.at %s", gpu.Slot, expectedGPU.Format(time.RFC3339))
}
}
nvsCount := 0
for _, dev := range result.Hardware.PCIeDevices {
slot := normalizeNVSwitchSlot(dev.Slot)
if slot == "" {
continue
}
if dev.DeviceClass != "NVSwitch" && len(slot) < len("NVSWITCH") {
continue
}
if dev.DeviceClass != "NVSwitch" && slot[:len("NVSWITCH")] != "NVSWITCH" {
continue
}
nvsCount++
if !dev.StatusCheckedAt.Equal(expectedNVSwitch) {
t.Fatalf("expected NVSwitch %s status_checked_at %s, got %s", dev.Slot, expectedNVSwitch.Format(time.RFC3339), dev.StatusCheckedAt.Format(time.RFC3339))
}
if dev.StatusAtCollect == nil || !dev.StatusAtCollect.At.Equal(expectedNVSwitch) {
t.Fatalf("expected NVSwitch %s status_at_collection.at %s", dev.Slot, expectedNVSwitch.Format(time.RFC3339))
}
}
if nvsCount == 0 {
t.Fatalf("expected NVSwitch devices in parsed result")
}
}
// contains reports whether substr occurs anywhere in s (same result as strings.Contains).
func contains(s, substr string) bool {
return findSubstring(s, substr)
}
func findSubstring(s, substr string) bool {
for i := 0; i <= len(s)-len(substr); i++ {
if s[i:i+len(substr)] == substr {
return true
}
}
return false
}

@@ -4,6 +4,7 @@ import (
"encoding/csv"
"encoding/json"
"fmt"
"regexp"
"strings"
"time"
@@ -20,6 +21,9 @@ type SummaryEntry struct {
IgnoreError string `json:"Ignore Error"`
}
var gpuComponentIDRegex = regexp.MustCompile(`^SXM(\d+)_SN_(.+)$`)
var nvswitchInventoryComponentRegex = regexp.MustCompile(`^NVSWITCH_(NVSWITCH\d+)_`)
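The two patterns above classify summary component IDs. `SXM<n>_SN_<serial>` identifies a GPU, and `NVSWITCH_<NVSWITCHn>_...` identifies an NVSwitch. A minimal sketch of converting such an ID to the slot names used elsewhere in this parser (`GPUSXM<n>`, `NVSWITCH<n>`) is given below. How the real Apply* functions match IDs to devices is not shown in this diff, so returning the serial alongside the slot is an assumption. The function name and the sample ID `NVSWITCH_NVSWITCH2_PCIID` are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same patterns as gpuComponentIDRegex and nvswitchInventoryComponentRegex.
var (
	gpuComponentID      = regexp.MustCompile(`^SXM(\d+)_SN_(.+)$`)
	nvswitchComponentID = regexp.MustCompile(`^NVSWITCH_(NVSWITCH\d+)_`)
)

// slotForComponent maps a summary component ID to a slot name, plus the
// serial number for GPU IDs. ok is false for unrecognized IDs.
func slotForComponent(id string) (slot, serial string, ok bool) {
	if m := gpuComponentID.FindStringSubmatch(id); m != nil {
		return "GPUSXM" + m[1], m[2], true
	}
	if m := nvswitchComponentID.FindStringSubmatch(id); m != nil {
		return m[1], "", true
	}
	return "", "", false
}

func main() {
	for _, id := range []string{"SXM5_SN_1653925025497", "NVSWITCH_NVSWITCH2_PCIID", "CPU0"} {
		slot, serial, ok := slotForComponent(id)
		fmt.Printf("%s -> slot=%q serial=%q ok=%v\n", id, slot, serial, ok)
	}
}
```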
// ParseSummaryJSON parses summary.json file and returns events
func ParseSummaryJSON(content []byte) []models.Event {
var entries []SummaryEntry
@@ -92,6 +96,340 @@ func ParseSummaryCSV(content []byte) []models.Event {
return events
}
// CollectGPUStatusesFromSummaryJSON extracts per-GPU PASS/FAIL status from summary.json.
// Key format in returned map is component ID from summary (e.g. "SXM5_SN_1653925025497").
func CollectGPUStatusesFromSummaryJSON(content []byte) map[string]string {
var entries []SummaryEntry
if err := json.Unmarshal(content, &entries); err != nil {
return nil
}
statuses := make(map[string]string)
for _, entry := range entries {
component := strings.TrimSpace(entry.ComponentID)
if component == "" || !gpuComponentIDRegex.MatchString(component) {
continue
}
current := statuses[component]
next := "PASS"
if !isSummaryJSONRecordPassing(entry.ErrorCode, entry.Notes) {
next = "FAIL"
}
statuses[component] = mergeGPUStatus(current, next)
}
return statuses
}
// CollectGPUFailureDetailsFromSummaryJSON extracts per-GPU failure details from summary.json.
// Key format in returned map is component ID from summary (e.g. "SXM5_SN_1653925025497").
func CollectGPUFailureDetailsFromSummaryJSON(content []byte) map[string]string {
var entries []SummaryEntry
if err := json.Unmarshal(content, &entries); err != nil {
return nil
}
details := make(map[string]string)
for _, entry := range entries {
component := strings.TrimSpace(entry.ComponentID)
if component == "" || !gpuComponentIDRegex.MatchString(component) {
continue
}
if isSummaryJSONRecordPassing(entry.ErrorCode, entry.Notes) {
continue
}
note := strings.TrimSpace(entry.Notes)
if note == "" || strings.EqualFold(note, "OK") {
note = strings.TrimSpace(entry.ErrorCode)
}
if note == "" {
continue
}
// Keep first non-empty detail to avoid noisy overrides.
if _, exists := details[component]; !exists {
details[component] = note
}
}
return details
}
// CollectGPUStatusesFromSummaryCSV extracts per-GPU PASS/FAIL status from summary.csv.
// Key format in returned map is component ID from summary (e.g. "SXM5_SN_1653925025497").
func CollectGPUStatusesFromSummaryCSV(content []byte) map[string]string {
reader := csv.NewReader(strings.NewReader(string(content)))
records, err := reader.ReadAll()
if err != nil {
return nil
}
statuses := make(map[string]string)
for i, record := range records {
if i == 0 || len(record) < 7 {
continue
}
component := strings.TrimSpace(record[5])
if component == "" || !gpuComponentIDRegex.MatchString(component) {
continue
}
errorCode := strings.TrimSpace(record[0])
notes := strings.TrimSpace(record[6])
current := statuses[component]
next := "PASS"
if !isSummaryCSVRecordPassing(errorCode, notes) {
next = "FAIL"
}
statuses[component] = mergeGPUStatus(current, next)
}
return statuses
}
// CollectNVSwitchStatusesFromSummaryJSON extracts per-NVSwitch PASS/FAIL status from summary.json.
// Key format in returned map is normalized switch slot (e.g. "NVSWITCH0").
func CollectNVSwitchStatusesFromSummaryJSON(content []byte) map[string]string {
var entries []SummaryEntry
if err := json.Unmarshal(content, &entries); err != nil {
return nil
}
statuses := make(map[string]string)
for _, entry := range entries {
component := strings.TrimSpace(entry.ComponentID)
matches := nvswitchInventoryComponentRegex.FindStringSubmatch(component)
if len(matches) != 2 {
continue
}
slot := strings.TrimSpace(matches[1])
if slot == "" {
continue
}
current := statuses[slot]
next := "PASS"
if !isSummaryJSONRecordPassing(entry.ErrorCode, entry.Notes) {
next = "FAIL"
}
statuses[slot] = mergeGPUStatus(current, next)
}
return statuses
}
// CollectNVSwitchStatusesFromSummaryCSV extracts per-NVSwitch PASS/FAIL status from summary.csv.
// Key format in returned map is normalized switch slot (e.g. "NVSWITCH0").
func CollectNVSwitchStatusesFromSummaryCSV(content []byte) map[string]string {
reader := csv.NewReader(strings.NewReader(string(content)))
records, err := reader.ReadAll()
if err != nil {
return nil
}
statuses := make(map[string]string)
for i, record := range records {
if i == 0 || len(record) < 7 {
continue
}
component := strings.TrimSpace(record[5])
matches := nvswitchInventoryComponentRegex.FindStringSubmatch(component)
if len(matches) != 2 {
continue
}
slot := strings.TrimSpace(matches[1])
if slot == "" {
continue
}
errorCode := strings.TrimSpace(record[0])
notes := strings.TrimSpace(record[6])
current := statuses[slot]
next := "PASS"
if !isSummaryCSVRecordPassing(errorCode, notes) {
next = "FAIL"
}
statuses[slot] = mergeGPUStatus(current, next)
}
return statuses
}
// CollectGPUFailureDetailsFromSummaryCSV extracts per-GPU failure details from summary.csv.
// Key format in returned map is component ID from summary (e.g. "SXM5_SN_1653925025497").
func CollectGPUFailureDetailsFromSummaryCSV(content []byte) map[string]string {
reader := csv.NewReader(strings.NewReader(string(content)))
records, err := reader.ReadAll()
if err != nil {
return nil
}
details := make(map[string]string)
for i, record := range records {
if i == 0 || len(record) < 7 {
continue
}
component := strings.TrimSpace(record[5])
if component == "" || !gpuComponentIDRegex.MatchString(component) {
continue
}
errorCode := strings.TrimSpace(record[0])
notes := strings.TrimSpace(record[6])
if isSummaryCSVRecordPassing(errorCode, notes) {
continue
}
note := notes
if note == "" || strings.EqualFold(note, "OK") {
note = errorCode
}
if note == "" {
continue
}
if _, exists := details[component]; !exists {
details[component] = note
}
}
return details
}
func isSummaryJSONRecordPassing(errorCode, notes string) bool {
_ = errorCode
return strings.TrimSpace(notes) == "OK"
}
func isSummaryCSVRecordPassing(errorCode, notes string) bool {
_ = errorCode
return strings.TrimSpace(notes) == "OK"
}
func mergeGPUStatus(current, next string) string {
// FAIL has highest priority.
if current == "FAIL" || next == "FAIL" {
return "FAIL"
}
if current == "PASS" || next == "PASS" {
return "PASS"
}
return ""
}
// ApplyGPUStatuses applies aggregated PASS/FAIL statuses from summary components to parsed GPUs.
func ApplyGPUStatuses(result *models.AnalysisResult, componentStatuses map[string]string) {
if result == nil || result.Hardware == nil || len(result.Hardware.GPUs) == 0 || len(componentStatuses) == 0 {
return
}
slotStatus := make(map[string]string) // key: GPUSXM<idx>
serialStatus := make(map[string]string) // key: GPU serial
for componentID, status := range componentStatuses {
matches := gpuComponentIDRegex.FindStringSubmatch(strings.TrimSpace(componentID))
if len(matches) != 3 {
continue
}
slotKey := "GPUSXM" + matches[1]
serialKey := strings.TrimSpace(matches[2])
slotStatus[slotKey] = mergeGPUStatus(slotStatus[slotKey], status)
if serialKey != "" {
serialStatus[serialKey] = mergeGPUStatus(serialStatus[serialKey], status)
}
}
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
next := ""
if serial := strings.TrimSpace(gpu.SerialNumber); serial != "" {
next = serialStatus[serial]
}
if next == "" {
next = slotStatus[strings.TrimSpace(gpu.Slot)]
}
if next != "" {
gpu.Status = next
}
}
}
// ApplyNVSwitchStatuses applies aggregated PASS/FAIL statuses from summary components to parsed NVSwitch devices.
func ApplyNVSwitchStatuses(result *models.AnalysisResult, switchStatuses map[string]string) {
if result == nil || result.Hardware == nil || len(result.Hardware.PCIeDevices) == 0 || len(switchStatuses) == 0 {
return
}
for i := range result.Hardware.PCIeDevices {
dev := &result.Hardware.PCIeDevices[i]
slot := normalizeNVSwitchSlot(strings.TrimSpace(dev.Slot))
if slot == "" {
continue
}
if !strings.HasPrefix(strings.ToUpper(slot), "NVSWITCH") {
continue
}
if st := switchStatuses[slot]; st != "" {
dev.Status = st
}
}
}
// ApplyGPUFailureDetails maps parsed failure details from summary components to GPUs.
func ApplyGPUFailureDetails(result *models.AnalysisResult, componentDetails map[string]string) {
if result == nil || result.Hardware == nil || len(result.Hardware.GPUs) == 0 || len(componentDetails) == 0 {
return
}
slotDetails := make(map[string]string) // key: GPUSXM<idx>
serialDetails := make(map[string]string) // key: GPU serial
for componentID, detail := range componentDetails {
matches := gpuComponentIDRegex.FindStringSubmatch(strings.TrimSpace(componentID))
if len(matches) != 3 {
continue
}
detail = strings.TrimSpace(detail)
if detail == "" {
continue
}
slotKey := "GPUSXM" + matches[1]
serialKey := strings.TrimSpace(matches[2])
if _, exists := slotDetails[slotKey]; !exists {
slotDetails[slotKey] = detail
}
if serialKey != "" {
if _, exists := serialDetails[serialKey]; !exists {
serialDetails[serialKey] = detail
}
}
}
for i := range result.Hardware.GPUs {
gpu := &result.Hardware.GPUs[i]
detail := ""
if serial := strings.TrimSpace(gpu.SerialNumber); serial != "" {
detail = serialDetails[serial]
}
if detail == "" {
detail = slotDetails[strings.TrimSpace(gpu.Slot)]
}
if detail != "" {
gpu.ErrorDescription = detail
}
}
}
// formatSummaryDescription creates a human-readable description from summary entry
func formatSummaryDescription(entry SummaryEntry) string {
component := entry.ComponentID

@@ -0,0 +1,122 @@
package nvidia
import (
"strings"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestApplyGPUStatuses_FromSummaryCSV_FailAndPass(t *testing.T) {
csvData := strings.Join([]string{
"ErrorCode,Test,VirtualID,SubTest,Type,ComponentID,Notes,Level,,,IgnoreError",
"0,gpumem,gpumem,,GPU,SXM1_SN_111,OK,1,,,False",
"363,gpumem,gpumem,,GPU,SXM5_SN_1653925025497,Row remapping failed,1,,,False",
"0,gpu_fieldiag,gpu_fieldiag,,GPU,SXM1_SN_111,OK,1,,,False",
"0,gpu_fieldiag,gpu_fieldiag,,GPU,SXM2_SN_222,OK,1,,,False",
}, "\n")
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "GPUSXM1", SerialNumber: "111"},
{Slot: "GPUSXM2", SerialNumber: "222"},
{Slot: "GPUSXM5", SerialNumber: "1653925025497"},
},
},
}
statuses := CollectGPUStatusesFromSummaryCSV([]byte(csvData))
ApplyGPUStatuses(result, statuses)
bySerial := map[string]string{}
for _, gpu := range result.Hardware.GPUs {
bySerial[gpu.SerialNumber] = gpu.Status
}
if bySerial["1653925025497"] != "FAIL" {
t.Fatalf("expected serial 1653925025497 status FAIL, got %q", bySerial["1653925025497"])
}
if bySerial["111"] != "PASS" {
t.Fatalf("expected serial 111 status PASS, got %q", bySerial["111"])
}
if bySerial["222"] != "PASS" {
t.Fatalf("expected serial 222 status PASS, got %q", bySerial["222"])
}
}
func TestApplyGPUFailureDetails_FromSummaryJSON_BySerial(t *testing.T) {
jsonData := []byte(`[
{
"Error Code": "005-000-1-000000000363",
"Test": "gpumem",
"Component ID": "SXM5_SN_1653925025497",
"Notes": "Row remapping failed",
"Virtual ID": "gpumem",
"Ignore Error": "False"
}
]`)
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
GPUs: []models.GPU{
{Slot: "GPUSXM5", SerialNumber: "1653925025497"},
{Slot: "GPUSXM2", SerialNumber: "1653925024190"},
},
},
}
details := CollectGPUFailureDetailsFromSummaryJSON(jsonData)
ApplyGPUFailureDetails(result, details)
if got := result.Hardware.GPUs[0].ErrorDescription; got != "Row remapping failed" {
t.Fatalf("expected serial 1653925025497 error Row remapping failed, got %q", got)
}
if got := result.Hardware.GPUs[1].ErrorDescription; got != "" {
t.Fatalf("expected no error description for healthy GPU, got %q", got)
}
}
func TestApplyNVSwitchStatuses_FromSummaryJSON(t *testing.T) {
jsonData := []byte(`[
{
"Error Code": "0",
"Test": "inventory",
"Component ID": "NVSWITCH_NVSWITCH0_VendorID",
"Notes": "OK",
"Virtual ID": "inventory",
"Ignore Error": "False"
},
{
"Error Code": "1",
"Test": "inventory",
"Component ID": "NVSWITCH_NVSWITCH1_LinkState",
"Notes": "Link down",
"Virtual ID": "inventory",
"Ignore Error": "False"
}
]`)
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{
PCIeDevices: []models.PCIeDevice{
{Slot: "NVSWITCH0", Status: "Unknown"},
{Slot: "NVSWITCH1", Status: "Unknown"},
{Slot: "NVSWITCH2", Status: "Unknown"},
},
},
}
statuses := CollectNVSwitchStatusesFromSummaryJSON(jsonData)
ApplyNVSwitchStatuses(result, statuses)
if got := result.Hardware.PCIeDevices[0].Status; got != "PASS" {
t.Fatalf("expected NVSWITCH0 status PASS, got %q", got)
}
if got := result.Hardware.PCIeDevices[1].Status; got != "FAIL" {
t.Fatalf("expected NVSWITCH1 status FAIL, got %q", got)
}
if got := result.Hardware.PCIeDevices[2].Status; got != "Unknown" {
t.Fatalf("expected NVSWITCH2 status unchanged Unknown, got %q", got)
}
}
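The summary rules in this change can be read in isolation. A row counts as passing only when its `Notes` field is exactly `OK`, and the error code is ignored. Only component IDs of the form `SXM<n>_SN_<serial>` are treated as GPUs. Any FAIL for a component outranks PASS. The sketch below restates that logic outside the parser. `record` and `aggregateBySerial` are hypothetical names, and the regex and merge function only mirror `gpuComponentIDRegex` and `mergeGPUStatus`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Mirrors gpuComponentIDRegex: "SXM<idx>_SN_<serial>".
var componentIDRegex = regexp.MustCompile(`^SXM(\d+)_SN_(.+)$`)

// mergeStatus mirrors mergeGPUStatus: FAIL outranks PASS, "" means no verdict yet.
func mergeStatus(current, next string) string {
	if current == "FAIL" || next == "FAIL" {
		return "FAIL"
	}
	if current == "PASS" || next == "PASS" {
		return "PASS"
	}
	return ""
}

// record is a hypothetical, trimmed-down summary row.
type record struct {
	ComponentID string
	Notes       string
}

// aggregateBySerial folds per-test rows into one verdict per GPU serial.
// A row passes only when its notes are exactly "OK".
func aggregateBySerial(rows []record) map[string]string {
	out := make(map[string]string)
	for _, r := range rows {
		m := componentIDRegex.FindStringSubmatch(strings.TrimSpace(r.ComponentID))
		if len(m) != 3 {
			continue // not a GPU row (e.g. an NVSwitch inventory check)
		}
		verdict := "PASS"
		if strings.TrimSpace(r.Notes) != "OK" {
			verdict = "FAIL"
		}
		out[m[2]] = mergeStatus(out[m[2]], verdict)
	}
	return out
}

func main() {
	rows := []record{
		{ComponentID: "SXM1_SN_111", Notes: "OK"},
		{ComponentID: "SXM5_SN_1653925025497", Notes: "Row remapping failed"},
		{ComponentID: "SXM5_SN_1653925025497", Notes: "OK"},
	}
	fmt.Println(aggregateBySerial(rows)) // map[111:PASS 1653925025497:FAIL]
}
```

`ApplyGPUStatuses` also keeps a slot-keyed map (`GPUSXM<n>`). It falls back to that map when a GPU's `SerialNumber` has no entry.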

@@ -3,6 +3,7 @@ package nvidia
import (
"encoding/json"
"fmt"
"regexp"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
@@ -53,6 +54,8 @@ type Property struct {
Value interface{} `json:"value"` // Can be string or number
}
var nvswitchComponentIDRegex = regexp.MustCompile(`^(NVSWITCH\d+|NVSWITCHNVSWITCH\d+)$`)
// GetValueAsString returns the value as a string
func (p *Property) GetValueAsString() string {
switch v := p.Value.(type) {
@@ -107,7 +110,7 @@ func parseInventoryComponents(components []Component, result *models.AnalysisRes
}
// Parse NVSwitch components
-if strings.HasPrefix(comp.ComponentID, "NVSWITCHNVSWITCH") {
+if isNVSwitchComponentID(comp.ComponentID) {
nvswitch := parseNVSwitchComponent(comp)
if nvswitch != nil {
// Add as PCIe device for now
@@ -152,7 +155,7 @@ func parseSystemInfo(comp Component, result *models.AnalysisResult) bool {
// Don't overwrite real data from output.log with generic data
// Only set if empty or still has the default placeholder value
if result.Hardware.BoardInfo.ProductName == "" ||
-result.Hardware.BoardInfo.ProductName == "GPU Server (Field Diag)" {
+result.Hardware.BoardInfo.ProductName == "GPU Server (Field Diag)" {
result.Hardware.BoardInfo.ProductName = value
}
case "SerialNumber", "Serial", "BoardSerial", "SystemSerial":
@@ -183,6 +186,9 @@ func parseGPUComponent(comp Component) *models.GPU {
switch prop.ID {
case "DeviceID":
deviceID = prop.GetValueAsString()
if deviceID != "" {
fmt.Sscanf(deviceID, "%x", &gpu.DeviceID)
}
case "Vendor":
gpu.Manufacturer = prop.GetValueAsString()
case "DeviceName":
@@ -217,7 +223,7 @@ func parseGPUComponent(comp Component) *models.GPU {
// parseNVSwitchComponent parses NVSwitch component information
func parseNVSwitchComponent(comp Component) *models.PCIeDevice {
device := &models.PCIeDevice{
-Slot: comp.ComponentID, // e.g., "NVSWITCHNVSWITCH0"
+Slot: normalizeNVSwitchSlot(comp.ComponentID),
}
var vendorIDStr, deviceIDStr, vbios, pciID string
@@ -279,3 +285,15 @@ func parseNVSwitchComponent(comp Component) *models.PCIeDevice {
return device
}
func normalizeNVSwitchSlot(componentID string) string {
slot := strings.TrimSpace(componentID)
if strings.HasPrefix(slot, "NVSWITCHNVSWITCH") {
return strings.Replace(slot, "NVSWITCHNVSWITCH", "NVSWITCH", 1)
}
return slot
}
func isNVSwitchComponentID(componentID string) bool {
return nvswitchComponentIDRegex.MatchString(strings.TrimSpace(componentID))
}
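The inventory parser now accepts a component only when its whole ID matches `nvswitchComponentIDRegex`. It stores the slot with the duplicated `NVSWITCHNVSWITCH` prefix collapsed. The standalone sketch below shows which IDs from the summary and inventory data pass the filter. The two helpers are renamed here to the hypothetical `isSwitchID` and `normalizeSlot`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same pattern as nvswitchComponentIDRegex: the whole ID must be either
// "NVSWITCH<n>" or the duplicated "NVSWITCHNVSWITCH<n>".
var switchIDRegex = regexp.MustCompile(`^(NVSWITCH\d+|NVSWITCHNVSWITCH\d+)$`)

// isSwitchID reports whether an inventory component describes a switch device.
func isSwitchID(id string) bool {
	return switchIDRegex.MatchString(strings.TrimSpace(id))
}

// normalizeSlot collapses the duplicated prefix once, as normalizeNVSwitchSlot does.
func normalizeSlot(id string) string {
	id = strings.TrimSpace(id)
	if strings.HasPrefix(id, "NVSWITCHNVSWITCH") {
		return strings.Replace(id, "NVSWITCHNVSWITCH", "NVSWITCH", 1)
	}
	return id
}

func main() {
	for _, id := range []string{
		"NVSWITCHNVSWITCH1",           // device record: accepted, slot NVSWITCH1
		"NVSWITCH0",                   // already-normalized form: accepted
		"NVSWITCHNum",                 // switch count property: rejected
		"NVSWITCH_NVSWITCH1_VendorID", // per-property check: rejected
	} {
		fmt.Printf("%-28s accepted=%-5t slot=%s\n", id, isSwitchID(id), normalizeSlot(id))
	}
}
```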

@@ -0,0 +1,46 @@
package nvidia
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestParseInventoryComponents_IgnoresNVSwitchPropertyChecks(t *testing.T) {
result := &models.AnalysisResult{
Hardware: &models.HardwareConfig{},
}
components := []Component{
{
ComponentID: "NVSWITCHNVSWITCH1",
Properties: []Property{
{ID: "VendorID", Value: "10de"},
{ID: "DeviceID", Value: "22a3"},
{ID: "PCIID", Value: "0000:06:00.0"},
},
},
{
ComponentID: "NVSWITCHNum",
Properties: []Property{
{ID: "NVSWITCHNum", Value: 4},
},
},
{
ComponentID: "NVSWITCH_NVSWITCH1_VendorID",
Properties: []Property{
{ID: "NVSWITCH_NVSWITCH1_VendorID", Value: "10de"},
},
},
}
parseInventoryComponents(components, result)
if got := len(result.Hardware.PCIeDevices); got != 1 {
t.Fatalf("expected exactly 1 parsed NVSwitch device, got %d", got)
}
if result.Hardware.PCIeDevices[0].Slot != "NVSWITCH1" {
t.Fatalf("expected slot NVSWITCH1, got %q", result.Hardware.PCIeDevices[0].Slot)
}
}

@@ -0,0 +1,35 @@
package nvidia
import "testing"
func TestParseNVSwitchComponent_NormalizesDuplicatedPrefixInSlot(t *testing.T) {
comp := Component{
ComponentID: "NVSWITCHNVSWITCH1",
Properties: []Property{
{ID: "VendorID", Value: "10de"},
{ID: "DeviceID", Value: "22a3"},
{ID: "Vendor", Value: "NVIDIA Corporation"},
{ID: "PCIID", Value: "0000:06:00.0"},
{ID: "PCISpeed", Value: "16GT/s"},
{ID: "PCIWidth", Value: "x2"},
{ID: "VBIOS_version", Value: "96.10.6D.00.01"},
},
}
device := parseNVSwitchComponent(comp)
if device == nil {
t.Fatal("expected non-nil NVSwitch device")
}
if device.Slot != "NVSWITCH1" {
t.Fatalf("expected normalized slot NVSWITCH1, got %q", device.Slot)
}
if device.BDF != "0000:06:00.0" {
t.Fatalf("expected BDF 0000:06:00.0, got %q", device.BDF)
}
if device.DeviceClass != "NVSwitch" {
t.Fatalf("expected device class NVSwitch, got %q", device.DeviceClass)
}
}
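`parseGPUComponent` now decodes the hexadecimal `DeviceID` property with `fmt.Sscanf(deviceID, "%x", &gpu.DeviceID)` and discards its return values. As a result, a malformed value leaves the field unchanged without any signal. If explicit validation is preferred, `strconv.ParseUint` with base 16 and a 16-bit size is one option. The `parseHexID` helper below is hypothetical. Accepting an optional `0x` prefix is an assumption about possible input, not something the diff shows.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseHexID is a hypothetical helper: it accepts "22a3" or "0x10DE" and
// returns an error for malformed or out-of-range input instead of ignoring it.
func parseHexID(s string) (int, error) {
	s = strings.TrimSpace(s)
	s = strings.TrimPrefix(strings.TrimPrefix(s, "0x"), "0X")
	v, err := strconv.ParseUint(s, 16, 16) // PCI vendor/device IDs are 16 bits wide
	if err != nil {
		return 0, fmt.Errorf("parse PCI ID %q: %w", s, err)
	}
	return int(v), nil
}

func main() {
	for _, in := range []string{"22a3", "0x10de", "zz"} {
		id, err := parseHexID(in)
		fmt.Printf("%-8s id=%#04x err=%v\n", in, id, err)
	}
}
```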

@@ -1,275 +0,0 @@
# NVIDIA Bug Report Parser
Parser for nvidia-bug-report files generated by the `nvidia-bug-report.sh` script.
## Purpose
This parser processes NVIDIA driver diagnostic logs and extracts:
- Memory module information (from dmidecode)
- GPU device information
- The NVIDIA driver version
## File format
- File name: `nvidia-bug-report-*.log.gz`
- Format: gzip-compressed text file
- Generated by: the `nvidia-bug-report.sh` script
## Confidence Score
**85** - high priority for nvidia-bug-report files
## Extracted data
### 1. System Information (from dmidecode)
Server information:
- **Serial Number**: Server serial number (e.g., 2KD501412)
- **UUID**: Unique system identifier (e.g., 2e4054bc-1dd2-11b2-0284-6b0a21737950)
- **Manufacturer**: Server manufacturer
- **Product Name**: Server model
- **Version**: System version
### 2. CPU Information (from dmidecode)
For each processor, the parser extracts:
- **Model**: Processor model (e.g., Intel(R) Xeon(R) Platinum 8480+)
- **Serial Number**: Serial number (e.g., 5DB0D6C0DD30ABD8)
- **Core Count**: Number of cores (e.g., 56)
- **Thread Count**: Number of threads (e.g., 112)
- **Max Speed**: Maximum frequency (e.g., 3800 MHz)
- **Current Speed**: Current frequency (e.g., 2000 MHz)
Example:
```
Socket 0: Intel(R) Xeon(R) Platinum 8480+
Serial Number: 5DB0D6C0DD30ABD8
Cores: 56, Threads: 112
Frequency: 2000 MHz (Max: 3800 MHz)
```
### 3. Memory Modules (from dmidecode)
For each memory module, the parser extracts:
- **Slot/Location**: Slot locator (e.g., CPU0_C0D0)
- **Size**: Size in GB (e.g., 64 GB)
- **Type**: Memory type (DDR5, DDR4, etc.)
- **Manufacturer**: Manufacturer (Hynix, Samsung, Micron, etc.)
- **Part Number**: Module P/N (e.g., HMCG94AGBRA179N)
- **Serial Number**: Module S/N (e.g., 80AD0224322B3834E6)
- **Speed**: Maximum/current speed (e.g., 5600/4400 MHz)
- **Ranks**: Number of ranks
Example:
```
Slot: CPU0_C0D0
Size: 64 GB
Type: DDR5
Manufacturer: Hynix
Part Number: HMCG94AGBRA179N
Serial Number: 80AD0224322B3834E6
Speed: 5600 MT/s (configured: 4400 MT/s)
Ranks: 2
```
### 4. Power Supplies (from dmidecode)
For each power supply, the parser extracts:
- **Location**: Position (e.g., PSU0, PSU1)
- **Manufacturer**: Manufacturer (e.g., DELTA, Great Wall)
- **Model Part Number**: PSU model (e.g., V0310DT000000000)
- **Serial Number**: Serial number (e.g., DGPLV251500LZ)
- **Max Power Capacity**: Maximum output power (e.g., 2700 W)
- **Revision**: Firmware revision (e.g., 00.01.04)
- **Status**: Status (e.g., Present, OK)
Example:
```
PSU0: V0310DT000000000 (DELTA)
Serial Number: DGPLV251500LZ
Power: 2700 W, Revision: 00.01.04
Status: Present, OK
```
### 5. Network Adapters (from lspci)
For each network adapter (Ethernet, Network, or InfiniBand controller), the parser extracts:
- **Model**: Full model name from VPD (e.g., "NVIDIA ConnectX-7 HHHL Adapter card, 400GbE / NDR IB (default mode), Single-port OSFP, PCIe 5.0 x16")
- **Location**: PCI BDF address (e.g., 0000:0e:00.0)
- **Slot**: Physical slot (e.g., 108)
- **Part Number**: Adapter P/N (e.g., MCX75310AAS-NEAT)
- **Serial Number**: Adapter S/N (e.g., MT2430600249)
- **Vendor**: Manufacturer (Mellanox, NVIDIA)
- **Vendor ID / Device ID**: PCI identifiers (e.g., 15b3:1021)
- **Port Count**: Number of ports (derived from the model name: Dual-port = 2, Single-port = 1)
- **Port Type**: Port type (QSFP56, OSFP, SFP+)
Example:
```
0000:0e:00.0: NVIDIA ConnectX-7 HHHL Adapter card, 400GbE / NDR IB (default mode), Single-port OSFP
Slot: 108
P/N: MCX75310AAS-NEAT
S/N: MT2430600249
Ports: 1 x OSFP
```
### 6. GPU Devices
For each GPU, the parser extracts:
- **Model**: GPU model (e.g., NVIDIA H100 80GB HBM3)
- **BDF (Bus:Device.Function)**: PCI address (e.g., 0000:0f:00.0)
- **UUID**: Unique GPU identifier (e.g., GPU-64674e47-e036-c12a-3e8d-55a2a9ac8db3)
- **Video BIOS**: GPU VBIOS version (e.g., 96.00.99.00.01)
- **IRQ**: Interrupt number (e.g., 17)
- **Bus Type**: Bus type (PCIe)
- **DMA Size**: DMA address width (e.g., 52 bits)
- **DMA Mask**: DMA mask (e.g., 0xfffffffffffff)
- **Device Minor**: Device minor number (e.g., 0)
- **Manufacturer**: NVIDIA
Example:
```
0000:0f:00.0: NVIDIA H100 80GB HBM3
UUID: GPU-64674e47-e036-c12a-3e8d-55a2a9ac8db3
Video BIOS: 96.00.99.00.01
IRQ: 17
```
### 7. Events
- **Memory Configuration**: Summary of memory modules (count, manufacturers, total size)
- **GPU Detection**: Detected GPU devices
- **Driver Version**: NVIDIA driver version
## Usage example
```bash
# Run against an nvidia-bug-report file
./logpile --file nvidia-bug-report-2KD501412.log.gz
# The web interface will be available at http://localhost:8082
```
## Example output
```
✓ Detected vendor: NVIDIA Bug Report Parser
✓ CPUs: 2
✓ Memory: 32 modules
✓ Power Supplies: 8
✓ GPUs: 8
✓ Network Adapters: 12
System Information:
Serial Number: 2KD501412
UUID: 2e4054bc-1dd2-11b2-0284-6b0a21737950
Version: 0
CPU Information:
Socket 0: Intel(R) Xeon(R) Platinum 8480+
S/N: 5DB0D6C0DD30ABD8, Cores: 56, Threads: 112
Socket 1: Intel(R) Xeon(R) Platinum 8480+
S/N: 5DB017C05685B3ED, Cores: 56, Threads: 112
Power Supplies:
PSU0: V0310DT000000000 (DELTA)
S/N: DGPLV251500LZ
Power: 2700 W, Revision: 00.01.04
Status: Present, OK
PSU1: V0310DT000000000 (DELTA)
S/N: DGPLV251500GY
Power: 2700 W, Revision: 00.01.04
Status: Present, OK
[... 6 more PSUs ...]
Memory Modules:
CPU0_C0D0: 64 GB, Hynix
P/N: HMCG94AGBRA179N, S/N: 80AD0224322B3834E6
Type: DDR5, Speed: 4400/5600 MHz
[... 31 more modules ...]
Network Adapters: 12 devices
0000:0e:00.0: NVIDIA ConnectX-7 HHHL Adapter card, 400GbE / NDR IB (default mode), Single-port OSFP
Slot: 108
P/N: MCX75310AAS-NEAT
S/N: MT2430600249
Ports: 1 x OSFP
0000:1f:00.0: ConnectX-6 Dx EN adapter card, 100GbE, Dual-port QSFP56
Slot: 12
P/N: MCX623106AN-CDAT
S/N: MT2434J00PCD
Ports: 2 x QSFP56
[... 10 more adapters ...]
GPUs: 8 devices
0000:0f:00.0: NVIDIA H100 80GB HBM3
UUID: GPU-64674e47-e036-c12a-3e8d-55a2a9ac8db3
Video BIOS: 96.00.99.00.01
IRQ: 17
0000:34:00.0: NVIDIA H100 80GB HBM3
UUID: GPU-fa796345-c23a-54aa-1b67-709ac2542852
Video BIOS: 96.00.99.00.01
IRQ: 16
[... 6 more GPUs ...]
```
## Versioning
**Current parser version:** 1.0.0
### Version history
- **1.0.0** - Initial version, parsing System Info, CPU, Memory, PSU, GPU, Network Adapters, and Driver
## Input sections
The parser reads the following sections of the bug report:
1. **dmidecode output (System Information)** - server information
2. **dmidecode output (Processor Information)** - CPU information
3. **dmidecode output (Memory Device)** - memory module information
4. **dmidecode output (System Power Supply)** - power supply information
5. **lspci -vvv output (Ethernet/Network/Infiniband controller)** - network adapter information
6. **lspci VPD (Vital Product Data)** - network adapter P/N, S/N, and model
7. **/proc/driver/nvidia/gpus/.../information** - detailed GPU information
8. **NVRM version** - driver version
## Known limitations
1. Errors and warnings in the logs are not extracted yet
2. Some GPU-specific characteristics (temperature, utilization) are not parsed
3. GPU performance data and metrics would require parsing other sections
## Extending the parser
Possible additions:
1. **Driver errors**: parse the NVIDIA driver error sections
2. **nvidia-smi output**: extract detailed information from nvidia-smi output (temperature, utilization)
3. **GPU performance**: parse GPU performance and memory-usage metrics
4. **PCIe information**: extract PCIe configuration details (link speed, width)
## Example file structure
```
Start of NVIDIA bug report log file
nvidia-bug-report.sh Version: 34275561
Date: Thu Jul 17 18:18:18 EDT 2025
[... system info ...]
Memory Device
Data Width: 64 bits
Size: 64 GB
Form Factor: DIMM
Locator: CPU0_C0D0
Type: DDR5
Speed: 5600 MT/s
Manufacturer: Hynix
Serial Number: 80AD0224322B3834E6
Part Number: HMCG94AGBRA179N
[... more memory modules ...]
*** /proc/driver/nvidia/./gpus/0000:0f:00.0/power
[... GPU info ...]
```
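The memory-module extraction described in the README above reads dmidecode `Memory Device` records. The following is a minimal sketch of that kind of parsing, not the parser's actual code. It assumes that records are separated by blank lines and that fields are indented `Key: Value` lines, as in dmidecode output. `memoryDevice` and `parseMemoryDevices` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// memoryDevice keeps a hypothetical subset of dmidecode "Memory Device" fields.
type memoryDevice struct {
	Locator, Size, Type, Manufacturer, SerialNumber, PartNumber string
}

// parseMemoryDevices collects one memoryDevice per "Memory Device" header.
// Fields are "Key: Value" lines; a blank line closes the current record.
func parseMemoryDevices(text string) []memoryDevice {
	var out []memoryDevice
	var cur *memoryDevice
	flush := func() {
		if cur != nil {
			out = append(out, *cur)
			cur = nil
		}
	}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "Memory Device":
			flush()
			cur = &memoryDevice{}
			continue
		case line == "":
			flush()
			continue
		case cur == nil:
			continue // outside a Memory Device record
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		value = strings.TrimSpace(value)
		switch strings.TrimSpace(key) { // exact match, so "Bank Locator" is not "Locator"
		case "Locator":
			cur.Locator = value
		case "Size":
			cur.Size = value
		case "Type":
			cur.Type = value
		case "Manufacturer":
			cur.Manufacturer = value
		case "Serial Number":
			cur.SerialNumber = value
		case "Part Number":
			cur.PartNumber = value
		}
	}
	flush()
	return out
}

func main() {
	sample := "Memory Device\n" +
		"\tSize: 64 GB\n\tLocator: CPU0_C0D0\n\tType: DDR5\n" +
		"\tManufacturer: Hynix\n\tSerial Number: 80AD0224322B3834E6\n\tPart Number: HMCG94AGBRA179N\n"
	for _, d := range parseMemoryDevices(sample) {
		fmt.Printf("%s: %s %s, %s, P/N %s, S/N %s\n", d.Locator, d.Size, d.Type, d.Manufacturer, d.PartNumber, d.SerialNumber)
	}
}
```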

@@ -106,6 +106,8 @@ func parseGPUInfo(content string, result *models.AnalysisResult) {
result.Hardware.GPUs = append(result.Hardware.GPUs, *currentGPU)
}
applyGPUSerialNumbers(content, result.Hardware.GPUs)
// Create event for GPU summary
if len(result.Hardware.GPUs) > 0 {
result.Events = append(result.Events, models.Event{
@@ -168,3 +170,138 @@ func formatGPUSummary(gpus []models.GPU) string {
return summary.String()
}
func applyGPUSerialNumbers(content string, gpus []models.GPU) {
if len(gpus) == 0 {
return
}
serialByBDF := parseGPUSerialsFromNvidiaSMI(content)
if len(serialByBDF) == 0 {
serialByBDF = parseGPUSerialsFromSummary(content)
}
if len(serialByBDF) == 0 {
return
}
for i := range gpus {
bdf := normalizeGPUAddress(gpus[i].BDF)
if bdf == "" {
continue
}
if serial, ok := serialByBDF[bdf]; ok && serial != "" {
gpus[i].SerialNumber = serial
}
}
}
func parseGPUSerialsFromNvidiaSMI(content string) map[string]string {
scanner := bufio.NewScanner(strings.NewReader(content))
reGPU := regexp.MustCompile(`^GPU\s+([0-9A-F]{8}:[0-9A-F]{2}:[0-9A-F]{2}\.[0-9A-F])$`)
serialByBDF := make(map[string]string)
currentBDF := ""
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if line == "" {
continue
}
if matches := reGPU.FindStringSubmatch(line); len(matches) == 2 {
currentBDF = normalizeGPUAddress(matches[1])
continue
}
if currentBDF == "" {
continue
}
if strings.HasPrefix(line, "Serial Number") {
parts := strings.SplitN(line, ":", 2)
if len(parts) != 2 {
continue
}
serial := strings.TrimSpace(parts[1])
if serial != "" && !strings.EqualFold(serial, "N/A") {
serialByBDF[currentBDF] = serial
}
}
}
return serialByBDF
}
func parseGPUSerialsFromSummary(content string) map[string]string {
scanner := bufio.NewScanner(strings.NewReader(content))
serialByBDF := make(map[string]string)
inGPUDetails := false
for scanner.Scan() {
line := scanner.Text()
trimmed := strings.TrimSpace(line)
if strings.HasPrefix(trimmed, "NVIDIA GPU Details") {
inGPUDetails = true
}
if !inGPUDetails {
continue
}
if strings.HasPrefix(trimmed, "NVIDIA Switch Details") {
break
}
parts := strings.Split(line, "|")
if len(parts) < 2 {
continue
}
payload := strings.TrimSpace(parts[len(parts)-1])
if payload == "" {
continue
}
fields := strings.Split(payload, ",")
if len(fields) < 6 {
continue
}
bdf := normalizeGPUAddress(strings.TrimSpace(fields[4]))
serial := strings.TrimSpace(fields[5])
if bdf == "" || serial == "" || strings.EqualFold(serial, "N/A") {
continue
}
serialByBDF[bdf] = serial
}
return serialByBDF
}
func normalizeGPUAddress(addr string) string {
addr = strings.TrimSpace(addr)
if addr == "" {
return ""
}
parts := strings.Split(addr, ":")
if len(parts) != 3 {
return strings.ToLower(addr)
}
domain := parts[0]
bus := parts[1]
devFn := parts[2]
devFnParts := strings.Split(devFn, ".")
if len(devFnParts) != 2 {
return strings.ToLower(addr)
}
device := devFnParts[0]
fn := devFnParts[1]
if len(domain) == 8 {
domain = domain[4:]
}
return strings.ToLower(domain + ":" + bus + ":" + device + "." + fn)
}
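`applyGPUSerialNumbers` matches serials to GPUs by PCI address, so both sides must use the same format. nvidia-smi prints an 8-digit, upper-case domain (`00000000:2A:00.0`), while the GPU entries in the tests use `0000:2a:00.0`. The sketch below is a simplified restatement of the nvidia-smi path, using the hypothetical names `normalizeBDF` and `serialsFromSMI`. Unlike the diff's `reGPU`, its header pattern also accepts lower-case hex digits; that is an assumption, not observed nvidia-smi behavior.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Header line of a GPU block in `nvidia-smi --query` output, e.g. "GPU 00000000:18:00.0".
var gpuHeader = regexp.MustCompile(`^GPU\s+([0-9A-Fa-f]{8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f])$`)

// normalizeBDF lower-cases an address and shortens an 8-digit PCI domain to
// 4 digits, so "00000000:2A:00.0" and "0000:2a:00.0" compare equal.
func normalizeBDF(addr string) string {
	addr = strings.ToLower(strings.TrimSpace(addr))
	domain, rest, ok := strings.Cut(addr, ":")
	if !ok || len(domain) != 8 {
		return addr
	}
	return domain[4:] + ":" + rest
}

// serialsFromSMI maps normalized BDF -> serial, skipping empty and "N/A" values.
func serialsFromSMI(text string) map[string]string {
	out := make(map[string]string)
	current := ""
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if m := gpuHeader.FindStringSubmatch(line); m != nil {
			current = normalizeBDF(m[1])
			continue
		}
		if current == "" || !strings.HasPrefix(line, "Serial Number") {
			continue
		}
		if _, serial, ok := strings.Cut(line, ":"); ok {
			serial = strings.TrimSpace(serial)
			if serial != "" && !strings.EqualFold(serial, "N/A") {
				out[current] = serial
			}
		}
	}
	return out
}

func main() {
	smi := "GPU 00000000:18:00.0\n    Serial Number : 1653925025827\n" +
		"GPU 00000000:2A:00.0\n    Serial Number : N/A\n"
	fmt.Println(serialsFromSMI(smi)) // map[0000:18:00.0:1653925025827]
}
```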

@@ -0,0 +1,54 @@
package nvidia_bug_report
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestApplyGPUSerialNumbers_FromNvidiaSMI(t *testing.T) {
content := `
/usr/bin/nvidia-smi --query
GPU 00000000:18:00.0
Serial Number : 1653925025827
GPU 00000000:2A:00.0
Serial Number : 1653925050608
`
gpus := []models.GPU{
{BDF: "0000:18:00.0"},
{BDF: "0000:2a:00.0"},
}
applyGPUSerialNumbers(content, gpus)
if gpus[0].SerialNumber != "1653925025827" {
t.Fatalf("unexpected serial for gpu0: %q", gpus[0].SerialNumber)
}
if gpus[1].SerialNumber != "1653925050608" {
t.Fatalf("unexpected serial for gpu1: %q", gpus[1].SerialNumber)
}
}
func TestApplyGPUSerialNumbers_FromSummaryFallback(t *testing.T) {
content := `
NVIDIA GPU Details | NVIDIA H200, 570.172.08, 143771 MiB, 96.00.D0.00.03, 00000000:18:00.0, 1653925025827
| NVIDIA H200, 570.172.08, 143771 MiB, 96.00.D0.00.03, 00000000:2A:00.0, 1653925050608
NVIDIA Switch Details | No devices matching query 'Quantum'
`
gpus := []models.GPU{
{BDF: "0000:18:00.0"},
{BDF: "0000:2a:00.0"},
}
applyGPUSerialNumbers(content, gpus)
if gpus[0].SerialNumber != "1653925025827" {
t.Fatalf("unexpected serial for gpu0: %q", gpus[0].SerialNumber)
}
if gpus[1].SerialNumber != "1653925050608" {
t.Fatalf("unexpected serial for gpu1: %q", gpus[1].SerialNumber)
}
}

@@ -3,14 +3,33 @@
package nvidia_bug_report
import (
"fmt"
"regexp"
"strings"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
// parserVersion - version of this parser module
-const parserVersion = "1.0.0"
+const parserVersion = "1.2"
var bugReportDateLineRegex = regexp.MustCompile(`(?m)^Date:\s+(.+?)\s*$`)
var dateWithTZAbbrevRegex = regexp.MustCompile(`^([A-Za-z]{3}\s+[A-Za-z]{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+([A-Za-z]{2,5})\s+(\d{4})$`)
var timezoneAbbrevToOffset = map[string]string{
"UTC": "+00:00",
"GMT": "+00:00",
"EST": "-05:00",
"EDT": "-04:00",
"CST": "-06:00",
"CDT": "-05:00",
"MST": "-07:00",
"MDT": "-06:00",
"PST": "-08:00",
"PDT": "-07:00",
}
func init() {
parser.Register(&Parser{})
@@ -81,6 +100,10 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
}
content := string(files[0].Content)
if collectedAt, tzOffset, ok := parseBugReportCollectedAt(content); ok {
result.CollectedAt = collectedAt.UTC()
result.SourceTimezone = tzOffset
}
// Parse system information
parseSystemInfo(content, result)
@@ -105,3 +128,49 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
return result, nil
}
func parseBugReportCollectedAt(content string) (time.Time, string, bool) {
matches := bugReportDateLineRegex.FindStringSubmatch(content)
if len(matches) != 2 {
return time.Time{}, "", false
}
raw := strings.TrimSpace(matches[1])
if raw == "" {
return time.Time{}, "", false
}
if m := dateWithTZAbbrevRegex.FindStringSubmatch(raw); len(m) == 4 {
if offset, ok := timezoneAbbrevToOffset[strings.ToUpper(strings.TrimSpace(m[2]))]; ok {
layout := "Mon Jan 2 15:04:05 -07:00 2006"
normalized := strings.TrimSpace(m[1]) + " " + offset + " " + strings.TrimSpace(m[3])
if ts, err := time.Parse(layout, normalized); err == nil {
return ts, offset, true
}
}
}
layouts := []string{
"Mon Jan 2 15:04:05 MST 2006",
"Mon Jan 2 15:04:05 2006",
}
for _, layout := range layouts {
ts, err := time.Parse(layout, raw)
if err != nil {
continue
}
return ts, formatOffset(ts), true
}
return time.Time{}, "", false
}
func formatOffset(t time.Time) string {
_, sec := t.Zone()
sign := '+'
if sec < 0 {
sign = '-'
sec = -sec
}
h := sec / 3600
m := (sec % 3600) / 60
return fmt.Sprintf("%c%02d:%02d", sign, h, m)
}

@@ -0,0 +1,54 @@
package nvidia_bug_report

import (
	"testing"
	"time"

	"git.mchus.pro/mchus/logpile/internal/parser"
)

func TestParseBugReportCollectedAt(t *testing.T) {
	content := `
Start of NVIDIA bug report log file
Date: Fri Dec 12 10:14:49 EST 2025
`
	got, tz, ok := parseBugReportCollectedAt(content)
	if !ok {
		t.Fatalf("expected collected_at to be parsed")
	}
	if tz != "-05:00" {
		t.Fatalf("expected tz offset -05:00, got %q", tz)
	}
	wantUTC := time.Date(2025, 12, 12, 15, 14, 49, 0, time.UTC)
	if !got.UTC().Equal(wantUTC) {
		t.Fatalf("expected %s, got %s", wantUTC, got.UTC())
	}
}

func TestNvidiaBugReportParser_SetsCollectedAtAndTimezone(t *testing.T) {
	p := &Parser{}
	files := []parser.ExtractedFile{
		{
			Path: "nvidia-bug-report-1653925023938.log",
			Content: []byte(`
Start of NVIDIA bug report log file
nvidia-bug-report.sh Version: 34275561
Date: Fri Dec 12 10:14:49 EST 2025
`),
		},
	}
	result, err := p.Parse(files)
	if err != nil {
		t.Fatalf("parse failed: %v", err)
	}
	if result.SourceTimezone != "-05:00" {
		t.Fatalf("expected source timezone -05:00, got %q", result.SourceTimezone)
	}
	wantUTC := time.Date(2025, 12, 12, 15, 14, 49, 0, time.UTC)
	if !result.CollectedAt.Equal(wantUTC) {
		t.Fatalf("expected collected_at %s, got %s", wantUTC, result.CollectedAt)
	}
}

41507
internal/parser/vendors/pciids/pci.ids vendored Normal file

File diff suppressed because it is too large


@@ -1,12 +1,27 @@
package pciids

import (
	"bufio"
	_ "embed"
	"fmt"
	"os"
	"strconv"
	"strings"
	"sync"
)

var (
	//go:embed pci.ids
	embeddedPCIIDs string
	loadOnce       sync.Once
	vendors        map[int]string
	devices        map[string]string
)

// VendorName returns vendor name by PCI Vendor ID
func VendorName(vendorID int) string {
	loadPCIIDs()
	if name, ok := vendors[vendorID]; ok {
		return name
	}
@@ -15,6 +30,7 @@ func VendorName(vendorID int) string {
// DeviceName returns device name by Vendor ID and Device ID
func DeviceName(vendorID, deviceID int) string {
	loadPCIIDs()
	key := fmt.Sprintf("%04x:%04x", vendorID, deviceID)
	if name, ok := devices[key]; ok {
		return name
@@ -46,7 +62,6 @@ func VendorNameFromString(s string) string {
} else if c >= 'a' && c <= 'f' {
id = id*16 + int(c-'a'+10)
} else {
// Not a valid hex string, return original
return ""
}
}
@@ -54,124 +69,99 @@ func VendorNameFromString(s string) string {
 	return VendorName(id)
 }
-// Common PCI Vendor IDs
-// Source: https://pci-ids.ucw.cz/
-var vendors = map[int]string{
-	// Storage controllers and SSDs
-	0x1E0F: "KIOXIA",
-	0x144D: "Samsung Electronics",
-	0x1C5C: "SK Hynix",
-	0x15B7: "SanDisk (Western Digital)",
-	0x1179: "Toshiba",
-	0x8086: "Intel",
-	0x1344: "Micron Technology",
-	0x126F: "Silicon Motion",
-	0x1987: "Phison Electronics",
-	0x1CC1: "ADATA Technology",
-	0x2646: "Kingston Technology",
-	0x1E95: "Solid State Storage Technology",
-	0x025E: "Solidigm",
-	0x1D97: "Shenzhen Longsys Electronics",
-	0x1E4B: "MAXIO Technology",
+func loadPCIIDs() {
+	loadOnce.Do(func() {
+		vendors = make(map[int]string)
+		devices = make(map[string]string)
-	// Network adapters
-	0x15B3: "Mellanox Technologies",
-	0x14E4: "Broadcom",
-	0x10EC: "Realtek Semiconductor",
-	0x1077: "QLogic",
-	0x19A2: "Emulex",
-	0x1137: "Cisco Systems",
-	0x1924: "Solarflare Communications",
-	0x177D: "Cavium",
-	0x1D6A: "Aquantia",
-	0x1FC9: "Tehuti Networks",
-	0x18D4: "Chelsio Communications",
+		parsePCIIDs(strings.NewReader(embeddedPCIIDs), vendors, devices)
-	// GPU / Graphics
-	0x10DE: "NVIDIA",
-	0x1002: "AMD/ATI",
-	0x102B: "Matrox Electronics",
-	0x1A03: "ASPEED Technology",
-	// Storage controllers (RAID/HBA)
-	0x1000: "LSI Logic / Broadcom",
-	0x9005: "Adaptec / Microsemi",
-	0x1028: "Dell",
-	0x103C: "Hewlett-Packard",
-	0x17D3: "Areca Technology",
-	0x1CC4: "Union Memory",
-	// Server vendors
-	0x1014: "IBM",
-	0x15D9: "Supermicro",
-	0x8088: "Inspur",
-	// Other common
-	0x1022: "AMD",
-	0x1106: "VIA Technologies",
-	0x10B5: "PLX Technology",
-	0x1B21: "ASMedia Technology",
-	0x1B4B: "Marvell Technology",
-	0x197B: "JMicron Technology",
+		for _, path := range candidatePCIIDsPaths() {
+			f, err := os.Open(path)
+			if err != nil {
+				continue
+			}
+			parsePCIIDs(f, vendors, devices)
+			_ = f.Close()
+		}
+	})
 }
-// Device IDs (vendor:device -> name)
-var devices = map[string]string{
-	// NVIDIA GPUs (0x10DE)
-	"10de:26b9": "L40S 48GB",
-	"10de:26b1": "L40 48GB",
-	"10de:2684": "RTX 4090",
-	"10de:2704": "RTX 4080",
-	"10de:2782": "RTX 4070 Ti",
-	"10de:2786": "RTX 4070",
-	"10de:27b8": "RTX 4060 Ti",
-	"10de:2882": "RTX 4060",
-	"10de:2204": "RTX 3090",
-	"10de:2208": "RTX 3080 Ti",
-	"10de:2206": "RTX 3080",
-	"10de:2484": "RTX 3070",
-	"10de:2503": "RTX 3060",
-	"10de:20b0": "A100 80GB",
-	"10de:20b2": "A100 40GB",
-	"10de:20f1": "A10",
-	"10de:2236": "A10G",
-	"10de:25b6": "A16",
-	"10de:20b5": "A30",
-	"10de:20b7": "A30X",
-	"10de:1db4": "V100 32GB",
-	"10de:1db1": "V100 16GB",
-	"10de:1e04": "RTX 2080 Ti",
-	"10de:1e07": "RTX 2080",
-	"10de:1f02": "RTX 2070",
-	"10de:26ba": "L40S-PCIE-48G",
-	"10de:2330": "H100 80GB PCIe",
-	"10de:2331": "H100 80GB SXM5",
-	"10de:2322": "H100 NVL",
-	"10de:2324": "H200",
+func candidatePCIIDsPaths() []string {
+	paths := []string{
+		"pci.ids",
+		"/usr/share/hwdata/pci.ids",
+		"/usr/share/misc/pci.ids",
+		"/opt/homebrew/share/pciids/pci.ids",
+	}
-	// AMD GPUs (0x1002)
-	"1002:744c": "Instinct MI250X",
-	"1002:7408": "Instinct MI100",
-	"1002:73a5": "RX 6950 XT",
-	"1002:73bf": "RX 6900 XT",
-	"1002:73df": "RX 6700 XT",
-	"1002:7480": "RX 7900 XTX",
-	"1002:7483": "RX 7900 XT",
-	// ASPEED (0x1A03) - BMC VGA
-	"1a03:2000": "AST2500 VGA",
-	"1a03:1150": "AST2600 VGA",
-	// Intel GPUs
-	"8086:56c0": "Data Center GPU Flex 170",
-	"8086:56c1": "Data Center GPU Flex 140",
-	// Mellanox/NVIDIA NICs (0x15B3)
-	"15b3:1017": "ConnectX-5 100GbE",
-	"15b3:1019": "ConnectX-5 Ex",
-	"15b3:101b": "ConnectX-6",
-	"15b3:101d": "ConnectX-6 Dx",
-	"15b3:101f": "ConnectX-6 Lx",
-	"15b3:1021": "ConnectX-7",
-	"15b3:a2d6": "ConnectX-4 Lx",
+	// Env paths have highest priority, so they are applied last.
+	if env := strings.TrimSpace(os.Getenv("LOGPILE_PCI_IDS_PATH")); env != "" {
+		for _, p := range strings.Split(env, string(os.PathListSeparator)) {
+			p = strings.TrimSpace(p)
+			if p != "" {
+				paths = append(paths, p)
+			}
+		}
+	}
+	return paths
 }
+func parsePCIIDs(r interface{ Read([]byte) (int, error) }, outVendors map[int]string, outDevices map[string]string) {
+	scanner := bufio.NewScanner(r)
+	currentVendor := -1
+	for scanner.Scan() {
+		line := scanner.Text()
+		if line == "" || strings.HasPrefix(line, "#") {
+			continue
+		}
+		// Subdevice line (tab-tab) - ignored for now
+		if strings.HasPrefix(line, "\t\t") {
+			continue
+		}
+		// Device line
+		if strings.HasPrefix(line, "\t") {
+			if currentVendor < 0 {
+				continue
+			}
+			trimmed := strings.TrimLeft(line, "\t")
+			fields := strings.Fields(trimmed)
+			if len(fields) < 2 {
+				continue
+			}
+			deviceID, err := strconv.ParseInt(fields[0], 16, 32)
+			if err != nil {
+				continue
+			}
+			name := strings.TrimSpace(trimmed[len(fields[0]):])
+			if name == "" {
+				continue
+			}
+			key := fmt.Sprintf("%04x:%04x", currentVendor, int(deviceID))
+			outDevices[key] = name
+			continue
+		}
+		// Vendor line
+		fields := strings.Fields(line)
+		if len(fields) < 2 {
+			currentVendor = -1
+			continue
+		}
+		vendorID, err := strconv.ParseInt(fields[0], 16, 32)
+		if err != nil {
+			currentVendor = -1
+			continue
+		}
+		name := strings.TrimSpace(line[len(fields[0]):])
+		if name == "" {
+			currentVendor = -1
+			continue
+		}
+		currentVendor = int(vendorID)
+		outVendors[currentVendor] = name
+	}
+}


@@ -0,0 +1,38 @@
package pciids

import (
	"os"
	"path/filepath"
	"sync"
	"testing"
)

func TestExternalPCIIDsLookup(t *testing.T) {
	dir := t.TempDir()
	idsPath := filepath.Join(dir, "pci.ids")
	content := "" +
		"# sample\n" +
		"10de NVIDIA Corporation\n" +
		"\t233b NVIDIA H200 SXM\n" +
		"8086 Intel Corporation\n" +
		"\t1521 I350 Gigabit Network Connection\n"
	if err := os.WriteFile(idsPath, []byte(content), 0o644); err != nil {
		t.Fatalf("write pci.ids: %v", err)
	}
	t.Setenv("LOGPILE_PCI_IDS_PATH", idsPath)
	loadOnce = sync.Once{}
	vendors = nil
	devices = nil
	if got := DeviceName(0x10de, 0x233b); got != "NVIDIA H200 SXM" {
		t.Fatalf("expected external device name, got %q", got)
	}
	if got := VendorName(0x10de); got != "NVIDIA Corporation" {
		t.Fatalf("expected external vendor name, got %q", got)
	}
	if got := DeviceName(0x8086, 0x1521); got != "I350 Gigabit Network Connection" {
		t.Fatalf("expected external intel device name, got %q", got)
	}
}


@@ -1,133 +0,0 @@
# SMC Crash Dump Parser
Parser for Supermicro (SMC) BMC Crash Dump archives.
## Supported servers
- Supermicro SYS-821GE-TNHR
- Other Supermicro servers with BMC Crashdump support
## Archive format
The parser accepts archives in the following formats:
- `.tgz` / `.tar.gz` (compressed tar)
- `.tar` (uncompressed tar)
## Recognized files
### Primary files
1. **CDump.txt** - JSON file with the crashdump data
   - Metadata (BMC, BIOS, and ME firmware versions)
   - CPU information (CPUID, core count, microcode version, PPIN)
   - MCA (Machine Check Architecture) data: processor errors
## Extracted data
### Hardware Configuration
#### CPUs
```json
{
"slot": "CPU0",
"model": "CPUID: 0xc06f2",
"cores": 56,
"manufacturer": "Intel",
"firmware": "Microcode: 0x210002b3"
}
```
### FRU Information
- BMC Firmware Version
- BIOS Version
- ME Firmware Version
- CPU PPIN (Protected Processor Inventory Number)
### Events
Events are created for:
- **Crashdump collection** - when the crashdump was collected
- **MCA Errors** - Machine Check Architecture errors
  - Corrected errors (Warning severity)
  - Uncorrected errors (Critical severity)
Severity levels:
- `info` - informational events (on-demand crashdump)
- `warning` - warnings (corrected MCA errors, reset detected)
- `critical` - critical errors (uncorrected MCA errors)
## Usage example
```bash
# Start the web interface
./logpile --file /path/to/CDump_090859_01302026.tgz
# The web interface will be available at http://localhost:8082
```
## Auto-detection
The parser automatically detects SMC Crash Dump archives by the presence of:
- `CDump.txt` containing the markers "crash_data", "METADATA", "bmc_fw_ver"
Confidence score:
- `CDump.txt` with crashdump markers: +80
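
A minimal sketch of the marker check, assuming (the README does not say) that all three markers must appear in `CDump.txt` and that a match contributes a flat 80 points. The function name `detectCrashDump` is hypothetical; the actual detector may combine these signals differently.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// crashDumpMarkers are the substrings the README lists for CDump.txt.
var crashDumpMarkers = []string{"crash_data", "METADATA", "bmc_fw_ver"}

// detectCrashDump is a hypothetical detector. It assumes every marker must be
// present and that a match is worth a flat 80 points; anything else scores 0.
func detectCrashDump(name, content string) int {
	if path.Base(name) != "CDump.txt" {
		return 0
	}
	for _, marker := range crashDumpMarkers {
		if !strings.Contains(content, marker) {
			return 0
		}
	}
	return 80
}

func main() {
	sample := `{"crash_data": {"METADATA": {"bmc_fw_ver": "01.03.18"}}}`
	fmt.Println(detectCrashDump("dump/CDump.txt", sample)) // 80
	fmt.Println(detectCrashDump("dump/other.txt", sample)) // 0
}
```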
## Versioning
**Current parser version:** 1.0.0
When changing the parser logic, bump the version in the `parserVersion` constant in `parser.go`.
## Data examples
### Example CDump.txt (metadata)
```json
{
"crash_data": {
"METADATA": {
"cpu0": {
"cpuid": "0xc06f2",
"core_count": "0x38",
"ppin": "0xa3ccbe7d45026592",
"ucode_patch_ver": "0x210002b3"
},
"bmc_fw_ver": "01.03.18",
"bios_id": "BIOS Date: 08/04/2025 Rev 2.7",
"me_fw_ver": "6.1.4.204",
"timestamp": "2026-01-30T09:06:52Z",
"trigger_type": "On-Demand"
}
}
}
```
### MCA Error Detection
The parser checks the MCA status registers for errors:
- Bit 63 (Valid) - a valid error is logged
- Bit 61 (UC) - uncorrected error
- Bit 60 (EN) - error enabled
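
These three bits can be checked with plain masks (they match the IA32_MCi_STATUS layout). A minimal sketch, assuming the 64-bit status value has already been read from the dump; the sample values are made up, and the severity mapping follows the Events section (corrected errors are `warning`, uncorrected ones are `critical`).

```go
package main

import "fmt"

const (
	mcaValid = uint64(1) << 63 // VAL: the bank holds a valid error
	mcaUC    = uint64(1) << 61 // UC: the error was not corrected
	mcaEN    = uint64(1) << 60 // EN: error reporting was enabled
)

// mcaStatus holds the three flags the parser inspects.
type mcaStatus struct {
	Valid, Uncorrected, Enabled bool
}

func decodeMCAStatus(status uint64) mcaStatus {
	return mcaStatus{
		Valid:       status&mcaValid != 0,
		Uncorrected: status&mcaUC != 0,
		Enabled:     status&mcaEN != 0,
	}
}

// severity maps a decoded status to the README's event levels; an empty string
// means the bank logged no valid error and no event should be emitted.
func severity(s mcaStatus) string {
	switch {
	case !s.Valid:
		return ""
	case s.Uncorrected:
		return "critical"
	default:
		return "warning"
	}
}

func main() {
	for _, raw := range []uint64{0xB000000000000000, 0x9000000000000000, 0} {
		st := decodeMCAStatus(raw)
		fmt.Printf("%#x valid=%t uc=%t en=%t severity=%q\n",
			raw, st.Valid, st.Uncorrected, st.Enabled, severity(st))
	}
}
```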
## Known limitations
1. The parser focuses on data from `CDump.txt`
2. Detailed MCA error analysis is currently simplified (status register checks only)
3. TOR dump and other extended data are not parsed yet
## Development
### Adding new fields
1. Examine the JSON structure in CDump.txt
2. Add fields to the `Metadata`, `CPUMetadata`, or `MCAData` structs
3. Update the parsing functions
4. Increment the parser version
### Extending MCA analysis
For more detailed MCA error analysis, you could:
1. Add decoding of MCA error codes
2. Parse the MISC and ADDR registers
3. Add error correlation across banks

Some files were not shown because too many files have changed in this diff