42 Commits

Author SHA1 Message Date
f19a3454fa fix(redfish): gate hgx diagnostic plan-b by debug toggle 2026-04-13 14:45:41 +03:00
Mikhail Chusavitin
becdca1d7e fix(redfish): read PCIeInterface link width for GPU PCIe devices
parseGPUWithSupplementalDocs did not read PCIeInterface from the device
doc, only from function docs. xFusion GPU PCIeCard entries carry link
width/speed in PCIeInterface (LanesInUse/Maxlanes/PCIeType/MaxPCIeType)
so GPU link width was always empty for xFusion servers.

Also apply the xFusion OEM function-level fallback for GPU function docs,
consistent with the NIC and PCIeDevice paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:35:29 +03:00
Mikhail Chusavitin
e10440ae32 fix(redfish): collect PCIe link width from xFusion servers
xFusion iBMC exposes PCIe link width in two non-standard ways:
- PCIeInterface uses "Maxlanes" (lowercase 'l') instead of "MaxLanes"
- PCIeFunction docs carry width/speed in Oem.xFusion.LinkWidth ("X8"),
  Oem.xFusion.LinkWidthAbility, Oem.xFusion.LinkSpeed, and
  Oem.xFusion.LinkSpeedAbility rather than the standard CurrentLinkWidth int

Add redfishEnrichFromOEMxFusionPCIeLink and parseXFusionLinkWidth helpers,
apply them as fallbacks in NIC and PCIeDevice enrichment paths.
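A minimal sketch of what a helper like `parseXFusionLinkWidth` could look like, assuming the OEM strings really are `"X<lanes>"` as described (the name comes from the commit; the exact signature is an assumption):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseXFusionLinkWidth converts xFusion OEM width strings such as "X8" or
// "x16" into the integer lane count that the standard CurrentLinkWidth field
// would carry. Returns 0 when the string is not in the expected form.
func parseXFusionLinkWidth(s string) int {
	s = strings.TrimSpace(s)
	if len(s) < 2 || (s[0] != 'X' && s[0] != 'x') {
		return 0
	}
	n, err := strconv.Atoi(s[1:])
	if err != nil || n <= 0 {
		return 0
	}
	return n
}

func main() {
	for _, v := range []string{"X8", "x16", "", "8"} {
		fmt.Printf("%q -> %d\n", v, parseXFusionLinkWidth(v))
	}
}
```

Callers would use this only as a fallback when the standard integer field is absent, matching the enrichment-path ordering described above.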

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 13:35:29 +03:00
5c2a21aff1 chore: update bible and chart submodules
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-11 12:17:40 +03:00
Mikhail Chusavitin
9df13327aa feat(collect): remove power-on/off, add skip-hung for Redfish collection
Remove power-on and power-off functionality from the Redfish collector;
keep host power-state detection and show a warning in the UI when the
host is powered off before collection starts.

Add a "Пропустить зависшие" (skip hung) button that lets the user abort
stuck Redfish collection phases without losing already-collected data.
Introduces a two-level context model in Collect(): the outer job context
covers the full lifecycle including replay; an inner collectCtx covers
snapshot, prefetch, and plan-B phases only. Closing the skipCh cancels
collectCtx immediately — aborts all in-flight HTTP requests and exits
plan-B loops — then replay runs on whatever rawTree was collected.

Signal path: UI → POST /api/collect/{id}/skip → JobManager.SkipJob()
→ close(skipCh) → goroutine in Collect() → cancelCollect().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 13:12:38 +03:00
Mikhail Chusavitin
7e9af89c46 Add xFusion file-export parser support 2026-04-04 15:07:10 +03:00
Mikhail Chusavitin
db74df9994 fix(redfish): trim MSI replay noise and unify NIC classes 2026-04-01 17:49:00 +03:00
Mikhail Chusavitin
bb82387d48 fix(redfish): narrow MSI PCIeFunctions crawl 2026-04-01 16:50:51 +03:00
Mikhail Chusavitin
475f6ac472 fix(export): keep storage inventory without serials 2026-04-01 16:50:19 +03:00
Mikhail Chusavitin
93ce676f04 fix(redfish): recover MSI NIC serials from PCIe functions 2026-04-01 15:48:47 +03:00
Mikhail Chusavitin
c47c34fd11 feat(hpe): improve inventory extraction and export fidelity 2026-03-30 15:04:17 +03:00
Mikhail Chusavitin
d8c3256e41 chore(hpe_ilo_ahs): normalize module version format — v1.0 2026-03-30 13:43:10 +03:00
Mikhail Chusavitin
1b2d978d29 test: add real fixture coverage for HPE AHS parser 2026-03-30 13:41:02 +03:00
Mikhail Chusavitin
0f310d57c4 feat: HPE iLO support — profile, post-probe hang fix, replay parser fixes, AHS parser
- Add HPE iLO Redfish profile (priority 20): matches on manufacturer/OEM/iLO signals,
  adds SmartStorage/SmartStorageConfig to critical paths, sets realistic ETA baseline
  and rate policy for iLO's known slowness
- Fix post-probe hang on HPE iLO: skip numeric probing of collections where
  Members@odata.count == len(Members); add 4s postProbeClient timeout as safety net
- Exclude /WorkloadPerformanceAdvisor from crawl paths
- Fix replay parser: skip absent CPU sockets, absent DIMM slots, absent drive bays
- Filter N/A version entries from firmware inventory
- Remove drive firmware from general firmware list (already in Storage[].Firmware)
- Add HPE AHS (.ahs) archive parser with hybrid SMBIOS/Redfish extraction
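The post-probe hang fix above hinges on one check; a sketch of the idea, with an assumed function name and document shape:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// shouldProbeNumericMembers reports whether a Redfish collection document
// still warrants numeric member-ID probing. If "Members@odata.count" equals
// len(Members), the collection is already complete and probing extra IDs is
// pointless (and hangs on HPE iLO after the probe).
func shouldProbeNumericMembers(doc []byte) bool {
	var c struct {
		Count   *int              `json:"Members@odata.count"`
		Members []json.RawMessage `json:"Members"`
	}
	if err := json.Unmarshal(doc, &c); err != nil {
		return false
	}
	if c.Count != nil && *c.Count == len(c.Members) {
		return false // collection is complete; skip probing
	}
	return true
}

func main() {
	complete := []byte(`{"Members@odata.count":2,"Members":[{},{}]}`)
	partial := []byte(`{"Members@odata.count":4,"Members":[{},{}]}`)
	fmt.Println(shouldProbeNumericMembers(complete), shouldProbeNumericMembers(partial))
}
```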

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 13:39:14 +03:00
Mikhail Chusavitin
3547ef9083 Skip placeholder firmware versions in API output 2026-03-26 18:42:54 +03:00
Mikhail Chusavitin
99f0d6217c Improve Multillect Redfish replay and power detection 2026-03-26 18:41:02 +03:00
Mikhail Chusavitin
8acbba3cc9 feat: add reanimator easy bee parser 2026-03-25 19:36:13 +03:00
Mikhail Chusavitin
8942991f0c Add Inspur Group OEM Redfish profile 2026-03-25 15:08:40 +03:00
Mikhail Chusavitin
9b71c4a95f chore: update bible submodule
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:22:33 +03:00
Mikhail Chusavitin
125f77ef69 feat: adaptive BMC readiness check + ghost NIC dedup fix + empty collection plan-B retry
BMC readiness after power-on (waitForStablePoweredOnHost):
- After initial 1m stabilization, poll BMC inventory readiness before collecting
- Ready if MemorySummary.TotalSystemMemoryGiB > 0 OR PCIeDevices.Members non-empty
- On failure: wait +60s, retry; on second failure: wait +120s, retry; then warn and proceed
- Configurable via LOGPILE_REDFISH_BMC_READY_WAITS (default: 60s,120s)

Empty critical collection plan-B retry (EnableEmptyCriticalCollectionRetry):
- Hardware inventory collections that returned Members=[] are now re-probed in plan-B
- Covers PCIeDevices, NetworkAdapters, Processors, Drives, Storage, EthernetInterfaces
- Enabled by default in generic profile (applies to all vendors)

Ghost NIC dedup fix (enrichNICsFromNetworkInterfaces):
- NetworkInterface entries (e.g. Id=2) that don't match existing NIC slots are now
  resolved via Links.NetworkAdapter cross-reference to the real Chassis NIC
- Prevents duplicate ghost entries (slot=2 "Network Device View") from appearing
  alongside real NICs (slot="RISER 5 slot 1 (7)") with the same MAC addresses
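The cross-reference step can be illustrated with a toy resolver; the types and field names here are assumptions, not the project's real model:

```go
package main

import "fmt"

// nic stands in for an already-known NIC with the Chassis NetworkAdapter
// path it was built from.
type nic struct{ Slot, AdapterPath string }

// resolveGhost maps a NetworkInterface's Links.NetworkAdapter @odata.id back
// to the real NIC, so the interface enriches an existing row instead of being
// appended as a ghost entry.
func resolveGhost(ifaceAdapterPath string, nics []nic) (nic, bool) {
	for _, n := range nics {
		if n.AdapterPath == ifaceAdapterPath {
			return n, true
		}
	}
	return nic{}, false
}

func main() {
	nics := []nic{{Slot: "RISER 5 slot 1 (7)", AdapterPath: "/redfish/v1/Chassis/1/NetworkAdapters/NIC1"}}
	real, ok := resolveGhost("/redfish/v1/Chassis/1/NetworkAdapters/NIC1", nics)
	fmt.Println(ok, real.Slot)
}
```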

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 11:19:36 +03:00
Mikhail Chusavitin
063b08d5fb feat: redesign collection UI + add StopHostAfterCollect + TCP ping pre-probe
- Single "Подключиться" button flow: probe first, then show collect options
- Power management checkboxes: power on before / stop after collect
- Modal confirmation when enabling shutdown on already-powered-on host
- StopHostAfterCollect flag: host shuts down only when explicitly requested
- TCP ping (10 attempts, min 3 successes) before Redfish probe
- Debug payloads checkbox (Oem/Ami/Inventory/Crc, off by default)
- Remove platform_config BIOS settings collection (unreliable on AMI)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 18:50:01 +03:00
Mikhail Chusavitin
e3ff1745fc feat: clean up collection job log UI
- Filter out debug noise (plan-B per-path lines, heartbeats, timing stats, telemetry)
- Strip server-side nanosecond timestamp prefix from displayed messages
- Transform snapshot progress lines to show current path instead of doc count + ETA
- Humanize recurring message patterns (plan-B summary, prefetch, snapshot total)
- Raw collect.log and raw_export.json are unaffected

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 00:50:13 +03:00
Mikhail Chusavitin
96e65d8f65 feat: Redfish hardware event log collection + MSI ghost GPU filter + inventory improvements
- Collect hardware event logs (last 7 days) from Systems and Managers/SEL LogServices
- Parse AMI raw IPMI dump messages into readable descriptions (Sensor_Type: Event_Type)
- Filter out audit/journal/non-hardware log services; only SEL from Managers
- MSI ghost GPU filter: exclude processor GPU entries with temperature=0 when host is powered on
- Reanimator collected_at uses InventoryData/Status.LastModifiedTime (30-day fallback)
- Invalidate Redfish inventory CRC groups before host power-on
- Log inventory LastModifiedTime age in collection logs
- Drop SecureBoot collection (SecureBootMode, SecureBootDatabases) — not hardware inventory
- Add build version to UI footer via template
- Add MSI Redfish API reference doc to bible-local/docs/

ADL-032–ADL-035

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 23:47:22 +03:00
Mikhail Chusavitin
30409eef67 feat: add xFusion iBMC dump parser (tar.gz format)
Parses xFusion G5500 V7 iBMC diagnostic dump archives with:
- FRU info (board serial, product name, component inventory)
- IPMI sensor readings (temperature, voltage, power, fan, current)
- CPU inventory (model, cores, threads, cache, serial)
- Memory DIMMs (size, speed, type, serial, manufacturer)
- GPU inventory from card_manage/card_info (serial, firmware, ECC counts)
- OCP NIC detection (ConnectX-6 Lx with serial)
- PSU inventory (4x 3000W, serial, firmware, voltage)
- Storage: RAID controller firmware + physical drives (model, serial, endurance)
- iBMC maintenance log events with severity mapping
- Registers as vendor "xfusion" in the parser registry

All 11 fixture tests pass against real G5500 V7 dump archive.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 15:31:28 +03:00
Mikhail Chusavitin
65e65968cf feat: add xFusion iBMC Redfish profile
Matches on ServiceRootVendor "xFusion" and OEM namespace "xFusion"
(score 90+). Enables GenericGraphicsControllerDedup unconditionally and
ProcessorGPUFallback when GPU-type processors are present in the snapshot
(xFusion G5500 V7 exposes H200s simultaneously in PCIeDevices,
GraphicsControllers, and Processors/Gpu* — all three need dedup).

Without this profile, xFusion fell into fallback mode which activated all
vendor profiles (Supermicro, HGX, MSI, Dell) unnecessarily. Now resolves
to matched mode with targeted acquisition tuning (120k cap, 75s baseline).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 15:13:15 +03:00
Mikhail Chusavitin
380c199705 chore: update chart submodule
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 08:50:56 +03:00
Mikhail Chusavitin
d650a6ba1c refactor: unified ingest pipeline + modular Redfish profile framework
Implement the full architectural plan: unified ingest.Service entry point
for archive and Redfish payloads, modular redfishprofile package with
composable profiles (generic, ami-family, msi, supermicro, dell,
hgx-topology), score-based profile matching with fallback expansion mode,
and profile-driven acquisition/analysis plans.

Vendor-specific logic moved out of common executors and into profile hooks.
GPU chassis lookup strategies and known storage recovery collections
(IntelVROC/HA-RAID/MRVL) now live in ResolvedAnalysisPlan, populated by
profiles at analysis time. Replay helpers read from the plan; no hardcoded
path lists remain in generic code.

Also splits redfish_replay.go into domain modules (gpu, storage, inventory,
fru, profiles) and adds full fixture/matcher/directive test coverage
including Dell, AMI, unknown-vendor fallback, and deterministic ordering.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 08:48:58 +03:00
Mikhail Chusavitin
d8d3d8c524 build: use local go toolchain in release script 2026-03-16 00:32:09 +03:00
Mikhail Chusavitin
057a222288 ui: embed reanimator chart viewer 2026-03-16 00:20:11 +03:00
Mikhail Chusavitin
f11a43f690 export: merge inspur psu sensor groups 2026-03-15 23:29:44 +03:00
Mikhail Chusavitin
476630190d export: align reanimator contract v2.7 2026-03-15 23:27:32 +03:00
Mikhail Chusavitin
9007f1b360 export: align reanimator and enrich redfish metrics 2026-03-15 21:38:28 +03:00
Mikhail Chusavitin
0acdc2b202 docs: refresh project documentation 2026-03-15 16:35:16 +03:00
Mikhail Chusavitin
47bb0ee939 docs: document firmware filter regression pattern in bible (ADL-019)
Root cause analysis for device-bound firmware leaking into hardware.firmware
on Supermicro Redfish (SYS-A21GE-NBRT HGX B200):

- collectFirmwareInventory (6c19a58) had no coverage for Supermicro naming.
  isDeviceBoundFirmwareName checked "gpu " / "nic " (space-terminated) while
  Supermicro uses "GPU1 System Slot0" / "NIC1 System Slot0 ..." (digit suffix).

- 9c5512d added _fw_gpu_ / _fw_nvswitch_ / _inforom_gpu_ patterns to fix HGX,
  but checked DeviceName which contains "Software Inventory" (from Redfish Name),
  not the firmware Id. Dead code from day one.

09-testing.md: add firmware filter worked example and rule #4 — verify the
filter checks the field that the collector actually populates.

10-decisions.md: ADL-019 — isDeviceBoundFirmwareName must be extended per
vendor with a test case per vendor format before shipping.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 14:03:47 +03:00
Mikhail Chusavitin
5815100e2f exporter: filter Supermicro Redfish device-bound firmware from hardware.firmware
isDeviceBoundFirmwareName did not catch Supermicro FirmwareInventory naming
conventions where a digit follows the type prefix directly ("GPU1 System Slot0",
"NIC1 System Slot0 AOM-DP805-IO") instead of a space. Also missing: "Power supply N",
"NVMeController N", and "Software Inventory" (generic label for all HGX per-component
firmware slots — GPU, NVSwitch, PCIeRetimer, ERoT, InfoROM, etc.).

On SYS-A21GE-NBRT (HGX B200) this caused 29 device-bound entries to leak into
hardware.firmware: 8 GPU, 9 NIC, 1 NVMe, 6 PSU, 4 PCIeSwitch, 1 Software Inventory.

Fix: extend isDeviceBoundFirmwareName with patterns for all four new cases.
Add TestIsDeviceBoundFirmwareName covering both excluded and kept entries.
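A reduced sketch of the kind of matching this fix describes; the real function and its full pattern set live in the exporter, and the regex here is an illustration of the digit-after-prefix cases, not the shipped code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Supermicro names put a digit right after the type prefix ("GPU1 System
// Slot0"), so space-terminated prefixes like "gpu " miss them. \d*\b also
// keeps matching the older space-separated forms.
var deviceBoundRe = regexp.MustCompile(`(?i)^(gpu|nic|nvmecontroller)\d*\b|^power supply \d+`)

func isDeviceBoundFirmwareName(name string) bool {
	n := strings.ToLower(strings.TrimSpace(name))
	if strings.Contains(n, "software inventory") {
		return true // generic label for HGX per-component firmware slots
	}
	return deviceBoundRe.MatchString(n)
}

func main() {
	for _, n := range []string{"GPU1 System Slot0", "NIC1 System Slot0 AOM-DP805-IO", "Power supply 2", "BMC"} {
		fmt.Printf("%-35s %v\n", n, isDeviceBoundFirmwareName(n))
	}
}
```

Per ADL-019, each vendor format added to a filter like this needs a matching test case before shipping.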

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 13:48:55 +03:00
Mikhail Chusavitin
1eb639e6bf redfish: skip NVMe bay probe for non-storage chassis types (Module/Component/Zone)
On Supermicro HGX systems (SYS-A21GE-NBRT) ~35 sub-chassis (GPU, NVSwitch,
PCIeRetimer, ERoT/IRoT, BMC, FPGA) all carry ChassisType=Module/Component/Zone
and expose empty /Drives collections. shouldAdaptiveNVMeProbe returned true for
all of them, triggering 35 × 384 = 13 440 HTTP requests → ~22 min wasted per
collection (more than half of total 35 min collection time).

Fix: chassisTypeCanHaveNVMe returns false for Module, Component, Zone. The
candidate selection loop in collectRawRedfishTree now checks the parent chassis
doc before adding a /Drives path to the probe list. Enclosure (NVMe backplane),
RackMount, and unknown types are unaffected.
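A sketch of `chassisTypeCanHaveNVMe` as described (the exact shape is assumed from the commit text):

```go
package main

import "fmt"

// chassisTypeCanHaveNVMe rejects sub-chassis types used for GPU/NVSwitch/
// retimer modules, which can never hold NVMe bays; everything else, including
// unknown types, keeps adaptive probing enabled.
func chassisTypeCanHaveNVMe(chassisType string) bool {
	switch chassisType {
	case "Module", "Component", "Zone":
		return false
	default:
		return true // Enclosure, RackMount, unknown: still probe
	}
}

func main() {
	for _, t := range []string{"Module", "Component", "Zone", "Enclosure", "RackMount", ""} {
		fmt.Println(t, chassisTypeCanHaveNVMe(t))
	}
}
```

On the 35-module system above this single check removes 35 × 384 = 13 440 probe requests per collection.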

Tests:
- TestChassisTypeCanHaveNVMe: table-driven, covers excluded and storage-capable types
- TestNVMePostProbeSkipsNonStorageChassis: topology integration, GPU chassis +
  backplane with empty /Drives → exactly 1 candidate selected (backplane only)

Docs:
- ADL-018 in bible-local/10-decisions.md
- Candidate-selection test matrix in bible-local/09-testing.md
- SYS-A21GE-NBRT baseline row in docs/test_server_collection_memory.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 13:38:29 +03:00
Mikhail Chusavitin
a9f58b3cf4 redfish: fix GPU duplication on Supermicro HGX, exclude NVSwitch, restore path dedup
Three bugs, all related to GPU dedup in the Redfish replay pipeline:

1. collectGPUsFromProcessors (redfish_replay.go): GPU-type Processor entries
   (Systems/HGX_Baseboard_0/Processors/GPU_SXM_N) were not deduplicated against
   existing PCIeDevice GPUs on Supermicro HGX. The chassis-ID lookup keyed on
   processor Id ("GPU_SXM_1") but the chassis is named "HGX_GPU_SXM_1" — lookup
   returned nothing, serial stayed empty, UUID was unseen → 8 duplicate GPU rows.
   Fix: read SerialNumber directly from the Processor doc first; chassis lookup
   is now a fallback override (as it was designed for MSI).

2. looksLikeGPU (redfish.go): NVSwitch PCIe devices (Model="NVSwitch",
   Manufacturer="NVIDIA") were classified as GPUs because "nvidia" matched the
   GPU hint list. Fix: early return false when Model contains "nvswitch".

3. gpuDocDedupKey (redfish.go): commit 9df29b1 changed the dedup key to prefer
   slot|model before path, which collapsed two distinct GPUs with identical model
   names in GraphicsControllers into one entry. Fix: only serial and BDF are used
   as cross-path stable dedup keys; fall back to Redfish path when neither is
   present. This also restores TestReplayCollectGPUs_DedupUsesRedfishPathBeforeHeuristics
   which had been broken on main since 9df29b1.
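The restored key priority from bug 3 can be sketched as follows; field names are assumptions:

```go
package main

import "fmt"

type gpuDoc struct{ Serial, BDF, Path string }

// gpuDocDedupKey uses only serial and BDF as cross-path stable identifiers;
// slot|model is deliberately excluded because identical model names in
// GraphicsControllers would collapse distinct GPUs. The Redfish path is the
// fallback when neither identifier is present.
func gpuDocDedupKey(g gpuDoc) string {
	if g.Serial != "" {
		return "sn:" + g.Serial
	}
	if g.BDF != "" {
		return "bdf:" + g.BDF
	}
	return "path:" + g.Path
}

func main() {
	a := gpuDoc{Serial: "S1", Path: "/Chassis/1/PCIeDevices/GPU1"}
	b := gpuDoc{Serial: "S1", Path: "/Chassis/HGX_GPU_SXM_1/PCIeDevices/GPU1"}
	c := gpuDoc{Path: "/Chassis/1/PCIeDevices/GC0"}
	fmt.Println(gpuDocDedupKey(a) == gpuDocDedupKey(b), gpuDocDedupKey(a) == gpuDocDedupKey(c))
}
```

Same serial across two chassis trees collapses to one key; two serial-less docs on different paths stay distinct.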

Added tests:
- TestCollectGPUsFromProcessors_SupermicroHGX: Processor GPU dedup when
  chassis-ID naming convention does not match processor Id
- TestReplayCollectGPUs_DedupCrossChassisSerial: same GPU via two Chassis
  PCIeDevice trees with matching serials → collapsed to one
- TestLooksLikeGPU_NVSwitchExcluded: NVSwitch is not a GPU

Added rule to bible-local/09-testing.md: dedup/filter/classify functions must
cover true-positive, true-negative, and the vendor counter-case axes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 15:09:27 +03:00
Mikhail Chusavitin
d8ffe3d3a5 redfish: add service root to critical endpoints, tolerate missing root in replay
Add /redfish/v1 to redfishCriticalEndpoints so plan-B retries the service
root if it failed during the main crawl. Also downgrade the missing-root
error in ReplayRedfishFromRawPayloads from fatal to a warning so analysis
can complete with defaults when the root doc was not recovered.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 08:31:00 +03:00
Mikhail Chusavitin
9df29b1be9 fix: dedup GPUs across multiple chassis PCIeDevice trees in Redfish collector
Supermicro HGX exposes each GPU under both Chassis/1/PCIeDevices and a
dedicated Chassis/HGX_GPU_SXM_N/PCIeDevices. gpuDocDedupKey was keying
by @odata.id path, so identical GPUs with the same serial were not
deduplicated across sources. Now stable identifiers (serial → BDF →
slot+model) take priority over path.

Also includes Inspur parser improvements: NVMe model/serial enrichment
from devicefrusdr.log and audit.log, RAID drive slot normalization to
BP notation, PSU slot normalization, BMC/CPLD/VR firmware from RESTful
version info section, and parser version bump to 1.8.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 14:44:36 +03:00
Mikhail Chusavitin
62d6ad6f66 ui: deduplicate files by name and SHA-256 hash before batch convert
On folder selection, filter out duplicate files before conversion:
- First pass: same basename → skip (same filename in different subdirs)
- Second pass: same SHA-256 hash → skip (identical content, different path)

Duplicates are excluded from the convert queue and shown as a warning
in the summary with reason (same name / same content).
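The two-pass filter runs client-side before conversion; a Go sketch of the algorithm (the real logic lives in the UI, and the types here are illustrative):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

type file struct {
	Path string
	Data []byte
}

// dedupeFiles drops files repeating a basename first, then files whose
// SHA-256 content hash was already seen, returning kept paths and a
// skip reason per excluded file.
func dedupeFiles(files []file) (kept []string, skipped map[string]string) {
	seenName := map[string]bool{}
	seenHash := map[string]bool{}
	skipped = map[string]string{}
	for _, f := range files {
		name := filepath.Base(f.Path)
		if seenName[name] {
			skipped[f.Path] = "same name"
			continue
		}
		sum := sha256.Sum256(f.Data)
		h := hex.EncodeToString(sum[:])
		if seenHash[h] {
			skipped[f.Path] = "same content"
			continue
		}
		seenName[name] = true
		seenHash[h] = true
		kept = append(kept, f.Path)
	}
	return kept, skipped
}

func main() {
	kept, skipped := dedupeFiles([]file{
		{Path: "a/dump.tgz", Data: []byte("one")},
		{Path: "b/dump.tgz", Data: []byte("two")}, // same basename as a/dump.tgz
		{Path: "c/copy.tgz", Data: []byte("one")}, // same content as a/dump.tgz
	})
	fmt.Println(kept, skipped)
}
```

Note the name check wins over the hash check, so a name duplicate is skipped even when its content differs.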

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 12:45:09 +03:00
Mikhail Chusavitin
f09344e288 dell: filter chipset/embedded noise from PCIe device list
Skip FQDD prefixes that are internal AMD EPYC fabric or devices
already captured with richer data from other DCIM views:
- HostBridge/P2PBridge/ISABridge/SMBus.Embedded: AMD internal bus
- AHCI.Embedded: AMD FCH SATA (chipset, not a slot)
- Video.Embedded: BMC Matrox G200eW3, not user-visible
- NIC.Embedded: duplicates DCIM_NICView entries (no model/MAC in PCIe view)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 12:09:40 +03:00
19d857b459 redfish: filter PCIe topology noise, deduplicate GPU/NIC cross-sources
- isUnidentifiablePCIeDevice: skip PCIe entries with generic class
  (SingleFunction/MultiFunction) and no model/serial/VendorID — eliminates
  PCH bridges, root ports and other bus infrastructure that MSI BMC
  enumerates exhaustively (59→9 entries on CG480-S5063)
- collectPCIeDevices: skip entries where looksLikeGPU — prevents GPU
  devices from appearing in both hw.GPUs and hw.PCIeDevices (fixed
  Inspur H100 duplicate)
- dedupeCanonicalDevices: secondary model+manufacturer match for noKey
  items (no serial, no BDF) — merges NetworkAdapter entries into
  matching PCIe device entries; isGenericDeviceClass helper for
  DeviceClass identity check (fixed Inspur ENFI1100-T4 duplicate)
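The first filter above can be sketched like this; struct fields are assumptions standing in for the parsed Redfish document:

```go
package main

import "fmt"

type pcieDev struct{ DeviceClass, Model, Serial, VendorID string }

// isUnidentifiablePCIeDevice drops entries with a generic device class and no
// identifying data: PCH bridges, root ports, and other bus infrastructure
// that some BMCs enumerate exhaustively. Anything with a model, serial, or
// vendor ID is kept.
func isUnidentifiablePCIeDevice(d pcieDev) bool {
	generic := d.DeviceClass == "SingleFunction" || d.DeviceClass == "MultiFunction"
	return generic && d.Model == "" && d.Serial == "" && d.VendorID == ""
}

func main() {
	bridge := pcieDev{DeviceClass: "SingleFunction"}
	nic := pcieDev{DeviceClass: "NetworkController", Model: "ConnectX-6"}
	bare := pcieDev{DeviceClass: "MultiFunction", VendorID: "0x15b3"}
	fmt.Println(isUnidentifiablePCIeDevice(bridge), isUnidentifiablePCIeDevice(nic), isUnidentifiablePCIeDevice(bare))
}
```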

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 22:08:02 +03:00
114 changed files with 23022 additions and 3136 deletions

.gitmodules

@@ -4,3 +4,6 @@
 [submodule "bible"]
 	path = bible
 	url = https://git.mchus.pro/mchus/bible.git
+[submodule "internal/chart"]
+	path = internal/chart
+	url = https://git.mchus.pro/reanimator/chart.git


@@ -6,6 +6,6 @@ Start with `bible/rules/patterns/` for specific contracts.
 ## Project Architecture
 Read `bible-local/` — LOGPile specific architecture.
-Read order: `bible-local/README.md` → `01-overview.md` → relevant files for the task.
+Read order: `bible-local/README.md` → `01-overview.md` → `02-architecture.md` → `04-data-models.md` → relevant file(s) for the task.
 Every architectural decision specific to this project must be recorded in `bible-local/10-decisions.md`.


@@ -6,6 +6,6 @@ Start with `bible/rules/patterns/` for specific contracts.
 ## Project Architecture
 Read `bible-local/` — LOGPile specific architecture.
-Read order: `bible-local/README.md` → `01-overview.md` → relevant files for the task.
+Read order: `bible-local/README.md` → `01-overview.md` → `02-architecture.md` → `04-data-models.md` → relevant file(s) for the task.
 Every architectural decision specific to this project must be recorded in `bible-local/10-decisions.md`.


@@ -2,9 +2,27 @@
Standalone Go application for BMC diagnostics analysis with an embedded web UI.
## What it does
- Parses vendor diagnostic archives into a normalized hardware inventory
- Collects live BMC data via Redfish
- Exports normalized data as CSV, raw re-analysis bundles, and Reanimator JSON
- Runs as a single Go binary with embedded UI assets
## Documentation
- Architecture and technical documentation (single source of truth): [`docs/bible/README.md`](docs/bible/README.md)
- Shared engineering rules: [`bible/README.md`](bible/README.md)
- Project architecture and API contracts: [`bible-local/README.md`](bible-local/README.md)
- Agent entrypoints: [`AGENTS.md`](AGENTS.md), [`CLAUDE.md`](CLAUDE.md)
## Run
```bash
make build
./bin/logpile
```
Default port: `8082`
## License

bible

Submodule bible updated: 0c829182a1...456c1f022c


@@ -1,35 +1,46 @@
 # 01 — Overview
-## What is LOGPile?
+## Purpose
-LOGPile is a standalone Go application for BMC (Baseboard Management Controller)
-diagnostics analysis with an embedded web UI.
-It runs as a single binary with no external file dependencies.
+LOGPile is a standalone Go application for BMC diagnostics analysis with an embedded web UI.
+It runs as a single binary and normalizes hardware data from archives or live Redfish collection.
 ## Operating modes
-| Mode | Entry point | Description |
-|------|-------------|-------------|
-| **Offline / archive** | `POST /api/upload` | Upload a vendor diagnostic archive or a JSON snapshot; parse and display in UI |
-| **Live / Redfish** | `POST /api/collect` | Connect to a live BMC via Redfish API, collect hardware inventory, display and export |
+| Mode | Entry point | Outcome |
+|------|-------------|---------|
+| Archive upload | `POST /api/upload` | Parse a supported archive, raw export bundle, or JSON snapshot into `AnalysisResult` |
+| Live collection | `POST /api/collect` | Collect from a live BMC via Redfish and store the result in memory |
+| Batch convert | `POST /api/convert` | Convert multiple supported input files into Reanimator JSON in a ZIP artifact |
-Both modes produce the same in-memory `AnalysisResult` structure and expose it
-through the same API and UI.
+All modes converge on the same normalized hardware model and exporter pipeline.
-## Key capabilities
+## In scope
-- Single self-contained binary with embedded HTML/JS/CSS (no static file serving required).
-- Vendor archive parsing: Inspur/Kaytus, Dell TSR, NVIDIA HGX Field Diagnostics,
-  NVIDIA Bug Report, Unraid, XigmaNAS, Generic text fallback.
-- Live Redfish collection with async progress tracking.
-- Normalized hardware inventory: CPU / RAM / Storage / GPU / PSU / NIC / PCIe / Firmware.
-- Raw `redfish_tree` snapshot stored in `RawPayloads` for future offline re-analysis.
-- Re-upload of a JSON snapshot for offline work (`/api/upload` accepts `AnalysisResult` JSON).
-- Export in CSV, JSON (full `AnalysisResult`), and Reanimator format.
-- PCI device model resolution via embedded `pci.ids` (no hardcoded model strings).
+- Single-binary desktop/server utility with embedded UI
+- Vendor archive parsing and live Redfish collection
+- Canonical hardware inventory across UI and exports
+- Reopenable raw export bundles for future re-analysis
+- Reanimator export and batch conversion workflows
+- Embedded `pci.ids` lookup for vendor/device name enrichment
-## Non-goals (current scope)
+## Current vendor coverage
-- No persistent storage — all state is in-memory per process lifetime.
-- IPMI collector is a mock scaffold only; real IPMI support is not implemented.
-- No authentication layer on the HTTP server.
+- Dell TSR
+- Reanimator Easy Bee support bundles
+- H3C SDS G5/G6
+- Inspur / Kaytus
+- HPE iLO AHS
+- NVIDIA HGX Field Diagnostics
+- NVIDIA Bug Report
+- Unraid
+- xFusion iBMC dump / file export
+- XigmaNAS
+- Generic fallback parser
+## Non-goals
+- Persistent storage or multi-user state
+- Production IPMI collection
+- Authentication/authorization on the built-in HTTP server
+- Long-term server-side job history beyond in-memory process lifetime


@@ -2,114 +2,97 @@
## Runtime stack
| Layer | Technology |
|-------|------------|
| Layer | Implementation |
|-------|----------------|
| Language | Go 1.22+ |
| HTTP | `net/http`, `http.ServeMux` |
| UI | Embedded via `//go:embed` in `web/embed.go` (templates + static assets) |
| State | In-memory only — no database |
| Build | `CGO_ENABLED=0`, single static binary |
| HTTP | `net/http` + `http.ServeMux` |
| UI | Embedded templates and static assets via `go:embed` |
| State | In-memory only |
| Build | `CGO_ENABLED=0`, single binary |
Default port: **8082**
Default port: `8082`
## Directory structure
Audit result rendering is delegated to embedded `reanimator/chart`, vendored as git submodule `internal/chart`.
LOGPile remains responsible for upload, collection, parsing, normalization, and Reanimator export generation.
```
cmd/logpile/main.go # Binary entry point, CLI flag parsing
internal/
collector/ # Live data collectors
registry.go # Collector registration
redfish.go # Redfish connector (real implementation)
ipmi_mock.go # IPMI mock connector (scaffold)
types.go # Connector request/progress contracts
parser/ # Archive parsers
parser.go # BMCParser (dispatcher) + parse orchestration
archive.go # Archive extraction helpers
registry.go # Parser registry + detect/selection
interface.go # VendorParser interface
vendors/ # Vendor-specific parser modules
vendors.go # Import-side-effect registrations
dell/
inspur/
nvidia/
nvidia_bug_report/
unraid/
xigmanas/
generic/
pciids/ # PCI IDs lookup (embedded pci.ids)
server/ # HTTP layer
server.go # Server struct, route registration
handlers.go # All HTTP handler functions
exporter/ # Export formatters
exporter.go # CSV + JSON exporters
reanimator_models.go
reanimator_converter.go
models/ # Shared data contracts
web/
embed.go # go:embed directive
templates/ # HTML templates
static/ # JS / CSS
js/app.js # Frontend — API contract consumer
## Code map
```text
cmd/logpile/main.go entrypoint and CLI flags
internal/server/ HTTP handlers, jobs, upload/export flows
internal/ingest/ source-family orchestration for upload and raw replay
internal/collector/ live collection and Redfish replay
internal/analyzer/ shared analysis helpers
internal/parser/ archive extraction and parser dispatch
internal/exporter/ CSV and Reanimator conversion
internal/chart/ vendored `reanimator/chart` viewer submodule
internal/models/ stable data contracts
web/ embedded UI assets
```
## In-memory state
## Server state
The `Server` struct in `internal/server/server.go` holds:
`internal/server.Server` stores:
| Field | Type | Description |
|-------|------|-------------|
| `result` | `*models.AnalysisResult` | Current parsed/collected dataset |
| `detectedVendor` | `string` | Vendor identifier from last parse |
| `jobManager` | `*JobManager` | Tracks live collect job status/logs |
| `collectors` | `*collector.Registry` | Registered live collection connectors |
| Field | Purpose |
|------|---------|
| `result` | Current `AnalysisResult` shown in UI and used by exports |
| `detectedVendor` | Parser/collector identity for the current dataset |
| `rawExport` | Reopenable raw-export package associated with current result |
| `jobManager` | Shared async job state for collect and convert flows |
| `collectors` | Registered live collectors (`redfish`, `ipmi`) |
| `convertOutput` | Temporary ZIP artifacts for batch convert downloads |
State is replaced atomically on successful upload or collect.
On a failed/canceled collect, the previous `result` is preserved unchanged.
State is replaced only on successful upload or successful live collection.
Failed or canceled jobs do not overwrite the previous dataset.
## Upload flow (`POST /api/upload`)
## Main flows
```
multipart form field: "archive"
├─ file looks like JSON?
│ └─ parse as models.AnalysisResult snapshot → store in Server.result
└─ otherwise
└─ parser.NewBMCParser().ParseFromReader(...)
├─ try all registered vendor parsers (highest confidence wins)
└─ result → store in Server.result
```
### Upload
## Live collect flow (`POST /api/collect`)
1. `POST /api/upload` receives multipart field `archive`
2. `internal/ingest.Service` resolves the source family
3. JSON inputs are checked for raw-export package or `AnalysisResult` snapshot
4. Non-JSON archives go through the archive parser family
5. Archive metadata is normalized onto `AnalysisResult`
6. Result becomes the current in-memory dataset
```
validate request (host / protocol / port / username / auth_type / tls_mode)
└─ launch async job
├─ progress callback → job log (queryable via GET /api/collect/{id})
├─ success:
│ set source metadata (source_type=api, protocol, host, date)
│ store result in Server.result
└─ failure / cancel:
previous Server.result unchanged
```
### Live collect
Job lifecycle states: `queued → running → success | failed | canceled`
1. `POST /api/collect` validates request fields
2. Server creates an async job and returns `202 Accepted`
3. Selected collector gathers raw data
4. For Redfish, collector runs minimal discovery, matches Redfish profiles, and builds an acquisition plan
5. Collector applies profile tuning hints (for example crawl breadth, prefetch, bounded plan-B passes)
6. Collector saves `raw_payloads.redfish_tree` plus acquisition diagnostics
7. Result is normalized, source metadata applied, and state replaced on success
### Batch convert
1. `POST /api/convert` accepts multiple files
2. Each supported file is analyzed independently
3. Successful results are converted to Reanimator JSON
4. Outputs are packaged into a temporary ZIP artifact
5. Client polls job status and downloads the artifact when ready
## Redfish design rule
Live Redfish collection and offline Redfish re-analysis must use the same replay path.
The collector first captures `raw_payloads.redfish_tree`, then the replay logic builds the normalized result.
Redfish is being split into two coordinated phases:
- acquisition: profile-driven snapshot collection strategy
- analysis: replay over the saved snapshot with the same profile framework
## PCI IDs lookup
Load/override order (`LOGPILE_PCI_IDS_PATH` has highest priority because it is loaded last):
1. Embedded `internal/parser/vendors/pciids/pci.ids` (base dataset compiled into binary)
2. `./pci.ids`
3. `/usr/share/hwdata/pci.ids`
4. `/usr/share/misc/pci.ids`
5. `/opt/homebrew/share/pciids/pci.ids`
6. Paths from `LOGPILE_PCI_IDS_PATH` (colon-separated on Unix; loaded last, override same IDs)
Later sources override earlier ones for the same IDs.
This means unknown GPU/NIC model strings can be updated by refreshing `pci.ids` without any code change.
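The override order can be sketched as a layered map merge (types and names here are illustrative, not the actual `pciids` implementation):

```go
package main

import "fmt"

// pciKey identifies one device entry in pci.ids terms.
type pciKey struct{ vendorID, deviceID uint16 }

// mergeLayers applies lookup layers in load order: later layers
// override earlier ones for the same vendor/device ID pair.
func mergeLayers(layers []map[pciKey]string) map[pciKey]string {
	merged := make(map[pciKey]string)
	for _, layer := range layers {
		for k, name := range layer {
			merged[k] = name // later load wins
		}
	}
	return merged
}

func main() {
	// Embedded base dataset, then an operator-supplied override file.
	embedded := map[pciKey]string{{0x10de, 0x20b5}: "GA100"}
	override := map[pciKey]string{{0x10de, 0x20b5}: "GA100 [A100 PCIe 80GB]"}
	merged := mergeLayers([]map[pciKey]string{embedded, override})
	fmt.Println(merged[pciKey{0x10de, 0x20b5}]) // prints: GA100 [A100 PCIe 80GB]
}
```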

## Conventions
- All endpoints are under `/api/`
- Request bodies are `application/json` or `multipart/form-data` where noted
- Responses are `application/json` unless the endpoint downloads a file
- Async jobs share the same status model: `queued`, `running`, `success`, `failed`, `canceled`
- Export filenames use `YYYY-MM-DD (MODEL) - SERIAL.<ext>` when board metadata exists
- Embedded chart viewer routes live under `/chart/` and return HTML/CSS, not JSON
---
## Input endpoints
### `POST /api/upload`
Uploads one file in multipart field `archive` (server-side multipart limit: 100 MiB).
Accepted inputs:
- supported archive/log formats from the parser registry
- `.json` `AnalysisResult` snapshots
- raw-export JSON packages
- raw-export ZIP bundles
Result:
- parses or replays the input
- stores the result as current in-memory state
- returns parsed summary JSON
Related helper:
- `GET /api/file-types` returns `archive_extensions`, `upload_extensions`, and `convert_extensions`
---
## Live Collection
### `POST /api/collect`
Starts a live collection job (`redfish` or `ipmi`).
Request body:
```json
{
  "host": "bmc01.example.local",
  "protocol": "redfish",
  "username": "admin",
  "password": "***",
  "auth_type": "password",
  "tls_mode": "insecure"
}
```
Supported values:
- `protocol`: `redfish` or `ipmi`
- `auth_type`: `password` or `token`
- `tls_mode`: `strict` or `insecure`
**Response:** `202 Accepted`
```json
{
"job_id": "job_a1b2c3d4e5f6",
"status": "queued",
"message": "Collection job accepted",
"created_at": "2026-02-23T12:00:00Z"
}
```
Responses:
- `202 Accepted` on job creation
- `400 Bad Request` for malformed JSON
- `422 Unprocessable Entity` for semantic validation errors (missing/invalid fields)
Optional request field:
- `debug_payloads`: when `true`, the collector keeps extra diagnostic payloads and enables extended plan-B retries for slow HGX component inventory branches (`Assembly`, `Accelerators`, `Drives`, `NetworkAdapters`, `PCIeDevices`)
### `POST /api/collect/probe`
Checks that live API connectivity works and returns host power state before collection starts.
Typical request body is the same as `POST /api/collect`.
Typical response fields:
- `reachable`
- `protocol`
- `host_power_state`
- `host_powered_on`
- `power_control_available`
- `message`
### `GET /api/collect/{id}`
Returns async collection job status, progress, timestamps, and accumulated logs.
Response:
```json
{
"job_id": "job_a1b2c3d4e5f6",
"status": "running",
"progress": 55,
"logs": ["..."],
"created_at": "2026-02-23T12:00:00Z",
"updated_at": "2026-02-23T12:00:10Z"
}
```
Status values: `queued`, `running`, `success`, `failed`, `canceled`
### `POST /api/collect/{id}/cancel`
Requests cancellation for a running collection job.
---
### `POST /api/convert`
Starts a batch conversion job that accepts multiple files under `files[]` or `files`.
Each supported file is parsed independently and converted to Reanimator JSON.
Response fields:
- `job_id`
- `status`
- `accepted`
- `skipped`
- `total_files`
### `GET /api/convert/{id}`
Returns batch convert job status using the same async job envelope as collection.
### `GET /api/convert/{id}/download`
Downloads the ZIP artifact produced by a successful convert job.
## Read endpoints
### `GET /api/status`
Returns source metadata for the current dataset.
When no dataset is loaded, the response is `{ "loaded": false }`.
```json
{
  "loaded": true,
  "filename": "redfish://bmc01.example.local",
  "vendor": "redfish",
  "source_type": "api",
  "protocol": "redfish",
  "target_host": "bmc01.example.local",
  "collected_at": "2026-02-10T15:30:00Z",
  "stats": { "events": 0, "sensors": 0, "fru": 0 }
}
```
Typical fields:
- `loaded`
- `filename`
- `vendor`
- `source_type` (`archive` or `api`)
- `protocol` (`redfish`, `ipmi`, or empty for archive uploads)
- `target_host`
- `source_timezone`
- `collected_at`
- `stats`
### `GET /api/config`
Returns the main UI configuration payload, including:
- source metadata
- `hardware.board`
- `hardware.firmware`
- canonical `hardware.devices`
- computed `specification` summary lines
### `GET /api/events`
Returns parsed diagnostic events, sorted newest first.
### `GET /api/sensors`
Returns sensor readings (temperatures, voltages, fan speeds), plus synthesized PSU voltage sensors when telemetry is available.
### `GET /api/serials`
Returns serial-oriented inventory built from canonical `hardware.devices`.
### `GET /api/firmware`
Returns firmware-oriented inventory built from canonical `hardware.devices`.
### `GET /api/parse-errors`
Returns normalized parse and collection issues combined from:
- Redfish fetch errors in `raw_payloads`
- raw-export collect logs
- derived partial-inventory warnings
### `GET /api/parsers`
Returns registered vendor parsers with their identifiers.
---
## Viewer endpoints
### `GET /chart/current`
Renders the current in-memory dataset as Reanimator HTML using embedded `reanimator/chart`.
The server first converts the current result to Reanimator JSON, then passes that snapshot to the viewer.
### `GET /chart/static/...`
Serves embedded `reanimator/chart` static assets.
## Export endpoints
### `GET /api/export/csv`
Downloads serial numbers as CSV.
### `GET /api/export/json`
Downloads a raw-export artifact for reopen and re-analysis.
Current implementation emits a ZIP bundle containing:
- `raw_export.json`
- `collect.log`
- `parser_fields.json`
### `GET /api/export/reanimator`
Downloads Reanimator JSON built from the current normalized result, for asset-tracking integration.
See [`07-exporters.md`](07-exporters.md) for the full format spec.
---
## Management endpoints
### `DELETE /api/clear`
Clears the current in-memory dataset, raw export state, and temporary convert artifacts.
### `POST /api/shutdown`
Gracefully shuts down the process after responding.

# 04 — Data Models
## Core contract: `AnalysisResult`
`internal/models/models.go` defines the shared result passed between parsers, collectors, server handlers, and exporters.
Stability rule:
- never break the JSON shape of `AnalysisResult`
- do not rename or remove JSON fields; additive fields are allowed
- UI and exporter compatibility depends on this shape remaining stable
Key fields:
| Field | Meaning |
|-------|---------|
| `filename` | Original upload name or synthesized live source name |
| `source_type` | `archive` or `api` |
| `protocol` | `redfish`, `ipmi`, or empty for archive uploads |
| `target_host` | Hostname or IP for live collection |
| `source_timezone` | Source timezone/offset if known |
| `collected_at` | Canonical collection/upload time |
| `raw_payloads` | Raw source data used for replay or diagnostics |
| `events` | Parsed event timeline |
| `fru` | FRU-derived inventory details |
| `sensors` | Sensor readings |
| `hardware` | Normalized hardware inventory |
`raw_payloads` is the durable source for offline re-analysis (especially for Redfish).
Normalized fields should be treated as derivable output from raw source data.
## `HardwareConfig`
Main sections:
```text
HardwareConfig
├── board            BoardInfo        — server/motherboard identity
├── devices          []HardwareDevice — canonical inventory (see below)
├── cpus             []CPU
├── memory           []MemoryDIMM
├── storage          []Storage
├── volumes          []StorageVolume  — logical RAID/VROC volumes
├── pcie_devices     []PCIeDevice
├── gpus             []GPU
├── network_adapters []NetworkAdapter
├── network_cards    []NIC            — legacy/alternate source field
├── power_supplies   []PSU
└── firmware         []FirmwareInfo
```
`network_cards` is legacy/alternate source data.
`hardware.devices` is the canonical cross-section inventory.
---
## Canonical inventory: `hardware.devices`
`hardware.devices` is the single source of truth for hardware inventory: device-oriented UI and Reanimator export both read from it.
Required rules:
1. UI hardware views must read from `hardware.devices`; the Device Inventory tab shows kinds `pcie`, `storage`, `gpu`, `network`
2. Reanimator conversion must derive device sections from the same `hardware.devices`; the exporter should group/filter canonical records by section, not rebuild data from multiple sources
3. Any discrepancy between UI data and Reanimator export data is a bug, not accepted divergence
4. New shared device attributes belong in the canonical `HardwareDevice` schema first, then get mapped to Reanimator/UI — never the other way around
Deduplication priority (applied once by the repository builder):
| Priority | Key used |
|----------|----------|
| 1 | `serial_number` — usable (not empty, not `N/A`, `NA`, `NONE`, `NULL`, `UNKNOWN`, `-`) |
| 2 | `bdf` — PCI Bus:Device.Function address |
| 3 | No merge — records remain distinct if both serial and bdf are absent |
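A sketch of the key-selection logic (the helper name and placeholder set are illustrative, not the repository builder's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// unusableSerials mirrors the placeholder values that disqualify a serial
// from acting as a merge key.
var unusableSerials = map[string]bool{
	"": true, "N/A": true, "NA": true, "NONE": true, "NULL": true, "UNKNOWN": true, "-": true,
}

// dedupKey picks the merge key in documented priority order:
// usable serial first, then BDF, otherwise no merge at all.
func dedupKey(serial, bdf string) (key string, mergeable bool) {
	s := strings.ToUpper(strings.TrimSpace(serial))
	if !unusableSerials[s] {
		return "serial:" + s, true
	}
	if strings.TrimSpace(bdf) != "" {
		return "bdf:" + bdf, true
	}
	return "", false // keep records distinct
}

func main() {
	fmt.Println(dedupKey("N/A", "0000:3b:00.0")) // prints: bdf:0000:3b:00.0 true
}
```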
### Device schema alignment
Keep the `hardware.devices` schema as close as possible to Reanimator JSON field names.
This minimizes translation logic in the exporter and prevents drift.
---
## Raw payloads
`raw_payloads` is authoritative for replayable sources.
Current important payloads:
- `redfish_tree`
- `redfish_fetch_errors`
- `source_timezone`
Normalized hardware fields are derived output, not the long-term source of truth.
---
## Raw export package
`/api/export/json` produces a reopenable raw-export artifact: not a mere dump of `AnalysisResult`, but a JSON or ZIP bundle that carries the source data required for re-analysis.
Design rules:
- raw source stays authoritative (`redfish_tree` or original file bytes)
- uploads of raw-export artifacts must re-analyze from raw source
- parsed snapshots inside the bundle are diagnostic artifacts, not the source of truth

Collectors live in `internal/collector/`.
Core files:
- `registry.go` for protocol registration
- `redfish.go` for live collection
- `redfish_replay.go` for replay from raw payloads
- `redfish_replay_gpu.go` for profile-driven GPU replay collectors and GPU fallback helpers
- `redfish_replay_storage.go` for profile-driven storage replay collectors and storage recovery helpers
- `redfish_replay_inventory.go` for replay inventory collectors (PCIe, NIC, BMC MAC, NIC enrichment)
- `redfish_replay_fru.go` for board fallback helpers and Assembly/FRU replay extraction
- `redfish_replay_profiles.go` for profile-driven replay helpers and vendor-aware recovery helpers
- `redfishprofile/` for Redfish profile matching and acquisition/analysis hooks
- `ipmi_mock.go` for the placeholder IPMI implementation
- `types.go` for request/progress contracts
---
## Redfish collector
Status: active production path.
### Request contract (from server)
Passed through from `/api/collect` after validation:
- `host`, `port`, `username`
- `auth_type` (`password` or `token`) plus the matching credential field
- `tls_mode` (`strict` or `insecure`)
- optional `debug_payloads` for extended diagnostics
### Core rule
Live collection and replay must stay behaviorally aligned.
If the collector adds a fallback, probe, or normalization rule, replay must mirror it.
### Preflight and host power
- `Probe()` is used before collection to verify API connectivity and report the current host `PowerState`
- if the host is off, the collector logs a warning and proceeds with collection; inventory data may be incomplete when the host is powered off
- power-on and power-off are not performed by the collector
### Collected data
| Category | Notes |
|----------|-------|
| CPU | Model, cores, threads, socket, status |
| Memory | DIMM slot, size, type, speed, serial, manufacturer |
| Storage | Slot, type, model, serial, firmware, interface, status |
| GPU | Detected via PCIe class + NVIDIA vendor ID |
| PSU | Model, serial, wattage, firmware, telemetry (input/output power, voltage) |
| NIC | Model, serial, port count, BDF |
| PCIe | Slot, vendor_id, device_id, BDF, link width/speed |
| Firmware | BIOS, BMC versions |
### Skip hung requests
Redfish collection uses a two-level context model:
- `ctx` — job lifetime context, cancelled only on explicit job cancel
- `collectCtx` — collection phase context, derived from `ctx`; covers snapshot, prefetch, and plan-B
`collectCtx` is cancelled when the user presses the "Skip hung" button (UI label "Пропустить зависшие").
On skip, all in-flight HTTP requests in the current phase are aborted immediately via context cancellation, the crawler and plan-B loops exit, and execution proceeds to the replay phase using whatever was collected in `rawTree`. The result is partial but valid.
The skip signal travels: UI button → `POST /api/collect/{id}/skip` → `JobManager.SkipJob()` closes `skipCh` → goroutine in `Collect()` → `cancelCollect()`.
The skip button is visible during the `running` state and hidden once the job reaches a terminal state.
### Unified Redfish analysis pipeline (live == replay)
LOGPile uses a single Redfish analyzer path:
1. The live collector crawls the Redfish API and builds `raw_payloads.redfish_tree`
2. The parsed result is produced by replaying that tree through the same analyzer used by raw import
This guarantees that live collection and `Export Raw Data` re-open/re-analyze produce the same normalized output for the same `redfish_tree`.
### Extended diagnostics toggle
The live collect form exposes a user-facing checkbox for extended diagnostics:
- default collection prioritizes inventory completeness and bounded runtime
- when extended diagnostics is off, heavy HGX component-chassis critical plan-B retries (`Assembly`, `Accelerators`, `Drives`, `NetworkAdapters`, `PCIeDevices`) are skipped
- when extended diagnostics is on, those retries are allowed and extra debug payloads are collected
This toggle is intended for operator-driven deep diagnostics on problematic hosts, not for the default path.
### Snapshot crawler behavior
The Redfish snapshot crawler is intentionally:
- bounded (`LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`)
- prioritized (PCIe, Fabrics, FirmwareInventory, Storage, PowerSubsystem, ThermalSubsystem)
- tolerant (skips noisy expected failures, strips `#fragment` from `@odata.id`)
Design notes:
- Queue capacity is sized to the snapshot cap to avoid worker deadlocks on large trees.
- UI progress is coarse and human-readable; detailed per-request diagnostics are available via debug logs.
- `LOGPILE_REDFISH_DEBUG=1` and `LOGPILE_REDFISH_SNAPSHOT_DEBUG=1` enable console diagnostics.
### Discovery model
The collector does not rely on one fixed vendor tree.
It discovers and follows Redfish resources dynamically from root collections such as:
- `Systems`
- `Chassis`
- `Managers`
After minimal discovery the collector builds `MatchSignals` and selects a Redfish profile mode:
- `matched` when one or more profiles score with high confidence
- `fallback` when vendor/platform confidence is low; in this mode the collector aggregates safe additive profile probes to maximize snapshot completeness
### Parsing guidelines
When adding Redfish mappings, follow these principles:
- Support alternate collection paths (resources may appear at different odata URLs).
- Follow `@odata.id` references and handle embedded `Members` arrays.
- Prefer raw-tree replay compatibility: if the live collector adds a fallback/probe, the replay analyzer must mirror it.
- Deduplicate by serial / BDF / slot+model (in that priority order).
- Prefer tolerant/fallback parsing — missing fields should be silently skipped, not cause the whole collection to fail.
### Profile modules
Profile modules may contribute:
- primary acquisition seeds
- bounded `PlanBPaths` for secondary recovery
- critical paths
- acquisition notes/diagnostics
- tuning hints such as snapshot document cap, prefetch behavior, and expensive post-probe toggles
- post-probe policy for numeric collection recovery, direct NVMe `Disk.Bay` recovery, and sensor post-probe enablement
- recovery policy for critical collection member retry, slow numeric plan-B probing, and profile-specific plan-B activation
- scoped path policy for discovered `Systems/*`, `Chassis/*`, and `Managers/*` branches when a profile needs extra seeds/critical targets beyond the vendor-neutral core set
- prefetch policy for which critical paths are eligible for adaptive prefetch and which path shapes are explicitly excluded
Model- or topology-specific `CriticalPaths` and profile `PlanBPaths` must live in the profile module that owns the behavior. The collector core may execute those paths, but it should not hardcode vendor-specific recovery targets.
The same rule applies to expensive post-probe decisions: the collector core may execute bounded post-probe loops, but profiles own whether those loops are enabled for a given platform shape.
The same rule applies to critical recovery passes: the collector core may run bounded plan-B loops, but profiles own whether member retry, slow numeric recovery, and profile-specific plan-B passes are enabled.
When a profile needs extra discovered-path branches such as storage controller subtrees, it must provide them as scoped suffix policy rather than by hardcoding platform-shaped suffixes into the collector core baseline seed list.
The same applies to prefetch shaping: the collector core may execute adaptive prefetch, but profiles own the include/exclude rules for which critical paths should participate.
The same applies to critical inventory shaping: the collector core should keep only a minimal vendor-neutral critical baseline, while profiles own additional system/chassis/manager critical suffixes and top-level critical targets.
Resolved live acquisition plans should be built inside `redfishprofile/`, not by hand in `redfish.go`. The collector core should receive discovered resources plus the selected profile plan and then execute the resolved seed/critical paths.
When profile behavior depends on what discovery actually returned, use a post-discovery refinement hook in `redfishprofile/` instead of hardcoding guessed absolute paths in the static plan. MSI GPU chassis refinement is the reference example.
### Vendor-specific storage fallbacks (Supermicro and similar)
When standard `Storage/.../Drives` collections are empty, collector/replay may recover drives via:
- `Storage.Links.Enclosures[*] -> .../Drives`
- direct probing of finite `Disk.Bay` candidates (`Disk.Bay.0`, `Disk.Bay0`, `.../0`)
This is required for some BMCs that publish drive inventory in vendor-specific paths while leaving standard collections empty.
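The bounded `Disk.Bay` probing can be sketched as finite candidate-path generation (the helper is illustrative; real recovery also inspects responses and stops at the configured bounds):

```go
package main

import "fmt"

// diskBayCandidates builds the finite probe list for a storage resource
// when standard Drives collections are empty. The three path shapes follow
// the doc: Disk.Bay.N, Disk.BayN, and a bare numeric member.
func diskBayCandidates(base string, bays int) []string {
	var out []string
	for i := 0; i < bays; i++ {
		out = append(out,
			fmt.Sprintf("%s/Drives/Disk.Bay.%d", base, i),
			fmt.Sprintf("%s/Drives/Disk.Bay%d", base, i),
			fmt.Sprintf("%s/Drives/%d", base, i),
		)
	}
	return out
}

func main() {
	for _, p := range diskBayCandidates("/redfish/v1/Chassis/1", 1) {
		fmt.Println(p)
	}
}
```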
### Profile-match diagnostics
Live Redfish collection must expose profile-match diagnostics:
- collector logs must include the selected modules and the score for every known module
- job status responses must carry structured `active_modules` and `module_scores`
- the collect page should render active modules as chips from structured status data, not by parsing log lines
Profile matching may use stable platform grammar signals in addition to vendor strings:
- discovered member/resource naming from lightweight discovery collections
- firmware inventory member IDs
- OEM action names and linked target paths embedded in discovery documents
- replay-only snapshot hints such as OEM assembly/type markers when they are present in `raw_payloads.redfish_tree`
### PSU source preference (newer Redfish)
PSU inventory source order:
1. `Chassis/*/PowerSubsystem/PowerSupplies` (preferred on X14+/newer Redfish)
2. `Chassis/*/Power` (legacy fallback)
### Replay analysis
On replay, profile-derived analysis directives may enable vendor-specific inventory linking helpers such as processor-GPU fallback, chassis-ID alias resolution, and bounded storage recovery.
Replay should resolve a structured analysis plan inside `redfishprofile/`, analogous to the live acquisition plan. The replay core may execute collectors against the resolved directives, but snapshot-aware vendor decisions should live in profile analysis hooks, not in `redfish_replay.go`.
GPU and storage replay executors should consume the resolved analysis plan directly, not a raw `AnalysisDirectives` struct, so the boundary between planning and execution stays explicit.
### Testing
Profile matching and acquisition tuning must be regression-tested against repo-owned compact fixtures under `internal/collector/redfishprofile/testdata/`, derived from representative raw-export snapshots, for at least MSI and Supermicro shapes.
When multiple raw-export snapshots exist for the same platform, profile selection must remain stable across those sibling fixtures unless the topology actually changes.
Analysis-plan metadata should be stored in replay raw payloads so vendor hook activation is debuggable offline.
### Progress reporting
The collector emits progress log entries at each stage (connecting, enumerating systems, collecting CPUs, etc.) so the UI can display meaningful status.
Current progress message strings are user-facing and may be localized.
### Stored raw data
Important raw payloads:
- `raw_payloads.redfish_tree`
- `raw_payloads.redfish_fetch_errors`
- `raw_payloads.redfish_profiles`
- `raw_payloads.source_timezone` when available
### Redfish implementation guidance
When changing collection logic:
1. Prefer profile modules over ad-hoc vendor branches in the collector core
2. Keep expensive probing bounded
3. Deduplicate by serial, then BDF, then location/model fallbacks
4. Preserve replay determinism from saved raw payloads
5. Add tests for both the motivating topology and a negative case
### Known vendor fallbacks
- empty standard drive collections may trigger bounded `Disk.Bay` probing
- `Storage.Links.Enclosures[*]` may be followed to recover physical drives
- `PowerSubsystem/PowerSupplies` is preferred over legacy `Power` when available
---
## IPMI collector
Status: mock scaffold only — not implemented.
It is registered in the collector registry for protocol completeness but returns placeholder data; real IPMI support is a future work item.

## Framework
### Registration
Parsers live in `internal/parser/` and vendor implementations live in `internal/parser/vendors/`.
Core behavior:
- each vendor parser registers itself via Go's `init()` side-effect import pattern
- all registered parsers run `Detect()`
- the highest-confidence parser wins
- the generic fallback stays last and low-confidence
All registrations are collected in `internal/parser/vendors/vendors.go`:
```go
import (
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/dell"
	// etc.
)
```
### `VendorParser` interface
```go
type VendorParser interface {
	Name() string                     // human-readable name
	Vendor() string                   // vendor identifier string
	Version() string                  // parser version (increment on logic changes)
	Detect(files []ExtractedFile) int // confidence 0–100
	Parse(files []ExtractedFile) (*models.AnalysisResult, error)
}
```
### Selection logic
All registered parsers run `Detect()` against the uploaded archive's file list.
Multiple parsers may return a score above zero; only the top scorer is used.
## Adding a parser
1. Create `internal/parser/vendors/<vendor>/`
2. Start from `internal/parser/vendors/template/parser.go.template`
3. Implement `Detect()` and `Parse()`
4. Add a blank import in `internal/parser/vendors/vendors.go`
5. Add at least one positive and one negative detection test
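A skeleton following these steps might look like the sketch below. The `ExtractedFile`/`AnalysisResult` stand-ins and the `acme` vendor are placeholders so the example is self-contained; the real types live in `internal/parser` and `internal/models`:

```go
package main

import (
	"fmt"
	"strings"
)

// Stand-in types so this sketch compiles on its own.
type ExtractedFile struct {
	Name    string
	Content []byte
}
type AnalysisResult struct{ Vendor string }

// acmeParser is a hypothetical vendor parser skeleton.
type acmeParser struct{}

func (acmeParser) Name() string    { return "ACME Diagnostics" }
func (acmeParser) Vendor() string  { return "acme" }
func (acmeParser) Version() string { return "1" }

// Detect returns a 0-100 confidence score based on a vendor-unique filename.
func (acmeParser) Detect(files []ExtractedFile) int {
	for _, f := range files {
		if strings.HasSuffix(f.Name, "acme-dump.txt") {
			return 90 // unique marker found: confident match
		}
	}
	return 0 // clearly not ours
}

func (acmeParser) Parse(files []ExtractedFile) (*AnalysisResult, error) {
	return &AnalysisResult{Vendor: "acme"}, nil
}

func main() {
	p := acmeParser{}
	fmt.Println(p.Detect([]ExtractedFile{{Name: "logs/acme-dump.txt"}})) // prints: 90
}
```

In the real tree, registration happens in an `init()` func that calls the parser registry, triggered by the blank import in `vendors.go`.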
`Detect()` tips:
- Look for unique filenames or directory names.
- Check file content for vendor-specific markers.
- Return 70+ only when confident; return 0 if clearly not a match.
### Parser versioning
Each parser file contains a `parserVersion` constant.
Increment the version whenever parsing logic changes — this helps trace which version produced a given result.
---
## Data quality rules
### System firmware only in `hardware.firmware`
`hardware.firmware` must contain system-level firmware only.
Device-bound firmware belongs on the device record and must not be duplicated at the top level.
### Strip embedded MAC addresses from model names
If a source embeds ` - XX:XX:XX:XX:XX:XX` in a model/name field, remove that suffix before storing it.
### Use `pci.ids` for empty or generic PCI model names
When `vendor_id` and `device_id` are known but the model name is missing or generic, resolve the name via `internal/parser/vendors/pciids`.
## Active vendor coverage
| Vendor ID | Input family | Notes |
|-----------|--------------|-------|
| `dell` | TSR ZIP archives | Broad hardware, firmware, sensors, lifecycle events |
| `easy_bee` | `bee-support-*.tar.gz` | Imports embedded `export/bee-audit.json` snapshot from reanimator-easy-bee bundles |
| `h3c_g5` | H3C SDS G5 bundles | INI/XML/CSV-driven hardware and event parsing |
| `h3c_g6` | H3C SDS G6 bundles | Similar flow with G6-specific files |
| `hpe_ilo_ahs` | HPE iLO Active Health System (`.ahs`) | Proprietary `ABJR` container with gzip-compressed `zbb` members; parser combines SMBIOS-style inventory strings and embedded Redfish storage JSON |
| `inspur` | onekeylog archives | FRU/SDR plus optional Redis enrichment |
| `nvidia` | HGX Field Diagnostics | GPU- and fabric-heavy diagnostic input |
| `nvidia_bug_report` | `nvidia-bug-report-*.log.gz` | dmidecode, lspci, NVIDIA driver sections |
| `unraid` | Unraid diagnostics/log bundles | Server and storage-focused parsing |
| `xfusion` | xFusion iBMC `tar.gz` dump / file export | AppDump + RTOSDump + LogDump merge for hardware and firmware |
| `xigmanas` | XigmaNAS plain logs | FreeBSD/NAS-oriented inventory |
| `generic` | fallback | Low-confidence text fallback when nothing else matches |
## Practical guidance
### FirmwareInfo — system-level only
`Hardware.Firmware` must contain only system-level firmware: BIOS, BMC/iDRAC, Lifecycle Controller, CPLD, storage controllers, BOSS adapters.
Device-bound firmware (NIC, GPU, PSU, disk, backplane) must NOT be added to `Hardware.Firmware`. It belongs to the device's own `Firmware` field and is already present there; duplicating it in `Hardware.Firmware` causes double entries in Reanimator.
The Reanimator exporter filters by `FirmwareInfo.DeviceName` prefix and by `FirmwareInfo.Description` (FQDD prefix). Parsers must cooperate:
- Store the device's FQDD (or equivalent slot identifier) in `FirmwareInfo.Description` for all firmware entries that come from a per-device inventory source (e.g. Dell `DCIM_SoftwareIdentity`).
- FQDD prefixes that are device-bound: `NIC.`, `PSU.`, `Disk.`, `RAID.Backplane.`, `GPU.`
### NIC/device model names — strip embedded MAC addresses
Some vendors (confirmed: Dell TSR) embed the MAC address in the device model name field,
e.g. `ProductName = "NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"`.
**Rule:** Strip any ` - XX:XX:XX:XX:XX:XX` suffix from model/name strings before storing
them in `FirmwareInfo.DeviceName`, `NetworkAdapter.Model`, or any other model field.
Use `nicMACInModelRE` (defined in the Dell parser) or an equivalent regex:
```
\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$
```
This applies to **all** string fields used as device names or model identifiers.
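The stripping rule can be sketched as a small helper built around that regex. The variable name mirrors `nicMACInModelRE` from the Dell parser; treat the rest as an illustrative sketch rather than the exact implementation:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nicMACInModelRE matches a trailing " - AA:BB:CC:DD:EE:FF" MAC suffix.
var nicMACInModelRE = regexp.MustCompile(`\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`)

// stripMACSuffix removes an embedded MAC address from a model/name string
// before it is stored in any model field.
func stripMACSuffix(model string) string {
	return strings.TrimSpace(nicMACInModelRE.ReplaceAllString(model, ""))
}

func main() {
	fmt.Println(stripMACSuffix("NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"))
	// → NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF
}
```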
### PCI device name enrichment via pci.ids
If a PCIe device, GPU, NIC, or any hardware component has a `vendor_id` + `device_id`
but its model/name field is **empty or generic** (e.g. blank, equals the description,
or is just a raw hex ID), the parser **must** attempt to resolve the human-readable
model name from the embedded `pci.ids` database before storing the result.
**Rule:** When `Model` (or equivalent name field) is empty and both `VendorID` and
`DeviceID` are non-zero, call the pciids lookup and use the result as the model name.
```go
// Example pattern — use in any parser that handles PCIe/GPU/NIC devices:
if strings.TrimSpace(device.Model) == "" && device.VendorID != 0 && device.DeviceID != 0 {
	if name := pciids.Lookup(device.VendorID, device.DeviceID); name != "" {
		device.Model = name
	}
}
```
This rule applies to all vendor parsers. The pciids package is available at
`internal/parser/vendors/pciids`. See ADL-005 for the rationale.
**Do not hardcode model name strings.** If a device is unknown today, it will be
resolved automatically once `pci.ids` is updated.
---
## Vendor parsers
### Inspur / Kaytus (`inspur`)
**Status:** Ready. Tested on KR4268X2 (onekeylog format).
**Archive format:** `.tar.gz` onekeylog
**Primary source files:**
| File | Content |
|------|---------|
| `asset.json` | Base hardware inventory |
| `component.log` | Component list |
| `devicefrusdr.log` | FRU and SDR data |
| `onekeylog/runningdata/redis-dump.rdb` | Runtime enrichment (optional) |
**Redis RDB enrichment** (applied conservatively — fills missing fields only):
- GPU: `serial_number`, `firmware` (VBIOS/FW), runtime telemetry
- NIC: firmware, serial, part number (when text logs leave fields empty)
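The "fills missing fields only" rule can be sketched as a tiny helper; `fillIfEmpty` is a hypothetical name, not the actual function in the Inspur parser:

```go
package main

import "fmt"

// fillIfEmpty applies the conservative enrichment rule: a Redis-derived value
// only fills a field the text logs left empty, and never overwrites it.
func fillIfEmpty(dst *string, src string) {
	if *dst == "" && src != "" {
		*dst = src
	}
}

func main() {
	firmware := ""           // missing in text logs
	serial := "SN-FROM-LOGS" // already known from FRU data
	fillIfEmpty(&firmware, "92.00.19.00.01") // filled from Redis
	fillIfEmpty(&serial, "SN-FROM-REDIS")    // ignored: logs win
	fmt.Println(firmware, serial)
}
```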
**Module structure:**
```
inspur/
  parser.go  — main parser + registration
  sdr.go     — sensor/SDR parsing
  fru.go     — FRU serial parsing
  asset.go   — asset.json parsing
  syslog.go  — syslog parsing
```
---
### Dell TSR (`dell`)
**Status:** Ready (v3.0). Tested on nested TSR archives with embedded `*.pl.zip`.
**Archive format:** `.zip` (outer archive + nested `*.pl.zip`)
**Primary source files:**
- `tsr/metadata.json`
- `tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml`
- `tsr/hardware/sysinfo/inventory/sysinfo_DCIM_SoftwareIdentity.xml`
- `tsr/hardware/sysinfo/inventory/sysinfo_CIM_Sensor.xml`
- `tsr/hardware/sysinfo/lcfiles/curr_lclog.xml`
**Extracted data:**
- Board/system identity and BIOS/iDRAC firmware
- CPU, memory, physical disks, virtual disks, PSU, NIC, PCIe
- GPU inventory (`DCIM_VideoView`) + GPU sensor enrichment (`DCIM_GPUSensor`)
- Controller/backplane inventory (`DCIM_ControllerView`, `DCIM_EnclosureView`)
- Sensor readings (temperature/voltage/current/power/fan/utilization)
- Lifecycle events (`curr_lclog.xml`)
---
### NVIDIA HGX Field Diagnostics (`nvidia`)
**Status:** Ready (v1.1.0). Works with any server vendor.
**Archive format:** `.tar` / `.tar.gz`
**Confidence scoring:**
| File | Score |
|------|-------|
| `unified_summary.json` with "HGX Field Diag" marker | +40 |
| `summary.json` | +20 |
| `summary.csv` | +15 |
| `gpu_fieldiag/` directory | +15 |
**Source files:**
| File | Content |
|------|---------|
| `output.log` | dmidecode — server manufacturer, model, serial number |
| `unified_summary.json` | GPU details, NVSwitch devices, PCI addresses |
| `summary.json` | Diagnostic test results and error codes |
| `summary.csv` | Alternative test results format |
**Extracted data:**
- GPUs: slot, model, manufacturer, firmware (VBIOS), BDF
- NVSwitch devices: slot, device_class, vendor_id, device_id, BDF, link speed/width
- Events: diagnostic test failures (connectivity, gpumem, gpustress, pcie, nvlink, nvswitch, power)
**Severity mapping:**
- `info` — tests passed
- `warning` — e.g. "Row remapping failed"
- `critical` — error codes 300+
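A minimal sketch of that mapping follows. The "codes 300+ → critical" threshold is from this doc; the exact warning-level triggers in the real parser may differ:

```go
package main

import "fmt"

// severityForDiag maps a diagnostic result to an event severity per the
// mapping above (illustrative sketch, not the parser's exact logic).
func severityForDiag(errCode int, testPassed bool) string {
	switch {
	case errCode >= 300:
		return "critical"
	case !testPassed:
		return "warning" // e.g. "Row remapping failed"
	default:
		return "info"
	}
}

func main() {
	fmt.Println(severityForDiag(0, true))    // info
	fmt.Println(severityForDiag(42, false))  // warning
	fmt.Println(severityForDiag(305, false)) // critical
}
```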
**Known limitations:**
- Detailed logs in `gpu_fieldiag/*.log` are not parsed.
- No CPU, memory, or storage extraction (not present in field diag archives).
---
### NVIDIA Bug Report (`nvidia_bug_report`)
**Status:** Ready (v1.0.0).
**File format:** `nvidia-bug-report-*.log.gz` (gzip-compressed text)
**Confidence:** 85 (high priority for matching filename pattern)
**Source sections parsed:**
| dmidecode section | Extracts |
|-------------------|---------|
| System Information | server serial, UUID, manufacturer, product name |
| Processor Information | CPU model, serial, core/thread count, frequency |
| Memory Device | DIMM slot, size, type, manufacturer, serial, part number, speed |
| System Power Supply | PSU location, manufacturer, model, serial, wattage, firmware, status |
| Other source | Extracts |
|--------------|---------|
| `lspci -vvv` (Ethernet/Network/IB) | NIC model (from VPD), BDF, slot, P/N, S/N, port count, port type |
| `/proc/driver/nvidia/gpus/*/information` | GPU model, BDF, UUID, VBIOS version, IRQ |
| NVRM version line | NVIDIA driver version |
**Known limitations:**
- Driver error/warning log lines not yet extracted.
- GPU temperature/utilization metrics require additional parsing sections.
---
### XigmaNAS (`xigmanas`)
**Status:** Ready.
**Archive format:** Plain log files (FreeBSD-based NAS system)
**Detection:** Files named `xigmanas`, `system`, or `dmesg`; content containing "XigmaNAS" or "FreeBSD"; SMART data presence.
**Extracted data:**
- System: firmware version, uptime, CPU model, memory configuration, hardware platform
- Storage: disk models, serial numbers, capacity, health, SMART temperatures
- Populates: `Hardware.Firmware`, `Hardware.CPUs`, `Hardware.Memory`, `Hardware.Storage`, `Sensors`
---
### Unraid (`unraid`)
**Status:** Ready (v1.0.0).
**Archive format:** Unraid diagnostics archive contents (text-heavy diagnostics directories).
**Parser guidelines:**
- Be conservative with high detect scores
- Prefer filling missing fields over overwriting stronger source data
- Keep parser version constants current when behavior changes
- Any new vendor-specific filtering or dedup logic must ship with tests for that vendor format
**Detection:** Content markers (e.g. `Unraid kernel build`, parity data markers).
---
### HPE iLO AHS (`hpe_ilo_ahs`)
**Status:** Ready (v1.0.0). Tested on HPE ProLiant Gen11 `.ahs` export from iLO 6.
**Archive format:** `.ahs` single-file Active Health System export.
**Detection:** Single-file input with `ABJR` container header and HPE AHS member names
such as `CUST_INFO.DAT`, `*.zbb`, `ilo_boot_support.zbb`.
**Extracted data (current):**
- System board identity (manufacturer, model, serial, part number)
- iLO / System ROM / SPS top-level firmware
- CPU inventory (model-level)
- Memory DIMM inventory for populated slots
- PSU inventory
- PCIe / OCP NIC inventory from SMBIOS-style slot records
- Storage controller and physical drives from embedded Redfish JSON inside `zbb` members
- Basic iLO event log entries with timestamps when present
**Implementation note:** The format is proprietary. Parser support is intentionally hybrid:
container parsing (`ABJR` + gzip) plus structured extraction from embedded Redfish objects and
printable SMBIOS/FRU payloads. This is sufficient for inventory-grade parsing without decoding the
entire internal `zbb` schema.
---
### xFusion iBMC Dump / File Export (`xfusion`)
**Status:** Ready (v1.1.0). Tested on xFusion G5500 V7 `tar.gz` exports.
**Archive format:** `tar.gz` dump exported from the iBMC UI, including `AppDump/`, `RTOSDump/`,
and `LogDump/` trees.
**Detection:** `AppDump/FruData/fruinfo.txt`, `AppDump/card_manage/card_info`,
`RTOSDump/versioninfo/app_revision.txt`, and `LogDump/netcard/netcard_info.txt`.
**Extracted data (current):**
- Board / FRU inventory from `fruinfo.txt`
- CPU inventory from `CpuMem/cpu_info`
- Memory DIMM inventory from `CpuMem/mem_info`
- GPU inventory from `card_info`
- OCP NIC inventory by merging `card_info` with `LogDump/netcard/netcard_info.txt`
- PSU inventory from `BMC/psu_info.txt`
- Physical storage from `StorageMgnt/PhysicalDrivesInfo/*/disk_info`
- System firmware entries from `RTOSDump/versioninfo/app_revision.txt`
- Maintenance events from `LogDump/maintenance_log`
---
### Generic text fallback (`generic`)
**Status:** Ready (v1.0.0).
| Vendor | ID | Status | Tested on |
|--------|----|--------|-----------|
| Dell TSR | `dell` | Ready | TSR nested zip archives |
| Reanimator Easy Bee | `easy_bee` | Ready | `bee-support-*.tar.gz` support bundles |
| HPE iLO AHS | `hpe_ilo_ahs` | Ready | iLO 6 `.ahs` exports |
| Inspur / Kaytus | `inspur` | Ready | KR4268X2 onekeylog |
| NVIDIA HGX Field Diag | `nvidia` | Ready | Various HGX servers |
| NVIDIA Bug Report | `nvidia_bug_report` | Ready | H100 systems |
| Unraid | `unraid` | Ready | Unraid diagnostics archives |
| xFusion iBMC dump | `xfusion` | Ready | G5500 V7 file-export `tar.gz` bundles |
| XigmaNAS | `xigmanas` | Ready | FreeBSD NAS logs |
| H3C SDS G5 | `h3c_g5` | Ready | H3C UniServer R4900 G5 SDS archives |
| H3C SDS G6 | `h3c_g6` | Ready | H3C UniServer R4700 G6 SDS archives |

# 07 — Exporters & Reanimator Integration
## Export endpoints summary
| Endpoint | Format | Filename pattern |
|----------|--------|-----------------|
| `GET /api/export/csv` | CSV — serial numbers | `YYYY-MM-DD (MODEL) - SN.csv` |
| `GET /api/export/json` | **Raw export package** (JSON or ZIP bundle) for reopen/re-analysis | `YYYY-MM-DD (MODEL) - SN.(json|zip)` |
| `GET /api/export/reanimator` | Reanimator hardware JSON | `YYYY-MM-DD (MODEL) - SN.json` |
---
## Raw Export (`Export Raw Data`)
### Purpose
Preserve enough source data to reproduce parsing later after parser fixes, without requiring
another live collection from the target system.
### Format
`/api/export/json` returns a **raw export package**:
- JSON package (machine-readable), or
- ZIP bundle containing:
- `raw_export.json` — machine-readable package
- `collect.log` — human-readable collection + parsing summary
- `parser_fields.json` — structured parsed field snapshot for diffs between parser versions
### Import / reopen behavior
When a raw export package is uploaded back into LOGPile:
- the app **re-analyzes from raw source**
- it does **not** trust embedded parsed output as source of truth
For Redfish, this means replay from `raw_payloads.redfish_tree`.
### Design rule
Raw export is a **re-analysis artifact**, not a final report dump. Keep it self-contained and
forward-compatible where possible (versioned package format, additive fields only).
---
## Reanimator Export
### Purpose
Exports hardware inventory data in the format expected by the Reanimator asset tracking
system. Enables one-click push from LOGPile to an external asset management platform.
### Implementation files
| File | Role |
|------|------|
| `internal/exporter/reanimator_models.go` | Go structs for Reanimator JSON |
| `internal/exporter/reanimator_converter.go` | `ConvertToReanimator()` and helpers |
| `internal/server/handlers.go` | `handleExportReanimator()` HTTP handler |
### Conversion rules
- Source: canonical `hardware.devices` repository (see [`04-data-models.md`](04-data-models.md))
- CPU manufacturer inferred from model string (Intel / AMD / ARM / Ampere)
- PCIe serial number generated when absent: `{board_serial}-PCIE-{slot}`
- Status values normalized to: `OK`, `Warning`, `Critical`, `Unknown` (`Empty` only for memory slots)
- Timestamps in RFC3339 format
- `target_host` derived from `filename` field (`redfish://…`, `ipmi://…`) if not in source; omitted if undeterminable
- `board.manufacturer` and `board.product_name` values of `"NULL"` treated as absent
### LOGPile → Reanimator field mapping
| LOGPile type | Reanimator section | Notes |
|---|---|---|
| `BoardInfo` | `board` | Direct mapping |
| `CPU` | `cpus` | + manufacturer (inferred) |
| `MemoryDIMM` | `memory` | Direct; empty slots included (`present=false`) |
| `Storage` | `storage` | Excluded if no `serial_number` |
| `PCIeDevice` | `pcie_devices` | Serial generated if missing |
| `GPU` | `pcie_devices` | `device_class=DisplayController` |
| `NetworkAdapter` | `pcie_devices` | `device_class=NetworkController` |
| `PSU` | `power_supplies` | Excluded if no serial or `present=false` |
| `FirmwareInfo` | `firmware` | Direct mapping |
### Inclusion / exclusion rules
**Included:**
- Memory slots with `present=false` (as Empty slots)
- PCIe devices without serial number (serial is generated)
**Excluded:**
- Storage without `serial_number`
- PSU without `serial_number` or with `present=false`
- NetworkAdapters with `present=false`
---
## Reanimator Integration Guide
This section documents the Reanimator receiver-side JSON format (what the Reanimator
system expects when it ingests a LOGPile export).
> **Important:** The Reanimator endpoint uses a strict JSON decoder (`DisallowUnknownFields`).
> Any unknown field — including nested ones — causes `400 Bad Request`.
> Use only `snake_case` keys listed here.
### Top-level structure
```json
{
"filename": "redfish://10.10.10.103",
"source_type": "api",
"protocol": "redfish",
"target_host": "10.10.10.103",
"collected_at": "2026-02-10T15:30:00Z",
"hardware": {
"board": {...},
"firmware": [...],
"cpus": [...],
"memory": [...],
"storage": [...],
"pcie_devices": [...],
"power_supplies": [...]
}
}
```
**Required:** `collected_at`, `hardware.board.serial_number`
**Optional:** `target_host`, `source_type`, `protocol`, `filename`
`source_type` values: `api`, `logfile`, `manual`
`protocol` values: `redfish`, `ipmi`, `snmp`, `ssh`
### Component status fields (all component sections)
Each component may carry:
| Field | Type | Description |
|-------|------|-------------|
| `status` | string | `OK`, `Warning`, `Critical`, `Unknown`, `Empty` |
| `status_checked_at` | RFC3339 | When status was last verified |
| `status_changed_at` | RFC3339 | When status last changed |
| `status_at_collection` | object | `{ "status": "...", "at": "..." }` — snapshot-time status |
| `status_history` | array | `[{ "status": "...", "changed_at": "...", "details": "..." }]` |
| `error_description` | string | Human-readable error for Warning/Critical |
### Board
```json
{
"board": {
"manufacturer": "Supermicro",
"product_name": "X12DPG-QT6",
"serial_number": "21D634101",
"part_number": "X12DPG-QT6-REV1.01",
"uuid": "d7ef2fe5-2fd0-11f0-910a-346f11040868"
}
}
```
`serial_number` required. `manufacturer` / `product_name` of `"NULL"` treated as absent.
### CPUs
```json
{
"socket": 0,
"model": "INTEL(R) XEON(R) GOLD 6530",
"cores": 32,
"threads": 64,
"frequency_mhz": 2100,
"max_frequency_mhz": 4000,
"manufacturer": "Intel",
"status": "OK"
}
```
`socket` (int) and `model` required. Serial generated: `{board_serial}-CPU-{socket}`.
LOT format: `CPU_{VENDOR}_{MODEL_NORMALIZED}` → e.g. `CPU_INTEL_XEON_GOLD_6530`
### Memory
```json
{
"slot": "CPU0_C0D0",
"location": "CPU0_C0D0",
"present": true,
"size_mb": 32768,
"type": "DDR5",
"max_speed_mhz": 4800,
"current_speed_mhz": 4800,
"manufacturer": "Hynix",
"serial_number": "80AD032419E17CEEC1",
"part_number": "HMCG88AGBRA191N",
"status": "OK"
}
```
`slot` and `present` required. `serial_number` required when `present=true`.
Empty slots (`present=false`, `status="Empty"`) are included but no component created.
LOT format: `DIMM_{TYPE}_{SIZE_GB}GB` → e.g. `DIMM_DDR5_32GB`
### Storage
```json
{
"slot": "OB01",
"type": "NVMe",
"model": "INTEL SSDPF2KX076T1",
"size_gb": 7680,
"serial_number": "BTAX41900GF87P6DGN",
"manufacturer": "Intel",
"firmware": "9CV10510",
"interface": "NVMe",
"present": true,
"status": "OK"
}
```
`slot`, `model`, `serial_number`, `present` required.
LOT format: `{TYPE}_{INTERFACE}_{SIZE_TB}TB` → e.g. `SSD_NVME_07.68TB`
### Power Supplies
```json
{
"slot": "0",
"present": true,
"model": "GW-CRPS3000LW",
"vendor": "Great Wall",
"wattage_w": 3000,
"serial_number": "2P06C102610",
"part_number": "V0310C9000000000",
"firmware": "00.03.05",
"status": "OK",
"input_power_w": 137,
"output_power_w": 104,
"input_voltage": 215.25
}
```
`slot`, `present` required. `serial_number` required when `present=true`.
Telemetry fields (`input_power_w`, `output_power_w`, `input_voltage`) stored in observation only.
LOT format: `PSU_{WATTAGE}W_{VENDOR_NORMALIZED}` → e.g. `PSU_3000W_GREAT_WALL`
### PCIe Devices
```json
{
"slot": "PCIeCard1",
"vendor_id": 32902,
"device_id": 2912,
"bdf": "0000:18:00.0",
"device_class": "MassStorageController",
"manufacturer": "Intel",
"model": "RAID Controller RSP3DD080F",
"link_width": 8,
"link_speed": "Gen3",
"max_link_width": 8,
"max_link_speed": "Gen3",
"serial_number": "RAID-001-12345",
"firmware": "50.9.1-4296",
"status": "OK"
}
```
`slot` required. Serial generated if absent: `{board_serial}-PCIE-{slot}`.
`device_class` values: `NetworkController`, `MassStorageController`, `DisplayController`, etc.
LOT format: `PCIE_{DEVICE_CLASS}_{MODEL_NORMALIZED}` → e.g. `PCIE_NETWORK_CONNECTX5`
### Firmware
```json
[
{ "device_name": "BIOS", "version": "06.08.05" },
{ "device_name": "BMC", "version": "5.17.00" }
]
```
Both fields required. Changes trigger `FIRMWARE_CHANGED` timeline events.
---
### Import process (Reanimator side)
1. Validate `collected_at` (RFC3339) and `hardware.board.serial_number`.
2. Find or create Asset by `board.serial_number` → `vendor_serial`.
3. For each component: filter `present=false`, auto-determine LOT, find or create Component,
create Observation, update Installations.
4. Detect removed components (present in previous snapshot, absent in current) → close Installation.
5. Generate timeline events: `LOG_COLLECTED`, `INSTALLED`, `REMOVED`, `FIRMWARE_CHANGED`.
**Idempotency:** Repeated import of the same snapshot (same content hash) returns `200 OK`
with `"duplicate": true` and does not create duplicate records.
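The content-hash dedup step can be sketched with a plain SHA-256 over the payload bytes (a sketch of the idea; the receiver's actual hashing scheme is not specified here):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// bundleHash derives a content hash for a snapshot payload: identical bytes
// yield identical hashes, so a repeated import is detectable before any writes.
func bundleHash(payload []byte) string {
	sum := sha256.Sum256(payload)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := bundleHash([]byte(`{"collected_at":"2026-02-10T15:30:00Z"}`))
	b := bundleHash([]byte(`{"collected_at":"2026-02-10T15:30:00Z"}`))
	fmt.Println(a == b) // same snapshot → duplicate import, return 200
}
```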
### Reanimator API endpoint
```http
POST /ingest/hardware
Content-Type: application/json
```
**Success (201):**
```json
{
"status": "success",
"bundle_id": "lb_01J...",
"asset_id": "mach_01J...",
"collected_at": "2026-02-10T15:30:00Z",
"duplicate": false,
"summary": {
"parts_observed": 15,
"parts_created": 2,
"installations_created": 2,
"timeline_events_created": 9
}
}
```
**Duplicate (200):**
```json
{ "status": "success", "duplicate": true, "message": "LogBundle with this content hash already exists" }
```
**Error (400):**
```json
{ "status": "error", "error": "validation_failed", "details": { "field": "...", "message": "..." } }
```
Common `400` causes:
- Unknown JSON field (strict decoder)
- Wrong key name (e.g. `targetHost` instead of `target_host`)
- Invalid `collected_at` format (must be RFC3339)
- Empty `hardware.board.serial_number`
### LOT normalization rules
1. Remove special chars `( ) - ® ™`; replace spaces with `_`
2. Uppercase all
3. Collapse multiple underscores to one
4. Strip common prefixes like `MODEL:`, `PN:`
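Applied literally, the four rules look roughly like this (a hypothetical helper; the receiver-side implementation may differ in detail):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var underscoreRun = regexp.MustCompile(`_+`)

// normalizeLOT sketches the LOT normalization rules above.
func normalizeLOT(s string) string {
	// 4. Strip common prefixes like "MODEL:", "PN:"
	for _, p := range []string{"MODEL:", "PN:"} {
		s = strings.TrimPrefix(strings.TrimSpace(s), p)
	}
	// 1. Remove special chars ( ) - ® ™ ; replace spaces with _
	s = strings.NewReplacer("(", "", ")", "", "-", "", "®", "", "™", "").Replace(s)
	s = strings.ReplaceAll(strings.TrimSpace(s), " ", "_")
	// 2. Uppercase all
	s = strings.ToUpper(s)
	// 3. Collapse multiple underscores to one
	return underscoreRun.ReplaceAllString(s, "_")
}

func main() {
	fmt.Println(normalizeLOT("MODEL: ConnectX-5 (Rev A)")) // CONNECTX5_REV_A
}
```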
### Status values
| Value | Meaning | Action |
|-------|---------|--------|
| `OK` | Normal | — |
| `Warning` | Degraded | Create `COMPONENT_WARNING` event (optional) |
| `Critical` | Failed | Auto-create `failure_event`, create `COMPONENT_FAILED` event |
| `Unknown` | Not determinable | Treat as working |
| `Empty` | Slot unpopulated | No component created (memory/PCIe only) |
### Missing field handling
| Field | Fallback |
|-------|---------|
| CPU serial | Generated: `{board_serial}-CPU-{socket}` |
| PCIe serial | Generated: `{board_serial}-PCIE-{slot}` |
| Other serial | Component skipped if absent |
| manufacturer (PCIe) | Looked up from `vendor_id` (8086→Intel, 10de→NVIDIA, 15b3→Mellanox…) |
| status | Treated as `Unknown` |
| firmware | No `FIRMWARE_CHANGED` event |
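The PCIe manufacturer fallback from the table above can be sketched as a vendor-ID map; the real receiver-side table is larger than the three entries listed here:

```go
package main

import "fmt"

// pciVendorNames sketches the vendor_id → manufacturer fallback.
var pciVendorNames = map[uint16]string{
	0x8086: "Intel",
	0x10de: "NVIDIA",
	0x15b3: "Mellanox",
}

// manufacturerFor keeps an explicit manufacturer when present and otherwise
// falls back to the vendor_id lookup.
func manufacturerFor(vendorID uint16, manufacturer string) string {
	if manufacturer != "" {
		return manufacturer
	}
	return pciVendorNames[vendorID]
}

func main() {
	fmt.Println(manufacturerFor(32902, "")) // 32902 == 0x8086 → Intel
}
```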
# 07 — Exporters
## Export surfaces
| Endpoint | Output | Purpose |
|----------|--------|---------|
| `GET /api/export/csv` | CSV | Serial-number export |
| `GET /api/export/json` | raw-export ZIP bundle | Reopen and re-analyze later |
| `GET /api/export/reanimator` | JSON | Reanimator hardware payload |
| `POST /api/convert` | async ZIP artifact | Batch archive-to-Reanimator conversion |
## Raw export
Raw export is not a final report dump.
It is a replayable artifact that preserves enough source data for future parser improvements.
Current bundle contents:
- `raw_export.json`
- `collect.log`
- `parser_fields.json`
Design rules:
- raw source is authoritative
- uploads of raw export must replay from raw source
- parsed snapshots inside the bundle are diagnostic only
## Reanimator export
Implementation files:
- `internal/exporter/reanimator_models.go`
- `internal/exporter/reanimator_converter.go`
- `internal/server/handlers.go`
- `bible-local/docs/hardware-ingest-contract.md`
Conversion rules:
- canonical source is merged canonical inventory derived from `hardware.devices` plus legacy hardware slices
- output must conform to the strict Reanimator ingest contract in `docs/hardware-ingest-contract.md`
- local mirror currently tracks upstream contract `v2.7`
- timestamps are RFC3339
- status is normalized to Reanimator-friendly values
- missing component serial numbers must stay absent; LOGPile must not synthesize fake serials for Reanimator export
- CPU `firmware` field means CPU microcode, not generic processor firmware inventory
- `NULL`-style board manufacturer/product values are treated as absent
- optional component telemetry/health fields are exported when LOGPile already has the data
- partial `hardware.devices` must not suppress components still present only in legacy parser/collector fields
- `present` is not serialized for exported components; presence is expressed by the existence of the component record itself
- Reanimator ingest may apply its own server-side fallback serial rules for CPU and PCIe when LOGPile leaves serials absent
## Inclusion rules
Included:
- PCIe-class devices when the component itself is present, even if serial number is missing
- contract `v2.7` component telemetry and health fields when source data exists
- hardware sensors grouped into `fans`, `power`, `temperatures`, `other` only when the sensor has a real numeric reading
- sensor `location` is not exported; LOGPile keeps only sensor `name` plus measured values and status
- Redfish linked metric docs that carry component telemetry: `ProcessorMetrics`, `MemoryMetrics`, `DriveMetrics`, `EnvironmentMetrics`, `Metrics`
- `pcie_devices.slot` is treated as the canonical PCIe address; `bdf` is used only as an internal fallback/dedupe key and is not serialized in the payload
- `event_logs` are exported only from normalized parser/collector events that can be mapped to contract sources `host` / `bmc` / `redfish` without synthesizing content
- `manufactured_year_week` is exported only as a reliable passthrough when the parser/collector already extracted a valid `YYYY-Www` value
Excluded:
- storage endpoints from `pcie_devices`; disks and NVMe drives export only through `hardware.storage`
- fake serial numbers for PCIe-class devices; any fallback serial generation belongs to Reanimator ingest, not LOGPile
- sensors without a real numeric reading
- events with internal-only or unmappable sources such as LOGPile internal warnings
- memory with missing serial number
- memory with `present=false` or `status=Empty`
- CPUs with `present=false`
- storage without `serial_number`
- storage with `present=false`
- power supplies without `serial_number`
- power supplies with `present=false`
- non-present network adapters
- non-present PCIe / GPU devices
- device-bound firmware duplicated at top-level firmware list
- any field not present in the strict ingest contract
## Batch convert
`POST /api/convert` accepts multiple supported files and produces a ZIP with:
- one `*.reanimator.json` file per successful input
- `convert-summary.txt`
Behavior:
- unsupported filenames are skipped
- each file is parsed independently
- one bad file must not fail the whole batch if at least one conversion succeeds
- result artifact is temporary and deleted after download
## CSV export
`GET /api/export/csv` uses the same merged canonical inventory as Reanimator export,
with legacy network-card fallback kept only for records that still have no canonical device match.

Defined in `cmd/logpile/main.go`:
| Flag | Default | Purpose |
|------|---------|---------|
| `--port` | `8082` | HTTP server port |
| `--file` | empty | Preload archive file |
| `--version` | `false` | Print version and exit |
| `--no-browser` | `false` | Do not auto-open browser |
| `--hold-on-crash` | `true` on Windows | Keep console open after fatal crash |
## Build
```bash
# Local binary (current OS/arch)
make build
# Output: bin/logpile
# Cross-platform binaries
make build-all
# Output:
# bin/logpile-linux-amd64
# bin/logpile-linux-arm64
# bin/logpile-darwin-amd64
# bin/logpile-darwin-arm64
# bin/logpile-windows-amd64.exe
```
Both `make build` and `make build-all` run `scripts/update-pci-ids.sh --best-effort`
before compilation to sync `pci.ids` from the submodule.
To skip PCI IDs update:
```bash
SKIP_PCI_IDS_UPDATE=1 make build
```
Build flags: `CGO_ENABLED=0` — fully static binary, no C runtime dependency.
## PCI IDs submodule
Source: `third_party/pciids` (git submodule → `github.com/pciutils/pciids`)
Local copy embedded at build time: `internal/parser/vendors/pciids/pci.ids`
```bash
# Manual update of the embedded database
make update-pci-ids

# Init submodule after fresh clone
git submodule update --init third_party/pciids
```
Other useful targets:
```bash
make test
make fmt
```
## Release process
Run:
```bash
./scripts/release.sh
```
What it does:
1. Reads version from `git describe --tags`
2. Refuses a dirty working tree unless `ALLOW_DIRTY=1`
3. Sets a stable `GOPATH` / `GOCACHE` / `GOTOOLCHAIN` environment
4. Creates `releases/{VERSION}/`
5. Creates a release-notes template if missing
6. Builds `darwin-arm64` and `windows-amd64` binaries
7. Packages any already-present binaries from `bin/` as `.tar.gz` / `.zip`
8. Generates `SHA256SUMS.txt`
9. Prints next steps (tag, push, create release manually)
Release notes template is created in `releases/{VERSION}/RELEASE_NOTES.md`.
Important limitation:
- `scripts/release.sh` does not run `make build-all` for you
- if you want Linux or additional macOS archives in the release directory, build them before running the script
Toolchain note:
- `scripts/release.sh` defaults `GOTOOLCHAIN=local` to use the already installed Go toolchain and avoid implicit network downloads during release builds
- if you intentionally want another toolchain, pass it explicitly, for example `GOTOOLCHAIN=go1.24.0 ./scripts/release.sh`
## Run locally
```bash
./bin/logpile
./bin/logpile --port 9090
./bin/logpile --no-browser
./bin/logpile --version
./bin/logpile --hold-on-crash # keep console open on crash (default on Windows)
```
## macOS Gatekeeper
After downloading a binary, remove the quarantine attribute:
```bash
xattr -d com.apple.quarantine /path/to/logpile-darwin-arm64
```

# 09 — Testing
## Baseline
Required before merge:
```bash
go test ./...
```
All tests must pass before any change is merged.
## Where to add tests
| Area | Location |
|------|----------|
| Collectors and replay | `internal/collector/*_test.go` |
| HTTP handlers and jobs | `internal/server/*_test.go` |
| Exporters | `internal/exporter/*_test.go` |
| Vendor parsers | `internal/parser/vendors/<vendor>/*_test.go` |
## General rules
- Prefer table-driven tests
- No network access in unit tests
- Cover happy path and realistic failure/partial-data cases
- New vendor parsers need both detection and parse coverage
## Exporter tests
The Reanimator exporter has comprehensive coverage:
| Test file | Coverage |
|-----------|----------|
| `reanimator_converter_test.go` | Unit tests per conversion function |
| `reanimator_integration_test.go` | Full export with realistic `AnalysisResult` |
## Mandatory coverage for dedup/filter/classify logic
Any new deduplication, filtering, or classification function must have:
1. A true-positive case
2. A true-negative case
3. A regression case for the vendor or topology that motivated the change
This is mandatory for inventory logic, firmware filtering, and similar code paths where silent data drift is likely.
## Mandatory coverage for expensive path selection
Any function that decides whether to crawl or probe an expensive path must have:
1. A positive selection case
2. A negative exclusion case
3. A topology-level count/integration case
The goal is to catch runaway I/O regressions before they ship.
## Useful focused commands
```bash
go test ./internal/exporter/...
go test ./internal/exporter/... -v -run Reanimator
go test ./internal/exporter/... -cover
go test ./internal/collector/...
go test ./internal/server/...
go test ./internal/parser/vendors/...
```
## Guidelines
- Prefer table-driven tests for parsing logic (multiple input variants).
- Do not rely on network access in unit tests.
- Test both the happy path and edge cases (missing fields, empty collections).
- When adding a new vendor parser, include at minimum:
- `Detect()` test with a positive and a negative sample file list.
- `Parse()` test with a minimal but representative archive.
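The required minimum can be sketched in table-driven form. `detectScore` is a hypothetical stand-in for a vendor `Detect()`; shown as a plain `main` for brevity, the same case table moves directly into a `func TestDetect(t *testing.T)` in the real test file:

```go
package main

import "fmt"

// detectScore stands in for a vendor Detect(): it scores a file list.
func detectScore(files []string) int {
	for _, f := range files {
		if f == "unified_summary.json" {
			return 40
		}
	}
	return 0
}

func main() {
	cases := []struct {
		name  string
		files []string
		want  int
	}{
		{"positive sample", []string{"output.log", "unified_summary.json"}, 40},
		{"negative sample", []string{"random.log"}, 0},
	}
	for _, tc := range cases {
		got := detectScore(tc.files)
		fmt.Printf("%s: got=%d want=%d\n", tc.name, got, tc.want)
	}
}
```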

---
## ADL-018 — NVMe bay probe must be restricted to storage-capable chassis types
**Date:** 2026-03-12
**Context:** `shouldAdaptiveNVMeProbe` was introduced in `2fa4a12` to recover NVMe drives on
Supermicro BMCs that expose empty `Drives` collections but serve disks at direct `Disk.Bay.N`
paths. The function returns `true` for any chassis with an empty `Members` array. On
Supermicro HGX systems (SYS-A21GE-NBRT and similar) ~35 sub-chassis (GPU, NVSwitch,
PCIeRetimer, ERoT, IRoT, BMC, FPGA) all carry `ChassisType=Module/Component/Zone` and
expose empty `/Drives` collections. Without filtering, each triggered 384 HTTP requests →
13 440 requests ≈ 22 minutes of pure I/O waste per collection.
**Decision:** Before probing `Disk.Bay.N` candidates for a chassis, check its `ChassisType`
via `chassisTypeCanHaveNVMe`. Skip if type is `Module`, `Component`, or `Zone`. Keep probing
for `Enclosure`, `RackMount`, and any unrecognised type (fail-safe).
**Consequences:**
- On HGX systems post-probe NVMe goes from ~22 min to effectively zero.
- NVMe backplane recovery (`Enclosure` type) is unaffected.
- Any new chassis type that hosts NVMe storage is covered by the default `true` path.
- `chassisTypeCanHaveNVMe` and the candidate-selection loop must have unit tests covering
both the excluded types and the storage-capable types (see `TestChassisTypeCanHaveNVMe`
and `TestNVMePostProbeSkipsNonStorageChassis`).
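A minimal sketch of the `chassisTypeCanHaveNVMe` gate described above (the signature is illustrative; the production function may differ):

```go
package main

import "fmt"

// chassisTypeCanHaveNVMe sketches the ADL-018 rule: skip known non-storage
// sub-chassis types, keep probing for storage-capable and unrecognised types
// (fail-safe default).
func chassisTypeCanHaveNVMe(chassisType string) bool {
	switch chassisType {
	case "Module", "Component", "Zone":
		return false // HGX GPU/NVSwitch/ERoT sub-chassis never host Disk.Bay.N
	default:
		return true // Enclosure, RackMount, and any new type stay probe-eligible
	}
}

func main() {
	for _, t := range []string{"Module", "Enclosure", "SomethingNew"} {
		fmt.Println(t, chassisTypeCanHaveNVMe(t))
	}
}
```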
## ADL-019 — Redfish post-probe recovery is profile-owned acquisition policy
**Date:** 2026-03-18
**Context:** Numeric collection post-probe and direct NVMe `Disk.Bay` recovery were still
controlled by collector-core heuristics, which kept platform-specific acquisition behavior in
`redfish.go` and made vendor/topology refactoring incomplete.
**Decision:** Move expensive Redfish post-probe enablement into profile-owned acquisition policy.
The collector core may execute bounded post-probe loops, but profiles must explicitly enable:
- numeric collection post-probe
- direct NVMe `Disk.Bay` recovery
- sensor collection post-probe
**Consequences:**
- Generic collector flow no longer implicitly turns on storage/NVMe recovery for every platform.
- Supermicro-specific direct NVMe recovery and generic numeric collection recovery are now
regression-tested through profile fixtures.
- Future platform storage/post-probe behavior must be added through profile tuning, not new
vendor-shaped `if` branches in collector core.
## ADL-020 — Redfish critical plan-B activation is profile-owned recovery policy
**Date:** 2026-03-18
**Context:** `critical plan-B` and `profile plan-B` were still effectively always-on collector
behavior once paths were present, including critical collection member retry and slow numeric
child probing. That kept acquisition recovery semantics in `redfish.go` instead of the profile
layer.
**Decision:** Move plan-B activation into profile-owned recovery policy. Profiles must explicitly
enable:
- critical collection member retry
- slow numeric probing during critical plan-B
- profile-specific plan-B pass
**Consequences:**
- Recovery behavior is now observable in raw Redfish diagnostics alongside other tuning.
- Generic/fallback recovery remains available through profile policy instead of implicit collector
defaults.
- Future platform-specific plan-B behavior must be introduced through profile tuning and tests,
not through new unconditional collector branches.
## ADL-021 — Extra discovered-path storage seeds must be profile-scoped, not core-baseline
**Date:** 2026-03-18
**Context:** The collector core baseline seed list still contained storage-specific discovered-path
suffixes such as `SimpleStorage` and `Storage/IntelVROC/*`. These are useful on some platforms,
but they are acquisition extensions layered on top of discovered `Systems/*` resources, not part
of the minimal vendor-neutral Redfish baseline.
**Decision:** Move such discovered-path expansions into profile-owned scoped path policy. The
collector core keeps the vendor-neutral baseline; profiles may add extra system/chassis/manager
suffixes that are expanded over discovered members during acquisition planning.
**Consequences:**
- Platform-shaped storage discovery no longer lives in `redfish.go` baseline seed construction.
- Extra discovered-path branches are visible in plan diagnostics and fixture regression tests.
- Future model/vendor storage path expansions must be added through scoped profile policy instead
of editing the shared baseline seed list.
## ADL-022 — Adaptive prefetch eligibility is profile-owned policy
**Date:** 2026-03-18
**Context:** The adaptive prefetch executor was still driven by hardcoded include/exclude path
rules in `redfish.go`. That made GPU/storage/network prefetch shaping part of collector-core
knowledge rather than profile-owned acquisition policy.
**Decision:** Move prefetch eligibility rules into profile tuning. The collector core still runs
adaptive prefetch, but profiles provide:
- `IncludeSuffixes` for critical paths eligible for prefetch
- `ExcludeContains` for path shapes that must never be prefetched
**Consequences:**
- Prefetch behavior is now visible in raw Redfish diagnostics and test fixtures.
- Platform- or topology-specific prefetch shaping no longer requires editing collector-core
string lists.
- Future prefetch tuning must be introduced through profiles and regression tests.
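The profile-owned eligibility rules could look like the following sketch; `PrefetchPolicy` and `Eligible` are illustrative names, assuming exclude rules are checked before include suffixes:

```go
package main

import (
	"fmt"
	"strings"
)

// PrefetchPolicy mirrors the profile tuning described above
// (field names are illustrative, not the exact production types).
type PrefetchPolicy struct {
	IncludeSuffixes []string // critical-path suffixes eligible for prefetch
	ExcludeContains []string // path shapes that must never be prefetched
}

// Eligible applies exclude rules first, then requires an include match.
func (p PrefetchPolicy) Eligible(path string) bool {
	for _, frag := range p.ExcludeContains {
		if strings.Contains(path, frag) {
			return false
		}
	}
	for _, suf := range p.IncludeSuffixes {
		if strings.HasSuffix(path, suf) {
			return true
		}
	}
	return false
}

func main() {
	policy := PrefetchPolicy{
		IncludeSuffixes: []string{"/Processors", "/Memory"},
		ExcludeContains: []string{"/LogServices/"},
	}
	fmt.Println(policy.Eligible("/redfish/v1/Systems/Self/Processors"))
	fmt.Println(policy.Eligible("/redfish/v1/Managers/1/LogServices/Log/Processors"))
}
```

Exclusion winning over inclusion keeps the "must never be prefetched" guarantee even when a path matches both lists.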
## ADL-023 — Core critical baseline is roots-only; critical shaping is profile-owned
**Date:** 2026-03-18
**Context:** `redfishCriticalEndpoints(...)` still encoded a broad set of system/chassis/manager
critical branches directly in collector core. This mixed minimal crawl invariants with profile-
specific acquisition shaping.
**Decision:** Reduce collector-core critical baseline to vendor-neutral roots only:
- `/redfish/v1`
- discovered `Systems/*`
- discovered `Chassis/*`
- discovered `Managers/*`
Profiles now own additional critical shaping through:
- scoped critical suffix policy for discovered resources
- explicit top-level `CriticalPaths`
**Consequences:**
- Critical inventory breadth is now explained by the acquisition plan, not hidden in collector
helper defaults.
- Generic profile still provides the previous broad critical coverage, so behavior stays stable.
- Future critical-path tuning must be implemented in profiles and regression-tested there.
## ADL-024 — Live Redfish execution plans are resolved inside redfishprofile
**Date:** 2026-03-18
**Context:** Even after moving seeds, scoped paths, critical shaping, recovery, and prefetch
policy into profiles, `redfish.go` still manually merged discovered resources with those policy
fragments. That left acquisition-plan resolution logic in collector core.
**Decision:** Introduce `redfishprofile.ResolveAcquisitionPlan(...)` as the boundary between
profile planning and collector execution. `redfishprofile` now resolves:
- baseline seeds
- baseline critical roots
- scoped path expansions
- explicit profile seed/critical/plan-B paths
The collector core consumes the resolved plan and executes it.
**Consequences:**
- Acquisition planning logic is now testable in `redfishprofile` without going through the live
collector.
- `redfish.go` no longer owns path-resolution helpers for seeds/critical planning.
- This creates a clean next step toward true per-profile acquisition hooks beyond static policy
fragments.
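A rough sketch of the plan-resolution boundary, assuming a plan holds seed and critical path lists (all type and function names here are illustrative, not the real `redfishprofile` API):

```go
package main

import "fmt"

// AcquisitionPlan is an illustrative shape for the resolved plan.
type AcquisitionPlan struct {
	Seeds    []string
	Critical []string
}

// resolvePlan sketches the boundary: vendor-neutral baseline roots plus
// discovered members, then profile-provided extras, merged in one place
// instead of inside redfish.go.
func resolvePlan(discoveredSystems, profileSeeds, profileCritical []string) AcquisitionPlan {
	plan := AcquisitionPlan{
		Seeds:    []string{"/redfish/v1"},
		Critical: []string{"/redfish/v1"},
	}
	for _, sys := range discoveredSystems {
		plan.Seeds = append(plan.Seeds, sys)
		plan.Critical = append(plan.Critical, sys)
	}
	plan.Seeds = append(plan.Seeds, profileSeeds...)
	plan.Critical = append(plan.Critical, profileCritical...)
	return plan
}

func main() {
	p := resolvePlan(
		[]string{"/redfish/v1/Systems/Self"},
		[]string{"/redfish/v1/Chassis/GPU1"},
		nil,
	)
	fmt.Println(len(p.Seeds), len(p.Critical))
}
```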
## ADL-025 — Post-discovery acquisition refinement belongs to profile hooks
**Date:** 2026-03-18
**Context:** Some acquisition behavior depends not only on vendor/model hints, but on what the
lightweight Redfish discovery actually returned. Static absolute path lists in profile plans are
too rigid for such cases and reintroduce guessed platform knowledge.
**Decision:** Add a post-discovery acquisition refinement hook to Redfish profiles. Profiles may
mutate the resolved execution plan after discovered `Systems/*`, `Chassis/*`, and `Managers/*`
are known.
First concrete use:
- MSI now derives GPU chassis seeds and `.../Sensors` critical/plan-B paths from discovered
`Chassis/GPU*` resources instead of hardcoded `GPU1..GPU4` absolute paths in the static plan.
Additional uses:
- Supermicro now derives `UpdateService/Oem/Supermicro/FirmwareInventory` critical/plan-B paths
from resource hints instead of carrying that absolute path in the static plan.
- Dell now derives `Managers/iDRAC.Embedded.*` acquisition paths from discovered manager
resources instead of carrying `Managers/iDRAC.Embedded.1` as a static absolute path.
**Consequences:**
- Profile modules can react to actual discovery results without pushing conditional logic back
into `redfish.go`.
- Diagnostics still show the final refined plan because the collector stores the refined plan,
not only the pre-refinement template.
- Future vendor-specific discovery-dependent acquisition behavior should be implemented through
this hook rather than new collector-core branches.
## ADL-026 — Replay analysis uses a resolved profile plan, not ad-hoc directives only
**Date:** 2026-03-18
**Context:** Replay still relied on a flat `AnalysisDirectives` struct assembled centrally,
while vendor-specific conditions often depended on the actual snapshot shape. That made analysis
behavior harder to explain and kept too much vendor logic in generic replay collectors.
**Decision:** Introduce `redfishprofile.ResolveAnalysisPlan(...)` for replay. The resolved
analysis plan contains:
- active match result
- resolved analysis directives
- analysis notes explaining snapshot-aware hook activation
Profiles may refine this plan using the snapshot and discovered resources before replay collectors
run.
First concrete uses:
- MSI enables processor-GPU fallback and MSI chassis lookup only when the snapshot actually
contains GPU processors and `Chassis/GPU*`
- HGX enables processor-GPU alias fallback from actual HGX/GPU_SXM topology signals in the snapshot
- Supermicro enables NVMe backplane and known-controller recovery from actual snapshot paths
**Consequences:**
- Replay behavior is now closer to the acquisition architecture: a resolved profile plan feeds the
executor.
- `redfish_analysis_plan` is stored in raw payload metadata for offline debugging.
- Future analysis-side vendor logic should move into profile refinement hooks instead of growing the
central directive builder.
## ADL-027 — Replay GPU/storage executors consume resolved analysis plans
**Date:** 2026-03-18
**Context:** Even after introducing `ResolveAnalysisPlan(...)`, replay GPU/storage collectors still
accepted a raw `AnalysisDirectives` struct. That preserved an implicit shortcut from the old design
and weakened the plan/executor boundary.
**Decision:** Replay GPU/storage executors now accept `redfishprofile.ResolvedAnalysisPlan`
directly. The executor reads resolved directives from the plan instead of being passed a standalone
directive bundle.
**Consequences:**
- GPU and storage replay execution now follows the same architectural pattern as acquisition:
resolve plan first, execute second.
- Future profile-owned execution helpers can use plan notes or additional resolved fields without
changing the executor API again.
- Remaining replay areas should migrate the same way instead of continuing to accept raw directive
structs.
## ADL-019 — isDeviceBoundFirmwareName must cover vendor-specific naming patterns per vendor
**Date:** 2026-03-12
**Context:** `isDeviceBoundFirmwareName` was written to filter Dell-style device firmware names
(`"GPU SomeDevice"`, `"NIC OnboardLAN"`). When Supermicro Redfish FirmwareInventory was added
(`6c19a58`), no Supermicro-specific patterns were added. Supermicro names a NIC entry
`"NIC1 System Slot0 AOM-DP805-IO"` — a digit follows the type prefix directly, bypassing the
`"nic "` (space-terminated) check. 29 device-bound entries leaked into `hardware.firmware` on
SYS-A21GE-NBRT (HGX B200). Commit `9c5512d` attempted a fix by adding `_fw_gpu_` patterns,
but checked `DeviceName` which contains `"Software Inventory"` (from the Redfish `Name` field),
not the firmware inventory ID. The patterns were dead code from the moment they were committed.
**Decision:**
- `isDeviceBoundFirmwareName` must be extended for each new vendor whose FirmwareInventory
naming convention differs from the existing patterns.
- When adding HGX/Supermicro patterns, check that the pattern matches the field value that
`collectFirmwareInventory` actually stores — trace the data path from Redfish doc to
`FirmwareInfo.DeviceName` before writing the condition.
- `TestIsDeviceBoundFirmwareName` must contain at least one case per vendor format.
**Consequences:**
- New vendors with FirmwareInventory support require a test covering both device-bound names
(must return true) and system-level names (must return false) before the code ships.
- The dead `_fw_gpu_` / `_fw_nvswitch_` / `_inforom_gpu_` patterns were replaced with
correct prefix+digit checks (`"gpu" + digit`, `"nic" + digit`) and explicit string checks
(`"nvmecontroller"`, `"power supply"`, `"software inventory"`).
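The corrected prefix+digit check might look like this sketch; `hasPrefixThenDigit` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// hasPrefixThenDigit reports whether name starts with prefix followed
// immediately by a digit, e.g. "NIC1 ...": the Supermicro shape that the
// original space-terminated "nic " check missed.
func hasPrefixThenDigit(name, prefix string) bool {
	name = strings.ToLower(name)
	if !strings.HasPrefix(name, prefix) || len(name) == len(prefix) {
		return false
	}
	return unicode.IsDigit(rune(name[len(prefix)]))
}

func main() {
	fmt.Println(hasPrefixThenDigit("NIC1 System Slot0 AOM-DP805-IO", "nic")) // digit after prefix
	fmt.Println(hasPrefixThenDigit("NIC OnboardLAN", "nic"))                 // space, not digit
	fmt.Println(hasPrefixThenDigit("BIOS", "nic"))                           // no prefix at all
}
```

The key lesson from the dead `_fw_gpu_` patterns still applies: whatever pattern is written, it must be tested against the field value `collectFirmwareInventory` actually stores.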
## ADL-020 — Dell TSR device-bound firmware filtered via FQDD; InfiniBand routed to NetworkAdapters
**Date:** 2026-03-15
**Context:** Dell TSR `sysinfo_DCIM_SoftwareIdentity.xml` lists firmware for every installed
component. `parseSoftwareIdentityXML` dumped all of these into `hardware.firmware` without
filtering, so device-bound entries such as `"Mellanox Network Adapter"` (FQDD `InfiniBand.Slot.1-1`)
and `"PERC H755 Front"` (FQDD `RAID.SL.3-1`) appeared in the reanimator export alongside system
firmware like BIOS and iDRAC. Confirmed on PowerEdge R6625 (8VS2LG4).
Additionally, `DCIM_InfiniBandView` was not handled in the parser switch, so Mellanox ConnectX-6
appeared only as a PCIe device with `model: "16x or x16"` (from `DataBusWidth` fallback).
`parseControllerView` called `addFirmware` with description `"storage controller"` instead of the
FQDD, so the FQDD-based filter in the exporter could not remove it.
**Decision:**
1. `isDeviceBoundFirmwareFQDD` extended with `"infiniband."` and `"fc."` prefixes; `"raid.backplane."`
broadened to `"raid."` to cover `RAID.SL.*`, `RAID.Integrated.*`, etc.
2. `DCIM_InfiniBandView` routed to `parseNICView` → device appears as `NetworkAdapter` with correct
firmware, MAC address, and VendorID/DeviceID.
3. `"InfiniBand."` added to `pcieFQDDNoisePrefix` to suppress the duplicate `DCIM_PCIDeviceView`
entry (DataBusWidth-only, no useful data).
4. `parseControllerView` now passes `fqdd` as the `addFirmware` description so the FQDD filter
removes the entry in the exporter.
5. `parsePCIeDeviceView` now prioritises `props["description"]` (chip model, e.g. `"MT28908 Family
[ConnectX-6]"`) over `props["devicedescription"]` (location string) for `pcie.Description`.
6. `convertPCIeDevices` model fallback order: `PartNumber → Description → DeviceClass`.
**Consequences:**
- `hardware.firmware` contains only system-level entries; NIC/RAID/storage-controller firmware
lives on the respective device record.
- `TestParseDellInfiniBandView` and `TestIsDeviceBoundFirmwareFQDD` guard the regression.
- Any future Dell TSR device class whose FQDD prefix is not yet in the prefix list may still leak;
extend `isDeviceBoundFirmwareFQDD` and add a test case when encountered.
---
## ADL-021 — pci.ids enrichment: chip model and vendor resolved from PCI IDs when source data is generic or missing
**Date:** 2026-03-15
**Context:**
Dell TSR `DCIM_InfiniBandView.ProductName` reports a generic marketing name ("Mellanox Network
Adapter") instead of the precise chip identifier ("MT28908 Family [ConnectX-6]"). The actual
chip model is available in `pci.ids` by VendorID:DeviceID (15B3:101B). Vendor name may also be
absent when no `VendorName` / `Manufacturer` property is present.
The general rule was established: *if model is not found in source data but PCI IDs are known,
resolve model from `pci.ids`*. This rule applies broadly across all export paths.
**Decision (two-layer enrichment):**
1. **Parser layer (Dell, `parseNICView`):** When `VendorID != 0 && DeviceID != 0`, prefer
`pciids.DeviceName(vendorID, deviceID)` over the product name from logs. This makes the chip
identifier the primary model for NIC/InfiniBand adapters (more specific than marketing name).
Fill `Vendor` from `pciids.VendorName(vendorID)` when the vendor field is otherwise empty.
Same fallback applied in `parsePCIeDeviceView` for empty `Description`.
2. **Exporter layer (`convertPCIeFromDevices`):** General rule — when `d.Model == ""` after all
legacy fallbacks and `VendorID != 0 && DeviceID != 0`, set `model = pciids.DeviceName(...)`.
Also fill empty `manufacturer` from `pciids.VendorName(...)`. This covers all parsers/sources.
**Consequences:**
- Mellanox InfiniBand slot now reports `model: "MT28908 Family [ConnectX-6]"` and
`manufacturer: "Mellanox Technologies"` in the reanimator export.
- For NICs where pci.ids has no entry, the original product name is kept (pci.ids returns "").
- `TestParseDellInfiniBandView` asserts the model and vendor from pci.ids.
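The two-layer fallback order can be sketched as below; `lookupDeviceName` stands in for `pciids.DeviceName` with a one-entry map so the example runs standalone:

```go
package main

import "fmt"

// lookupDeviceName is an assumption used only to make the sketch runnable;
// the real code consults the full pci.ids database.
func lookupDeviceName(vendorID, deviceID uint16) string {
	if vendorID == 0x15B3 && deviceID == 0x101B {
		return "MT28908 Family [ConnectX-6]"
	}
	return ""
}

// resolveModel applies the ADL-021 rule: a known pci.ids name wins over a
// generic marketing name; otherwise the source-provided name is kept.
func resolveModel(sourceModel string, vendorID, deviceID uint16) string {
	if vendorID != 0 && deviceID != 0 {
		if name := lookupDeviceName(vendorID, deviceID); name != "" {
			return name
		}
	}
	return sourceModel
}

func main() {
	fmt.Println(resolveModel("Mellanox Network Adapter", 0x15B3, 0x101B))
	fmt.Println(resolveModel("Some NIC", 0x15B3, 0xFFFF)) // no pci.ids entry: keep source name
}
```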
---
## ADL-022 — CPUAffinity parsed into NUMANode for PCIe, NIC, and controller devices
**Date:** 2026-03-15
**Context:**
Dell TSR DCIM view classes report `CPUAffinity` for NIC, InfiniBand, PCIe, and controller
devices. Values are "1", "2" (NUMA node index), or "Not Applicable" (for devices that bridge
both CPUs or have no CPU affinity). This data is needed for topology-aware diagnostics.
**Decision:**
- Add `NUMANode int` (JSON: `"numa_node,omitempty"`) to `models.PCIeDevice`,
`models.NetworkAdapter`, `models.HardwareDevice`, and `ReanimatorPCIe`.
- Parse from `props["cpuaffinity"]` using `parseIntLoose`: numeric values ("1", "2") map
directly; "Not Applicable" returns 0 (omitted via `omitempty`).
- Thread through `buildDevicesFromLegacy` (PCIe and NIC sections) and `convertPCIeFromDevices`.
- `parseControllerView` also parses CPUAffinity since RAID controllers have NUMA affinity.
**Consequences:**
- `numa_node: 1` or `2` appears in reanimator export for devices with known affinity.
- Value 0 / absent means "not reported" — covers both "Not Applicable" and sources that don't
provide CPUAffinity at all.
- `TestParseDellCPUAffinity` verifies numeric values parsed correctly and "Not Applicable"→0.
---
## ADL-023 — Reanimator export must match ingest contract exactly
**Date:** 2026-03-15
**Context:**
LOGPile's Reanimator export had drifted from the strict ingest contract. It emitted fields that
Reanimator does not currently accept (`status_at_collection`, `numa_node`),
while missing fields and sections now present in the contract (`hardware.sensors`,
`pcie_devices[].mac_addresses`). Memory export rules also diverged from the ingest side: empty or
serial-less DIMMs were still exported.
**Decision:**
- Treat the Reanimator ingest contract as the authoritative schema for `GET /api/export/reanimator`.
- Emit only fields present in the current upstream contract revision.
- Add `hardware.sensors`, `pcie_devices[].mac_addresses`, `pcie_devices[].numa_node`, and
upstream-approved component telemetry/health fields.
- Leave out fields that are still not part of the upstream contract.
- Map internal `source_type=archive` to external `source_type=logfile`.
- Skip memory entries that are empty, not present, or missing serial numbers.
- Generate CPU and PCIe serials only in the forms allowed by the contract.
- Mirror the applied contract in `bible-local/docs/hardware-ingest-contract.md`.
**Consequences:**
- Some previously exported diagnostic fields are intentionally dropped from the Reanimator payload
until the upstream contract adds them.
- Internal models may retain richer fields than the current export schema.
- `hardware.devices` is canonical only after merge with legacy hardware slices; partial parser-owned
canonical records must not hide CPUs, memory, storage, NICs, or PSUs still stored in legacy
fields.
- CSV and Reanimator exports must use the same merged canonical inventory to avoid divergent export
contents across surfaces.
- Future exporter changes must update both the code and the mirrored contract document together.
---
## ADL-024 — Component presence is implicit; Redfish linked metrics are part of replay correctness
**Date:** 2026-03-15
**Context:**
The upstream ingest contract allows `present`, but current export semantics do not need to send
`present=true` for populated components. At the same time, several important Redfish component
telemetry fields were only available through linked metric resources such as `ProcessorMetrics`,
`MemoryMetrics`, and `DriveMetrics`. Without collecting and replaying these linked documents,
live collection and raw snapshot replay still underreported component health fields.
**Decision:**
- Do not serialize `present=true` in Reanimator export. Presence is represented by the presence of
the component record itself.
- Do not export component records marked `present=false`.
- Interpret CPU `firmware` in Reanimator payload as CPU microcode.
- Treat Redfish linked metric resources `ProcessorMetrics`, `MemoryMetrics`, `DriveMetrics`,
`EnvironmentMetrics`, and generic `Metrics` as part of analyzer correctness when they are linked
from component resources.
- Replay logic must merge these linked metric resources back into CPU, memory, storage, PCIe, GPU,
NIC, and PSU component `Details` the same way live collection expects them to be used.
**Consequences:**
- Reanimator payloads are smaller and avoid redundant `present=true` noise while still excluding
empty slots and absent components.
- Any future exporter change that reintroduces serialized component presence needs an explicit
contract review.
- Raw Redfish snapshot completeness now includes linked per-component metric resources, not only
top-level inventory members.
- CPU microcode is no longer expected in top-level `hardware.firmware`; it belongs on the CPU
component record.
<!-- Add new decisions below this line using the format above -->
## ADL-025 — Missing serial numbers must remain absent in Reanimator export
**Date:** 2026-03-15
**Context:**
LOGPile previously generated synthetic serial numbers for components that had no real serial in
source data, especially CPUs and PCIe-class devices. This made the payload look richer, but the
serials were not authoritative and could mislead downstream consumers. Reanimator can already
accept missing serials and generate its own internal fallback identifiers when needed.
**Decision:**
- Do not synthesize fake serial numbers in LOGPile's Reanimator export.
- If a component has no real serial in parsed source data, export the serial field as absent.
- This applies to CPUs, PCIe devices, GPUs, NICs, and any other component class unless an
upstream contract explicitly requires a deterministic exporter-generated identifier.
- Any fallback serial generation defined by the upstream contract is ingest-side Reanimator behavior,
not LOGPile exporter behavior.
**Consequences:**
- Exported payloads carry only source-backed serial numbers.
- Fake identifiers such as `BOARD-...-CPU-...` or synthetic PCIe serials are no longer considered
acceptable exporter behavior.
- Any future attempt to reintroduce generated serials requires an explicit contract review and a
new ADL entry.
---
## ADL-026 — Live Redfish collection uses explicit preflight host-power confirmation
**Date:** 2026-03-15
**Context:**
Live Redfish inventory can be incomplete when the managed host is powered off. At the same time,
LOGPile must not silently power on a host without explicit user choice. The collection workflow
therefore needs a preflight step that verifies connectivity, shows current host power state to the
user, and only powers on the host when the user explicitly chose that path.
**Decision:**
- Add a dedicated live preflight API step before collection starts.
- UI first runs a connectivity and power-state check, then offers:
- collect as-is
- power on and collect
- if the host is off and the user does not answer within 5 seconds, default to collecting without
powering the host on
- Redfish collection may power on the host only when the request explicitly sets
`power_on_if_host_off=true`
- when LOGPile powers on the host for collection, it must try to power the host back off after
collection completes
- if LOGPile did not power the host on itself, it must never power the host off
- all preflight and power-control steps must be logged into the collection log and therefore into
the raw-export bundle
**Consequences:**
- Live collection becomes a two-step UX: probe first, collect second.
- Raw bundles preserve operator-visible evidence of power-state decisions and power-control attempts.
- Power-on failures do not block collection entirely; they only downgrade completeness expectations.
---
## ADL-027 — Sensors without numeric readings are not exported
**Date:** 2026-03-15
**Context:**
Some parsed sensor records carry only a name, unit, or status, but no actual numeric reading. Such
records are not useful as telemetry in Reanimator export and create noisy, low-value sensor lists.
**Decision:**
- Do not export temperature, power, fan, or other sensor records unless they carry a real numeric
measurement value.
- Presence of a sensor name or health/status alone is not sufficient for export.
**Consequences:**
- Exported sensor groups contain only actionable telemetry.
- Parsers and collectors may still keep non-numeric sensor artifacts internally for diagnostics, but
Reanimator export must filter them out.
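A minimal sketch of the export filter, assuming absent readings are modeled as a nil pointer (the real model types may differ):

```go
package main

import "fmt"

// Sensor is an illustrative record shape.
type Sensor struct {
	Name    string
	Reading *float64 // nil when the source provided no numeric measurement
	Status  string
}

// exportable keeps only sensors that carry a real numeric reading;
// a name or health status alone is not sufficient for export.
func exportable(sensors []Sensor) []Sensor {
	out := make([]Sensor, 0, len(sensors))
	for _, s := range sensors {
		if s.Reading != nil {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	v := 42.0
	in := []Sensor{
		{Name: "CPU Temp", Reading: &v, Status: "OK"},
		{Name: "Inlet Temp", Status: "OK"}, // no reading: filtered out
	}
	fmt.Println(len(exportable(in)))
}
```

Modeling the reading as a pointer (rather than a zero value) keeps a legitimate 0 reading distinguishable from "not reported".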
---
## ADL-028 — Reanimator PCIe export excludes storage endpoints and synthetic serials
**Date:** 2026-03-15
**Context:**
Some Redfish and archive sources expose NVMe drives both as storage inventory and as PCIe-visible
endpoints. Exporting such drives in both `hardware.storage` and `hardware.pcie_devices` creates
duplicates without adding useful topology value. At the same time, PCIe-class export still had old
fallback behavior that generated synthetic serial numbers when source serials were absent.
**Decision:**
- Export disks and NVMe drives only through `hardware.storage`.
- Do not export storage endpoints as `hardware.pcie_devices`, even if the source inventory exposes
them as PCIe/NVMe devices.
- Keep real PCIe storage controllers such as RAID and HBA adapters in `hardware.pcie_devices`.
- Do not synthesize PCIe/GPU/NIC serial numbers in LOGPile; missing serials stay absent.
- Treat placeholder names such as `Network Device View` as non-authoritative and prefer resolved
device names when stronger data exists.
**Consequences:**
- Reanimator payloads no longer duplicate NVMe drives between storage and PCIe sections.
- PCIe export remains topology-focused while storage export remains component-focused.
- Missing PCIe-class serials no longer produce fake `BOARD-...-PCIE-...` identifiers.
---
## ADL-029 — Local exporter guidance tracks upstream contract v2.7 terminology
**Date:** 2026-03-15
**Context:**
The upstream Reanimator hardware ingest contract moved to `v2.7` and clarified several points that
matter for LOGPile documentation: ingest-side serial fallback rules, canonical PCIe addressing via
`slot`, the optional `event_logs` section, and the shared `manufactured_year_week` field.
**Decision:**
- Keep the local mirrored contract file as an exact copy of the upstream `v2.7` document.
- Describe CPU/PCIe serial fallback as Reanimator ingest behavior, not LOGPile exporter behavior.
- Treat `pcie_devices.slot` as the canonical address on the LOGPile side as well; `bdf` may remain
an internal fallback/dedupe key but is not serialized in the payload.
- Export `event_logs` only from normalized parser/collector events that can be mapped to contract
sources `host` / `bmc` / `redfish` without synthesizing message content.
- Export `manufactured_year_week` only as a reliable passthrough when a parser/collector already
extracted a valid `YYYY-Www` value.
**Consequences:**
- Local bible wording no longer conflicts with upstream contract terminology.
- Reanimator payloads use contract-native PCIe addressing and no longer expose `bdf` as a parallel
coordinate.
- LOGPile event export remains strictly source-derived; internal warnings such as LOGPile analysis
notes do not leak into Reanimator `event_logs`.
---
## ADL-030 — Audit result rendering is delegated to embedded reanimator/chart
**Date:** 2026-03-16
**Context:**
LOGPile already owns file upload, Redfish collection, archive parsing, normalization, and
Reanimator export. Maintaining a second host-side audit renderer for the same data created
presentation drift and duplicated UI logic.
**Decision:**
- Use vendored `reanimator/chart` as the only audit result viewer.
- Keep LOGPile responsible for service flows: upload, live collection, batch convert, raw export,
Reanimator export, and parse-error reporting.
- Render the current dataset by converting it to Reanimator JSON and passing that snapshot to
embedded `chart` under `/chart/current`.
**Consequences:**
- Reanimator JSON becomes the single presentation contract for the audit surface.
- The host UI becomes a service shell around the viewer instead of maintaining its own
field-by-field tabs.
- `internal/chart` must be updated explicitly as a git submodule when the viewer changes.
---
## ADL-031 — Redfish uses profile-driven acquisition and unified ingest entrypoints
**Date:** 2026-03-17
**Context:**
Redfish collection had accumulated platform-specific probing in the shared collector path, while
upload and raw-export replay still entered analysis through direct handler branches. This made
vendor/model tuning harder to contain and increased regression risk when one topology needed a
special acquisition strategy.
**Decision:**
- Introduce `internal/ingest.Service` as the internal source-family entrypoint for archive parsing
and Redfish raw replay.
- Introduce `internal/collector/redfishprofile/` for Redfish profile matching and modular hooks.
- Split Redfish behavior into coordinated phases:
- acquisition planning during live collection
- analysis hooks during snapshot replay
- Use score-based profile matching. If confidence is low, enter fallback acquisition mode and
aggregate only safe additive profile probes.
- Allow profile modules to provide bounded acquisition tuning hints such as crawl cap, prefetch
behavior, and expensive post-probe toggles.
- Allow profile modules to own model-specific `CriticalPaths` and bounded `PlanBPaths` so vendor
recovery targets stop leaking into the collector core.
- Expose Redfish profile matching as structured diagnostics during live collection: logs must
contain all module scores, and collect job status must expose active modules for the UI.
**Consequences:**
- Server handlers stop owning parser-vs-replay branching details directly.
- Vendor/model-specific Redfish logic gets an explicit module boundary.
- Unknown-vendor Redfish collection becomes slower but more complete by design.
- Tactical Redfish fixes should move into profile modules instead of widening generic replay logic.
- Repo-owned compact fixtures under `internal/collector/redfishprofile/testdata/`, derived from
representative raw-export snapshots, are used to lock profile matching and acquisition tuning
for known MSI and Supermicro-family shapes.
---
## ADL-032 — MSI ghost GPU filter: exclude GPUs with temperature=0 on powered-on host
**Date:** 2026-03-18
**Context:**
MSI/AMI BMC caches GPU inventory from the host via Host Interface (in-band). When GPUs are
removed without a reboot the old entries remain in `Chassis/GPU*` and
`Systems/Self/Processors/GPU*` with `Status.Health: OK, State: Enabled`. The BMC has no
out-of-band mechanism to detect physical absence. A physically present GPU always reports
an ambient temperature (>0°C) even when idle; a stale cached entry returns `Reading: 0`.
**Decision:**
- Add `EnableMSIGhostGPUFilter` directive (enabled by MSI profile's `refineAnalysis`
alongside `EnableProcessorGPUFallback`).
- In `collectGPUsFromProcessors`: for each processor GPU, resolve its chassis path and read
`Chassis/GPU{n}/Sensors/GPU{n}_Temperature`. If `PowerState=On` and `Reading=0` → skip.
- Filter only applies when host is powered on; when host is off all temperatures are 0 and
the signal is ambiguous.
**Consequences:**
- Ghost GPUs from previous hardware configurations no longer appear in the inventory.
- Filter is MSI-profile-owned and does not affect HGX, Supermicro, or generic paths.
- Any new MSI GPU chassis that uses a different temperature sensor path will bypass the filter
(safe default: include rather than wrongly exclude).
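The filter decision reduces to a small predicate; this sketch uses illustrative names and assumes the chassis temperature reading has already been resolved:

```go
package main

import "fmt"

// ghostGPU sketches the ADL-032 rule: on an MSI/AMI BMC, a processor GPU
// whose chassis temperature sensor reads 0 while the host is powered on is
// treated as a stale cached entry.
func ghostGPU(hostPowerOn bool, tempReading float64) bool {
	if !hostPowerOn {
		// Host off: every sensor reads 0, the signal is ambiguous, keep the GPU.
		return false
	}
	return tempReading == 0
}

func main() {
	fmt.Println(ghostGPU(true, 0))  // powered on, 0 degrees: ghost entry
	fmt.Println(ghostGPU(true, 34)) // powered on, ambient temperature: real GPU
	fmt.Println(ghostGPU(false, 0)) // powered off: ambiguous, keep
}
```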
---
## ADL-033 — Reanimator export collected_at uses inventory LastModifiedTime with 30-day fallback
**Date:** 2026-03-18
**Context:**
For Redfish sources the BMC Manager `DateTime` reflects when the BMC clock read the time, not
when the hardware inventory was last known-good. `InventoryData/Status.LastModifiedTime`
(AMI/MSI OEM endpoint) records the actual timestamp of the last successful host-pushed
inventory cycle and is a better proxy for "when was this hardware configuration last confirmed".
**Decision:**
- `inferInventoryLastModifiedTime` reads `LastModifiedTime` from the snapshot and sets
`AnalysisResult.InventoryLastModifiedAt`.
- `reanimatorCollectedAt()` in the exporter selects `InventoryLastModifiedAt` when it is set
and no older than 30 days; otherwise falls back to `CollectedAt`.
- Fallback rationale: inventory older than 30 days is likely from a long-running server with
no recent reboot; using the actual collection date is more useful for the downstream consumer.
- The inventory timestamp is also logged during replay and live collection for diagnostics.
**Consequences:**
- Reanimator export `collected_at` reflects the last confirmed inventory cycle on AMI/MSI BMCs.
- On non-AMI BMCs or when `InventoryData/Status` is absent, behavior is unchanged.
- If inventory is stale (>30 days), collection date is used as before.
---
## ADL-034 — Redfish inventory invalidated before host power-on
**Date:** 2026-03-18
**Context:**
When a host is powered on by the collector (`power_on_if_host_off=true`), the BMC still holds
inventory from the previous boot. If hardware changed between shutdowns, the new boot will push
fresh inventory — but only if the BMC accepts it (CRC mismatch triggers re-population). Without
explicit invalidation, unchanged CRCs can cause the BMC to skip re-processing even after a
hardware change.
**Decision:**
- Before any power-on attempt, `invalidateRedfishInventory` POSTs to
`{systemPath}/Oem/Ami/Inventory/Crc` with all groups zeroed (`CPU`, `DIMM`, `PCIE`,
`CERTIFICATES`, `SECUREBOOT`).
- Best-effort: a 404/405 response (non-AMI BMC) is logged and silently ignored.
- The invalidation is logged at `INFO` level and surfaced as a collect progress message.
**Consequences:**
- On AMI/MSI BMCs: the next boot will push a full fresh inventory regardless of whether
CRCs appear unchanged, eliminating ghost components from prior hardware configurations.
- On non-AMI BMCs: the POST fails immediately (endpoint does not exist), nothing changes.
- Invalidation runs only when `power_on_if_host_off=true` and host is confirmed off.
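The invalidation body can be sketched like this; the exact JSON shape expected by the AMI `Inventory/Crc` endpoint is an assumption here, only the zeroed group names come from the decision above:

```go
package main

// invalidationPayload builds the best-effort CRC-reset body POSTed to
// {systemPath}/Oem/Ami/Inventory/Crc before power-on. Zeroing every
// group forces the next boot to re-push a full inventory.
// NOTE: the key/value layout is assumed, not confirmed by this log.
func invalidationPayload() map[string]int {
	groups := []string{"CPU", "DIMM", "PCIE", "CERTIFICATES", "SECUREBOOT"}
	body := make(map[string]int, len(groups))
	for _, g := range groups {
		body[g] = 0
	}
	return body
}
```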
---
## ADL-035 — Redfish hardware event log collection from Systems LogServices
**Date:** 2026-03-18
**Context:** Redfish BMCs expose event logs via `LogServices/{svc}/Entries`. On MSI/AMI this includes the IPMI SEL with hardware events (temperature, power, drive failures, etc.). Live collection previously collected only inventory/sensor snapshots; event history was unavailable in Reanimator.
**Decision:**
- After tree-walk, fetch hardware log entries separately via `collectRedfishLogEntries()` (not part of tree-walk to avoid bloat).
- Only `Systems/{sys}/LogServices` is queried — Managers LogServices (BMC audit/journal) are excluded.
- Log services with Id/Name containing "audit", "journal", "bmc", "security", "manager", "debug" are skipped.
- Entries older than 7 days (client-side filter) are discarded. Pages are followed until an out-of-window entry is found (assumes newest-first ordering, typical for BMCs).
- Entries with `EntryType: "Oem"` or `MessageId` containing user/auth/login keywords are filtered as non-hardware.
- Raw entries stored in `rawPayloads["redfish_log_entries"]` as `[]map[string]interface{}`.
- Parsed to `models.Event` in `parseRedfishLogEntries()` during replay — same path for live and offline.
- Max 200 entries per log service, 500 total to limit BMC load.
**Consequences:**
- Hardware event history (last 7 days) visible in Reanimator `EventLogs` section.
- No impact on existing inventory pipeline or offline archive replay (archives without `redfish_log_entries` key silently skip parsing).
- Adds extra HTTP requests during live collection (sequential, after tree-walk completes).
---
## ADL-036 — Redfish profile matching may use platform grammar hints beyond vendor strings
**Date:** 2026-03-25
**Context:**
Some BMCs expose unusable `Manufacturer` / `Model` values (`NULL`, placeholders, or generic SoC
names) while still exposing a stable platform-specific Redfish grammar: repeated member names,
firmware inventory IDs, OEM action names, and target-path quirks. Matching only on vendor
strings forced such systems into fallback mode even when the platform shape was consistent.
**Decision:**
- Extend `redfishprofile.MatchSignals` with doc-derived hint tokens collected from discovery docs
and replay snapshots.
- Allow profile matchers to score on stable platform grammar such as:
- collection member naming (`outboardPCIeCard*`, drive slot grammars)
- firmware inventory member IDs
- OEM action/type markers and linked target paths
- During live collection, gather only lightweight extra hint collections needed for matching
(`NetworkInterfaces`, `NetworkAdapters`, `Drives`, `UpdateService/FirmwareInventory`), not slow
deep inventory branches.
- Keep such profiles out of fallback aggregation unless they are proven safe as broad additive
hints.
**Consequences:**
- Platform-family profiles can activate even when vendor strings are absent or set to `NULL`.
- Matching logic becomes more robust for OEM BMC implementations that differ mainly by Redfish
grammar rather than by explicit vendor strings.
- Live collection gains a small amount of extra discovery I/O to harvest stable member IDs, but
avoids slow deep probes such as `Assembly` just for profile selection.
---
## ADL-037 — easy-bee archives are parsed from the embedded bee-audit snapshot
**Date:** 2026-03-25
**Context:**
`reanimator-easy-bee` support bundles already contain a normalized hardware snapshot in
`export/bee-audit.json` plus supporting logs and techdump files. Rebuilding the same inventory
from raw `techdump/` files inside LOGPile would duplicate parser logic and create drift between
the producer utility and archive importer.
**Decision:**
- Add a dedicated `easy_bee` vendor parser for `bee-support-*.tar.gz` bundles.
- Detect the bundle by `manifest.txt` (`bee_version=...`) plus `export/bee-audit.json`.
- Parse the archive from the embedded snapshot first; treat `techdump/` and runtime files as
secondary context only.
- Normalize snapshot-only fields needed by LOGPile, notably:
- flatten `hardware.sensors` groups into `[]SensorReading`
- turn runtime issues/status into `[]Event`
- synthesize a board FRU entry when the snapshot does not include FRU data
**Consequences:**
- LOGPile stays aligned with the schema emitted by `reanimator-easy-bee`.
- Adding support required only a thin archive adapter instead of a full hardware parser.
- If the upstream utility changes the embedded snapshot schema, the `easy_bee` adapter is the
only place that must be updated.
---
## ADL-038 — HPE AHS parser uses hybrid extraction instead of full `zbb` schema decoding
**Date:** 2026-03-30
**Context:** HPE iLO Active Health System exports (`.ahs`) are proprietary `ABJR` containers
with gzip-compressed `zbb` payloads. The sample inventory data contains two practical signal
families: printable SMBIOS/FRU-style strings and embedded Redfish JSON subtrees, especially for
storage controllers and drives. Full `zbb` binary schema decoding is not documented and would add
significant complexity before proving user value.
**Decision:** Support HPE AHS with a hybrid parser:
- decode the outer `ABJR` container
- gunzip embedded members when applicable
- extract inventory from printable SMBIOS/FRU payloads
- extract storage/controller/backplane details from embedded Redfish JSON objects
- enrich firmware and PSU inventory from auxiliary package payloads such as `bcert.pkg`
- do not attempt complete semantic decoding of the internal `zbb` record format
**Consequences:**
- Parser reaches inventory-grade usefulness quickly for HPE `.ahs` uploads.
- Storage inventory is stronger than text-only parsing because it reuses structured Redfish data when present.
- Auxiliary package payloads can supply missing firmware/PSU fields even when the main SMBIOS-like blob is incomplete.
- Future deeper `zbb` decoding can be added incrementally without replacing the current parser contract.
---
## ADL-039 — Canonical inventory keeps DIMMs with unknown capacity when identity is known
**Date:** 2026-03-30
**Context:** Some sources, notably HPE iLO AHS SMBIOS-like blobs, expose installed DIMM identity
(slot, serial, part number, manufacturer) but do not include capacity. The parser already extracts
those modules into `Hardware.Memory`, but canonical device building and export previously dropped
them because `size_mb == 0`.
**Decision:** Treat a DIMM as installed inventory when `present=true` and it has identifying
memory fields such as serial number or part number, even if `size_mb` is unknown.
**Consequences:**
- HPE AHS uploads now show real installed memory modules instead of hiding them.
- Empty slots still stay filtered because they lack inventory identity or are marked absent.
- Specification/export can include "size unknown" memory entries without inventing capacity data.
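The keep rule can be sketched as a predicate; the struct and field names are illustrative, not the real `models` types:

```go
package main

// memoryModule holds the fields the keep decision needs.
type memoryModule struct {
	Present    bool
	SizeMB     int
	SerialNum  string
	PartNumber string
}

// keepDIMM mirrors the rule: an installed module with identifying
// fields survives canonical building even when capacity is unknown.
func keepDIMM(m memoryModule) bool {
	if !m.Present {
		return false // empty or absent slot
	}
	if m.SizeMB > 0 {
		return true
	}
	// Unknown capacity: keep only when identity proves a real module.
	return m.SerialNum != "" || m.PartNumber != ""
}
```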
---
## ADL-040 — HPE Redfish normalization prefers chassis `Devices/*` over generic PCIe topology labels
**Date:** 2026-03-30
**Context:** HPE ProLiant Gen11 Redfish snapshots expose parallel inventory trees. `Chassis/*/PCIeDevices/*`
is good for topology presence, but often reports only generic `DeviceType` values such as
`SingleFunction`. `Chassis/*/Devices/*` carries the concrete slot label, richer device type, and
product-vs-spare part identifiers for the same physical NIC/controller. Replay fallback over empty
storage volume collections can also discover `Volumes/Capabilities` children, which are not real
logical volumes.
**Decision:**
- Treat Redfish `SKU` as a valid fallback for `hardware.board.part_number` when `PartNumber` is empty.
- Ignore `Volumes/Capabilities` documents during logical-volume parsing.
- Enrich `Chassis/*/PCIeDevices/*` entries with matching `Chassis/*/Devices/*` documents by
serial/name/part identity.
- Keep `pcie.device_class` semantic; do not replace it with model or part-number strings when
Redfish exposes only generic topology labels.
**Consequences:**
- HPE Redfish imports now keep the server SKU in `hardware.board.part_number`.
- Empty volume collections no longer produce fake `Capabilities` volume records.
- HPE PCIe inventory gets better slot labels like `OCP 3.0 Slot 15` plus concrete classes such as
`LOM/NIC` or `SAS/SATA Storage Controller`.
- `part_number` remains available separately for model identity, without polluting the class field.
---
## ADL-041 — Redfish replay drops topology-only PCIe noise classes from canonical inventory
**Date:** 2026-04-01
**Context:** Some Redfish BMCs, especially MSI/AMI GPU systems, expose a very wide PCIe topology
tree under `Chassis/*/PCIeDevices/*`. Besides real endpoint devices, the replay sees bridge stages,
CPU-side helper functions, IMC/mesh signal-processing nodes, USB/SPI side controllers, and GPU
display-function duplicates reported as generic `Display Device`. Keeping all of them in
`hardware.pcie_devices` pollutes downstream exports such as Reanimator and hides the actual
endpoint inventory signal.
**Decision:**
- Filter topology-only PCIe records during Redfish replay, not in the UI layer.
- Drop PCIe entries with replay-resolved classes:
- `Bridge`
- `Processor`
- `SignalProcessingController`
- `SerialBusController`
- Drop `DisplayController` entries when the source Redfish PCIe document is the generic MSI-style
`Description: "Display Device"` duplicate.
- Drop PCIe network endpoints when their PCIe functions already link to `NetworkDeviceFunctions`,
because those devices are represented canonically in `hardware.network_adapters`.
- When `Systems/*/NetworkInterfaces/*` links back to a chassis `NetworkAdapter`, match against the
fully enriched chassis NIC identity to avoid creating a second ghost NIC row with the raw
`NetworkAdapter_*` slot/name.
- Treat generic Redfish object names such as `NetworkAdapter_*` and `PCIeDevice_*` as placeholder
models and replace them from PCI IDs when a concrete vendor/device match exists.
- Drop MSI-style storage service PCIe endpoints whose resolved device names are only
`Volume Management Device NVMe RAID Controller` or `PCIe Switch management endpoint`; storage
inventory already comes from the Redfish storage tree.
- Normalize Ethernet-class NICs into the single exported class `NetworkController`; do not split
`EthernetController` into a separate top-level inventory section.
- Keep endpoint classes such as `NetworkController`, `MassStorageController`, and dedicated GPU
inventory coming from `hardware.gpus`.
**Consequences:**
- `hardware.pcie_devices` becomes closer to real endpoint inventory instead of raw PCIe topology.
- Reanimator exports stop showing MSI bridge/processor/display duplicate noise.
- Reanimator exports no longer duplicate the same MSI NIC as both `PCIeDevice_*` and
`NetworkAdapter_*`.
- Replay no longer creates extra NIC rows from `Systems/NetworkInterfaces` when the same adapter
was already normalized from `Chassis/NetworkAdapters`.
- MSI VMD / PCIe switch storage service endpoints no longer pollute PCIe inventory.
- UI/Reanimator group all Ethernet NICs under the same `NETWORKCONTROLLER` section.
- Canonical NIC inventory prefers resolved PCI product names over generic Redfish placeholder names.
- The raw Redfish snapshot still remains available in `raw_payloads.redfish_tree` for low-level
troubleshooting if topology details are ever needed.
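The class-level part of the filter can be sketched as follows; parameter names are illustrative, and only the class list and the MSI `Display Device` rule come from the decision above:

```go
package main

// topologyNoiseClass lists replay-resolved PCIe classes that describe
// topology stages rather than endpoint devices.
var topologyNoiseClass = map[string]bool{
	"Bridge":                     true,
	"Processor":                  true,
	"SignalProcessingController": true,
	"SerialBusController":        true,
}

// dropPCIeEntry sketches the replay-time filter: topology classes are
// always dropped; DisplayController is dropped only for the generic
// MSI-style "Display Device" duplicate.
func dropPCIeEntry(class, description string) bool {
	if topologyNoiseClass[class] {
		return true
	}
	return class == "DisplayController" && description == "Display Device"
}
```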
---
## ADL-042 — xFusion file-export archives merge AppDump inventory with RTOS/Log snapshots
**Date:** 2026-04-04
**Context:** xFusion iBMC `tar.gz` exports expose the base inventory in `AppDump/`, but the most
useful NIC and firmware details live elsewhere: NIC firmware/MAC snapshots in
`LogDump/netcard/netcard_info.txt` and system firmware versions in
`RTOSDump/versioninfo/app_revision.txt`. Parsing only `AppDump/` left xFusion uploads detectable but
incomplete for UI and Reanimator consumers.
**Decision:**
- Treat xFusion file-export `tar.gz` bundles as a first-class archive parser input.
- Merge OCP NIC identity from `AppDump/card_manage/card_info` with the latest per-slot snapshot
from `LogDump/netcard/netcard_info.txt` to produce `hardware.network_adapters`.
- Import system-level firmware from `RTOSDump/versioninfo/app_revision.txt` into
`hardware.firmware`.
- Allow FRU fallback from `RTOSDump/versioninfo/fruinfo.txt` when `AppDump/FruData/fruinfo.txt`
is absent.
**Consequences:**
- xFusion uploads now preserve NIC BDF, MAC, firmware, and serial identity in normalized output.
- System firmware such as BIOS and iBMC versions survives xFusion file exports.
- xFusion archives participate more reliably in canonical device/export flows without special UI
cases.
---
## ADL-043 — Extended HGX diagnostic plan-B is opt-in from the live collect form
**Date:** 2026-04-13
**Context:** Some Supermicro HGX Redfish targets expose slow or hanging component-chassis inventory
collections during critical plan-B, especially under `Chassis/HGX_*` for `Assembly`,
`Accelerators`, `Drives`, `NetworkAdapters`, and `PCIeDevices`. Default collection should not
block operators on deep diagnostic retries that are useful mainly for troubleshooting.
**Decision:** Keep the normal snapshot/replay path unchanged, but gate those heavy HGX
component-chassis critical plan-B retries behind the existing live-collect `debug_payloads` flag,
presented in the UI as "Сбор расширенных данных для диагностики".
**Consequences:**
- Default live collection skips those heavy diagnostic plan-B retries and reaches replay faster.
- Operators can explicitly opt into the slower diagnostic path when they need deeper collection.
- The same user-facing toggle continues to enable extra debug payload capture for troubleshooting.

# LOGPile Bible
> **Documentation language:** English only. All maintained project documentation must be written in English.
>
> **Architectural decisions:** Every significant architectural decision **must** be recorded in
> [`10-decisions.md`](10-decisions.md) before or alongside the code change.
>
> **Single source of truth:** Architecture and technical design documentation belongs in `docs/bible/`.
> Keep `README.md` and `CLAUDE.md` minimal to avoid duplicate documentation.
`bible-local/` is the project-specific source of truth for LOGPile.
Keep top-level docs minimal and put maintained architecture/API contracts here.
## Rules
- Documentation language: English only
- Update relevant bible files in the same change as the code
- Record significant architectural decisions in [`10-decisions.md`](10-decisions.md)
- Do not duplicate shared rules from `bible/`
## Read order
| File | Purpose |
|------|---------|
| [01-overview.md](01-overview.md) | Product scope, modes, non-goals |
| [02-architecture.md](02-architecture.md) | Runtime structure, state, main flows |
| [04-data-models.md](04-data-models.md) | Stable data contracts and canonical inventory |
| [03-api.md](03-api.md) | HTTP endpoints and response contracts |
| [05-collectors.md](05-collectors.md) | Live collection behavior |
| [06-parsers.md](06-parsers.md) | Archive parser framework and vendor coverage |
| [07-exporters.md](07-exporters.md) | Raw export, Reanimator export, batch convert |
| [docs/hardware-ingest-contract.md](docs/hardware-ingest-contract.md) | Reanimator ingest schema mirrored locally |
| [08-build-release.md](08-build-release.md) | Build and release workflow |
| [09-testing.md](09-testing.md) | Test expectations and regression rules |
| [10-decisions.md](10-decisions.md) | Architectural Decision Log |
## Fast orientation
- HTTP layer: `internal/server/`
- Core contracts: `internal/models/models.go`
- Live collection: `internal/collector/`
- Archive parsing: `internal/parser/`
- Export conversion: `internal/exporter/`
- Frontend consumer: `web/static/js/app.js`
## Maintenance rule
If a document becomes stale, either fix it immediately or delete it.
Stale docs are worse than missing docs.

---
title: Hardware Ingest JSON Contract
version: "2.7"
updated: "2026-03-15"
maintainer: Reanimator Core
audience: external-integrators, ai-agents
language: en
---
# Reanimator Integration: Hardware Ingest JSON Contract
Version: **2.7** · Date: **2026-03-15**
This document describes the JSON format for submitting server hardware data to **Reanimator** (a hardware lifecycle management system).
It is intended for developers of adjacent systems (Redfish collectors, monitoring agents, CMDB exporters) and may be embedded in the documentation of integrating projects.
> Latest version of this document: https://git.mchus.pro/reanimator/core/src/branch/main/bible-local/docs/hardware-ingest-contract.md
---
## Changelog
| Version | Date | Changes |
|---------|------|---------|
| 2.7 | 2026-03-15 | Explicitly forbid synthesizing data in `event_logs`; integrators must not invent component serial numbers the source did not provide |
| 2.6 | 2026-03-15 | Added optional `event_logs` section for dedup/upsert of `host` / `bmc` / `redfish` logs outside the history timeline |
| 2.5 | 2026-03-15 | Added shared optional `manufactured_year_week` field (`YYYY-Www`) to component sections |
| 2.4 | 2026-03-15 | Added first wave of component telemetry: health/life fields for `cpus`, `memory`, `storage`, `pcie_devices`, `power_supplies` |
| 2.3 | 2026-03-15 | Added component telemetry fields: `pcie_devices.temperature_c`, `pcie_devices.power_w`, `power_supplies.temperature_c` |
| 2.2 | 2026-03-15 | Added `numa_node` field to `pcie_devices` for topology/affinity |
| 2.1 | 2026-03-15 | Added `sensors` section (fans, power, temperatures, other); `mac_addresses` field on `pcie_devices`; extended list of `device_class` values |
| 2.0 | 2026-02-01 | Component status history (`status_history`, `status_changed_at`); PSU telemetry fields; async job response |
| 1.0 | 2026-01-01 | Initial contract version |
---
## Principles
1. **Snapshot** — the JSON describes the server state at collection time. It may include component status change history.
2. **Idempotency** — re-sending an identical payload does not create duplicates (deduplication by content hash).
3. **Partial payloads** — send only the sections for which data is available. An empty array and an absent section are equivalent.
4. **Strict schema** — the endpoint uses a strict JSON decoder; unknown fields cause `400 Bad Request`.
5. **Event-driven** — an import creates timeline events (LOG_COLLECTED, INSTALLED, REMOVED, FIRMWARE_CHANGED, etc.).
6. **No integrator-side synthesis** — the collector submits only values it actually collected. Do not invent `serial_number`, `component_ref`, `message`, `message_id`, or other identifiers/attributes when the source did not provide them or the parser could not extract them reliably.
---
## Endpoint
```
POST /ingest/hardware
Content-Type: application/json
```
**Response on acceptance (202 Accepted):**
```json
{
  "status": "accepted",
  "job_id": "job_01J..."
}
```
The import runs asynchronously. The result is available at:
```
GET /ingest/hardware/jobs/{job_id}
```
**Response on job success:**
```json
{
  "status": "success",
  "bundle_id": "lb_01J...",
  "asset_id": "mach_01J...",
  "collected_at": "2026-02-10T15:30:00Z",
  "duplicate": false,
  "summary": {
    "parts_observed": 15,
    "parts_created": 2,
    "parts_updated": 13,
    "installations_created": 2,
    "installations_closed": 1,
    "timeline_events_created": 9,
    "failure_events_created": 1
  }
}
```
**Response for a duplicate:**
```json
{
  "status": "success",
  "duplicate": true,
  "message": "LogBundle with this content hash already exists"
}
```
**Response on error (400 Bad Request):**
```json
{
  "status": "error",
  "error": "validation_failed",
  "details": {
    "field": "hardware.board.serial_number",
    "message": "serial_number is required"
  }
}
```
Common causes of `400`:
- Invalid `collected_at` format (RFC3339 required).
- Empty `hardware.board.serial_number`.
- An unknown JSON field at any level.
- Request body exceeds the allowed size.
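Forming the request from an integrator's side might look like this minimal Go sketch; the base URL and the helper name are hypothetical, only the path, method, and content type come from the contract:

```go
package main

import (
	"bytes"
	"net/http"
)

// newIngestRequest builds the POST against a Reanimator base URL.
// payload must already be the full snapshot JSON described in this
// document; the caller is responsible for sending the request and
// then polling GET /ingest/hardware/jobs/{job_id}.
func newIngestRequest(baseURL string, payload []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/ingest/hardware", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```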
---
## Top-level structure
```json
{
  "filename": "redfish://10.10.10.103",
  "source_type": "api",
  "protocol": "redfish",
  "target_host": "10.10.10.103",
  "collected_at": "2026-02-10T15:30:00Z",
  "hardware": {
    "board": { ... },
    "firmware": [ ... ],
    "cpus": [ ... ],
    "memory": [ ... ],
    "storage": [ ... ],
    "pcie_devices": [ ... ],
    "power_supplies": [ ... ],
    "sensors": { ... },
    "event_logs": [ ... ]
  }
}
```
### Top-level fields
| Field | Type | Required | Description |
|------|------|----------|-------------|
| `collected_at` | string RFC3339 | **yes** | Collection timestamp |
| `hardware` | object | **yes** | Hardware snapshot |
| `hardware.board.serial_number` | string | **yes** | Board/server serial number |
| `target_host` | string | no | IP or hostname |
| `source_type` | string | no | Source type: `api`, `logfile`, `manual` |
| `protocol` | string | no | Protocol: `redfish`, `ipmi`, `snmp`, `ssh` |
| `filename` | string | no | Source identifier |
---
## Shared component status fields
These apply to every component section (`cpus`, `memory`, `storage`, `pcie_devices`, `power_supplies`).
| Field | Type | Description |
|------|------|-------------|
| `status` | string | Current status: `OK`, `Warning`, `Critical`, `Unknown`, `Empty` |
| `status_checked_at` | string RFC3339 | Time of the last status check |
| `status_changed_at` | string RFC3339 | Time of the last status change |
| `status_history` | array | Status transition history (see below) |
| `error_description` | string | Error/diagnostic text |
| `manufactured_year_week` | string | Manufacturing date as `YYYY-Www`, e.g. `2024-W07` |
**`status_history[]` object:**
| Field | Type | Required | Description |
|------|------|----------|-------------|
| `status` | string | **yes** | Status at that moment |
| `changed_at` | string RFC3339 | **yes** | Transition time (entries without this field are ignored) |
| `details` | string | no | Explanation of the transition |
**Event-time priority rules:**
1. `status_changed_at`
2. The latest `status_history` entry whose status matches the current one
3. The latest parseable `status_history` entry
4. `status_checked_at`
**Status submission rules:**
- Send `status` as the component's current state in the snapshot.
- If the source keeps history, send `status_history` sorted by `changed_at` in ascending order.
- Do not include `status_history` entries without `changed_at`.
- All dates are RFC3339; UTC (`Z`) is recommended.
- Use `manufactured_year_week` when the source knows only the manufacturing year and week, not an exact calendar date.
---
## hardware sections
### board
Core server identity. Required section.
| Field | Type | Required | Description |
|------|------|----------|-------------|
| `serial_number` | string | **yes** | Serial number (Asset identification key) |
| `manufacturer` | string | no | Manufacturer |
| `product_name` | string | no | Model |
| `part_number` | string | no | Part number |
| `uuid` | string | no | System UUID |
String values equal to `"NULL"` are treated as absent data.
```json
"board": {
  "manufacturer": "Supermicro",
  "product_name": "X12DPG-QT6",
  "serial_number": "21D634101",
  "part_number": "X12DPG-QT6-REV1.01",
  "uuid": "d7ef2fe5-2fd0-11f0-910a-346f11040868"
}
```
---
### firmware
Firmware versions of system components (BIOS, BMC, CPLD, etc.).
| Field | Type | Required | Description |
|------|------|----------|-------------|
| `device_name` | string | **yes** | Device name (`BIOS`, `BMC`, `CPLD`, …) |
| `version` | string | **yes** | Firmware version |
Entries with an empty `device_name` or `version` are ignored.
A version change creates a `FIRMWARE_CHANGED` event for the Asset.
```json
"firmware": [
  { "device_name": "BIOS", "version": "06.08.05" },
  { "device_name": "BMC", "version": "5.17.00" },
  { "device_name": "CPLD", "version": "01.02.03" }
]
```
---
### cpus
| Field | Type | Required | Description |
|------|------|----------|-------------|
| `socket` | int | **yes** | Socket number (used for serial generation) |
| `model` | string | no | Processor model |
| `manufacturer` | string | no | Manufacturer |
| `cores` | int | no | Core count |
| `threads` | int | no | Thread count |
| `frequency_mhz` | int | no | Current frequency |
| `max_frequency_mhz` | int | no | Maximum frequency |
| `temperature_c` | float | no | CPU temperature, °C (telemetry) |
| `power_w` | float | no | Current CPU power, W (telemetry) |
| `throttled` | bool | no | Thermal/power throttling observed |
| `correctable_error_count` | int | no | Correctable CPU error count |
| `uncorrectable_error_count` | int | no | Uncorrectable CPU error count |
| `life_remaining_pct` | float | no | Remaining life / health, % |
| `life_used_pct` | float | no | Used life / wear, % |
| `serial_number` | string | no | Serial number (if available) |
| `firmware` | string | no | Microcode version; if the logger reports `Microcode level`, pass it here as-is |
| `present` | bool | no | Presence (defaults to `true`) |
| + shared status fields | | | see the section above |
**serial_number generation when absent:** `{board_serial}-CPU-{socket}`
If the source uses a `Microcode level` field/label, pass its value in `cpus[].firmware` without any transformation.
```json
"cpus": [
  {
    "socket": 0,
    "model": "INTEL(R) XEON(R) GOLD 6530",
    "cores": 32,
    "threads": 64,
    "frequency_mhz": 2100,
    "max_frequency_mhz": 4000,
    "temperature_c": 61.5,
    "power_w": 182.0,
    "throttled": false,
    "manufacturer": "Intel",
    "status": "OK",
    "status_checked_at": "2026-02-10T15:28:00Z"
  }
]
```
---
### memory
| Field | Type | Required | Description |
|------|------|----------|-------------|
| `slot` | string | no | Slot identifier |
| `present` | bool | no | Module presence (defaults to `true`) |
| `serial_number` | string | no | Serial number |
| `part_number` | string | no | Part number (used as the model) |
| `manufacturer` | string | no | Manufacturer |
| `size_mb` | int | no | Capacity in MB |
| `type` | string | no | Type: `DDR3`, `DDR4`, `DDR5`, … |
| `max_speed_mhz` | int | no | Maximum speed |
| `current_speed_mhz` | int | no | Current speed |
| `temperature_c` | float | no | DIMM/module temperature, °C (telemetry) |
| `correctable_ecc_error_count` | int | no | Correctable ECC error count |
| `uncorrectable_ecc_error_count` | int | no | Uncorrectable ECC error count |
| `life_remaining_pct` | float | no | Remaining life / health, % |
| `life_used_pct` | float | no | Used life / wear, % |
| `spare_blocks_remaining_pct` | float | no | Remaining spare blocks, % |
| `performance_degraded` | bool | no | Performance degradation observed |
| `data_loss_detected` | bool | no | Source signals a risk or occurrence of data loss |
| + shared status fields | | | see the section above |
A module without `serial_number` is ignored. A module with `present=false` or `status=Empty` is ignored.
```json
"memory": [
  {
    "slot": "CPU0_C0D0",
    "present": true,
    "size_mb": 32768,
    "type": "DDR5",
    "max_speed_mhz": 4800,
    "current_speed_mhz": 4800,
    "temperature_c": 43.0,
    "correctable_ecc_error_count": 0,
    "manufacturer": "Hynix",
    "serial_number": "80AD032419E17CEEC1",
    "part_number": "HMCG88AGBRA191N",
    "status": "OK"
  }
]
```
---
### storage
| Field | Type | Required | Description |
|------|------|----------|-------------|
| `slot` | string | no | Canonical installation address of the PCIe device; pass the BDF (`0000:18:00.0`) |
| `serial_number` | string | no | Serial number |
| `model` | string | no | Model |
| `manufacturer` | string | no | Manufacturer |
| `type` | string | no | Type: `NVMe`, `SSD`, `HDD` |
| `interface` | string | no | Interface: `NVMe`, `SATA`, `SAS` |
| `size_gb` | int | no | Size in GB |
| `temperature_c` | float | no | Drive temperature, °C (telemetry) |
| `power_on_hours` | int64 | no | Power-on time, hours |
| `power_cycles` | int64 | no | Power cycle count |
| `unsafe_shutdowns` | int64 | no | Unsafe shutdowns |
| `media_errors` | int64 | no | Media errors |
| `error_log_entries` | int64 | no | Error log entry count |
| `written_bytes` | int64 | no | Total bytes written |
| `read_bytes` | int64 | no | Total bytes read |
| `life_used_pct` | float | no | Used life / wear, % |
| `life_remaining_pct` | float | no | Remaining life / health, % |
| `available_spare_pct` | float | no | Available spare, % |
| `reallocated_sectors` | int64 | no | Reallocated sectors |
| `current_pending_sectors` | int64 | no | Sectors pending remap |
| `offline_uncorrectable` | int64 | no | Uncorrectable offline-scan errors |
| `firmware` | string | no | Firmware version |
| `present` | bool | no | Presence (defaults to `true`) |
| + shared status fields | | | see the section above |
A drive without `serial_number` is ignored. A `firmware` change creates a `FIRMWARE_CHANGED` event.
```json
"storage": [
  {
    "slot": "OB01",
    "type": "NVMe",
    "model": "INTEL SSDPF2KX076T1",
    "size_gb": 7680,
    "temperature_c": 38.5,
    "power_on_hours": 12450,
    "unsafe_shutdowns": 3,
    "written_bytes": 9876543210,
    "life_remaining_pct": 91.0,
    "serial_number": "BTAX41900GF87P6DGN",
    "manufacturer": "Intel",
    "firmware": "9CV10510",
    "interface": "NVMe",
    "present": true,
    "status": "OK"
  }
]
```
---
### pcie_devices
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `slot` | string | no | Slot identifier |
| `vendor_id` | int | no | PCI Vendor ID (decimal) |
| `device_id` | int | no | PCI Device ID (decimal) |
| `numa_node` | int | no | NUMA node / CPU affinity of the device |
| `temperature_c` | float | no | Device temperature, °C (telemetry) |
| `power_w` | float | no | Current device power draw, W (telemetry) |
| `life_remaining_pct` | float | no | Remaining life / health, % |
| `life_used_pct` | float | no | Used life / wear, % |
| `ecc_corrected_total` | int64 | no | Total correctable ECC errors |
| `ecc_uncorrected_total` | int64 | no | Total uncorrectable ECC errors |
| `hw_slowdown` | bool | no | Device entered hardware slowdown / protective mode |
| `battery_charge_pct` | float | no | Battery / supercap charge, % |
| `battery_health_pct` | float | no | Battery / supercap health, % |
| `battery_temperature_c` | float | no | Battery / supercap temperature, °C |
| `battery_voltage_v` | float | no | Battery / supercap voltage, V |
| `battery_replace_required` | bool | no | Battery / supercap replacement required |
| `sfp_temperature_c` | float | no | SFP/optic temperature, °C |
| `sfp_tx_power_dbm` | float | no | TX optical power, dBm |
| `sfp_rx_power_dbm` | float | no | RX optical power, dBm |
| `sfp_voltage_v` | float | no | SFP voltage, V |
| `sfp_bias_ma` | float | no | SFP bias current, mA |
| `bdf` | string | no | Deprecated alias for `slot`; when present, ingest normalizes it into `slot` |
| `device_class` | string | no | Device class (see list below) |
| `manufacturer` | string | no | Manufacturer |
| `model` | string | no | Model |
| `serial_number` | string | no | Serial number |
| `firmware` | string | no | Firmware version |
| `link_width` | int | no | Current link width |
| `link_speed` | string | no | Current link speed: `Gen3`, `Gen4`, `Gen5` |
| `max_link_width` | int | no | Maximum link width |
| `max_link_speed` | string | no | Maximum link speed |
| `mac_addresses` | string[] | no | Port MAC addresses (for network devices) |
| `present` | bool | no | Presence (defaults to `true`) |
| + common status fields | | | see the section above |
Pass `numa_node` for NIC / InfiniBand / RAID / GPU devices when the source knows the CPU/NUMA affinity. The field is stored in the PCIe component's snapshot attributes and duplicated into telemetry for topology use cases.
Use `temperature_c` and `power_w` for device-level telemetry of GPUs / accelerators / smart PCIe devices. They do not affect component identification.
**serial_number generation when absent or `"N/A"`:** `{board_serial}-PCIE-{slot}`, where `slot` for PCIe equals the BDF.
`slot` is the single canonical component address. For PCIe, pass the BDF in `slot`. The `bdf` field is kept only as a transitional alias on input and must not be used as a separate coordinate alongside `slot`.
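The fallback rule above can be sketched in Go (illustrative only; `pcieFallbackSerial` is a hypothetical helper, not the actual server-side ingest code):

```go
package main

import (
	"fmt"
	"strings"
)

// pcieFallbackSerial applies the documented ingest fallback: when a PCIe
// device reports no serial number (or the literal "N/A"), the serial is
// derived from the board serial and the device's slot (BDF).
func pcieFallbackSerial(boardSerial, slot, reported string) string {
	s := strings.TrimSpace(reported)
	if s != "" && !strings.EqualFold(s, "N/A") {
		return s // a real serial is kept as-is
	}
	return fmt.Sprintf("%s-PCIE-%s", boardSerial, slot)
}

func main() {
	fmt.Println(pcieFallbackSerial("21D634101", "0000:3b:00.0", "N/A"))
	// → 21D634101-PCIE-0000:3b:00.0
}
```

Since `slot` carries the BDF for PCIe devices, the generated serial stays stable across collections as long as the card stays in the same slot.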
**`device_class` values:**
| Value | Purpose |
|----------|------------|
| `MassStorageController` | RAID controllers |
| `StorageController` | HBAs, SAS controllers |
| `NetworkController` | Network adapters (InfiniBand, generic) |
| `EthernetController` | Ethernet NICs |
| `FibreChannelController` | Fibre Channel HBAs |
| `VideoController` | GPUs, graphics cards |
| `ProcessingAccelerator` | Compute accelerators (AI/ML) |
| `DisplayController` | Display controllers (BMC VGA) |
The list is open: arbitrary strings are allowed for non-standard classes.
```json
"pcie_devices": [
{
"slot": "0000:3b:00.0",
"vendor_id": 5555,
"device_id": 4401,
"numa_node": 0,
"temperature_c": 48.5,
"power_w": 18.2,
"sfp_temperature_c": 36.2,
"sfp_tx_power_dbm": -1.8,
"sfp_rx_power_dbm": -2.1,
"device_class": "EthernetController",
"manufacturer": "Intel",
"model": "X710 10GbE",
"serial_number": "K65472-003",
"firmware": "9.20 0x8000d4ae",
"mac_addresses": ["3c:fd:fe:aa:bb:cc", "3c:fd:fe:aa:bb:cd"],
"status": "OK"
}
]
```
---
### power_supplies
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `slot` | string | no | Slot identifier |
| `present` | bool | no | Presence (defaults to `true`) |
| `serial_number` | string | no | Serial number |
| `part_number` | string | no | Part number |
| `model` | string | no | Model |
| `vendor` | string | no | Manufacturer |
| `wattage_w` | int | no | Rated power in watts |
| `firmware` | string | no | Firmware version |
| `input_type` | string | no | Input type (e.g. `ACWideRange`) |
| `input_voltage` | float | no | Input voltage, V (telemetry) |
| `input_power_w` | float | no | Input power, W (telemetry) |
| `output_power_w` | float | no | Output power, W (telemetry) |
| `temperature_c` | float | no | PSU temperature, °C (telemetry) |
| `life_remaining_pct` | float | no | Remaining life / health, % |
| `life_used_pct` | float | no | Used life / wear, % |
| + common status fields | | | see the section above |
Telemetry fields (`input_voltage`, `input_power_w`, `output_power_w`, `temperature_c`, `life_remaining_pct`, `life_used_pct`) are stored in the component's attributes and do not affect its identification.
A PSU without a `serial_number` is ignored.
```json
"power_supplies": [
{
"slot": "0",
"present": true,
"model": "GW-CRPS3000LW",
"vendor": "Great Wall",
"wattage_w": 3000,
"serial_number": "2P06C102610",
"firmware": "00.03.05",
"status": "OK",
"input_type": "ACWideRange",
"input_power_w": 137,
"output_power_w": 104,
"input_voltage": 215.25,
"temperature_c": 39.5,
"life_remaining_pct": 97.0
}
]
```
---
### sensors
Server sensor readings. The section is optional and not tied to components.
Data is stored as the last known value at the Asset level.
```json
"sensors": {
"fans": [ ... ],
"power": [ ... ],
"temperatures": [ ... ],
"other": [ ... ]
}
```
---
### event_logs
Normalized operational server logs from `host`, `bmc`, or `redfish`.
These records do not enter the history timeline and do not create history events. They are stored in a separate deduplicated log store and shown in a dedicated asset logs / host logs UI block.
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `source` | string | **yes** | Log source: `host`, `bmc`, `redfish` |
| `event_time` | string RFC3339 | no | Event time from the source; if absent, the ingest/collection time is used |
| `severity` | string | no | Level: `OK`, `Info`, `Warning`, `Critical`, `Unknown` |
| `message_id` | string | no | Source event identifier/code |
| `message` | string | **yes** | Normalized event text |
| `component_ref` | string | no | Reference to a component/device/slot, when extractable |
| `fingerprint` | string | no | Externally supplied ready-made dedup key; if not provided, the system computes its own |
| `is_active` | bool | no | Indicates the event is still active/uncleared, when the source supports event lifecycle |
| `raw_payload` | object | no | Raw vendor-specific payload for diagnostics |
**event_logs rules:**
- Logs are deduplicated per asset + source + fingerprint.
- If `fingerprint` is not provided, the system builds one from the normalized fields (`source`, `message_id`, `message`, `component_ref`, time normalization).
- The integrator/log collector must not synthesize event content: do not invent `message`, `message_id`, `component_ref`, serial/device identifiers, or any other field that is absent from the source log or was not reliably extracted.
- Receiving the same event again updates `last_seen_at`/the repeat counter and must not create a new timeline/history event.
- `event_logs` power a separate log UI view and do not change the canonical state of components/asset by default.
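A dedup key in the spirit of the fingerprint rule could be derived roughly like this (a sketch only: the exact normalization and hash the system uses are not specified here, and `logFingerprint` is a hypothetical name):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// logFingerprint builds a dedup key from the normalized event fields when
// the source did not supply one. Field order and the separator are fixed,
// so identical events always hash to the same key.
func logFingerprint(source, messageID, message, componentRef string) string {
	norm := func(s string) string { return strings.ToLower(strings.TrimSpace(s)) }
	h := sha256.Sum256([]byte(strings.Join([]string{
		norm(source), norm(messageID), norm(message), norm(componentRef),
	}, "\x1f"))) // unit separator avoids accidental field collisions
	return hex.EncodeToString(h[:8]) // short, stable prefix
}

func main() {
	fmt.Println(logFingerprint("bmc", "0x000F",
		"Correctable ECC error threshold exceeded", "CPU0_C0D0"))
}
```

Case and whitespace normalization makes repeated deliveries of the same source event collapse into one stored record.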
```json
"event_logs": [
{
"source": "bmc",
"event_time": "2026-03-15T14:03:11Z",
"severity": "Warning",
"message_id": "0x000F",
"message": "Correctable ECC error threshold exceeded",
"component_ref": "CPU0_C0D0",
"raw_payload": {
"sensor": "DIMM_A1",
"sel_record_id": "0042"
}
},
{
"source": "redfish",
"event_time": "2026-03-15T14:03:20Z",
"severity": "Info",
"message_id": "OpenBMC.0.1.SystemReboot",
"message": "System reboot requested by administrator",
"component_ref": "Mainboard"
}
]
```
#### sensors.fans
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `name` | string | **yes** | Sensor name, unique within the section |
| `location` | string | no | Physical location |
| `rpm` | int | no | Fan speed, RPM |
| `status` | string | no | Status: `OK`, `Warning`, `Critical`, `Unknown` |
#### sensors.power
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `name` | string | **yes** | Unique sensor name |
| `location` | string | no | Physical location |
| `voltage_v` | float | no | Voltage, V |
| `current_a` | float | no | Current, A |
| `power_w` | float | no | Power, W |
| `status` | string | no | Status |
#### sensors.temperatures
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `name` | string | **yes** | Unique sensor name |
| `location` | string | no | Physical location |
| `celsius` | float | no | Temperature, °C |
| `threshold_warning_celsius` | float | no | Warning threshold, °C |
| `threshold_critical_celsius` | float | no | Critical threshold, °C |
| `status` | string | no | Status |
#### sensors.other
| Field | Type | Required | Description |
|------|-----|-------------|----------|
| `name` | string | **yes** | Unique sensor name |
| `location` | string | no | Physical location |
| `value` | float | no | Value |
| `unit` | string | no | Unit of measurement |
| `status` | string | no | Status |
**sensors rules:**
- A sensor is identified by the pair `(sensor_type, name)`. For duplicates within one payload, the first occurrence wins.
- Sensors without a `name` are ignored.
- On every import, values are overwritten (upsert by key).
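The three rules above can be sketched as a small upsert routine (illustrative only; `upsertSensors` and its types are hypothetical, not part of the ingest code):

```go
package main

import "fmt"

// sensorKey identifies a sensor by the pair (sensor_type, name).
type sensorKey struct{ Type, Name string }

// reading is one sensor value from a payload.
type reading struct {
	Name  string
	Value float64
}

// upsertSensors applies the documented rules to one payload section:
// nameless sensors are dropped, the first occurrence of a duplicate key
// wins within the payload, and surviving values overwrite the store.
func upsertSensors(store map[sensorKey]float64, typ string, readings []reading) {
	seen := map[sensorKey]bool{}
	for _, r := range readings {
		if r.Name == "" {
			continue // sensors without a name are ignored
		}
		k := sensorKey{typ, r.Name}
		if seen[k] {
			continue // duplicate in the same payload: first occurrence wins
		}
		seen[k] = true
		store[k] = r.Value // upsert by key: later imports overwrite
	}
}

func main() {
	store := map[sensorKey]float64{}
	upsertSensors(store, "fans", []reading{
		{"FAN1", 4200}, {"FAN1", 9999}, {"", 1},
	})
	fmt.Println(store[sensorKey{"fans", "FAN1"}], len(store)) // → 4200 1
}
```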
```json
"sensors": {
"fans": [
{ "name": "FAN1", "location": "Front", "rpm": 4200, "status": "OK" },
{ "name": "FAN_CPU0", "location": "CPU0", "rpm": 5600, "status": "OK" }
],
"power": [
{ "name": "12V Rail", "location": "Mainboard", "voltage_v": 12.06, "status": "OK" },
{ "name": "PSU0 Input", "location": "PSU0", "voltage_v": 215.25, "current_a": 0.64, "power_w": 137.0, "status": "OK" }
],
"temperatures": [
{ "name": "CPU0 Temp", "location": "CPU0", "celsius": 46.0, "threshold_warning_celsius": 80.0, "threshold_critical_celsius": 95.0, "status": "OK" },
{ "name": "Inlet Temp", "location": "Front", "celsius": 22.0, "threshold_warning_celsius": 40.0, "threshold_critical_celsius": 50.0, "status": "OK" }
],
"other": [
{ "name": "System Humidity", "value": 38.5, "unit": "%", "status": "OK" }
]
}
```
---
## Component status handling
| Status | Behavior |
|--------|-----------|
| `OK` | Normal processing |
| `Warning` | Creates a `COMPONENT_WARNING` event |
| `Critical` | Creates a `COMPONENT_FAILED` event + a `failure_events` record |
| `Unknown` | The component is treated as operational; creates a `COMPONENT_UNKNOWN` event |
| `Empty` | The component is not created or updated |
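The table maps onto a simple switch (an illustration; `eventForStatus` is a hypothetical name, not the actual ingest function):

```go
package main

import "fmt"

// eventForStatus returns the history event a component status produces;
// an empty string means no event is emitted.
func eventForStatus(status string) string {
	switch status {
	case "Warning":
		return "COMPONENT_WARNING"
	case "Critical":
		return "COMPONENT_FAILED" // a failure_events record is added separately
	case "Unknown":
		return "COMPONENT_UNKNOWN" // component still treated as operational
	default:
		// "OK" processes normally; "Empty" components are skipped entirely.
		return ""
	}
}

func main() {
	fmt.Println(eventForStatus("Critical")) // → COMPONENT_FAILED
}
```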
---
## Handling missing serial_number
General rule for all sections: if the source did not return a serial number and the collector could not reliably extract one, the integrator must not substitute invented values, hashes, local placeholder identifiers, or guessed serial numbers. Only the server-side ingest fallback rules explicitly listed below are allowed.
| Type | Behavior |
|-----|-----------|
| CPU | Generated: `{board_serial}-CPU-{socket}` |
| PCIe | Generated: `{board_serial}-PCIE-{slot}` (when serial is `"N/A"` or empty; `slot` for PCIe = BDF) |
| Memory | Component is ignored |
| Storage | Component is ignored |
| PSU | Component is ignored |
If a `serial_number` is not unique within one payload for the same `model`:
- The first occurrence keeps the original serial number.
- Each subsequent duplicate gets a placeholder: `NO_SN-XXXXXXXX`.
---
## Minimal valid example
```json
{
"collected_at": "2026-02-10T15:30:00Z",
"target_host": "192.168.1.100",
"hardware": {
"board": {
"serial_number": "SRV-001"
}
}
}
```
---
## Full example with status history
```json
{
"filename": "redfish://10.10.10.103",
"source_type": "api",
"protocol": "redfish",
"target_host": "10.10.10.103",
"collected_at": "2026-02-10T15:30:00Z",
"hardware": {
"board": {
"manufacturer": "Supermicro",
"product_name": "X12DPG-QT6",
"serial_number": "21D634101"
},
"firmware": [
{ "device_name": "BIOS", "version": "06.08.05" },
{ "device_name": "BMC", "version": "5.17.00" }
],
"cpus": [
{
"socket": 0,
"model": "INTEL(R) XEON(R) GOLD 6530",
"manufacturer": "Intel",
"cores": 32,
"threads": 64,
"status": "OK"
}
],
"storage": [
{
"slot": "OB01",
"type": "NVMe",
"model": "INTEL SSDPF2KX076T1",
"size_gb": 7680,
"serial_number": "BTAX41900GF87P6DGN",
"manufacturer": "Intel",
"firmware": "9CV10510",
"present": true,
"status": "OK",
"status_changed_at": "2026-02-10T15:22:00Z",
"status_history": [
{ "status": "Critical", "changed_at": "2026-02-10T15:10:00Z", "details": "I/O timeout on NVMe queue 3" },
{ "status": "OK", "changed_at": "2026-02-10T15:22:00Z", "details": "Recovered after controller reset" }
]
}
],
"pcie_devices": [
{
"slot": "0000:18:00.0",
"device_class": "EthernetController",
"manufacturer": "Intel",
"model": "X710 10GbE",
"serial_number": "K65472-003",
"mac_addresses": ["3c:fd:fe:aa:bb:cc", "3c:fd:fe:aa:bb:cd"],
"status": "OK"
}
],
"power_supplies": [
{
"slot": "0",
"present": true,
"model": "GW-CRPS3000LW",
"vendor": "Great Wall",
"wattage_w": 3000,
"serial_number": "2P06C102610",
"firmware": "00.03.05",
"status": "OK",
"input_power_w": 137,
"output_power_w": 104,
"input_voltage": 215.25
}
],
"sensors": {
"fans": [
{ "name": "FAN1", "location": "Front", "rpm": 4200, "status": "OK" }
],
"power": [
{ "name": "12V Rail", "voltage_v": 12.06, "status": "OK" }
],
"temperatures": [
{ "name": "CPU0 Temp", "celsius": 46.0, "threshold_warning_celsius": 80.0, "threshold_critical_celsius": 95.0, "status": "OK" }
],
"other": [
{ "name": "System Humidity", "value": 38.5, "unit": "%" }
]
}
}
}
```


@@ -0,0 +1,343 @@
# MSI BMC Redfish API Reference
Source: MSI Enterprise Platform Solutions — Redfish BMC User Guide v1.0 (AMI/MegaRAC stack).
Spec compliance: DSP0266 1.15.1, DSP8010 2019.2.
> This document is trimmed to sections relevant to LOGPile collection and inventory analysis.
> Auth, LDAP/AD, SMTP, VirtualMedia, Certificates, RADIUS, Composability, and BMC config
> sections are omitted.
---
## Supported HTTP methods
`GET`, `POST`, `PATCH`, `DELETE`. Unsupported methods return `405`.
PATCH requires an `If-Match` / `ETag` precondition header; missing header → `428`, mismatch → `412`.
---
## 1. Core Redfish API endpoints
| Resource | URI | Schema |
|---|---|---|
| Service Root | `/redfish/v1/` | ServiceRoot.v1_7_0 |
| ComputerSystem Collection | `/redfish/v1/Systems` | ComputerSystemCollection |
| ComputerSystem | `/redfish/v1/Systems/{sys}` | ComputerSystem.v1_16_2 |
| Memory Collection | `/redfish/v1/Systems/{sys}/Memory` | MemoryCollection |
| Memory | `/redfish/v1/Systems/{sys}/Memory/{mem}` | Memory.v1_19_0 |
| MemoryMetrics | `/redfish/v1/Systems/{sys}/Memory/{mem}/MemoryMetrics` | MemoryMetrics.v1_7_0 |
| MemoryDomain Collection | `/redfish/v1/Systems/{sys}/MemoryDomain` | MemoryDomainCollection |
| MemoryDomain | `/redfish/v1/Systems/{sys}/MemoryDomain/{dom}` | MemoryDomain.v1_2_3 |
| MemoryChunks Collection | `/redfish/v1/Systems/{sys}/MemoryDomain/{dom}/MemoryChunks` | MemoryChunksCollection |
| MemoryChunks | `/redfish/v1/Systems/{sys}/MemoryDomain/{dom}/MemoryChunks/{chunk}` | MemoryChunks.v1_4_0 |
| Processor Collection | `/redfish/v1/Systems/{sys}/Processors` | ProcessorCollection |
| Processor | `/redfish/v1/Systems/{sys}/Processors/{proc}` | Processor.v1_15_0 |
| SubProcessors Collection | `/redfish/v1/Systems/{sys}/Processors/{proc}/SubProcessors` | ProcessorCollection |
| SubProcessor | `/redfish/v1/Systems/{sys}/Processors/{proc}/SubProcessors/{sub}` | Processor.v1_15_0 |
| ProcessorMetrics | `/redfish/v1/Systems/{sys}/Processors/{proc}/ProcessorMetrics` | ProcessorMetrics.v1_4_0 |
| Bios | `/redfish/v1/Systems/{sys}/Bios` | Bios.v1_2_0 |
| SimpleStorage Collection | `/redfish/v1/Systems/{sys}/SimpleStorage` | SimpleStorageCollection |
| SimpleStorage | `/redfish/v1/Systems/{sys}/SimpleStorage/{ss}` | SimpleStorage.v1_3_0 |
| Storage Collection | `/redfish/v1/Systems/{sys}/Storage` | StorageCollection |
| Storage | `/redfish/v1/Systems/{sys}/Storage/{stor}` | Storage.v1_9_0 |
| StorageController Collection | `/redfish/v1/Systems/{sys}/Storage/{stor}/Controllers` | StorageControllerCollection |
| StorageController | `/redfish/v1/Systems/{sys}/Storage/{stor}/Controllers/{ctrl}` | StorageController.v1_0_0 |
| Drive | `/redfish/v1/Systems/{sys}/Storage/{stor}/Drives/{drv}` | Drive.v1_13_0 |
| Volume Collection | `/redfish/v1/Systems/{sys}/Storage/{stor}/Volumes` | VolumeCollection |
| Volume | `/redfish/v1/Systems/{sys}/Storage/{stor}/Volumes/{vol}` | Volume.v1_5_0 |
| NetworkInterface Collection | `/redfish/v1/Systems/{sys}/NetworkInterfaces` | NetworkInterfaceCollection |
| NetworkInterface | `/redfish/v1/Systems/{sys}/NetworkInterfaces/{nic}` | NetworkInterface.v1_2_0 |
| EthernetInterface (System) | `/redfish/v1/Systems/{sys}/EthernetInterfaces/{eth}` | EthernetInterface.v1_6_2 |
| GraphicsController Collection | `/redfish/v1/Systems/{sys}/GraphicsControllers` | GraphicsControllerCollection |
| GraphicsController | `/redfish/v1/Systems/{sys}/GraphicsControllers/{gpu}` | GraphicsController.v1_0_0 |
| USBController Collection | `/redfish/v1/Systems/{sys}/USBControllers` | USBControllerCollection |
| USBController | `/redfish/v1/Systems/{sys}/USBControllers/{usb}` | USBController.v1_0_0 |
| SecureBoot | `/redfish/v1/Systems/{sys}/SecureBoot` | SecureBoot.v1_1_0 |
| LogService Collection (System) | `/redfish/v1/Systems/{sys}/LogServices` | LogServiceCollection |
| LogService (System) | `/redfish/v1/Systems/{sys}/LogServices/{log}` | LogService.v1_1_3 |
| LogEntry Collection | `/redfish/v1/Systems/{sys}/LogServices/{log}/Entries` | LogEntryCollection |
| LogEntry | `/redfish/v1/Systems/{sys}/LogServices/{log}/Entries/{entry}` | LogEntry.v1_12_0 |
| Chassis Collection | `/redfish/v1/Chassis` | ChassisCollection |
| Chassis | `/redfish/v1/Chassis/{ch}` | Chassis.v1_15_0 |
| Power | `/redfish/v1/Chassis/{ch}/Power` | Power.v1_5_4 |
| PowerSubSystem | `/redfish/v1/Chassis/{ch}/PowerSubSystem` | PowerSubsystem.v1_1_0 |
| PowerSupplies Collection | `/redfish/v1/Chassis/{ch}/PowerSubSystem/PowerSupplies` | PowerSupplyCollection |
| PowerSupply | `/redfish/v1/Chassis/{ch}/PowerSubSystem/PowerSupplies/{psu}` | PowerSupply.v1_3_0 |
| PowerSupplyMetrics | `/redfish/v1/Chassis/{ch}/PowerSubSystem/PowerSupplies/{psu}/Metrics` | PowerSupplyMetrics.v1_0_1 |
| Thermal | `/redfish/v1/Chassis/{ch}/Thermal` | Thermal.v1_5_3 |
| ThermalSubSystem | `/redfish/v1/Chassis/{ch}/ThermalSubSystem` | ThermalSubsystem.v1_0_0 |
| ThermalMetrics | `/redfish/v1/Chassis/{ch}/ThermalSubSystem/ThermalMetrics` | ThermalMetrics.v1_0_1 |
| Fans Collection | `/redfish/v1/Chassis/{ch}/ThermalSubSystem/Fans` | FanCollection |
| Fan | `/redfish/v1/Chassis/{ch}/ThermalSubSystem/Fans/{fan}` | Fan.v1_1_1 |
| Sensor Collection | `/redfish/v1/Chassis/{ch}/Sensors` | SensorCollection |
| Sensor | `/redfish/v1/Chassis/{ch}/Sensors/{sen}` | Sensor.v1_0_2 |
| PCIeDevice Collection | `/redfish/v1/Chassis/{ch}/PCIeDevices` | PCIeDeviceCollection |
| PCIeDevice | `/redfish/v1/Chassis/{ch}/PCIeDevices/{dev}` | PCIeDevice.v1_9_0 |
| PCIeFunction Collection | `/redfish/v1/Chassis/{ch}/PCIeDevices/{dev}/PCIeFunctions` | PCIeFunctionCollection |
| PCIeFunction | `/redfish/v1/Chassis/{ch}/PCIeDevices/{dev}/PCIeFunctions/{fn}` | PCIeFunction.v1_2_3 |
| PCIeSlots | `/redfish/v1/Chassis/{ch}/PCIeSlots` | PCIeSlots.v1_5_0 |
| NetworkAdapter Collection | `/redfish/v1/Chassis/{ch}/NetworkAdapters` | NetworkAdapterCollection |
| NetworkAdapter | `/redfish/v1/Chassis/{ch}/NetworkAdapters/{na}` | NetworkAdapter.v1_8_0 |
| NetworkDeviceFunction Collection | `/redfish/v1/Chassis/{ch}/NetworkAdapters/{na}/NetworkDeviceFunctions` | NetworkDeviceFunctionCollection |
| NetworkDeviceFunction | `/redfish/v1/Chassis/{ch}/NetworkAdapters/{na}/NetworkDeviceFunctions/{fn}` | NetworkDeviceFunction.v1_5_0 |
| Assembly | `/redfish/v1/Chassis/{ch}/Assembly` | Assembly.v1_2_2 |
| Assembly (Drive) | `/redfish/v1/Systems/{sys}/Storage/{stor}/Drives/{drv}/Assembly` | Assembly.v1_2_2 |
| Assembly (Processor) | `/redfish/v1/Systems/{sys}/Processors/{proc}/Assembly` | Assembly.v1_2_2 |
| Assembly (Memory) | `/redfish/v1/Systems/{sys}/Memory/{mem}/Assembly` | Assembly.v1_2_2 |
| Assembly (NetworkAdapter) | `/redfish/v1/Chassis/{ch}/NetworkAdapters/{na}/Assembly` | Assembly.v1_2_2 |
| Assembly (PCIeDevice) | `/redfish/v1/Chassis/{ch}/PCIeDevices/{dev}/Assembly` | Assembly.v1_2_2 |
| MediaController Collection | `/redfish/v1/Chassis/{ch}/MediaControllers` | MediaControllerCollection |
| MediaController | `/redfish/v1/Chassis/{ch}/MediaControllers/{mc}` | MediaController.v1_1_0 |
| LogService Collection (Chassis) | `/redfish/v1/Chassis/{ch}/LogServices` | LogServiceCollection |
| LogService (Chassis) | `/redfish/v1/Chassis/{ch}/LogServices/{log}` | LogService.v1_1_3 |
| Manager Collection | `/redfish/v1/Managers` | ManagerCollection |
| Manager | `/redfish/v1/Managers/{mgr}` | Manager.v1_13_0 |
| EthernetInterface (Manager) | `/redfish/v1/Managers/{mgr}/EthernetInterfaces/{eth}` | EthernetInterface.v1_6_2 |
| LogService Collection (Manager) | `/redfish/v1/Managers/{mgr}/LogServices` | LogServiceCollection |
| LogService (Manager) | `/redfish/v1/Managers/{mgr}/LogServices/{log}` | LogService.v1_1_3 |
| UpdateService | `/redfish/v1/UpdateService` | UpdateService.v1_6_0 |
| TaskService | `/redfish/v1/TaskService` | TaskService.v1_1_4 |
| Task Collection | `/redfish/v1/TaskService/Tasks` | TaskCollection |
| Task | `/redfish/v1/TaskService/Tasks/{task}` | Task.v1_4_2 |
---
## 2. Telemetry API endpoints
| Resource | URI | Schema |
|---|---|---|
| TelemetryService | `/redfish/v1/TelemetryService` | TelemetryService.v1_2_1 |
| MetricDefinition Collection | `/redfish/v1/TelemetryService/MetricDefinitions` | MetricDefinitionCollection |
| MetricDefinition | `/redfish/v1/TelemetryService/MetricDefinitions/{md}` | MetricDefinition.v1_0_3 |
| MetricReportDefinition Collection | `/redfish/v1/TelemetryService/MetricReportDefinitions` | MetricReportDefinitionCollection |
| MetricReportDefinition | `/redfish/v1/TelemetryService/MetricReportDefinitions/{mrd}` | MetricReportDefinition.v1_3_0 |
| MetricReport Collection | `/redfish/v1/TelemetryService/MetricReports` | MetricReportCollection |
| MetricReport | `/redfish/v1/TelemetryService/MetricReports/{mr}` | MetricReport.v1_2_0 |
| Telemetry LogService | `/redfish/v1/TelemetryService/LogService` | LogService.v1_1_3 |
| Telemetry LogEntry Collection | `/redfish/v1/TelemetryService/LogService/Entries` | LogEntryCollection |
---
## 3. Processor / NIC sub-resources (GPU-relevant)
| Resource | URI |
|---|---|
| Processor (NetworkAdapter) | `/redfish/v1/Chassis/{ch}/NetworkAdapters/{na}/Processors/{proc}` |
| AccelerationFunction Collection | `/redfish/v1/Systems/{sys}/Processors/{proc}/AccelerationFunctions` |
| AccelerationFunction | `/redfish/v1/Systems/{sys}/Processors/{proc}/AccelerationFunctions/{fn}` |
| Port Collection (NetworkAdapter) | `/redfish/v1/Chassis/{ch}/NetworkAdapters/{na}/Ports` |
| Port (GraphicsController) | `/redfish/v1/Systems/{sys}/GraphicsControllers/{gpu}/Ports/{port}` |
| OperatingConfig Collection | `/redfish/v1/Systems/{sys}/Processors/{proc}/OperatingConfigs` |
| OperatingConfig | `/redfish/v1/Systems/{sys}/Processors/{proc}/OperatingConfigs/{cfg}` |
---
## 4. Error response format
On error, the service returns an HTTP status code and a JSON body with a single `error` property:
```json
{
"error": {
"code": "Base.1.12.0.ActionParameterMissing",
"message": "...",
"@Message.ExtendedInfo": [
{
"@odata.type": "#Message.v1_0_8.Message",
"MessageId": "Base.1.12.0.ActionParameterMissing",
"Message": "...",
"MessageArgs": [],
"Severity": "Warning",
"Resolution": "..."
}
]
}
}
```
**Common status codes:**
| Code | Meaning |
|------|---------|
| 200 | OK with body |
| 201 | Created |
| 204 | Success, no body |
| 400 | Bad request / validation error |
| 401 | Unauthorized |
| 403 | Forbidden / firmware update in progress |
| 404 | Resource not found |
| 405 | Method not allowed |
| 412 | ETag precondition failed (PATCH) |
| 415 | Unsupported media type |
| 428 | Missing precondition header (PATCH) |
| 501 | Not implemented |
**Request validation sequence:**
1. Authorization check → 401
2. Entity privilege check → 403
3. URI existence → 404
4. Firmware update lock → 403
5. Method allowed → 405
6. Media type → 415
7. Body format → 400
8. PATCH: ETag header → 428/412
9. Property validation → 400
---
## 5. OEM: Inventory refresh (AMI/MSI-specific)
### 5.1 InventoryCrc — force component re-inventory
`GET/POST/DELETE /redfish/v1/Systems/{sys}/Oem/Ami/Inventory/Crc`
The `GroupCrcList` field lists current CRC checksums per component group. When a group's CRC
changes (host sends new inventory) or is explicitly zeroed out via POST, the BMC discards its
cached inventory and re-reads that group from the host.
**CRC groups:**
| Group | Covers |
|-------|--------|
| `CPU` | Processors, ProcessorMetrics |
| `DIMM` | Memory, MemoryDomains, MemoryChunks, MemoryMetrics |
| `PCIE` | Storage, PCIeDevices, NetworkInterfaces, NetworkAdapters |
| `CERTIFICATES` | Boot Certificates |
| `SECURBOOT` | SecureBoot data |
**POST — invalidate selected groups (force re-inventory):**
```
POST /redfish/v1/Systems/{sys}/Oem/Ami/Inventory/Crc
Content-Type: application/json
{
"GroupCrcList": [
{ "CPU": 0 },
{ "DIMM": 0 },
{ "PCIE": 0 }
]
}
```
Setting a group's value to `0` signals the BMC to invalidate and repopulate that group on next
host inventory push (typically at next boot or host-interface inventory cycle).
**DELETE** — remove all CRC records entirely.
**Note:** Inventory data is populated by the host via the Redfish Host Interface (in-band),
not by the BMC itself. Zeroing a CRC group does not immediately re-read hardware — it marks
the group as stale so the next host-side inventory push will be accepted. A cold reboot is the
most reliable trigger.
### 5.2 InventoryData Status — monitor inventory processing
`GET /redfish/v1/Oem/Ami/InventoryData/Status`
Available only after the host has posted an inventory file. Shows current processing state.
**Status enum:**
| Value | Meaning |
|-------|---------|
| `BootInProgress` | Host is booting |
| `Queued` | Processing task queued |
| `In-Progress` | Processing running in background |
| `Ready` / `Completed` | Processing finished successfully |
| `Failed` | Processing failed |
Response also includes:
- `InventoryData.DeletedModules` — array of groups updated in this population cycle
- `InventoryData.Messages` — warnings/errors encountered during processing
- `ProcessingTime` — milliseconds taken
- `LastModifiedTime` — ISO 8601 timestamp of last successful update
### 5.3 Systems OEM properties — Inventory reference
On `GET /redfish/v1/Systems/{sys}`, the `Oem.Ami` object contains:
| Property | Notes |
|----------|-------|
| `Inventory` | Reference to InventoryCrc URI + current GroupCrc data |
| `RedfishVersion` | BIOS Redfish version (populated via Host Interface) |
| `RtpVersion` | BIOS RTP version (populated via Host Interface) |
| `ManagerBootConfiguration.ManagerBootMode` | PATCH to trigger soft reset: `SoftReset` / `ResetTimeout` / `None` |
---
## 6. OEM: Component state actions
### 6.1 Memory enable/disable
```
POST /redfish/v1/Systems/{sys}/Memory/{mem}/Actions/AmiBios.ChangeState
Content-Type: application/json
{ "State": "Disabled" }
```
Response: 204.
### 6.2 PCIeFunction enable/disable
```
POST /redfish/v1/Chassis/{ch}/PCIeDevices/{dev}/PCIeFunctions/{fn}/Actions/AmiBios.ChangeState
Content-Type: application/json
{ "State": "Disabled" }
```
Response: 204.
---
## 7. OEM: Storage sensor readings
On `GET /redfish/v1/Systems/{sys}/Storage/{stor}`, the response includes `Oem.Ami.StorageControllerSensors`:
Array of sensor objects per storage controller instance. Each entry exposes:
- `Reading` (Number) — current sensor value
- `ReadingType` (String) — type of reading
- `ReadingUnit` (String) — unit
---
## 8. OEM: Power and Thermal OwnerLUN
Both `GET /redfish/v1/Chassis/{ch}/Power` and `GET /redfish/v1/Chassis/{ch}/Thermal` expose
`Oem.Ami.OwnerLUN` (Number, read-only) — the IPMI LUN associated with each
temperature/fan/voltage sensor entry. Useful for correlating Redfish sensor readings with IPMI
SDR records.
---
## 9. UpdateService
On `GET /redfish/v1/UpdateService`, `Oem.Ami.BMC.DualImageConfiguration` contains:
| Property | Description |
|----------|-------------|
| `ActiveImage` | Currently active BMC image slot |
| `BootImage` | Image slot BMC boots from |
| `FirmwareImage1Name` / `FirmwareImage1Version` | First image slot name + version |
| `FirmwareImage2Name` / `FirmwareImage2Version` | Second image slot name + version |
Standard `SimpleUpdate` action available at `/redfish/v1/UpdateService/Actions/UpdateService.SimpleUpdate`.
---
## 10. Inventory refresh summary
| Approach | Trigger | Latency | Scope |
|----------|---------|---------|-------|
| Host reboot | Physical/soft reset | Minutes | All groups |
| `POST InventoryCrc` (groups = 0) | Explicit API call | Next host inventory push | Selected groups |
| Firmware update (`SimpleUpdate`) | Explicit API call | Minutes + reboot | Full platform |
| Sensor/telemetry reads | Always live on GET | Immediate | Sensors only |
**Key constraint:** `InventoryCrc POST` marks groups stale but does not re-read hardware
directly. Actual inventory data flows from the host to BMC via the Redfish Host Interface
in-band channel, typically during POST/boot. For immediate inventory refresh without a full
reboot, a soft reset via `ManagerBootMode: SoftReset` PATCH may be sufficient on some
configurations.


@@ -1,28 +0,0 @@
# Test Server Collection Memory
Keep this table updated after each test-server run.
Definition:
- `Collection Time` = total Redfish collection duration from `collect.log`.
- `Speed` = `Documents / seconds`.
- `Metrics Collected` = sum of `Counts` fields (`cpus + memory + storage + pcie + gpus + nics + psus + firmware`).
- `n/a` means the log does not contain enough timestamp metadata to calculate duration/speed.
## Server Model: `NF5688M7`
| Date (UTC) | App Version | Collection Time | Documents | Speed | Metrics Collected | Notes |
|---|---|---:|---:|---:|---:|---|
| 2026-02-28 | `v1.7.1-12-g612058e` (`612058e`) | 10m10s (610s) | 228 | 0.37 docs/s | 98 | 2026-02-28 (SERVER MODEL) - 23E100043.zip |
| 2026-02-28 | `v1.7.1-11-ge0146ad` (`e0146ad`) | 9m36s (576s) | 138 | 0.24 docs/s | 110 | 2026-02-28 (SERVER MODEL) - 23E100042.zip |
| 2026-02-28 | `v1.7.1-10-g9a30705` (`9a30705`) | 20m47s (1247s) | 106 | 0.09 docs/s | 97 | 2026-02-28 (SERVER MODEL) - 23E100053.zip |
| 2026-02-28 | `v1.7.1` (`6c19a58`) | 15m08s (908s) | 184 | 0.20 docs/s | 96 | 2026-02-28 (DDR5 DIMM) - 23E100051.zip |
| 2026-02-28 | `v1.7.0` (`ddab93a`) | n/a | 193 | n/a | 61 | 2026-02-28 (NULL) - 23E100051.zip |
| 2026-02-28 | `v1.7.0` (`ddab93a`) | n/a | 291 | n/a | 61 | 2026-02-28 (NULL) - 23E100206.zip |
## Server Model: `KR1280-X2-A0-R0-00`
| Date (UTC) | App Version | Collection Time | Documents | Speed | Metrics Collected | Notes |
|---|---|---:|---:|---:|---:|---|
| 2026-02-28 | `v1.7.1-12-g612058e` (`612058e`) | 6m15s (375s) | 185 | 0.49 docs/s | 46 | 2026-02-28 (KR1280-X2-A0-R0-00) - 23D401657.zip |
| 2026-02-28 | `v1.7.1-9-g8dbbec3-dirty` (`8dbbec3`) | 6m16s (376s) | 165 | 0.44 docs/s | 46 | 2026-02-28 (KR1280-X2-A0-R0-00) - 23D401657-2.zip |
| 2026-02-28 | `v1.7.1-7-gc52fea2` (`c52fea2`) | 10m51s (651s) | 227 | 0.35 docs/s | 40 | 2026-02-28 (KR1280-X2-A0-R0-00) - 23D401657 copy.zip |

go.mod

@@ -1,3 +1,7 @@
 module git.mchus.pro/mchus/logpile
 
-go 1.22
+go 1.24.0
+
+require reanimator/chart v0.0.0
+
+replace reanimator/chart => ./internal/chart

internal/chart Submodule

Submodule internal/chart added at c025ae0477

File diff suppressed because it is too large.


@@ -0,0 +1,392 @@
package collector
import (
"context"
"log"
"net/http"
"strings"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
const (
redfishLogEntriesWindow = 7 * 24 * time.Hour
redfishLogEntriesMaxTotal = 500
redfishLogEntriesMaxPerSvc = 200
)
// collectRedfishLogEntries fetches hardware event log entries from Systems and Managers LogServices.
// Only hardware-relevant entries from the last 7 days are returned.
// For Systems: all log services except audit/journal/security/debug.
// For Managers: only the IPMI SEL service (Id="SEL") — audit and event logs are excluded.
func (c *RedfishConnector) collectRedfishLogEntries(ctx context.Context, client *http.Client, req Request, baseURL string, systemPaths, managerPaths []string) []map[string]interface{} {
cutoff := time.Now().UTC().Add(-redfishLogEntriesWindow)
seen := make(map[string]struct{})
var out []map[string]interface{}
collectFrom := func(logServicesPath string, filter func(map[string]interface{}) bool) {
if len(out) >= redfishLogEntriesMaxTotal {
return
}
services, err := c.getCollectionMembers(ctx, client, req, baseURL, logServicesPath)
if err != nil || len(services) == 0 {
return
}
for _, svc := range services {
if len(out) >= redfishLogEntriesMaxTotal {
break
}
if !filter(svc) {
continue
}
entriesPath := redfishLogServiceEntriesPath(svc)
if entriesPath == "" {
continue
}
entries := c.fetchRedfishLogEntriesWithPaging(ctx, client, req, baseURL, entriesPath, cutoff, seen, redfishLogEntriesMaxPerSvc)
out = append(out, entries...)
}
}
for _, systemPath := range systemPaths {
collectFrom(joinPath(systemPath, "/LogServices"), isHardwareLogService)
}
// Managers hold the IPMI SEL on AMI/MSI BMCs — include only the "SEL" service.
for _, managerPath := range managerPaths {
collectFrom(joinPath(managerPath, "/LogServices"), isManagerSELService)
}
if len(out) > 0 {
log.Printf("redfish: collected %d hardware log entries (Systems+Managers SEL, window=7d)", len(out))
}
return out
}
// fetchRedfishLogEntriesWithPaging fetches entries from a LogEntry collection,
// following nextLink pages. Stops early when entries older than cutoff are encountered
// (assumes BMC returns entries newest-first, which is typical).
func (c *RedfishConnector) fetchRedfishLogEntriesWithPaging(ctx context.Context, client *http.Client, req Request, baseURL, entriesPath string, cutoff time.Time, seen map[string]struct{}, limit int) []map[string]interface{} {
var out []map[string]interface{}
nextPath := entriesPath
for nextPath != "" && len(out) < limit {
collection, err := c.getJSON(ctx, client, req, baseURL, nextPath)
if err != nil {
break
}
// Handle both linked members (@odata.id only) and inline members (full objects).
rawMembers, _ := collection["Members"].([]interface{})
hitOldEntry := false
for _, rawMember := range rawMembers {
if len(out) >= limit {
break
}
memberMap, ok := rawMember.(map[string]interface{})
if !ok {
continue
}
var entry map[string]interface{}
if _, hasCreated := memberMap["Created"]; hasCreated {
// Inline entry — use directly.
entry = memberMap
} else {
// Linked entry — fetch by path.
memberPath := normalizeRedfishPath(asString(memberMap["@odata.id"]))
if memberPath == "" {
continue
}
entry, err = c.getJSON(ctx, client, req, baseURL, memberPath)
if err != nil || len(entry) == 0 {
continue
}
}
// Dedup by entry Id or path.
entryKey := asString(entry["Id"])
if entryKey == "" {
entryKey = asString(entry["@odata.id"])
}
if entryKey != "" {
if _, dup := seen[entryKey]; dup {
continue
}
seen[entryKey] = struct{}{}
}
// Time filter.
created := parseRedfishEntryTime(asString(entry["Created"]))
if !created.IsZero() && created.Before(cutoff) {
hitOldEntry = true
continue
}
// Hardware relevance filter.
if !isHardwareLogEntry(entry) {
continue
}
out = append(out, entry)
}
// Stop paging once we've seen entries older than the window.
if hitOldEntry {
break
}
nextPath = firstNonEmpty(
normalizeRedfishPath(asString(collection["Members@odata.nextLink"])),
normalizeRedfishPath(asString(collection["@odata.nextLink"])),
)
}
return out
}
// isManagerSELService returns true only for the IPMI SEL exposed under Managers.
// On AMI/MSI BMCs the hardware SEL lives at Managers/{mgr}/LogServices/SEL.
// All other Manager log services (AuditLog, EventLog, Journal) are excluded.
func isManagerSELService(svc map[string]interface{}) bool {
id := strings.ToLower(strings.TrimSpace(asString(svc["Id"])))
return id == "sel"
}
// isHardwareLogService returns true if the log service looks like a hardware event log
// (SEL, System Event Log) rather than a BMC audit/journal log.
func isHardwareLogService(svc map[string]interface{}) bool {
id := strings.ToLower(strings.TrimSpace(asString(svc["Id"])))
name := strings.ToLower(strings.TrimSpace(asString(svc["Name"])))
for _, skip := range []string{"audit", "journal", "bmc", "security", "manager", "debug"} {
if strings.Contains(id, skip) || strings.Contains(name, skip) {
return false
}
}
return true
}
// redfishLogServiceEntriesPath returns the Entries collection path for a LogService document.
func redfishLogServiceEntriesPath(svc map[string]interface{}) string {
if entriesLink, ok := svc["Entries"].(map[string]interface{}); ok {
if p := normalizeRedfishPath(asString(entriesLink["@odata.id"])); p != "" {
return p
}
}
if id := normalizeRedfishPath(asString(svc["@odata.id"])); id != "" {
return joinPath(id, "/Entries")
}
return ""
}
// isHardwareLogEntry returns true if the log entry is hardware-related.
// Audit, authentication, and session events are excluded.
func isHardwareLogEntry(entry map[string]interface{}) bool {
entryType := strings.TrimSpace(asString(entry["EntryType"]))
if strings.EqualFold(entryType, "Oem") {
return false
}
msgID := strings.ToLower(strings.TrimSpace(asString(entry["MessageId"])))
for _, skip := range []string{
"user", "account", "password", "login", "logon", "session",
"auth", "certificate", "security", "credential", "privilege",
} {
if strings.Contains(msgID, skip) {
return false
}
}
// Also check the human-readable message for obvious audit patterns.
msg := strings.ToLower(strings.TrimSpace(asString(entry["Message"])))
for _, skip := range []string{"logged in", "logged out", "log in", "log out", "sign in", "signed in"} {
if strings.Contains(msg, skip) {
return false
}
}
return true
}
// parseRedfishEntryTime parses a Redfish LogEntry Created timestamp (ISO 8601 / RFC 3339).
func parseRedfishEntryTime(raw string) time.Time {
raw = strings.TrimSpace(raw)
if raw == "" {
return time.Time{}
}
// The third layout covers BMCs that omit the timezone suffix entirely.
for _, layout := range []string{time.RFC3339, time.RFC3339Nano, "2006-01-02T15:04:05"} {
if t, err := time.Parse(layout, raw); err == nil {
return t.UTC()
}
}
return time.Time{}
}
// parseRedfishLogEntries converts raw log entries stored in RawPayloads into models.Event slice.
// Called during Redfish replay for both live and offline (archive) collections.
func parseRedfishLogEntries(rawPayloads map[string]any, collectedAt time.Time) []models.Event {
raw, ok := rawPayloads["redfish_log_entries"]
if !ok {
return nil
}
var entries []map[string]interface{}
switch v := raw.(type) {
case []map[string]interface{}:
entries = v
case []interface{}:
for _, item := range v {
if m, ok := item.(map[string]interface{}); ok {
entries = append(entries, m)
}
}
default:
return nil
}
if len(entries) == 0 {
return nil
}
out := make([]models.Event, 0, len(entries))
for _, entry := range entries {
ev := redfishLogEntryToEvent(entry, collectedAt)
if ev == nil {
continue
}
out = append(out, *ev)
}
return out
}
// redfishLogEntryToEvent converts a single Redfish LogEntry document to models.Event.
func redfishLogEntryToEvent(entry map[string]interface{}, collectedAt time.Time) *models.Event {
// Prefer EventTimestamp (actual hardware event time) over Created (Redfish record creation time).
ts := parseRedfishEntryTime(asString(entry["EventTimestamp"]))
if ts.IsZero() {
ts = parseRedfishEntryTime(asString(entry["Created"]))
}
if ts.IsZero() {
ts = collectedAt
}
severity := redfishLogEntrySeverity(entry)
sensorType := strings.TrimSpace(asString(entry["SensorType"]))
messageID := strings.TrimSpace(asString(entry["MessageId"]))
entryType := strings.TrimSpace(asString(entry["EntryType"]))
entryCode := strings.TrimSpace(asString(entry["EntryCode"]))
// SensorName: prefer "Name", fall back to "SensorNumber" + SensorType.
sensorName := strings.TrimSpace(asString(entry["Name"]))
if sensorName == "" {
num := strings.TrimSpace(asString(entry["SensorNumber"]))
if num != "" && sensorType != "" {
sensorName = sensorType + " " + num
}
}
rawMessage := strings.TrimSpace(asString(entry["Message"]))
// AMI/MSI BMCs dump raw IPMI record fields into Message instead of human-readable text.
// Detect this and build a readable description from structured fields instead.
description, rawData := redfishDecodeMessage(rawMessage, sensorType, entryCode, entry)
if description == "" {
return nil
}
return &models.Event{
ID: messageID,
Timestamp: ts,
Source: "redfish",
SensorType: sensorType,
SensorName: sensorName,
EventType: entryType,
Severity: severity,
Description: description,
RawData: rawData,
}
}
// redfishDecodeMessage returns a human-readable description and optional raw data.
// AMI/MSI BMCs dump raw IPMI record fields into Message as "Key : Value, Key : Value, ..."
// instead of a plain human-readable string. We extract the useful decoded fields from it.
func redfishDecodeMessage(message, sensorType, entryCode string, entry map[string]interface{}) (description, rawData string) {
if !isRawIPMIDump(message) {
description = message
return
}
rawData = message
kv := parseIPMIDumpKV(message)
// Sensor_Type inside the dump is more specific than the top-level SensorType field.
if v := kv["Sensor_Type"]; v != "" {
sensorType = v
}
eventType := kv["Event_Type"] // human-readable IPMI event type, e.g. "Legacy OFF State"
var parts []string
if sensorType != "" {
parts = append(parts, sensorType)
}
if eventType != "" {
parts = append(parts, eventType)
} else if entryCode != "" {
parts = append(parts, entryCode)
}
description = strings.Join(parts, ": ")
return
}
// isRawIPMIDump returns true if the message is an AMI raw IPMI record dump.
func isRawIPMIDump(message string) bool {
return strings.Contains(message, "Event_Data_1 :") && strings.Contains(message, "Record_Type :")
}
// parseIPMIDumpKV parses the AMI "Key : Value, Key : Value, " format into a map.
func parseIPMIDumpKV(message string) map[string]string {
out := make(map[string]string)
for _, part := range strings.Split(message, ",") {
part = strings.TrimSpace(part)
idx := strings.Index(part, " : ")
if idx < 0 {
continue
}
k := strings.TrimSpace(part[:idx])
v := strings.TrimSpace(part[idx+3:])
if k != "" && v != "" {
out[k] = v
}
}
return out
}
// redfishLogEntrySeverity maps a Redfish LogEntry to models.Severity.
// AMI/MSI BMCs often set Severity="OK" on all SEL records regardless of content,
// so we fall back to inferring severity from SensorType when the explicit field is unhelpful.
func redfishLogEntrySeverity(entry map[string]interface{}) models.Severity {
// Newer Redfish uses MessageSeverity; older uses Severity.
raw := strings.ToLower(firstNonEmpty(
strings.TrimSpace(asString(entry["MessageSeverity"])),
strings.TrimSpace(asString(entry["Severity"])),
))
switch raw {
case "critical":
return models.SeverityCritical
case "warning":
return models.SeverityWarning
case "ok", "informational", "":
// BMC didn't set a meaningful severity — infer from SensorType.
return redfishSeverityFromSensorType(strings.TrimSpace(asString(entry["SensorType"])))
default:
return models.SeverityInfo
}
}
// redfishSeverityFromSensorType infers event severity from the IPMI/Redfish SensorType string.
func redfishSeverityFromSensorType(sensorType string) models.Severity {
switch strings.ToLower(sensorType) {
case "critical interrupt", "processor", "memory", "power unit",
"power supply", "drive slot", "system firmware progress":
return models.SeverityWarning
default:
return models.SeverityInfo
}
}


@@ -0,0 +1,57 @@
package collector
import "testing"
func TestShouldIncludeCriticalPlanBPath(t *testing.T) {
tests := []struct {
name string
req Request
path string
want bool
}{
{
name: "skip hgx erot pcie without extended diagnostics",
req: Request{},
path: "/redfish/v1/Chassis/HGX_ERoT_NVSwitch_0/PCIeDevices",
want: false,
},
{
name: "skip hgx chassis assembly without extended diagnostics",
req: Request{},
path: "/redfish/v1/Chassis/HGX_Chassis_0/Assembly",
want: false,
},
{
name: "keep standard chassis inventory without extended diagnostics",
req: Request{},
path: "/redfish/v1/Chassis/1/PCIeDevices",
want: true,
},
{
name: "keep nvme storage backplane drives without extended diagnostics",
req: Request{},
path: "/redfish/v1/Chassis/NVMeSSD.0.Group.0.StorageBackplane/Drives",
want: true,
},
{
name: "keep system processors without extended diagnostics",
req: Request{},
path: "/redfish/v1/Systems/HGX_Baseboard_0/Processors",
want: true,
},
{
name: "include hgx erot pcie when extended diagnostics enabled",
req: Request{DebugPayloads: true},
path: "/redfish/v1/Chassis/HGX_ERoT_NVSwitch_0/PCIeDevices",
want: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := shouldIncludeCriticalPlanBPath(tt.req, tt.path); got != tt.want {
t.Fatalf("shouldIncludeCriticalPlanBPath(%q) = %v, want %v", tt.path, got, tt.want)
}
})
}
}

File diff suppressed because it is too large.


@@ -0,0 +1,159 @@
package collector
import (
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
)
func (r redfishSnapshotReader) collectBoardFallbackDocs(systemPaths, chassisPaths []string) []map[string]interface{} {
out := make([]map[string]interface{}, 0)
for _, chassisPath := range chassisPaths {
for _, suffix := range []string{"/Boards", "/Backplanes"} {
path := joinPath(chassisPath, suffix)
if docs, err := r.getCollectionMembers(path); err == nil && len(docs) > 0 {
out = append(out, docs...)
continue
}
if doc, err := r.getJSON(path); err == nil && len(doc) > 0 {
out = append(out, doc)
}
}
}
for _, path := range append(append([]string{}, systemPaths...), chassisPaths...) {
for _, suffix := range []string{"/Oem/Public", "/Oem/Public/ThermalConfig", "/ThermalConfig"} {
docPath := joinPath(path, suffix)
if doc, err := r.getJSON(docPath); err == nil && len(doc) > 0 {
out = append(out, doc)
}
}
}
return out
}
func applyBoardInfoFallbackFromDocs(board *models.BoardInfo, docs []map[string]interface{}) {
if board == nil || len(docs) == 0 {
return
}
for _, doc := range docs {
candidate := parseBoardInfoFromFRUDoc(doc)
if !isLikelyServerProductName(candidate.ProductName) {
continue
}
if board.Manufacturer == "" {
board.Manufacturer = candidate.Manufacturer
}
if board.ProductName == "" {
board.ProductName = candidate.ProductName
}
if board.SerialNumber == "" {
board.SerialNumber = candidate.SerialNumber
}
if board.PartNumber == "" {
board.PartNumber = candidate.PartNumber
}
if board.Manufacturer != "" && board.ProductName != "" && board.SerialNumber != "" && board.PartNumber != "" {
return
}
}
}
func isLikelyServerProductName(v string) bool {
v = strings.TrimSpace(v)
if v == "" {
return false
}
n := strings.ToUpper(v)
if strings.Contains(n, "NULL") {
return false
}
componentTokens := []string{
"DIMM", "DDR", "NVME", "SSD", "HDD", "GPU", "NIC", "RAID",
"PSU", "FAN", "BACKPLANE", "FRU",
}
for _, token := range componentTokens {
if strings.Contains(n, strings.ToUpper(token)) {
return false
}
}
return true
}
// collectAssemblyFRU reads Chassis/*/Assembly documents and returns FRU entries
// for subcomponents (backplanes, PSUs, DIMMs, etc.) that carry meaningful
// serial or part numbers. Entries already present in dedicated collections
// (PSUs, DIMMs) are included here as well so that all FRU data is available
// in one place; deduplication by serial is performed.
func (r redfishSnapshotReader) collectAssemblyFRU(chassisPaths []string) []models.FRUInfo {
seen := make(map[string]struct{})
var out []models.FRUInfo
add := func(fru models.FRUInfo) {
key := strings.ToUpper(strings.TrimSpace(fru.SerialNumber))
if key == "" {
key = strings.ToUpper(strings.TrimSpace(fru.Description + "|" + fru.PartNumber))
}
if key == "" || key == "|" {
return
}
if _, ok := seen[key]; ok {
return
}
seen[key] = struct{}{}
out = append(out, fru)
}
for _, chassisPath := range chassisPaths {
doc, err := r.getJSON(joinPath(chassisPath, "/Assembly"))
if err != nil || len(doc) == 0 {
continue
}
assemblies, _ := doc["Assemblies"].([]interface{})
for _, aAny := range assemblies {
a, ok := aAny.(map[string]interface{})
if !ok {
continue
}
name := strings.TrimSpace(firstNonEmpty(asString(a["Name"]), asString(a["Description"])))
model := strings.TrimSpace(asString(a["Model"]))
partNumber := strings.TrimSpace(asString(a["PartNumber"]))
serial := extractAssemblySerial(a)
if serial == "" && partNumber == "" {
continue
}
add(models.FRUInfo{
Description: name,
ProductName: model,
SerialNumber: serial,
PartNumber: partNumber,
})
}
}
return out
}
// extractAssemblySerial tries to find a serial number in an Assembly entry.
// Standard Redfish Assembly has no top-level SerialNumber; vendors put it in Oem.
func extractAssemblySerial(a map[string]interface{}) string {
if s := strings.TrimSpace(asString(a["SerialNumber"])); s != "" {
return s
}
oem, _ := a["Oem"].(map[string]interface{})
for _, v := range oem {
subtree, ok := v.(map[string]interface{})
if !ok {
continue
}
for _, v2 := range subtree {
node, ok := v2.(map[string]interface{})
if !ok {
continue
}
if s := strings.TrimSpace(asString(node["SerialNumber"])); s != "" {
return s
}
}
}
return ""
}


@@ -0,0 +1,198 @@
package collector
import (
"fmt"
"strings"
"git.mchus.pro/mchus/logpile/internal/collector/redfishprofile"
"git.mchus.pro/mchus/logpile/internal/models"
)
func (r redfishSnapshotReader) collectGPUs(systemPaths, chassisPaths []string, plan redfishprofile.ResolvedAnalysisPlan) []models.GPU {
collections := make([]string, 0, len(systemPaths)*3+len(chassisPaths)*2)
for _, systemPath := range systemPaths {
collections = append(collections, joinPath(systemPath, "/PCIeDevices"))
collections = append(collections, joinPath(systemPath, "/Accelerators"))
collections = append(collections, joinPath(systemPath, "/GraphicsControllers"))
}
for _, chassisPath := range chassisPaths {
collections = append(collections, joinPath(chassisPath, "/PCIeDevices"))
collections = append(collections, joinPath(chassisPath, "/Accelerators"))
}
var out []models.GPU
seen := make(map[string]struct{})
idx := 1
for _, collectionPath := range collections {
memberDocs, err := r.getCollectionMembers(collectionPath)
if err != nil || len(memberDocs) == 0 {
continue
}
for _, doc := range memberDocs {
functionDocs := r.getLinkedPCIeFunctions(doc)
if !looksLikeGPU(doc, functionDocs) {
continue
}
supplementalDocs := r.getLinkedSupplementalDocs(doc, "EnvironmentMetrics", "Metrics")
for _, fn := range functionDocs {
supplementalDocs = append(supplementalDocs, r.getLinkedSupplementalDocs(fn, "EnvironmentMetrics", "Metrics")...)
}
gpu := parseGPUWithSupplementalDocs(doc, functionDocs, supplementalDocs, idx)
idx++
if plan.Directives.EnableGenericGraphicsControllerDedup && shouldSkipGenericGPUDuplicate(out, gpu) {
continue
}
key := gpuDocDedupKey(doc, gpu)
if key == "" {
continue
}
if _, ok := seen[key]; ok {
continue
}
seen[key] = struct{}{}
out = append(out, gpu)
}
}
if plan.Directives.EnableGenericGraphicsControllerDedup {
return dropModelOnlyGPUPlaceholders(out)
}
return out
}
// msiGhostGPUFilter returns true when the GPU chassis for gpuID shows a temperature
// of 0 on a powered-on host, which is the reliable MSI/AMI signal that the GPU is
// no longer physically installed (stale BMC inventory cache).
// It only filters when the system PowerState is "On" — when the host is off, all
// temperature readings are 0 and we cannot distinguish absent from idle.
func (r redfishSnapshotReader) msiGhostGPUFilter(systemPaths []string, gpuID, chassisPath string) bool {
// Require host powered on.
for _, sp := range systemPaths {
doc, err := r.getJSON(sp)
if err != nil {
continue
}
if !strings.EqualFold(strings.TrimSpace(asString(doc["PowerState"])), "on") {
return false
}
break
}
// Read the temperature sensor for this GPU chassis.
sensorPath := joinPath(chassisPath, "/Sensors/"+gpuID+"_Temperature")
sensorDoc, err := r.getJSON(sensorPath)
if err != nil || len(sensorDoc) == 0 {
return false
}
reading, ok := sensorDoc["Reading"]
if !ok {
return false
}
switch v := reading.(type) {
case float64:
return v == 0
case int:
return v == 0
case int64:
return v == 0
}
return false
}
// collectGPUsFromProcessors finds GPUs that some BMCs (e.g. MSI) expose as
// Processor entries with ProcessorType=GPU rather than as PCIe devices.
// It supplements the existing gpus slice (already found via PCIe path),
// skipping entries already present by UUID or SerialNumber.
// Serial numbers are looked up from Chassis members named after each GPU Id.
func (r redfishSnapshotReader) collectGPUsFromProcessors(systemPaths, chassisPaths []string, existing []models.GPU, plan redfishprofile.ResolvedAnalysisPlan) []models.GPU {
if !plan.Directives.EnableProcessorGPUFallback {
return append([]models.GPU{}, existing...)
}
chassisByID := make(map[string]map[string]interface{})
chassisPathByID := make(map[string]string)
for _, cp := range chassisPaths {
doc, err := r.getJSON(cp)
if err != nil || len(doc) == 0 {
continue
}
id := strings.TrimSpace(asString(doc["Id"]))
if id != "" {
chassisByID[strings.ToUpper(id)] = doc
chassisPathByID[strings.ToUpper(id)] = cp
}
}
seenUUID := make(map[string]struct{})
seenSerial := make(map[string]struct{})
for _, g := range existing {
if u := strings.ToUpper(strings.TrimSpace(g.UUID)); u != "" {
seenUUID[u] = struct{}{}
}
if s := strings.ToUpper(strings.TrimSpace(g.SerialNumber)); s != "" {
seenSerial[s] = struct{}{}
}
}
out := append([]models.GPU{}, existing...)
idx := len(existing) + 1
for _, systemPath := range systemPaths {
procDocs, err := r.getCollectionMembers(joinPath(systemPath, "/Processors"))
if err != nil {
continue
}
for _, doc := range procDocs {
if !strings.EqualFold(strings.TrimSpace(asString(doc["ProcessorType"])), "GPU") {
continue
}
gpuID := strings.TrimSpace(asString(doc["Id"]))
serial := findFirstNormalizedStringByKeys(doc, "SerialNumber")
if serial == "" {
serial = resolveProcessorGPUChassisSerial(chassisByID, gpuID, plan)
}
if plan.Directives.EnableMSIGhostGPUFilter {
chassisPath := resolveProcessorGPUChassisPath(chassisPathByID, gpuID, plan)
if chassisPath != "" && r.msiGhostGPUFilter(systemPaths, gpuID, chassisPath) {
continue
}
}
uuid := strings.TrimSpace(asString(doc["UUID"]))
uuidKey := strings.ToUpper(uuid)
serialKey := strings.ToUpper(serial)
if uuidKey != "" {
if _, dup := seenUUID[uuidKey]; dup {
continue
}
seenUUID[uuidKey] = struct{}{}
}
if serialKey != "" {
if _, dup := seenSerial[serialKey]; dup {
continue
}
seenSerial[serialKey] = struct{}{}
}
slotLabel := firstNonEmpty(
redfishLocationLabel(doc["Location"]),
redfishLocationLabel(doc["PhysicalLocation"]),
)
if slotLabel == "" && gpuID != "" {
slotLabel = gpuID
}
if slotLabel == "" {
slotLabel = fmt.Sprintf("GPU%d", idx)
}
out = append(out, models.GPU{
Slot: slotLabel,
Model: firstNonEmpty(asString(doc["Model"]), asString(doc["Name"])),
Manufacturer: asString(doc["Manufacturer"]),
PartNumber: asString(doc["PartNumber"]),
SerialNumber: serial,
UUID: uuid,
Status: mapStatus(doc["Status"]),
})
idx++
}
}
return out
}


@@ -0,0 +1,599 @@
package collector
import (
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
)
func (r redfishSnapshotReader) enrichNICsFromNetworkInterfaces(nics *[]models.NetworkAdapter, systemPaths []string) {
if nics == nil {
return
}
bySlot := make(map[string]int, len(*nics))
for i, nic := range *nics {
bySlot[strings.ToLower(strings.TrimSpace(nic.Slot))] = i
}
for _, systemPath := range systemPaths {
ifaces, err := r.getCollectionMembers(joinPath(systemPath, "/NetworkInterfaces"))
if err != nil || len(ifaces) == 0 {
continue
}
for _, iface := range ifaces {
slot := firstNonEmpty(asString(iface["Id"]), asString(iface["Name"]))
if strings.TrimSpace(slot) == "" {
continue
}
idx, ok := bySlot[strings.ToLower(strings.TrimSpace(slot))]
if !ok {
// The NetworkInterface Id (e.g. "2") may not match the display slot of
// the real NIC that came from Chassis/NetworkAdapters (e.g. "RISER 5
// slot 1 (7)"). Try to find the real NIC via the Links.NetworkAdapter
// cross-reference before creating a ghost entry.
if linkedIdx := r.findNICIndexByLinkedNetworkAdapter(iface, *nics, bySlot); linkedIdx >= 0 {
idx = linkedIdx
ok = true
}
}
if !ok {
*nics = append(*nics, models.NetworkAdapter{
Slot: slot,
Present: true,
Model: firstNonEmpty(asString(iface["Model"]), asString(iface["Name"])),
Status: mapStatus(iface["Status"]),
})
idx = len(*nics) - 1
bySlot[strings.ToLower(strings.TrimSpace(slot))] = idx
}
portsPath := redfishLinkedPath(iface, "NetworkPorts")
if portsPath == "" {
continue
}
portDocs, err := r.getCollectionMembers(portsPath)
if err != nil || len(portDocs) == 0 {
continue
}
macs := append([]string{}, (*nics)[idx].MACAddresses...)
for _, p := range portDocs {
macs = append(macs, collectNetworkPortMACs(p)...)
}
(*nics)[idx].MACAddresses = dedupeStrings(macs)
if sanitizeNetworkPortCount((*nics)[idx].PortCount) == 0 {
(*nics)[idx].PortCount = len(portDocs)
}
}
}
}
func (r redfishSnapshotReader) collectNICs(chassisPaths []string) []models.NetworkAdapter {
var nics []models.NetworkAdapter
for _, chassisPath := range chassisPaths {
adapterDocs, err := r.getCollectionMembers(joinPath(chassisPath, "/NetworkAdapters"))
if err != nil {
continue
}
for _, doc := range adapterDocs {
nics = append(nics, r.buildNICFromAdapterDoc(doc))
}
}
return dedupeNetworkAdapters(nics)
}
func (r redfishSnapshotReader) buildNICFromAdapterDoc(adapterDoc map[string]interface{}) models.NetworkAdapter {
nic := parseNIC(adapterDoc)
adapterFunctionDocs := r.getNetworkAdapterFunctionDocs(adapterDoc)
for _, pciePath := range networkAdapterPCIeDevicePaths(adapterDoc) {
pcieDoc, err := r.getJSON(pciePath)
if err != nil {
continue
}
functionDocs := r.getLinkedPCIeFunctions(pcieDoc)
for _, adapterFnDoc := range adapterFunctionDocs {
functionDocs = append(functionDocs, r.getLinkedPCIeFunctions(adapterFnDoc)...)
}
functionDocs = dedupeJSONDocsByPath(functionDocs)
supplementalDocs := r.getLinkedSupplementalDocs(pcieDoc, "EnvironmentMetrics", "Metrics")
for _, fn := range functionDocs {
supplementalDocs = append(supplementalDocs, r.getLinkedSupplementalDocs(fn, "EnvironmentMetrics", "Metrics")...)
}
enrichNICFromPCIe(&nic, pcieDoc, functionDocs, supplementalDocs)
}
if len(nic.MACAddresses) == 0 {
r.enrichNICMACsFromNetworkDeviceFunctions(&nic, adapterDoc)
}
return nic
}
func (r redfishSnapshotReader) getNetworkAdapterFunctionDocs(adapterDoc map[string]interface{}) []map[string]interface{} {
ndfCol, ok := adapterDoc["NetworkDeviceFunctions"].(map[string]interface{})
if !ok {
return nil
}
colPath := asString(ndfCol["@odata.id"])
if colPath == "" {
return nil
}
funcDocs, err := r.getCollectionMembers(colPath)
if err != nil {
return nil
}
return funcDocs
}
func (r redfishSnapshotReader) collectPCIeDevices(systemPaths, chassisPaths []string) []models.PCIeDevice {
collections := make([]string, 0, len(systemPaths)+len(chassisPaths))
for _, systemPath := range systemPaths {
collections = append(collections, joinPath(systemPath, "/PCIeDevices"))
}
for _, chassisPath := range chassisPaths {
collections = append(collections, joinPath(chassisPath, "/PCIeDevices"))
}
var out []models.PCIeDevice
for _, collectionPath := range collections {
memberDocs, err := r.getCollectionMembers(collectionPath)
if err != nil || len(memberDocs) == 0 {
continue
}
for _, doc := range memberDocs {
functionDocs := r.getLinkedPCIeFunctions(doc)
if looksLikeGPU(doc, functionDocs) {
continue
}
if replayPCIeDeviceBackedByCanonicalNIC(doc, functionDocs) {
continue
}
supplementalDocs := r.getLinkedSupplementalDocs(doc, "EnvironmentMetrics", "Metrics")
supplementalDocs = append(supplementalDocs, r.getChassisScopedPCIeSupplementalDocs(doc)...)
for _, fn := range functionDocs {
supplementalDocs = append(supplementalDocs, r.getLinkedSupplementalDocs(fn, "EnvironmentMetrics", "Metrics")...)
}
dev := parsePCIeDeviceWithSupplementalDocs(doc, functionDocs, supplementalDocs)
if shouldSkipReplayPCIeDevice(doc, dev) {
continue
}
out = append(out, dev)
}
}
for _, systemPath := range systemPaths {
functionDocs, err := r.getCollectionMembers(joinPath(systemPath, "/PCIeFunctions"))
if err != nil || len(functionDocs) == 0 {
continue
}
for idx, fn := range functionDocs {
supplementalDocs := r.getLinkedSupplementalDocs(fn, "EnvironmentMetrics", "Metrics")
dev := parsePCIeFunctionWithSupplementalDocs(fn, supplementalDocs, idx+1)
if shouldSkipReplayPCIeDevice(fn, dev) {
continue
}
out = append(out, dev)
}
}
return dedupePCIeDevices(out)
}
func shouldSkipReplayPCIeDevice(doc map[string]interface{}, dev models.PCIeDevice) bool {
	if isUnidentifiablePCIeDevice(dev) {
		return true
	}
	if replayNetworkFunctionBackedByCanonicalNIC(doc, dev) {
		return true
	}
	if isReplayStorageServiceEndpoint(doc, dev) {
		return true
	}
	if isReplayNoisePCIeClass(dev.DeviceClass) {
		return true
	}
	if isReplayDisplayDeviceDuplicate(doc, dev) {
		return true
	}
	return false
}

func replayPCIeDeviceBackedByCanonicalNIC(doc map[string]interface{}, functionDocs []map[string]interface{}) bool {
	if !looksLikeReplayNetworkPCIeDevice(doc, functionDocs) {
		return false
	}
	for _, fn := range functionDocs {
		if hasRedfishLinkedMember(fn, "NetworkDeviceFunctions") {
			return true
		}
	}
	return false
}

func replayNetworkFunctionBackedByCanonicalNIC(doc map[string]interface{}, dev models.PCIeDevice) bool {
	if !looksLikeReplayNetworkClass(dev.DeviceClass) {
		return false
	}
	return hasRedfishLinkedMember(doc, "NetworkDeviceFunctions")
}

func looksLikeReplayNetworkPCIeDevice(doc map[string]interface{}, functionDocs []map[string]interface{}) bool {
	for _, fn := range functionDocs {
		if looksLikeReplayNetworkClass(asString(fn["DeviceClass"])) {
			return true
		}
	}
	joined := strings.ToLower(strings.TrimSpace(strings.Join([]string{
		asString(doc["DeviceType"]),
		asString(doc["Description"]),
		asString(doc["Name"]),
		asString(doc["Model"]),
	}, " ")))
	return strings.Contains(joined, "network")
}

func looksLikeReplayNetworkClass(class string) bool {
	class = strings.ToLower(strings.TrimSpace(class))
	return strings.Contains(class, "network") || strings.Contains(class, "ethernet")
}

func isReplayStorageServiceEndpoint(doc map[string]interface{}, dev models.PCIeDevice) bool {
	class := strings.ToLower(strings.TrimSpace(dev.DeviceClass))
	if class != "massstoragecontroller" && class != "mass storage controller" {
		return false
	}
	name := strings.ToLower(strings.TrimSpace(firstNonEmpty(
		dev.PartNumber,
		asString(doc["PartNumber"]),
		asString(doc["Description"]),
	)))
	if strings.Contains(name, "pcie switch management endpoint") {
		return true
	}
	if strings.Contains(name, "volume management device nvme raid controller") {
		return true
	}
	return false
}

func hasRedfishLinkedMember(doc map[string]interface{}, key string) bool {
	links, ok := doc["Links"].(map[string]interface{})
	if !ok {
		return false
	}
	if asInt(links[key+"@odata.count"]) > 0 {
		return true
	}
	linked, ok := links[key]
	if !ok {
		return false
	}
	switch v := linked.(type) {
	case []interface{}:
		return len(v) > 0
	case map[string]interface{}:
		if asString(v["@odata.id"]) != "" {
			return true
		}
		return len(v) > 0
	default:
		return false
	}
}

func isReplayNoisePCIeClass(class string) bool {
	switch strings.ToLower(strings.TrimSpace(class)) {
	case "bridge", "processor", "signalprocessingcontroller", "signal processing controller", "serialbuscontroller", "serial bus controller":
		return true
	default:
		return false
	}
}

func isReplayDisplayDeviceDuplicate(doc map[string]interface{}, dev models.PCIeDevice) bool {
	class := strings.ToLower(strings.TrimSpace(dev.DeviceClass))
	if class != "displaycontroller" && class != "display controller" {
		return false
	}
	return strings.EqualFold(strings.TrimSpace(asString(doc["Description"])), "Display Device")
}
func (r redfishSnapshotReader) getChassisScopedPCIeSupplementalDocs(doc map[string]interface{}) []map[string]interface{} {
	docPath := normalizeRedfishPath(asString(doc["@odata.id"]))
	chassisPath := chassisPathForPCIeDoc(docPath)
	if chassisPath == "" {
		return nil
	}
	out := make([]map[string]interface{}, 0, 6)
	if looksLikeNVSwitchPCIeDoc(doc) {
		for _, path := range []string{
			joinPath(chassisPath, "/EnvironmentMetrics"),
			joinPath(chassisPath, "/ThermalSubsystem/ThermalMetrics"),
		} {
			supplementalDoc, err := r.getJSON(path)
			if err != nil || len(supplementalDoc) == 0 {
				continue
			}
			out = append(out, supplementalDoc)
		}
	}
	deviceDocs, err := r.getCollectionMembers(joinPath(chassisPath, "/Devices"))
	if err == nil {
		for _, deviceDoc := range deviceDocs {
			if !redfishPCIeMatchesChassisDeviceDoc(doc, deviceDoc) {
				continue
			}
			out = append(out, deviceDoc)
		}
	}
	return out
}
// collectBMCMAC returns the MAC address of the best BMC management interface
// found in Managers/*/EthernetInterfaces. Prefers an active link with an IP
// address over a passive sideband interface.
func (r redfishSnapshotReader) collectBMCMAC(managerPaths []string) string {
	summary := r.collectBMCManagementSummary(managerPaths)
	if len(summary) == 0 {
		return ""
	}
	return strings.ToUpper(strings.TrimSpace(asString(summary["mac_address"])))
}

func (r redfishSnapshotReader) collectBMCManagementSummary(managerPaths []string) map[string]any {
	bestScore := -1
	var best map[string]any
	for _, managerPath := range managerPaths {
		collectionPath := joinPath(managerPath, "/EthernetInterfaces")
		collectionDoc, _ := r.getJSON(collectionPath)
		ncsiEnabled, lldpMode, lldpByEth := redfishManagerEthernetCollectionHints(collectionDoc)
		members, err := r.getCollectionMembers(collectionPath)
		if err != nil || len(members) == 0 {
			continue
		}
		for _, doc := range members {
			mac := strings.TrimSpace(firstNonEmpty(
				asString(doc["PermanentMACAddress"]),
				asString(doc["MACAddress"]),
			))
			if mac == "" || strings.EqualFold(mac, "00:00:00:00:00:00") {
				continue
			}
			ifaceID := strings.TrimSpace(firstNonEmpty(asString(doc["Id"]), asString(doc["Name"])))
			summary := map[string]any{
				"manager_path":    managerPath,
				"interface_id":    ifaceID,
				"hostname":        strings.TrimSpace(asString(doc["HostName"])),
				"fqdn":            strings.TrimSpace(asString(doc["FQDN"])),
				"mac_address":     strings.ToUpper(mac),
				"link_status":     strings.TrimSpace(asString(doc["LinkStatus"])),
				"speed_mbps":      asInt(doc["SpeedMbps"]),
				"interface_name":  strings.TrimSpace(asString(doc["Name"])),
				"interface_desc":  strings.TrimSpace(asString(doc["Description"])),
				"ncsi_enabled":    ncsiEnabled,
				"lldp_mode":       lldpMode,
				"ipv4_address":    redfishManagerIPv4Field(doc, "Address"),
				"ipv4_gateway":    redfishManagerIPv4Field(doc, "Gateway"),
				"ipv4_subnet":     redfishManagerIPv4Field(doc, "SubnetMask"),
				"ipv6_address":    redfishManagerIPv6Field(doc, "Address"),
				"link_is_active":  strings.EqualFold(strings.TrimSpace(asString(doc["LinkStatus"])), "LinkActive"),
				"interface_score": 0,
			}
			if lldp, ok := lldpByEth[strings.ToLower(ifaceID)]; ok {
				summary["lldp_chassis_name"] = lldp["ChassisName"]
				summary["lldp_port_desc"] = lldp["PortDesc"]
				summary["lldp_port_id"] = lldp["PortId"]
				if vlan := asInt(lldp["VlanId"]); vlan > 0 {
					summary["lldp_vlan_id"] = vlan
				}
			}
			score := redfishManagerInterfaceScore(summary)
			summary["interface_score"] = score
			if score > bestScore {
				bestScore = score
				best = summary
			}
		}
	}
	return best
}
func redfishManagerEthernetCollectionHints(collectionDoc map[string]interface{}) (bool, string, map[string]map[string]interface{}) {
	lldpByEth := make(map[string]map[string]interface{})
	if len(collectionDoc) == 0 {
		return false, "", lldpByEth
	}
	oem, _ := collectionDoc["Oem"].(map[string]interface{})
	public, _ := oem["Public"].(map[string]interface{})
	ncsiEnabled := asBool(public["NcsiEnabled"])
	lldp, _ := public["LLDP"].(map[string]interface{})
	lldpMode := strings.TrimSpace(asString(lldp["LLDPMode"]))
	if members, ok := lldp["Members"].([]interface{}); ok {
		for _, item := range members {
			member, ok := item.(map[string]interface{})
			if !ok {
				continue
			}
			ethIndex := strings.ToLower(strings.TrimSpace(asString(member["EthIndex"])))
			if ethIndex == "" {
				continue
			}
			lldpByEth[ethIndex] = member
		}
	}
	return ncsiEnabled, lldpMode, lldpByEth
}

func redfishManagerIPv4Field(doc map[string]interface{}, key string) string {
	if len(doc) == 0 {
		return ""
	}
	for _, field := range []string{"IPv4Addresses", "IPv4StaticAddresses"} {
		list, ok := doc[field].([]interface{})
		if !ok {
			continue
		}
		for _, item := range list {
			entry, ok := item.(map[string]interface{})
			if !ok {
				continue
			}
			value := strings.TrimSpace(asString(entry[key]))
			if value != "" {
				return value
			}
		}
	}
	return ""
}

func redfishManagerIPv6Field(doc map[string]interface{}, key string) string {
	if len(doc) == 0 {
		return ""
	}
	list, ok := doc["IPv6Addresses"].([]interface{})
	if !ok {
		return ""
	}
	for _, item := range list {
		entry, ok := item.(map[string]interface{})
		if !ok {
			continue
		}
		value := strings.TrimSpace(asString(entry[key]))
		if value != "" {
			return value
		}
	}
	return ""
}

func redfishManagerInterfaceScore(summary map[string]any) int {
	score := 0
	if strings.EqualFold(strings.TrimSpace(asString(summary["link_status"])), "LinkActive") {
		score += 100
	}
	if strings.TrimSpace(asString(summary["ipv4_address"])) != "" {
		score += 40
	}
	if strings.TrimSpace(asString(summary["ipv6_address"])) != "" {
		score += 10
	}
	if strings.TrimSpace(asString(summary["mac_address"])) != "" {
		score += 10
	}
	if asInt(summary["speed_mbps"]) > 0 {
		score += 5
	}
	if ifaceID := strings.ToLower(strings.TrimSpace(asString(summary["interface_id"]))); ifaceID != "" && !strings.HasPrefix(ifaceID, "usb") {
		score += 3
	}
	if asBool(summary["ncsi_enabled"]) {
		score += 1
	}
	return score
}
// findNICIndexByLinkedNetworkAdapter resolves a NetworkInterface document to an
// existing NIC in bySlot by following Links.NetworkAdapter → the Chassis
// NetworkAdapter doc and reconstructing the canonical NIC identity. Returns -1
// if no match is found.
func (r redfishSnapshotReader) findNICIndexByLinkedNetworkAdapter(iface map[string]interface{}, existing []models.NetworkAdapter, bySlot map[string]int) int {
	links, ok := iface["Links"].(map[string]interface{})
	if !ok {
		return -1
	}
	adapterRef, ok := links["NetworkAdapter"].(map[string]interface{})
	if !ok {
		return -1
	}
	adapterPath := normalizeRedfishPath(asString(adapterRef["@odata.id"]))
	if adapterPath == "" {
		return -1
	}
	adapterDoc, err := r.getJSON(adapterPath)
	if err != nil || len(adapterDoc) == 0 {
		return -1
	}
	adapterNIC := r.buildNICFromAdapterDoc(adapterDoc)
	if serial := normalizeRedfishIdentityField(adapterNIC.SerialNumber); serial != "" {
		for idx, nic := range existing {
			if strings.EqualFold(normalizeRedfishIdentityField(nic.SerialNumber), serial) {
				return idx
			}
		}
	}
	if bdf := strings.TrimSpace(adapterNIC.BDF); bdf != "" {
		for idx, nic := range existing {
			if strings.EqualFold(strings.TrimSpace(nic.BDF), bdf) {
				return idx
			}
		}
	}
	if slot := strings.ToLower(strings.TrimSpace(adapterNIC.Slot)); slot != "" {
		if idx, ok := bySlot[slot]; ok {
			return idx
		}
	}
	for idx, nic := range existing {
		if networkAdaptersShareMACs(nic, adapterNIC) {
			return idx
		}
	}
	return -1
}

func networkAdaptersShareMACs(a, b models.NetworkAdapter) bool {
	if len(a.MACAddresses) == 0 || len(b.MACAddresses) == 0 {
		return false
	}
	seen := make(map[string]struct{}, len(a.MACAddresses))
	for _, mac := range a.MACAddresses {
		normalized := strings.ToUpper(strings.TrimSpace(mac))
		if normalized == "" {
			continue
		}
		seen[normalized] = struct{}{}
	}
	for _, mac := range b.MACAddresses {
		normalized := strings.ToUpper(strings.TrimSpace(mac))
		if normalized == "" {
			continue
		}
		if _, ok := seen[normalized]; ok {
			return true
		}
	}
	return false
}
// enrichNICMACsFromNetworkDeviceFunctions reads the NetworkDeviceFunctions
// collection linked from a NetworkAdapter document and populates the NIC's
// MACAddresses from each function's Ethernet.PermanentMACAddress / MACAddress.
// Called when PCIe-path enrichment does not produce any MACs.
func (r redfishSnapshotReader) enrichNICMACsFromNetworkDeviceFunctions(nic *models.NetworkAdapter, adapterDoc map[string]interface{}) {
	ndfCol, ok := adapterDoc["NetworkDeviceFunctions"].(map[string]interface{})
	if !ok {
		return
	}
	colPath := asString(ndfCol["@odata.id"])
	if colPath == "" {
		return
	}
	funcDocs, err := r.getCollectionMembers(colPath)
	if err != nil || len(funcDocs) == 0 {
		return
	}
	for _, fn := range funcDocs {
		eth, _ := fn["Ethernet"].(map[string]interface{})
		if eth == nil {
			continue
		}
		mac := strings.TrimSpace(firstNonEmpty(
			asString(eth["PermanentMACAddress"]),
			asString(eth["MACAddress"]),
		))
		if mac == "" {
			continue
		}
		nic.MACAddresses = dedupeStrings(append(nic.MACAddresses, strings.ToUpper(mac)))
	}
	if nic.PortCount == 0 {
		nic.PortCount = sanitizeNetworkPortCount(len(funcDocs))
	}
}
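
The interface-selection heuristic above can be exercised in isolation. A minimal standalone sketch (the `candidate` struct and `score` helper are hypothetical simplifications; the weights mirror `redfishManagerInterfaceScore`, where an active link dominates, then an IPv4 address, then a usable MAC):

```go
package main

import "fmt"

// candidate is a trimmed-down stand-in for the summary map built by
// collectBMCManagementSummary.
type candidate struct {
	LinkActive bool
	IPv4       string
	MAC        string
}

// score applies the same ordering idea as redfishManagerInterfaceScore:
// link state outweighs addressing, which outweighs a bare MAC.
func score(c candidate) int {
	s := 0
	if c.LinkActive {
		s += 100
	}
	if c.IPv4 != "" {
		s += 40
	}
	if c.MAC != "" {
		s += 10
	}
	return s
}

func main() {
	// A passive sideband interface (MAC only) loses to an active,
	// IP-configured management port.
	sideband := candidate{MAC: "AA:BB:CC:DD:EE:FF"}
	dedicated := candidate{LinkActive: true, IPv4: "10.0.0.2", MAC: "AA:BB:CC:DD:EE:00"}
	fmt.Println(score(sideband), score(dedicated)) // 10 vs 150
}
```

Because the scan keeps only the highest-scoring summary, a powered sideband port with no IP can never shadow the dedicated BMC interface.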


@@ -0,0 +1,100 @@
package collector

import (
	"strings"

	"git.mchus.pro/mchus/logpile/internal/collector/redfishprofile"
)

func (r redfishSnapshotReader) collectKnownStorageMembers(systemPath string, relativeCollections []string) []map[string]interface{} {
	var out []map[string]interface{}
	for _, rel := range relativeCollections {
		docs, err := r.getCollectionMembers(joinPath(systemPath, rel))
		if err != nil || len(docs) == 0 {
			continue
		}
		out = append(out, docs...)
	}
	return out
}

func (r redfishSnapshotReader) probeSupermicroNVMeDiskBays(backplanePath string) []map[string]interface{} {
	return r.probeDirectDiskBayChildren(joinPath(backplanePath, "/Drives"))
}

func (r redfishSnapshotReader) probeDirectDiskBayChildren(drivesCollectionPath string) []map[string]interface{} {
	var out []map[string]interface{}
	for _, path := range directDiskBayCandidates(drivesCollectionPath) {
		doc, err := r.getJSON(path)
		if err != nil || !looksLikeDrive(doc) {
			continue
		}
		out = append(out, doc)
	}
	return out
}

func resolveProcessorGPUChassisSerial(chassisByID map[string]map[string]interface{}, gpuID string, plan redfishprofile.ResolvedAnalysisPlan) string {
	for _, candidateID := range processorGPUChassisCandidateIDs(gpuID, plan) {
		if chassisDoc, ok := chassisByID[strings.ToUpper(candidateID)]; ok {
			if serial := strings.TrimSpace(asString(chassisDoc["SerialNumber"])); serial != "" {
				return serial
			}
		}
	}
	return ""
}

func resolveProcessorGPUChassisPath(chassisPathByID map[string]string, gpuID string, plan redfishprofile.ResolvedAnalysisPlan) string {
	for _, candidateID := range processorGPUChassisCandidateIDs(gpuID, plan) {
		if p, ok := chassisPathByID[strings.ToUpper(candidateID)]; ok {
			return p
		}
	}
	return ""
}

func processorGPUChassisCandidateIDs(gpuID string, plan redfishprofile.ResolvedAnalysisPlan) []string {
	gpuID = strings.TrimSpace(gpuID)
	if gpuID == "" {
		return nil
	}
	candidates := []string{gpuID}
	for _, mode := range plan.ProcessorGPUChassisLookupModes {
		switch strings.ToLower(strings.TrimSpace(mode)) {
		case "msi-index":
			candidates = append(candidates, msiProcessorGPUChassisCandidateIDs(gpuID)...)
		case "hgx-alias":
			if strings.HasPrefix(strings.ToUpper(gpuID), "GPU_") {
				candidates = append(candidates, "HGX_"+gpuID)
			}
		}
	}
	return dedupeStrings(candidates)
}

func msiProcessorGPUChassisCandidateIDs(gpuID string) []string {
	gpuID = strings.TrimSpace(strings.ToUpper(gpuID))
	if gpuID == "" {
		return nil
	}
	var out []string
	switch {
	case strings.HasPrefix(gpuID, "GPU_SXM_"):
		index := strings.TrimPrefix(gpuID, "GPU_SXM_")
		if index != "" {
			out = append(out, "GPU"+index, "GPU_"+index)
		}
	case strings.HasPrefix(gpuID, "GPU_"):
		index := strings.TrimPrefix(gpuID, "GPU_")
		if index != "" {
			out = append(out, "GPU"+index, "GPU_SXM_"+index)
		}
	case strings.HasPrefix(gpuID, "GPU"):
		index := strings.TrimPrefix(gpuID, "GPU")
		if index != "" {
			out = append(out, "GPU_"+index, "GPU_SXM_"+index)
		}
	}
	return out
}
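
The MSI alias expansion above can be sketched standalone for a single case. This reimplements only the `GPU_SXM_<n>` branch of `msiProcessorGPUChassisCandidateIDs` (the `candidateIDs` helper name is hypothetical): an SXM-style processor ID also matches chassis collections that name the same GPU `GPU<n>` or `GPU_<n>`:

```go
package main

import (
	"fmt"
	"strings"
)

// candidateIDs normalizes a GPU processor ID and, for the SXM naming
// scheme, adds the two alternate chassis ID spellings.
func candidateIDs(gpuID string) []string {
	gpuID = strings.ToUpper(strings.TrimSpace(gpuID))
	out := []string{gpuID}
	if idx := strings.TrimPrefix(gpuID, "GPU_SXM_"); idx != gpuID && idx != "" {
		out = append(out, "GPU"+idx, "GPU_"+idx)
	}
	return out
}

func main() {
	fmt.Println(candidateIDs("gpu_sxm_3"))
}
```

The full helper covers the other two prefixes symmetrically, so any of the three spellings resolves to the same chassis document.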


@@ -0,0 +1,167 @@
package collector

import (
	"git.mchus.pro/mchus/logpile/internal/collector/redfishprofile"
	"git.mchus.pro/mchus/logpile/internal/models"
)

func (r redfishSnapshotReader) collectStorage(systemPath string, plan redfishprofile.ResolvedAnalysisPlan) []models.Storage {
	var out []models.Storage
	storageMembers, _ := r.getCollectionMembers(joinPath(systemPath, "/Storage"))
	for _, member := range storageMembers {
		if driveCollection, ok := member["Drives"].(map[string]interface{}); ok {
			if driveCollectionPath := asString(driveCollection["@odata.id"]); driveCollectionPath != "" {
				driveDocs, err := r.getCollectionMembers(driveCollectionPath)
				if err == nil {
					for _, driveDoc := range driveDocs {
						if !isAbsentDriveDoc(driveDoc) && !isVirtualStorageDrive(driveDoc) {
							supplementalDocs := r.getLinkedSupplementalDocs(driveDoc, "DriveMetrics", "EnvironmentMetrics", "Metrics")
							out = append(out, parseDriveWithSupplementalDocs(driveDoc, supplementalDocs...))
						}
					}
					if len(driveDocs) == 0 {
						for _, driveDoc := range r.probeDirectDiskBayChildren(driveCollectionPath) {
							if isAbsentDriveDoc(driveDoc) {
								continue
							}
							supplementalDocs := r.getLinkedSupplementalDocs(driveDoc, "DriveMetrics", "EnvironmentMetrics", "Metrics")
							out = append(out, parseDriveWithSupplementalDocs(driveDoc, supplementalDocs...))
						}
					}
				}
				continue
			}
		}
		if drives, ok := member["Drives"].([]interface{}); ok {
			for _, driveAny := range drives {
				driveRef, ok := driveAny.(map[string]interface{})
				if !ok {
					continue
				}
				odata := asString(driveRef["@odata.id"])
				if odata == "" {
					continue
				}
				driveDoc, err := r.getJSON(odata)
				if err != nil {
					continue
				}
				if !isAbsentDriveDoc(driveDoc) && !isVirtualStorageDrive(driveDoc) {
					supplementalDocs := r.getLinkedSupplementalDocs(driveDoc, "DriveMetrics", "EnvironmentMetrics", "Metrics")
					out = append(out, parseDriveWithSupplementalDocs(driveDoc, supplementalDocs...))
				}
			}
			continue
		}
		if looksLikeDrive(member) {
			if isAbsentDriveDoc(member) || isVirtualStorageDrive(member) {
				continue
			}
			supplementalDocs := r.getLinkedSupplementalDocs(member, "DriveMetrics", "EnvironmentMetrics", "Metrics")
			out = append(out, parseDriveWithSupplementalDocs(member, supplementalDocs...))
		}
		if plan.Directives.EnableStorageEnclosureRecovery {
			for _, enclosurePath := range redfishLinkRefs(member, "Links", "Enclosures") {
				driveDocs, err := r.getCollectionMembers(joinPath(enclosurePath, "/Drives"))
				if err == nil {
					for _, driveDoc := range driveDocs {
						if looksLikeDrive(driveDoc) && !isAbsentDriveDoc(driveDoc) && !isVirtualStorageDrive(driveDoc) {
							supplementalDocs := r.getLinkedSupplementalDocs(driveDoc, "DriveMetrics", "EnvironmentMetrics", "Metrics")
							out = append(out, parseDriveWithSupplementalDocs(driveDoc, supplementalDocs...))
						}
					}
					if len(driveDocs) == 0 {
						for _, driveDoc := range r.probeDirectDiskBayChildren(joinPath(enclosurePath, "/Drives")) {
							if isAbsentDriveDoc(driveDoc) || isVirtualStorageDrive(driveDoc) {
								continue
							}
							out = append(out, parseDrive(driveDoc))
						}
					}
				}
			}
		}
	}
	if len(plan.KnownStorageDriveCollections) > 0 {
		for _, driveDoc := range r.collectKnownStorageMembers(systemPath, plan.KnownStorageDriveCollections) {
			if looksLikeDrive(driveDoc) && !isAbsentDriveDoc(driveDoc) && !isVirtualStorageDrive(driveDoc) {
				supplementalDocs := r.getLinkedSupplementalDocs(driveDoc, "DriveMetrics", "EnvironmentMetrics", "Metrics")
				out = append(out, parseDriveWithSupplementalDocs(driveDoc, supplementalDocs...))
			}
		}
	}
	simpleStorageMembers, _ := r.getCollectionMembers(joinPath(systemPath, "/SimpleStorage"))
	for _, member := range simpleStorageMembers {
		devices, ok := member["Devices"].([]interface{})
		if !ok {
			continue
		}
		for _, devAny := range devices {
			devDoc, ok := devAny.(map[string]interface{})
			if !ok || !looksLikeDrive(devDoc) || isAbsentDriveDoc(devDoc) || isVirtualStorageDrive(devDoc) {
				continue
			}
			out = append(out, parseDrive(devDoc))
		}
	}
	chassisPaths := r.discoverMemberPaths("/redfish/v1/Chassis", "/redfish/v1/Chassis/1")
	for _, chassisPath := range chassisPaths {
		driveDocs, err := r.getCollectionMembers(joinPath(chassisPath, "/Drives"))
		if err != nil {
			continue
		}
		for _, driveDoc := range driveDocs {
			if !looksLikeDrive(driveDoc) || isAbsentDriveDoc(driveDoc) || isVirtualStorageDrive(driveDoc) {
				continue
			}
			out = append(out, parseDrive(driveDoc))
		}
	}
	if plan.Directives.EnableSupermicroNVMeBackplane {
		for _, chassisPath := range chassisPaths {
			if !isSupermicroNVMeBackplanePath(chassisPath) {
				continue
			}
			for _, driveDoc := range r.probeSupermicroNVMeDiskBays(chassisPath) {
				if !looksLikeDrive(driveDoc) || isAbsentDriveDoc(driveDoc) || isVirtualStorageDrive(driveDoc) {
					continue
				}
				out = append(out, parseDrive(driveDoc))
			}
		}
	}
	return dedupeStorage(out)
}

func (r redfishSnapshotReader) collectStorageVolumes(systemPath string, plan redfishprofile.ResolvedAnalysisPlan) []models.StorageVolume {
	var out []models.StorageVolume
	storageMembers, _ := r.getCollectionMembers(joinPath(systemPath, "/Storage"))
	for _, member := range storageMembers {
		controller := firstNonEmpty(asString(member["Id"]), asString(member["Name"]))
		volumeCollectionPath := redfishLinkedPath(member, "Volumes")
		if volumeCollectionPath == "" {
			continue
		}
		volumeDocs, err := r.getCollectionMembers(volumeCollectionPath)
		if err != nil {
			continue
		}
		for _, volDoc := range volumeDocs {
			if looksLikeVolume(volDoc) {
				out = append(out, parseStorageVolume(volDoc, controller))
			}
		}
	}
	if len(plan.KnownStorageVolumeCollections) > 0 {
		for _, volDoc := range r.collectKnownStorageMembers(systemPath, plan.KnownStorageVolumeCollections) {
			if looksLikeVolume(volDoc) {
				out = append(out, parseStorageVolume(volDoc, storageControllerFromPath(asString(volDoc["@odata.id"]))))
			}
		}
	}
	return dedupeStorageVolumes(out)
}
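
Every discovery path above (Systems/*/Storage, SimpleStorage, Chassis/*/Drives, profile-known collections) feeds the same slice before `dedupeStorage` collapses repeats. A minimal sketch of the serial-number dedupe idea under that assumption (the `drive` struct and `dedupeBySerial` helper are hypothetical, not the repository's implementation):

```go
package main

import (
	"fmt"
	"strings"
)

type drive struct{ Serial, Model string }

// dedupeBySerial keeps the first drive seen for each normalized serial;
// drives with no serial cannot be safely merged and are kept as-is.
func dedupeBySerial(in []drive) []drive {
	seen := make(map[string]struct{}, len(in))
	out := make([]drive, 0, len(in))
	for _, d := range in {
		key := strings.ToUpper(strings.TrimSpace(d.Serial))
		if key == "" {
			out = append(out, d)
			continue
		}
		if _, ok := seen[key]; ok {
			continue
		}
		seen[key] = struct{}{}
		out = append(out, d)
	}
	return out
}

func main() {
	// The same physical drive reported by two Redfish paths collapses
	// to a single entry despite whitespace/case differences.
	drives := []drive{{"s1", "A"}, {" S1 ", "A"}, {"s2", "B"}}
	fmt.Println(len(dedupeBySerial(drives))) // 2
}
```

This is why it is safe for the collector to probe redundant paths aggressively: over-collection costs only duplicate candidates, which the final dedupe pass removes.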

File diff suppressed because it is too large


@@ -0,0 +1,162 @@
package redfishprofile

import "strings"

func ResolveAcquisitionPlan(match MatchResult, plan AcquisitionPlan, discovered DiscoveredResources, signals MatchSignals) ResolvedAcquisitionPlan {
	seedGroups := [][]string{
		baselineSeedPaths(discovered),
		expandScopedSuffixes(discovered.SystemPaths, plan.ScopedPaths.SystemSeedSuffixes),
		expandScopedSuffixes(discovered.ChassisPaths, plan.ScopedPaths.ChassisSeedSuffixes),
		expandScopedSuffixes(discovered.ManagerPaths, plan.ScopedPaths.ManagerSeedSuffixes),
		plan.SeedPaths,
	}
	if plan.Mode == ModeFallback {
		seedGroups = append(seedGroups, plan.PlanBPaths)
	}
	criticalGroups := [][]string{
		baselineCriticalPaths(discovered),
		expandScopedSuffixes(discovered.SystemPaths, plan.ScopedPaths.SystemCriticalSuffixes),
		expandScopedSuffixes(discovered.ChassisPaths, plan.ScopedPaths.ChassisCriticalSuffixes),
		expandScopedSuffixes(discovered.ManagerPaths, plan.ScopedPaths.ManagerCriticalSuffixes),
		plan.CriticalPaths,
	}
	resolved := ResolvedAcquisitionPlan{
		Plan:          plan,
		SeedPaths:     mergeResolvedPaths(seedGroups...),
		CriticalPaths: mergeResolvedPaths(criticalGroups...),
	}
	for _, profile := range match.Profiles {
		profile.RefineAcquisitionPlan(&resolved, discovered, signals)
	}
	resolved.SeedPaths = mergeResolvedPaths(resolved.SeedPaths)
	resolved.CriticalPaths = mergeResolvedPaths(resolved.CriticalPaths, resolved.Plan.CriticalPaths)
	resolved.Plan.SeedPaths = mergeResolvedPaths(resolved.Plan.SeedPaths)
	resolved.Plan.CriticalPaths = mergeResolvedPaths(resolved.Plan.CriticalPaths)
	resolved.Plan.PlanBPaths = mergeResolvedPaths(resolved.Plan.PlanBPaths)
	return resolved
}

func baselineSeedPaths(discovered DiscoveredResources) []string {
	var out []string
	add := func(p string) {
		if p = normalizePath(p); p != "" {
			out = append(out, p)
		}
	}
	add("/redfish/v1/UpdateService")
	add("/redfish/v1/UpdateService/FirmwareInventory")
	for _, p := range discovered.SystemPaths {
		add(p)
		add(joinPath(p, "/Bios"))
		add(joinPath(p, "/Oem/Public"))
		add(joinPath(p, "/Oem/Public/FRU"))
		add(joinPath(p, "/Processors"))
		add(joinPath(p, "/Memory"))
		add(joinPath(p, "/EthernetInterfaces"))
		add(joinPath(p, "/NetworkInterfaces"))
		add(joinPath(p, "/PCIeDevices"))
		add(joinPath(p, "/PCIeFunctions"))
		add(joinPath(p, "/Accelerators"))
		add(joinPath(p, "/GraphicsControllers"))
		add(joinPath(p, "/Storage"))
	}
	for _, p := range discovered.ChassisPaths {
		add(p)
		add(joinPath(p, "/Oem/Public"))
		add(joinPath(p, "/Oem/Public/FRU"))
		add(joinPath(p, "/PCIeDevices"))
		add(joinPath(p, "/PCIeSlots"))
		add(joinPath(p, "/NetworkAdapters"))
		add(joinPath(p, "/Drives"))
		add(joinPath(p, "/Power"))
	}
	for _, p := range discovered.ManagerPaths {
		add(p)
		add(joinPath(p, "/EthernetInterfaces"))
		add(joinPath(p, "/NetworkProtocol"))
	}
	return mergeResolvedPaths(out)
}

func baselineCriticalPaths(discovered DiscoveredResources) []string {
	var out []string
	for _, group := range [][]string{
		{"/redfish/v1"},
		discovered.SystemPaths,
		discovered.ChassisPaths,
		discovered.ManagerPaths,
	} {
		out = append(out, group...)
	}
	return mergeResolvedPaths(out)
}

func expandScopedSuffixes(basePaths, suffixes []string) []string {
	if len(basePaths) == 0 || len(suffixes) == 0 {
		return nil
	}
	out := make([]string, 0, len(basePaths)*len(suffixes))
	for _, basePath := range basePaths {
		basePath = normalizePath(basePath)
		if basePath == "" {
			continue
		}
		for _, suffix := range suffixes {
			suffix = strings.TrimSpace(suffix)
			if suffix == "" {
				continue
			}
			out = append(out, joinPath(basePath, suffix))
		}
	}
	return mergeResolvedPaths(out)
}

func mergeResolvedPaths(groups ...[]string) []string {
	seen := make(map[string]struct{})
	out := make([]string, 0)
	for _, group := range groups {
		for _, path := range group {
			path = normalizePath(path)
			if path == "" {
				continue
			}
			if _, ok := seen[path]; ok {
				continue
			}
			seen[path] = struct{}{}
			out = append(out, path)
		}
	}
	return out
}

func normalizePath(path string) string {
	path = strings.TrimSpace(path)
	if path == "" {
		return ""
	}
	if !strings.HasPrefix(path, "/") {
		path = "/" + path
	}
	return strings.TrimRight(path, "/")
}

func joinPath(base, rel string) string {
	base = normalizePath(base)
	rel = strings.TrimSpace(rel)
	if base == "" {
		return normalizePath(rel)
	}
	if rel == "" {
		return base
	}
	if !strings.HasPrefix(rel, "/") {
		rel = "/" + rel
	}
	return normalizePath(base + rel)
}
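
The path helpers above are total on messy input: every non-empty result starts with "/" and never ends with one, so the dedupe map in `mergeResolvedPaths` never sees two spellings of the same path. A standalone sketch reproducing the two helpers so the invariants can be checked in isolation:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePath trims whitespace, forces a leading slash, and strips any
// trailing slashes (an input of just "/" normalizes to "").
func normalizePath(path string) string {
	path = strings.TrimSpace(path)
	if path == "" {
		return ""
	}
	if !strings.HasPrefix(path, "/") {
		path = "/" + path
	}
	return strings.TrimRight(path, "/")
}

// joinPath concatenates base and rel, tolerating missing or duplicated
// slashes at the seam, and renormalizes the result.
func joinPath(base, rel string) string {
	base = normalizePath(base)
	rel = strings.TrimSpace(rel)
	if base == "" {
		return normalizePath(rel)
	}
	if rel == "" {
		return base
	}
	if !strings.HasPrefix(rel, "/") {
		rel = "/" + rel
	}
	return normalizePath(base + rel)
}

func main() {
	// Sloppy input, canonical output.
	fmt.Println(joinPath("redfish/v1/Chassis/1/", "Drives/")) // /redfish/v1/Chassis/1/Drives
}
```

Normalizing at both ends (inputs and the joined result) is what lets profile-supplied suffixes like "Drives/" merge cleanly with discovered base paths.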


@@ -0,0 +1,100 @@
package redfishprofile

import "strings"

func ResolveAnalysisPlan(match MatchResult, snapshot map[string]interface{}, discovered DiscoveredResources, signals MatchSignals) ResolvedAnalysisPlan {
	plan := ResolvedAnalysisPlan{
		Match:      match,
		Directives: AnalysisDirectives{},
	}
	if match.Mode == ModeFallback {
		plan.Directives.EnableProcessorGPUFallback = true
		plan.Directives.EnableSupermicroNVMeBackplane = true
		plan.Directives.EnableProcessorGPUChassisAlias = true
		plan.Directives.EnableGenericGraphicsControllerDedup = true
		plan.Directives.EnableStorageEnclosureRecovery = true
		plan.Directives.EnableKnownStorageControllerRecovery = true
		addAnalysisLookupMode(&plan, "msi-index")
		addAnalysisLookupMode(&plan, "hgx-alias")
		addAnalysisStorageDriveCollections(&plan,
			"/Storage/IntelVROC/Drives",
			"/Storage/IntelVROC/Controllers/1/Drives",
		)
		addAnalysisStorageVolumeCollections(&plan,
			"/Storage/IntelVROC/Volumes",
			"/Storage/HA-RAID/Volumes",
			"/Storage/MRVL.HA-RAID/Volumes",
		)
		addAnalysisNote(&plan, "fallback analysis enables broad recovery directives")
	}
	for _, profile := range match.Profiles {
		profile.ApplyAnalysisDirectives(&plan.Directives, signals)
	}
	for _, profile := range match.Profiles {
		profile.RefineAnalysisPlan(&plan, snapshot, discovered, signals)
	}
	return plan
}

func snapshotHasPathPrefix(snapshot map[string]interface{}, prefix string) bool {
	prefix = normalizePath(prefix)
	if prefix == "" {
		return false
	}
	for path := range snapshot {
		if strings.HasPrefix(normalizePath(path), prefix) {
			return true
		}
	}
	return false
}

func snapshotHasPathContaining(snapshot map[string]interface{}, sub string) bool {
	sub = strings.ToLower(strings.TrimSpace(sub))
	if sub == "" {
		return false
	}
	for path := range snapshot {
		if strings.Contains(strings.ToLower(path), sub) {
			return true
		}
	}
	return false
}

func snapshotHasGPUProcessor(snapshot map[string]interface{}, systemPaths []string) bool {
	for _, systemPath := range systemPaths {
		prefix := normalizePath(joinPath(systemPath, "/Processors")) + "/"
		for path, docAny := range snapshot {
			if !strings.HasPrefix(normalizePath(path), prefix) {
				continue
			}
			doc, ok := docAny.(map[string]interface{})
			if !ok {
				continue
			}
			if strings.EqualFold(strings.TrimSpace(asString(doc["ProcessorType"])), "GPU") {
				return true
			}
		}
	}
	return false
}

func snapshotHasStorageControllerHint(snapshot map[string]interface{}, needles ...string) bool {
	for _, needle := range needles {
		if snapshotHasPathContaining(snapshot, needle) {
			return true
		}
	}
	return false
}

func asString(v interface{}) string {
	switch x := v.(type) {
	case string:
		return x
	default:
		return ""
	}
}
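
The GPU-detection scan above walks every snapshot path under `<system>/Processors/` and checks `ProcessorType` case-insensitively. A simplified standalone sketch of the same idea (snapshot values flattened to `map[string]string` for brevity; the `hasGPUProcessor` name is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// hasGPUProcessor mirrors the prefix scan in snapshotHasGPUProcessor:
// match any processor document under the system's Processors collection
// whose ProcessorType equals "GPU", ignoring case and padding.
func hasGPUProcessor(snapshot map[string]map[string]string, systemPath string) bool {
	prefix := systemPath + "/Processors/"
	for path, doc := range snapshot {
		if !strings.HasPrefix(path, prefix) {
			continue
		}
		if strings.EqualFold(strings.TrimSpace(doc["ProcessorType"]), "GPU") {
			return true
		}
	}
	return false
}

func main() {
	snap := map[string]map[string]string{
		"/redfish/v1/Systems/1/Processors/CPU1": {"ProcessorType": "CPU"},
		"/redfish/v1/Systems/1/Processors/GPU1": {"ProcessorType": "gpu"},
	}
	fmt.Println(hasGPUProcessor(snap, "/redfish/v1/Systems/1")) // true
}
```

Scanning the already-captured snapshot, rather than re-querying the BMC, keeps analysis-plan resolution free of network round-trips.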


@@ -0,0 +1,450 @@
package redfishprofile

import (
	"encoding/json"
	"os"
	"path/filepath"
	"strings"
	"testing"
)

func TestBuildAcquisitionPlan_Fixture_MSI_CG480(t *testing.T) {
	signals := loadProfileFixtureSignals(t, "msi-cg480.json")
	match := MatchProfiles(signals)
	plan := BuildAcquisitionPlan(signals)
	resolved := ResolveAcquisitionPlan(match, plan, discoveredResourcesFromSignals(signals), signals)
	if match.Mode != ModeMatched {
		t.Fatalf("expected matched mode, got %q", match.Mode)
	}
	assertProfileSelected(t, match, "msi")
	assertProfileSelected(t, match, "ami-family")
	assertProfileNotSelected(t, match, "hgx-topology")
	if plan.Tuning.PrefetchWorkers < 6 {
		t.Fatalf("expected msi prefetch worker tuning, got %d", plan.Tuning.PrefetchWorkers)
	}
	if !containsString(resolved.SeedPaths, "/redfish/v1/Chassis/GPU1") {
		t.Fatal("expected MSI chassis GPU seed path")
	}
	if !containsString(resolved.CriticalPaths, "/redfish/v1/Chassis/GPU1/Sensors") {
		t.Fatal("expected MSI GPU sensor critical path")
	}
	if !containsString(resolved.Plan.PlanBPaths, "/redfish/v1/Chassis/GPU1/Sensors") {
		t.Fatal("expected MSI GPU sensor plan-b path")
	}
	if plan.Tuning.ETABaseline.SnapshotSeconds <= 0 {
		t.Fatal("expected MSI snapshot eta baseline")
	}
	if !plan.Tuning.PostProbePolicy.EnableNumericCollectionProbe {
		t.Fatal("expected MSI fixture to inherit generic numeric post-probe policy")
	}
	if !containsString(plan.ScopedPaths.SystemSeedSuffixes, "/SimpleStorage") {
		t.Fatal("expected MSI fixture to inherit generic SimpleStorage scoped seed suffix")
	}
	if !containsString(plan.ScopedPaths.SystemCriticalSuffixes, "/Memory") {
		t.Fatal("expected MSI fixture to inherit generic system critical suffixes")
	}
	if !containsString(plan.Tuning.PrefetchPolicy.IncludeSuffixes, "/Storage") {
		t.Fatal("expected MSI fixture to inherit generic storage prefetch policy")
	}
	if !containsString(plan.CriticalPaths, "/redfish/v1/UpdateService") {
		t.Fatal("expected MSI fixture to inherit generic top-level critical path")
	}
	if !plan.Tuning.RecoveryPolicy.EnableProfilePlanB {
		t.Fatal("expected MSI fixture to enable profile plan-b")
	}
}

func TestBuildAcquisitionPlan_Fixture_MSI_CG480_CopyMatchesSameProfiles(t *testing.T) {
	originalSignals := loadProfileFixtureSignals(t, "msi-cg480.json")
	copySignals := loadProfileFixtureSignals(t, "msi-cg480-copy.json")
	originalMatch := MatchProfiles(originalSignals)
	copyMatch := MatchProfiles(copySignals)
	originalPlan := BuildAcquisitionPlan(originalSignals)
	copyPlan := BuildAcquisitionPlan(copySignals)
	originalResolved := ResolveAcquisitionPlan(originalMatch, originalPlan, discoveredResourcesFromSignals(originalSignals), originalSignals)
	copyResolved := ResolveAcquisitionPlan(copyMatch, copyPlan, discoveredResourcesFromSignals(copySignals), copySignals)
	assertSameProfileNames(t, originalMatch, copyMatch)
	if originalPlan.Tuning.PrefetchWorkers != copyPlan.Tuning.PrefetchWorkers {
		t.Fatalf("expected same MSI prefetch worker tuning, got %d vs %d", originalPlan.Tuning.PrefetchWorkers, copyPlan.Tuning.PrefetchWorkers)
	}
	if containsString(originalResolved.SeedPaths, "/redfish/v1/Chassis/GPU1") != containsString(copyResolved.SeedPaths, "/redfish/v1/Chassis/GPU1") {
		t.Fatal("expected same MSI GPU chassis seed presence in both fixtures")
	}
}

func TestBuildAcquisitionPlan_Fixture_MSI_CG290(t *testing.T) {
	signals := loadProfileFixtureSignals(t, "msi-cg290.json")
	match := MatchProfiles(signals)
	plan := BuildAcquisitionPlan(signals)
	resolved := ResolveAcquisitionPlan(match, plan, discoveredResourcesFromSignals(signals), signals)
	if match.Mode != ModeMatched {
		t.Fatalf("expected matched mode, got %q", match.Mode)
	}
	assertProfileSelected(t, match, "msi")
	assertProfileSelected(t, match, "ami-family")
	assertProfileNotSelected(t, match, "hgx-topology")
	if plan.Tuning.PrefetchWorkers < 6 {
		t.Fatalf("expected MSI prefetch worker tuning, got %d", plan.Tuning.PrefetchWorkers)
	}
	if !containsString(resolved.SeedPaths, "/redfish/v1/Chassis/GPU1") {
		t.Fatal("expected MSI chassis GPU seed path")
	}
}

func TestBuildAcquisitionPlan_Fixture_Supermicro_HGX(t *testing.T) {
	signals := loadProfileFixtureSignals(t, "supermicro-hgx.json")
	match := MatchProfiles(signals)
	plan := BuildAcquisitionPlan(signals)
	discovered := discoveredResourcesFromSignals(signals)
	discovered.SystemPaths = dedupeSorted(append(discovered.SystemPaths, "/redfish/v1/Systems/HGX_Baseboard_0"))
	resolved := ResolveAcquisitionPlan(match, plan, discovered, signals)
	if match.Mode != ModeMatched {
		t.Fatalf("expected matched mode, got %q", match.Mode)
	}
	assertProfileSelected(t, match, "supermicro")
	assertProfileSelected(t, match, "hgx-topology")
	assertProfileNotSelected(t, match, "msi")
	if plan.Tuning.SnapshotMaxDocuments < 180000 {
		t.Fatalf("expected widened HGX snapshot cap, got %d", plan.Tuning.SnapshotMaxDocuments)
	}
	if plan.Tuning.NVMePostProbeEnabled == nil || *plan.Tuning.NVMePostProbeEnabled {
		t.Fatal("expected HGX fixture to disable NVMe post-probe")
	}
	if !containsString(resolved.SeedPaths, "/redfish/v1/Systems/HGX_Baseboard_0/Processors") {
		t.Fatal("expected HGX baseboard processors seed path")
	}
	if !containsString(resolved.CriticalPaths, "/redfish/v1/Systems/HGX_Baseboard_0/Processors") {
		t.Fatal("expected HGX baseboard processors critical path")
	}
	if !containsString(resolved.Plan.PlanBPaths, "/redfish/v1/Systems/HGX_Baseboard_0/Processors") {
		t.Fatal("expected HGX baseboard processors plan-b path")
	}
	if plan.Tuning.ETABaseline.SnapshotSeconds < 300 {
		t.Fatalf("expected HGX snapshot eta baseline, got %d", plan.Tuning.ETABaseline.SnapshotSeconds)
	}
	if !plan.Tuning.PostProbePolicy.EnableDirectNVMEDiskBayProbe {
		t.Fatal("expected HGX fixture to retain Supermicro direct NVMe disk bay probe policy")
	}
	if !containsString(plan.ScopedPaths.SystemCriticalSuffixes, "/Storage/IntelVROC/Drives") {
		t.Fatal("expected HGX fixture to inherit generic IntelVROC scoped critical suffix")
	}
	if !containsString(plan.ScopedPaths.ChassisCriticalSuffixes, "/Assembly") {
		t.Fatal("expected HGX fixture to inherit generic chassis critical suffixes")
	}
	if !containsString(plan.Tuning.PrefetchPolicy.ExcludeContains, "/Assembly") {
		t.Fatal("expected HGX fixture to inherit generic assembly prefetch exclusion")
	}
	if !plan.Tuning.RecoveryPolicy.EnableProfilePlanB {
		t.Fatal("expected HGX fixture to enable profile plan-b")
	}
}

func TestBuildAcquisitionPlan_Fixture_Supermicro_OAM_NoHGX(t *testing.T) {
	signals := loadProfileFixtureSignals(t, "supermicro-oam-amd.json")
	match := MatchProfiles(signals)
	plan := BuildAcquisitionPlan(signals)
	resolved := ResolveAcquisitionPlan(match, plan, discoveredResourcesFromSignals(signals), signals)
	if match.Mode != ModeMatched {
		t.Fatalf("expected matched mode, got %q", match.Mode)
	}
	assertProfileSelected(t, match, "supermicro")
	assertProfileNotSelected(t, match, "hgx-topology")
	assertProfileNotSelected(t, match, "msi")
	if containsString(resolved.SeedPaths, "/redfish/v1/Systems/HGX_Baseboard_0/Processors") {
		t.Fatal("did not expect HGX baseboard processors seed path for OAM fixture")
	}
	if containsString(resolved.CriticalPaths, "/redfish/v1/Systems/HGX_Baseboard_0/Processors") {
		t.Fatal("did not expect HGX baseboard processors critical path for OAM fixture")
	}
	if !containsString(resolved.CriticalPaths, "/redfish/v1/UpdateService/Oem/Supermicro/FirmwareInventory") {
		t.Fatal("expected Supermicro firmware critical path")
	}
	if !containsString(resolved.Plan.PlanBPaths, "/redfish/v1/UpdateService/Oem/Supermicro/FirmwareInventory") {
		t.Fatal("expected Supermicro firmware plan-b path")
	}
	if plan.Tuning.SnapshotMaxDocuments != 150000 {
		t.Fatalf("expected generic supermicro snapshot cap, got %d", plan.Tuning.SnapshotMaxDocuments)
}
if plan.Tuning.NVMePostProbeEnabled != nil {
t.Fatal("did not expect HGX NVMe tuning for OAM fixture")
}
if plan.Tuning.ETABaseline.SnapshotSeconds < 180 {
t.Fatalf("expected Supermicro snapshot eta baseline, got %d", plan.Tuning.ETABaseline.SnapshotSeconds)
}
if !plan.Tuning.PostProbePolicy.EnableDirectNVMEDiskBayProbe {
t.Fatal("expected Supermicro OAM fixture to use direct NVMe disk bay probe policy")
}
if !plan.Tuning.PostProbePolicy.EnableNumericCollectionProbe {
t.Fatal("expected Supermicro OAM fixture to inherit generic numeric post-probe policy")
}
if !containsString(plan.ScopedPaths.SystemSeedSuffixes, "/Storage/IntelVROC") {
t.Fatal("expected Supermicro OAM fixture to inherit generic IntelVROC scoped seed suffix")
}
if !plan.Tuning.RecoveryPolicy.EnableProfilePlanB {
t.Fatal("expected Supermicro OAM fixture to enable profile plan-b")
}
}
func TestBuildAcquisitionPlan_Fixture_Dell_R750(t *testing.T) {
signals := loadProfileFixtureSignals(t, "dell-r750.json")
match := MatchProfiles(signals)
plan := BuildAcquisitionPlan(signals)
resolved := ResolveAcquisitionPlan(match, plan, DiscoveredResources{
SystemPaths: []string{"/redfish/v1/Systems/System.Embedded.1"},
ChassisPaths: []string{"/redfish/v1/Chassis/System.Embedded.1"},
ManagerPaths: []string{"/redfish/v1/Managers/1", "/redfish/v1/Managers/iDRAC.Embedded.1"},
}, signals)
if match.Mode != ModeMatched {
t.Fatalf("expected matched mode, got %q", match.Mode)
}
assertProfileSelected(t, match, "dell")
assertProfileNotSelected(t, match, "supermicro")
assertProfileNotSelected(t, match, "hgx-topology")
assertProfileNotSelected(t, match, "msi")
if !plan.Tuning.RecoveryPolicy.EnableProfilePlanB {
t.Fatal("expected dell fixture to enable profile plan-b")
}
if !containsString(resolved.SeedPaths, "/redfish/v1/Managers/iDRAC.Embedded.1") {
t.Fatal("expected Dell refinement to add iDRAC manager seed path")
}
if !containsString(resolved.CriticalPaths, "/redfish/v1/Managers/iDRAC.Embedded.1") {
t.Fatal("expected Dell refinement to add iDRAC manager critical path")
}
directives := ResolveAnalysisPlan(match, nil, DiscoveredResources{}, signals).Directives
if !directives.EnableGenericGraphicsControllerDedup {
t.Fatal("expected dell fixture to enable graphics controller dedup")
}
}
func TestBuildAcquisitionPlan_Fixture_AMI_Generic(t *testing.T) {
signals := loadProfileFixtureSignals(t, "ami-generic.json")
match := MatchProfiles(signals)
plan := BuildAcquisitionPlan(signals)
if match.Mode != ModeMatched {
t.Fatalf("expected matched mode, got %q", match.Mode)
}
assertProfileSelected(t, match, "ami-family")
assertProfileNotSelected(t, match, "msi")
assertProfileNotSelected(t, match, "supermicro")
assertProfileNotSelected(t, match, "dell")
assertProfileNotSelected(t, match, "hgx-topology")
if plan.Tuning.PrefetchEnabled == nil || !*plan.Tuning.PrefetchEnabled {
t.Fatal("expected ami-family fixture to force prefetch enabled")
}
if !containsString(plan.SeedPaths, "/redfish/v1/Oem/Ami") {
t.Fatal("expected ami-family fixture seed path /redfish/v1/Oem/Ami")
}
if !containsString(plan.SeedPaths, "/redfish/v1/Oem/Ami/InventoryData/Status") {
t.Fatal("expected ami-family fixture seed path /redfish/v1/Oem/Ami/InventoryData/Status")
}
if !containsString(plan.CriticalPaths, "/redfish/v1/UpdateService") {
t.Fatal("expected ami-family fixture to inherit generic critical path")
}
directives := ResolveAnalysisPlan(match, nil, DiscoveredResources{}, signals).Directives
if !directives.EnableGenericGraphicsControllerDedup {
t.Fatal("expected ami-family fixture to enable graphics controller dedup")
}
}
func TestBuildAcquisitionPlan_Fixture_UnknownVendor(t *testing.T) {
signals := loadProfileFixtureSignals(t, "unknown-vendor.json")
match := MatchProfiles(signals)
plan := BuildAcquisitionPlan(signals)
resolved := ResolveAcquisitionPlan(match, plan, DiscoveredResources{
SystemPaths: []string{"/redfish/v1/Systems/1"},
ChassisPaths: []string{"/redfish/v1/Chassis/1"},
ManagerPaths: []string{"/redfish/v1/Managers/1"},
}, signals)
if match.Mode != ModeFallback {
t.Fatalf("expected fallback mode for unknown vendor, got %q", match.Mode)
}
if len(match.Profiles) == 0 {
t.Fatal("expected fallback to aggregate profiles")
}
for _, profile := range match.Profiles {
if !profile.SafeForFallback() {
t.Fatalf("fallback mode included non-safe profile %q", profile.Name())
}
}
if plan.Tuning.SnapshotMaxDocuments < 180000 {
t.Fatalf("expected fallback to widen snapshot cap, got %d", plan.Tuning.SnapshotMaxDocuments)
}
if plan.Tuning.PrefetchEnabled == nil || !*plan.Tuning.PrefetchEnabled {
t.Fatal("expected fallback fixture to force prefetch enabled")
}
if !containsString(resolved.CriticalPaths, "/redfish/v1/Systems/1") {
t.Fatal("expected fallback resolved critical paths to include discovered system")
}
analysisPlan := ResolveAnalysisPlan(match, nil, DiscoveredResources{}, signals)
if !analysisPlan.Directives.EnableProcessorGPUFallback {
t.Fatal("expected fallback fixture to enable processor GPU fallback")
}
if !analysisPlan.Directives.EnableStorageEnclosureRecovery {
t.Fatal("expected fallback fixture to enable storage enclosure recovery")
}
if !analysisPlan.Directives.EnableGenericGraphicsControllerDedup {
t.Fatal("expected fallback fixture to enable graphics controller dedup")
}
}
func TestBuildAcquisitionPlan_Fixture_xFusion_G5500V7(t *testing.T) {
signals := loadProfileFixtureSignals(t, "xfusion-g5500v7.json")
match := MatchProfiles(signals)
plan := BuildAcquisitionPlan(signals)
resolved := ResolveAcquisitionPlan(match, plan, DiscoveredResources{
SystemPaths: []string{"/redfish/v1/Systems/1"},
ChassisPaths: []string{"/redfish/v1/Chassis/1"},
ManagerPaths: []string{"/redfish/v1/Managers/1"},
}, signals)
if match.Mode != ModeMatched {
t.Fatalf("expected matched mode for xFusion, got %q", match.Mode)
}
assertProfileSelected(t, match, "xfusion")
assertProfileNotSelected(t, match, "supermicro")
assertProfileNotSelected(t, match, "hgx-topology")
assertProfileNotSelected(t, match, "msi")
assertProfileNotSelected(t, match, "dell")
if plan.Tuning.SnapshotMaxDocuments > 150000 {
t.Fatalf("expected xfusion snapshot cap <= 150000, got %d", plan.Tuning.SnapshotMaxDocuments)
}
if plan.Tuning.PrefetchEnabled == nil || !*plan.Tuning.PrefetchEnabled {
t.Fatal("expected xfusion fixture to enable prefetch")
}
if plan.Tuning.ETABaseline.SnapshotSeconds <= 0 {
t.Fatal("expected xfusion snapshot eta baseline")
}
if !containsString(resolved.CriticalPaths, "/redfish/v1/Systems/1") {
t.Fatal("expected system path in critical paths")
}
analysisPlan := ResolveAnalysisPlan(match, map[string]interface{}{
"/redfish/v1/Systems/1/Processors/Gpu1": map[string]interface{}{"ProcessorType": "GPU"},
}, DiscoveredResources{
SystemPaths: []string{"/redfish/v1/Systems/1"},
}, signals)
if !analysisPlan.Directives.EnableProcessorGPUFallback {
t.Fatal("expected xfusion analysis to enable processor GPU fallback when GPU processors present")
}
if !analysisPlan.Directives.EnableGenericGraphicsControllerDedup {
t.Fatal("expected xfusion analysis to enable graphics controller dedup")
}
}
func loadProfileFixtureSignals(t *testing.T, fixtureName string) MatchSignals {
t.Helper()
path := filepath.Join("testdata", fixtureName)
data, err := os.ReadFile(path)
if err != nil {
t.Fatalf("read fixture %s: %v", path, err)
}
var signals MatchSignals
if err := json.Unmarshal(data, &signals); err != nil {
t.Fatalf("decode fixture %s: %v", path, err)
}
return normalizeSignals(signals)
}
func assertProfileSelected(t *testing.T, match MatchResult, want string) {
t.Helper()
for _, profile := range match.Profiles {
if profile.Name() == want {
return
}
}
t.Fatalf("expected profile %q in %v", want, profileNames(match))
}
func assertProfileNotSelected(t *testing.T, match MatchResult, want string) {
t.Helper()
for _, profile := range match.Profiles {
if profile.Name() == want {
t.Fatalf("did not expect profile %q in %v", want, profileNames(match))
}
}
}
func profileNames(match MatchResult) []string {
out := make([]string, 0, len(match.Profiles))
for _, profile := range match.Profiles {
out = append(out, profile.Name())
}
return out
}
func assertSameProfileNames(t *testing.T, left, right MatchResult) {
t.Helper()
leftNames := profileNames(left)
rightNames := profileNames(right)
if len(leftNames) != len(rightNames) {
t.Fatalf("profile stack size differs: %v vs %v", leftNames, rightNames)
}
for i := range leftNames {
if leftNames[i] != rightNames[i] {
t.Fatalf("profile stack differs: %v vs %v", leftNames, rightNames)
}
}
}
func containsString(items []string, want string) bool {
for _, item := range items {
if item == want {
return true
}
}
return false
}
func discoveredResourcesFromSignals(signals MatchSignals) DiscoveredResources {
var discovered DiscoveredResources
for _, hint := range signals.ResourceHints {
memberPath := discoveredMemberPath(hint)
switch {
case strings.HasPrefix(memberPath, "/redfish/v1/Systems/"):
discovered.SystemPaths = append(discovered.SystemPaths, memberPath)
case strings.HasPrefix(memberPath, "/redfish/v1/Chassis/"):
discovered.ChassisPaths = append(discovered.ChassisPaths, memberPath)
case strings.HasPrefix(memberPath, "/redfish/v1/Managers/"):
discovered.ManagerPaths = append(discovered.ManagerPaths, memberPath)
}
}
discovered.SystemPaths = dedupeSorted(discovered.SystemPaths)
discovered.ChassisPaths = dedupeSorted(discovered.ChassisPaths)
discovered.ManagerPaths = dedupeSorted(discovered.ManagerPaths)
return discovered
}
func discoveredMemberPath(path string) string {
path = strings.TrimSpace(path)
if path == "" {
return ""
}
parts := strings.Split(strings.Trim(path, "/"), "/")
if len(parts) < 4 || parts[0] != "redfish" || parts[1] != "v1" {
return ""
}
switch parts[2] {
case "Systems", "Chassis", "Managers":
return "/" + strings.Join(parts[:4], "/")
default:
return ""
}
}

package redfishprofile
import (
"sort"
"git.mchus.pro/mchus/logpile/internal/models"
)
const (
ModeMatched = "matched"
ModeFallback = "fallback"
)
func MatchProfiles(signals MatchSignals) MatchResult {
type scored struct {
profile Profile
score int
}
builtins := BuiltinProfiles()
candidates := make([]scored, 0, len(builtins))
allScores := make([]ProfileScore, 0, len(builtins))
for _, profile := range builtins {
score := profile.Match(signals)
allScores = append(allScores, ProfileScore{
Name: profile.Name(),
Score: score,
Priority: profile.Priority(),
})
if score <= 0 {
continue
}
candidates = append(candidates, scored{profile: profile, score: score})
}
sort.Slice(allScores, func(i, j int) bool {
if allScores[i].Score == allScores[j].Score {
if allScores[i].Priority == allScores[j].Priority {
return allScores[i].Name < allScores[j].Name
}
return allScores[i].Priority < allScores[j].Priority
}
return allScores[i].Score > allScores[j].Score
})
sort.Slice(candidates, func(i, j int) bool {
if candidates[i].score == candidates[j].score {
return candidates[i].profile.Priority() < candidates[j].profile.Priority()
}
return candidates[i].score > candidates[j].score
})
if len(candidates) == 0 || candidates[0].score < 60 {
profiles := make([]Profile, 0, len(builtins))
active := make(map[string]struct{}, len(builtins))
for _, profile := range builtins {
if profile.SafeForFallback() {
profiles = append(profiles, profile)
active[profile.Name()] = struct{}{}
}
}
sortProfiles(profiles)
for i := range allScores {
_, ok := active[allScores[i].Name]
allScores[i].Active = ok
}
return MatchResult{Mode: ModeFallback, Profiles: profiles, Scores: allScores}
}
profiles := make([]Profile, 0, len(candidates))
seen := make(map[string]struct{}, len(candidates))
for _, candidate := range candidates {
name := candidate.profile.Name()
if _, ok := seen[name]; ok {
continue
}
seen[name] = struct{}{}
profiles = append(profiles, candidate.profile)
}
sortProfiles(profiles)
for i := range allScores {
_, ok := seen[allScores[i].Name]
allScores[i].Active = ok
}
return MatchResult{Mode: ModeMatched, Profiles: profiles, Scores: allScores}
}
func BuildAcquisitionPlan(signals MatchSignals) AcquisitionPlan {
match := MatchProfiles(signals)
plan := AcquisitionPlan{Mode: match.Mode}
for _, profile := range match.Profiles {
plan.Profiles = append(plan.Profiles, profile.Name())
profile.ExtendAcquisitionPlan(&plan, signals)
}
plan.Profiles = dedupeSorted(plan.Profiles)
plan.SeedPaths = dedupeSorted(plan.SeedPaths)
plan.CriticalPaths = dedupeSorted(plan.CriticalPaths)
plan.PlanBPaths = dedupeSorted(plan.PlanBPaths)
plan.Notes = dedupeSorted(plan.Notes)
if plan.Mode == ModeFallback {
ensureSnapshotMaxDocuments(&plan, 180000)
ensurePrefetchEnabled(&plan, true)
addPlanNote(&plan, "fallback acquisition expands safe profile probes")
}
return plan
}
func ApplyAnalysisProfiles(result *models.AnalysisResult, snapshot map[string]interface{}, signals MatchSignals) MatchResult {
match := MatchProfiles(signals)
for _, profile := range match.Profiles {
profile.PostAnalyze(result, snapshot, signals)
}
return match
}
func BuildAnalysisDirectives(match MatchResult) AnalysisDirectives {
return ResolveAnalysisPlan(match, nil, DiscoveredResources{}, MatchSignals{}).Directives
}
func sortProfiles(profiles []Profile) {
sort.Slice(profiles, func(i, j int) bool {
if profiles[i].Priority() == profiles[j].Priority() {
return profiles[i].Name() < profiles[j].Name()
}
return profiles[i].Priority() < profiles[j].Priority()
})
}

package redfishprofile
import (
"strings"
"testing"
)
func TestMatchProfiles_UnknownVendorFallsBackToAggregateProfiles(t *testing.T) {
match := MatchProfiles(MatchSignals{
ServiceRootProduct: "Redfish Server",
})
if match.Mode != ModeFallback {
t.Fatalf("expected fallback mode, got %q", match.Mode)
}
if len(match.Profiles) < 2 {
t.Fatalf("expected aggregated fallback profiles, got %d", len(match.Profiles))
}
}
func TestMatchProfiles_MSISelectsMatchedMode(t *testing.T) {
match := MatchProfiles(MatchSignals{
SystemManufacturer: "Micro-Star International Co., Ltd.",
ResourceHints: []string{"/redfish/v1/Chassis/GPU1"},
})
if match.Mode != ModeMatched {
t.Fatalf("expected matched mode, got %q", match.Mode)
}
found := false
for _, profile := range match.Profiles {
if profile.Name() == "msi" {
found = true
break
}
}
if !found {
t.Fatal("expected msi profile to be selected")
}
}
func TestBuildAcquisitionPlan_FallbackIncludesProfileNotes(t *testing.T) {
plan := BuildAcquisitionPlan(MatchSignals{
ServiceRootVendor: "AMI",
})
if len(plan.Profiles) == 0 {
t.Fatal("expected acquisition plan profiles")
}
if len(plan.Notes) == 0 {
t.Fatal("expected acquisition plan notes")
}
}
func TestBuildAcquisitionPlan_FallbackAddsBroadCrawlTuning(t *testing.T) {
plan := BuildAcquisitionPlan(MatchSignals{
ServiceRootProduct: "Unknown Redfish",
})
if plan.Mode != ModeFallback {
t.Fatalf("expected fallback mode, got %q", plan.Mode)
}
if plan.Tuning.SnapshotMaxDocuments < 180000 {
t.Fatalf("expected widened snapshot cap, got %d", plan.Tuning.SnapshotMaxDocuments)
}
if plan.Tuning.PrefetchEnabled == nil || !*plan.Tuning.PrefetchEnabled {
t.Fatal("expected fallback to force prefetch enabled")
}
if !plan.Tuning.RecoveryPolicy.EnableCriticalCollectionMemberRetry {
t.Fatal("expected fallback to inherit critical member retry recovery")
}
if !plan.Tuning.RecoveryPolicy.EnableCriticalSlowProbe {
t.Fatal("expected fallback to inherit critical slow probe recovery")
}
}
func TestBuildAcquisitionPlan_HGXDisablesNVMePostProbe(t *testing.T) {
plan := BuildAcquisitionPlan(MatchSignals{
SystemModel: "HGX B200",
ResourceHints: []string{"/redfish/v1/Systems/HGX_Baseboard_0"},
})
if plan.Mode != ModeMatched {
t.Fatalf("expected matched mode, got %q", plan.Mode)
}
if plan.Tuning.NVMePostProbeEnabled == nil || *plan.Tuning.NVMePostProbeEnabled {
t.Fatal("expected hgx profile to disable NVMe post-probe")
}
}
func TestResolveAcquisitionPlan_ExpandsScopedPaths(t *testing.T) {
signals := MatchSignals{}
match := MatchProfiles(signals)
plan := BuildAcquisitionPlan(signals)
resolved := ResolveAcquisitionPlan(match, plan, DiscoveredResources{
SystemPaths: []string{"/redfish/v1/Systems/1", "/redfish/v1/Systems/2"},
}, signals)
joined := joinResolvedPaths(resolved.SeedPaths)
for _, wanted := range []string{
"/redfish/v1/Systems/1/SimpleStorage",
"/redfish/v1/Systems/1/Storage/IntelVROC",
"/redfish/v1/Systems/2/SimpleStorage",
"/redfish/v1/Systems/2/Storage/IntelVROC",
} {
if !containsJoinedPath(joined, wanted) {
t.Fatalf("expected resolved seed path %q", wanted)
}
}
}
func TestResolveAcquisitionPlan_CriticalBaselineIsShapedByProfiles(t *testing.T) {
signals := MatchSignals{}
match := MatchProfiles(signals)
plan := BuildAcquisitionPlan(signals)
resolved := ResolveAcquisitionPlan(match, plan, DiscoveredResources{
SystemPaths: []string{"/redfish/v1/Systems/1"},
ChassisPaths: []string{"/redfish/v1/Chassis/1"},
ManagerPaths: []string{"/redfish/v1/Managers/1"},
}, signals)
joined := joinResolvedPaths(resolved.CriticalPaths)
for _, wanted := range []string{
"/redfish/v1",
"/redfish/v1/Systems/1",
"/redfish/v1/Systems/1/Memory",
"/redfish/v1/Chassis/1/Assembly",
"/redfish/v1/Managers/1/NetworkProtocol",
"/redfish/v1/UpdateService",
} {
if !containsJoinedPath(joined, wanted) {
t.Fatalf("expected resolved critical path %q", wanted)
}
}
}
func TestResolveAcquisitionPlan_FallbackAppendsPlanBToSeeds(t *testing.T) {
signals := MatchSignals{ServiceRootProduct: "Unknown Redfish"}
match := MatchProfiles(signals)
plan := BuildAcquisitionPlan(signals)
if plan.Mode != ModeFallback {
t.Fatalf("expected fallback mode, got %q", plan.Mode)
}
plan.PlanBPaths = append(plan.PlanBPaths, "/redfish/v1/Systems/1/Oem/TestPlanB")
resolved := ResolveAcquisitionPlan(match, plan, DiscoveredResources{
SystemPaths: []string{"/redfish/v1/Systems/1"},
}, signals)
if !containsJoinedPath(joinResolvedPaths(resolved.SeedPaths), "/redfish/v1/Systems/1/Oem/TestPlanB") {
t.Fatal("expected fallback resolved seeds to include plan-b path")
}
}
func TestResolveAcquisitionPlan_MSIRefinesDiscoveredGPUChassis(t *testing.T) {
signals := MatchSignals{
SystemManufacturer: "Micro-Star International Co., Ltd.",
ResourceHints: []string{"/redfish/v1/Chassis/GPU1", "/redfish/v1/Chassis/GPU4/Sensors"},
}
match := MatchProfiles(signals)
plan := BuildAcquisitionPlan(signals)
resolved := ResolveAcquisitionPlan(match, plan, DiscoveredResources{
ChassisPaths: []string{"/redfish/v1/Chassis/1", "/redfish/v1/Chassis/GPU1", "/redfish/v1/Chassis/GPU4"},
}, signals)
joinedSeeds := joinResolvedPaths(resolved.SeedPaths)
joinedCritical := joinResolvedPaths(resolved.CriticalPaths)
if !containsJoinedPath(joinedSeeds, "/redfish/v1/Chassis/GPU1") || !containsJoinedPath(joinedSeeds, "/redfish/v1/Chassis/GPU4") {
t.Fatal("expected MSI refinement to add discovered GPU chassis seed paths")
}
if containsJoinedPath(joinedSeeds, "/redfish/v1/Chassis/GPU2") {
t.Fatal("did not expect undiscovered MSI GPU chassis in resolved seeds")
}
if !containsJoinedPath(joinedCritical, "/redfish/v1/Chassis/GPU1/Sensors") || !containsJoinedPath(joinedCritical, "/redfish/v1/Chassis/GPU4/Sensors") {
t.Fatal("expected MSI refinement to add discovered GPU sensor critical paths")
}
if containsJoinedPath(joinedCritical, "/redfish/v1/Chassis/GPU3/Sensors") {
t.Fatal("did not expect undiscovered MSI GPU sensor critical path")
}
}
func TestResolveAcquisitionPlan_HGXRefinesDiscoveredBaseboardSystems(t *testing.T) {
signals := MatchSignals{
SystemManufacturer: "Supermicro",
SystemModel: "SYS-821GE-TNHR",
ChassisModel: "HGX B200",
ResourceHints: []string{
"/redfish/v1/Systems/HGX_Baseboard_0",
"/redfish/v1/Systems/HGX_Baseboard_0/Processors",
"/redfish/v1/Systems/1",
},
}
match := MatchProfiles(signals)
plan := BuildAcquisitionPlan(signals)
resolved := ResolveAcquisitionPlan(match, plan, DiscoveredResources{
SystemPaths: []string{"/redfish/v1/Systems/1", "/redfish/v1/Systems/HGX_Baseboard_0"},
}, signals)
joinedSeeds := joinResolvedPaths(resolved.SeedPaths)
joinedCritical := joinResolvedPaths(resolved.CriticalPaths)
if !containsJoinedPath(joinedSeeds, "/redfish/v1/Systems/HGX_Baseboard_0") || !containsJoinedPath(joinedSeeds, "/redfish/v1/Systems/HGX_Baseboard_0/Processors") {
t.Fatal("expected HGX refinement to add discovered baseboard system paths")
}
if !containsJoinedPath(joinedCritical, "/redfish/v1/Systems/HGX_Baseboard_0") || !containsJoinedPath(joinedCritical, "/redfish/v1/Systems/HGX_Baseboard_0/Processors") {
t.Fatal("expected HGX refinement to add discovered baseboard critical paths")
}
if containsJoinedPath(joinedSeeds, "/redfish/v1/Systems/HGX_Baseboard_1") {
t.Fatal("did not expect undiscovered HGX baseboard system path")
}
}
func TestResolveAcquisitionPlan_SupermicroRefinesFirmwareInventoryFromHint(t *testing.T) {
signals := MatchSignals{
SystemManufacturer: "Supermicro",
ResourceHints: []string{
"/redfish/v1/UpdateService/Oem/Supermicro/FirmwareInventory",
"/redfish/v1/Managers/1/Oem/Supermicro/FanMode",
},
}
match := MatchProfiles(signals)
plan := BuildAcquisitionPlan(signals)
resolved := ResolveAcquisitionPlan(match, plan, DiscoveredResources{
ManagerPaths: []string{"/redfish/v1/Managers/1"},
}, signals)
joinedCritical := joinResolvedPaths(resolved.CriticalPaths)
if !containsJoinedPath(joinedCritical, "/redfish/v1/UpdateService/Oem/Supermicro/FirmwareInventory") {
t.Fatal("expected Supermicro refinement to add firmware inventory critical path")
}
if !containsJoinedPath(joinResolvedPaths(resolved.Plan.PlanBPaths), "/redfish/v1/UpdateService/Oem/Supermicro/FirmwareInventory") {
t.Fatal("expected Supermicro refinement to add firmware inventory plan-b path")
}
}
func TestResolveAcquisitionPlan_DellRefinesDiscoveredIDRACManager(t *testing.T) {
signals := MatchSignals{
SystemManufacturer: "Dell Inc.",
ServiceRootProduct: "iDRAC Redfish Service",
}
match := MatchProfiles(signals)
plan := BuildAcquisitionPlan(signals)
resolved := ResolveAcquisitionPlan(match, plan, DiscoveredResources{
ManagerPaths: []string{"/redfish/v1/Managers/1", "/redfish/v1/Managers/iDRAC.Embedded.1"},
}, signals)
joinedSeeds := joinResolvedPaths(resolved.SeedPaths)
joinedCritical := joinResolvedPaths(resolved.CriticalPaths)
if !containsJoinedPath(joinedSeeds, "/redfish/v1/Managers/iDRAC.Embedded.1") {
t.Fatal("expected Dell refinement to add discovered iDRAC manager seed path")
}
if !containsJoinedPath(joinedCritical, "/redfish/v1/Managers/iDRAC.Embedded.1") {
t.Fatal("expected Dell refinement to add discovered iDRAC manager critical path")
}
}
func TestBuildAnalysisDirectives_SupermicroEnablesVendorStorageFallbacks(t *testing.T) {
signals := MatchSignals{
SystemManufacturer: "Supermicro",
SystemModel: "SYS-821GE",
}
match := MatchProfiles(signals)
plan := ResolveAnalysisPlan(match, map[string]interface{}{
"/redfish/v1/Chassis/NVMeSSD.1.StorageBackplane/Drives": map[string]interface{}{},
}, DiscoveredResources{}, signals)
directives := plan.Directives
if !directives.EnableSupermicroNVMeBackplane {
t.Fatal("expected supermicro nvme backplane fallback")
}
}
func joinResolvedPaths(paths []string) string {
return "\n" + strings.Join(paths, "\n") + "\n"
}
func containsJoinedPath(joined, want string) bool {
return strings.Contains(joined, "\n"+want+"\n")
}
func TestBuildAnalysisDirectives_HGXEnablesGPUFallbacks(t *testing.T) {
signals := MatchSignals{
SystemManufacturer: "Supermicro",
SystemModel: "SYS-821GE-TNHR",
ChassisModel: "HGX B200",
ResourceHints: []string{"/redfish/v1/Systems/HGX_Baseboard_0", "/redfish/v1/Chassis/HGX_Chassis_0/PCIeDevices/GPU_SXM_1"},
}
match := MatchProfiles(signals)
plan := ResolveAnalysisPlan(match, map[string]interface{}{
"/redfish/v1/Systems/HGX_Baseboard_0/Processors/GPU_SXM_1": map[string]interface{}{"ProcessorType": "GPU"},
"/redfish/v1/Chassis/HGX_Chassis_0/PCIeDevices/GPU_SXM_1": map[string]interface{}{},
}, DiscoveredResources{
SystemPaths: []string{"/redfish/v1/Systems/HGX_Baseboard_0"},
}, signals)
directives := plan.Directives
if !directives.EnableProcessorGPUFallback {
t.Fatal("expected processor GPU fallback for hgx profile")
}
if !directives.EnableProcessorGPUChassisAlias {
t.Fatal("expected processor GPU chassis alias resolution for hgx profile")
}
if !directives.EnableGenericGraphicsControllerDedup {
t.Fatal("expected graphics-controller dedup for hgx profile")
}
}
func TestBuildAnalysisDirectives_MSIEnablesMSIChassisLookup(t *testing.T) {
signals := MatchSignals{
SystemManufacturer: "Micro-Star International Co., Ltd.",
}
match := MatchProfiles(signals)
plan := ResolveAnalysisPlan(match, map[string]interface{}{
"/redfish/v1/Systems/1/Processors/GPU1": map[string]interface{}{"ProcessorType": "GPU"},
"/redfish/v1/Chassis/GPU1": map[string]interface{}{},
}, DiscoveredResources{
SystemPaths: []string{"/redfish/v1/Systems/1"},
ChassisPaths: []string{"/redfish/v1/Chassis/GPU1"},
}, signals)
directives := plan.Directives
if !directives.EnableMSIProcessorGPUChassisLookup {
t.Fatal("expected MSI processor GPU chassis lookup")
}
}
func TestBuildAnalysisDirectives_SupermicroEnablesStorageRecovery(t *testing.T) {
signals := MatchSignals{
SystemManufacturer: "Supermicro",
}
match := MatchProfiles(signals)
plan := ResolveAnalysisPlan(match, map[string]interface{}{
"/redfish/v1/Chassis/1/Drives": map[string]interface{}{},
"/redfish/v1/Systems/1/Storage/IntelVROC": map[string]interface{}{},
"/redfish/v1/Systems/1/Storage/IntelVROC/Drives": map[string]interface{}{},
}, DiscoveredResources{}, signals)
directives := plan.Directives
if !directives.EnableStorageEnclosureRecovery {
t.Fatal("expected storage enclosure recovery for supermicro")
}
if !directives.EnableKnownStorageControllerRecovery {
t.Fatal("expected known storage controller recovery for supermicro")
}
}
func TestMatchProfiles_OrderingIsDeterministic(t *testing.T) {
signals := MatchSignals{
SystemManufacturer: "Micro-Star International Co., Ltd.",
ResourceHints: []string{"/redfish/v1/Chassis/GPU1"},
}
first := MatchProfiles(signals)
second := MatchProfiles(signals)
if len(first.Profiles) != len(second.Profiles) {
t.Fatalf("profile stack size differs across calls: %d vs %d", len(first.Profiles), len(second.Profiles))
}
for i := range first.Profiles {
if first.Profiles[i].Name() != second.Profiles[i].Name() {
t.Fatalf("profile ordering differs at index %d: %q vs %q", i, first.Profiles[i].Name(), second.Profiles[i].Name())
}
}
}
func TestMatchProfiles_FallbackOrderingIsDeterministic(t *testing.T) {
signals := MatchSignals{ServiceRootProduct: "Unknown Redfish"}
first := MatchProfiles(signals)
second := MatchProfiles(signals)
if first.Mode != ModeFallback || second.Mode != ModeFallback {
t.Fatal("expected fallback mode in both calls")
}
if len(first.Profiles) != len(second.Profiles) {
t.Fatalf("fallback profile stack size differs: %d vs %d", len(first.Profiles), len(second.Profiles))
}
for i := range first.Profiles {
if first.Profiles[i].Name() != second.Profiles[i].Name() {
t.Fatalf("fallback profile ordering differs at index %d: %q vs %q", i, first.Profiles[i].Name(), second.Profiles[i].Name())
}
}
}
func TestMatchProfiles_FallbackOnlySelectsSafeProfiles(t *testing.T) {
match := MatchProfiles(MatchSignals{ServiceRootProduct: "Unknown Generic Redfish Server"})
if match.Mode != ModeFallback {
t.Fatalf("expected fallback mode, got %q", match.Mode)
}
for _, profile := range match.Profiles {
if !profile.SafeForFallback() {
t.Fatalf("fallback mode included non-safe profile %q", profile.Name())
}
}
}
func TestBuildAnalysisDirectives_GenericMatchedKeepsFallbacksDisabled(t *testing.T) {
match := MatchResult{
Mode: ModeMatched,
Profiles: []Profile{genericProfile()},
}
directives := ResolveAnalysisPlan(match, nil, DiscoveredResources{}, MatchSignals{}).Directives
if directives.EnableProcessorGPUFallback {
t.Fatal("did not expect processor GPU fallback for generic matched profile")
}
if directives.EnableSupermicroNVMeBackplane {
t.Fatal("did not expect supermicro nvme fallback for generic matched profile")
}
if directives.EnableGenericGraphicsControllerDedup {
t.Fatal("did not expect generic graphics-controller dedup for generic matched profile")
}
}

package redfishprofile
func amiProfile() Profile {
return staticProfile{
name: "ami-family",
priority: 10,
safeForFallback: true,
matchFn: func(s MatchSignals) int {
score := 0
if containsFold(s.ServiceRootVendor, "ami") || containsFold(s.ServiceRootProduct, "ami") {
score += 70
}
for _, ns := range s.OEMNamespaces {
if containsFold(ns, "ami") {
score += 30
break
}
}
return min(score, 100)
},
extendAcquisition: func(plan *AcquisitionPlan, _ MatchSignals) {
addPlanPaths(&plan.SeedPaths,
"/redfish/v1/Oem/Ami",
"/redfish/v1/Oem/Ami/InventoryData/Status",
)
ensurePrefetchEnabled(plan, true)
addPlanNote(plan, "ami-family acquisition extensions enabled")
},
applyAnalysisDirectives: func(d *AnalysisDirectives, _ MatchSignals) {
d.EnableGenericGraphicsControllerDedup = true
},
}
}

package redfishprofile
func dellProfile() Profile {
return staticProfile{
name: "dell",
priority: 20,
safeForFallback: true,
matchFn: func(s MatchSignals) int {
score := 0
if containsFold(s.SystemManufacturer, "dell") || containsFold(s.ChassisManufacturer, "dell") {
score += 80
}
for _, ns := range s.OEMNamespaces {
if containsFold(ns, "dell") {
score += 30
break
}
}
if containsFold(s.ServiceRootProduct, "idrac") {
score += 30
}
return min(score, 100)
},
extendAcquisition: func(plan *AcquisitionPlan, _ MatchSignals) {
ensureRecoveryPolicy(plan, AcquisitionRecoveryPolicy{
EnableProfilePlanB: true,
})
addPlanNote(plan, "dell iDRAC acquisition extensions enabled")
},
refineAcquisition: func(resolved *ResolvedAcquisitionPlan, discovered DiscoveredResources, _ MatchSignals) {
for _, managerPath := range discovered.ManagerPaths {
if !containsFold(managerPath, "idrac") {
continue
}
addPlanPaths(&resolved.SeedPaths, managerPath)
addPlanPaths(&resolved.Plan.SeedPaths, managerPath)
addPlanPaths(&resolved.CriticalPaths, managerPath)
addPlanPaths(&resolved.Plan.CriticalPaths, managerPath)
}
},
applyAnalysisDirectives: func(d *AnalysisDirectives, _ MatchSignals) {
d.EnableGenericGraphicsControllerDedup = true
},
}
}

@@ -0,0 +1,116 @@
package redfishprofile
func genericProfile() Profile {
return staticProfile{
name: "generic",
priority: 100,
safeForFallback: true,
matchFn: func(MatchSignals) int { return 10 },
extendAcquisition: func(plan *AcquisitionPlan, _ MatchSignals) {
ensurePrefetchPolicy(plan, AcquisitionPrefetchPolicy{
IncludeSuffixes: []string{
"/Bios",
"/Processors",
"/Memory",
"/Storage",
"/SimpleStorage",
"/PCIeDevices",
"/PCIeFunctions",
"/Accelerators",
"/GraphicsControllers",
"/EthernetInterfaces",
"/NetworkInterfaces",
"/NetworkAdapters",
"/Drives",
"/Power",
"/PowerSubsystem/PowerSupplies",
"/NetworkProtocol",
"/UpdateService",
"/UpdateService/FirmwareInventory",
},
ExcludeContains: []string{
"/Fabrics",
"/Backplanes",
"/Boards",
"/Assembly",
"/Sensors",
"/ThresholdSensors",
"/DiscreteSensors",
"/ThermalConfig",
"/ThermalSubsystem",
"/EnvironmentMetrics",
"/Certificates",
"/LogServices",
},
})
ensureScopedPathPolicy(plan, AcquisitionScopedPathPolicy{
SystemCriticalSuffixes: []string{
"/Bios",
"/Oem/Public",
"/Oem/Public/FRU",
"/Processors",
"/Memory",
"/Storage",
"/PCIeDevices",
"/PCIeFunctions",
"/Accelerators",
"/GraphicsControllers",
"/EthernetInterfaces",
"/NetworkInterfaces",
"/SimpleStorage",
"/Storage/IntelVROC",
"/Storage/IntelVROC/Drives",
"/Storage/IntelVROC/Volumes",
},
ChassisCriticalSuffixes: []string{
"/Oem/Public",
"/Oem/Public/FRU",
"/Power",
"/NetworkAdapters",
"/PCIeDevices",
"/Accelerators",
"/Drives",
"/Assembly",
},
ManagerCriticalSuffixes: []string{
"/NetworkProtocol",
},
SystemSeedSuffixes: []string{
"/SimpleStorage",
"/Storage/IntelVROC",
"/Storage/IntelVROC/Drives",
"/Storage/IntelVROC/Volumes",
},
})
addPlanPaths(&plan.CriticalPaths,
"/redfish/v1/UpdateService",
"/redfish/v1/UpdateService/FirmwareInventory",
)
ensureSnapshotMaxDocuments(plan, 100000)
ensureSnapshotWorkers(plan, 6)
ensurePrefetchWorkers(plan, 4)
ensureETABaseline(plan, AcquisitionETABaseline{
DiscoverySeconds: 8,
SnapshotSeconds: 90,
PrefetchSeconds: 20,
CriticalPlanBSeconds: 20,
ProfilePlanBSeconds: 15,
})
ensurePostProbePolicy(plan, AcquisitionPostProbePolicy{
EnableNumericCollectionProbe: true,
})
ensureRecoveryPolicy(plan, AcquisitionRecoveryPolicy{
EnableCriticalCollectionMemberRetry: true,
EnableCriticalSlowProbe: true,
EnableEmptyCriticalCollectionRetry: true,
})
ensureRatePolicy(plan, AcquisitionRatePolicy{
TargetP95LatencyMS: 900,
ThrottleP95LatencyMS: 1800,
MinSnapshotWorkers: 2,
MinPrefetchWorkers: 1,
DisablePrefetchOnErrors: true,
})
},
}
}
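The prefetch policy above acts as an allow/deny filter over resource paths: a path qualifies when it ends with one of the `IncludeSuffixes` and contains none of the `ExcludeContains` substrings. A standalone sketch of that selection rule (the function name and exact precedence are assumptions for illustration, not the package's actual implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// shouldPrefetch keeps a path only if it matches an include suffix
// and none of the exclude substrings, in that order.
func shouldPrefetch(path string, includeSuffixes, excludeContains []string) bool {
	included := false
	for _, s := range includeSuffixes {
		if strings.HasSuffix(path, s) {
			included = true
			break
		}
	}
	if !included {
		return false
	}
	for _, sub := range excludeContains {
		if strings.Contains(path, sub) {
			return false
		}
	}
	return true
}

func main() {
	include := []string{"/Processors", "/Memory"}
	exclude := []string{"/LogServices"}
	fmt.Println(shouldPrefetch("/redfish/v1/Systems/1/Processors", include, exclude))
	fmt.Println(shouldPrefetch("/redfish/v1/Managers/1/LogServices/Memory", include, exclude))
}
```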

@@ -0,0 +1,85 @@
package redfishprofile
func hgxProfile() Profile {
return staticProfile{
name: "hgx-topology",
priority: 30,
safeForFallback: true,
matchFn: func(s MatchSignals) int {
score := 0
if containsFold(s.SystemModel, "hgx") || containsFold(s.ChassisModel, "hgx") {
score += 70
}
for _, hint := range s.ResourceHints {
if containsFold(hint, "hgx_") || containsFold(hint, "gpu_sxm") {
score += 20
break
}
}
return min(score, 100)
},
extendAcquisition: func(plan *AcquisitionPlan, _ MatchSignals) {
ensureSnapshotMaxDocuments(plan, 180000)
ensureSnapshotWorkers(plan, 4)
ensurePrefetchWorkers(plan, 4)
ensureNVMePostProbeEnabled(plan, false)
ensureRecoveryPolicy(plan, AcquisitionRecoveryPolicy{
EnableProfilePlanB: true,
})
ensureETABaseline(plan, AcquisitionETABaseline{
DiscoverySeconds: 20,
SnapshotSeconds: 300,
PrefetchSeconds: 50,
CriticalPlanBSeconds: 90,
ProfilePlanBSeconds: 40,
})
ensureRatePolicy(plan, AcquisitionRatePolicy{
TargetP95LatencyMS: 1500,
ThrottleP95LatencyMS: 3000,
MinSnapshotWorkers: 1,
MinPrefetchWorkers: 1,
DisablePrefetchOnErrors: true,
})
addPlanNote(plan, "hgx topology acquisition extensions enabled")
},
refineAcquisition: func(resolved *ResolvedAcquisitionPlan, discovered DiscoveredResources, _ MatchSignals) {
for _, systemPath := range discovered.SystemPaths {
if !containsFold(systemPath, "hgx_baseboard_") {
continue
}
addPlanPaths(&resolved.SeedPaths, systemPath, joinPath(systemPath, "/Processors"))
addPlanPaths(&resolved.Plan.SeedPaths, systemPath, joinPath(systemPath, "/Processors"))
addPlanPaths(&resolved.CriticalPaths, systemPath, joinPath(systemPath, "/Processors"))
addPlanPaths(&resolved.Plan.CriticalPaths, systemPath, joinPath(systemPath, "/Processors"))
addPlanPaths(&resolved.Plan.PlanBPaths, systemPath, joinPath(systemPath, "/Processors"))
}
},
applyAnalysisDirectives: func(d *AnalysisDirectives, _ MatchSignals) {
d.EnableGenericGraphicsControllerDedup = true
d.EnableStorageEnclosureRecovery = true
},
refineAnalysis: func(plan *ResolvedAnalysisPlan, snapshot map[string]interface{}, discovered DiscoveredResources, _ MatchSignals) {
if snapshotHasGPUProcessor(snapshot, discovered.SystemPaths) && (snapshotHasPathContaining(snapshot, "gpu_sxm") || snapshotHasPathContaining(snapshot, "hgx_")) {
plan.Directives.EnableProcessorGPUFallback = true
plan.Directives.EnableProcessorGPUChassisAlias = true
addAnalysisLookupMode(plan, "hgx-alias")
addAnalysisNote(plan, "hgx analysis enables processor-gpu alias fallback from snapshot topology")
}
if snapshotHasStorageControllerHint(snapshot, "/storage/intelvroc", "/storage/ha-raid", "/storage/mrvl.ha-raid") {
plan.Directives.EnableKnownStorageControllerRecovery = true
addAnalysisStorageDriveCollections(plan,
"/Storage/IntelVROC/Drives",
"/Storage/IntelVROC/Controllers/1/Drives",
)
addAnalysisStorageVolumeCollections(plan,
"/Storage/IntelVROC/Volumes",
"/Storage/HA-RAID/Volumes",
"/Storage/MRVL.HA-RAID/Volumes",
)
}
if snapshotHasPathContaining(snapshot, "/chassis/nvmessd.") && snapshotHasPathContaining(snapshot, ".storagebackplane") {
plan.Directives.EnableSupermicroNVMeBackplane = true
}
},
}
}

@@ -0,0 +1,67 @@
package redfishprofile
func hpeProfile() Profile {
return staticProfile{
name: "hpe",
priority: 20,
safeForFallback: true,
matchFn: func(s MatchSignals) int {
score := 0
if containsFold(s.SystemManufacturer, "hpe") ||
containsFold(s.SystemManufacturer, "hewlett packard") ||
containsFold(s.ChassisManufacturer, "hpe") ||
containsFold(s.ChassisManufacturer, "hewlett packard") {
score += 80
}
for _, ns := range s.OEMNamespaces {
if containsFold(ns, "hpe") {
score += 30
break
}
}
if containsFold(s.ServiceRootProduct, "ilo") {
score += 30
}
if containsFold(s.ManagerManufacturer, "hpe") || containsFold(s.ManagerManufacturer, "ilo") {
score += 20
}
return min(score, 100)
},
extendAcquisition: func(plan *AcquisitionPlan, _ MatchSignals) {
// HPE ProLiant SmartStorage RAID controller inventory is not reachable
// via standard Redfish Storage paths — it requires the HPE OEM SmartStorage tree.
ensureScopedPathPolicy(plan, AcquisitionScopedPathPolicy{
SystemCriticalSuffixes: []string{
"/SmartStorage",
"/SmartStorageConfig",
},
ManagerCriticalSuffixes: []string{
"/LicenseService",
},
})
// HPE iLO responds more slowly than average BMCs under load; give the
// ETA estimator a realistic baseline so progress reports are accurate.
ensureETABaseline(plan, AcquisitionETABaseline{
DiscoverySeconds: 12,
SnapshotSeconds: 180,
PrefetchSeconds: 30,
CriticalPlanBSeconds: 40,
ProfilePlanBSeconds: 25,
})
ensureRecoveryPolicy(plan, AcquisitionRecoveryPolicy{
EnableProfilePlanB: true,
})
// HPE iLO starts throttling under high request rates. Setting a higher
// latency tolerance prevents the adaptive throttler from treating normal
// iLO slowness as a reason to stall the collection.
ensureRatePolicy(plan, AcquisitionRatePolicy{
TargetP95LatencyMS: 1200,
ThrottleP95LatencyMS: 2500,
MinSnapshotWorkers: 2,
MinPrefetchWorkers: 1,
DisablePrefetchOnErrors: true,
})
addPlanNote(plan, "hpe ilo acquisition extensions enabled")
},
}
}
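The rate-policy comments above imply a p95-latency feedback loop: below the target latency the collector may add workers, above the throttle threshold it sheds them, but never below the configured floor. A hedged sketch of such a controller step, using the HPE numbers; the actual throttling algorithm is an assumption, not this package's code:

```go
package main

import "fmt"

// throttleDecision returns the next worker count for one control step:
// scale down when p95 latency crosses the throttle threshold (respecting
// the minimum), scale up when comfortably under target, else hold.
// Illustrative only - the real adaptive throttler may differ.
func throttleDecision(p95ms, targetMS, throttleMS, workers, minWorkers int) int {
	switch {
	case p95ms >= throttleMS && workers > minWorkers:
		return workers - 1
	case p95ms < targetMS:
		return workers + 1
	default:
		return workers
	}
}

func main() {
	// HPE policy: target 1200ms, throttle 2500ms, min 2 snapshot workers.
	fmt.Println(throttleDecision(3000, 1200, 2500, 4, 2)) // sheds one worker
	fmt.Println(throttleDecision(3000, 1200, 2500, 2, 2)) // holds at the floor
	fmt.Println(throttleDecision(800, 1200, 2500, 2, 2))  // scales up
}
```

Raising `ThrottleP95LatencyMS` to 2500 as the profile does widens the "hold" band, so routine iLO slowness does not trigger the scale-down branch.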

@@ -0,0 +1,149 @@
package redfishprofile
import (
"regexp"
"strings"
)
var (
outboardCardHintRe = regexp.MustCompile(`/outboardPCIeCard\d+(?:/|$)`)
obDriveHintRe = regexp.MustCompile(`/Drives/OB\d+$`)
fpDriveHintRe = regexp.MustCompile(`/Drives/FP00HDD\d+$`)
vrFirmwareHintRe = regexp.MustCompile(`^CPU\d+_PVCC.*_VR$`)
)
var inspurGroupOEMFirmwareHints = map[string]struct{}{
"Front_HDD_CPLD0": {},
"MainBoard0CPLD": {},
"MainBoardCPLD": {},
"PDBBoardCPLD": {},
"SCMCPLD": {},
"SWBoardCPLD": {},
}
func inspurGroupOEMPlatformsProfile() Profile {
return staticProfile{
name: "inspur-group-oem-platforms",
priority: 25,
safeForFallback: false,
matchFn: func(s MatchSignals) int {
topologyScore := 0
boardScore := 0
chassisOutboard := matchedPathTokens(s.ResourceHints, "/redfish/v1/Chassis/", outboardCardHintRe)
systemOutboard := matchedPathTokens(s.ResourceHints, "/redfish/v1/Systems/", outboardCardHintRe)
obDrives := matchedPathTokens(s.ResourceHints, "", obDriveHintRe)
fpDrives := matchedPathTokens(s.ResourceHints, "", fpDriveHintRe)
firmwareNames, vrFirmwareNames := inspurGroupOEMFirmwareMatches(s.ResourceHints)
if len(chassisOutboard) > 0 {
topologyScore += 20
}
if len(systemOutboard) > 0 {
topologyScore += 10
}
if len(obDrives) > 0 && len(fpDrives) > 0 {
topologyScore += 15
}
if len(firmwareNames) >= 2 {
boardScore += 15
}
if len(vrFirmwareNames) >= 2 {
boardScore += 10
}
if anySignalContains(s, "COMMONbAssembly") {
boardScore += 12
}
// "EnvironmentMetrcs" is the vendor firmware's own misspelling of
// "EnvironmentMetrics"; match it verbatim as a fingerprint.
if anySignalContains(s, "EnvironmentMetrcs") {
boardScore += 8
}
if anySignalContains(s, "GetServerAllUSBStatus") {
boardScore += 8
}
if topologyScore == 0 || boardScore == 0 {
return 0
}
return min(topologyScore+boardScore, 100)
},
extendAcquisition: func(plan *AcquisitionPlan, _ MatchSignals) {
addPlanNote(plan, "Inspur Group OEM platform fingerprint matched")
},
applyAnalysisDirectives: func(d *AnalysisDirectives, _ MatchSignals) {
d.EnableGenericGraphicsControllerDedup = true
},
}
}
func matchedPathTokens(paths []string, requiredPrefix string, re *regexp.Regexp) []string {
seen := make(map[string]struct{})
for _, rawPath := range paths {
path := normalizePath(rawPath)
if path == "" || (requiredPrefix != "" && !strings.HasPrefix(path, requiredPrefix)) {
continue
}
token := re.FindString(path)
if token == "" {
continue
}
token = strings.Trim(token, "/")
if token == "" {
continue
}
seen[token] = struct{}{}
}
out := make([]string, 0, len(seen))
for token := range seen {
out = append(out, token)
}
return dedupeSorted(out)
}
func inspurGroupOEMFirmwareMatches(paths []string) ([]string, []string) {
firmwareNames := make(map[string]struct{})
vrNames := make(map[string]struct{})
for _, rawPath := range paths {
path := normalizePath(rawPath)
if !strings.HasPrefix(path, "/redfish/v1/UpdateService/FirmwareInventory/") {
continue
}
name := strings.TrimSpace(path[strings.LastIndex(path, "/")+1:])
if name == "" {
continue
}
if _, ok := inspurGroupOEMFirmwareHints[name]; ok {
firmwareNames[name] = struct{}{}
}
if vrFirmwareHintRe.MatchString(name) {
vrNames[name] = struct{}{}
}
}
return mapKeysSorted(firmwareNames), mapKeysSorted(vrNames)
}
func anySignalContains(signals MatchSignals, needle string) bool {
needle = strings.TrimSpace(needle)
if needle == "" {
return false
}
for _, signal := range signals.ResourceHints {
if strings.Contains(signal, needle) {
return true
}
}
for _, signal := range signals.DocHints {
if strings.Contains(signal, needle) {
return true
}
}
return false
}
func mapKeysSorted(items map[string]struct{}) []string {
out := make([]string, 0, len(items))
for item := range items {
out = append(out, item)
}
return dedupeSorted(out)
}
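`matchedPathTokens` above reduces a list of Redfish paths to the sorted unique tokens a regexp extracts from them. The same shape can be sketched standalone with the package's `outboardCardHintRe` pattern (the helper name `uniqueTokens` is illustrative):

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

var outboardRe = regexp.MustCompile(`/outboardPCIeCard\d+(?:/|$)`)

// uniqueTokens extracts the regexp match from each path, trims the
// surrounding slashes, and returns the sorted unique tokens.
func uniqueTokens(paths []string, re *regexp.Regexp) []string {
	seen := map[string]struct{}{}
	for _, p := range paths {
		if tok := strings.Trim(re.FindString(p), "/"); tok != "" {
			seen[tok] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for tok := range seen {
		out = append(out, tok)
	}
	sort.Strings(out)
	return out
}

func main() {
	paths := []string{
		"/redfish/v1/Chassis/1/NetworkAdapters/outboardPCIeCard0",
		"/redfish/v1/Chassis/1/NetworkAdapters/outboardPCIeCard0/NetworkPorts",
		"/redfish/v1/Chassis/1/NetworkAdapters/outboardPCIeCard1",
	}
	fmt.Println(uniqueTokens(paths, outboardRe))
	// → [outboardPCIeCard0 outboardPCIeCard1]
}
```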

@@ -0,0 +1,182 @@
package redfishprofile
import (
"archive/zip"
"encoding/json"
"os"
"path/filepath"
"testing"
)
func TestCollectSignalsFromTree_InspurGroupOEMPlatformsSelectsMatchedMode(t *testing.T) {
tree := map[string]interface{}{
"/redfish/v1": map[string]interface{}{
"@odata.id": "/redfish/v1",
},
"/redfish/v1/Systems": map[string]interface{}{
"Members": []interface{}{
map[string]interface{}{"@odata.id": "/redfish/v1/Systems/1"},
},
},
"/redfish/v1/Systems/1": map[string]interface{}{
"@odata.id": "/redfish/v1/Systems/1",
"Oem": map[string]interface{}{
"Public": map[string]interface{}{
"USB": map[string]interface{}{
"@odata.id": "/redfish/v1/Systems/1/Oem/Public/GetServerAllUSBStatus",
},
},
},
"NetworkInterfaces": map[string]interface{}{
"@odata.id": "/redfish/v1/Systems/1/NetworkInterfaces",
},
},
"/redfish/v1/Systems/1/NetworkInterfaces": map[string]interface{}{
"Members": []interface{}{
map[string]interface{}{"@odata.id": "/redfish/v1/Systems/1/NetworkInterfaces/outboardPCIeCard0"},
map[string]interface{}{"@odata.id": "/redfish/v1/Systems/1/NetworkInterfaces/outboardPCIeCard1"},
},
},
"/redfish/v1/Chassis": map[string]interface{}{
"Members": []interface{}{
map[string]interface{}{"@odata.id": "/redfish/v1/Chassis/1"},
},
},
"/redfish/v1/Chassis/1": map[string]interface{}{
"@odata.id": "/redfish/v1/Chassis/1",
"Actions": map[string]interface{}{
"Oem": map[string]interface{}{
"Public": map[string]interface{}{
"NvGpuPowerLimitWatts": map[string]interface{}{
"target": "/redfish/v1/Chassis/1/GPU/EnvironmentMetrcs",
},
},
},
},
"Drives": map[string]interface{}{
"@odata.id": "/redfish/v1/Chassis/1/Drives",
},
"NetworkAdapters": map[string]interface{}{
"@odata.id": "/redfish/v1/Chassis/1/NetworkAdapters",
},
},
"/redfish/v1/Chassis/1/Drives": map[string]interface{}{
"Members": []interface{}{
map[string]interface{}{"@odata.id": "/redfish/v1/Chassis/1/Drives/OB01"},
map[string]interface{}{"@odata.id": "/redfish/v1/Chassis/1/Drives/FP00HDD00"},
},
},
"/redfish/v1/Chassis/1/NetworkAdapters": map[string]interface{}{
"Members": []interface{}{
map[string]interface{}{"@odata.id": "/redfish/v1/Chassis/1/NetworkAdapters/outboardPCIeCard0"},
map[string]interface{}{"@odata.id": "/redfish/v1/Chassis/1/NetworkAdapters/outboardPCIeCard1"},
},
},
"/redfish/v1/Chassis/1/Assembly": map[string]interface{}{
"Assemblies": []interface{}{
map[string]interface{}{
"Oem": map[string]interface{}{
"COMMONb": map[string]interface{}{
"COMMONbAssembly": map[string]interface{}{
"@odata.type": "#COMMONbAssembly.v1_0_0.COMMONbAssembly",
},
},
},
},
},
},
"/redfish/v1/Managers": map[string]interface{}{
"Members": []interface{}{
map[string]interface{}{"@odata.id": "/redfish/v1/Managers/1"},
},
},
"/redfish/v1/Managers/1": map[string]interface{}{
"Actions": map[string]interface{}{
"Oem": map[string]interface{}{
"#PublicManager.ExportConfFile": map[string]interface{}{
"target": "/redfish/v1/Managers/1/Actions/Oem/Public/ExportConfFile",
},
},
},
},
"/redfish/v1/UpdateService/FirmwareInventory": map[string]interface{}{
"Members": []interface{}{
map[string]interface{}{"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/Front_HDD_CPLD0"},
map[string]interface{}{"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/SCMCPLD"},
map[string]interface{}{"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/CPU0_PVCCD_HV_VR"},
map[string]interface{}{"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/CPU1_PVCCIN_VR"},
},
},
}
signals := CollectSignalsFromTree(tree)
match := MatchProfiles(signals)
if match.Mode != ModeMatched {
t.Fatalf("expected matched mode, got %q", match.Mode)
}
assertProfileSelected(t, match, "inspur-group-oem-platforms")
}
func TestCollectSignalsFromTree_InspurGroupOEMPlatformsDoesNotFalsePositiveOnExampleRawExports(t *testing.T) {
examples := []string{
"2026-03-18 (G5500 V7) - 210619KUGGXGS2000015.zip",
"2026-03-11 (SYS-821GE-TNHR) - A514359X5C08846.zip",
"2026-03-15 (CG480-S5063) - P5T0006091.zip",
"2026-03-18 (CG290-S3063) - PAT0011258.zip",
"2024-04-25 (AS -4124GQ-TNMI) - S490387X4418273.zip",
}
for _, name := range examples {
t.Run(name, func(t *testing.T) {
tree := loadRawExportTreeFromExampleZip(t, name)
match := MatchProfiles(CollectSignalsFromTree(tree))
assertProfileNotSelected(t, match, "inspur-group-oem-platforms")
})
}
}
func loadRawExportTreeFromExampleZip(t *testing.T, name string) map[string]interface{} {
t.Helper()
path := filepath.Join("..", "..", "..", "example", name)
f, err := os.Open(path)
if err != nil {
t.Fatalf("open example zip %s: %v", path, err)
}
defer f.Close()
info, err := f.Stat()
if err != nil {
t.Fatalf("stat example zip %s: %v", path, err)
}
zr, err := zip.NewReader(f, info.Size())
if err != nil {
t.Fatalf("read example zip %s: %v", path, err)
}
for _, file := range zr.File {
if file.Name != "raw_export.json" {
continue
}
rc, err := file.Open()
if err != nil {
t.Fatalf("open %s in %s: %v", file.Name, path, err)
}
defer rc.Close()
var payload struct {
Source struct {
RawPayloads struct {
RedfishTree map[string]interface{} `json:"redfish_tree"`
} `json:"raw_payloads"`
} `json:"source"`
}
if err := json.NewDecoder(rc).Decode(&payload); err != nil {
t.Fatalf("decode raw_export.json from %s: %v", path, err)
}
if len(payload.Source.RawPayloads.RedfishTree) == 0 {
t.Fatalf("example %s has empty redfish_tree", path)
}
return payload.Source.RawPayloads.RedfishTree
}
t.Fatalf("raw_export.json not found in %s", path)
return nil
}

@@ -0,0 +1,74 @@
package redfishprofile
import "strings"
func msiProfile() Profile {
return staticProfile{
name: "msi",
priority: 20,
safeForFallback: true,
matchFn: func(s MatchSignals) int {
score := 0
if containsFold(s.SystemManufacturer, "micro-star") || containsFold(s.ChassisManufacturer, "micro-star") {
score += 80
}
if containsFold(s.SystemManufacturer, "msi") || containsFold(s.ChassisManufacturer, "msi") {
score += 40
}
for _, hint := range s.ResourceHints {
if strings.HasPrefix(hint, "/redfish/v1/Chassis/GPU") {
score += 10
break
}
}
return min(score, 100)
},
extendAcquisition: func(plan *AcquisitionPlan, _ MatchSignals) {
ensureSnapshotWorkers(plan, 6)
ensurePrefetchWorkers(plan, 8)
ensureETABaseline(plan, AcquisitionETABaseline{
DiscoverySeconds: 12,
SnapshotSeconds: 120,
PrefetchSeconds: 25,
CriticalPlanBSeconds: 35,
ProfilePlanBSeconds: 25,
})
ensureRatePolicy(plan, AcquisitionRatePolicy{
TargetP95LatencyMS: 1000,
ThrottleP95LatencyMS: 2200,
MinSnapshotWorkers: 2,
MinPrefetchWorkers: 2,
DisablePrefetchOnErrors: true,
})
ensureRecoveryPolicy(plan, AcquisitionRecoveryPolicy{
EnableProfilePlanB: true,
})
addPlanNote(plan, "msi gpu chassis probes enabled")
},
refineAcquisition: func(resolved *ResolvedAcquisitionPlan, discovered DiscoveredResources, _ MatchSignals) {
for _, chassisPath := range discovered.ChassisPaths {
if !strings.HasPrefix(chassisPath, "/redfish/v1/Chassis/GPU") {
continue
}
addPlanPaths(&resolved.SeedPaths, chassisPath)
addPlanPaths(&resolved.Plan.SeedPaths, chassisPath)
addPlanPaths(&resolved.CriticalPaths, joinPath(chassisPath, "/Sensors"))
addPlanPaths(&resolved.Plan.CriticalPaths, joinPath(chassisPath, "/Sensors"))
addPlanPaths(&resolved.Plan.PlanBPaths, joinPath(chassisPath, "/Sensors"))
}
},
applyAnalysisDirectives: func(d *AnalysisDirectives, _ MatchSignals) {
d.EnableGenericGraphicsControllerDedup = true
},
refineAnalysis: func(plan *ResolvedAnalysisPlan, snapshot map[string]interface{}, discovered DiscoveredResources, _ MatchSignals) {
if snapshotHasGPUProcessor(snapshot, discovered.SystemPaths) && snapshotHasPathPrefix(snapshot, "/redfish/v1/Chassis/GPU") {
plan.Directives.EnableProcessorGPUFallback = true
plan.Directives.EnableMSIProcessorGPUChassisLookup = true
plan.Directives.EnableMSIGhostGPUFilter = true
addAnalysisLookupMode(plan, "msi-index")
addAnalysisNote(plan, "msi analysis enables processor-gpu fallback from discovered GPU chassis")
addAnalysisNote(plan, "msi ghost-gpu filter enabled: GPUs with temperature=0 on powered-on host are excluded")
}
},
}
}

@@ -0,0 +1,81 @@
package redfishprofile
func supermicroProfile() Profile {
return staticProfile{
name: "supermicro",
priority: 20,
safeForFallback: true,
matchFn: func(s MatchSignals) int {
score := 0
if containsFold(s.SystemManufacturer, "supermicro") || containsFold(s.ChassisManufacturer, "supermicro") {
score += 80
}
for _, hint := range s.ResourceHints {
if containsFold(hint, "hgx_baseboard") || containsFold(hint, "hgx_gpu_sxm") {
score += 20
break
}
}
return min(score, 100)
},
extendAcquisition: func(plan *AcquisitionPlan, _ MatchSignals) {
ensureSnapshotMaxDocuments(plan, 150000)
ensureSnapshotWorkers(plan, 6)
ensurePrefetchWorkers(plan, 4)
ensureETABaseline(plan, AcquisitionETABaseline{
DiscoverySeconds: 15,
SnapshotSeconds: 180,
PrefetchSeconds: 35,
CriticalPlanBSeconds: 45,
ProfilePlanBSeconds: 30,
})
ensurePostProbePolicy(plan, AcquisitionPostProbePolicy{
EnableDirectNVMEDiskBayProbe: true,
})
ensureRecoveryPolicy(plan, AcquisitionRecoveryPolicy{
EnableProfilePlanB: true,
})
ensureRatePolicy(plan, AcquisitionRatePolicy{
TargetP95LatencyMS: 1200,
ThrottleP95LatencyMS: 2400,
MinSnapshotWorkers: 2,
MinPrefetchWorkers: 1,
DisablePrefetchOnErrors: true,
})
addPlanNote(plan, "supermicro acquisition extensions enabled")
},
refineAcquisition: func(resolved *ResolvedAcquisitionPlan, _ DiscoveredResources, signals MatchSignals) {
for _, hint := range signals.ResourceHints {
if normalizePath(hint) != "/redfish/v1/UpdateService/Oem/Supermicro/FirmwareInventory" {
continue
}
addPlanPaths(&resolved.CriticalPaths, hint)
addPlanPaths(&resolved.Plan.CriticalPaths, hint)
addPlanPaths(&resolved.Plan.PlanBPaths, hint)
break
}
},
applyAnalysisDirectives: func(d *AnalysisDirectives, _ MatchSignals) {
d.EnableStorageEnclosureRecovery = true
},
refineAnalysis: func(plan *ResolvedAnalysisPlan, snapshot map[string]interface{}, _ DiscoveredResources, _ MatchSignals) {
if snapshotHasPathContaining(snapshot, "/chassis/nvmessd.") && snapshotHasPathContaining(snapshot, ".storagebackplane") {
plan.Directives.EnableSupermicroNVMeBackplane = true
addAnalysisNote(plan, "supermicro analysis enables NVMe backplane recovery from snapshot paths")
}
if snapshotHasStorageControllerHint(snapshot, "/storage/intelvroc", "/storage/ha-raid", "/storage/mrvl.ha-raid") {
plan.Directives.EnableKnownStorageControllerRecovery = true
addAnalysisStorageDriveCollections(plan,
"/Storage/IntelVROC/Drives",
"/Storage/IntelVROC/Controllers/1/Drives",
)
addAnalysisStorageVolumeCollections(plan,
"/Storage/IntelVROC/Volumes",
"/Storage/HA-RAID/Volumes",
"/Storage/MRVL.HA-RAID/Volumes",
)
addAnalysisNote(plan, "supermicro analysis enables known storage-controller recovery from snapshot paths")
}
},
}
}

@@ -0,0 +1,55 @@
package redfishprofile
func xfusionProfile() Profile {
return staticProfile{
name: "xfusion",
priority: 20,
safeForFallback: true,
matchFn: func(s MatchSignals) int {
score := 0
if containsFold(s.ServiceRootVendor, "xfusion") {
score += 90
}
for _, ns := range s.OEMNamespaces {
if containsFold(ns, "xfusion") {
score += 20
break
}
}
if containsFold(s.SystemManufacturer, "xfusion") || containsFold(s.ChassisManufacturer, "xfusion") {
score += 40
}
return min(score, 100)
},
extendAcquisition: func(plan *AcquisitionPlan, _ MatchSignals) {
ensureSnapshotMaxDocuments(plan, 120000)
ensureSnapshotWorkers(plan, 4)
ensurePrefetchWorkers(plan, 4)
ensurePrefetchEnabled(plan, true)
ensureETABaseline(plan, AcquisitionETABaseline{
DiscoverySeconds: 10,
SnapshotSeconds: 90,
PrefetchSeconds: 20,
CriticalPlanBSeconds: 30,
ProfilePlanBSeconds: 20,
})
ensureRatePolicy(plan, AcquisitionRatePolicy{
TargetP95LatencyMS: 800,
ThrottleP95LatencyMS: 1800,
MinSnapshotWorkers: 2,
MinPrefetchWorkers: 1,
DisablePrefetchOnErrors: true,
})
addPlanNote(plan, "xfusion ibmc acquisition extensions enabled")
},
applyAnalysisDirectives: func(d *AnalysisDirectives, _ MatchSignals) {
d.EnableGenericGraphicsControllerDedup = true
},
refineAnalysis: func(plan *ResolvedAnalysisPlan, snapshot map[string]interface{}, discovered DiscoveredResources, _ MatchSignals) {
if snapshotHasGPUProcessor(snapshot, discovered.SystemPaths) {
plan.Directives.EnableProcessorGPUFallback = true
addAnalysisNote(plan, "xfusion analysis enables processor-gpu fallback from snapshot topology")
}
},
}
}
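Per the commit notes, xFusion iBMC reports PCIe link width as OEM strings such as "X8" rather than the standard integer field. A hedged sketch of a parser for that format; the real `parseXFusionLinkWidth` helper lives elsewhere in the repo and may differ in detail:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLinkWidth converts xFusion-style width strings such as "X8"
// or "x16" to an integer lane count; it returns 0 when the value is
// absent or unparseable. Illustrative only.
func parseLinkWidth(s string) int {
	s = strings.TrimSpace(strings.ToUpper(s))
	if !strings.HasPrefix(s, "X") {
		return 0
	}
	n, err := strconv.Atoi(s[1:])
	if err != nil || n <= 0 {
		return 0
	}
	return n
}

func main() {
	fmt.Println(parseLinkWidth("X8"), parseLinkWidth("x16"), parseLinkWidth("N/A"))
	// → 8 16 0
}
```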

@@ -0,0 +1,234 @@
package redfishprofile
import (
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
)
type staticProfile struct {
name string
priority int
safeForFallback bool
matchFn func(MatchSignals) int
extendAcquisition func(*AcquisitionPlan, MatchSignals)
refineAcquisition func(*ResolvedAcquisitionPlan, DiscoveredResources, MatchSignals)
applyAnalysisDirectives func(*AnalysisDirectives, MatchSignals)
refineAnalysis func(*ResolvedAnalysisPlan, map[string]interface{}, DiscoveredResources, MatchSignals)
postAnalyze func(*models.AnalysisResult, map[string]interface{}, MatchSignals)
}
func (p staticProfile) Name() string { return p.name }
func (p staticProfile) Priority() int { return p.priority }
func (p staticProfile) Match(signals MatchSignals) int { return p.matchFn(normalizeSignals(signals)) }
func (p staticProfile) SafeForFallback() bool { return p.safeForFallback }
func (p staticProfile) ExtendAcquisitionPlan(plan *AcquisitionPlan, signals MatchSignals) {
if p.extendAcquisition != nil {
p.extendAcquisition(plan, normalizeSignals(signals))
}
}
func (p staticProfile) RefineAcquisitionPlan(resolved *ResolvedAcquisitionPlan, discovered DiscoveredResources, signals MatchSignals) {
if p.refineAcquisition != nil {
p.refineAcquisition(resolved, discovered, normalizeSignals(signals))
}
}
func (p staticProfile) ApplyAnalysisDirectives(directives *AnalysisDirectives, signals MatchSignals) {
if p.applyAnalysisDirectives != nil {
p.applyAnalysisDirectives(directives, normalizeSignals(signals))
}
}
func (p staticProfile) RefineAnalysisPlan(plan *ResolvedAnalysisPlan, snapshot map[string]interface{}, discovered DiscoveredResources, signals MatchSignals) {
if p.refineAnalysis != nil {
p.refineAnalysis(plan, snapshot, discovered, normalizeSignals(signals))
}
}
func (p staticProfile) PostAnalyze(result *models.AnalysisResult, snapshot map[string]interface{}, signals MatchSignals) {
if p.postAnalyze != nil {
p.postAnalyze(result, snapshot, normalizeSignals(signals))
}
}
func BuiltinProfiles() []Profile {
return []Profile{
genericProfile(),
amiProfile(),
msiProfile(),
supermicroProfile(),
dellProfile(),
hpeProfile(),
inspurGroupOEMPlatformsProfile(),
hgxProfile(),
xfusionProfile(),
}
}
func containsFold(v, sub string) bool {
return strings.Contains(strings.ToLower(strings.TrimSpace(v)), strings.ToLower(strings.TrimSpace(sub)))
}
func addPlanPaths(dst *[]string, paths ...string) {
*dst = append(*dst, paths...)
*dst = dedupeSorted(*dst)
}
func addPlanNote(plan *AcquisitionPlan, note string) {
if strings.TrimSpace(note) == "" {
return
}
plan.Notes = append(plan.Notes, note)
plan.Notes = dedupeSorted(plan.Notes)
}
func addAnalysisNote(plan *ResolvedAnalysisPlan, note string) {
if plan == nil || strings.TrimSpace(note) == "" {
return
}
plan.Notes = append(plan.Notes, note)
plan.Notes = dedupeSorted(plan.Notes)
}
func addAnalysisLookupMode(plan *ResolvedAnalysisPlan, mode string) {
if plan == nil || strings.TrimSpace(mode) == "" {
return
}
plan.ProcessorGPUChassisLookupModes = dedupeSorted(append(plan.ProcessorGPUChassisLookupModes, mode))
}
func addAnalysisStorageDriveCollections(plan *ResolvedAnalysisPlan, rels ...string) {
if plan == nil {
return
}
plan.KnownStorageDriveCollections = dedupeSorted(append(plan.KnownStorageDriveCollections, rels...))
}
func addAnalysisStorageVolumeCollections(plan *ResolvedAnalysisPlan, rels ...string) {
if plan == nil {
return
}
plan.KnownStorageVolumeCollections = dedupeSorted(append(plan.KnownStorageVolumeCollections, rels...))
}
func ensureSnapshotMaxDocuments(plan *AcquisitionPlan, n int) {
if n <= 0 {
return
}
if plan.Tuning.SnapshotMaxDocuments < n {
plan.Tuning.SnapshotMaxDocuments = n
}
}
func ensureSnapshotWorkers(plan *AcquisitionPlan, n int) {
if n <= 0 {
return
}
if plan.Tuning.SnapshotWorkers < n {
plan.Tuning.SnapshotWorkers = n
}
}
func ensurePrefetchEnabled(plan *AcquisitionPlan, enabled bool) {
if plan.Tuning.PrefetchEnabled == nil {
plan.Tuning.PrefetchEnabled = new(bool)
}
*plan.Tuning.PrefetchEnabled = enabled
}
func ensurePrefetchWorkers(plan *AcquisitionPlan, n int) {
if n <= 0 {
return
}
if plan.Tuning.PrefetchWorkers < n {
plan.Tuning.PrefetchWorkers = n
}
}
func ensureNVMePostProbeEnabled(plan *AcquisitionPlan, enabled bool) {
if plan.Tuning.NVMePostProbeEnabled == nil {
plan.Tuning.NVMePostProbeEnabled = new(bool)
}
*plan.Tuning.NVMePostProbeEnabled = enabled
}
func ensureRatePolicy(plan *AcquisitionPlan, policy AcquisitionRatePolicy) {
if policy.TargetP95LatencyMS > plan.Tuning.RatePolicy.TargetP95LatencyMS {
plan.Tuning.RatePolicy.TargetP95LatencyMS = policy.TargetP95LatencyMS
}
if policy.ThrottleP95LatencyMS > plan.Tuning.RatePolicy.ThrottleP95LatencyMS {
plan.Tuning.RatePolicy.ThrottleP95LatencyMS = policy.ThrottleP95LatencyMS
}
if policy.MinSnapshotWorkers > plan.Tuning.RatePolicy.MinSnapshotWorkers {
plan.Tuning.RatePolicy.MinSnapshotWorkers = policy.MinSnapshotWorkers
}
if policy.MinPrefetchWorkers > plan.Tuning.RatePolicy.MinPrefetchWorkers {
plan.Tuning.RatePolicy.MinPrefetchWorkers = policy.MinPrefetchWorkers
}
if policy.DisablePrefetchOnErrors {
plan.Tuning.RatePolicy.DisablePrefetchOnErrors = true
}
}
func ensureETABaseline(plan *AcquisitionPlan, baseline AcquisitionETABaseline) {
if baseline.DiscoverySeconds > plan.Tuning.ETABaseline.DiscoverySeconds {
plan.Tuning.ETABaseline.DiscoverySeconds = baseline.DiscoverySeconds
}
if baseline.SnapshotSeconds > plan.Tuning.ETABaseline.SnapshotSeconds {
plan.Tuning.ETABaseline.SnapshotSeconds = baseline.SnapshotSeconds
}
if baseline.PrefetchSeconds > plan.Tuning.ETABaseline.PrefetchSeconds {
plan.Tuning.ETABaseline.PrefetchSeconds = baseline.PrefetchSeconds
}
if baseline.CriticalPlanBSeconds > plan.Tuning.ETABaseline.CriticalPlanBSeconds {
plan.Tuning.ETABaseline.CriticalPlanBSeconds = baseline.CriticalPlanBSeconds
}
if baseline.ProfilePlanBSeconds > plan.Tuning.ETABaseline.ProfilePlanBSeconds {
plan.Tuning.ETABaseline.ProfilePlanBSeconds = baseline.ProfilePlanBSeconds
}
}
func ensurePostProbePolicy(plan *AcquisitionPlan, policy AcquisitionPostProbePolicy) {
if policy.EnableDirectNVMEDiskBayProbe {
plan.Tuning.PostProbePolicy.EnableDirectNVMEDiskBayProbe = true
}
if policy.EnableNumericCollectionProbe {
plan.Tuning.PostProbePolicy.EnableNumericCollectionProbe = true
}
if policy.EnableSensorCollectionProbe {
plan.Tuning.PostProbePolicy.EnableSensorCollectionProbe = true
}
}
func ensureRecoveryPolicy(plan *AcquisitionPlan, policy AcquisitionRecoveryPolicy) {
if policy.EnableCriticalCollectionMemberRetry {
plan.Tuning.RecoveryPolicy.EnableCriticalCollectionMemberRetry = true
}
if policy.EnableCriticalSlowProbe {
plan.Tuning.RecoveryPolicy.EnableCriticalSlowProbe = true
}
if policy.EnableProfilePlanB {
plan.Tuning.RecoveryPolicy.EnableProfilePlanB = true
}
if policy.EnableEmptyCriticalCollectionRetry {
plan.Tuning.RecoveryPolicy.EnableEmptyCriticalCollectionRetry = true
}
}
func ensureScopedPathPolicy(plan *AcquisitionPlan, policy AcquisitionScopedPathPolicy) {
addPlanPaths(&plan.ScopedPaths.SystemSeedSuffixes, policy.SystemSeedSuffixes...)
addPlanPaths(&plan.ScopedPaths.SystemCriticalSuffixes, policy.SystemCriticalSuffixes...)
addPlanPaths(&plan.ScopedPaths.ChassisSeedSuffixes, policy.ChassisSeedSuffixes...)
addPlanPaths(&plan.ScopedPaths.ChassisCriticalSuffixes, policy.ChassisCriticalSuffixes...)
addPlanPaths(&plan.ScopedPaths.ManagerSeedSuffixes, policy.ManagerSeedSuffixes...)
addPlanPaths(&plan.ScopedPaths.ManagerCriticalSuffixes, policy.ManagerCriticalSuffixes...)
}
func ensurePrefetchPolicy(plan *AcquisitionPlan, policy AcquisitionPrefetchPolicy) {
addPlanPaths(&plan.Tuning.PrefetchPolicy.IncludeSuffixes, policy.IncludeSuffixes...)
addPlanPaths(&plan.Tuning.PrefetchPolicy.ExcludeContains, policy.ExcludeContains...)
}
func min(a, b int) int {
if a < b {
return a
}
return b
}

@@ -0,0 +1,177 @@
package redfishprofile
import "strings"
func CollectSignals(serviceRootDoc, systemDoc, chassisDoc, managerDoc map[string]interface{}, resourceHints []string, hintDocs ...map[string]interface{}) MatchSignals {
	resourceHints = append([]string{}, resourceHints...)
	docHints := make([]string, 0)
	for _, doc := range append([]map[string]interface{}{serviceRootDoc, systemDoc, chassisDoc, managerDoc}, hintDocs...) {
		embeddedPaths, embeddedHints := collectDocSignalHints(doc)
		resourceHints = append(resourceHints, embeddedPaths...)
		docHints = append(docHints, embeddedHints...)
	}
	signals := MatchSignals{
		ServiceRootVendor:   lookupString(serviceRootDoc, "Vendor"),
		ServiceRootProduct:  lookupString(serviceRootDoc, "Product"),
		SystemManufacturer:  lookupString(systemDoc, "Manufacturer"),
		SystemModel:         lookupString(systemDoc, "Model"),
		SystemSKU:           lookupString(systemDoc, "SKU"),
		ChassisManufacturer: lookupString(chassisDoc, "Manufacturer"),
		ChassisModel:        lookupString(chassisDoc, "Model"),
		ManagerManufacturer: lookupString(managerDoc, "Manufacturer"),
		ResourceHints:       resourceHints,
		DocHints:            docHints,
	}
	signals.OEMNamespaces = dedupeSorted(append(
		oemNamespaces(serviceRootDoc),
		append(oemNamespaces(systemDoc), append(oemNamespaces(chassisDoc), oemNamespaces(managerDoc)...)...)...,
	))
	return normalizeSignals(signals)
}
func CollectSignalsFromTree(tree map[string]interface{}) MatchSignals {
	getDoc := func(path string) map[string]interface{} {
		if v, ok := tree[path]; ok {
			if doc, ok := v.(map[string]interface{}); ok {
				return doc
			}
		}
		return nil
	}
	memberPath := func(collectionPath, fallbackPath string) string {
		collection := getDoc(collectionPath)
		if len(collection) != 0 {
			if members, ok := collection["Members"].([]interface{}); ok && len(members) > 0 {
				if ref, ok := members[0].(map[string]interface{}); ok {
					if path := lookupString(ref, "@odata.id"); path != "" {
						return path
					}
				}
			}
		}
		return fallbackPath
	}
	systemPath := memberPath("/redfish/v1/Systems", "/redfish/v1/Systems/1")
	chassisPath := memberPath("/redfish/v1/Chassis", "/redfish/v1/Chassis/1")
	managerPath := memberPath("/redfish/v1/Managers", "/redfish/v1/Managers/1")
	resourceHints := make([]string, 0, len(tree))
	hintDocs := make([]map[string]interface{}, 0, len(tree))
	for path := range tree {
		path = strings.TrimSpace(path)
		if path == "" {
			continue
		}
		resourceHints = append(resourceHints, path)
	}
	for _, v := range tree {
		doc, ok := v.(map[string]interface{})
		if !ok {
			continue
		}
		hintDocs = append(hintDocs, doc)
	}
	return CollectSignals(
		getDoc("/redfish/v1"),
		getDoc(systemPath),
		getDoc(chassisPath),
		getDoc(managerPath),
		resourceHints,
		hintDocs...,
	)
}
func collectDocSignalHints(doc map[string]interface{}) ([]string, []string) {
	if len(doc) == 0 {
		return nil, nil
	}
	paths := make([]string, 0)
	hints := make([]string, 0)
	var walk func(any)
	walk = func(v any) {
		switch x := v.(type) {
		case map[string]interface{}:
			for rawKey, child := range x {
				key := strings.TrimSpace(rawKey)
				if key != "" {
					hints = append(hints, key)
				}
				if s, ok := child.(string); ok {
					s = strings.TrimSpace(s)
					if s != "" {
						switch key {
						case "@odata.id", "target":
							paths = append(paths, s)
						case "@odata.type":
							hints = append(hints, s)
						default:
							if isInterestingSignalString(s) {
								hints = append(hints, s)
								if strings.HasPrefix(s, "/") {
									paths = append(paths, s)
								}
							}
						}
					}
				}
				walk(child)
			}
		case []interface{}:
			for _, child := range x {
				walk(child)
			}
		}
	}
	walk(doc)
	return paths, hints
}
func isInterestingSignalString(s string) bool {
	switch {
	case strings.HasPrefix(s, "/"):
		return true
	case strings.HasPrefix(s, "#"):
		return true
	case strings.Contains(s, "COMMONb"):
		return true
	case strings.Contains(s, "EnvironmentMetrcs"):
		return true
	case strings.Contains(s, "GetServerAllUSBStatus"):
		return true
	default:
		return false
	}
}
func lookupString(doc map[string]interface{}, key string) string {
	if len(doc) == 0 {
		return ""
	}
	value := doc[key]
	if s, ok := value.(string); ok {
		return strings.TrimSpace(s)
	}
	return ""
}
func oemNamespaces(doc map[string]interface{}) []string {
	if len(doc) == 0 {
		return nil
	}
	oem, ok := doc["Oem"].(map[string]interface{})
	if !ok {
		return nil
	}
	out := make([]string, 0, len(oem))
	for key := range oem {
		key = strings.TrimSpace(key)
		if key == "" {
			continue
		}
		out = append(out, key)
	}
	return out
}
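The member-path lookup inside CollectSignalsFromTree above (prefer the first `Members[0]["@odata.id"]` of a collection doc, else fall back to a conventional path) can be sketched in isolation. This is a hedged, self-contained re-implementation for illustration — it mirrors the closure above but takes the tree as a parameter, and the example paths are sample values, not fixtures from this repository:

```go
package main

import "fmt"

// memberPath mirrors the helper inside CollectSignalsFromTree: prefer the
// first collection member's @odata.id, else return the fallback path.
func memberPath(tree map[string]interface{}, collectionPath, fallbackPath string) string {
	if doc, ok := tree[collectionPath].(map[string]interface{}); ok {
		if members, ok := doc["Members"].([]interface{}); ok && len(members) > 0 {
			if ref, ok := members[0].(map[string]interface{}); ok {
				if path, ok := ref["@odata.id"].(string); ok && path != "" {
					return path
				}
			}
		}
	}
	return fallbackPath
}

func main() {
	tree := map[string]interface{}{
		"/redfish/v1/Systems": map[string]interface{}{
			"Members": []interface{}{
				map[string]interface{}{"@odata.id": "/redfish/v1/Systems/System.Embedded.1"},
			},
		},
	}
	// Collection present: the real member path wins.
	fmt.Println(memberPath(tree, "/redfish/v1/Systems", "/redfish/v1/Systems/1"))
	// Collection absent from the tree: the fallback is used.
	fmt.Println(memberPath(tree, "/redfish/v1/Chassis", "/redfish/v1/Chassis/1"))
}
```

This is why the Dell fixture below resolves to `/redfish/v1/Systems/System.Embedded.1` while a bare tree still yields usable `/redfish/v1/Systems/1`-style defaults.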


@@ -0,0 +1,17 @@
{
"ServiceRootVendor": "AMI",
"ServiceRootProduct": "AMI Redfish Server",
"SystemManufacturer": "Gigabyte",
"SystemModel": "G292-Z42",
"SystemSKU": "",
"ChassisManufacturer": "",
"ChassisModel": "",
"ManagerManufacturer": "",
"OEMNamespaces": ["Ami"],
"ResourceHints": [
"/redfish/v1/Chassis/Self",
"/redfish/v1/Managers/Self",
"/redfish/v1/Oem/Ami",
"/redfish/v1/Systems/Self"
]
}


@@ -0,0 +1,18 @@
{
"ServiceRootVendor": "",
"ServiceRootProduct": "iDRAC Redfish Service",
"SystemManufacturer": "Dell Inc.",
"SystemModel": "PowerEdge R750",
"SystemSKU": "0A42H9",
"ChassisManufacturer": "Dell Inc.",
"ChassisModel": "PowerEdge R750",
"ManagerManufacturer": "Dell Inc.",
"OEMNamespaces": ["Dell"],
"ResourceHints": [
"/redfish/v1/Chassis/System.Embedded.1",
"/redfish/v1/Managers/iDRAC.Embedded.1",
"/redfish/v1/Managers/iDRAC.Embedded.1/Oem/Dell",
"/redfish/v1/Systems/System.Embedded.1",
"/redfish/v1/Systems/System.Embedded.1/Storage"
]
}


@@ -0,0 +1,33 @@
{
"ServiceRootVendor": "AMI",
"ServiceRootProduct": "AMI Redfish Server",
"SystemManufacturer": "Micro-Star International Co., Ltd.",
"SystemModel": "CG290-S3063",
"SystemSKU": "S3063G290RAU4",
"ChassisManufacturer": "NVIDIA",
"ChassisModel": "",
"ManagerManufacturer": "",
"OEMNamespaces": ["Ami"],
"ResourceHints": [
"/redfish/v1/Chassis/GPU1",
"/redfish/v1/Chassis/GPU1/NetworkAdapters",
"/redfish/v1/Chassis/GPU1/Sensors",
"/redfish/v1/Chassis/GPU1/Sensors/GPU1_Power",
"/redfish/v1/Chassis/GPU1/Sensors/GPU1_TLimit",
"/redfish/v1/Chassis/GPU1/Sensors/GPU1_Temperature",
"/redfish/v1/Chassis/GPU2",
"/redfish/v1/Chassis/GPU2/NetworkAdapters",
"/redfish/v1/Chassis/GPU2/Sensors",
"/redfish/v1/Chassis/GPU2/Sensors/GPU2_Power",
"/redfish/v1/Chassis/GPU2/Sensors/GPU2_TLimit",
"/redfish/v1/Chassis/GPU2/Sensors/GPU2_Temperature",
"/redfish/v1/Chassis/GPU3",
"/redfish/v1/Chassis/GPU3/NetworkAdapters",
"/redfish/v1/Chassis/GPU3/Sensors",
"/redfish/v1/Chassis/GPU3/Sensors/GPU3_Power",
"/redfish/v1/Chassis/GPU3/Sensors/GPU3_TLimit",
"/redfish/v1/Chassis/GPU3/Sensors/GPU3_Temperature",
"/redfish/v1/Chassis/GPU4",
"/redfish/v1/Chassis/GPU4/NetworkAdapters"
]
}


@@ -0,0 +1,33 @@
{
"ServiceRootVendor": "AMI",
"ServiceRootProduct": "AMI Redfish Server",
"SystemManufacturer": "Micro-Star International Co., Ltd.",
"SystemModel": "CG480-S5063",
"SystemSKU": "5063G480RAE20",
"ChassisManufacturer": "NVIDIA",
"ChassisModel": "",
"ManagerManufacturer": "",
"OEMNamespaces": ["Ami"],
"ResourceHints": [
"/redfish/v1/Chassis/GPU1",
"/redfish/v1/Chassis/GPU1/NetworkAdapters",
"/redfish/v1/Chassis/GPU1/Sensors",
"/redfish/v1/Chassis/GPU1/Sensors/GPU1_Power",
"/redfish/v1/Chassis/GPU1/Sensors/GPU1_TLimit",
"/redfish/v1/Chassis/GPU1/Sensors/GPU1_Temperature",
"/redfish/v1/Chassis/GPU2",
"/redfish/v1/Chassis/GPU2/NetworkAdapters",
"/redfish/v1/Chassis/GPU2/Sensors",
"/redfish/v1/Chassis/GPU2/Sensors/GPU2_Power",
"/redfish/v1/Chassis/GPU2/Sensors/GPU2_TLimit",
"/redfish/v1/Chassis/GPU2/Sensors/GPU2_Temperature",
"/redfish/v1/Chassis/GPU3",
"/redfish/v1/Chassis/GPU3/NetworkAdapters",
"/redfish/v1/Chassis/GPU3/Sensors",
"/redfish/v1/Chassis/GPU3/Sensors/GPU3_Power",
"/redfish/v1/Chassis/GPU3/Sensors/GPU3_TLimit",
"/redfish/v1/Chassis/GPU3/Sensors/GPU3_Temperature",
"/redfish/v1/Chassis/GPU4",
"/redfish/v1/Chassis/GPU4/NetworkAdapters"
]
}


@@ -0,0 +1,33 @@
{
"ServiceRootVendor": "AMI",
"ServiceRootProduct": "AMI Redfish Server",
"SystemManufacturer": "Micro-Star International Co., Ltd.",
"SystemModel": "CG480-S5063",
"SystemSKU": "5063G480RAE20",
"ChassisManufacturer": "NVIDIA",
"ChassisModel": "",
"ManagerManufacturer": "",
"OEMNamespaces": ["Ami"],
"ResourceHints": [
"/redfish/v1/Chassis/GPU1",
"/redfish/v1/Chassis/GPU1/NetworkAdapters",
"/redfish/v1/Chassis/GPU1/Sensors",
"/redfish/v1/Chassis/GPU1/Sensors/GPU1_Power",
"/redfish/v1/Chassis/GPU1/Sensors/GPU1_TLimit",
"/redfish/v1/Chassis/GPU1/Sensors/GPU1_Temperature",
"/redfish/v1/Chassis/GPU2",
"/redfish/v1/Chassis/GPU2/NetworkAdapters",
"/redfish/v1/Chassis/GPU2/Sensors",
"/redfish/v1/Chassis/GPU2/Sensors/GPU2_Power",
"/redfish/v1/Chassis/GPU2/Sensors/GPU2_TLimit",
"/redfish/v1/Chassis/GPU2/Sensors/GPU2_Temperature",
"/redfish/v1/Chassis/GPU3",
"/redfish/v1/Chassis/GPU3/NetworkAdapters",
"/redfish/v1/Chassis/GPU3/Sensors",
"/redfish/v1/Chassis/GPU3/Sensors/GPU3_Power",
"/redfish/v1/Chassis/GPU3/Sensors/GPU3_TLimit",
"/redfish/v1/Chassis/GPU3/Sensors/GPU3_Temperature",
"/redfish/v1/Chassis/GPU4",
"/redfish/v1/Chassis/GPU4/NetworkAdapters"
]
}


@@ -0,0 +1,33 @@
{
"ServiceRootVendor": "Supermicro",
"ServiceRootProduct": "",
"SystemManufacturer": "Supermicro",
"SystemModel": "SYS-821GE-TNHR",
"SystemSKU": "0x1D1415D9",
"ChassisManufacturer": "Supermicro",
"ChassisModel": "X13DEG-OAD",
"ManagerManufacturer": "",
"OEMNamespaces": ["Supermicro"],
"ResourceHints": [
"/redfish/v1/Chassis/HGX_BMC_0",
"/redfish/v1/Chassis/HGX_BMC_0/Assembly",
"/redfish/v1/Chassis/HGX_BMC_0/Controls",
"/redfish/v1/Chassis/HGX_BMC_0/Drives",
"/redfish/v1/Chassis/HGX_BMC_0/EnvironmentMetrics",
"/redfish/v1/Chassis/HGX_BMC_0/LogServices",
"/redfish/v1/Chassis/HGX_BMC_0/PCIeDevices",
"/redfish/v1/Chassis/HGX_BMC_0/PCIeSlots",
"/redfish/v1/Chassis/HGX_BMC_0/PowerSubsystem",
"/redfish/v1/Chassis/HGX_BMC_0/PowerSubsystem/PowerSupplies",
"/redfish/v1/Chassis/HGX_BMC_0/Sensors",
"/redfish/v1/Chassis/HGX_BMC_0/Sensors/HGX_BMC_0_Temp_0",
"/redfish/v1/Chassis/HGX_BMC_0/ThermalSubsystem",
"/redfish/v1/Chassis/HGX_BMC_0/ThermalSubsystem/ThermalMetrics",
"/redfish/v1/Chassis/HGX_Chassis_0",
"/redfish/v1/Chassis/HGX_Chassis_0/Assembly",
"/redfish/v1/Chassis/HGX_Chassis_0/Controls",
"/redfish/v1/Chassis/HGX_Chassis_0/Controls/TotalGPU_Power_0",
"/redfish/v1/Chassis/HGX_Chassis_0/Drives",
"/redfish/v1/Chassis/HGX_Chassis_0/EnvironmentMetrics"
]
}


@@ -0,0 +1,51 @@
{
"ServiceRootVendor": "",
"ServiceRootProduct": "H12DGQ-NT6",
"SystemManufacturer": "Supermicro",
"SystemModel": "AS -4124GQ-TNMI",
"SystemSKU": "091715D9",
"ChassisManufacturer": "Supermicro",
"ChassisModel": "H12DGQ-NT6",
"ManagerManufacturer": "",
"OEMNamespaces": [
"Supermicro"
],
"ResourceHints": [
"/redfish/v1/Chassis/1/PCIeDevices",
"/redfish/v1/Chassis/1/PCIeDevices/GPU1",
"/redfish/v1/Chassis/1/PCIeDevices/GPU1/PCIeFunctions",
"/redfish/v1/Chassis/1/PCIeDevices/GPU1/PCIeFunctions/1",
"/redfish/v1/Chassis/1/PCIeDevices/GPU2",
"/redfish/v1/Chassis/1/PCIeDevices/GPU2/PCIeFunctions",
"/redfish/v1/Chassis/1/PCIeDevices/GPU2/PCIeFunctions/1",
"/redfish/v1/Chassis/1/PCIeDevices/GPU3",
"/redfish/v1/Chassis/1/PCIeDevices/GPU3/PCIeFunctions",
"/redfish/v1/Chassis/1/PCIeDevices/GPU3/PCIeFunctions/1",
"/redfish/v1/Chassis/1/PCIeDevices/GPU4",
"/redfish/v1/Chassis/1/PCIeDevices/GPU4/PCIeFunctions",
"/redfish/v1/Chassis/1/PCIeDevices/GPU4/PCIeFunctions/1",
"/redfish/v1/Chassis/1/PCIeDevices/GPU5",
"/redfish/v1/Chassis/1/PCIeDevices/GPU5/PCIeFunctions",
"/redfish/v1/Chassis/1/PCIeDevices/GPU5/PCIeFunctions/1",
"/redfish/v1/Chassis/1/PCIeDevices/GPU6",
"/redfish/v1/Chassis/1/PCIeDevices/GPU6/PCIeFunctions",
"/redfish/v1/Chassis/1/PCIeDevices/GPU6/PCIeFunctions/1",
"/redfish/v1/Chassis/1/PCIeDevices/GPU7",
"/redfish/v1/Chassis/1/PCIeDevices/GPU7/PCIeFunctions",
"/redfish/v1/Chassis/1/PCIeDevices/GPU7/PCIeFunctions/1",
"/redfish/v1/Chassis/1/PCIeDevices/GPU8",
"/redfish/v1/Chassis/1/PCIeDevices/GPU8/PCIeFunctions",
"/redfish/v1/Chassis/1/PCIeDevices/GPU8/PCIeFunctions/1",
"/redfish/v1/Managers/1/Oem/Supermicro/FanMode",
"/redfish/v1/Oem/Supermicro/DumpService",
"/redfish/v1/UpdateService/FirmwareInventory/GPU1",
"/redfish/v1/UpdateService/FirmwareInventory/GPU2",
"/redfish/v1/UpdateService/FirmwareInventory/GPU3",
"/redfish/v1/UpdateService/FirmwareInventory/GPU4",
"/redfish/v1/UpdateService/FirmwareInventory/GPU5",
"/redfish/v1/UpdateService/FirmwareInventory/GPU6",
"/redfish/v1/UpdateService/FirmwareInventory/GPU7",
"/redfish/v1/UpdateService/FirmwareInventory/GPU8",
"/redfish/v1/UpdateService/Oem/Supermicro/FirmwareInventory"
]
}


@@ -0,0 +1,16 @@
{
"ServiceRootVendor": "",
"ServiceRootProduct": "Redfish Service",
"SystemManufacturer": "",
"SystemModel": "",
"SystemSKU": "",
"ChassisManufacturer": "",
"ChassisModel": "",
"ManagerManufacturer": "",
"OEMNamespaces": [],
"ResourceHints": [
"/redfish/v1/Chassis/1",
"/redfish/v1/Managers/1",
"/redfish/v1/Systems/1"
]
}


@@ -0,0 +1,24 @@
{
"ServiceRootVendor": "xFusion",
"ServiceRootProduct": "G5500 V7",
"SystemManufacturer": "OEM",
"SystemModel": "G5500 V7",
"SystemSKU": "",
"ChassisManufacturer": "OEM",
"ChassisModel": "G5500 V7",
"ManagerManufacturer": "XFUSION",
"OEMNamespaces": ["xFusion"],
"ResourceHints": [
"/redfish/v1/Chassis/1",
"/redfish/v1/Chassis/1/Drives",
"/redfish/v1/Chassis/1/PCIeDevices",
"/redfish/v1/Chassis/1/Sensors",
"/redfish/v1/Managers/1",
"/redfish/v1/Systems/1",
"/redfish/v1/Systems/1/GraphicsControllers",
"/redfish/v1/Systems/1/Processors",
"/redfish/v1/Systems/1/Processors/Gpu1",
"/redfish/v1/Systems/1/Storages",
"/redfish/v1/UpdateService/FirmwareInventory"
]
}


@@ -0,0 +1,171 @@
package redfishprofile
import (
	"sort"
	"git.mchus.pro/mchus/logpile/internal/models"
)
type MatchSignals struct {
	ServiceRootVendor   string
	ServiceRootProduct  string
	SystemManufacturer  string
	SystemModel         string
	SystemSKU           string
	ChassisManufacturer string
	ChassisModel        string
	ManagerManufacturer string
	OEMNamespaces       []string
	ResourceHints       []string
	DocHints            []string
}
type AcquisitionPlan struct {
	Mode          string
	Profiles      []string
	SeedPaths     []string
	CriticalPaths []string
	PlanBPaths    []string
	Notes         []string
	ScopedPaths   AcquisitionScopedPathPolicy
	Tuning        AcquisitionTuning
}
type DiscoveredResources struct {
	SystemPaths  []string
	ChassisPaths []string
	ManagerPaths []string
}
type ResolvedAcquisitionPlan struct {
	Plan          AcquisitionPlan
	SeedPaths     []string
	CriticalPaths []string
}
type AcquisitionScopedPathPolicy struct {
	SystemSeedSuffixes      []string
	SystemCriticalSuffixes  []string
	ChassisSeedSuffixes     []string
	ChassisCriticalSuffixes []string
	ManagerSeedSuffixes     []string
	ManagerCriticalSuffixes []string
}
type AcquisitionTuning struct {
	SnapshotMaxDocuments int
	SnapshotWorkers      int
	PrefetchEnabled      *bool
	PrefetchWorkers      int
	NVMePostProbeEnabled *bool
	RatePolicy           AcquisitionRatePolicy
	ETABaseline          AcquisitionETABaseline
	PostProbePolicy      AcquisitionPostProbePolicy
	RecoveryPolicy       AcquisitionRecoveryPolicy
	PrefetchPolicy       AcquisitionPrefetchPolicy
}
type AcquisitionRatePolicy struct {
	TargetP95LatencyMS      int
	ThrottleP95LatencyMS    int
	MinSnapshotWorkers      int
	MinPrefetchWorkers      int
	DisablePrefetchOnErrors bool
}
type AcquisitionETABaseline struct {
	DiscoverySeconds     int
	SnapshotSeconds      int
	PrefetchSeconds      int
	CriticalPlanBSeconds int
	ProfilePlanBSeconds  int
}
type AcquisitionPostProbePolicy struct {
	EnableDirectNVMEDiskBayProbe bool
	EnableNumericCollectionProbe bool
	EnableSensorCollectionProbe  bool
}
type AcquisitionRecoveryPolicy struct {
	EnableCriticalCollectionMemberRetry bool
	EnableCriticalSlowProbe             bool
	EnableProfilePlanB                  bool
	EnableEmptyCriticalCollectionRetry  bool
}
type AcquisitionPrefetchPolicy struct {
	IncludeSuffixes []string
	ExcludeContains []string
}
type AnalysisDirectives struct {
	EnableProcessorGPUFallback           bool
	EnableSupermicroNVMeBackplane        bool
	EnableProcessorGPUChassisAlias       bool
	EnableGenericGraphicsControllerDedup bool
	EnableMSIProcessorGPUChassisLookup   bool
	EnableMSIGhostGPUFilter              bool
	EnableStorageEnclosureRecovery       bool
	EnableKnownStorageControllerRecovery bool
}
type ResolvedAnalysisPlan struct {
	Match                          MatchResult
	Directives                     AnalysisDirectives
	Notes                          []string
	ProcessorGPUChassisLookupModes []string
	KnownStorageDriveCollections   []string
	KnownStorageVolumeCollections  []string
}
type Profile interface {
	Name() string
	Priority() int
	Match(signals MatchSignals) int
	SafeForFallback() bool
	ExtendAcquisitionPlan(plan *AcquisitionPlan, signals MatchSignals)
	RefineAcquisitionPlan(resolved *ResolvedAcquisitionPlan, discovered DiscoveredResources, signals MatchSignals)
	ApplyAnalysisDirectives(directives *AnalysisDirectives, signals MatchSignals)
	RefineAnalysisPlan(plan *ResolvedAnalysisPlan, snapshot map[string]interface{}, discovered DiscoveredResources, signals MatchSignals)
	PostAnalyze(result *models.AnalysisResult, snapshot map[string]interface{}, signals MatchSignals)
}
type MatchResult struct {
	Mode     string
	Profiles []Profile
	Scores   []ProfileScore
}
type ProfileScore struct {
	Name     string
	Score    int
	Active   bool
	Priority int
}
func normalizeSignals(signals MatchSignals) MatchSignals {
	signals.OEMNamespaces = dedupeSorted(signals.OEMNamespaces)
	signals.ResourceHints = dedupeSorted(signals.ResourceHints)
	signals.DocHints = dedupeSorted(signals.DocHints)
	return signals
}
func dedupeSorted(items []string) []string {
	if len(items) == 0 {
		return nil
	}
	set := make(map[string]struct{}, len(items))
	for _, item := range items {
		if item == "" {
			continue
		}
		set[item] = struct{}{}
	}
	out := make([]string, 0, len(set))
	for item := range set {
		out = append(out, item)
	}
	sort.Strings(out)
	return out
}
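As a quick check of the normalization contract above — dedupeSorted drops empty strings and duplicates and returns a sorted slice, or nil when nothing survives — here is a self-contained sketch; the function body is copied verbatim from the file above, and the sample inputs are illustrative:

```go
package main

import (
	"fmt"
	"sort"
)

// dedupeSorted is copied from the package above: drop empty strings and
// duplicates, return the remainder sorted (nil when the input is empty).
func dedupeSorted(items []string) []string {
	if len(items) == 0 {
		return nil
	}
	set := make(map[string]struct{}, len(items))
	for _, item := range items {
		if item == "" {
			continue
		}
		set[item] = struct{}{}
	}
	out := make([]string, 0, len(set))
	for item := range set {
		out = append(out, item)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(dedupeSorted([]string{"Supermicro", "", "Dell", "Supermicro"}))
	// → [Dell Supermicro]
	fmt.Println(dedupeSorted(nil) == nil)
	// → true
}
```

This is why the fixture files above always show OEMNamespaces and ResourceHints sorted with no repeats, regardless of the order the documents were walked in.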


@@ -7,25 +7,73 @@ import (
)
type Request struct {
	Host     string
	Protocol string
	Port     int
	Username string
	AuthType string
	Password string
	Token    string
	TLSMode  string
	Host          string
	Protocol      string
	Port          int
	Username      string
	AuthType      string
	Password      string
	Token         string
	TLSMode       string
	DebugPayloads bool
	SkipHungCh    <-chan struct{}
}
type Progress struct {
	Status   string
	Progress int
	Message  string
	Status        string
	Progress      int
	Message       string
	CurrentPhase  string
	ETASeconds    int
	ActiveModules []ModuleActivation
	ModuleScores  []ModuleScore
	DebugInfo     *CollectDebugInfo
}
type ProgressFn func(Progress)
type ModuleActivation struct {
	Name  string
	Score int
}
type ModuleScore struct {
	Name     string
	Score    int
	Active   bool
	Priority int
}
type CollectDebugInfo struct {
	AdaptiveThrottled bool
	SnapshotWorkers   int
	PrefetchWorkers   int
	PrefetchEnabled   *bool
	PhaseTelemetry    []PhaseTelemetry
}
type PhaseTelemetry struct {
	Phase     string
	Requests  int
	Errors    int
	ErrorRate float64
	AvgMS     int64
	P95MS     int64
}
type ProbeResult struct {
	Reachable      bool
	Protocol       string
	HostPowerState string
	HostPoweredOn  bool
	SystemPath     string
}
type Connector interface {
	Protocol() string
	Collect(ctx context.Context, req Request, emit ProgressFn) (*models.AnalysisResult, error)
}
type Prober interface {
	Probe(ctx context.Context, req Request) (*ProbeResult, error)
}

@@ -66,104 +66,15 @@ func (e *Exporter) ExportCSV(w io.Writer) error {
}
}
// CPUs
for _, cpu := range e.result.Hardware.CPUs {
if !hasUsableSerial(cpu.SerialNumber) {
seenCanonical := make(map[string]struct{})
for _, dev := range canonicalDevicesForExport(e.result.Hardware) {
if !hasUsableSerial(dev.SerialNumber) {
continue
}
if err := writer.Write([]string{
cpu.Model,
strings.TrimSpace(cpu.SerialNumber),
"",
"CPU",
}); err != nil {
return err
}
}
// Memory
for _, mem := range e.result.Hardware.Memory {
if !hasUsableSerial(mem.SerialNumber) {
continue
}
location := mem.Location
if location == "" {
location = mem.Slot
}
if err := writer.Write([]string{
mem.PartNumber,
strings.TrimSpace(mem.SerialNumber),
mem.Manufacturer,
location,
}); err != nil {
return err
}
}
// Storage
for _, stor := range e.result.Hardware.Storage {
if !hasUsableSerial(stor.SerialNumber) {
continue
}
if err := writer.Write([]string{
stor.Model,
strings.TrimSpace(stor.SerialNumber),
stor.Manufacturer,
stor.Slot,
}); err != nil {
return err
}
}
// GPUs
for _, gpu := range e.result.Hardware.GPUs {
if !hasUsableSerial(gpu.SerialNumber) {
continue
}
component := gpu.Model
if component == "" {
component = "GPU"
}
if err := writer.Write([]string{
component,
strings.TrimSpace(gpu.SerialNumber),
gpu.Manufacturer,
gpu.Slot,
}); err != nil {
return err
}
}
// PCIe devices
for _, pcie := range e.result.Hardware.PCIeDevices {
if !hasUsableSerial(pcie.SerialNumber) {
continue
}
if err := writer.Write([]string{
pcie.DeviceClass,
strings.TrimSpace(pcie.SerialNumber),
pcie.Manufacturer,
pcie.Slot,
}); err != nil {
return err
}
}
// Network adapters
for _, nic := range e.result.Hardware.NetworkAdapters {
if !hasUsableSerial(nic.SerialNumber) {
continue
}
location := nic.Location
if location == "" {
location = nic.Slot
}
if err := writer.Write([]string{
nic.Model,
strings.TrimSpace(nic.SerialNumber),
nic.Vendor,
location,
}); err != nil {
serial := strings.TrimSpace(dev.SerialNumber)
seenCanonical[serial] = struct{}{}
component, manufacturer, location := csvFieldsFromCanonicalDevice(dev)
if err := writer.Write([]string{component, serial, manufacturer, location}); err != nil {
return err
}
}
@@ -173,26 +84,15 @@ func (e *Exporter) ExportCSV(w io.Writer) error {
if !hasUsableSerial(nic.SerialNumber) {
continue
}
if err := writer.Write([]string{
nic.Model,
strings.TrimSpace(nic.SerialNumber),
"",
"Network",
}); err != nil {
return err
}
}
// Power supplies
for _, psu := range e.result.Hardware.PowerSupply {
if !hasUsableSerial(psu.SerialNumber) {
serial := strings.TrimSpace(nic.SerialNumber)
if _, ok := seenCanonical[serial]; ok {
continue
}
if err := writer.Write([]string{
psu.Model,
strings.TrimSpace(psu.SerialNumber),
psu.Vendor,
psu.Slot,
nic.Model,
serial,
"",
"Network",
}); err != nil {
return err
}
@@ -221,3 +121,52 @@ func hasUsableSerial(serial string) bool {
return true
}
}
func csvFieldsFromCanonicalDevice(dev models.HardwareDevice) (component, manufacturer, location string) {
	component = firstNonEmptyString(
		dev.Model,
		dev.PartNumber,
		dev.DeviceClass,
		dev.Kind,
	)
	manufacturer = firstNonEmptyString(dev.Manufacturer, inferCSVVendor(dev))
	location = firstNonEmptyString(dev.Location, dev.Slot, dev.BDF, dev.Kind)
	switch dev.Kind {
	case models.DeviceKindCPU:
		if component == "" {
			component = "CPU"
		}
		if location == "" {
			location = "CPU"
		}
	case models.DeviceKindMemory:
		component = firstNonEmptyString(dev.PartNumber, dev.Model, "Memory")
	case models.DeviceKindPCIe, models.DeviceKindGPU, models.DeviceKindNetwork:
		if location == "" {
			location = firstNonEmptyString(dev.Slot, dev.BDF, "PCIe")
		}
	case models.DeviceKindPSU:
		component = firstNonEmptyString(dev.Model, "Power Supply")
	}
	return component, manufacturer, location
}
func inferCSVVendor(dev models.HardwareDevice) string {
	switch dev.Kind {
	case models.DeviceKindCPU:
		return ""
	default:
		return ""
	}
}
func firstNonEmptyString(values ...string) string {
	for _, value := range values {
		if strings.TrimSpace(value) != "" {
			return strings.TrimSpace(value)
		}
	}
	return ""
}
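The CSV fallback chains above all rest on firstNonEmptyString: the first value that is non-blank after trimming wins, trimmed. A minimal self-contained sketch of that helper (the function body is copied from the exporter above; the sample part number is a made-up illustrative value):

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonEmptyString is copied from the exporter above: return the first
// argument that is non-empty after trimming whitespace, in trimmed form.
func firstNonEmptyString(values ...string) string {
	for _, value := range values {
		if strings.TrimSpace(value) != "" {
			return strings.TrimSpace(value)
		}
	}
	return ""
}

func main() {
	// Model is blank, so PartNumber wins — mirroring the Memory branch
	// of csvFieldsFromCanonicalDevice. "EXAMPLE-PN-01" is a placeholder.
	fmt.Println(firstNonEmptyString("  ", "EXAMPLE-PN-01", "Memory"))
}
```

Whitespace-only entries count as empty, which is why a device whose Model field is a lone space still falls through to PartNumber rather than emitting a blank component column.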

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -19,6 +19,8 @@ type ReanimatorHardware struct {
	Storage       []ReanimatorStorage  `json:"storage,omitempty"`
	PCIeDevices   []ReanimatorPCIe     `json:"pcie_devices,omitempty"`
	PowerSupplies []ReanimatorPSU      `json:"power_supplies,omitempty"`
	Sensors       *ReanimatorSensors   `json:"sensors,omitempty"`
	EventLogs     []ReanimatorEventLog `json:"event_logs,omitempty"`
}
// ReanimatorBoard represents motherboard/server information
@@ -36,11 +38,6 @@ type ReanimatorFirmware struct {
	Version string `json:"version"`
}
type ReanimatorStatusAtCollection struct {
	Status string `json:"status"`
	At     string `json:"at"`
}
type ReanimatorStatusHistoryEntry struct {
	Status    string `json:"status"`
	ChangedAt string `json:"changed_at"`
@@ -49,105 +46,209 @@ type ReanimatorStatusHistoryEntry struct {
// ReanimatorCPU represents processor information
type ReanimatorCPU struct {
	Socket int `json:"socket"`
	Model string `json:"model"`
	Cores int `json:"cores,omitempty"`
	Threads int `json:"threads,omitempty"`
	FrequencyMHz int `json:"frequency_mhz,omitempty"`
	MaxFrequencyMHz int `json:"max_frequency_mhz,omitempty"`
	Manufacturer string `json:"manufacturer,omitempty"`
	Status string `json:"status,omitempty"`
	StatusCheckedAt string `json:"status_checked_at,omitempty"`
	StatusChangedAt string `json:"status_changed_at,omitempty"`
	StatusAtCollect *ReanimatorStatusAtCollection `json:"status_at_collection,omitempty"`
	StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
	ErrorDescription string `json:"error_description,omitempty"`
	Socket int `json:"socket"`
	Model string `json:"model,omitempty"`
	Cores int `json:"cores,omitempty"`
	Threads int `json:"threads,omitempty"`
	FrequencyMHz int `json:"frequency_mhz,omitempty"`
	MaxFrequencyMHz int `json:"max_frequency_mhz,omitempty"`
	TemperatureC float64 `json:"temperature_c,omitempty"`
	PowerW float64 `json:"power_w,omitempty"`
	Throttled *bool `json:"throttled,omitempty"`
	CorrectableErrorCount int64 `json:"correctable_error_count,omitempty"`
	UncorrectableErrorCount int64 `json:"uncorrectable_error_count,omitempty"`
	LifeRemainingPct float64 `json:"life_remaining_pct,omitempty"`
	LifeUsedPct float64 `json:"life_used_pct,omitempty"`
	SerialNumber string `json:"serial_number,omitempty"`
	Firmware string `json:"firmware,omitempty"`
	Present *bool `json:"present,omitempty"`
	Manufacturer string `json:"manufacturer,omitempty"`
	Status string `json:"status,omitempty"`
	StatusCheckedAt string `json:"status_checked_at,omitempty"`
	StatusChangedAt string `json:"status_changed_at,omitempty"`
	ManufacturedYearWeek string `json:"manufactured_year_week,omitempty"`
	StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
	ErrorDescription string `json:"error_description,omitempty"`
}
// ReanimatorMemory represents a memory module (DIMM)
type ReanimatorMemory struct {
	Slot string `json:"slot"`
	Location string `json:"location,omitempty"`
	Present bool `json:"present"`
	SizeMB int `json:"size_mb,omitempty"`
	Type string `json:"type,omitempty"`
	MaxSpeedMHz int `json:"max_speed_mhz,omitempty"`
	CurrentSpeedMHz int `json:"current_speed_mhz,omitempty"`
	Manufacturer string `json:"manufacturer,omitempty"`
	SerialNumber string `json:"serial_number,omitempty"`
	PartNumber string `json:"part_number,omitempty"`
	Status string `json:"status,omitempty"`
	StatusCheckedAt string `json:"status_checked_at,omitempty"`
	StatusChangedAt string `json:"status_changed_at,omitempty"`
	StatusAtCollect *ReanimatorStatusAtCollection `json:"status_at_collection,omitempty"`
	StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
	ErrorDescription string `json:"error_description,omitempty"`
	Slot string `json:"slot"`
	Location string `json:"location,omitempty"`
	Present *bool `json:"present,omitempty"`
	SizeMB int `json:"size_mb,omitempty"`
	Type string `json:"type,omitempty"`
	MaxSpeedMHz int `json:"max_speed_mhz,omitempty"`
	CurrentSpeedMHz int `json:"current_speed_mhz,omitempty"`
	TemperatureC float64 `json:"temperature_c,omitempty"`
	CorrectableECCErrorCount int64 `json:"correctable_ecc_error_count,omitempty"`
	UncorrectableECCErrorCount int64 `json:"uncorrectable_ecc_error_count,omitempty"`
	LifeRemainingPct float64 `json:"life_remaining_pct,omitempty"`
	LifeUsedPct float64 `json:"life_used_pct,omitempty"`
	SpareBlocksRemainingPct float64 `json:"spare_blocks_remaining_pct,omitempty"`
	PerformanceDegraded *bool `json:"performance_degraded,omitempty"`
	DataLossDetected *bool `json:"data_loss_detected,omitempty"`
	Manufacturer string `json:"manufacturer,omitempty"`
	SerialNumber string `json:"serial_number,omitempty"`
	PartNumber string `json:"part_number,omitempty"`
	Status string `json:"status,omitempty"`
	StatusCheckedAt string `json:"status_checked_at,omitempty"`
	StatusChangedAt string `json:"status_changed_at,omitempty"`
	ManufacturedYearWeek string `json:"manufactured_year_week,omitempty"`
	StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
	ErrorDescription string `json:"error_description,omitempty"`
}
// ReanimatorStorage represents a storage device
type ReanimatorStorage struct {
	Slot string `json:"slot"`
	Type string `json:"type,omitempty"`
	Model string `json:"model"`
	SizeGB int `json:"size_gb,omitempty"`
	SerialNumber string `json:"serial_number"`
	Manufacturer string `json:"manufacturer,omitempty"`
	Firmware string `json:"firmware,omitempty"`
	Interface string `json:"interface,omitempty"`
	Present bool `json:"present"`
	Status string `json:"status,omitempty"`
	StatusCheckedAt string `json:"status_checked_at,omitempty"`
	StatusChangedAt string `json:"status_changed_at,omitempty"`
	StatusAtCollect *ReanimatorStatusAtCollection `json:"status_at_collection,omitempty"`
	StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
	ErrorDescription string `json:"error_description,omitempty"`
	Slot string `json:"slot"`
	Type string `json:"type,omitempty"`
	Model string `json:"model"`
	SizeGB int `json:"size_gb,omitempty"`
	SerialNumber string `json:"serial_number"`
	Manufacturer string `json:"manufacturer,omitempty"`
	Firmware string `json:"firmware,omitempty"`
	Interface string `json:"interface,omitempty"`
	Present *bool `json:"present,omitempty"`
	TemperatureC float64 `json:"temperature_c,omitempty"`
	PowerOnHours int64 `json:"power_on_hours,omitempty"`
	PowerCycles int64 `json:"power_cycles,omitempty"`
	UnsafeShutdowns int64 `json:"unsafe_shutdowns,omitempty"`
	MediaErrors int64 `json:"media_errors,omitempty"`
	ErrorLogEntries int64 `json:"error_log_entries,omitempty"`
	WrittenBytes int64 `json:"written_bytes,omitempty"`
	ReadBytes int64 `json:"read_bytes,omitempty"`
	LifeUsedPct float64 `json:"life_used_pct,omitempty"`
	RemainingEndurancePct *int `json:"remaining_endurance_pct,omitempty"`
	LifeRemainingPct float64 `json:"life_remaining_pct,omitempty"`
	AvailableSparePct float64 `json:"available_spare_pct,omitempty"`
	ReallocatedSectors int64 `json:"reallocated_sectors,omitempty"`
	CurrentPendingSectors int64 `json:"current_pending_sectors,omitempty"`
	OfflineUncorrectable int64 `json:"offline_uncorrectable,omitempty"`
	Status string `json:"status,omitempty"`
	StatusCheckedAt string `json:"status_checked_at,omitempty"`
	StatusChangedAt string `json:"status_changed_at,omitempty"`
	ManufacturedYearWeek string `json:"manufactured_year_week,omitempty"`
	StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
	ErrorDescription string `json:"error_description,omitempty"`
}
// ReanimatorPCIe represents a PCIe device
type ReanimatorPCIe struct {
	Slot string `json:"slot"`
	VendorID int `json:"vendor_id,omitempty"`
	DeviceID int `json:"device_id,omitempty"`
	BDF string `json:"bdf,omitempty"`
	DeviceClass string `json:"device_class,omitempty"`
	Manufacturer string `json:"manufacturer,omitempty"`
	Model string `json:"model,omitempty"`
	LinkWidth int `json:"link_width,omitempty"`
	LinkSpeed string `json:"link_speed,omitempty"`
	MaxLinkWidth int `json:"max_link_width,omitempty"`
	MaxLinkSpeed string `json:"max_link_speed,omitempty"`
	SerialNumber string `json:"serial_number,omitempty"`
	Firmware string `json:"firmware,omitempty"`
	TemperatureC int `json:"temperature_c,omitempty"`
	PowerW int `json:"power_w,omitempty"`
	VoltageV float64 `json:"voltage_v,omitempty"`
	Status string `json:"status,omitempty"`
	StatusCheckedAt string `json:"status_checked_at,omitempty"`
	StatusChangedAt string `json:"status_changed_at,omitempty"`
	StatusAtCollect *ReanimatorStatusAtCollection `json:"status_at_collection,omitempty"`
	StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
	ErrorDescription string `json:"error_description,omitempty"`
	Slot string `json:"slot"`
	VendorID int `json:"vendor_id,omitempty"`
	DeviceID int `json:"device_id,omitempty"`
	NUMANode int `json:"numa_node,omitempty"`
	TemperatureC float64 `json:"temperature_c,omitempty"`
	PowerW float64 `json:"power_w,omitempty"`
	LifeRemainingPct float64 `json:"life_remaining_pct,omitempty"`
	LifeUsedPct float64 `json:"life_used_pct,omitempty"`
	ECCCorrectedTotal int64 `json:"ecc_corrected_total,omitempty"`
	ECCUncorrectedTotal int64 `json:"ecc_uncorrected_total,omitempty"`
	HWSlowdown *bool `json:"hw_slowdown,omitempty"`
	BatteryChargePct float64 `json:"battery_charge_pct,omitempty"`
	BatteryHealthPct float64 `json:"battery_health_pct,omitempty"`
	BatteryTemperatureC float64 `json:"battery_temperature_c,omitempty"`
	BatteryVoltageV float64 `json:"battery_voltage_v,omitempty"`
	BatteryReplaceRequired *bool `json:"battery_replace_required,omitempty"`
	SFPTemperatureC float64 `json:"sfp_temperature_c,omitempty"`
	SFPTXPowerDBm float64 `json:"sfp_tx_power_dbm,omitempty"`
	SFPRXPowerDBm float64 `json:"sfp_rx_power_dbm,omitempty"`
	SFPVoltageV float64 `json:"sfp_voltage_v,omitempty"`
	SFPBiasMA float64 `json:"sfp_bias_ma,omitempty"`
	BDF string `json:"-"`
	DeviceClass string `json:"device_class,omitempty"`
	Manufacturer string `json:"manufacturer,omitempty"`
	Model string `json:"model,omitempty"`
	LinkWidth int `json:"link_width,omitempty"`
	LinkSpeed string `json:"link_speed,omitempty"`
	MaxLinkWidth int `json:"max_link_width,omitempty"`
	MaxLinkSpeed string `json:"max_link_speed,omitempty"`
	MACAddresses []string `json:"mac_addresses,omitempty"`
	Present *bool `json:"present,omitempty"`
	SerialNumber string `json:"serial_number,omitempty"`
	Firmware string `json:"firmware,omitempty"`
	Status string `json:"status,omitempty"`
	StatusCheckedAt string `json:"status_checked_at,omitempty"`
	StatusChangedAt string `json:"status_changed_at,omitempty"`
	ManufacturedYearWeek string `json:"manufactured_year_week,omitempty"`
	StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
	ErrorDescription string `json:"error_description,omitempty"`
}
// ReanimatorPSU represents a power supply unit
type ReanimatorPSU struct {
Slot string `json:"slot"`
Present bool `json:"present"`
Model string `json:"model,omitempty"`
Vendor string `json:"vendor,omitempty"`
WattageW int `json:"wattage_w,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
Status string `json:"status,omitempty"`
InputType string `json:"input_type,omitempty"`
InputPowerW int `json:"input_power_w,omitempty"`
OutputPowerW int `json:"output_power_w,omitempty"`
InputVoltage float64 `json:"input_voltage,omitempty"`
TemperatureC int `json:"temperature_c,omitempty"`
StatusCheckedAt string `json:"status_checked_at,omitempty"`
StatusChangedAt string `json:"status_changed_at,omitempty"`
StatusAtCollect *ReanimatorStatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
Slot string `json:"slot"`
Present *bool `json:"present,omitempty"`
Model string `json:"model,omitempty"`
Vendor string `json:"vendor,omitempty"`
WattageW int `json:"wattage_w,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
Status string `json:"status,omitempty"`
InputType string `json:"input_type,omitempty"`
InputPowerW float64 `json:"input_power_w,omitempty"`
OutputPowerW float64 `json:"output_power_w,omitempty"`
InputVoltage float64 `json:"input_voltage,omitempty"`
TemperatureC float64 `json:"temperature_c,omitempty"`
LifeRemainingPct float64 `json:"life_remaining_pct,omitempty"`
LifeUsedPct float64 `json:"life_used_pct,omitempty"`
StatusCheckedAt string `json:"status_checked_at,omitempty"`
StatusChangedAt string `json:"status_changed_at,omitempty"`
ManufacturedYearWeek string `json:"manufactured_year_week,omitempty"`
StatusHistory []ReanimatorStatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
}
type ReanimatorEventLog struct {
Source string `json:"source"`
EventTime string `json:"event_time,omitempty"`
Severity string `json:"severity,omitempty"`
MessageID string `json:"message_id,omitempty"`
Message string `json:"message"`
ComponentRef string `json:"component_ref,omitempty"`
Fingerprint string `json:"fingerprint,omitempty"`
IsActive *bool `json:"is_active,omitempty"`
RawPayload map[string]any `json:"raw_payload,omitempty"`
}
type ReanimatorSensors struct {
Fans []ReanimatorFanSensor `json:"fans,omitempty"`
Power []ReanimatorPowerSensor `json:"power,omitempty"`
Temperatures []ReanimatorTemperatureSensor `json:"temperatures,omitempty"`
Other []ReanimatorOtherSensor `json:"other,omitempty"`
}
type ReanimatorFanSensor struct {
Name string `json:"name"`
Location string `json:"location,omitempty"`
RPM int `json:"rpm,omitempty"`
Status string `json:"status,omitempty"`
}
type ReanimatorPowerSensor struct {
Name string `json:"name"`
Location string `json:"location,omitempty"`
VoltageV float64 `json:"voltage_v,omitempty"`
CurrentA float64 `json:"current_a,omitempty"`
PowerW float64 `json:"power_w,omitempty"`
Status string `json:"status,omitempty"`
}
type ReanimatorTemperatureSensor struct {
Name string `json:"name"`
Location string `json:"location,omitempty"`
Celsius float64 `json:"celsius,omitempty"`
ThresholdWarningCelsius float64 `json:"threshold_warning_celsius,omitempty"`
ThresholdCriticalCelsius float64 `json:"threshold_critical_celsius,omitempty"`
Status string `json:"status,omitempty"`
}
type ReanimatorOtherSensor struct {
Name string `json:"name"`
Location string `json:"location,omitempty"`
Value float64 `json:"value,omitempty"`
Unit string `json:"unit,omitempty"`
Status string `json:"status,omitempty"`
}


@@ -0,0 +1,63 @@
package ingest
import (
"bytes"
"fmt"
"strings"
"git.mchus.pro/mchus/logpile/internal/collector"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
type Service struct{}
type RedfishSourceMetadata struct {
TargetHost string
SourceTimezone string
Filename string
}
func NewService() *Service {
return &Service{}
}
func (s *Service) AnalyzeArchivePayload(filename string, payload []byte) (*models.AnalysisResult, string, error) {
p := parser.NewBMCParser()
if err := p.ParseFromReader(bytes.NewReader(payload), filename); err != nil {
return nil, "", err
}
return p.Result(), p.DetectedVendor(), nil
}
func (s *Service) AnalyzeRedfishRawPayloads(rawPayloads map[string]any, meta RedfishSourceMetadata) (*models.AnalysisResult, string, error) {
result, err := collector.ReplayRedfishFromRawPayloads(rawPayloads, nil)
if err != nil {
return nil, "", err
}
if result == nil {
return nil, "", fmt.Errorf("redfish replay returned nil result")
}
if strings.TrimSpace(result.Protocol) == "" {
result.Protocol = "redfish"
}
if strings.TrimSpace(result.SourceType) == "" {
result.SourceType = models.SourceTypeAPI
}
if strings.TrimSpace(result.TargetHost) == "" {
result.TargetHost = strings.TrimSpace(meta.TargetHost)
}
if strings.TrimSpace(result.SourceTimezone) == "" {
result.SourceTimezone = strings.TrimSpace(meta.SourceTimezone)
}
if strings.TrimSpace(result.Filename) == "" {
if strings.TrimSpace(meta.Filename) != "" {
result.Filename = strings.TrimSpace(meta.Filename)
} else if target := strings.TrimSpace(result.TargetHost); target != "" {
result.Filename = "redfish://" + target
} else {
result.Filename = "redfish://snapshot"
}
}
return result, "redfish", nil
}

internal/models/memory.go Normal file

@@ -0,0 +1,29 @@
package models
import "strings"
// HasInventoryIdentity reports whether the DIMM has enough identifying
// inventory data to treat it as a populated module even when size is unknown.
func (m MemoryDIMM) HasInventoryIdentity() bool {
return strings.TrimSpace(m.SerialNumber) != "" ||
strings.TrimSpace(m.PartNumber) != "" ||
strings.TrimSpace(m.Type) != "" ||
strings.TrimSpace(m.Technology) != "" ||
strings.TrimSpace(m.Description) != ""
}
// IsInstalledInventory reports whether the DIMM represents an installed module
// that should be kept in canonical inventory and exports.
func (m MemoryDIMM) IsInstalledInventory() bool {
if !m.Present {
return false
}
status := strings.ToLower(strings.TrimSpace(m.Status))
switch status {
case "empty", "absent", "not installed":
return false
}
return m.SizeMB > 0 || m.HasInventoryIdentity()
}

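As a hedged, self-contained sketch of how the two predicates above combine (re-declaring a trimmed-down `MemoryDIMM` stand-in; the real `models.MemoryDIMM` carries many more fields):

```go
package main

import (
	"fmt"
	"strings"
)

// MemoryDIMM is a trimmed stand-in for models.MemoryDIMM, for illustration only.
type MemoryDIMM struct {
	Present      bool
	Status       string
	SizeMB       int
	SerialNumber string
	PartNumber   string
	Type         string
	Technology   string
	Description  string
}

// HasInventoryIdentity mirrors the predicate above: any identifying field counts.
func (m MemoryDIMM) HasInventoryIdentity() bool {
	return strings.TrimSpace(m.SerialNumber) != "" ||
		strings.TrimSpace(m.PartNumber) != "" ||
		strings.TrimSpace(m.Type) != "" ||
		strings.TrimSpace(m.Technology) != "" ||
		strings.TrimSpace(m.Description) != ""
}

// IsInstalledInventory keeps a DIMM only if it is present, not explicitly
// reported empty, and either has a known size or identifying inventory data.
func (m MemoryDIMM) IsInstalledInventory() bool {
	if !m.Present {
		return false
	}
	switch strings.ToLower(strings.TrimSpace(m.Status)) {
	case "empty", "absent", "not installed":
		return false
	}
	return m.SizeMB > 0 || m.HasInventoryIdentity()
}

func main() {
	// Size unknown, but the serial number identifies a populated module.
	noSize := MemoryDIMM{Present: true, Status: "OK", SerialNumber: "SN-1"}
	// Present flag set, yet the status string says the slot is empty.
	empty := MemoryDIMM{Present: true, Status: "Empty"}
	fmt.Println(noSize.IsInstalledInventory(), empty.IsInstalledInventory()) // true false
}
```

The key design point is that size alone is not the gate: BMCs that omit `SizeMB` for populated slots still keep their modules in inventory via the identity check.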

@@ -9,17 +9,18 @@ const (
// AnalysisResult contains all parsed data from an archive
type AnalysisResult struct {
Filename string `json:"filename"`
SourceType string `json:"source_type,omitempty"` // archive | api
Protocol string `json:"protocol,omitempty"` // redfish | ipmi
TargetHost string `json:"target_host,omitempty"` // BMC host for live collect
SourceTimezone string `json:"source_timezone,omitempty"` // Source timezone/offset used during collection (e.g. +08:00)
CollectedAt time.Time `json:"collected_at,omitempty"` // Collection/upload timestamp
RawPayloads map[string]any `json:"raw_payloads,omitempty"` // Additional source payloads (e.g. Redfish tree)
Events []Event `json:"events"`
FRU []FRUInfo `json:"fru"`
Sensors []SensorReading `json:"sensors"`
Hardware *HardwareConfig `json:"hardware"`
Filename string `json:"filename"`
SourceType string `json:"source_type,omitempty"` // archive | api
Protocol string `json:"protocol,omitempty"` // redfish | ipmi
TargetHost string `json:"target_host,omitempty"` // BMC host for live collect
SourceTimezone string `json:"source_timezone,omitempty"` // Source timezone/offset used during collection (e.g. +08:00)
CollectedAt time.Time `json:"collected_at,omitempty"` // Collection/upload timestamp
InventoryLastModifiedAt time.Time `json:"inventory_last_modified_at,omitempty"` // Redfish inventory last modified (InventoryData/Status)
RawPayloads map[string]any `json:"raw_payloads,omitempty"` // Additional source payloads (e.g. Redfish tree)
Events []Event `json:"events"`
FRU []FRUInfo `json:"fru"`
Sensors []SensorReading `json:"sensors"`
Hardware *HardwareConfig `json:"hardware"`
}
// Event represents a single log event
@@ -110,43 +111,45 @@ const (
// HardwareDevice is canonical device inventory used across UI and exports.
type HardwareDevice struct {
ID string `json:"id"`
Kind string `json:"kind"`
Source string `json:"source,omitempty"`
Slot string `json:"slot,omitempty"`
Location string `json:"location,omitempty"`
BDF string `json:"bdf,omitempty"`
DeviceClass string `json:"device_class,omitempty"`
VendorID int `json:"vendor_id,omitempty"`
DeviceID int `json:"device_id,omitempty"`
Model string `json:"model,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
Type string `json:"type,omitempty"`
Interface string `json:"interface,omitempty"`
Present *bool `json:"present,omitempty"`
SizeMB int `json:"size_mb,omitempty"`
SizeGB int `json:"size_gb,omitempty"`
Cores int `json:"cores,omitempty"`
Threads int `json:"threads,omitempty"`
FrequencyMHz int `json:"frequency_mhz,omitempty"`
MaxFreqMHz int `json:"max_frequency_mhz,omitempty"`
PortCount int `json:"port_count,omitempty"`
PortType string `json:"port_type,omitempty"`
MACAddresses []string `json:"mac_addresses,omitempty"`
LinkWidth int `json:"link_width,omitempty"`
LinkSpeed string `json:"link_speed,omitempty"`
MaxLinkWidth int `json:"max_link_width,omitempty"`
MaxLinkSpeed string `json:"max_link_speed,omitempty"`
WattageW int `json:"wattage_w,omitempty"`
InputType string `json:"input_type,omitempty"`
InputPowerW int `json:"input_power_w,omitempty"`
OutputPowerW int `json:"output_power_w,omitempty"`
InputVoltage float64 `json:"input_voltage,omitempty"`
TemperatureC int `json:"temperature_c,omitempty"`
Status string `json:"status,omitempty"`
ID string `json:"id"`
Kind string `json:"kind"`
Source string `json:"source,omitempty"`
Slot string `json:"slot,omitempty"`
Location string `json:"location,omitempty"`
BDF string `json:"bdf,omitempty"`
DeviceClass string `json:"device_class,omitempty"`
VendorID int `json:"vendor_id,omitempty"`
DeviceID int `json:"device_id,omitempty"`
Model string `json:"model,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
Type string `json:"type,omitempty"`
Interface string `json:"interface,omitempty"`
Present *bool `json:"present,omitempty"`
SizeMB int `json:"size_mb,omitempty"`
SizeGB int `json:"size_gb,omitempty"`
Cores int `json:"cores,omitempty"`
Threads int `json:"threads,omitempty"`
FrequencyMHz int `json:"frequency_mhz,omitempty"`
MaxFreqMHz int `json:"max_frequency_mhz,omitempty"`
PortCount int `json:"port_count,omitempty"`
PortType string `json:"port_type,omitempty"`
MACAddresses []string `json:"mac_addresses,omitempty"`
LinkWidth int `json:"link_width,omitempty"`
LinkSpeed string `json:"link_speed,omitempty"`
MaxLinkWidth int `json:"max_link_width,omitempty"`
MaxLinkSpeed string `json:"max_link_speed,omitempty"`
WattageW int `json:"wattage_w,omitempty"`
InputType string `json:"input_type,omitempty"`
InputPowerW int `json:"input_power_w,omitempty"`
OutputPowerW int `json:"output_power_w,omitempty"`
InputVoltage float64 `json:"input_voltage,omitempty"`
TemperatureC int `json:"temperature_c,omitempty"`
RemainingEndurancePct *int `json:"remaining_endurance_pct,omitempty"` // 0-100 %; nil = not reported
NUMANode int `json:"numa_node,omitempty"` // 0 = not reported/N/A
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
@@ -167,14 +170,14 @@ type FirmwareInfo struct {
// BoardInfo represents motherboard/system information
type BoardInfo struct {
Manufacturer string `json:"manufacturer,omitempty"`
ProductName string `json:"product_name,omitempty"`
Description string `json:"description,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Version string `json:"version,omitempty"`
UUID string `json:"uuid,omitempty"`
BMCMACAddress string `json:"bmc_mac_address,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
ProductName string `json:"product_name,omitempty"`
Description string `json:"description,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Version string `json:"version,omitempty"`
UUID string `json:"uuid,omitempty"`
BMCMACAddress string `json:"bmc_mac_address,omitempty"`
}
// CPU represents processor information
@@ -194,11 +197,12 @@ type CPU struct {
SerialNumber string `json:"serial_number,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
Details map[string]any `json:"details,omitempty"`
}
// MemoryDIMM represents a memory module
@@ -218,31 +222,34 @@ type MemoryDIMM struct {
Status string `json:"status,omitempty"`
Ranks int `json:"ranks,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
Details map[string]any `json:"details,omitempty"`
}
// Storage represents a storage device
type Storage struct {
Slot string `json:"slot"`
Type string `json:"type"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
SizeGB int `json:"size_gb"`
SerialNumber string `json:"serial_number,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
Firmware string `json:"firmware,omitempty"`
Interface string `json:"interface,omitempty"`
Present bool `json:"present"`
Location string `json:"location,omitempty"` // Front/Rear
BackplaneID int `json:"backplane_id,omitempty"`
Status string `json:"status,omitempty"`
Slot string `json:"slot"`
Type string `json:"type"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
SizeGB int `json:"size_gb"`
SerialNumber string `json:"serial_number,omitempty"`
Manufacturer string `json:"manufacturer,omitempty"`
Firmware string `json:"firmware,omitempty"`
Interface string `json:"interface,omitempty"`
Present bool `json:"present"`
Location string `json:"location,omitempty"` // Front/Rear
BackplaneID int `json:"backplane_id,omitempty"`
RemainingEndurancePct *int `json:"remaining_endurance_pct,omitempty"` // 0-100 %; nil = not reported
Status string `json:"status,omitempty"`
Details map[string]any `json:"details,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
@@ -250,15 +257,15 @@ type Storage struct {
// StorageVolume represents a logical storage volume (RAID/VROC/etc.).
type StorageVolume struct {
ID string `json:"id,omitempty"`
Name string `json:"name,omitempty"`
Controller string `json:"controller,omitempty"`
RAIDLevel string `json:"raid_level,omitempty"`
SizeGB int `json:"size_gb,omitempty"`
CapacityBytes int64 `json:"capacity_bytes,omitempty"`
Status string `json:"status,omitempty"`
Bootable bool `json:"bootable,omitempty"`
Encrypted bool `json:"encrypted,omitempty"`
ID string `json:"id,omitempty"`
Name string `json:"name,omitempty"`
Controller string `json:"controller,omitempty"`
RAIDLevel string `json:"raid_level,omitempty"`
SizeGB int `json:"size_gb,omitempty"`
CapacityBytes int64 `json:"capacity_bytes,omitempty"`
Status string `json:"status,omitempty"`
Bootable bool `json:"bootable,omitempty"`
Encrypted bool `json:"encrypted,omitempty"`
}
// PCIeDevice represents a PCIe device
@@ -277,13 +284,15 @@ type PCIeDevice struct {
PartNumber string `json:"part_number,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
MACAddresses []string `json:"mac_addresses,omitempty"`
NUMANode int `json:"numa_node,omitempty"` // 0 = not reported/N/A
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
Details map[string]any `json:"details,omitempty"`
}
// NIC represents a network interface card
@@ -298,25 +307,26 @@ type NIC struct {
// PSU represents a power supply unit
type PSU struct {
Slot string `json:"slot"`
Present bool `json:"present"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
Vendor string `json:"vendor,omitempty"`
WattageW int `json:"wattage_w,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
Status string `json:"status,omitempty"`
InputType string `json:"input_type,omitempty"`
InputPowerW int `json:"input_power_w,omitempty"`
OutputPowerW int `json:"output_power_w,omitempty"`
InputVoltage float64 `json:"input_voltage,omitempty"`
OutputVoltage float64 `json:"output_voltage,omitempty"`
TemperatureC int `json:"temperature_c,omitempty"`
Slot string `json:"slot"`
Present bool `json:"present"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
Vendor string `json:"vendor,omitempty"`
WattageW int `json:"wattage_w,omitempty"`
SerialNumber string `json:"serial_number,omitempty"`
PartNumber string `json:"part_number,omitempty"`
Firmware string `json:"firmware,omitempty"`
Status string `json:"status,omitempty"`
InputType string `json:"input_type,omitempty"`
InputPowerW int `json:"input_power_w,omitempty"`
OutputPowerW int `json:"output_power_w,omitempty"`
InputVoltage float64 `json:"input_voltage,omitempty"`
OutputVoltage float64 `json:"output_voltage,omitempty"`
TemperatureC int `json:"temperature_c,omitempty"`
Details map[string]any `json:"details,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
@@ -353,11 +363,12 @@ type GPU struct {
CurrentLinkSpeed string `json:"current_link_speed,omitempty"`
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
Details map[string]any `json:"details,omitempty"`
}
// NetworkAdapter represents a network adapter with detailed info
@@ -365,6 +376,7 @@ type NetworkAdapter struct {
Slot string `json:"slot"`
Location string `json:"location"`
Present bool `json:"present"`
BDF string `json:"bdf,omitempty"`
Model string `json:"model"`
Description string `json:"description,omitempty"`
Vendor string `json:"vendor,omitempty"`
@@ -376,11 +388,17 @@ type NetworkAdapter struct {
PortCount int `json:"port_count,omitempty"`
PortType string `json:"port_type,omitempty"`
MACAddresses []string `json:"mac_addresses,omitempty"`
LinkWidth int `json:"link_width,omitempty"`
LinkSpeed string `json:"link_speed,omitempty"`
MaxLinkWidth int `json:"max_link_width,omitempty"`
MaxLinkSpeed string `json:"max_link_speed,omitempty"`
NUMANode int `json:"numa_node,omitempty"` // 0 = not reported/N/A
Status string `json:"status,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusCheckedAt *time.Time `json:"status_checked_at,omitempty"`
StatusChangedAt *time.Time `json:"status_changed_at,omitempty"`
StatusAtCollect *StatusAtCollection `json:"status_at_collection,omitempty"`
StatusHistory []StatusHistoryEntry `json:"status_history,omitempty"`
ErrorDescription string `json:"error_description,omitempty"`
Details map[string]any `json:"details,omitempty"`
}


@@ -19,6 +19,7 @@ const maxZipArchiveSize = 50 * 1024 * 1024
const maxGzipDecompressedSize = 50 * 1024 * 1024
var supportedArchiveExt = map[string]struct{}{
".ahs": {},
".gz": {},
".tgz": {},
".tar": {},
@@ -45,6 +46,8 @@ func ExtractArchive(archivePath string) ([]ExtractedFile, error) {
ext := strings.ToLower(filepath.Ext(archivePath))
switch ext {
case ".ahs":
return extractSingleFile(archivePath)
case ".gz", ".tgz":
return extractTarGz(archivePath)
case ".tar", ".sds":
@@ -66,6 +69,8 @@ func ExtractArchiveFromReader(r io.Reader, filename string) ([]ExtractedFile, er
ext := strings.ToLower(filepath.Ext(filename))
switch ext {
case ".ahs":
return extractSingleFileFromReader(r, filename)
case ".gz", ".tgz":
return extractTarGzFromReader(r, filename)
case ".tar", ".sds":


@@ -76,6 +76,7 @@ func TestIsSupportedArchiveFilename(t *testing.T) {
name string
want bool
}{
{name: "HPE_CZ2D1X0GS3_20260330.ahs", want: true},
{name: "dump.tar.gz", want: true},
{name: "nvidia-bug-report-1651124000923.log.gz", want: true},
{name: "snapshot.zip", want: true},
@@ -124,3 +125,20 @@ func TestExtractArchiveFromReaderSDS(t *testing.T) {
t.Fatalf("expected bmc/pack.info, got %q", files[0].Path)
}
}
func TestExtractArchiveFromReaderAHS(t *testing.T) {
payload := []byte("ABJRtest")
files, err := ExtractArchiveFromReader(bytes.NewReader(payload), "sample.ahs")
if err != nil {
t.Fatalf("extract ahs from reader: %v", err)
}
if len(files) != 1 {
t.Fatalf("expected 1 extracted file, got %d", len(files))
}
if files[0].Path != "sample.ahs" {
t.Fatalf("expected sample.ahs, got %q", files[0].Path)
}
if string(files[0].Content) != string(payload) {
t.Fatalf("content mismatch")
}
}


@@ -0,0 +1,135 @@
package parser
import (
"fmt"
"regexp"
"strings"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
var manufacturedYearWeekPattern = regexp.MustCompile(`^\d{4}-W\d{2}$`)
// NormalizeManufacturedYearWeek converts common FRU manufacturing date formats
// into contract-compatible YYYY-Www values. Unknown or ambiguous inputs return "".
func NormalizeManufacturedYearWeek(raw string) string {
value := strings.TrimSpace(raw)
if value == "" {
return ""
}
upper := strings.ToUpper(value)
if manufacturedYearWeekPattern.MatchString(upper) {
return upper
}
layouts := []string{
time.RFC3339,
"2006-01-02T15:04:05",
"2006-01-02 15:04:05",
"2006-01-02",
"2006/01/02",
"01/02/2006 15:04:05",
"01/02/2006",
"01-02-2006",
"Mon Jan 2 15:04:05 2006",
"Mon Jan _2 15:04:05 2006",
"Jan 2 2006",
"Jan _2 2006",
}
for _, layout := range layouts {
if ts, err := time.Parse(layout, value); err == nil {
year, week := ts.ISOWeek()
return formatYearWeek(year, week)
}
}
return ""
}
func formatYearWeek(year, week int) string {
if year <= 0 || week <= 0 || week > 53 {
return ""
}
return fmt.Sprintf("%04d-W%02d", year, week)
}
// ApplyManufacturedYearWeekFromFRU attaches normalized manufactured_year_week to
// component details by exact serial-number match. Board-level FRU entries are not
// expanded to components.
func ApplyManufacturedYearWeekFromFRU(frus []models.FRUInfo, hw *models.HardwareConfig) {
if hw == nil || len(frus) == 0 {
return
}
bySerial := make(map[string]string, len(frus))
for _, fru := range frus {
serial := normalizeFRUSerial(fru.SerialNumber)
yearWeek := NormalizeManufacturedYearWeek(fru.MfgDate)
if serial == "" || yearWeek == "" {
continue
}
if _, exists := bySerial[serial]; exists {
continue
}
bySerial[serial] = yearWeek
}
if len(bySerial) == 0 {
return
}
for i := range hw.CPUs {
attachYearWeek(&hw.CPUs[i].Details, bySerial[normalizeFRUSerial(hw.CPUs[i].SerialNumber)])
}
for i := range hw.Memory {
attachYearWeek(&hw.Memory[i].Details, bySerial[normalizeFRUSerial(hw.Memory[i].SerialNumber)])
}
for i := range hw.Storage {
attachYearWeek(&hw.Storage[i].Details, bySerial[normalizeFRUSerial(hw.Storage[i].SerialNumber)])
}
for i := range hw.PCIeDevices {
attachYearWeek(&hw.PCIeDevices[i].Details, bySerial[normalizeFRUSerial(hw.PCIeDevices[i].SerialNumber)])
}
for i := range hw.GPUs {
attachYearWeek(&hw.GPUs[i].Details, bySerial[normalizeFRUSerial(hw.GPUs[i].SerialNumber)])
}
for i := range hw.NetworkAdapters {
attachYearWeek(&hw.NetworkAdapters[i].Details, bySerial[normalizeFRUSerial(hw.NetworkAdapters[i].SerialNumber)])
}
for i := range hw.PowerSupply {
attachYearWeek(&hw.PowerSupply[i].Details, bySerial[normalizeFRUSerial(hw.PowerSupply[i].SerialNumber)])
}
}
func attachYearWeek(details *map[string]any, yearWeek string) {
if yearWeek == "" {
return
}
if *details == nil {
*details = map[string]any{}
}
if existing, ok := (*details)["manufactured_year_week"]; ok && strings.TrimSpace(toString(existing)) != "" {
return
}
(*details)["manufactured_year_week"] = yearWeek
}
func normalizeFRUSerial(v string) string {
s := strings.TrimSpace(v)
if s == "" {
return ""
}
switch strings.ToUpper(s) {
case "N/A", "NA", "NULL", "UNKNOWN", "-", "0":
return ""
default:
return strings.ToUpper(s)
}
}
func toString(v any) string {
switch x := v.(type) {
case string:
return x
default:
return strings.TrimSpace(fmt.Sprint(v))
}
}


@@ -0,0 +1,65 @@
package parser
import (
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestNormalizeManufacturedYearWeek(t *testing.T) {
tests := []struct {
in string
want string
}{
{"2024-W07", "2024-W07"},
{"2024-02-13", "2024-W07"},
{"02/13/2024", "2024-W07"},
{"Tue Feb 13 12:00:00 2024", "2024-W07"},
{"", ""},
{"not-a-date", ""},
}
for _, tt := range tests {
if got := NormalizeManufacturedYearWeek(tt.in); got != tt.want {
t.Fatalf("NormalizeManufacturedYearWeek(%q) = %q, want %q", tt.in, got, tt.want)
}
}
}
func TestApplyManufacturedYearWeekFromFRU_AttachesByExactSerial(t *testing.T) {
hw := &models.HardwareConfig{
PowerSupply: []models.PSU{
{
Slot: "PSU0",
SerialNumber: "PSU-SN-001",
},
},
Storage: []models.Storage{
{
Slot: "OB01",
SerialNumber: "DISK-SN-001",
},
},
}
fru := []models.FRUInfo{
{
Description: "PSU0_FRU (ID 30)",
SerialNumber: "PSU-SN-001",
MfgDate: "2024-02-13",
},
{
Description: "Builtin FRU Device (ID 0)",
SerialNumber: "BOARD-SN-001",
MfgDate: "2024-02-01",
},
}
ApplyManufacturedYearWeekFromFRU(fru, hw)
if got := hw.PowerSupply[0].Details["manufactured_year_week"]; got != "2024-W07" {
t.Fatalf("expected PSU year week 2024-W07, got %#v", hw.PowerSupply[0].Details)
}
if hw.Storage[0].Details != nil {
t.Fatalf("expected unmatched storage serial to stay untouched, got %#v", hw.Storage[0].Details)
}
}


@@ -16,6 +16,7 @@ import (
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
"git.mchus.pro/mchus/logpile/internal/parser/vendors/pciids"
)
const parserVersion = "3.0"
@@ -199,7 +200,7 @@ func parseDCIMViewXML(content []byte, result *models.AnalysisResult) {
parsePowerSupplyView(props, result)
case "DCIM_PCIDeviceView":
parsePCIeDeviceView(props, result)
case "DCIM_NICView":
case "DCIM_NICView", "DCIM_InfiniBandView":
parseNICView(props, result)
case "DCIM_VideoView":
parseVideoView(props, result)
@@ -374,6 +375,10 @@ func parsePhysicalDiskView(props map[string]string, result *models.AnalysisResul
Location: strings.TrimSpace(props["devicedescription"]),
Status: normalizeStatus(firstNonEmpty(props["raidstatus"], props["primarystatus"])),
}
if v := strings.TrimSpace(props["remainingratedwriteendurance"]); v != "" {
n := parseIntLoose(v)
st.RemainingEndurancePct = &n
}
result.Hardware.Storage = append(result.Hardware.Storage, st)
}
@@ -424,20 +429,60 @@ func parsePowerSupplyView(props map[string]string, result *models.AnalysisResult
result.Hardware.PowerSupply = append(result.Hardware.PowerSupply, psu)
}
// pcieFQDDNoisePrefix lists FQDD prefixes that represent internal chipset/CPU
// components or devices already captured with richer data elsewhere:
// - HostBridge/P2PBridge/ISABridge/SMBus: AMD EPYC internal fabric, not PCIe slots
// - AHCI.Embedded: AMD FCH SATA, not a slot device
// - Video.Embedded: BMC/iDRAC Matrox graphics chip, not user-visible
// - NIC.Embedded: already parsed from DCIM_NICView with model and MAC addresses
var pcieFQDDNoisePrefix = []string{
"HostBridge.Embedded.",
"P2PBridge.Embedded.",
"ISABridge.Embedded.",
"SMBus.Embedded.",
"AHCI.Embedded.",
"Video.Embedded.",
// All NIC FQDD classes are parsed from DCIM_NICView / DCIM_InfiniBandView into
// NetworkAdapters with model, MAC, firmware, and VendorID/DeviceID. The
// DCIM_PCIDeviceView duplicate carries only DataBusWidth ("Unknown", "16x or x16")
// and no useful extra data, so suppress it here.
"NIC.",
"InfiniBand.",
}
func parsePCIeDeviceView(props map[string]string, result *models.AnalysisResult) {
desc := strings.TrimSpace(firstNonEmpty(props["devicedescription"], props["description"]))
// "description" is the chip/device model (e.g. "MT28908 Family [ConnectX-6]"); prefer
// it over "devicedescription" which is the location string ("InfiniBand in Slot 1 Port 1").
desc := strings.TrimSpace(firstNonEmpty(props["description"], props["devicedescription"]))
fqdd := strings.TrimSpace(firstNonEmpty(props["fqdd"], props["instanceid"]))
if desc == "" && fqdd == "" {
return
}
for _, prefix := range pcieFQDDNoisePrefix {
if strings.HasPrefix(fqdd, prefix) {
return
}
}
vendorID := parseHexOrDec(firstNonEmpty(props["pcivendorid"], props["vendorid"]))
deviceID := parseHexOrDec(firstNonEmpty(props["pcideviceid"], props["deviceid"]))
manufacturer := strings.TrimSpace(props["manufacturer"])
// General rule: if chip model not found in logs but PCI IDs are known, resolve from pci.ids
if desc == "" && vendorID != 0 && deviceID != 0 {
desc = pciids.DeviceName(vendorID, deviceID)
}
if manufacturer == "" && vendorID != 0 {
manufacturer = pciids.VendorName(vendorID)
}
p := models.PCIeDevice{
Slot: fqdd,
Description: desc,
- VendorID: parseHexOrDec(firstNonEmpty(props["pcivendorid"], props["vendorid"])),
- DeviceID: parseHexOrDec(firstNonEmpty(props["pcideviceid"], props["deviceid"])),
+ VendorID: vendorID,
+ DeviceID: deviceID,
BDF: formatBDF(props["busnumber"], props["devicenumber"], props["functionnumber"]),
DeviceClass: strings.TrimSpace(props["databuswidth"]),
- Manufacturer: strings.TrimSpace(props["manufacturer"]),
+ Manufacturer: manufacturer,
NUMANode: parseIntLoose(props["cpuaffinity"]),
Status: normalizeStatus(props["primarystatus"]),
}
result.Hardware.PCIeDevices = append(result.Hardware.PCIeDevices, p)
@@ -450,15 +495,31 @@ func parseNICView(props map[string]string, result *models.AnalysisResult) {
return
}
mac := strings.TrimSpace(firstNonEmpty(props["currentmacaddress"], props["permanentmacaddress"]))
vendorID := parseHexOrDec(firstNonEmpty(props["pcivendorid"], props["vendorid"]))
deviceID := parseHexOrDec(firstNonEmpty(props["pcideviceid"], props["deviceid"]))
vendor := strings.TrimSpace(firstNonEmpty(props["vendorname"], props["manufacturer"]))
// Prefer pci.ids chip model over generic ProductName when PCI IDs are available.
// Dell TSR often reports a marketing name (e.g. "Mellanox Network Adapter") while
// pci.ids has the precise chip identifier (e.g. "MT28908 Family [ConnectX-6]").
if vendorID != 0 && deviceID != 0 {
if chipModel := pciids.DeviceName(vendorID, deviceID); chipModel != "" {
model = chipModel
}
if vendor == "" {
vendor = pciids.VendorName(vendorID)
}
}
n := models.NetworkAdapter{
Slot: fqdd,
Location: strings.TrimSpace(firstNonEmpty(props["devicedescription"], fqdd)),
Present: true,
Model: model,
Description: strings.TrimSpace(props["protocol"]),
- Vendor: strings.TrimSpace(firstNonEmpty(props["vendorname"], props["manufacturer"])),
- VendorID: parseHexOrDec(firstNonEmpty(props["pcivendorid"], props["vendorid"])),
- DeviceID: parseHexOrDec(firstNonEmpty(props["pcideviceid"], props["deviceid"])),
+ Vendor: vendor,
+ VendorID: vendorID,
+ DeviceID: deviceID,
SerialNumber: strings.TrimSpace(props["serialnumber"]),
PartNumber: strings.TrimSpace(props["partnumber"]),
Firmware: strings.TrimSpace(firstNonEmpty(
@@ -468,6 +529,7 @@ func parseNICView(props map[string]string, result *models.AnalysisResult) {
props["controllerbiosversion"],
)),
PortCount: inferPortCountFromFQDD(fqdd),
NUMANode: parseIntLoose(props["cpuaffinity"]),
Status: normalizeStatus(props["primarystatus"]),
}
if mac != "" {
@@ -521,10 +583,11 @@ func parseControllerView(props map[string]string, result *models.AnalysisResult)
DeviceClass: "storage-controller",
Manufacturer: strings.TrimSpace(firstNonEmpty(props["devicecardmanufacturer"], props["manufacturer"])),
PartNumber: strings.TrimSpace(firstNonEmpty(props["ppid"], props["boardpartnumber"])),
NUMANode: parseIntLoose(props["cpuaffinity"]),
Status: normalizeStatus(props["primarystatus"]),
})
- addFirmware(result, firstNonEmpty(name, fqdd), props["controllerfirmwareversion"], "storage controller")
+ addFirmware(result, firstNonEmpty(name, fqdd), props["controllerfirmwareversion"], firstNonEmpty(fqdd, "storage controller"))
}
func parseControllerBatteryView(props map[string]string, result *models.AnalysisResult) {
@@ -1110,6 +1173,7 @@ func mergeStorage(dst *models.Storage, src models.Storage) {
}
setIfEmpty(&dst.Location, src.Location)
setIfEmpty(&dst.Status, src.Status)
dst.Details = mergeDellDetails(dst.Details, src.Details)
}
func dedupeVolumes(items []models.StorageVolume) []models.StorageVolume {
@@ -1181,6 +1245,22 @@ func mergePSU(dst *models.PSU, src models.PSU) {
dst.InputVoltage = src.InputVoltage
}
setIfEmpty(&dst.InputType, src.InputType)
dst.Details = mergeDellDetails(dst.Details, src.Details)
}
func mergeDellDetails(primary, secondary map[string]any) map[string]any {
if len(secondary) == 0 {
return primary
}
if primary == nil {
primary = make(map[string]any, len(secondary))
}
for key, value := range secondary {
if _, ok := primary[key]; !ok {
primary[key] = value
}
}
return primary
}
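A minimal standalone sketch of the merge semantics above: keys already present in the primary map win, the secondary map only fills gaps. The function here mirrors mergeDellDetails; the sample keys are illustrative, not real Details fields.

```go
package main

import "fmt"

// merge copies keys from secondary into primary only when primary lacks them;
// existing primary values are never overwritten.
func merge(primary, secondary map[string]any) map[string]any {
	if len(secondary) == 0 {
		return primary
	}
	if primary == nil {
		primary = make(map[string]any, len(secondary))
	}
	for k, v := range secondary {
		if _, ok := primary[k]; !ok {
			primary[k] = v
		}
	}
	return primary
}

func main() {
	got := merge(
		map[string]any{"fqdd": "PSU.Slot.1", "wattage": 1400},
		map[string]any{"wattage": 1100, "firmware": "00.1B.53"},
	)
	// primary wattage is kept; firmware is filled from secondary
	fmt.Println(got["wattage"], got["firmware"])
}
```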
func dedupeNetworkAdapters(items []models.NetworkAdapter) []models.NetworkAdapter {

View File

@@ -204,6 +204,262 @@ func TestParseNestedTSRZip(t *testing.T) {
}
}
// TestParseDellPhysicalDiskEndurance verifies that RemainingRatedWriteEndurance from
// DCIM_PhysicalDiskView is parsed into Storage.RemainingEndurancePct.
func TestParseDellPhysicalDiskEndurance(t *testing.T) {
const viewXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SystemView">
<PROPERTY NAME="Manufacturer"><VALUE>Dell Inc.</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>PowerEdge R6625</VALUE></PROPERTY>
<PROPERTY NAME="ServiceTag"><VALUE>8VS2LG4</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_PhysicalDiskView">
<PROPERTY NAME="FQDD"><VALUE>Disk.Bay.0:Enclosure.Internal.0-1:RAID.SL.3-1</VALUE></PROPERTY>
<PROPERTY NAME="Slot"><VALUE>0</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>HFS480G3H2X069N</VALUE></PROPERTY>
<PROPERTY NAME="SerialNumber"><VALUE>ESEAN5254I030B26B</VALUE></PROPERTY>
<PROPERTY NAME="SizeInBytes"><VALUE>479559942144</VALUE></PROPERTY>
<PROPERTY NAME="MediaType"><VALUE>Solid State Drive</VALUE></PROPERTY>
<PROPERTY NAME="BusProtocol"><VALUE>SATA</VALUE></PROPERTY>
<PROPERTY NAME="Revision"><VALUE>DZ03</VALUE></PROPERTY>
<PROPERTY NAME="RemainingRatedWriteEndurance"><VALUE>100</VALUE><DisplayValue>100 %</DisplayValue></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>1</VALUE><DisplayValue>OK</DisplayValue></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_PhysicalDiskView">
<PROPERTY NAME="FQDD"><VALUE>Disk.Bay.1:Enclosure.Internal.0-1:RAID.SL.3-1</VALUE></PROPERTY>
<PROPERTY NAME="Slot"><VALUE>1</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>TOSHIBA MG08ADA800E</VALUE></PROPERTY>
<PROPERTY NAME="SerialNumber"><VALUE>X1G0A0YXFVVG</VALUE></PROPERTY>
<PROPERTY NAME="SizeInBytes"><VALUE>8001563222016</VALUE></PROPERTY>
<PROPERTY NAME="MediaType"><VALUE>Hard Disk Drive</VALUE></PROPERTY>
<PROPERTY NAME="BusProtocol"><VALUE>SAS</VALUE></PROPERTY>
<PROPERTY NAME="Revision"><VALUE>0104</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
inner := makeZipArchive(t, map[string][]byte{
"tsr/metadata.json": []byte(`{"Make":"Dell Inc.","Model":"PowerEdge R6625","ServiceTag":"8VS2LG4"}`),
"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml": []byte(viewXML),
})
p := &Parser{}
result, err := p.Parse([]parser.ExtractedFile{
{Path: "signature", Content: []byte("ok")},
{Path: "TSR20260306141852_8VS2LG4.pl.zip", Content: inner},
})
if err != nil {
t.Fatalf("parse failed: %v", err)
}
if len(result.Hardware.Storage) != 2 {
t.Fatalf("expected 2 storage devices, got %d", len(result.Hardware.Storage))
}
ssd := result.Hardware.Storage[0]
if ssd.RemainingEndurancePct == nil {
t.Fatalf("SSD slot 0: expected RemainingEndurancePct to be set")
}
if *ssd.RemainingEndurancePct != 100 {
t.Errorf("SSD slot 0: expected RemainingEndurancePct=100, got %d", *ssd.RemainingEndurancePct)
}
hdd := result.Hardware.Storage[1]
if hdd.RemainingEndurancePct != nil {
t.Errorf("HDD slot 1: expected RemainingEndurancePct absent, got %d", *hdd.RemainingEndurancePct)
}
}
// TestParseDellInfiniBandView verifies that DCIM_InfiniBandView entries are parsed as
// NetworkAdapters (not PCIe devices) and that the corresponding SoftwareIdentity firmware
// entry with FQDD "InfiniBand.Slot.*" does not leak into hardware.firmware.
//
// Regression guard: PowerEdge R6625 (8VS2LG4) — "Mellanox Network Adapter" version
// "20.39.35.60" appeared in hardware.firmware because DCIM_InfiniBandView was ignored
// (device ended up only in PCIeDevices with model "16x or x16") and SoftwareIdentity
// FQDD "InfiniBand.Slot.1-1" was not filtered. (2026-03-15)
func TestParseDellInfiniBandView(t *testing.T) {
const viewXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SystemView">
<PROPERTY NAME="Manufacturer"><VALUE>Dell Inc.</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>PowerEdge R6625</VALUE></PROPERTY>
<PROPERTY NAME="ServiceTag"><VALUE>8VS2LG4</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_InfiniBandView">
<PROPERTY NAME="FQDD"><VALUE>InfiniBand.Slot.1-1</VALUE></PROPERTY>
<PROPERTY NAME="DeviceDescription"><VALUE>InfiniBand in Slot 1 Port 1</VALUE></PROPERTY>
<PROPERTY NAME="CurrentMACAddress"><VALUE>00:1C:FD:D7:5A:E6</VALUE></PROPERTY>
<PROPERTY NAME="FamilyVersion"><VALUE>20.39.35.60</VALUE></PROPERTY>
<PROPERTY NAME="EFIVersion"><VALUE>14.32.17</VALUE></PROPERTY>
<PROPERTY NAME="PCIVendorID"><VALUE>15B3</VALUE></PROPERTY>
<PROPERTY NAME="PCIDeviceID"><VALUE>101B</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>0</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_PCIDeviceView">
<PROPERTY NAME="FQDD"><VALUE>InfiniBand.Slot.1-1</VALUE></PROPERTY>
<PROPERTY NAME="Description"><VALUE>MT28908 Family [ConnectX-6]</VALUE></PROPERTY>
<PROPERTY NAME="DeviceDescription"><VALUE>InfiniBand in Slot 1 Port 1</VALUE></PROPERTY>
<PROPERTY NAME="Manufacturer"><VALUE>Mellanox Technologies</VALUE></PROPERTY>
<PROPERTY NAME="PCIVendorID"><VALUE>15B3</VALUE></PROPERTY>
<PROPERTY NAME="PCIDeviceID"><VALUE>101B</VALUE></PROPERTY>
<PROPERTY NAME="DataBusWidth"><DisplayValue>16x or x16</DisplayValue></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_ControllerView">
<PROPERTY NAME="FQDD"><VALUE>RAID.SL.3-1</VALUE></PROPERTY>
<PROPERTY NAME="ProductName"><VALUE>PERC H755 Front</VALUE></PROPERTY>
<PROPERTY NAME="ControllerFirmwareVersion"><VALUE>52.30.0-6115</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>0</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
const swXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SoftwareIdentity">
<PROPERTY NAME="ElementName"><VALUE>Mellanox Network Adapter - 00:1C:FD:D7:5A:E6</VALUE></PROPERTY>
<PROPERTY NAME="FQDD"><VALUE>InfiniBand.Slot.1-1</VALUE></PROPERTY>
<PROPERTY NAME="VersionString"><VALUE>20.39.35.60</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SoftwareIdentity">
<PROPERTY NAME="ElementName"><VALUE>PERC H755 Front</VALUE></PROPERTY>
<PROPERTY NAME="FQDD"><VALUE>RAID.SL.3-1</VALUE></PROPERTY>
<PROPERTY NAME="VersionString"><VALUE>52.30.0-6115</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SoftwareIdentity">
<PROPERTY NAME="ElementName"><VALUE>BIOS</VALUE></PROPERTY>
<PROPERTY NAME="FQDD"><VALUE>BIOS.Setup.1-1</VALUE></PROPERTY>
<PROPERTY NAME="VersionString"><VALUE>1.15.3</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
inner := makeZipArchive(t, map[string][]byte{
"tsr/metadata.json": []byte(`{"Make":"Dell Inc.","Model":"PowerEdge R6625","ServiceTag":"8VS2LG4"}`),
"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml": []byte(viewXML),
"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_SoftwareIdentity.xml": []byte(swXML),
})
p := &Parser{}
result, err := p.Parse([]parser.ExtractedFile{
{Path: "signature", Content: []byte("ok")},
{Path: "TSR20260306141852_8VS2LG4.pl.zip", Content: inner},
})
if err != nil {
t.Fatalf("parse failed: %v", err)
}
// InfiniBand adapter must appear as a NetworkAdapter, not a PCIe device.
if len(result.Hardware.NetworkAdapters) != 1 {
t.Fatalf("expected 1 network adapter, got %d", len(result.Hardware.NetworkAdapters))
}
nic := result.Hardware.NetworkAdapters[0]
if nic.Slot != "InfiniBand.Slot.1-1" {
t.Errorf("unexpected NIC slot: %q", nic.Slot)
}
if nic.Firmware != "20.39.35.60" {
t.Errorf("unexpected NIC firmware: %q", nic.Firmware)
}
if len(nic.MACAddresses) == 0 || nic.MACAddresses[0] != "00:1C:FD:D7:5A:E6" {
t.Errorf("unexpected NIC MAC: %v", nic.MACAddresses)
}
// pci.ids enrichment: VendorID=0x15B3, DeviceID=0x101B → chip model + vendor name.
if nic.Model != "MT28908 Family [ConnectX-6]" {
t.Errorf("NIC model = %q, want MT28908 Family [ConnectX-6] (from pci.ids)", nic.Model)
}
if nic.Vendor != "Mellanox Technologies" {
t.Errorf("NIC vendor = %q, want Mellanox Technologies (from pci.ids)", nic.Vendor)
}
// InfiniBand FQDD must NOT appear in PCIe devices.
for _, pcie := range result.Hardware.PCIeDevices {
if pcie.Slot == "InfiniBand.Slot.1-1" {
t.Errorf("InfiniBand.Slot.1-1 must not appear in PCIeDevices")
}
}
// Firmware entries from SoftwareIdentity and parseControllerView must carry the FQDD
// as their Description so the exporter's isDeviceBoundFirmwareFQDD filter can remove them.
fqddByName := make(map[string]string)
for _, fw := range result.Hardware.Firmware {
fqddByName[fw.DeviceName] = fw.Description
}
if desc := fqddByName["Mellanox Network Adapter"]; desc != "InfiniBand.Slot.1-1" {
t.Errorf("Mellanox firmware Description = %q, want InfiniBand.Slot.1-1 for FQDD filter", desc)
}
if desc := fqddByName["PERC H755 Front"]; desc != "RAID.SL.3-1" {
t.Errorf("PERC H755 Front firmware Description = %q, want RAID.SL.3-1 for FQDD filter", desc)
}
}
// TestParseDellCPUAffinity verifies that CPUAffinity is parsed into NUMANode for
// NIC, PCIe, and controller views. "Not Applicable" must result in NUMANode=0.
func TestParseDellCPUAffinity(t *testing.T) {
const viewXML = `<CIM><MESSAGE><SIMPLEREQ>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_SystemView">
<PROPERTY NAME="Manufacturer"><VALUE>Dell Inc.</VALUE></PROPERTY>
<PROPERTY NAME="Model"><VALUE>PowerEdge R750</VALUE></PROPERTY>
<PROPERTY NAME="ServiceTag"><VALUE>TESTST1</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_NICView">
<PROPERTY NAME="FQDD"><VALUE>NIC.Slot.2-1-1</VALUE></PROPERTY>
<PROPERTY NAME="ProductName"><VALUE>Some NIC</VALUE></PROPERTY>
<PROPERTY NAME="CPUAffinity"><VALUE>1</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>0</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_InfiniBandView">
<PROPERTY NAME="FQDD"><VALUE>InfiniBand.Slot.1-1</VALUE></PROPERTY>
<PROPERTY NAME="DeviceDescription"><VALUE>InfiniBand in Slot 1</VALUE></PROPERTY>
<PROPERTY NAME="CPUAffinity"><VALUE>2</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>0</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_ControllerView">
<PROPERTY NAME="FQDD"><VALUE>RAID.Slot.1-1</VALUE></PROPERTY>
<PROPERTY NAME="ProductName"><VALUE>PERC H755</VALUE></PROPERTY>
<PROPERTY NAME="CPUAffinity"><VALUE>Not Applicable</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>0</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
<VALUE.NAMEDINSTANCE><INSTANCE CLASSNAME="DCIM_PCIDeviceView">
<PROPERTY NAME="FQDD"><VALUE>Slot.7-1</VALUE></PROPERTY>
<PROPERTY NAME="Description"><VALUE>Some PCIe Card</VALUE></PROPERTY>
<PROPERTY NAME="CPUAffinity"><VALUE>2</VALUE></PROPERTY>
<PROPERTY NAME="PrimaryStatus"><VALUE>0</VALUE></PROPERTY>
</INSTANCE></VALUE.NAMEDINSTANCE>
</SIMPLEREQ></MESSAGE></CIM>`
inner := makeZipArchive(t, map[string][]byte{
"tsr/metadata.json": []byte(`{"Make":"Dell Inc.","Model":"PowerEdge R750","ServiceTag":"TESTST1"}`),
"tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml": []byte(viewXML),
})
p := &Parser{}
result, err := p.Parse([]parser.ExtractedFile{
{Path: "signature", Content: []byte("ok")},
{Path: "TSR_TESTST1.pl.zip", Content: inner},
})
if err != nil {
t.Fatalf("parse failed: %v", err)
}
// NIC CPUAffinity=1 → NUMANode=1
nicBySlot := make(map[string]int)
for _, nic := range result.Hardware.NetworkAdapters {
nicBySlot[nic.Slot] = nic.NUMANode
}
if nicBySlot["NIC.Slot.2-1-1"] != 1 {
t.Errorf("NIC.Slot.2-1-1 NUMANode = %d, want 1", nicBySlot["NIC.Slot.2-1-1"])
}
if nicBySlot["InfiniBand.Slot.1-1"] != 2 {
t.Errorf("InfiniBand.Slot.1-1 NUMANode = %d, want 2", nicBySlot["InfiniBand.Slot.1-1"])
}
// PCIe device CPUAffinity=2 → NUMANode=2; controller CPUAffinity="Not Applicable" → NUMANode=0
pcieBySlot := make(map[string]int)
for _, pcie := range result.Hardware.PCIeDevices {
pcieBySlot[pcie.Slot] = pcie.NUMANode
}
if pcieBySlot["Slot.7-1"] != 2 {
t.Errorf("Slot.7-1 NUMANode = %d, want 2", pcieBySlot["Slot.7-1"])
}
if pcieBySlot["RAID.Slot.1-1"] != 0 {
t.Errorf("RAID.Slot.1-1 NUMANode = %d, want 0 (Not Applicable)", pcieBySlot["RAID.Slot.1-1"])
}
}
func makeZipArchive(t *testing.T, files map[string][]byte) []byte {
t.Helper()
var buf bytes.Buffer

View File

@@ -0,0 +1,601 @@
package easy_bee
import (
"encoding/json"
"fmt"
"strings"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
const parserVersion = "1.0"
func init() {
parser.Register(&Parser{})
}
// Parser imports support bundles produced by reanimator-easy-bee.
// These archives embed a ready-to-use hardware snapshot in export/bee-audit.json.
type Parser struct{}
func (p *Parser) Name() string {
return "Reanimator Easy Bee Parser"
}
func (p *Parser) Vendor() string {
return "easy_bee"
}
func (p *Parser) Version() string {
return parserVersion
}
func (p *Parser) Detect(files []parser.ExtractedFile) int {
confidence := 0
hasManifest := false
hasBeeAudit := false
hasRuntimeHealth := false
hasTechdump := false
hasBundlePrefix := false
for _, f := range files {
path := strings.ToLower(strings.TrimSpace(f.Path))
content := strings.ToLower(string(f.Content))
if !hasBundlePrefix && strings.Contains(path, "bee-support-") {
hasBundlePrefix = true
confidence += 5
}
if (strings.HasSuffix(path, "/manifest.txt") || path == "manifest.txt") &&
strings.Contains(content, "bee_version=") {
hasManifest = true
confidence += 35
if strings.Contains(content, "export_dir=") {
confidence += 10
}
}
if strings.HasSuffix(path, "/export/bee-audit.json") || path == "bee-audit.json" {
hasBeeAudit = true
confidence += 55
}
if hasBundlePrefix && (strings.HasSuffix(path, "/export/runtime-health.json") || path == "runtime-health.json") {
hasRuntimeHealth = true
confidence += 10
}
if hasBundlePrefix && !hasTechdump && strings.Contains(path, "/export/techdump/") {
hasTechdump = true
confidence += 10
}
}
if hasManifest && hasBeeAudit {
return 100
}
if hasBeeAudit && (hasRuntimeHealth || hasTechdump) {
confidence += 10
}
if confidence > 100 {
return 100
}
return confidence
}
func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, error) {
snapshotFile := findSnapshotFile(files)
if snapshotFile == nil {
return nil, fmt.Errorf("easy-bee snapshot not found")
}
var snapshot beeSnapshot
if err := json.Unmarshal(snapshotFile.Content, &snapshot); err != nil {
return nil, fmt.Errorf("decode %s: %w", snapshotFile.Path, err)
}
manifest := parseManifest(files)
result := &models.AnalysisResult{
SourceType: strings.TrimSpace(snapshot.SourceType),
Protocol: strings.TrimSpace(snapshot.Protocol),
TargetHost: firstNonEmpty(snapshot.TargetHost, manifest.Host),
SourceTimezone: strings.TrimSpace(snapshot.SourceTimezone),
CollectedAt: chooseCollectedAt(snapshot, manifest),
InventoryLastModifiedAt: snapshot.InventoryLastModifiedAt,
RawPayloads: snapshot.RawPayloads,
Events: make([]models.Event, 0),
FRU: append([]models.FRUInfo(nil), snapshot.FRU...),
Sensors: make([]models.SensorReading, 0),
Hardware: &models.HardwareConfig{
Firmware: append([]models.FirmwareInfo(nil), snapshot.Hardware.Firmware...),
BoardInfo: snapshot.Hardware.Board,
Devices: append([]models.HardwareDevice(nil), snapshot.Hardware.Devices...),
CPUs: append([]models.CPU(nil), snapshot.Hardware.CPUs...),
Memory: append([]models.MemoryDIMM(nil), snapshot.Hardware.Memory...),
Storage: append([]models.Storage(nil), snapshot.Hardware.Storage...),
Volumes: append([]models.StorageVolume(nil), snapshot.Hardware.Volumes...),
PCIeDevices: normalizePCIeDevices(snapshot.Hardware.PCIeDevices),
GPUs: append([]models.GPU(nil), snapshot.Hardware.GPUs...),
NetworkCards: append([]models.NIC(nil), snapshot.Hardware.NetworkCards...),
NetworkAdapters: normalizeNetworkAdapters(snapshot.Hardware.NetworkAdapters),
PowerSupply: append([]models.PSU(nil), snapshot.Hardware.PowerSupply...),
},
}
result.Events = append(result.Events, snapshot.Events...)
result.Events = append(result.Events, convertRuntimeToEvents(snapshot.Runtime, result.CollectedAt)...)
result.Events = append(result.Events, convertEventLogs(snapshot.Hardware.EventLogs)...)
result.Sensors = append(result.Sensors, snapshot.Sensors...)
result.Sensors = append(result.Sensors, flattenSensorGroups(snapshot.Hardware.Sensors)...)
if len(result.FRU) == 0 {
if boardFRU, ok := buildBoardFRU(snapshot.Hardware.Board); ok {
result.FRU = append(result.FRU, boardFRU)
}
}
if result.Hardware == nil || (result.Hardware.BoardInfo.SerialNumber == "" &&
len(result.Hardware.CPUs) == 0 &&
len(result.Hardware.Memory) == 0 &&
len(result.Hardware.Storage) == 0 &&
len(result.Hardware.PCIeDevices) == 0 &&
len(result.Hardware.Devices) == 0) {
return nil, fmt.Errorf("unsupported easy-bee snapshot format")
}
return result, nil
}
type beeSnapshot struct {
SourceType string `json:"source_type,omitempty"`
Protocol string `json:"protocol,omitempty"`
TargetHost string `json:"target_host,omitempty"`
SourceTimezone string `json:"source_timezone,omitempty"`
CollectedAt time.Time `json:"collected_at,omitempty"`
InventoryLastModifiedAt time.Time `json:"inventory_last_modified_at,omitempty"`
RawPayloads map[string]any `json:"raw_payloads,omitempty"`
Events []models.Event `json:"events,omitempty"`
FRU []models.FRUInfo `json:"fru,omitempty"`
Sensors []models.SensorReading `json:"sensors,omitempty"`
Hardware beeHardware `json:"hardware"`
Runtime beeRuntime `json:"runtime,omitempty"`
}
type beeHardware struct {
Board models.BoardInfo `json:"board"`
Firmware []models.FirmwareInfo `json:"firmware,omitempty"`
Devices []models.HardwareDevice `json:"devices,omitempty"`
CPUs []models.CPU `json:"cpus,omitempty"`
Memory []models.MemoryDIMM `json:"memory,omitempty"`
Storage []models.Storage `json:"storage,omitempty"`
Volumes []models.StorageVolume `json:"volumes,omitempty"`
PCIeDevices []models.PCIeDevice `json:"pcie_devices,omitempty"`
GPUs []models.GPU `json:"gpus,omitempty"`
NetworkCards []models.NIC `json:"network_cards,omitempty"`
NetworkAdapters []models.NetworkAdapter `json:"network_adapters,omitempty"`
PowerSupply []models.PSU `json:"power_supplies,omitempty"`
Sensors beeSensorGroups `json:"sensors,omitempty"`
EventLogs []beeEventLog `json:"event_logs,omitempty"`
}
type beeSensorGroups struct {
Fans []beeFanSensor `json:"fans,omitempty"`
Power []beePowerSensor `json:"power,omitempty"`
Temperatures []beeTemperatureSensor `json:"temperatures,omitempty"`
Other []beeOtherSensor `json:"other,omitempty"`
}
type beeFanSensor struct {
Name string `json:"name"`
Location string `json:"location,omitempty"`
RPM int `json:"rpm,omitempty"`
Status string `json:"status,omitempty"`
}
type beePowerSensor struct {
Name string `json:"name"`
Location string `json:"location,omitempty"`
VoltageV float64 `json:"voltage_v,omitempty"`
CurrentA float64 `json:"current_a,omitempty"`
PowerW float64 `json:"power_w,omitempty"`
Status string `json:"status,omitempty"`
}
type beeTemperatureSensor struct {
Name string `json:"name"`
Location string `json:"location,omitempty"`
Celsius float64 `json:"celsius,omitempty"`
ThresholdWarningCelsius float64 `json:"threshold_warning_celsius,omitempty"`
ThresholdCriticalCelsius float64 `json:"threshold_critical_celsius,omitempty"`
Status string `json:"status,omitempty"`
}
type beeOtherSensor struct {
Name string `json:"name"`
Location string `json:"location,omitempty"`
Value float64 `json:"value,omitempty"`
Unit string `json:"unit,omitempty"`
Status string `json:"status,omitempty"`
}
type beeRuntime struct {
Status string `json:"status,omitempty"`
CheckedAt time.Time `json:"checked_at,omitempty"`
NetworkStatus string `json:"network_status,omitempty"`
Issues []beeRuntimeIssue `json:"issues,omitempty"`
Services []beeRuntimeStatus `json:"services,omitempty"`
Interfaces []beeInterface `json:"interfaces,omitempty"`
}
type beeRuntimeIssue struct {
Code string `json:"code,omitempty"`
Severity string `json:"severity,omitempty"`
Description string `json:"description,omitempty"`
}
type beeRuntimeStatus struct {
Name string `json:"name,omitempty"`
Status string `json:"status,omitempty"`
}
type beeInterface struct {
Name string `json:"name,omitempty"`
State string `json:"state,omitempty"`
IPv4 []string `json:"ipv4,omitempty"`
Outcome string `json:"outcome,omitempty"`
}
type beeEventLog struct {
Source string `json:"source,omitempty"`
EventTime string `json:"event_time,omitempty"`
Severity string `json:"severity,omitempty"`
MessageID string `json:"message_id,omitempty"`
Message string `json:"message,omitempty"`
RawPayload map[string]any `json:"raw_payload,omitempty"`
}
type manifestMetadata struct {
Host string
GeneratedAtUTC time.Time
}
func findSnapshotFile(files []parser.ExtractedFile) *parser.ExtractedFile {
for i := range files {
path := strings.ToLower(strings.TrimSpace(files[i].Path))
if strings.HasSuffix(path, "/export/bee-audit.json") || path == "bee-audit.json" {
return &files[i]
}
}
for i := range files {
path := strings.ToLower(strings.TrimSpace(files[i].Path))
if strings.HasSuffix(path, ".json") && strings.Contains(path, "reanimator") {
return &files[i]
}
}
return nil
}
func parseManifest(files []parser.ExtractedFile) manifestMetadata {
var meta manifestMetadata
for _, f := range files {
path := strings.ToLower(strings.TrimSpace(f.Path))
if !(strings.HasSuffix(path, "/manifest.txt") || path == "manifest.txt") {
continue
}
lines := strings.Split(string(f.Content), "\n")
for _, line := range lines {
key, value, ok := strings.Cut(strings.TrimSpace(line), "=")
if !ok {
continue
}
switch strings.TrimSpace(key) {
case "host":
meta.Host = strings.TrimSpace(value)
case "generated_at_utc":
if ts, err := time.Parse(time.RFC3339, strings.TrimSpace(value)); err == nil {
meta.GeneratedAtUTC = ts.UTC()
}
}
}
break
}
return meta
}
func chooseCollectedAt(snapshot beeSnapshot, manifest manifestMetadata) time.Time {
switch {
case !snapshot.CollectedAt.IsZero():
return snapshot.CollectedAt.UTC()
case !snapshot.Runtime.CheckedAt.IsZero():
return snapshot.Runtime.CheckedAt.UTC()
case !manifest.GeneratedAtUTC.IsZero():
return manifest.GeneratedAtUTC.UTC()
default:
return time.Time{}
}
}
func convertRuntimeToEvents(runtime beeRuntime, fallback time.Time) []models.Event {
events := make([]models.Event, 0)
ts := runtime.CheckedAt
if ts.IsZero() {
ts = fallback
}
if status := strings.TrimSpace(runtime.Status); status != "" {
desc := "Bee runtime status: " + status
if networkStatus := strings.TrimSpace(runtime.NetworkStatus); networkStatus != "" {
desc += " (network: " + networkStatus + ")"
}
events = append(events, models.Event{
Timestamp: ts,
Source: "Bee Runtime",
EventType: "Runtime Status",
Severity: mapSeverity(status),
Description: desc,
})
}
for _, issue := range runtime.Issues {
desc := strings.TrimSpace(issue.Description)
if desc == "" {
desc = "Bee runtime issue"
}
events = append(events, models.Event{
Timestamp: ts,
Source: "Bee Runtime",
EventType: "Runtime Issue",
Severity: mapSeverity(issue.Severity),
Description: desc,
RawData: strings.TrimSpace(issue.Code),
})
}
for _, svc := range runtime.Services {
status := strings.TrimSpace(svc.Status)
if status == "" || strings.EqualFold(status, "active") {
continue
}
events = append(events, models.Event{
Timestamp: ts,
Source: "systemd",
EventType: "Service Status",
Severity: mapSeverity(status),
Description: fmt.Sprintf("%s is %s", strings.TrimSpace(svc.Name), status),
})
}
for _, iface := range runtime.Interfaces {
state := strings.TrimSpace(iface.State)
outcome := strings.TrimSpace(iface.Outcome)
if state == "" && outcome == "" {
continue
}
if strings.EqualFold(state, "up") && strings.EqualFold(outcome, "lease_acquired") {
continue
}
desc := fmt.Sprintf("interface %s state=%s outcome=%s", strings.TrimSpace(iface.Name), state, outcome)
events = append(events, models.Event{
Timestamp: ts,
Source: "network",
EventType: "Interface Status",
Severity: models.SeverityWarning,
Description: strings.TrimSpace(desc),
})
}
return events
}
func convertEventLogs(items []beeEventLog) []models.Event {
events := make([]models.Event, 0, len(items))
for _, item := range items {
message := strings.TrimSpace(item.Message)
if message == "" {
continue
}
ts := parseEventTime(item.EventTime)
rawData := strings.TrimSpace(item.MessageID)
events = append(events, models.Event{
Timestamp: ts,
Source: firstNonEmpty(strings.TrimSpace(item.Source), "Reanimator"),
EventType: "Event Log",
Severity: mapSeverity(item.Severity),
Description: message,
RawData: rawData,
})
}
return events
}
func parseEventTime(raw string) time.Time {
raw = strings.TrimSpace(raw)
if raw == "" {
return time.Time{}
}
layouts := []string{time.RFC3339Nano, time.RFC3339}
for _, layout := range layouts {
if ts, err := time.Parse(layout, raw); err == nil {
return ts.UTC()
}
}
return time.Time{}
}
func flattenSensorGroups(groups beeSensorGroups) []models.SensorReading {
result := make([]models.SensorReading, 0, len(groups.Fans)+len(groups.Power)+len(groups.Temperatures)+len(groups.Other))
for _, fan := range groups.Fans {
result = append(result, models.SensorReading{
Name: sensorName(fan.Name, fan.Location),
Type: "fan",
Value: float64(fan.RPM),
Unit: "RPM",
Status: strings.TrimSpace(fan.Status),
})
}
for _, power := range groups.Power {
name := sensorName(power.Name, power.Location)
status := strings.TrimSpace(power.Status)
if power.PowerW != 0 {
result = append(result, models.SensorReading{
Name: name,
Type: "power",
Value: power.PowerW,
Unit: "W",
Status: status,
})
}
if power.VoltageV != 0 {
result = append(result, models.SensorReading{
Name: name + " Voltage",
Type: "voltage",
Value: power.VoltageV,
Unit: "V",
Status: status,
})
}
if power.CurrentA != 0 {
result = append(result, models.SensorReading{
Name: name + " Current",
Type: "current",
Value: power.CurrentA,
Unit: "A",
Status: status,
})
}
}
for _, temp := range groups.Temperatures {
result = append(result, models.SensorReading{
Name: sensorName(temp.Name, temp.Location),
Type: "temperature",
Value: temp.Celsius,
Unit: "C",
Status: strings.TrimSpace(temp.Status),
})
}
for _, other := range groups.Other {
result = append(result, models.SensorReading{
Name: sensorName(other.Name, other.Location),
Type: "other",
Value: other.Value,
Unit: strings.TrimSpace(other.Unit),
Status: strings.TrimSpace(other.Status),
})
}
return result
}
func sensorName(name, location string) string {
name = strings.TrimSpace(name)
location = strings.TrimSpace(location)
if name == "" {
return location
}
if location == "" {
return name
}
return name + " [" + location + "]"
}
func normalizePCIeDevices(items []models.PCIeDevice) []models.PCIeDevice {
out := append([]models.PCIeDevice(nil), items...)
for i := range out {
slot := strings.TrimSpace(out[i].Slot)
if out[i].BDF == "" && looksLikeBDF(slot) {
out[i].BDF = slot
}
if out[i].Slot == "" && out[i].BDF != "" {
out[i].Slot = out[i].BDF
}
}
return out
}
func normalizeNetworkAdapters(items []models.NetworkAdapter) []models.NetworkAdapter {
out := append([]models.NetworkAdapter(nil), items...)
for i := range out {
slot := strings.TrimSpace(out[i].Slot)
if out[i].BDF == "" && looksLikeBDF(slot) {
out[i].BDF = slot
}
if out[i].Slot == "" && out[i].BDF != "" {
out[i].Slot = out[i].BDF
}
}
return out
}
func looksLikeBDF(value string) bool {
value = strings.TrimSpace(value)
if len(value) != len("0000:00:00.0") {
return false
}
for i, r := range value {
switch i {
case 4, 7:
if r != ':' {
return false
}
case 10:
if r != '.' {
return false
}
default:
if !((r >= '0' && r <= '9') || (r >= 'a' && r <= 'f') || (r >= 'A' && r <= 'F')) {
return false
}
}
}
return true
}
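The normalizers above backfill `BDF` from `Slot` only when the slot string is literally a PCI address. A minimal standalone sketch of that shape check (re-sketched from `looksLikeBDF`, not the production code):

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeBDF re-sketches the check above: a PCI address must match the
// "DDDD:BB:DD.F" hex layout, e.g. "0000:05:00.0".
func looksLikeBDF(value string) bool {
	value = strings.TrimSpace(value)
	if len(value) != len("0000:00:00.0") {
		return false
	}
	for i, r := range value {
		switch i {
		case 4, 7:
			if r != ':' {
				return false
			}
		case 10:
			if r != '.' {
				return false
			}
		default:
			if !((r >= '0' && r <= '9') || (r >= 'a' && r <= 'f') || (r >= 'A' && r <= 'F')) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(looksLikeBDF("0000:05:00.0")) // true: slot carries a BDF, so BDF is backfilled
	fmt.Println(looksLikeBDF("Slot 4"))       // false: a human-readable slot label stays a slot
}
```

The fixed-length comparison against `"0000:00:00.0"` means extended-domain addresses or abbreviated forms are deliberately rejected; only the canonical 12-character form counts.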
func buildBoardFRU(board models.BoardInfo) (models.FRUInfo, bool) {
if strings.TrimSpace(board.SerialNumber) == "" &&
strings.TrimSpace(board.Manufacturer) == "" &&
strings.TrimSpace(board.ProductName) == "" &&
strings.TrimSpace(board.PartNumber) == "" {
return models.FRUInfo{}, false
}
return models.FRUInfo{
Description: "System Board",
Manufacturer: strings.TrimSpace(board.Manufacturer),
ProductName: strings.TrimSpace(board.ProductName),
SerialNumber: strings.TrimSpace(board.SerialNumber),
PartNumber: strings.TrimSpace(board.PartNumber),
}, true
}
func mapSeverity(raw string) models.Severity {
switch strings.ToLower(strings.TrimSpace(raw)) {
case "critical", "crit", "error", "failed", "failure":
return models.SeverityCritical
case "warning", "warn", "partial", "degraded", "inactive", "activating", "deactivating":
return models.SeverityWarning
default:
return models.SeverityInfo
}
}
func firstNonEmpty(values ...string) string {
for _, value := range values {
value = strings.TrimSpace(value)
if value != "" {
return value
}
}
return ""
}


@@ -0,0 +1,219 @@
package easy_bee
import (
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestDetectBeeSupportArchive(t *testing.T) {
p := &Parser{}
files := []parser.ExtractedFile{
{
Path: "bee-support-debian-20260325-162030/manifest.txt",
Content: []byte("bee_version=1.0.0\nhost=debian\ngenerated_at_utc=2026-03-25T16:20:30Z\nexport_dir=/appdata/bee/export\n"),
},
{
Path: "bee-support-debian-20260325-162030/export/bee-audit.json",
Content: []byte(`{"hardware":{"board":{"serial_number":"SN-BEE-001"}}}`),
},
{
Path: "bee-support-debian-20260325-162030/export/runtime-health.json",
Content: []byte(`{"status":"PARTIAL"}`),
},
}
if got := p.Detect(files); got < 90 {
t.Fatalf("expected high confidence detect score, got %d", got)
}
}
func TestDetectRejectsNonBeeArchive(t *testing.T) {
p := &Parser{}
files := []parser.ExtractedFile{
{
Path: "random/manifest.txt",
Content: []byte("host=test\n"),
},
{
Path: "random/export/runtime-health.json",
Content: []byte(`{"status":"OK"}`),
},
}
if got := p.Detect(files); got != 0 {
t.Fatalf("expected detect score 0, got %d", got)
}
}
func TestParseBeeAuditSnapshot(t *testing.T) {
p := &Parser{}
files := []parser.ExtractedFile{
{
Path: "bee-support-debian-20260325-162030/manifest.txt",
Content: []byte("bee_version=1.0.0\nhost=debian\ngenerated_at_utc=2026-03-25T16:20:30Z\nexport_dir=/appdata/bee/export\n"),
},
{
Path: "bee-support-debian-20260325-162030/export/bee-audit.json",
Content: []byte(`{
"source_type": "manual",
"target_host": "debian",
"collected_at": "2026-03-25T16:08:09Z",
"runtime": {
"status": "PARTIAL",
"checked_at": "2026-03-25T16:07:56Z",
"network_status": "OK",
"issues": [
{
"code": "nvidia_kernel_module_missing",
"severity": "warning",
"description": "NVIDIA kernel module is not loaded."
}
],
"services": [
{
"name": "bee-web",
"status": "inactive"
}
]
},
"hardware": {
"board": {
"manufacturer": "Supermicro",
"product_name": "AS-4124GQ-TNMI",
"serial_number": "S490387X4418273",
"part_number": "H12DGQ-NT6",
"uuid": "d868ae00-a61f-11ee-8000-7cc255e10309"
},
"firmware": [
{
"device_name": "BIOS",
"version": "2.8"
}
],
"cpus": [
{
"status": "OK",
"status_checked_at": "2026-03-25T16:08:09Z",
"socket": 1,
"model": "AMD EPYC 7763 64-Core Processor",
"cores": 64,
"threads": 128,
"frequency_mhz": 2450,
"max_frequency_mhz": 3525
}
],
"memory": [
{
"status": "OK",
"status_checked_at": "2026-03-25T16:08:09Z",
"slot": "P1-DIMMA1",
"location": "P0_Node0_Channel0_Dimm0",
"present": true,
"size_mb": 32768,
"type": "DDR4",
"max_speed_mhz": 3200,
"current_speed_mhz": 2933,
"manufacturer": "SK Hynix",
"serial_number": "80AD01224887286666",
"part_number": "HMA84GR7DJR4N-XN"
}
],
"storage": [
{
"status": "Unknown",
"status_checked_at": "2026-03-25T16:08:09Z",
"slot": "nvme0n1",
"type": "NVMe",
"model": "KCD6XLUL960G",
"serial_number": "2470A00XT5M8",
"interface": "NVMe",
"present": true
}
],
"pcie_devices": [
{
"status": "OK",
"status_checked_at": "2026-03-25T16:08:09Z",
"slot": "0000:05:00.0",
"vendor_id": 5555,
"device_id": 4123,
"device_class": "EthernetController",
"manufacturer": "Mellanox Technologies",
"model": "MT28908 Family [ConnectX-6]",
"link_width": 16,
"link_speed": "Gen4",
"max_link_width": 16,
"max_link_speed": "Gen4",
"mac_addresses": ["94:6d:ae:9a:75:4a"],
"present": true
}
],
"sensors": {
"power": [
{
"name": "PPT",
"location": "amdgpu-pci-1100",
"power_w": 95
}
],
"temperatures": [
{
"name": "Composite",
"location": "nvme-pci-0600",
"celsius": 28.85,
"threshold_warning_celsius": 72.85,
"threshold_critical_celsius": 81.85,
"status": "OK"
}
]
}
}
}`),
},
}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("parse failed: %v", err)
}
if result.Hardware == nil {
t.Fatal("expected hardware to be populated")
}
if result.TargetHost != "debian" {
t.Fatalf("expected target host debian, got %q", result.TargetHost)
}
wantCollectedAt := time.Date(2026, 3, 25, 16, 8, 9, 0, time.UTC)
if !result.CollectedAt.Equal(wantCollectedAt) {
t.Fatalf("expected collected_at %s, got %s", wantCollectedAt, result.CollectedAt)
}
if result.Hardware.BoardInfo.SerialNumber != "S490387X4418273" {
t.Fatalf("unexpected board serial %q", result.Hardware.BoardInfo.SerialNumber)
}
if len(result.Hardware.CPUs) != 1 {
t.Fatalf("expected 1 cpu, got %d", len(result.Hardware.CPUs))
}
if len(result.Hardware.Memory) != 1 {
t.Fatalf("expected 1 dimm, got %d", len(result.Hardware.Memory))
}
if len(result.Hardware.Storage) != 1 {
t.Fatalf("expected 1 storage device, got %d", len(result.Hardware.Storage))
}
if len(result.Hardware.PCIeDevices) != 1 {
t.Fatalf("expected 1 pcie device, got %d", len(result.Hardware.PCIeDevices))
}
if result.Hardware.PCIeDevices[0].BDF != "0000:05:00.0" {
t.Fatalf("expected BDF to be normalized from slot, got %q", result.Hardware.PCIeDevices[0].BDF)
}
if len(result.Sensors) != 2 {
t.Fatalf("expected 2 flattened sensors, got %d", len(result.Sensors))
}
if len(result.Events) < 3 {
t.Fatalf("expected runtime events to be created, got %d", len(result.Events))
}
if len(result.FRU) == 0 {
t.Fatal("expected board FRU fallback to be populated")
}
}


@@ -216,6 +216,7 @@ func parseH3CG5(files []parser.ExtractedFile) *models.AnalysisResult {
}
result.Hardware.Storage = dedupeStorage(result.Hardware.Storage)
result.Hardware.Volumes = dedupeVolumes(result.Hardware.Volumes)
parser.ApplyManufacturedYearWeekFromFRU(result.FRU, result.Hardware)
return result
}
@@ -286,6 +287,7 @@ func parseH3CG6(files []parser.ExtractedFile) *models.AnalysisResult {
}
result.Hardware.Storage = dedupeStorage(result.Hardware.Storage)
result.Hardware.Volumes = dedupeVolumes(result.Hardware.Volumes)
parser.ApplyManufacturedYearWeekFromFRU(result.FRU, result.Hardware)
return result
}
@@ -3024,6 +3026,7 @@ func mergeStorage(dst *models.Storage, src models.Storage) {
}
setStorageString(&dst.Location, src.Location)
setStorageString(&dst.Status, normalizeStorageStatus(src.Status, src.Present || dst.Present))
dst.Details = mergeH3CDetails(dst.Details, src.Details)
}
func setStorageString(dst *string, value string) {
@@ -3275,6 +3278,22 @@ func mergePSU(dst *models.PSU, src models.PSU) {
setStorageString(&dst.PartNumber, src.PartNumber)
setStorageString(&dst.Firmware, src.Firmware)
setStorageString(&dst.Status, src.Status)
dst.Details = mergeH3CDetails(dst.Details, src.Details)
}
func mergeH3CDetails(primary, secondary map[string]any) map[string]any {
if len(secondary) == 0 {
return primary
}
if primary == nil {
primary = make(map[string]any, len(secondary))
}
for key, value := range secondary {
if _, ok := primary[key]; !ok {
primary[key] = value
}
}
return primary
}
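`mergeH3CDetails` is a first-wins merge: keys already present in the primary map are never overwritten, and the secondary map only fills gaps. A self-contained sketch of the same policy (hypothetical names, same logic as above):

```go
package main

import "fmt"

// mergeDetails keeps existing (primary) keys and only fills gaps from
// secondary — the first-wins merge used by mergeH3CDetails above.
func mergeDetails(primary, secondary map[string]any) map[string]any {
	if len(secondary) == 0 {
		return primary
	}
	if primary == nil {
		primary = make(map[string]any, len(secondary))
	}
	for key, value := range secondary {
		if _, ok := primary[key]; !ok {
			primary[key] = value
		}
	}
	return primary
}

func main() {
	a := map[string]any{"firmware": "4.30"}
	b := map[string]any{"firmware": "4.10", "temp_c": 41}
	out := mergeDetails(a, b)
	fmt.Println(out["firmware"]) // 4.30 — primary wins
	fmt.Println(out["temp_c"])   // 41 — gap filled from secondary
}
```

Because the dst (already-merged) record is always passed as primary, detail values collected first take precedence over later duplicates.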
func dedupeVolumes(items []models.StorageVolume) []models.StorageVolume {

File diff suppressed because it is too large.


@@ -0,0 +1,316 @@
package hpe_ilo_ahs
import (
"bytes"
"compress/gzip"
"encoding/binary"
"os"
"path/filepath"
"testing"
"git.mchus.pro/mchus/logpile/internal/parser"
)
func TestDetectAHS(t *testing.T) {
p := &Parser{}
score := p.Detect([]parser.ExtractedFile{{
Path: "HPE_CZ2D1X0GS3_20260330.ahs",
Content: makeAHSArchive(t, []ahsTestEntry{{Name: "CUST_INFO.DAT", Payload: []byte("x")}}),
}})
if score < 80 {
t.Fatalf("expected high confidence detect, got %d", score)
}
}
func TestParseAHSInventory(t *testing.T) {
p := &Parser{}
content := makeAHSArchive(t, []ahsTestEntry{
{Name: "CUST_INFO.DAT", Payload: make([]byte, 16)},
{Name: "0000088-2026-03-30.zbb", Payload: gzipBytes(t, []byte(sampleInventoryBlob()))},
{Name: "bcert.pkg", Payload: []byte(sampleBCertBlob())},
})
result, err := p.Parse([]parser.ExtractedFile{{
Path: "HPE_CZ2D1X0GS3_20260330.ahs",
Content: content,
}})
if err != nil {
t.Fatalf("parse failed: %v", err)
}
if result.Hardware == nil {
t.Fatalf("expected hardware section")
}
board := result.Hardware.BoardInfo
if board.Manufacturer != "HPE" {
t.Fatalf("unexpected board manufacturer: %q", board.Manufacturer)
}
if board.ProductName != "ProLiant DL380 Gen11" {
t.Fatalf("unexpected board product: %q", board.ProductName)
}
if board.SerialNumber != "CZ2D1X0GS3" {
t.Fatalf("unexpected board serial: %q", board.SerialNumber)
}
if board.PartNumber != "P52560-421" {
t.Fatalf("unexpected board part number: %q", board.PartNumber)
}
if len(result.Hardware.CPUs) != 1 || result.Hardware.CPUs[0].Model != "Intel(R) Xeon(R) Gold 6444Y" {
t.Fatalf("unexpected CPUs: %+v", result.Hardware.CPUs)
}
if len(result.Hardware.Memory) != 1 {
t.Fatalf("expected one DIMM, got %d", len(result.Hardware.Memory))
}
if result.Hardware.Memory[0].PartNumber != "HMCG88AEBRA115N" {
t.Fatalf("unexpected DIMM part number: %q", result.Hardware.Memory[0].PartNumber)
}
if len(result.Hardware.NetworkAdapters) != 2 {
t.Fatalf("expected two network adapters, got %d", len(result.Hardware.NetworkAdapters))
}
if len(result.Hardware.PowerSupply) != 1 {
t.Fatalf("expected one PSU, got %d", len(result.Hardware.PowerSupply))
}
if result.Hardware.PowerSupply[0].SerialNumber != "5XUWB0C4DJG4BV" {
t.Fatalf("unexpected PSU serial: %q", result.Hardware.PowerSupply[0].SerialNumber)
}
if result.Hardware.PowerSupply[0].Firmware != "2.00" {
t.Fatalf("unexpected PSU firmware: %q", result.Hardware.PowerSupply[0].Firmware)
}
if len(result.Hardware.Storage) != 1 {
t.Fatalf("expected one physical drive, got %d", len(result.Hardware.Storage))
}
drive := result.Hardware.Storage[0]
if drive.Model != "SAMSUNGMZ7L3480HCHQ-00A07" {
t.Fatalf("unexpected drive model: %q", drive.Model)
}
if drive.SerialNumber != "S664NC0Y502720" {
t.Fatalf("unexpected drive serial: %q", drive.SerialNumber)
}
if drive.SizeGB != 480 {
t.Fatalf("unexpected drive size: %d", drive.SizeGB)
}
if len(result.Hardware.Firmware) == 0 {
t.Fatalf("expected firmware inventory")
}
foundILO := false
foundControllerFW := false
foundNICFW := false
foundBackplaneFW := false
for _, item := range result.Hardware.Firmware {
if item.DeviceName == "iLO 6" && item.Version == "v1.63p20" {
foundILO = true
}
if item.DeviceName == "HPE MR408i-o Gen11" && item.Version == "52.26.3-5379" {
foundControllerFW = true
}
if item.DeviceName == "BCM 5719 1Gb 4p BASE-T OCP Adptr" && item.Version == "20.28.41" {
foundNICFW = true
}
if item.DeviceName == "8 SFF 24G x1NVMe/SAS UBM3 BC BP" && item.Version == "1.24" {
foundBackplaneFW = true
}
}
if !foundILO {
t.Fatalf("expected iLO firmware entry")
}
if !foundControllerFW {
t.Fatalf("expected controller firmware entry")
}
if !foundNICFW {
t.Fatalf("expected broadcom firmware entry")
}
if !foundBackplaneFW {
t.Fatalf("expected backplane firmware entry")
}
broadcomFound := false
backplaneFound := false
for _, nic := range result.Hardware.NetworkAdapters {
if nic.SerialNumber == "1CH0150001" && nic.Firmware == "20.28.41" {
broadcomFound = true
}
}
for _, dev := range result.Hardware.Devices {
if dev.DeviceClass == "storage_backplane" && dev.Firmware == "1.24" {
backplaneFound = true
}
}
if !broadcomFound {
t.Fatalf("expected broadcom adapter firmware to be enriched")
}
if !backplaneFound {
t.Fatalf("expected backplane canonical device")
}
if len(result.Hardware.Devices) < 6 {
t.Fatalf("expected canonical devices, got %d", len(result.Hardware.Devices))
}
if len(result.Events) == 0 {
t.Fatalf("expected parsed events")
}
}
func TestParseExampleAHS(t *testing.T) {
path := filepath.Join("..", "..", "..", "..", "example", "HPE_CZ2D1X0GS3_20260330.ahs")
content, err := os.ReadFile(path)
if err != nil {
t.Skipf("example fixture unavailable: %v", err)
}
p := &Parser{}
result, err := p.Parse([]parser.ExtractedFile{{
Path: filepath.Base(path),
Content: content,
}})
if err != nil {
t.Fatalf("parse example failed: %v", err)
}
if result.Hardware == nil {
t.Fatalf("expected hardware section")
}
board := result.Hardware.BoardInfo
if board.ProductName != "ProLiant DL380 Gen11" {
t.Fatalf("unexpected board product: %q", board.ProductName)
}
if board.SerialNumber != "CZ2D1X0GS3" {
t.Fatalf("unexpected board serial: %q", board.SerialNumber)
}
if len(result.Hardware.Storage) < 2 {
t.Fatalf("expected at least two drives, got %d", len(result.Hardware.Storage))
}
if len(result.Hardware.PowerSupply) != 2 {
t.Fatalf("expected exactly two PSUs, got %d: %+v", len(result.Hardware.PowerSupply), result.Hardware.PowerSupply)
}
foundController := false
foundBackplaneFW := false
foundNICFW := false
for _, device := range result.Hardware.Devices {
if device.Model == "HPE MR408i-o Gen11" && device.SerialNumber == "PXSFQ0BBIJY3B3" {
foundController = true
}
if device.DeviceClass == "storage_backplane" && device.Firmware == "1.24" {
foundBackplaneFW = true
}
}
if !foundController {
t.Fatalf("expected MR408i-o controller in canonical devices")
}
for _, fw := range result.Hardware.Firmware {
if fw.DeviceName == "BCM 5719 1Gb 4p BASE-T OCP Adptr" && fw.Version == "20.28.41" {
foundNICFW = true
}
}
if !foundBackplaneFW {
t.Fatalf("expected backplane device in canonical devices")
}
if !foundNICFW {
t.Fatalf("expected broadcom firmware from bcert/pkg lockdown")
}
}
type ahsTestEntry struct {
Name string
Payload []byte
Flag uint32
}
func makeAHSArchive(t *testing.T, entries []ahsTestEntry) []byte {
t.Helper()
var buf bytes.Buffer
for _, entry := range entries {
header := make([]byte, ahsHeaderSize)
copy(header[:4], []byte("ABJR"))
binary.LittleEndian.PutUint16(header[4:6], 0x0300)
binary.LittleEndian.PutUint16(header[6:8], 0x0002)
binary.LittleEndian.PutUint32(header[8:12], uint32(len(entry.Payload)))
flag := entry.Flag
if flag == 0 {
flag = 0x80000002
if len(entry.Payload) >= 2 && entry.Payload[0] == 0x1f && entry.Payload[1] == 0x8b {
flag = 0x80000001
}
}
binary.LittleEndian.PutUint32(header[16:20], flag)
copy(header[20:52], []byte(entry.Name))
buf.Write(header)
buf.Write(entry.Payload)
}
return buf.Bytes()
}
func gzipBytes(t *testing.T, payload []byte) []byte {
t.Helper()
var buf bytes.Buffer
zw := gzip.NewWriter(&buf)
if _, err := zw.Write(payload); err != nil {
t.Fatalf("gzip payload: %v", err)
}
if err := zw.Close(); err != nil {
t.Fatalf("close gzip writer: %v", err)
}
return buf.Bytes()
}
func sampleInventoryBlob() string {
return stringsJoin(
"iLO 6 v1.63p20 built on Sep 13 2024",
"HPE",
"ProLiant DL380 Gen11",
"CZ2D1X0GS3",
"P52560-421",
"Proc 1",
"Intel(R) Corporation",
"Intel(R) Xeon(R) Gold 6444Y",
"PROC 1 DIMM 3",
"Hynix",
"HMCG88AEBRA115N",
"2B5F92C6",
"Power Supply 1",
"5XUWB0C4DJG4BV",
"P03178-B21",
"PciRoot(0x1)/Pci(0x5,0x0)/Pci(0x0,0x0)",
"NIC.Slot.1.1",
"Network Controller",
"Slot 1",
"MCX512A-ACAT",
"MT2230478382",
"PciRoot(0x3)/Pci(0x1,0x0)/Pci(0x0,0x0)",
"OCP.Slot.15.1",
"Broadcom NetXtreme Gigabit Ethernet - NIC",
"OCP Slot 15",
"P51183-001",
"1CH0150001",
"20.28.41",
"System ROM",
"v2.22 (06/19/2024)",
"03/30/2026 09:47:33",
"iLO network link down.",
`{"@odata.id":"/redfish/v1/Systems/1/Storage/DE00A000/Controllers/0","@odata.type":"#StorageController.v1_7_0.StorageController","Id":"0","Name":"HPE MR408i-o Gen11","FirmwareVersion":"52.26.3-5379","Manufacturer":"HPE","Model":"HPE MR408i-o Gen11","PartNumber":"P58543-001","SKU":"P58335-B21","SerialNumber":"PXSFQ0BBIJY3B3","Status":{"State":"Enabled","Health":"OK"},"Location":{"PartLocation":{"ServiceLabel":"Slot=14","LocationType":"Slot","LocationOrdinalValue":14}},"PCIeInterface":{"PCIeType":"Gen4","LanesInUse":8}}`,
`{"@odata.id":"/redfish/v1/Fabrics/DE00A000","@odata.type":"#Fabric.v1_3_0.Fabric","Id":"DE00A000","Name":"8 SFF 24G x1NVMe/SAS UBM3 BC BP","FabricType":"MultiProtocol"}`,
`{"@odata.id":"/redfish/v1/Fabrics/DE00A000/Switches/1","@odata.type":"#Switch.v1_9_1.Switch","Id":"1","Name":"Direct Attached","Model":"UBM3","FirmwareVersion":"1.24","SupportedProtocols":["SAS","SATA","NVMe"],"SwitchType":"MultiProtocol","Status":{"State":"Enabled","Health":"OK"}}`,
`{"@odata.id":"/redfish/v1/Chassis/DE00A000/Drives/0","@odata.type":"#Drive.v1_17_0.Drive","Id":"0","Name":"480GB 6G SATA SSD","Status":{"State":"StandbyOffline","Health":"OK"},"PhysicalLocation":{"PartLocation":{"ServiceLabel":"Slot=14:Port=1:Box=3:Bay=1","LocationType":"Bay","LocationOrdinalValue":1}},"CapacityBytes":480103981056,"MediaType":"SSD","Model":"SAMSUNGMZ7L3480HCHQ-00A07","Protocol":"SATA","Revision":"JXTC604Q","SerialNumber":"S664NC0Y502720","PredictedMediaLifeLeftPercent":100}`,
`{"@odata.id":"/redfish/v1/Chassis/DE00A000/Drives/64515","@odata.type":"#Drive.v1_17_0.Drive","Id":"64515","Name":"Empty Bay","Status":{"State":"Absent","Health":"OK"}}`,
)
}
func sampleBCertBlob() string {
return `<BC><MfgRecord><PowerSupplySlot id="0"><Present>Yes</Present><SerialNumber>5XUWB0C4DJG4BV</SerialNumber><FirmwareVersion>2.00</FirmwareVersion><SparePartNumber>P44412-001</SparePartNumber></PowerSupplySlot><FirmwareLockdown><SystemProgrammableLogicDevice>0x12</SystemProgrammableLogicDevice><ServerPlatformServicesSPSFirmware>6.1.4.47</ServerPlatformServicesSPSFirmware><STMicroGen11TPM>1.512</STMicroGen11TPM><HPEMR408i-oGen11>52.26.3-5379</HPEMR408i-oGen11><UBM3>UBM3/1.24</UBM3><BCM57191Gb4pBASE-TOCP3>20.28.41</BCM57191Gb4pBASE-TOCP3></FirmwareLockdown></MfgRecord></BC>`
}
func stringsJoin(parts ...string) string {
return string(bytes.Join(func() [][]byte {
out := make([][]byte, 0, len(parts))
for _, part := range parts {
out = append(out, []byte(part))
}
return out
}(), []byte{0}))
}


@@ -94,8 +94,12 @@ type AssetJSON struct {
} `json:"PcieInfo"`
}
// ParseAssetJSON parses Inspur asset.json content
func ParseAssetJSON(content []byte) (*models.HardwareConfig, error) {
// ParseAssetJSON parses Inspur asset.json content.
// - pcieSlotDeviceNames: optional map from integer PCIe slot ID to device name string,
// sourced from devicefrusdr.log PCIe REST section. Fills missing NVMe model names.
// - pcieSlotSerials: optional map from integer PCIe slot ID to serial number string,
// sourced from audit.log SN-changed events. Fills missing NVMe serial numbers.
func ParseAssetJSON(content []byte, pcieSlotDeviceNames map[int]string, pcieSlotSerials map[int]string) (*models.HardwareConfig, error) {
var asset AssetJSON
if err := json.Unmarshal(content, &asset); err != nil {
return nil, err
@@ -175,6 +179,23 @@ func ParseAssetJSON(content []byte) (*models.HardwareConfig, error) {
continue
}
// Enrich model name from PCIe device name (supplied from devicefrusdr.log).
// BMC does not populate HddInfo.ModelName for NVMe drives, but the PCIe REST
// section in devicefrusdr.log carries the drive model as device_name.
if modelName == "" && hdd.PcieSlot > 0 && len(pcieSlotDeviceNames) > 0 {
if devName, ok := pcieSlotDeviceNames[hdd.PcieSlot]; ok && devName != "" {
modelName = devName
}
}
// Enrich serial number from audit.log SN-changed events (supplied via pcieSlotSerials).
// BMC asset.json does not carry NVMe serial numbers; audit.log logs every SN change.
if serial == "" && hdd.PcieSlot > 0 && len(pcieSlotSerials) > 0 {
if sn, ok := pcieSlotSerials[hdd.PcieSlot]; ok && sn != "" {
serial = sn
}
}
storageType := "HDD"
if hdd.DiskInterfaceType == 5 {
storageType = "NVMe"


@@ -28,7 +28,7 @@ func TestParseAssetJSON_NVIDIAGPUModelFromPCIIDs(t *testing.T) {
}]
}`)
hw, err := ParseAssetJSON(raw)
hw, err := ParseAssetJSON(raw, nil, nil)
if err != nil {
t.Fatalf("ParseAssetJSON failed: %v", err)
}

internal/parser/vendors/inspur/audit.go (new file)

@@ -0,0 +1,94 @@
package inspur
import (
"fmt"
"regexp"
"strconv"
"strings"
)
// auditSNChangedNVMeRegex matches:
// "Front Back Plane N NVMe DiskM SN changed from X to Y"
// Captures: disk_num, new_serial
var auditSNChangedNVMeRegex = regexp.MustCompile(`NVMe Disk(\d+)\s+SN changed from \S+\s+to\s+(\S+)`)
// auditSNChangedRAIDRegex matches:
// "Raid(Pcie Slot:N) HDD(enclosure id:E slot:S) SN changed from X to Y"
// Captures: pcie_slot, enclosure_id, slot_num, new_serial
var auditSNChangedRAIDRegex = regexp.MustCompile(`Raid\(Pcie Slot:(\d+)\) HDD\(enclosure id:(\d+) slot:(\d+)\)\s+SN changed from \S+\s+to\s+(\S+)`)
// ParseAuditLogNVMeSerials parses audit.log and returns the final (latest) serial number
// per NVMe disk number. The disk number matches the numeric suffix in PCIe location
// strings like "#NVME0", "#NVME2", etc. from devicefrusdr.log.
// Entries where the serial changed to "NULL" are excluded.
func ParseAuditLogNVMeSerials(content []byte) map[int]string {
serials := make(map[int]string)
for _, line := range strings.Split(string(content), "\n") {
m := auditSNChangedNVMeRegex.FindStringSubmatch(line)
if m == nil {
continue
}
diskNum, err := strconv.Atoi(m[1])
if err != nil {
continue
}
serial := strings.TrimSpace(m[2])
if strings.EqualFold(serial, "NULL") || serial == "" {
delete(serials, diskNum)
} else {
serials[diskNum] = serial
}
}
if len(serials) == 0 {
return nil
}
return serials
}
// ParseAuditLogRAIDSerials parses audit.log and returns the final (latest) serial number
// per RAID backplane disk. Key format is "BP{enclosure_id-1}:{slot_num}" (e.g. "BP0:0").
//
// Each disk slot is claimed by a specific RAID controller (Pcie Slot:N). NULL events from
// an old controller do not clear serials assigned by a newer controller, preventing stale
// deletions when disks are migrated between RAID arrays.
func ParseAuditLogRAIDSerials(content []byte) map[string]string {
// owner tracks which PCIe RAID controller slot last assigned a serial to a disk key.
serials := make(map[string]string)
owner := make(map[string]int)
for _, line := range strings.Split(string(content), "\n") {
m := auditSNChangedRAIDRegex.FindStringSubmatch(line)
if m == nil {
continue
}
pcieSlot, err := strconv.Atoi(m[1])
if err != nil {
continue
}
enclosureID, err := strconv.Atoi(m[2])
if err != nil {
continue
}
slotNum, err := strconv.Atoi(m[3])
if err != nil {
continue
}
serial := strings.TrimSpace(m[4])
key := fmt.Sprintf("BP%d:%d", enclosureID-1, slotNum)
if strings.EqualFold(serial, "NULL") || serial == "" {
// Only clear if this controller was the last to set the serial.
if owner[key] == pcieSlot {
delete(serials, key)
delete(owner, key)
}
} else {
serials[key] = serial
owner[key] = pcieSlot
}
}
if len(serials) == 0 {
return nil
}
return serials
}
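The ownership rule in `ParseAuditLogRAIDSerials` is the subtle part: a NULL event only clears a serial when it comes from the controller that last set it. A compact sketch of just that rule, operating on already-captured events rather than raw log lines (the `snEvent` type is hypothetical, for illustration):

```go
package main

import "fmt"

// snEvent mirrors one audit.log "SN changed" record after regex capture.
type snEvent struct {
	pcieSlot  int
	key       string // "BP{enclosure_id-1}:{slot}"
	newSerial string // "NULL" means cleared
}

// applyEvents re-sketches the ownership rule above: a NULL event only
// clears a serial if it comes from the controller that last set it, so
// stale controllers cannot erase serials of migrated disks.
func applyEvents(events []snEvent) map[string]string {
	serials := make(map[string]string)
	owner := make(map[string]int)
	for _, e := range events {
		if e.newSerial == "NULL" || e.newSerial == "" {
			if owner[e.key] == e.pcieSlot {
				delete(serials, e.key)
				delete(owner, e.key)
			}
			continue
		}
		serials[e.key] = e.newSerial
		owner[e.key] = e.pcieSlot
	}
	return serials
}

func main() {
	out := applyEvents([]snEvent{
		{pcieSlot: 1, key: "BP0:0", newSerial: "OLD-SN"},
		{pcieSlot: 2, key: "BP0:0", newSerial: "NEW-SN"}, // disk migrated to controller 2
		{pcieSlot: 1, key: "BP0:0", newSerial: "NULL"},   // stale clear from controller 1: ignored
	})
	fmt.Println(out["BP0:0"]) // NEW-SN
}
```

Without the `owner` map, the trailing NULL from controller 1 would delete the serial that controller 2 just assigned.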


@@ -100,10 +100,18 @@ func parseMemoryInfo(text string, hw *models.HardwareConfig) {
return
}
// Replace memory data with detailed info from component.log
hw.Memory = nil
var merged []models.MemoryDIMM
seen := make(map[string]int)
for _, existing := range hw.Memory {
key := inspurMemoryKey(existing)
if key == "" {
continue
}
seen[key] = len(merged)
merged = append(merged, existing)
}
for _, mem := range memInfo.MemModules {
hw.Memory = append(hw.Memory, models.MemoryDIMM{
item := models.MemoryDIMM{
Slot: mem.MemModSlot,
Location: mem.MemModSlot,
Present: mem.MemModStatus == 1 && mem.MemModSize > 0,
@@ -117,8 +125,18 @@ func parseMemoryInfo(text string, hw *models.HardwareConfig) {
PartNumber: strings.TrimSpace(mem.MemModPartNum),
Status: mem.Status,
Ranks: mem.MemModRanks,
})
}
key := inspurMemoryKey(item)
if idx, ok := seen[key]; ok {
mergeInspurMemoryDIMM(&merged[idx], item)
continue
}
if key != "" {
seen[key] = len(merged)
}
merged = append(merged, item)
}
hw.Memory = merged
}
// PSURESTInfo represents the RESTful PSU info structure
@@ -159,10 +177,18 @@ func parsePSUInfo(text string, hw *models.HardwareConfig) {
return
}
// Clear existing PSU data and populate with RESTful data
hw.PowerSupply = nil
var merged []models.PSU
seen := make(map[string]int)
for _, existing := range hw.PowerSupply {
key := inspurPSUKey(existing)
if key == "" {
continue
}
seen[key] = len(merged)
merged = append(merged, existing)
}
for _, psu := range psuInfo.PowerSupplies {
hw.PowerSupply = append(hw.PowerSupply, models.PSU{
item := models.PSU{
Slot: fmt.Sprintf("PSU%d", psu.ID),
Present: psu.Present == 1,
Model: strings.TrimSpace(psu.Model),
@@ -178,8 +204,18 @@ func parsePSUInfo(text string, hw *models.HardwareConfig) {
InputVoltage: psu.PSInVolt,
OutputVoltage: psu.PSOutVolt,
TemperatureC: psu.PSUMaxTemp,
})
}
key := inspurPSUKey(item)
if idx, ok := seen[key]; ok {
mergeInspurPSU(&merged[idx], item)
continue
}
if key != "" {
seen[key] = len(merged)
}
merged = append(merged, item)
}
hw.PowerSupply = merged
}
// HDDRESTInfo represents the RESTful HDD info structure
@@ -357,7 +393,16 @@ func parseNetworkAdapterInfo(text string, hw *models.HardwareConfig) {
return
}
hw.NetworkAdapters = nil
var merged []models.NetworkAdapter
seen := make(map[string]int)
for _, existing := range hw.NetworkAdapters {
key := inspurNICKey(existing)
if key == "" {
continue
}
seen[key] = len(merged)
merged = append(merged, existing)
}
for _, adapter := range netInfo.SysAdapters {
var macs []string
for _, port := range adapter.Ports {
@@ -377,7 +422,7 @@ func parseNetworkAdapterInfo(text string, hw *models.HardwareConfig) {
vendor = normalizeModelLabel(pciids.VendorName(adapter.VendorID))
}
hw.NetworkAdapters = append(hw.NetworkAdapters, models.NetworkAdapter{
item := models.NetworkAdapter{
Slot: fmt.Sprintf("Slot %d", adapter.Slot),
Location: adapter.Location,
Present: adapter.Present == 1,
@@ -392,8 +437,231 @@ func parseNetworkAdapterInfo(text string, hw *models.HardwareConfig) {
PortType: adapter.PortType,
MACAddresses: macs,
Status: adapter.Status,
})
}
key := inspurNICKey(item)
if idx, ok := seen[key]; ok {
mergeInspurNIC(&merged[idx], item)
continue
}
if slotIdx := inspurFindNICBySlot(merged, item.Slot); slotIdx >= 0 {
mergeInspurNIC(&merged[slotIdx], item)
if key != "" {
seen[key] = slotIdx
}
continue
}
if key != "" {
seen[key] = len(merged)
}
merged = append(merged, item)
}
hw.NetworkAdapters = merged
}
func inspurMemoryKey(item models.MemoryDIMM) string {
return strings.ToLower(strings.TrimSpace(inspurFirstNonEmpty(item.SerialNumber, item.Slot, item.Location)))
}
func mergeInspurMemoryDIMM(dst *models.MemoryDIMM, src models.MemoryDIMM) {
if dst == nil {
return
}
if strings.TrimSpace(dst.Slot) == "" {
dst.Slot = src.Slot
}
if strings.TrimSpace(dst.Location) == "" {
dst.Location = src.Location
}
dst.Present = dst.Present || src.Present
if dst.SizeMB == 0 {
dst.SizeMB = src.SizeMB
}
if strings.TrimSpace(dst.Type) == "" {
dst.Type = src.Type
}
if strings.TrimSpace(dst.Technology) == "" {
dst.Technology = src.Technology
}
if dst.MaxSpeedMHz == 0 {
dst.MaxSpeedMHz = src.MaxSpeedMHz
}
if dst.CurrentSpeedMHz == 0 {
dst.CurrentSpeedMHz = src.CurrentSpeedMHz
}
if strings.TrimSpace(dst.Manufacturer) == "" {
dst.Manufacturer = src.Manufacturer
}
if strings.TrimSpace(dst.SerialNumber) == "" {
dst.SerialNumber = src.SerialNumber
}
if strings.TrimSpace(dst.PartNumber) == "" {
dst.PartNumber = src.PartNumber
}
if strings.TrimSpace(dst.Status) == "" {
dst.Status = src.Status
}
if dst.Ranks == 0 {
dst.Ranks = src.Ranks
}
}
func inspurPSUKey(item models.PSU) string {
return strings.ToLower(strings.TrimSpace(inspurFirstNonEmpty(item.SerialNumber, item.Slot, item.Model)))
}
func mergeInspurPSU(dst *models.PSU, src models.PSU) {
if dst == nil {
return
}
if strings.TrimSpace(dst.Slot) == "" {
dst.Slot = src.Slot
}
dst.Present = dst.Present || src.Present
if strings.TrimSpace(dst.Model) == "" {
dst.Model = src.Model
}
if strings.TrimSpace(dst.Vendor) == "" {
dst.Vendor = src.Vendor
}
if dst.WattageW == 0 {
dst.WattageW = src.WattageW
}
if strings.TrimSpace(dst.SerialNumber) == "" {
dst.SerialNumber = src.SerialNumber
}
if strings.TrimSpace(dst.PartNumber) == "" {
dst.PartNumber = src.PartNumber
}
if strings.TrimSpace(dst.Firmware) == "" {
dst.Firmware = src.Firmware
}
if strings.TrimSpace(dst.Status) == "" {
dst.Status = src.Status
}
if strings.TrimSpace(dst.InputType) == "" {
dst.InputType = src.InputType
}
if dst.InputPowerW == 0 {
dst.InputPowerW = src.InputPowerW
}
if dst.OutputPowerW == 0 {
dst.OutputPowerW = src.OutputPowerW
}
if dst.InputVoltage == 0 {
dst.InputVoltage = src.InputVoltage
}
if dst.OutputVoltage == 0 {
dst.OutputVoltage = src.OutputVoltage
}
if dst.TemperatureC == 0 {
dst.TemperatureC = src.TemperatureC
}
}
func inspurNICKey(item models.NetworkAdapter) string {
return strings.ToLower(strings.TrimSpace(inspurFirstNonEmpty(item.SerialNumber, strings.Join(item.MACAddresses, ","), item.Slot, item.Location)))
}
func mergeInspurNIC(dst *models.NetworkAdapter, src models.NetworkAdapter) {
if dst == nil {
return
}
if strings.TrimSpace(dst.Slot) == "" {
dst.Slot = src.Slot
}
if strings.TrimSpace(dst.Location) == "" {
dst.Location = src.Location
}
dst.Present = dst.Present || src.Present
if strings.TrimSpace(dst.BDF) == "" {
dst.BDF = src.BDF
}
if strings.TrimSpace(dst.Model) == "" {
dst.Model = src.Model
}
if strings.TrimSpace(dst.Description) == "" {
dst.Description = src.Description
}
if strings.TrimSpace(dst.Vendor) == "" {
dst.Vendor = src.Vendor
}
if dst.VendorID == 0 {
dst.VendorID = src.VendorID
}
if dst.DeviceID == 0 {
dst.DeviceID = src.DeviceID
}
if strings.TrimSpace(dst.SerialNumber) == "" {
dst.SerialNumber = src.SerialNumber
}
if strings.TrimSpace(dst.PartNumber) == "" {
dst.PartNumber = src.PartNumber
}
if strings.TrimSpace(dst.Firmware) == "" {
dst.Firmware = src.Firmware
}
if dst.PortCount == 0 {
dst.PortCount = src.PortCount
}
if strings.TrimSpace(dst.PortType) == "" {
dst.PortType = src.PortType
}
if dst.LinkWidth == 0 {
dst.LinkWidth = src.LinkWidth
}
if strings.TrimSpace(dst.LinkSpeed) == "" {
dst.LinkSpeed = src.LinkSpeed
}
if dst.MaxLinkWidth == 0 {
dst.MaxLinkWidth = src.MaxLinkWidth
}
if strings.TrimSpace(dst.MaxLinkSpeed) == "" {
dst.MaxLinkSpeed = src.MaxLinkSpeed
}
if dst.NUMANode == 0 {
dst.NUMANode = src.NUMANode
}
if strings.TrimSpace(dst.Status) == "" {
dst.Status = src.Status
}
for _, mac := range src.MACAddresses {
mac = strings.TrimSpace(mac)
if mac == "" {
continue
}
found := false
for _, existing := range dst.MACAddresses {
if strings.EqualFold(strings.TrimSpace(existing), mac) {
found = true
break
}
}
if !found {
dst.MACAddresses = append(dst.MACAddresses, mac)
}
}
}
func inspurFindNICBySlot(items []models.NetworkAdapter, slot string) int {
slot = strings.ToLower(strings.TrimSpace(slot))
if slot == "" {
return -1
}
for i := range items {
if strings.ToLower(strings.TrimSpace(items[i].Slot)) == slot {
return i
}
}
return -1
}
func inspurFirstNonEmpty(values ...string) string {
for _, value := range values {
if strings.TrimSpace(value) != "" {
return strings.TrimSpace(value)
}
}
return ""
}
func parseFanSensors(text string) []models.SensorReading {
@@ -713,6 +981,63 @@ func extractComponentFirmware(text string, hw *models.HardwareConfig) {
}
}
}
// Extract BMC, CPLD and VR firmware from RESTful version info section.
// The JSON is a flat array: [{"id":N,"dev_name":"...","dev_version":"..."}, ...]
reVer := regexp.MustCompile(`RESTful version info:\s*(\[[\s\S]*?\])\s*RESTful`)
if match := reVer.FindStringSubmatch(text); match != nil {
type verEntry struct {
DevName string `json:"dev_name"`
DevVersion string `json:"dev_version"`
}
var entries []verEntry
if err := json.Unmarshal([]byte(match[1]), &entries); err == nil {
for _, e := range entries {
name := normalizeVersionInfoName(e.DevName)
if name == "" {
continue
}
version := strings.TrimSpace(e.DevVersion)
if version == "" {
continue
}
if existingFW[name] {
continue
}
hw.Firmware = append(hw.Firmware, models.FirmwareInfo{
DeviceName: name,
Version: version,
})
existingFW[name] = true
}
}
}
}
// normalizeVersionInfoName converts RESTful version info dev_name to a clean label.
// Returns "" for entries that should be skipped (inactive BMC, PSU slots).
func normalizeVersionInfoName(name string) string {
name = strings.TrimSpace(name)
if name == "" {
return ""
}
// Skip PSU_N entries — firmware already extracted from PSU info section.
if regexp.MustCompile(`(?i)^PSU_\d+$`).MatchString(name) {
return ""
}
// Skip the inactive BMC partition.
if strings.HasPrefix(strings.ToLower(name), "inactivate(") {
return ""
}
// Active BMC: "Activate(BMC1)" → "BMC"
if strings.HasPrefix(strings.ToLower(name), "activate(") {
return "BMC"
}
// Strip trailing "Version" suffix (case-insensitive), e.g. "MainBoard0CPLDVersion" → "MainBoard0CPLD"
if strings.HasSuffix(strings.ToLower(name), "version") {
name = name[:len(name)-len("version")]
}
return strings.TrimSpace(name)
}
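The normalization rules can be exercised end to end with a few representative dev_name inputs; this is a standalone copy of the same logic (`normalizeName` is an illustrative name for the sketch):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// normalizeName mirrors the normalizeVersionInfoName rules above:
// skip PSU_N and inactive BMC, collapse active BMC, strip "Version".
func normalizeName(name string) string {
	name = strings.TrimSpace(name)
	if name == "" {
		return ""
	}
	if regexp.MustCompile(`(?i)^PSU_\d+$`).MatchString(name) {
		return ""
	}
	lower := strings.ToLower(name)
	if strings.HasPrefix(lower, "inactivate(") {
		return ""
	}
	if strings.HasPrefix(lower, "activate(") {
		return "BMC"
	}
	if strings.HasSuffix(lower, "version") {
		name = name[:len(name)-len("version")]
	}
	return strings.TrimSpace(name)
}

func main() {
	for _, in := range []string{"Activate(BMC1)", "Inactivate(BMC2)", "PSU_1", "MainBoard0CPLDVersion"} {
		fmt.Printf("%q -> %q\n", in, normalizeName(in))
	}
	// "Activate(BMC1)" -> "BMC"
	// "Inactivate(BMC2)" -> ""
	// "PSU_1" -> ""
	// "MainBoard0CPLDVersion" -> "MainBoard0CPLD"
}
```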
// DiskBackplaneRESTInfo represents the RESTful diskbackplane info structure


@@ -51,6 +51,64 @@ RESTful fan`
}
}
func TestParseNetworkAdapterInfo_MergesIntoExistingInventory(t *testing.T) {
text := `RESTful Network Adapter info:
{
"sys_adapters": [
{
"id": 1,
"name": "NIC1",
"Location": "#CPU0_PCIE4",
"present": 1,
"slot": 4,
"vendor_id": 32902,
"device_id": 5409,
"vendor": "Mellanox",
"model": "ConnectX-6",
"fw_ver": "22.1.0",
"status": "OK",
"sn": "",
"pn": "",
"port_num": 2,
"port_type": "QSFP",
"ports": [
{ "id": 1, "mac_addr": "00:11:22:33:44:55" }
]
}
]
}
RESTful fan`
hw := &models.HardwareConfig{
NetworkAdapters: []models.NetworkAdapter{
{
Slot: "Slot 4",
BDF: "0000:17:00.0",
SerialNumber: "NIC-SN-1",
Present: true,
},
},
}
parseNetworkAdapterInfo(text, hw)
if len(hw.NetworkAdapters) != 1 {
t.Fatalf("expected merged single adapter, got %d", len(hw.NetworkAdapters))
}
got := hw.NetworkAdapters[0]
if got.BDF != "0000:17:00.0" {
t.Fatalf("expected existing BDF to survive merge, got %q", got.BDF)
}
if got.Model != "ConnectX-6" {
t.Fatalf("expected model from component log, got %q", got.Model)
}
if got.SerialNumber != "NIC-SN-1" {
t.Fatalf("expected serial from existing inventory to survive merge, got %q", got.SerialNumber)
}
if len(got.MACAddresses) != 1 || got.MACAddresses[0] != "00:11:22:33:44:55" {
t.Fatalf("expected MAC addresses from component log, got %#v", got.MACAddresses)
}
}
func TestParseComponentLogSensors_ExtractsFanBackplaneAndPSUSummary(t *testing.T) {
text := `RESTful PSU info:
{


@@ -0,0 +1,33 @@
package inspur
import "testing"
func TestParseIDLLog_UsesBMCSourceForEventLogs(t *testing.T) {
content := []byte(`|2025-12-02T17:54:27+08:00|MEMORY|Assert|Warning|0C180401|CPU1_C4D0 Memory Device Disabled - Assert|`)
events := ParseIDLLog(content)
if len(events) != 1 {
t.Fatalf("expected 1 event, got %d", len(events))
}
if events[0].Source != "BMC" {
t.Fatalf("expected IDL events to use BMC source, got %#v", events[0])
}
if events[0].SensorName != "CPU1_C4D0" {
t.Fatalf("expected extracted DIMM component ref, got %#v", events[0])
}
}
func TestParseSyslog_UsesHostSourceAndProcessAsSensorName(t *testing.T) {
content := []byte(`<13>2026-03-15T14:03:11+00:00 host123 systemd[1]: Started Example Service`)
events := ParseSyslog(content, "syslog/info")
if len(events) != 1 {
t.Fatalf("expected 1 event, got %d", len(events))
}
if events[0].Source != "syslog" {
t.Fatalf("expected syslog source, got %#v", events[0])
}
if events[0].SensorName != "systemd[1]" {
t.Fatalf("expected process name in sensor/component slot, got %#v", events[0])
}
}


@@ -165,7 +165,10 @@ func TestParseIDLLog_ParsesStructuredJSONLine(t *testing.T) {
if events[0].ID != "17FFB002" {
t.Fatalf("expected event ID 17FFB002, got %q", events[0].ID)
}
-if events[0].Source != "PCIE" {
-t.Fatalf("expected source PCIE, got %q", events[0].Source)
+if events[0].Source != "BMC" {
+t.Fatalf("expected BMC source for IDL event, got %q", events[0].Source)
}
if events[0].SensorType != "pcie" {
t.Fatalf("expected component type pcie, got %#v", events[0])
}
}


@@ -60,7 +60,7 @@ func ParseIDLLog(content []byte) []models.Event {
events = append(events, models.Event{
ID: eventID,
Timestamp: ts,
-Source: component,
+Source: "BMC",
SensorType: strings.ToLower(component),
SensorName: sensorName,
EventType: eventType,


@@ -16,7 +16,7 @@ import (
// parserVersion - version of this parser module
// IMPORTANT: Increment this version when making changes to parser logic!
-const parserVersion = "1.5"
+const parserVersion = "1.8"
func init() {
parser.Register(&Parser{})
@@ -95,9 +95,41 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
Sensors: make([]models.SensorReading, 0),
}
// Pre-parse enrichment maps from devicefrusdr.log for use inside ParseAssetJSON.
// BMC does not populate HddInfo.ModelName or SerialNumber for NVMe drives.
var pcieSlotDeviceNames map[int]string
var nvmeLocToSlot map[int]int
if f := parser.FindFileByName(files, "devicefrusdr.log"); f != nil {
pcieSlotDeviceNames = ParsePCIeSlotDeviceNames(f.Content)
nvmeLocToSlot = ParsePCIeNVMeLocToSlot(f.Content)
}
// Parse NVMe serial numbers from audit.log: every disk SN change is logged there.
// Combine with the NVMe loc→slot mapping to build pcieSlot→serial map.
// Also parse RAID disk serials by backplane slot key (e.g. "BP0:0").
var pcieSlotSerials map[int]string
var raidSlotSerials map[string]string
if f := parser.FindFileByName(files, "audit.log"); f != nil {
if len(nvmeLocToSlot) > 0 {
nvmeDiskSerials := ParseAuditLogNVMeSerials(f.Content)
if len(nvmeDiskSerials) > 0 {
pcieSlotSerials = make(map[int]string, len(nvmeDiskSerials))
for diskNum, serial := range nvmeDiskSerials {
if slot, ok := nvmeLocToSlot[diskNum]; ok {
pcieSlotSerials[slot] = serial
}
}
if len(pcieSlotSerials) == 0 {
pcieSlotSerials = nil
}
}
}
raidSlotSerials = ParseAuditLogRAIDSerials(f.Content)
}
// Parse asset.json first (base hardware info)
if f := parser.FindFileByName(files, "asset.json"); f != nil {
-if hw, err := ParseAssetJSON(f.Content); err == nil {
+if hw, err := ParseAssetJSON(f.Content, pcieSlotDeviceNames, pcieSlotSerials); err == nil {
result.Hardware = hw
}
}
@@ -182,6 +214,10 @@ func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, er
if result.Hardware != nil {
applyGPUStatusFromEvents(result.Hardware, result.Events)
enrichStorageFromSerialFallbackFiles(files, result.Hardware)
// Apply RAID disk serials from audit.log (authoritative: last non-NULL SN change).
// These override redis/component.log serials which may be stale after disk replacement.
applyRAIDSlotSerials(result.Hardware, raidSlotSerials)
parser.ApplyManufacturedYearWeekFromFRU(result.FRU, result.Hardware)
}
return result, nil


@@ -4,6 +4,7 @@ import (
"encoding/json"
"fmt"
"regexp"
"strconv"
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
@@ -37,6 +38,84 @@ type PCIeRESTInfo []struct {
FwVer string `json:"fw_ver"`
}
// ParsePCIeSlotDeviceNames parses devicefrusdr.log and returns a map from integer PCIe slot ID
// to device name string. Used to enrich HddInfo entries in asset.json that lack model names.
func ParsePCIeSlotDeviceNames(content []byte) map[int]string {
info, ok := parsePCIeRESTJSON(content)
if !ok {
return nil
}
result := make(map[int]string, len(info))
for _, entry := range info {
if entry.Slot <= 0 {
continue
}
name := sanitizePCIeDeviceName(entry.DeviceName)
if name != "" {
result[entry.Slot] = name
}
}
if len(result) == 0 {
return nil
}
return result
}
// parsePCIeRESTJSON parses the RESTful PCIE Device info JSON from devicefrusdr.log content.
func parsePCIeRESTJSON(content []byte) (PCIeRESTInfo, bool) {
text := string(content)
startMarker := "RESTful PCIE Device info:"
endMarker := "BMC sdr Info:"
startIdx := strings.Index(text, startMarker)
if startIdx == -1 {
return nil, false
}
endIdx := strings.Index(text[startIdx:], endMarker)
if endIdx == -1 {
endIdx = len(text) - startIdx
}
jsonText := strings.TrimSpace(text[startIdx+len(startMarker) : startIdx+endIdx])
var info PCIeRESTInfo
if err := json.Unmarshal([]byte(jsonText), &info); err != nil {
return nil, false
}
return info, true
}
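The marker-delimited extraction used by parsePCIeRESTJSON can be sketched generically; `sliceBetween` below is a hypothetical helper that mirrors the same start/end slicing, including the fall-through to end-of-input when the end marker is absent:

```go
package main

import (
	"fmt"
	"strings"
)

// sliceBetween returns the trimmed text between startMarker and endMarker,
// falling back to end-of-input when endMarker is missing (same shape as
// parsePCIeRESTJSON above; names here are illustrative).
func sliceBetween(text, startMarker, endMarker string) (string, bool) {
	start := strings.Index(text, startMarker)
	if start == -1 {
		return "", false
	}
	rest := text[start+len(startMarker):]
	if end := strings.Index(rest, endMarker); end != -1 {
		rest = rest[:end]
	}
	return strings.TrimSpace(rest), true
}

func main() {
	log := `RESTful PCIE Device info: [{"slot":1}] BMC sdr Info: ...`
	body, ok := sliceBetween(log, "RESTful PCIE Device info:", "BMC sdr Info:")
	fmt.Println(ok, body) // true [{"slot":1}]
}
```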
// ParsePCIeNVMeLocToSlot parses devicefrusdr.log and returns a map from NVMe location number
// (the numeric suffix in "#NVME0", "#NVME2", etc.) to the integer PCIe slot ID.
// This is used to correlate audit.log NVMe disk numbers with HddInfo PcieSlot values.
func ParsePCIeNVMeLocToSlot(content []byte) map[int]int {
info, ok := parsePCIeRESTJSON(content)
if !ok {
return nil
}
nvmeLocRegex := regexp.MustCompile(`(?i)^#NVME(\d+)$`)
result := make(map[int]int)
for _, entry := range info {
if entry.Slot <= 0 {
continue
}
loc := strings.TrimSpace(entry.Location)
m := nvmeLocRegex.FindStringSubmatch(loc)
if m == nil {
continue
}
locNum, err := strconv.Atoi(m[1])
if err != nil {
continue
}
result[locNum] = entry.Slot
}
if len(result) == 0 {
return nil
}
return result
}
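The loc→slot map exists to be joined against the NVMe disk serials parsed from audit.log, as the Parse pre-processing above does; a minimal standalone sketch of that join (the `joinNVMeSerials` helper name is illustrative):

```go
package main

import "fmt"

// joinNVMeSerials maps audit-log disk numbers to PCIe slots using the
// loc→slot table from devicefrusdr.log, returning nil when nothing
// matches — mirroring the pcieSlotSerials construction in Parse.
func joinNVMeSerials(locToSlot map[int]int, diskSerials map[int]string) map[int]string {
	out := make(map[int]string)
	for diskNum, serial := range diskSerials {
		if slot, ok := locToSlot[diskNum]; ok {
			out[slot] = serial
		}
	}
	if len(out) == 0 {
		return nil
	}
	return out
}

func main() {
	locToSlot := map[int]int{0: 5, 2: 7} // "#NVME0" -> slot 5, "#NVME2" -> slot 7
	serials := map[int]string{2: "PHAB1234"}
	fmt.Println(joinNVMeSerials(locToSlot, serials)) // map[7:PHAB1234]
}
```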
// ParsePCIeDevices parses RESTful PCIE Device info from devicefrusdr.log
func ParsePCIeDevices(content []byte) []models.PCIeDevice {
text := string(content)


@@ -73,6 +73,24 @@ func looksLikeStorageSerial(v string) bool {
return hasLetter && hasDigit
}
// applyRAIDSlotSerials updates storage serial numbers using the slot→serial map
// derived from audit.log RAID SN change events. Overwrites existing serials since
// audit.log represents the authoritative current state after all disk replacements.
func applyRAIDSlotSerials(hw *models.HardwareConfig, serials map[string]string) {
if hw == nil || len(serials) == 0 {
return
}
for i := range hw.Storage {
slot := strings.TrimSpace(hw.Storage[i].Slot)
if slot == "" {
continue
}
if sn, ok := serials[slot]; ok && sn != "" {
hw.Storage[i].SerialNumber = sn
}
}
}
func applyStorageSerialFallback(hw *models.HardwareConfig, serials []string) {
if hw == nil || len(hw.Storage) == 0 || len(serials) == 0 {
return


@@ -26,7 +26,7 @@ func TestParseAssetJSON_HddSlotFallbackAndPresence(t *testing.T) {
]
}`)
-hw, err := ParseAssetJSON(content)
+hw, err := ParseAssetJSON(content, nil, nil)
if err != nil {
t.Fatalf("ParseAssetJSON failed: %v", err)
}


@@ -48,9 +48,9 @@ func ParseSyslog(content []byte, sourcePath string) []models.Event {
event := models.Event{
ID: generateEventID(sourcePath, lineNum),
Timestamp: timestamp,
-Source: matches[4],
+Source: "syslog",
SensorType: "syslog",
-SensorName: matches[3],
+SensorName: matches[4],
Description: matches[5],
Severity: severity,
RawData: line,


@@ -5,11 +5,14 @@ package vendors
import (
// Import vendor modules to trigger their init() registration
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/dell"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/easy_bee"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/h3c"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/hpe_ilo_ahs"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/nvidia"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/nvidia_bug_report"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/unraid"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/xfusion"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/xigmanas"
// Generic fallback parser (must be last for lowest priority)

File diff suppressed because it is too large


@@ -0,0 +1,157 @@
// Package xfusion provides parser for xFusion iBMC diagnostic dump archives.
// Tested with: xFusion G5500 V7 iBMC dump (tar.gz format, exported via iBMC UI)
//
// Archive structure: dump_info/AppDump/... and dump_info/LogDump/...
//
// IMPORTANT: Increment parserVersion when modifying parser logic!
package xfusion
import (
"strings"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
const parserVersion = "1.1"
func init() {
parser.Register(&Parser{})
}
// Parser implements VendorParser for xFusion iBMC dump archives.
type Parser struct{}
func (p *Parser) Name() string { return "xFusion iBMC Dump Parser" }
func (p *Parser) Vendor() string { return "xfusion" }
func (p *Parser) Version() string { return parserVersion }
// Detect checks if files match the xFusion iBMC dump format.
// Returns confidence score 0-100.
func (p *Parser) Detect(files []parser.ExtractedFile) int {
confidence := 0
for _, f := range files {
path := strings.ToLower(f.Path)
switch {
case strings.Contains(path, "appdump/frudata/fruinfo.txt"):
confidence += 50
case strings.Contains(path, "rtosdump/versioninfo/app_revision.txt"):
confidence += 30
case strings.Contains(path, "appdump/sensor_alarm/sensor_info.txt"):
confidence += 10
case strings.Contains(path, "appdump/card_manage/card_info"):
confidence += 20
case strings.Contains(path, "logdump/netcard/netcard_info.txt"):
confidence += 20
}
if confidence >= 100 {
return 100
}
}
return confidence
}
// Parse parses xFusion iBMC dump and returns an analysis result.
func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, error) {
result := &models.AnalysisResult{
Events: make([]models.Event, 0),
FRU: make([]models.FRUInfo, 0),
Sensors: make([]models.SensorReading, 0),
Hardware: &models.HardwareConfig{
Firmware: make([]models.FirmwareInfo, 0),
Devices: make([]models.HardwareDevice, 0),
CPUs: make([]models.CPU, 0),
Memory: make([]models.MemoryDIMM, 0),
Storage: make([]models.Storage, 0),
Volumes: make([]models.StorageVolume, 0),
PCIeDevices: make([]models.PCIeDevice, 0),
GPUs: make([]models.GPU, 0),
NetworkCards: make([]models.NIC, 0),
NetworkAdapters: make([]models.NetworkAdapter, 0),
PowerSupply: make([]models.PSU, 0),
},
}
if f := findByAnyPath(files, "appdump/frudata/fruinfo.txt", "rtosdump/versioninfo/fruinfo.txt"); f != nil {
parseFRUInfo(f.Content, result)
}
if f := findByPath(files, "appdump/sensor_alarm/sensor_info.txt"); f != nil {
result.Sensors = parseSensorInfo(f.Content)
}
if f := findByPath(files, "appdump/cpumem/cpu_info"); f != nil {
result.Hardware.CPUs = parseCPUInfo(f.Content)
}
if f := findByPath(files, "appdump/cpumem/mem_info"); f != nil {
result.Hardware.Memory = parseMemInfo(f.Content)
}
var nicCards []xfusionNICCard
if f := findByPath(files, "appdump/card_manage/card_info"); f != nil {
gpus, cards := parseCardInfo(f.Content)
result.Hardware.GPUs = gpus
nicCards = cards
}
if f := findByPath(files, "logdump/netcard/netcard_info.txt"); f != nil || len(nicCards) > 0 {
var content []byte
if f != nil {
content = f.Content
}
adapters, legacyNICs := mergeNetworkAdapters(nicCards, parseNetcardInfo(content))
result.Hardware.NetworkAdapters = adapters
result.Hardware.NetworkCards = legacyNICs
}
if f := findByPath(files, "appdump/bmc/psu_info.txt"); f != nil {
result.Hardware.PowerSupply = parsePSUInfo(f.Content)
}
if f := findByPath(files, "appdump/storagemgnt/raid_controller_info.txt"); f != nil {
parseStorageControllerInfo(f.Content, result)
}
if f := findByPath(files, "rtosdump/versioninfo/app_revision.txt"); f != nil {
parseAppRevision(f.Content, result)
}
for _, f := range findDiskInfoFiles(files) {
disk := parseDiskInfo(f.Content)
if disk != nil {
result.Hardware.Storage = append(result.Hardware.Storage, *disk)
}
}
if f := findByPath(files, "logdump/maintenance_log"); f != nil {
result.Events = parseMaintenanceLog(f.Content)
}
result.Protocol = "ipmi"
result.SourceType = models.SourceTypeArchive
parser.ApplyManufacturedYearWeekFromFRU(result.FRU, result.Hardware)
return result, nil
}
// findByPath returns the first file whose lowercased path contains the given substring.
func findByPath(files []parser.ExtractedFile, substring string) *parser.ExtractedFile {
for i := range files {
if strings.Contains(strings.ToLower(files[i].Path), substring) {
return &files[i]
}
}
return nil
}
func findByAnyPath(files []parser.ExtractedFile, substrings ...string) *parser.ExtractedFile {
for _, substring := range substrings {
if f := findByPath(files, substring); f != nil {
return f
}
}
return nil
}
// findDiskInfoFiles returns all PhysicalDrivesInfo disk_info files.
func findDiskInfoFiles(files []parser.ExtractedFile) []parser.ExtractedFile {
var out []parser.ExtractedFile
for _, f := range files {
path := strings.ToLower(f.Path)
if strings.Contains(path, "physicaldrivesinfo/") && strings.HasSuffix(path, "/disk_info") {
out = append(out, f)
}
}
return out
}


@@ -0,0 +1,332 @@
package xfusion
import (
"strings"
"testing"
"git.mchus.pro/mchus/logpile/internal/models"
"git.mchus.pro/mchus/logpile/internal/parser"
)
// loadTestArchive extracts the given archive path for use in tests.
// Skips the test if the file is not found (CI environments without testdata).
func loadTestArchive(t *testing.T, path string) []parser.ExtractedFile {
t.Helper()
files, err := parser.ExtractArchive(path)
if err != nil {
t.Skipf("cannot load test archive %s: %v", path, err)
}
return files
}
func TestDetect_G5500V7(t *testing.T) {
files := loadTestArchive(t, "../../../../example/G5500V7_210619KUGGXGS2000015_20260318-1128.tar.gz")
p := &Parser{}
score := p.Detect(files)
if score < 80 {
t.Fatalf("expected Detect score >= 80, got %d", score)
}
}
func TestDetect_ServerFileExportMarkers(t *testing.T) {
p := &Parser{}
score := p.Detect([]parser.ExtractedFile{
{Path: "dump_info/RTOSDump/versioninfo/app_revision.txt", Content: []byte("Product Name: G5500 V7")},
{Path: "dump_info/LogDump/netcard/netcard_info.txt", Content: []byte("2026-02-04 03:54:06 UTC")},
{Path: "dump_info/AppDump/card_manage/card_info", Content: []byte("OCP Card Info")},
})
if score < 70 {
t.Fatalf("expected Detect score >= 70 for xFusion file export markers, got %d", score)
}
}
func TestDetect_Negative(t *testing.T) {
p := &Parser{}
score := p.Detect([]parser.ExtractedFile{
{Path: "logs/messages.txt", Content: []byte("plain text")},
{Path: "inventory.json", Content: []byte(`{"vendor":"other"}`)},
})
if score != 0 {
t.Fatalf("expected Detect score 0 for non-xFusion input, got %d", score)
}
}
func TestParse_G5500V7_BoardInfo(t *testing.T) {
files := loadTestArchive(t, "../../../../example/G5500V7_210619KUGGXGS2000015_20260318-1128.tar.gz")
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse: %v", err)
}
if result.Hardware == nil {
t.Fatal("Hardware is nil")
}
board := result.Hardware.BoardInfo
if board.SerialNumber != "210619KUGGXGS2000015" {
t.Errorf("BoardInfo.SerialNumber = %q, want 210619KUGGXGS2000015", board.SerialNumber)
}
if board.ProductName != "G5500 V7" {
t.Errorf("BoardInfo.ProductName = %q, want G5500 V7", board.ProductName)
}
}
func TestParse_G5500V7_CPUs(t *testing.T) {
files := loadTestArchive(t, "../../../../example/G5500V7_210619KUGGXGS2000015_20260318-1128.tar.gz")
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse: %v", err)
}
if len(result.Hardware.CPUs) != 2 {
t.Fatalf("expected 2 CPUs, got %d", len(result.Hardware.CPUs))
}
cpu1 := result.Hardware.CPUs[0]
if cpu1.Cores != 32 {
t.Errorf("CPU1 cores = %d, want 32", cpu1.Cores)
}
if cpu1.Threads != 64 {
t.Errorf("CPU1 threads = %d, want 64", cpu1.Threads)
}
if cpu1.SerialNumber == "" {
t.Error("CPU1 SerialNumber is empty")
}
}
func TestParse_G5500V7_Memory(t *testing.T) {
files := loadTestArchive(t, "../../../../example/G5500V7_210619KUGGXGS2000015_20260318-1128.tar.gz")
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse: %v", err)
}
// Only 2 DIMMs are populated (rest are "NO DIMM")
if len(result.Hardware.Memory) != 2 {
t.Fatalf("expected 2 populated DIMMs, got %d", len(result.Hardware.Memory))
}
dimm := result.Hardware.Memory[0]
if dimm.SizeMB != 65536 {
t.Errorf("DIMM0 SizeMB = %d, want 65536", dimm.SizeMB)
}
if dimm.Type != "DDR5" {
t.Errorf("DIMM0 Type = %q, want DDR5", dimm.Type)
}
}
func TestParse_G5500V7_GPUs(t *testing.T) {
files := loadTestArchive(t, "../../../../example/G5500V7_210619KUGGXGS2000015_20260318-1128.tar.gz")
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse: %v", err)
}
if len(result.Hardware.GPUs) != 8 {
t.Fatalf("expected 8 GPUs, got %d", len(result.Hardware.GPUs))
}
for _, gpu := range result.Hardware.GPUs {
if gpu.SerialNumber == "" {
t.Errorf("GPU slot %s has empty SerialNumber", gpu.Slot)
}
if gpu.Model == "" {
t.Errorf("GPU slot %s has empty Model", gpu.Slot)
}
if gpu.Firmware == "" {
t.Errorf("GPU slot %s has empty Firmware", gpu.Slot)
}
}
}
func TestParse_G5500V7_NICs(t *testing.T) {
files := loadTestArchive(t, "../../../../example/G5500V7_210619KUGGXGS2000015_20260318-1128.tar.gz")
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse: %v", err)
}
if len(result.Hardware.NetworkCards) < 1 {
t.Fatal("expected at least 1 NIC (OCP CX6), got 0")
}
nic := result.Hardware.NetworkCards[0]
if nic.SerialNumber == "" {
t.Errorf("NIC SerialNumber is empty")
}
}
func TestParse_ServerFileExport_NetworkAdaptersAndFirmware(t *testing.T) {
p := &Parser{}
files := []parser.ExtractedFile{
{
Path: "dump_info/AppDump/card_manage/card_info",
Content: []byte(strings.TrimSpace(`
Pcie Card Info
Slot | Vender Id | Device Id | Sub Vender Id | Sub Device Id | Segment Number | Bus Number | Device Number | Function Number | Card Desc | Board Id | PCB Version | CPLD Version | Sub Card Bom Id | PartNum | SerialNumber | OriginalPartNum
1 | 0x15b3 | 0x101f | 0x1f24 | 0x2011 | 0x00 | 0x27 | 0x00 | 0x00 | MT2894 Family [ConnectX-6 Lx] | N/A | N/A | N/A | N/A | 0302Y238 | 02Y238X6RC000058 |
OCP Card Info
Slot | Vender Id | Device Id | Sub Vender Id | Sub Device Id | Segment Number | Bus Number | Device Number | Function Number | Card Desc | Board Id | PCB Version | CPLD Version | Sub Card Bom Id | PartNum | SerialNumber | OriginalPartNum
1 | 0x15b3 | 0x101f | 0x1f24 | 0x2011 | 0x00 | 0x27 | 0x00 | 0x00 | MT2894 Family [ConnectX-6 Lx] | N/A | N/A | N/A | N/A | 0302Y238 | 02Y238X6RC000058 |
`)),
},
{
Path: "dump_info/LogDump/netcard/netcard_info.txt",
Content: []byte(strings.TrimSpace(`
2026-02-04 03:54:06 UTC
ProductName :XC385
Manufacture :XFUSION
FirmwareVersion :26.39.2048
SlotId :1
Port0 BDF:0000:27:00.0
MacAddr:44:1A:4C:16:E8:03
ActualMac:44:1A:4C:16:E8:03
Port1 BDF:0000:27:00.1
MacAddr:00:00:00:00:00:00
ActualMac:44:1A:4C:16:E8:04
`)),
},
{
Path: "dump_info/RTOSDump/versioninfo/app_revision.txt",
Content: []byte(strings.TrimSpace(`
------------------- iBMC INFO -------------------
Active iBMC Version: (U68)3.08.05.85
Active iBMC Built: 16:46:26 Jan 4 2026
SDK Version: 13.16.30.16
SDK Built: 07:55:18 Dec 12 2025
Active BIOS Version: (U6216)01.02.08.17
Active BIOS Built: 00:00:00 Jan 05 2026
Product Name: G5500 V7
`)),
},
}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse: %v", err)
}
if result.Protocol != "ipmi" || result.SourceType != models.SourceTypeArchive {
t.Fatalf("unexpected source metadata: protocol=%q source_type=%q", result.Protocol, result.SourceType)
}
if result.Hardware == nil {
t.Fatal("Hardware is nil")
}
if len(result.Hardware.NetworkAdapters) != 1 {
t.Fatalf("expected 1 network adapter, got %d", len(result.Hardware.NetworkAdapters))
}
adapter := result.Hardware.NetworkAdapters[0]
if adapter.BDF != "0000:27:00.0" {
t.Fatalf("expected network adapter BDF 0000:27:00.0, got %q", adapter.BDF)
}
if adapter.Firmware != "26.39.2048" {
t.Fatalf("expected network adapter firmware 26.39.2048, got %q", adapter.Firmware)
}
if adapter.SerialNumber != "02Y238X6RC000058" {
t.Fatalf("expected network adapter serial from card_info, got %q", adapter.SerialNumber)
}
if len(adapter.MACAddresses) != 2 || adapter.MACAddresses[0] != "44:1A:4C:16:E8:03" || adapter.MACAddresses[1] != "44:1A:4C:16:E8:04" {
t.Fatalf("unexpected MAC addresses: %#v", adapter.MACAddresses)
}
fwByDevice := make(map[string]models.FirmwareInfo)
for _, fw := range result.Hardware.Firmware {
fwByDevice[fw.DeviceName] = fw
}
if fwByDevice["iBMC"].Version != "(U68)3.08.05.85" {
t.Fatalf("expected iBMC firmware from app_revision.txt, got %#v", fwByDevice["iBMC"])
}
if fwByDevice["BIOS"].Version != "(U6216)01.02.08.17" {
t.Fatalf("expected BIOS firmware from app_revision.txt, got %#v", fwByDevice["BIOS"])
}
if result.Hardware.BoardInfo.ProductName != "G5500 V7" {
t.Fatalf("expected board product fallback from app_revision.txt, got %q", result.Hardware.BoardInfo.ProductName)
}
}
func TestParse_G5500V7_PSUs(t *testing.T) {
files := loadTestArchive(t, "../../../../example/G5500V7_210619KUGGXGS2000015_20260318-1128.tar.gz")
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse: %v", err)
}
if len(result.Hardware.PowerSupply) != 4 {
t.Fatalf("expected 4 PSUs, got %d", len(result.Hardware.PowerSupply))
}
for _, psu := range result.Hardware.PowerSupply {
if psu.WattageW != 3000 {
t.Errorf("PSU slot %s wattage = %d, want 3000", psu.Slot, psu.WattageW)
}
if psu.SerialNumber == "" {
t.Errorf("PSU slot %s has empty SerialNumber", psu.Slot)
}
}
}
func TestParse_G5500V7_Storage(t *testing.T) {
files := loadTestArchive(t, "../../../../example/G5500V7_210619KUGGXGS2000015_20260318-1128.tar.gz")
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse: %v", err)
}
if len(result.Hardware.Storage) != 2 {
t.Fatalf("expected 2 storage devices, got %d", len(result.Hardware.Storage))
}
for _, disk := range result.Hardware.Storage {
if disk.SerialNumber == "" {
t.Errorf("disk slot %s has empty SerialNumber", disk.Slot)
}
if disk.Model == "" {
t.Errorf("disk slot %s has empty Model", disk.Slot)
}
}
}
func TestParse_G5500V7_Sensors(t *testing.T) {
files := loadTestArchive(t, "../../../../example/G5500V7_210619KUGGXGS2000015_20260318-1128.tar.gz")
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse: %v", err)
}
if len(result.Sensors) < 20 {
t.Fatalf("expected at least 20 sensors, got %d", len(result.Sensors))
}
}
func TestParse_G5500V7_Events(t *testing.T) {
files := loadTestArchive(t, "../../../../example/G5500V7_210619KUGGXGS2000015_20260318-1128.tar.gz")
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse: %v", err)
}
if len(result.Events) < 5 {
t.Fatalf("expected at least 5 events, got %d", len(result.Events))
}
// All events should have real timestamps (not epoch 0)
for _, ev := range result.Events {
if ev.Timestamp.Year() <= 1970 {
t.Errorf("event has epoch timestamp: %v %s", ev.Timestamp, ev.Description)
}
}
}
func TestParse_G5500V7_FRU(t *testing.T) {
files := loadTestArchive(t, "../../../../example/G5500V7_210619KUGGXGS2000015_20260318-1128.tar.gz")
p := &Parser{}
result, err := p.Parse(files)
if err != nil {
t.Fatalf("Parse: %v", err)
}
if len(result.FRU) < 3 {
t.Fatalf("expected at least 3 FRU entries, got %d", len(result.FRU))
}
// Check mainboard FRU serial
found := false
for _, f := range result.FRU {
if f.SerialNumber == "210619KUGGXGS2000015" {
found = true
}
}
if !found {
t.Error("mainboard serial 210619KUGGXGS2000015 not found in FRU")
}
}


@@ -44,6 +44,9 @@ func TestParserParseExample(t *testing.T) {
examplePath := filepath.Join("..", "..", "..", "..", "example", "xigmanas.txt")
raw, err := os.ReadFile(examplePath)
if err != nil {
if os.IsNotExist(err) {
t.Skipf("example file %s not present", examplePath)
}
t.Fatalf("read example file: %v", err)
}


@@ -0,0 +1,69 @@
package server
import (
"net/http"
"net/http/httptest"
"strings"
"testing"
"time"
"git.mchus.pro/mchus/logpile/internal/models"
)
func TestHandleChartCurrent_RendersCurrentReanimatorSnapshot(t *testing.T) {
s := New(Config{})
s.SetResult(&models.AnalysisResult{
SourceType: models.SourceTypeArchive,
Filename: "example.zip",
CollectedAt: time.Date(2026, 3, 16, 10, 0, 0, 0, time.UTC),
Hardware: &models.HardwareConfig{
BoardInfo: models.BoardInfo{
ProductName: "SYS-TEST",
SerialNumber: "SN123",
},
CPUs: []models.CPU{
{
Socket: 1,
Model: "Xeon Gold",
Cores: 32,
},
},
},
})
req := httptest.NewRequest(http.MethodGet, "/chart/current", nil)
rec := httptest.NewRecorder()
s.mux.ServeHTTP(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d", rec.Code)
}
body := rec.Body.String()
if !strings.Contains(body, "SYS-TEST - SN123") {
t.Fatalf("expected chart title in body, got %q", body)
}
if !strings.Contains(body, `/chart/static/view.css`) {
t.Fatalf("expected rewritten chart static path, got %q", body)
}
if !strings.Contains(body, "Snapshot Metadata") {
t.Fatalf("expected rendered chart output, got %q", body)
}
}
func TestHandleChartCurrent_RendersEmptyViewerWithoutResult(t *testing.T) {
s := New(Config{})
req := httptest.NewRequest(http.MethodGet, "/chart/current", nil)
rec := httptest.NewRecorder()
s.mux.ServeHTTP(rec, req)
if rec.Code != http.StatusOK {
t.Fatalf("expected 200, got %d", rec.Code)
}
body := rec.Body.String()
if !strings.Contains(body, "Snapshot Viewer") {
t.Fatalf("expected empty chart viewer, got %q", body)
}
}


@@ -3,6 +3,8 @@ package server
import (
"bytes"
"encoding/json"
"fmt"
"net"
"net/http"
"net/http/httptest"
"strings"
@@ -14,16 +16,58 @@ import (
func newCollectTestServer() (*Server, *httptest.Server) {
s := &Server{
jobManager: NewJobManager(),
collectors: testCollectorRegistry(),
}
mux := http.NewServeMux()
mux.HandleFunc("POST /api/collect/probe", s.handleCollectProbe)
mux.HandleFunc("POST /api/collect", s.handleCollectStart)
mux.HandleFunc("GET /api/collect/{id}", s.handleCollectStatus)
mux.HandleFunc("POST /api/collect/{id}/cancel", s.handleCollectCancel)
mux.HandleFunc("POST /api/collect/{id}/skip", s.handleCollectSkip)
return s, httptest.NewServer(mux)
}
func TestCollectProbe(t *testing.T) {
_, ts := newCollectTestServer()
defer ts.Close()
ln, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatalf("listen probe target: %v", err)
}
defer ln.Close()
addr, ok := ln.Addr().(*net.TCPAddr)
if !ok {
t.Fatalf("unexpected listener address type: %T", ln.Addr())
}
body := fmt.Sprintf(`{"host":"127.0.0.1","protocol":"redfish","port":%d,"username":"admin-off","auth_type":"password","password":"secret","tls_mode":"strict"}`, addr.Port)
resp, err := http.Post(ts.URL+"/api/collect/probe", "application/json", bytes.NewBufferString(body))
if err != nil {
t.Fatalf("post collect probe failed: %v", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
t.Fatalf("expected 200, got %d", resp.StatusCode)
}
var payload CollectProbeResponse
if err := json.NewDecoder(resp.Body).Decode(&payload); err != nil {
t.Fatalf("decode probe response: %v", err)
}
if !payload.Reachable {
t.Fatalf("expected reachable=true, got false")
}
if payload.HostPoweredOn {
t.Fatalf("expected host powered off in probe response")
}
if payload.HostPowerState != "Off" {
t.Fatalf("expected host power state Off, got %q", payload.HostPowerState)
}
}
func TestCollectLifecycleToTerminal(t *testing.T) {
_, ts := newCollectTestServer()
defer ts.Close()
@@ -57,6 +101,21 @@ func TestCollectLifecycleToTerminal(t *testing.T) {
if len(status.Logs) < 4 {
t.Fatalf("expected detailed logs, got %v", status.Logs)
}
if len(status.ActiveModules) == 0 {
t.Fatal("expected active modules in collect status")
}
if status.ActiveModules[0].Name == "" {
t.Fatal("expected active module name")
}
if len(status.ModuleScores) == 0 {
t.Fatal("expected module scores in collect status")
}
if status.DebugInfo == nil {
t.Fatal("expected debug info in collect status")
}
if len(status.DebugInfo.PhaseTelemetry) == 0 {
t.Fatal("expected phase telemetry in collect debug info")
}
}
func TestCollectCancel(t *testing.T) {


@@ -17,8 +17,47 @@ func (c *mockConnector) Protocol() string {
return c.protocol
}
func (c *mockConnector) Probe(ctx context.Context, req collector.Request) (*collector.ProbeResult, error) {
if strings.Contains(strings.ToLower(req.Host), "fail") {
return nil, context.DeadlineExceeded
}
hostPoweredOn := true
if strings.Contains(strings.ToLower(req.Host), "off") || strings.Contains(strings.ToLower(req.Username), "off") {
hostPoweredOn = false
}
return &collector.ProbeResult{
Reachable: true,
Protocol: c.protocol,
HostPowerState: map[bool]string{true: "On", false: "Off"}[hostPoweredOn],
HostPoweredOn: hostPoweredOn,
SystemPath: "/redfish/v1/Systems/1",
}, nil
}
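The mock's `HostPowerState` line uses a compact Go idiom: indexing a `map[bool]string` literal to pick a label without an if/else. A standalone sketch of the same trick (the helper name is ours, for illustration):

```go
package main

import "fmt"

// powerStateLabel mirrors the map-literal idiom in the mock Probe:
// indexing map[bool]string selects "On"/"Off" without an if/else.
func powerStateLabel(poweredOn bool) string {
	return map[bool]string{true: "On", false: "Off"}[poweredOn]
}

func main() {
	fmt.Println(powerStateLabel(true), powerStateLabel(false))
}
```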
func (c *mockConnector) Collect(ctx context.Context, req collector.Request, emit collector.ProgressFn) (*models.AnalysisResult, error) {
steps := []collector.Progress{
{
Status: CollectStatusRunning,
Progress: 10,
Message: "Подбор модулей Redfish...", // "Selecting Redfish modules..."
ActiveModules: []collector.ModuleActivation{
{Name: "supermicro", Score: 80},
{Name: "generic", Score: 10},
},
ModuleScores: []collector.ModuleScore{
{Name: "supermicro", Score: 80, Active: true, Priority: 20},
{Name: "generic", Score: 10, Active: true, Priority: 100},
{Name: "hgx-topology", Score: 0, Active: false, Priority: 30},
},
DebugInfo: &collector.CollectDebugInfo{
AdaptiveThrottled: false,
SnapshotWorkers: 6,
PrefetchWorkers: 4,
PhaseTelemetry: []collector.PhaseTelemetry{
{Phase: "discovery", Requests: 6, Errors: 0, ErrorRate: 0, AvgMS: 120, P95MS: 180},
},
},
},
{Status: CollectStatusRunning, Progress: 20, Message: "Подключение..."}, // "Connecting..."
{Status: CollectStatusRunning, Progress: 50, Message: "Сбор инвентаря..."}, // "Collecting inventory..."
{Status: CollectStatusRunning, Progress: 80, Message: "Нормализация..."}, // "Normalizing..."

View File

@@ -11,14 +11,23 @@ const (
)
type CollectRequest struct {
Host string `json:"host"`
Protocol string `json:"protocol"`
Port int `json:"port"`
Username string `json:"username"`
AuthType string `json:"auth_type"`
Password string `json:"password,omitempty"`
Token string `json:"token,omitempty"`
TLSMode string `json:"tls_mode"`
Host string `json:"host"`
Protocol string `json:"protocol"`
Port int `json:"port"`
Username string `json:"username"`
AuthType string `json:"auth_type"`
Password string `json:"password,omitempty"`
Token string `json:"token,omitempty"`
TLSMode string `json:"tls_mode"`
DebugPayloads bool `json:"debug_payloads,omitempty"`
}
type CollectProbeResponse struct {
Reachable bool `json:"reachable"`
Protocol string `json:"protocol,omitempty"`
HostPowerState string `json:"host_power_state,omitempty"`
HostPoweredOn bool `json:"host_powered_on"`
Message string `json:"message,omitempty"`
}
type CollectJobResponse struct {
@@ -29,13 +38,18 @@ type CollectJobResponse struct {
}
type CollectJobStatusResponse struct {
JobID string `json:"job_id"`
Status string `json:"status"`
Progress *int `json:"progress,omitempty"`
Logs []string `json:"logs,omitempty"`
Error string `json:"error,omitempty"`
CreatedAt time.Time `json:"created_at,omitempty"`
UpdatedAt time.Time `json:"updated_at"`
JobID string `json:"job_id"`
Status string `json:"status"`
Progress *int `json:"progress,omitempty"`
CurrentPhase string `json:"current_phase,omitempty"`
ETASeconds *int `json:"eta_seconds,omitempty"`
Logs []string `json:"logs,omitempty"`
Error string `json:"error,omitempty"`
ActiveModules []CollectModuleStatus `json:"active_modules,omitempty"`
ModuleScores []CollectModuleStatus `json:"module_scores,omitempty"`
DebugInfo *CollectDebugInfo `json:"debug_info,omitempty"`
CreatedAt time.Time `json:"created_at,omitempty"`
UpdatedAt time.Time `json:"updated_at"`
}
type CollectRequestMeta struct {
@@ -48,27 +62,65 @@ type CollectRequestMeta struct {
}
type Job struct {
ID string
Status string
Progress int
Logs []string
Error string
CreatedAt time.Time
UpdatedAt time.Time
RequestMeta CollectRequestMeta
cancel func()
ID string
Status string
Progress int
CurrentPhase string
ETASeconds int
Logs []string
Error string
ActiveModules []CollectModuleStatus
ModuleScores []CollectModuleStatus
DebugInfo *CollectDebugInfo
CreatedAt time.Time
UpdatedAt time.Time
RequestMeta CollectRequestMeta
cancel func()
skipFn func()
}
type CollectModuleStatus struct {
Name string `json:"name"`
Score int `json:"score"`
Active bool `json:"active,omitempty"`
Priority int `json:"priority,omitempty"`
}
type CollectDebugInfo struct {
AdaptiveThrottled bool `json:"adaptive_throttled"`
SnapshotWorkers int `json:"snapshot_workers,omitempty"`
PrefetchWorkers int `json:"prefetch_workers,omitempty"`
PrefetchEnabled *bool `json:"prefetch_enabled,omitempty"`
PhaseTelemetry []CollectPhaseTelemetry `json:"phase_telemetry,omitempty"`
}
type CollectPhaseTelemetry struct {
Phase string `json:"phase"`
Requests int `json:"requests,omitempty"`
Errors int `json:"errors,omitempty"`
ErrorRate float64 `json:"error_rate,omitempty"`
AvgMS int64 `json:"avg_ms,omitempty"`
P95MS int64 `json:"p95_ms,omitempty"`
}
func (j *Job) toStatusResponse() CollectJobStatusResponse {
progress := j.Progress
resp := CollectJobStatusResponse{
JobID: j.ID,
Status: j.Status,
Progress: &progress,
Logs: append([]string(nil), j.Logs...),
Error: j.Error,
CreatedAt: j.CreatedAt,
UpdatedAt: j.UpdatedAt,
JobID: j.ID,
Status: j.Status,
Progress: &progress,
CurrentPhase: j.CurrentPhase,
Logs: append([]string(nil), j.Logs...),
Error: j.Error,
ActiveModules: append([]CollectModuleStatus(nil), j.ActiveModules...),
ModuleScores: append([]CollectModuleStatus(nil), j.ModuleScores...),
DebugInfo: cloneCollectDebugInfo(j.DebugInfo),
CreatedAt: j.CreatedAt,
UpdatedAt: j.UpdatedAt,
}
if j.ETASeconds > 0 {
eta := j.ETASeconds
resp.ETASeconds = &eta
}
return resp
}
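`toStatusResponse` copies `Logs`, `ActiveModules`, and `ModuleScores` with the `append([]T(nil), src...)` idiom so callers cannot mutate the job's internal slices through the returned snapshot. A minimal sketch of why the copy matters (the `copyLogs` helper is ours, for illustration):

```go
package main

import "fmt"

// copyLogs uses the same append([]T(nil), src...) idiom as toStatusResponse:
// it allocates a fresh backing array, detaching the result from the source.
func copyLogs(logs []string) []string {
	return append([]string(nil), logs...)
}

func main() {
	jobLogs := []string{"probe ok", "collect started"}
	snapshot := copyLogs(jobLogs)
	jobLogs[0] = "mutated" // later mutation of the job's slice...
	fmt.Println(snapshot[0]) // ...does not leak into the snapshot
}
```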
@@ -81,3 +133,16 @@ func (j *Job) toJobResponse(message string) CollectJobResponse {
CreatedAt: j.CreatedAt,
}
}
func cloneCollectDebugInfo(in *CollectDebugInfo) *CollectDebugInfo {
if in == nil {
return nil
}
out := *in
out.PhaseTelemetry = append([]CollectPhaseTelemetry(nil), in.PhaseTelemetry...)
if in.PrefetchEnabled != nil {
value := *in.PrefetchEnabled
out.PrefetchEnabled = &value
}
return &out
}
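`cloneCollectDebugInfo` re-allocates the telemetry slice and re-points `PrefetchEnabled` because the plain struct copy `out := *in` still shares the slice backing array and the pointer target. A small sketch with hypothetical stand-in types showing the aliasing the helper avoids:

```go
package main

import "fmt"

// debugInfo is a reduced stand-in for CollectDebugInfo, keeping only the
// two field kinds that need explicit copying: a pointer and a slice.
type debugInfo struct {
	PrefetchEnabled *bool
	PhaseTelemetry  []string
}

// clone mirrors cloneCollectDebugInfo: the shallow struct copy alone would
// share the pointer and the slice backing array, so both are re-allocated.
func clone(in *debugInfo) *debugInfo {
	if in == nil {
		return nil
	}
	out := *in
	out.PhaseTelemetry = append([]string(nil), in.PhaseTelemetry...)
	if in.PrefetchEnabled != nil {
		v := *in.PrefetchEnabled
		out.PrefetchEnabled = &v
	}
	return &out
}

func main() {
	enabled := true
	orig := &debugInfo{PrefetchEnabled: &enabled, PhaseTelemetry: []string{"discovery"}}
	cp := clone(orig)
	*orig.PrefetchEnabled = false    // mutate the original pointer target...
	orig.PhaseTelemetry[0] = "mutated" // ...and the original slice element
	fmt.Println(*cp.PrefetchEnabled, cp.PhaseTelemetry[0]) // clone is unaffected
}
```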

View File

@@ -81,7 +81,7 @@ func BuildHardwareDevices(hw *models.HardwareConfig) []models.HardwareDevice {
}
for _, mem := range hw.Memory {
if !mem.Present || mem.SizeMB == 0 {
if !mem.IsInstalledInventory() {
continue
}
present := mem.Present
@@ -243,6 +243,8 @@ func BuildHardwareDevices(hw *models.HardwareConfig) []models.HardwareDevice {
Source: "network_adapters",
Slot: nic.Slot,
Location: nic.Location,
BDF: nic.BDF,
DeviceClass: "NetworkController",
VendorID: nic.VendorID,
DeviceID: nic.DeviceID,
Model: nic.Model,
@@ -253,6 +255,11 @@ func BuildHardwareDevices(hw *models.HardwareConfig) []models.HardwareDevice {
PortCount: nic.PortCount,
PortType: nic.PortType,
MACAddresses: nic.MACAddresses,
LinkWidth: nic.LinkWidth,
LinkSpeed: nic.LinkSpeed,
MaxLinkWidth: nic.MaxLinkWidth,
MaxLinkSpeed: nic.MaxLinkSpeed,
NUMANode: nic.NUMANode,
Present: &present,
Status: nic.Status,
StatusCheckedAt: nic.StatusCheckedAt,

Some files were not shown because too many files have changed in this diff.