dell: strip MAC from model names; fix device-bound firmware in dell/inspur

- Dell NICView: strip " - XX:XX:XX:XX:XX:XX" suffix from ProductName (Dell TSR embeds MAC in this field for every NIC port)
- Dell SoftwareIdentity: same strip applied to ElementName; store FQDD in FirmwareInfo.Description so exporter can filter device-bound entries
- Exporter: add isDeviceBoundFirmwareFQDD() to filter firmware entries whose Description matches NIC./PSU./Disk./RAID.Backplane./GPU. FQDD prefixes (prevents device firmware from appearing in hardware.firmware)
- Exporter: extend isDeviceBoundFirmwareName() to filter HGX GPU/NVSwitch firmware inventory IDs (_fw_gpu_, _fw_nvswitch_, _inforom_gpu_)
- Inspur: remove HDD firmware from Hardware.Firmware; it is already present in Storage.Firmware, and duplicating it violates ADL-016
- bible-local/06-parsers.md: document firmware and MAC stripping rules
- bible-local/10-decisions.md: add ADL-016 (device-bound firmware) and ADL-017 (vendor-embedded MAC in model name fields)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -10,7 +10,7 @@ All registrations are collected in `internal/parser/vendors/vendors.go`:
```go
import (
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/supermicro"
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/dell"
	// etc.
)
```
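
Blank imports like these are the usual Go idiom for self-registering plugins: importing a package only for its side effects runs that package's `init` function. A minimal single-file sketch of the pattern follows. The `Parser` interface, `Register`, and `RegisteredIDs` are hypothetical names chosen for illustration; they are not the project's actual registry API, which this section does not show.

```go
package main

import (
	"sort"
	"sync"
)

// Parser is a hypothetical, minimal stand-in for the project's parser
// interface; only an ID is needed to illustrate registration.
type Parser interface {
	ID() string
}

var (
	registryMu sync.Mutex
	registry   = map[string]Parser{}
)

// Register is a hypothetical registry function that a vendor package would
// call from its init(). A duplicate ID is treated as a programming error.
func Register(p Parser) {
	registryMu.Lock()
	defer registryMu.Unlock()
	if _, dup := registry[p.ID()]; dup {
		panic("parser already registered: " + p.ID())
	}
	registry[p.ID()] = p
}

// RegisteredIDs returns the IDs of all registered parsers in sorted order.
func RegisteredIDs() []string {
	registryMu.Lock()
	defer registryMu.Unlock()
	ids := make([]string, 0, len(registry))
	for id := range registry {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// dellParser stands in for a vendor parser. In the real layout its init()
// would live in the vendor package, and the blank import is what runs it.
type dellParser struct{}

func (dellParser) ID() string { return "dell" }

func init() { Register(dellParser{}) }
```

Under this pattern, removing a blank import from `vendors.go` is what keeps a parser out of the registry, which matches how ADL-015 describes dropping `supermicro`.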
@@ -53,6 +53,42 @@ version produced a given result.

---

## Parser data quality rules

### FirmwareInfo — system-level only

`Hardware.Firmware` must contain **only system-level firmware**: BIOS, BMC/iDRAC,
Lifecycle Controller, CPLD, storage controllers, BOSS adapters.

**Device-bound firmware** (NIC, GPU, PSU, disk, backplane) **must NOT be added to
`Hardware.Firmware`**. It belongs to the device's own `Firmware` field and is already
present there. Duplicating it in `Hardware.Firmware` causes double entries in Reanimator.

The Reanimator exporter drops device-bound entries by checking `FirmwareInfo.DeviceName`
and the FQDD prefix stored in `FirmwareInfo.Description`. Parsers must cooperate:

- Store the device's FQDD (or equivalent slot identifier) in `FirmwareInfo.Description`
  for all firmware entries that come from a per-device inventory source (e.g. Dell
  `DCIM_SoftwareIdentity`).
- Device-bound FQDD prefixes: `NIC.`, `PSU.`, `Disk.`, `RAID.Backplane.`, `GPU.`
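
A minimal Go sketch of the FQDD check, assuming the prefixes above are matched case-sensitively against the trimmed `Description` value. The function name mirrors the exporter's `isDeviceBoundFirmwareFQDD()`, but this body is an illustration, not the actual implementation.

```go
package main

import "strings"

// deviceBoundFQDDPrefixes are the device-bound FQDD prefixes listed above.
var deviceBoundFQDDPrefixes = []string{"NIC.", "PSU.", "Disk.", "RAID.Backplane.", "GPU."}

// isDeviceBoundFirmwareFQDD reports whether desc (the FQDD a parser stored in
// FirmwareInfo.Description) belongs to a per-device component.
// Matching is case-sensitive, assuming FQDDs keep the vendor's casing.
func isDeviceBoundFirmwareFQDD(desc string) bool {
	desc = strings.TrimSpace(desc)
	for _, prefix := range deviceBoundFQDDPrefixes {
		if strings.HasPrefix(desc, prefix) {
			return true
		}
	}
	return false
}
```

With this prefix list, a storage controller FQDD such as `RAID.Integrated.1-1` is not matched (only `RAID.Backplane.` is listed), so controllers stay in `Hardware.Firmware` as the rule above requires.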

### NIC/device model names — strip embedded MAC addresses

Some vendors (confirmed: Dell TSR) embed the MAC address in the device model name field,
e.g. `ProductName = "NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"`.

**Rule:** Strip any ` - XX:XX:XX:XX:XX:XX` suffix from model/name strings before storing
them in `FirmwareInfo.DeviceName`, `NetworkAdapter.Model`, or any other model field.

Use `nicMACInModelRE` (defined in the Dell parser) or an equivalent regex:
```
\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$
```
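
A minimal Go sketch of applying that pattern. It re-declares `nicMACInModelRE` locally instead of importing the Dell parser's variable, and the helper name `stripEmbeddedMAC` is hypothetical. Trimming surrounding whitespace first is an added assumption so that a trailing space does not defeat the `$` anchor.

```go
package main

import (
	"regexp"
	"strings"
)

// nicMACInModelRE re-declares the pattern above for this sketch only.
var nicMACInModelRE = regexp.MustCompile(`\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`)

// stripEmbeddedMAC (hypothetical helper) removes a trailing
// " - XX:XX:XX:XX:XX:XX" suffix from a model or device-name string.
func stripEmbeddedMAC(s string) string {
	return nicMACInModelRE.ReplaceAllString(strings.TrimSpace(s), "")
}
```

Because the pattern requires whitespace on both sides of the dash and is anchored at the end of the string, hyphenated model names such as `ConnectX-6` are left untouched.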

This applies to **all** string fields used as device names or model identifiers.

---

## Vendor parsers

### Inspur / Kaytus (`inspur`)
@@ -86,29 +122,26 @@ inspur/

---

### Dell TSR (`dell`)

**Status:** Ready (v3.0). Tested on nested TSR archives with embedded `*.pl.zip`.

**Archive format:** `.zip` (outer archive + nested `*.pl.zip`)

**Primary source files:**
- `tsr/metadata.json`
- `tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml`
- `tsr/hardware/sysinfo/inventory/sysinfo_DCIM_SoftwareIdentity.xml`
- `tsr/hardware/sysinfo/inventory/sysinfo_CIM_Sensor.xml`
- `tsr/hardware/sysinfo/lcfiles/curr_lclog.xml`

**Extracted data:**
- Board/system identity and BIOS/iDRAC firmware
- CPU, memory, physical disks, virtual disks, PSU, NIC, PCIe
- GPU inventory (`DCIM_VideoView`) + GPU sensor enrichment (`DCIM_GPUSensor`)
- Controller/backplane inventory (`DCIM_ControllerView`, `DCIM_EnclosureView`)
- Sensor readings (temperature/voltage/current/power/fan/utilization)
- Lifecycle events (`curr_lclog.xml`)

---

### Supermicro (`supermicro`)

**Status:** Ready (v1.0.0). Tested on SYS-821GE-TNHR crash dumps.

**Archive format:** `.tgz` / `.tar.gz` / `.tar`

**Primary source file:** `CDump.txt` (JSON crashdump file)

**Confidence:** +80 when `CDump.txt` contains the `crash_data`, `METADATA`, and `bmc_fw_ver` markers.

**Extracted data:**
- CPUs: CPUID, core count, manufacturer (Intel), microcode version (as firmware field)
- FRU: BMC firmware version, BIOS version, ME firmware version, CPU PPIN
- Events: crashdump collection event + MCA errors

**MCA error detection:**
- Bit 63 (Valid), Bit 61 (UC, uncorrected), Bit 60 (EN, enabled)
- Corrected MCA errors → `Warning` severity
- Uncorrected MCA errors → `Critical` severity

**Known limitations:**
- TOR dump and extended MCA register data not yet parsed.
- No CPU model name (only CPUID hex code available in crashdump format).
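
The **MCA error detection** rules listed for the Supermicro parser can be sketched in Go as below. The bit positions and the Warning/Critical mapping come from that list; the `mcaDecoded` type and `decodeMCAStatus` function are hypothetical, and treating `EN` as informational (decoded but not used for severity) is an assumption, since the list does not say how the parser uses it.

```go
package main

// MCA status bits as listed under "MCA error detection".
const (
	mcaValidBit       = uint64(1) << 63 // VAL: register holds a valid error
	mcaUncorrectedBit = uint64(1) << 61 // UC: error was not corrected
	mcaEnabledBit     = uint64(1) << 60 // EN: error reporting was enabled
)

// mcaDecoded is a hypothetical result type for this sketch.
type mcaDecoded struct {
	Valid, Uncorrected, Enabled bool
	Severity                    string // "Warning", "Critical", or "" when not valid
}

// decodeMCAStatus maps an MCA status register value to a severity.
// Only VAL and UC decide severity; EN is decoded but, as an assumption of
// this sketch, does not change the result.
func decodeMCAStatus(status uint64) mcaDecoded {
	d := mcaDecoded{
		Valid:       status&mcaValidBit != 0,
		Uncorrected: status&mcaUncorrectedBit != 0,
		Enabled:     status&mcaEnabledBit != 0,
	}
	switch {
	case !d.Valid:
		// No valid logged error: nothing to report.
	case d.Uncorrected:
		d.Severity = "Critical"
	default:
		d.Severity = "Warning"
	}
	return d
}
```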

---

@@ -272,8 +305,8 @@ with content markers (e.g. `Unraid kernel build`, parity data markers).

| Vendor | ID | Status | Tested on |
|--------|----|--------|-----------|
| Dell TSR | `dell` | Ready | TSR nested zip archives |
| Inspur / Kaytus | `inspur` | Ready | KR4268X2 onekeylog |
| Supermicro | `supermicro` | Ready | SYS-821GE-TNHR crashdump |
| NVIDIA HGX Field Diag | `nvidia` | Ready | Various HGX servers |
| NVIDIA Bug Report | `nvidia_bug_report` | Ready | H100 systems |
| Unraid | `unraid` | Ready | Unraid diagnostics archives |

@@ -201,4 +201,56 @@ the server; frontend only renders status/flags returned by the API.

---

## ADL-015 — Supermicro crashdump archive parser removed from active registry

**Date:** 2026-03-01
**Context:** The Supermicro crashdump parser (`SMC Crash Dump Parser`) produced low-value
results for current workflows and was explicitly rejected as a supported archive path.
**Decision:** Remove the `supermicro` vendor parser from active registration and from the project source.
Do not include it in the `/api/parsers` output or in the parser documentation matrix.
**Consequences:**
- Supermicro crashdump archives (`CDump.txt` format) are no longer parsed by a dedicated vendor parser.
- Such archives fall back to other matching parsers (typically `generic`) unless a new replacement parser is added.
- Reintroduction requires a new parser package and an explicit registry import in `vendors/vendors.go`.

---

## ADL-016 — Device-bound firmware must not appear in hardware.firmware

**Date:** 2026-03-01
**Context:** Dell TSR `DCIM_SoftwareIdentity` lists firmware for every component (NICs,
PSUs, disks, backplanes) in addition to system-level firmware. Naively importing all entries
into `Hardware.Firmware` caused device firmware to appear twice in Reanimator: once in the
device's own record and again in the top-level firmware list.
**Decision:**
- `Hardware.Firmware` contains only system-level firmware (BIOS, BMC/iDRAC, CPLD,
  Lifecycle Controller, storage controllers, BOSS).
- Device-bound entries (NIC, PSU, Disk, Backplane, GPU) must not be added to
  `Hardware.Firmware`.
- Parsers must store the FQDD (or equivalent slot identifier) in `FirmwareInfo.Description`
  so the Reanimator exporter can filter by FQDD prefix.
- The exporter's `isDeviceBoundFirmwareFQDD()` function performs this filter.

**Consequences:**
- Any new parser that ingests a per-device firmware inventory must follow the same rule.
- Device firmware is accessible only via the device's own record, not the firmware list.
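
An exporter-side pass consistent with ADL-016 might look like the sketch below. `FirmwareInfo` here is a hypothetical stand-in (only `DeviceName` and `Description` are named in this document; `Version` is assumed). The name check covers only the HGX inventory ID fragments listed in the commit message (`_fw_gpu_`, `_fw_nvswitch_`, `_inforom_gpu_`), matched case-insensitively by assumption; the real `isDeviceBoundFirmwareName()` may check more patterns, so the helper is given a different, hypothetical name.

```go
package main

import "strings"

// FirmwareInfo is a hypothetical stand-in for the project's model.
type FirmwareInfo struct {
	DeviceName  string
	Version     string // assumed field
	Description string // FQDD or equivalent slot identifier, if known
}

var deviceBoundFQDDPrefixes = []string{"NIC.", "PSU.", "Disk.", "RAID.Backplane.", "GPU."}

// hgxDeviceFirmwareMarkers are the HGX inventory ID fragments from the commit message.
var hgxDeviceFirmwareMarkers = []string{"_fw_gpu_", "_fw_nvswitch_", "_inforom_gpu_"}

func hasAnyPrefix(s string, prefixes []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(s, p) {
			return true
		}
	}
	return false
}

// isHGXDeviceFirmwareName (hypothetical) reports whether a firmware name
// contains one of the HGX device-firmware markers, ignoring case.
func isHGXDeviceFirmwareName(name string) bool {
	lower := strings.ToLower(name)
	for _, m := range hgxDeviceFirmwareMarkers {
		if strings.Contains(lower, m) {
			return true
		}
	}
	return false
}

// systemFirmwareOnly keeps the entries allowed in hardware.firmware and drops
// device-bound ones, preserving input order.
func systemFirmwareOnly(in []FirmwareInfo) []FirmwareInfo {
	out := make([]FirmwareInfo, 0, len(in))
	for _, fw := range in {
		if isHGXDeviceFirmwareName(fw.DeviceName) || hasAnyPrefix(fw.Description, deviceBoundFQDDPrefixes) {
			continue
		}
		out = append(out, fw)
	}
	return out
}
```

Filtering at export time is a safety net; ADL-016 still requires parsers not to add device-bound entries in the first place.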

---

## ADL-017 — Vendor-embedded MAC addresses must be stripped from model name fields

**Date:** 2026-03-01
**Context:** Dell TSR embeds MAC addresses directly in the `ProductName` and `ElementName`
fields (e.g. `"NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"`).
As a result, MAC addresses leaked into the NIC model, the NIC firmware device name,
and potentially other fields.
**Decision:** Strip any ` - XX:XX:XX:XX:XX:XX` suffix from all model/name string fields
at parse time, before storing them in any model struct. Use the regex
`\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`.
**Consequences:**
- Model names are clean and consistent across all devices.
- All parsers must apply this stripping to any field used as a device name or model.
- Confirmed affected fields in Dell: `DCIM_NICView.ProductName`, `DCIM_SoftwareIdentity.ElementName`.

---

<!-- Add new decisions below this line using the format above -->