06 — Parsers
Framework
Registration
Each vendor parser registers itself via Go's init() side-effect import pattern.
All registrations are collected in internal/parser/vendors/vendors.go:
import (
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/dell"
// etc.
)
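A minimal, single-file sketch of the vendor side of this pattern. The `Register` function, the `registry` map, and the trimmed-down `Parser` interface are hypothetical stand-ins; the actual registration API in internal/parser is not shown in this document.

```go
package main

import (
	"fmt"
	"sort"
)

// Parser is a trimmed-down, hypothetical stand-in for VendorParser.
type Parser interface {
	Name() string
	Vendor() string
}

// registry collects every parser registered during package initialization.
var registry = map[string]Parser{}

// Register is a hypothetical registration hook. It panics on duplicate
// vendor IDs so that a copy-paste mistake surfaces at startup.
func Register(p Parser) {
	if _, dup := registry[p.Vendor()]; dup {
		panic("parser registered twice: " + p.Vendor())
	}
	registry[p.Vendor()] = p
}

type dellParser struct{}

func (dellParser) Name() string   { return "Dell TSR" }
func (dellParser) Vendor() string { return "dell" }

// In a real vendor package, this init() would run as a side effect of the
// blank import in vendors.go.
func init() { Register(dellParser{}) }

// RegisteredVendors returns the registered vendor IDs in sorted order.
func RegisteredVendors() []string {
	ids := make([]string, 0, len(registry))
	for id := range registry {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	fmt.Println(RegisteredVendors())
}
```

Because Go initializes package-level variables before running `init()`, the registry map is ready by the time each vendor package registers itself.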
VendorParser interface
type VendorParser interface {
Name() string // human-readable name
Vendor() string // vendor identifier string
Version() string // parser version (increment on logic changes)
Detect(files []ExtractedFile) int // confidence 0–100
Parse(files []ExtractedFile) (*models.AnalysisResult, error)
}
Selection logic
All registered parsers run Detect() against the uploaded archive's file list.
The parser with the highest confidence score is selected.
Multiple parsers may return >0; only the top scorer is used.
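The selection step can be sketched roughly as follows. `ExtractedFile` is reduced to a single assumed field, `fixedDetector` is a hypothetical test double, and the tie-breaking rule (first parser in the slice wins) is an assumption, since this document does not specify one.

```go
package main

import "fmt"

// ExtractedFile is a simplified stand-in; only Path is modeled here.
type ExtractedFile struct {
	Path string
}

// Detector is the subset of VendorParser that selection needs.
type Detector interface {
	Vendor() string
	Detect(files []ExtractedFile) int
}

// fixedDetector is a hypothetical double that always reports the same score.
type fixedDetector struct {
	vendor string
	score  int
}

func (d fixedDetector) Vendor() string               { return d.vendor }
func (d fixedDetector) Detect(_ []ExtractedFile) int { return d.score }

// SelectParser runs Detect on every parser and returns the top scorer.
// It returns nil when no parser scores above zero. On a tie, the parser
// that appears first in the slice is kept (assumption).
func SelectParser(parsers []Detector, files []ExtractedFile) Detector {
	var best Detector
	bestScore := 0
	for _, p := range parsers {
		if s := p.Detect(files); s > bestScore {
			best, bestScore = p, s
		}
	}
	return best
}

func main() {
	parsers := []Detector{
		fixedDetector{"generic", 15},
		fixedDetector{"nvidia_bug_report", 85},
		fixedDetector{"dell", 0},
	}
	if p := SelectParser(parsers, nil); p != nil {
		fmt.Println("selected:", p.Vendor())
	}
}
```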
Adding a new vendor parser
- mkdir -p internal/parser/vendors/VENDORNAME
- Copy internal/parser/vendors/template/parser.go.template as starting point.
- Implement Detect() and Parse().
- Add blank import to vendors/vendors.go.
Detect() tips:
- Look for unique filenames or directory names.
- Check file content for vendor-specific markers.
- Return 70+ only when confident; return 0 if clearly not a match.
Parser versioning
Each parser file contains a parserVersion constant.
Increment the version whenever parsing logic changes — this helps trace which
version produced a given result.
Parser data quality rules
FirmwareInfo — system-level only
Hardware.Firmware must contain only system-level firmware: BIOS, BMC/iDRAC,
Lifecycle Controller, CPLD, storage controllers, BOSS adapters.
Device-bound firmware (NIC, GPU, PSU, disk, backplane) must NOT be added to
Hardware.Firmware. It belongs to the device's own Firmware field and is already
present there. Duplicating it in Hardware.Firmware causes double entries in Reanimator.
The Reanimator exporter filters by FirmwareInfo.DeviceName prefix and by
FirmwareInfo.Description (FQDD prefix). Parsers must cooperate:
- Store the device's FQDD (or equivalent slot identifier) in FirmwareInfo.Description for all firmware entries that come from a per-device inventory source (e.g. Dell DCIM_SoftwareIdentity).
- FQDD prefixes that are device-bound: NIC., PSU., Disk., RAID.Backplane., GPU.
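A prefix check over the list above might look like the sketch below. The function name `IsDeviceBoundFQDD` is hypothetical, and the sample FQDD strings are illustrative rather than taken from a specific archive; the real exporter logic is not shown in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// deviceBoundPrefixes mirrors the FQDD prefixes listed in this section.
var deviceBoundPrefixes = []string{"NIC.", "PSU.", "Disk.", "RAID.Backplane.", "GPU."}

// IsDeviceBoundFQDD reports whether a firmware entry's FQDD (as stored in
// FirmwareInfo.Description) marks it as device-bound firmware.
func IsDeviceBoundFQDD(fqdd string) bool {
	fqdd = strings.TrimSpace(fqdd)
	for _, prefix := range deviceBoundPrefixes {
		if strings.HasPrefix(fqdd, prefix) {
			return true
		}
	}
	return false
}

func main() {
	for _, fqdd := range []string{"NIC.Integrated.1-1-1", "BIOS.Setup.1-1", "RAID.SL.3-1"} {
		fmt.Printf("%-22s device-bound=%v\n", fqdd, IsDeviceBoundFQDD(fqdd))
	}
}
```

Note that only `RAID.Backplane.` is on the list, so a storage controller FQDD that starts with `RAID.` alone stays system-level, consistent with the rule that storage controllers belong in Hardware.Firmware.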
NIC/device model names — strip embedded MAC addresses
Some vendors (confirmed: Dell TSR) embed the MAC address in the device model name field,
e.g. ProductName = "NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08".
Rule: Strip any - XX:XX:XX:XX:XX:XX suffix from model/name strings before storing
them in FirmwareInfo.DeviceName, NetworkAdapter.Model, or any other model field.
Use nicMACInModelRE (defined in the Dell parser) or an equivalent regex:
\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$
This applies to all string fields used as device names or model identifiers.
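As a sketch, the regex above can be applied with Go's standard regexp package. The names `macSuffixRE` and `StripMACSuffix` are hypothetical and chosen to avoid implying this is the Dell parser's own nicMACInModelRE helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// macSuffixRE matches a trailing " - XX:XX:XX:XX:XX:XX" MAC address suffix.
var macSuffixRE = regexp.MustCompile(`\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`)

// StripMACSuffix removes an embedded MAC suffix from a model or name string.
// Strings without such a suffix are returned unchanged.
func StripMACSuffix(model string) string {
	return macSuffixRE.ReplaceAllString(model, "")
}

func main() {
	in := "NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"
	fmt.Println(StripMACSuffix(in))
}
```

The hyphen inside "ConnectX-6" is left alone because the pattern requires whitespace on both sides of the separator and a full MAC address anchored at the end of the string.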
PCI device name enrichment via pci.ids
If a PCIe device, GPU, NIC, or any hardware component has a vendor_id + device_id
but its model/name field is empty or generic (e.g. blank, equals the description,
or is just a raw hex ID), the parser must attempt to resolve the human-readable
model name from the embedded pci.ids database before storing the result.
Rule: When Model (or equivalent name field) is empty and both VendorID and
DeviceID are non-zero, call the pciids lookup and use the result as the model name.
// Example pattern — use in any parser that handles PCIe/GPU/NIC devices:
if strings.TrimSpace(device.Model) == "" && device.VendorID != 0 && device.DeviceID != 0 {
if name := pciids.Lookup(device.VendorID, device.DeviceID); name != "" {
device.Model = name
}
}
This rule applies to all vendor parsers. The pciids package is available at
internal/parser/vendors/pciids. See ADL-005 for the rationale.
Do not hardcode model name strings. If a device is unknown today, it will be
resolved automatically once pci.ids is updated.
Vendor parsers
Inspur / Kaytus (inspur)
Status: Ready. Tested on KR4268X2 (onekeylog format).
Archive format: .tar.gz onekeylog
Primary source files:
| File | Content |
|---|---|
| asset.json | Base hardware inventory |
| component.log | Component list |
| devicefrusdr.log | FRU and SDR data |
| onekeylog/runningdata/redis-dump.rdb | Runtime enrichment (optional) |
Redis RDB enrichment (applied conservatively — fills missing fields only):
- GPU: serial_number, firmware (VBIOS/FW), runtime telemetry
- NIC: firmware, serial, part number (when text logs leave fields empty)
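"Fills missing fields only" can be sketched as a merge that never overwrites a value already obtained from the text logs. The `GPU` struct and both function names are hypothetical simplifications, not the parser's actual model types.

```go
package main

import (
	"fmt"
	"strings"
)

// GPU is a simplified, hypothetical record with only two fields.
type GPU struct {
	Serial   string
	Firmware string
}

// fillIfEmpty copies src into dst only when dst is blank and src is not.
func fillIfEmpty(dst *string, src string) {
	if strings.TrimSpace(*dst) == "" && strings.TrimSpace(src) != "" {
		*dst = src
	}
}

// EnrichGPU applies Redis-derived values conservatively: existing fields
// from the primary sources always win.
func EnrichGPU(base *GPU, fromRedis GPU) {
	fillIfEmpty(&base.Serial, fromRedis.Serial)
	fillIfEmpty(&base.Firmware, fromRedis.Firmware)
}

func main() {
	gpu := GPU{Serial: "SN-FROM-LOGS"}
	EnrichGPU(&gpu, GPU{Serial: "SN-FROM-REDIS", Firmware: "96.00.74.00.01"})
	fmt.Printf("%+v\n", gpu)
}
```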
Module structure:
inspur/
parser.go — main parser + registration
sdr.go — sensor/SDR parsing
fru.go — FRU serial parsing
asset.go — asset.json parsing
syslog.go — syslog parsing
Dell TSR (dell)
Status: Ready (v3.0). Tested on nested TSR archives with embedded *.pl.zip.
Archive format: .zip (outer archive + nested *.pl.zip)
Primary source files:
- tsr/metadata.json
- tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml
- tsr/hardware/sysinfo/inventory/sysinfo_DCIM_SoftwareIdentity.xml
- tsr/hardware/sysinfo/inventory/sysinfo_CIM_Sensor.xml
- tsr/hardware/sysinfo/lcfiles/curr_lclog.xml
Extracted data:
- Board/system identity and BIOS/iDRAC firmware
- CPU, memory, physical disks, virtual disks, PSU, NIC, PCIe
- GPU inventory (DCIM_VideoView) + GPU sensor enrichment (DCIM_GPUSensor)
- Controller/backplane inventory (DCIM_ControllerView, DCIM_EnclosureView)
- Sensor readings (temperature/voltage/current/power/fan/utilization)
- Lifecycle events (curr_lclog.xml)
NVIDIA HGX Field Diagnostics (nvidia)
Status: Ready (v1.1.0). Works with any server vendor.
Archive format: .tar / .tar.gz
Confidence scoring:
| File | Score |
|---|---|
| unified_summary.json with "HGX Field Diag" marker | +40 |
| summary.json | +20 |
| summary.csv | +15 |
| gpu_fieldiag/ directory | +15 |
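One way to read the table is as an additive score in which each signal counts once. The sketch below follows that reading; it is an assumption, as are the `Path` and `Content` fields on `ExtractedFile` and the function name `DetectHGX`.

```go
package main

import (
	"bytes"
	"fmt"
	"path"
	"strings"
)

// ExtractedFile is a simplified stand-in with assumed fields.
type ExtractedFile struct {
	Path    string
	Content []byte
}

// DetectHGX sums the per-signal scores from the table. Each signal is
// counted at most once, so the maximum is 40+20+15+15 = 90.
func DetectHGX(files []ExtractedFile) int {
	var unified, summaryJSON, summaryCSV, fieldiagDir bool
	for _, f := range files {
		switch path.Base(f.Path) {
		case "unified_summary.json":
			// The file name alone is not enough; the marker must be present.
			if bytes.Contains(f.Content, []byte("HGX Field Diag")) {
				unified = true
			}
		case "summary.json":
			summaryJSON = true
		case "summary.csv":
			summaryCSV = true
		}
		// Prefixing "/" lets a top-level gpu_fieldiag/ directory match too.
		if strings.Contains("/"+f.Path, "/gpu_fieldiag/") {
			fieldiagDir = true
		}
	}
	score := 0
	if unified {
		score += 40
	}
	if summaryJSON {
		score += 20
	}
	if summaryCSV {
		score += 15
	}
	if fieldiagDir {
		score += 15
	}
	return score
}

func main() {
	files := []ExtractedFile{
		{Path: "unified_summary.json", Content: []byte(`{"tool":"HGX Field Diag"}`)},
		{Path: "summary.json"},
	}
	fmt.Println(DetectHGX(files))
}
```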
Source files:
| File | Content |
|---|---|
| output.log | dmidecode: server manufacturer, model, serial number |
| unified_summary.json | GPU details, NVSwitch devices, PCI addresses |
| summary.json | Diagnostic test results and error codes |
| summary.csv | Alternative test results format |
Extracted data:
- GPUs: slot, model, manufacturer, firmware (VBIOS), BDF
- NVSwitch devices: slot, device_class, vendor_id, device_id, BDF, link speed/width
- Events: diagnostic test failures (connectivity, gpumem, gpustress, pcie, nvlink, nvswitch, power)
Severity mapping:
- info: tests passed
- warning: e.g. "Row remapping failed"
- critical: error codes 300+
Known limitations:
- Detailed logs in gpu_fieldiag/*.log are not parsed.
- No CPU, memory, or storage extraction (not present in field diag archives).
NVIDIA Bug Report (nvidia_bug_report)
Status: Ready (v1.0.0).
File format: nvidia-bug-report-*.log.gz (gzip-compressed text)
Confidence: 85 (high priority for matching filename pattern)
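A filename-based detector of this kind could be sketched with `path.Match` from the standard library. The function name `DetectBugReport` and the choice to look only at the base name are assumptions; the parser's actual detection code is not shown here.

```go
package main

import (
	"fmt"
	"path"
)

// bugReportConfidence is the score stated in this section.
const bugReportConfidence = 85

// DetectBugReport returns 85 when the file's base name matches the
// nvidia-bug-report-*.log.gz pattern, and 0 otherwise.
func DetectBugReport(filePath string) int {
	ok, err := path.Match("nvidia-bug-report-*.log.gz", path.Base(filePath))
	if err != nil || !ok {
		return 0
	}
	return bugReportConfidence
}

func main() {
	fmt.Println(DetectBugReport("uploads/nvidia-bug-report-host01.log.gz"))
	fmt.Println(DetectBugReport("dmesg.txt"))
}
```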
Source sections parsed:
| dmidecode section | Extracts |
|---|---|
| System Information | server serial, UUID, manufacturer, product name |
| Processor Information | CPU model, serial, core/thread count, frequency |
| Memory Device | DIMM slot, size, type, manufacturer, serial, part number, speed |
| System Power Supply | PSU location, manufacturer, model, serial, wattage, firmware, status |
| Other source | Extracts |
|---|---|
| lspci -vvv (Ethernet/Network/IB) | NIC model (from VPD), BDF, slot, P/N, S/N, port count, port type |
| /proc/driver/nvidia/gpus/*/information | GPU model, BDF, UUID, VBIOS version, IRQ |
| NVRM version line | NVIDIA driver version |
Known limitations:
- Driver error/warning log lines not yet extracted.
- GPU temperature/utilization metrics require additional parsing sections.
XigmaNAS (xigmanas)
Status: Ready.
Archive format: Plain log files (FreeBSD-based NAS system)
Detection: Files named xigmanas, system, or dmesg; content containing "XigmaNAS" or "FreeBSD"; SMART data presence.
Extracted data:
- System: firmware version, uptime, CPU model, memory configuration, hardware platform
- Storage: disk models, serial numbers, capacity, health, SMART temperatures
- Populates: Hardware.Firmware, Hardware.CPUs, Hardware.Memory, Hardware.Storage, Sensors
Unraid (unraid)
Status: Ready (v1.0.0).
Archive format: Unraid diagnostics archive contents (text-heavy diagnostics directories).
Detection: Combines filename/path markers (diagnostics-*, unraid-*.txt, vars.txt)
with content markers (e.g. Unraid kernel build, parity data markers).
Extracted data (current):
- Board / BIOS metadata (from motherboard/system files)
- CPU summary (from lscpu.txt)
- Memory modules (from diagnostics memory file)
- Storage devices (from vars.txt + SMART files)
- Syslog events
H3C SDS G5 (h3c_g5)
Status: Ready (v1.0.0). Tested on H3C UniServer R4900 G5 SDS archives.
Archive format: .sds (tar archive)
Detection: hardware_info.ini, hardware.info, firmware_version.ini, user/test*.csv, plus H3C markers.
Extracted data (current):
- Board/FRU inventory (FRUInfo.ini, board_info.ini)
- Firmware list (firmware_version.ini)
- CPU inventory (hardware_info.ini)
- Memory DIMM inventory (hardware_info.ini)
- Storage inventory (hardware.info, storage_disk.ini, NVMe_info.txt, RAID text enrichments)
- Logical RAID volumes (raid.json, Storage_RAID-*.txt)
- Sensor snapshot (sensor_info.ini)
- SEL events (user/test.csv, user/test1.csv, fallback Sel.json/sel_list.txt)
H3C SDS G6 (h3c_g6)
Status: Ready (v1.0.0). Tested on H3C UniServer R4700 G6 SDS archives.
Archive format: .sds (tar archive)
Detection: CPUDetailInfo.xml, MemoryDetailInfo.xml, firmware_version.json, Sel.json, plus H3C markers.
Extracted data (current):
- Board/FRU inventory (FRUInfo.ini, board_info.ini)
- Firmware list (firmware_version.json)
- CPU inventory (CPUDetailInfo.xml)
- Memory DIMM inventory (MemoryDetailInfo.xml)
- Storage inventory + capacity/model/interface (storage_disk.ini, Storage_RAID-*.txt, NVMe_info.txt)
- Logical RAID volumes (raid.json, fallback from Storage_RAID-*.txt when available)
- Sensor snapshot (sensor_info.ini)
- SEL events (user/Sel.json, fallback user/sel_list.txt)
Generic text fallback (generic)
Status: Ready (v1.0.0).
Confidence: 15 (lowest — only matches if no other parser scores higher)
Purpose: Fallback for any text file or single .gz file not matching a specific vendor.
Behavior:
- If filename matches nvidia-bug-report-*.log.gz: extracts driver version and GPU list.
- Otherwise: confirms file is text (not binary) and records a basic "Text File" event.
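The text-versus-binary check is not specified in this document. One common heuristic, shown here purely as an assumption, is to sample the start of the file, reject it on any NUL byte, and tolerate only a small share of other control bytes. The name `LooksLikeText`, the 8 KiB sample size, and the 5% threshold are all hypothetical.

```go
package main

import "fmt"

// LooksLikeText applies a simple heuristic to the first 8 KiB of data:
// any NUL byte means binary, and at most 5% of the sampled bytes may be
// control characters other than tab, newline, and carriage return.
// Empty input is treated as text.
func LooksLikeText(data []byte) bool {
	sample := data
	if len(sample) > 8192 {
		sample = sample[:8192]
	}
	control := 0
	for _, b := range sample {
		switch {
		case b == 0:
			return false
		case b < 0x20 && b != '\t' && b != '\n' && b != '\r':
			control++
		}
	}
	return control*20 <= len(sample)
}

func main() {
	fmt.Println(LooksLikeText([]byte("kernel: eth0 link up\n")))
	fmt.Println(LooksLikeText([]byte{0x1f, 0x8b, 0x08, 0x00}))
}
```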
Supported vendor matrix
| Vendor | ID | Status | Tested on |
|---|---|---|---|
| Dell TSR | dell | Ready | TSR nested zip archives |
| Inspur / Kaytus | inspur | Ready | KR4268X2 onekeylog |
| NVIDIA HGX Field Diag | nvidia | Ready | Various HGX servers |
| NVIDIA Bug Report | nvidia_bug_report | Ready | H100 systems |
| Unraid | unraid | Ready | Unraid diagnostics archives |
| XigmaNAS | xigmanas | Ready | FreeBSD NAS logs |
| H3C SDS G5 | h3c_g5 | Ready | H3C UniServer R4900 G5 SDS archives |
| H3C SDS G6 | h3c_g6 | Ready | H3C UniServer R4700 G6 SDS archives |
| Generic fallback | generic | Ready | Any text file |