docs: introduce project Bible and consolidate all architecture documentation
- Create docs/bible/ with 10 structured chapters (overview, architecture, API, data models, collectors, parsers, exporters, build, testing, decisions) - All documentation in English per ADL-007 - Record all existing architectural decisions in docs/bible/10-decisions.md - Slim README.md to user-facing quick start only - Replace CLAUDE.md with a single directive to read and follow the Bible - Remove absorbed files: REANIMATOR_EXPORT.md, docs/INTEGRATION_GUIDE.md, and all vendor parser README.md files Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
224
docs/bible/06-parsers.md
Normal file
224
docs/bible/06-parsers.md
Normal file
@@ -0,0 +1,224 @@
|
||||
# 06 — Parsers
|
||||
|
||||
## Framework
|
||||
|
||||
### Registration
|
||||
|
||||
Each vendor parser registers itself via Go's `init()` side-effect import pattern.
|
||||
|
||||
All registrations are collected in `internal/parser/vendors/vendors.go`:
|
||||
```go
|
||||
import (
|
||||
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
|
||||
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/supermicro"
|
||||
// etc.
|
||||
)
|
||||
```
|
||||
|
||||
### VendorParser interface
|
||||
|
||||
```go
|
||||
type VendorParser interface {
|
||||
Name() string // human-readable name
|
||||
Vendor() string // vendor identifier string
|
||||
Detect(files []ExtractedFile) int // confidence 0–100
|
||||
Parse(files []ExtractedFile) (*models.AnalysisResult, error)
|
||||
}
|
||||
```
|
||||
|
||||
### Selection logic
|
||||
|
||||
All registered parsers run `Detect()` against the uploaded archive's file list.
|
||||
The parser with the **highest confidence score** is selected.
|
||||
Multiple parsers may return >0; only the top scorer is used.
|
||||
|
||||
### Adding a new vendor parser
|
||||
|
||||
1. `mkdir -p internal/parser/vendors/VENDORNAME`
|
||||
2. Copy `internal/parser/vendors/template/parser.go.template` as starting point.
|
||||
3. Implement `Detect()` and `Parse()`.
|
||||
4. Add blank import to `vendors/vendors.go`.
|
||||
|
||||
`Detect()` tips:
|
||||
- Look for unique filenames or directory names.
|
||||
- Check file content for vendor-specific markers.
|
||||
- Return 70+ only when confident; return 0 if clearly not a match.
|
||||
|
||||
### Parser versioning
|
||||
|
||||
Each parser file contains a `parserVersion` constant.
|
||||
Increment the version whenever parsing logic changes — this helps trace which
|
||||
version produced a given result.
|
||||
|
||||
---
|
||||
|
||||
## Vendor parsers
|
||||
|
||||
### Inspur / Kaytus (`inspur`)
|
||||
|
||||
**Status:** Ready. Tested on KR4268X2 (onekeylog format).
|
||||
|
||||
**Archive format:** `.tar.gz` onekeylog
|
||||
|
||||
**Primary source files:**
|
||||
|
||||
| File | Content |
|
||||
|------|---------|
|
||||
| `asset.json` | Base hardware inventory |
|
||||
| `component.log` | Component list |
|
||||
| `devicefrusdr.log` | FRU and SDR data |
|
||||
| `onekeylog/runningdata/redis-dump.rdb` | Runtime enrichment (optional) |
|
||||
|
||||
**Redis RDB enrichment** (applied conservatively — fills missing fields only):
|
||||
- GPU: `serial_number`, `firmware` (VBIOS/FW), runtime telemetry
|
||||
- NIC: firmware, serial, part number (when text logs leave fields empty)
|
||||
|
||||
**Module structure:**
|
||||
```
|
||||
inspur/
|
||||
parser.go — main parser + registration
|
||||
sdr.go — sensor/SDR parsing
|
||||
fru.go — FRU serial parsing
|
||||
asset.go — asset.json parsing
|
||||
syslog.go — syslog parsing
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Supermicro (`supermicro`)
|
||||
|
||||
**Status:** Ready (v1.0.0). Tested on SYS-821GE-TNHR crash dumps.
|
||||
|
||||
**Archive format:** `.tgz` / `.tar.gz` / `.tar`
|
||||
|
||||
**Primary source file:** `CDump.txt` — JSON crashdump file
|
||||
|
||||
**Confidence:** +80 when `CDump.txt` contains `crash_data`, `METADATA`, `bmc_fw_ver` markers.
|
||||
|
||||
**Extracted data:**
|
||||
- CPUs: CPUID, core count, manufacturer (Intel), microcode version (as firmware field)
|
||||
- FRU: BMC firmware version, BIOS version, ME firmware version, CPU PPIN
|
||||
- Events: crashdump collection event + MCA errors
|
||||
|
||||
**MCA error detection:**
|
||||
- Bit 63 (Valid), Bit 61 (UC — uncorrected), Bit 60 (EN — enabled)
|
||||
- Corrected MCA errors → `Warning` severity
|
||||
- Uncorrected MCA errors → `Critical` severity
|
||||
|
||||
**Known limitations:**
|
||||
- TOR dump and extended MCA register data not yet parsed.
|
||||
- No CPU model name (only CPUID hex code available in crashdump format).
|
||||
|
||||
---
|
||||
|
||||
### NVIDIA HGX Field Diagnostics (`nvidia`)
|
||||
|
||||
**Status:** Ready (v1.1.0). Works with any server vendor.
|
||||
|
||||
**Archive format:** `.tar` / `.tar.gz`
|
||||
|
||||
**Confidence scoring:**
|
||||
|
||||
| File | Score |
|
||||
|------|-------|
|
||||
| `unified_summary.json` with "HGX Field Diag" marker | +40 |
|
||||
| `summary.json` | +20 |
|
||||
| `summary.csv` | +15 |
|
||||
| `gpu_fieldiag/` directory | +15 |
|
||||
|
||||
**Source files:**
|
||||
|
||||
| File | Content |
|
||||
|------|---------|
|
||||
| `output.log` | dmidecode — server manufacturer, model, serial number |
|
||||
| `unified_summary.json` | GPU details, NVSwitch devices, PCI addresses |
|
||||
| `summary.json` | Diagnostic test results and error codes |
|
||||
| `summary.csv` | Alternative test results format |
|
||||
|
||||
**Extracted data:**
|
||||
- GPUs: slot, model, manufacturer, firmware (VBIOS), BDF
|
||||
- NVSwitch devices: slot, device_class, vendor_id, device_id, BDF, link speed/width
|
||||
- Events: diagnostic test failures (connectivity, gpumem, gpustress, pcie, nvlink, nvswitch, power)
|
||||
|
||||
**Severity mapping:**
|
||||
- `info` — tests passed
|
||||
- `warning` — e.g. "Row remapping failed"
|
||||
- `critical` — error codes 300+
|
||||
|
||||
**Known limitations:**
|
||||
- Detailed logs in `gpu_fieldiag/*.log` are not parsed.
|
||||
- No CPU, memory, or storage extraction (not present in field diag archives).
|
||||
|
||||
---
|
||||
|
||||
### NVIDIA Bug Report (`nvidia_bug_report`)
|
||||
|
||||
**Status:** Ready (v1.0.0).
|
||||
|
||||
**File format:** `nvidia-bug-report-*.log.gz` (gzip-compressed text)
|
||||
|
||||
**Confidence:** 85 (high priority for matching filename pattern)
|
||||
|
||||
**Source sections parsed:**
|
||||
|
||||
| dmidecode section | Extracts |
|
||||
|-------------------|---------|
|
||||
| System Information | server serial, UUID, manufacturer, product name |
|
||||
| Processor Information | CPU model, serial, core/thread count, frequency |
|
||||
| Memory Device | DIMM slot, size, type, manufacturer, serial, part number, speed |
|
||||
| System Power Supply | PSU location, manufacturer, model, serial, wattage, firmware, status |
|
||||
|
||||
| Other source | Extracts |
|
||||
|--------------|---------|
|
||||
| `lspci -vvv` (Ethernet/Network/IB) | NIC model (from VPD), BDF, slot, P/N, S/N, port count, port type |
|
||||
| `/proc/driver/nvidia/gpus/*/information` | GPU model, BDF, UUID, VBIOS version, IRQ |
|
||||
| NVRM version line | NVIDIA driver version |
|
||||
|
||||
**Known limitations:**
|
||||
- Driver error/warning log lines not yet extracted.
|
||||
- GPU temperature/utilization metrics require additional parsing sections.
|
||||
|
||||
---
|
||||
|
||||
### XigmaNAS (`xigmanas`)
|
||||
|
||||
**Status:** Ready.
|
||||
|
||||
**Archive format:** Plain log files (FreeBSD-based NAS system)
|
||||
|
||||
**Detection:** Files named `xigmanas`, `system`, or `dmesg`; content containing "XigmaNAS" or "FreeBSD"; SMART data presence.
|
||||
|
||||
**Extracted data:**
|
||||
- System: firmware version, uptime, CPU model, memory configuration, hardware platform
|
||||
- Storage: disk models, serial numbers, capacity, health, SMART temperatures
|
||||
- Populates: `Hardware.Firmware`, `Hardware.CPUs`, `Hardware.Memory`, `Hardware.Storage`, `Sensors`
|
||||
|
||||
---
|
||||
|
||||
### Generic text fallback (`generic`)
|
||||
|
||||
**Status:** Ready (v1.0.0).
|
||||
|
||||
**Confidence:** 15 (lowest — only matches if no other parser scores higher)
|
||||
|
||||
**Purpose:** Fallback for any text file or single `.gz` file not matching a specific vendor.
|
||||
|
||||
**Behavior:**
|
||||
- If filename matches `nvidia-bug-report-*.log.gz`: extracts driver version and GPU list.
|
||||
- Otherwise: confirms file is text (not binary) and records a basic "Text File" event.
|
||||
|
||||
---
|
||||
|
||||
## Supported vendor matrix
|
||||
|
||||
| Vendor | ID | Status | Tested on |
|
||||
|--------|----|--------|-----------|
|
||||
| Inspur / Kaytus | `inspur` | Ready | KR4268X2 onekeylog |
|
||||
| Supermicro | `supermicro` | Ready | SYS-821GE-TNHR crashdump |
|
||||
| NVIDIA HGX Field Diag | `nvidia` | Ready | Various HGX servers |
|
||||
| NVIDIA Bug Report | `nvidia_bug_report` | Ready | H100 systems |
|
||||
| XigmaNAS | `xigmanas` | Ready | FreeBSD NAS logs |
|
||||
| Generic fallback | `generic` | Ready | Any text file |
|
||||
| Dell iDRAC | `dell` | Planned | — |
|
||||
| HPE iLO | `hpe` | Planned | — |
|
||||
| Lenovo XCC | `lenovo` | Planned | — |
|
||||
Reference in New Issue
Block a user