docs: introduce project Bible and consolidate all architecture documentation

- Create docs/bible/ with 10 structured chapters (overview, architecture,
  API, data models, collectors, parsers, exporters, build, testing, decisions)
- All documentation in English per ADL-007
- Record all existing architectural decisions in docs/bible/10-decisions.md
- Slim README.md to user-facing quick start only
- Replace CLAUDE.md with a single directive to read and follow the Bible
- Remove absorbed files: REANIMATOR_EXPORT.md, docs/INTEGRATION_GUIDE.md,
  and all vendor parser README.md files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Mikhail Chusavitin
2026-02-20 14:15:35 +03:00
parent 82ee513835
commit fcd57c1ba9
21 changed files with 1289 additions and 2391 deletions

224
docs/bible/06-parsers.md Normal file
View File

@@ -0,0 +1,224 @@
# 06 — Parsers
## Framework
### Registration
Each vendor parser registers itself via Go's `init()` side-effect import pattern.
All registrations are collected in `internal/parser/vendors/vendors.go`:
```go
import (
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/supermicro"
// etc.
)
```
### VendorParser interface
```go
type VendorParser interface {
Name() string // human-readable name
Vendor() string // vendor identifier string
Detect(files []ExtractedFile) int // confidence 0100
Parse(files []ExtractedFile) (*models.AnalysisResult, error)
}
```
### Selection logic
All registered parsers run `Detect()` against the uploaded archive's file list.
The parser with the **highest confidence score** is selected.
Multiple parsers may return >0; only the top scorer is used.
### Adding a new vendor parser
1. `mkdir -p internal/parser/vendors/VENDORNAME`
2. Copy `internal/parser/vendors/template/parser.go.template` as starting point.
3. Implement `Detect()` and `Parse()`.
4. Add blank import to `vendors/vendors.go`.
`Detect()` tips:
- Look for unique filenames or directory names.
- Check file content for vendor-specific markers.
- Return 70+ only when confident; return 0 if clearly not a match.
### Parser versioning
Each parser file contains a `parserVersion` constant.
Increment the version whenever parsing logic changes — this helps trace which
version produced a given result.
---
## Vendor parsers
### Inspur / Kaytus (`inspur`)
**Status:** Ready. Tested on KR4268X2 (onekeylog format).
**Archive format:** `.tar.gz` onekeylog
**Primary source files:**
| File | Content |
|------|---------|
| `asset.json` | Base hardware inventory |
| `component.log` | Component list |
| `devicefrusdr.log` | FRU and SDR data |
| `onekeylog/runningdata/redis-dump.rdb` | Runtime enrichment (optional) |
**Redis RDB enrichment** (applied conservatively — fills missing fields only):
- GPU: `serial_number`, `firmware` (VBIOS/FW), runtime telemetry
- NIC: firmware, serial, part number (when text logs leave fields empty)
**Module structure:**
```
inspur/
parser.go — main parser + registration
sdr.go — sensor/SDR parsing
fru.go — FRU serial parsing
asset.go — asset.json parsing
syslog.go — syslog parsing
```
---
### Supermicro (`supermicro`)
**Status:** Ready (v1.0.0). Tested on SYS-821GE-TNHR crash dumps.
**Archive format:** `.tgz` / `.tar.gz` / `.tar`
**Primary source file:** `CDump.txt` — JSON crashdump file
**Confidence:** +80 when `CDump.txt` contains `crash_data`, `METADATA`, `bmc_fw_ver` markers.
**Extracted data:**
- CPUs: CPUID, core count, manufacturer (Intel), microcode version (as firmware field)
- FRU: BMC firmware version, BIOS version, ME firmware version, CPU PPIN
- Events: crashdump collection event + MCA errors
**MCA error detection:**
- Bit 63 (Valid), Bit 61 (UC — uncorrected), Bit 60 (EN — enabled)
- Corrected MCA errors → `Warning` severity
- Uncorrected MCA errors → `Critical` severity
**Known limitations:**
- TOR dump and extended MCA register data not yet parsed.
- No CPU model name (only CPUID hex code available in crashdump format).
---
### NVIDIA HGX Field Diagnostics (`nvidia`)
**Status:** Ready (v1.1.0). Works with any server vendor.
**Archive format:** `.tar` / `.tar.gz`
**Confidence scoring:**
| File | Score |
|------|-------|
| `unified_summary.json` with "HGX Field Diag" marker | +40 |
| `summary.json` | +20 |
| `summary.csv` | +15 |
| `gpu_fieldiag/` directory | +15 |
**Source files:**
| File | Content |
|------|---------|
| `output.log` | dmidecode — server manufacturer, model, serial number |
| `unified_summary.json` | GPU details, NVSwitch devices, PCI addresses |
| `summary.json` | Diagnostic test results and error codes |
| `summary.csv` | Alternative test results format |
**Extracted data:**
- GPUs: slot, model, manufacturer, firmware (VBIOS), BDF
- NVSwitch devices: slot, device_class, vendor_id, device_id, BDF, link speed/width
- Events: diagnostic test failures (connectivity, gpumem, gpustress, pcie, nvlink, nvswitch, power)
**Severity mapping:**
- `info` — tests passed
- `warning` — e.g. "Row remapping failed"
- `critical` — error codes 300+
**Known limitations:**
- Detailed logs in `gpu_fieldiag/*.log` are not parsed.
- No CPU, memory, or storage extraction (not present in field diag archives).
---
### NVIDIA Bug Report (`nvidia_bug_report`)
**Status:** Ready (v1.0.0).
**File format:** `nvidia-bug-report-*.log.gz` (gzip-compressed text)
**Confidence:** 85 (high priority for matching filename pattern)
**Source sections parsed:**
| dmidecode section | Extracts |
|-------------------|---------|
| System Information | server serial, UUID, manufacturer, product name |
| Processor Information | CPU model, serial, core/thread count, frequency |
| Memory Device | DIMM slot, size, type, manufacturer, serial, part number, speed |
| System Power Supply | PSU location, manufacturer, model, serial, wattage, firmware, status |
| Other source | Extracts |
|--------------|---------|
| `lspci -vvv` (Ethernet/Network/IB) | NIC model (from VPD), BDF, slot, P/N, S/N, port count, port type |
| `/proc/driver/nvidia/gpus/*/information` | GPU model, BDF, UUID, VBIOS version, IRQ |
| NVRM version line | NVIDIA driver version |
**Known limitations:**
- Driver error/warning log lines not yet extracted.
- GPU temperature/utilization metrics require additional parsing sections.
---
### XigmaNAS (`xigmanas`)
**Status:** Ready.
**Archive format:** Plain log files (FreeBSD-based NAS system)
**Detection:** Files named `xigmanas`, `system`, or `dmesg`; content containing "XigmaNAS" or "FreeBSD"; SMART data presence.
**Extracted data:**
- System: firmware version, uptime, CPU model, memory configuration, hardware platform
- Storage: disk models, serial numbers, capacity, health, SMART temperatures
- Populates: `Hardware.Firmware`, `Hardware.CPUs`, `Hardware.Memory`, `Hardware.Storage`, `Sensors`
---
### Generic text fallback (`generic`)
**Status:** Ready (v1.0.0).
**Confidence:** 15 (lowest — only matches if no other parser scores higher)
**Purpose:** Fallback for any text file or single `.gz` file not matching a specific vendor.
**Behavior:**
- If filename matches `nvidia-bug-report-*.log.gz`: extracts driver version and GPU list.
- Otherwise: confirms file is text (not binary) and records a basic "Text File" event.
---
## Supported vendor matrix
| Vendor | ID | Status | Tested on |
|--------|----|--------|-----------|
| Inspur / Kaytus | `inspur` | Ready | KR4268X2 onekeylog |
| Supermicro | `supermicro` | Ready | SYS-821GE-TNHR crashdump |
| NVIDIA HGX Field Diag | `nvidia` | Ready | Various HGX servers |
| NVIDIA Bug Report | `nvidia_bug_report` | Ready | H100 systems |
| XigmaNAS | `xigmanas` | Ready | FreeBSD NAS logs |
| Generic fallback | `generic` | Ready | Any text file |
| Dell iDRAC | `dell` | Planned | — |
| HPE iLO | `hpe` | Planned | — |
| Lenovo XCC | `lenovo` | Planned | — |