Add shared bible submodule, rename local bible to bible-local
- Add bible.git as submodule at bible/
- Move docs/bible/ → bible-local/ (project-specific architecture)
- Update CLAUDE.md to reference both bible/ and bible-local/
- Add AGENTS.md for Codex with same structure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# 01 — Overview

## What is LOGPile?

LOGPile is a standalone Go application for BMC (Baseboard Management Controller)
diagnostics analysis with an embedded web UI.
It runs as a single binary with no external file dependencies.

## Operating modes

| Mode | Entry point | Description |
|------|-------------|-------------|
| **Offline / archive** | `POST /api/upload` | Upload a vendor diagnostic archive or a JSON snapshot; parse and display in UI |
| **Live / Redfish** | `POST /api/collect` | Connect to a live BMC via Redfish API, collect hardware inventory, display and export |

Both modes produce the same in-memory `AnalysisResult` structure and expose it
through the same API and UI.

## Key capabilities

- Single self-contained binary with embedded HTML/JS/CSS (no static file serving required).
- Vendor archive parsing: Inspur/Kaytus, Supermicro, NVIDIA HGX Field Diagnostics,
  NVIDIA Bug Report, Unraid, XigmaNAS, Generic text fallback.
- Live Redfish collection with async progress tracking.
- Normalized hardware inventory: CPU / RAM / Storage / GPU / PSU / NIC / PCIe / Firmware.
- Raw `redfish_tree` snapshot stored in `RawPayloads` for future offline re-analysis.
- Re-upload of a JSON snapshot for offline work (`/api/upload` accepts `AnalysisResult` JSON).
- Export in CSV, JSON (full `AnalysisResult`), and Reanimator format.
- PCI device model resolution via embedded `pci.ids` (no hardcoded model strings).

## Non-goals (current scope)

- No persistent storage — all state is in-memory per process lifetime.
- IPMI collector is a mock scaffold only; real IPMI support is not implemented.
- No authentication layer on the HTTP server.
# 02 — Architecture

## Runtime stack

| Layer | Technology |
|-------|------------|
| Language | Go 1.22+ |
| HTTP | `net/http`, `http.ServeMux` |
| UI | Embedded via `//go:embed` in `web/embed.go` (templates + static assets) |
| State | In-memory only — no database |
| Build | `CGO_ENABLED=0`, single static binary |

Default port: **8082**

## Directory structure

```
cmd/logpile/main.go          # Binary entry point, CLI flag parsing
internal/
  collector/                 # Live data collectors
    registry.go              # Collector registration
    redfish.go               # Redfish connector (real implementation)
    ipmi_mock.go             # IPMI mock connector (scaffold)
    types.go                 # Connector request/progress contracts
  parser/                    # Archive parsers
    parser.go                # BMCParser (dispatcher) + parse orchestration
    archive.go               # Archive extraction helpers
    registry.go              # Parser registry + detect/selection
    interface.go             # VendorParser interface
    vendors/                 # Vendor-specific parser modules
      vendors.go             # Import-side-effect registrations
      inspur/
      supermicro/
      nvidia/
      nvidia_bug_report/
      unraid/
      xigmanas/
      generic/
      pciids/                # PCI IDs lookup (embedded pci.ids)
  server/                    # HTTP layer
    server.go                # Server struct, route registration
    handlers.go              # All HTTP handler functions
  exporter/                  # Export formatters
    exporter.go              # CSV + JSON exporters
    reanimator_models.go
    reanimator_converter.go
  models/                    # Shared data contracts
web/
  embed.go                   # go:embed directive
  templates/                 # HTML templates
  static/                    # JS / CSS
    js/app.js                # Frontend — API contract consumer
```

## In-memory state

The `Server` struct in `internal/server/server.go` holds:

| Field | Type | Description |
|-------|------|-------------|
| `result` | `*models.AnalysisResult` | Current parsed/collected dataset |
| `detectedVendor` | `string` | Vendor identifier from last parse |
| `jobManager` | `*JobManager` | Tracks live collect job status/logs |
| `collectors` | `*collector.Registry` | Registered live collection connectors |

State is replaced atomically on successful upload or collect.
On a failed/canceled collect, the previous `result` is preserved unchanged.
## Upload flow (`POST /api/upload`)

```
multipart form field: "archive"
  │
  ├─ file looks like JSON?
  │    └─ parse as models.AnalysisResult snapshot → store in Server.result
  │
  └─ otherwise
       └─ parser.NewBMCParser().ParseFromReader(...)
            │
            ├─ try all registered vendor parsers (highest confidence wins)
            └─ result → store in Server.result
```
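The JSON-vs-archive branch at the top of this flow amounts to a small content sniff. A minimal sketch, with an assumed function name and heuristic rather than the project's actual code:

```go
package main

import (
	"bytes"
	"encoding/json"
)

// looksLikeJSON reports whether an upload is plausibly an AnalysisResult
// snapshot rather than a tar/gz archive: it must start with '{' and decode
// as a JSON object.
func looksLikeJSON(data []byte) bool {
	trimmed := bytes.TrimLeft(data, " \t\r\n")
	if len(trimmed) == 0 || trimmed[0] != '{' {
		return false
	}
	var probe map[string]json.RawMessage
	return json.Unmarshal(trimmed, &probe) == nil
}
```

Anything that fails the sniff falls through to the archive parser path.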
## Live collect flow (`POST /api/collect`)

```
validate request (host / protocol / port / username / auth_type / tls_mode)
  │
  └─ launch async job
       │
       ├─ progress callback → job log (queryable via GET /api/collect/{id})
       │
       ├─ success:
       │    set source metadata (source_type=api, protocol, host, date)
       │    store result in Server.result
       │
       └─ failure / cancel:
            previous Server.result unchanged
```

Job lifecycle states: `queued → running → success | failed | canceled`
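The lifecycle line above can be captured as a tiny state machine. A sketch with assumed type and constant names:

```go
package main

// JobStatus mirrors the lifecycle: queued → running → success | failed | canceled.
type JobStatus string

const (
	StatusQueued   JobStatus = "queued"
	StatusRunning  JobStatus = "running"
	StatusSuccess  JobStatus = "success"
	StatusFailed   JobStatus = "failed"
	StatusCanceled JobStatus = "canceled"
)

// validTransitions encodes which moves a job manager should accept;
// success/failed/canceled are terminal and have no outgoing edges.
var validTransitions = map[JobStatus][]JobStatus{
	StatusQueued:  {StatusRunning, StatusCanceled},
	StatusRunning: {StatusSuccess, StatusFailed, StatusCanceled},
}

// CanTransition reports whether moving from one status to another is legal.
func CanTransition(from, to JobStatus) bool {
	for _, next := range validTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}
```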

## PCI IDs lookup

Load/override order (`LOGPILE_PCI_IDS_PATH` has highest priority because it is loaded last):

1. Embedded `internal/parser/vendors/pciids/pci.ids` (base dataset compiled into binary)
2. `./pci.ids`
3. `/usr/share/hwdata/pci.ids`
4. `/usr/share/misc/pci.ids`
5. `/opt/homebrew/share/pciids/pci.ids`
6. Paths from `LOGPILE_PCI_IDS_PATH` (colon-separated on Unix; loaded last, so they override the same IDs)

This means unknown GPU/NIC model strings can be updated by refreshing `pci.ids`
without any code change.
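The last-loaded-wins layering can be sketched as a map merge plus `LOGPILE_PCI_IDS_PATH` splitting. A sketch only; parsing of the actual `pci.ids` file format is omitted and the helper names are assumptions:

```go
package main

import (
	"os"
	"path/filepath"
	"strings"
)

// mergeIDs overlays src onto dst; sources loaded later override earlier ones,
// which is what gives LOGPILE_PCI_IDS_PATH the highest priority.
func mergeIDs(dst, src map[string]string) {
	for id, name := range src {
		dst[id] = name
	}
}

// extraPaths splits the LOGPILE_PCI_IDS_PATH value into individual file paths
// (os.PathListSeparator is ':' on Unix).
func extraPaths(env string) []string {
	var out []string
	for _, p := range strings.Split(env, string(os.PathListSeparator)) {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, filepath.Clean(p))
		}
	}
	return out
}
```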
# 03 — API Reference

## Conventions

- All endpoints under `/api/`.
- Request bodies: `application/json` or `multipart/form-data` where noted.
- Responses: `application/json` unless file download.
- Export filenames follow pattern: `YYYY-MM-DD (SERVER MODEL) - SERVER SN.<ext>`
---

## Upload & Data Input

### `POST /api/upload`

Upload a vendor diagnostic archive or a JSON snapshot.

**Request:** `multipart/form-data`, field name `archive`.
Server-side multipart limit: **100 MiB**.

Accepted inputs:
- `.tar`, `.tar.gz`, `.tgz` — vendor diagnostic archives
- `.txt` — plain text files
- JSON file containing a serialized `AnalysisResult` — re-loaded as-is

**Response:** `200 OK` with parsed result summary, or `4xx`/`5xx` on error.

---

## Live Collection

### `POST /api/collect`

Start a live collection job (`redfish` or `ipmi`).

**Request body:**
```json
{
  "host": "bmc01.example.local",
  "protocol": "redfish",
  "port": 443,
  "username": "admin",
  "auth_type": "password",
  "password": "secret",
  "tls_mode": "insecure"
}
```

Supported values:
- `protocol`: `redfish` | `ipmi`
- `auth_type`: `password` | `token`
- `tls_mode`: `strict` | `insecure`
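A client honoring the two `tls_mode` values might be built as follows. This is a sketch; the collector's real HTTP client setup may differ:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

// newBMCClient maps tls_mode onto a crypto/tls configuration:
// "strict" verifies the BMC certificate chain, "insecure" skips
// verification (common for BMCs with self-signed certificates).
func newBMCClient(tlsMode string) (*http.Client, error) {
	var cfg *tls.Config
	switch tlsMode {
	case "strict":
		cfg = &tls.Config{} // default: full verification
	case "insecure":
		cfg = &tls.Config{InsecureSkipVerify: true}
	default:
		return nil, fmt.Errorf("invalid tls_mode %q", tlsMode)
	}
	return &http.Client{Transport: &http.Transport{TLSClientConfig: cfg}}, nil
}
```

An unknown value is rejected up front, matching the `422` semantic-validation behavior described below.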

**Response:** `202 Accepted`
```json
{
  "job_id": "job_a1b2c3d4e5f6",
  "status": "queued",
  "message": "Collection job accepted",
  "created_at": "2026-02-23T12:00:00Z"
}
```

Validation behavior:
- `400 Bad Request` for invalid JSON
- `422 Unprocessable Entity` for semantic validation errors (missing/invalid fields)

### `GET /api/collect/{id}`

Poll job status and progress log.

**Response:**
```json
{
  "job_id": "job_a1b2c3d4e5f6",
  "status": "running",
  "progress": 55,
  "logs": ["..."],
  "created_at": "2026-02-23T12:00:00Z",
  "updated_at": "2026-02-23T12:00:10Z"
}
```

Status values: `queued` | `running` | `success` | `failed` | `canceled`
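Callers typically poll `GET /api/collect/{id}` until one of the three terminal statuses appears. A sketch of the terminal-state check, with assumed helper names:

```go
package main

// isTerminal reports whether a polled status means the job is finished,
// per the status set above.
func isTerminal(status string) bool {
	switch status {
	case "success", "failed", "canceled":
		return true
	}
	return false
}

// firstTerminal scans a sequence of polled statuses (e.g. decoded from
// successive GET /api/collect/{id} responses) and returns the first
// terminal one, or "" if the job is still in flight.
func firstTerminal(statuses []string) string {
	for _, s := range statuses {
		if isTerminal(s) {
			return s
		}
	}
	return ""
}
```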

### `POST /api/collect/{id}/cancel`

Cancel a running job.

---

## Data Queries

### `GET /api/status`

Returns source metadata for the current dataset.

```json
{
  "loaded": true,
  "filename": "redfish://bmc01.example.local",
  "vendor": "redfish",
  "source_type": "api",
  "protocol": "redfish",
  "target_host": "bmc01.example.local",
  "collected_at": "2026-02-10T15:30:00Z",
  "stats": { "events": 0, "sensors": 0, "fru": 0 }
}
```

`source_type`: `archive` | `api`

When no dataset is loaded, the response is `{ "loaded": false }`.

### `GET /api/config`

Returns source metadata plus:
- `hardware.board`
- `hardware.firmware`
- canonical `hardware.devices`
- computed `specification` summary lines

### `GET /api/events`

Returns parsed diagnostic events.

### `GET /api/sensors`

Returns sensor readings (temperatures, voltages, fan speeds).

### `GET /api/serials`

Returns serial numbers built from canonical `hardware.devices`.

### `GET /api/firmware`

Returns firmware versions built from canonical `hardware.devices`.

### `GET /api/parsers`

Returns the list of registered vendor parsers with their identifiers.

---

## Export

### `GET /api/export/csv`

Download serial numbers as CSV.

### `GET /api/export/json`

Download the full `AnalysisResult` as JSON (includes `raw_payloads`).

### `GET /api/export/reanimator`

Download hardware data in Reanimator format for asset tracking integration.
See [`07-exporters.md`](07-exporters.md) for the full format spec.

---

## Management

### `DELETE /api/clear`

Clear the current in-memory dataset.

### `POST /api/shutdown`

Gracefully shut down the server process.
This endpoint terminates the current process after responding.

---

## Source metadata fields

Fields present in `/api/status` and `/api/config`:

| Field | Values |
|-------|--------|
| `source_type` | `archive` \| `api` |
| `protocol` | `redfish` \| `ipmi` (may be empty for archive uploads) |
| `target_host` | IP or hostname |
| `collected_at` | RFC3339 timestamp |
# 04 — Data Models

## AnalysisResult

`internal/models/` — the central data contract shared by parsers, collectors, exporters, and the HTTP layer.

**Stability rule:** Never break the JSON shape of `AnalysisResult`.
Backward-compatible additions are allowed; removals or renames are not.

Key top-level fields:

| Field | Type | Description |
|-------|------|-------------|
| `filename` | `string` | Uploaded filename or generated live source identifier |
| `source_type` | `string` | `archive` or `api` |
| `protocol` | `string` | `redfish`, `ipmi`, or empty for archive uploads |
| `target_host` | `string` | BMC host for live collection |
| `collected_at` | `time.Time` | Upload/collection timestamp |
| `hardware` | `*HardwareConfig` | All normalized hardware inventory |
| `events` | `[]Event` | Diagnostic events from parsers |
| `fru` | `[]FRUInfo` | FRU/SDR-derived inventory details |
| `sensors` | `[]SensorReading` | Sensor readings |
| `raw_payloads` | `map[string]any` | Raw vendor data (e.g. `redfish_tree`) |

`raw_payloads` is the durable source for offline re-analysis (especially for Redfish).
Normalized fields should be treated as derivable output from raw source data.

### Hardware sub-structure

```
HardwareConfig
├── board             BoardInfo         — server/motherboard identity
├── devices           []HardwareDevice  — CANONICAL INVENTORY (see below)
├── cpus              []CPU
├── memory            []MemoryDIMM
├── storage           []Storage
├── volumes           []StorageVolume   — logical RAID/VROC volumes
├── pcie_devices      []PCIeDevice
├── gpus              []GPU
├── network_adapters  []NetworkAdapter
├── network_cards     []NIC             — legacy/alternate source field
├── power_supplies    []PSU
└── firmware          []FirmwareInfo
```

---

## Canonical Device Repository (`hardware.devices`)

`hardware.devices` is the **single source of truth** for hardware inventory.

### Rules — must not be violated

1. All UI tabs displaying hardware components **must read from `hardware.devices`**.
2. The Device Inventory tab shows kinds: `pcie`, `storage`, `gpu`, `network`.
3. The Reanimator exporter **must use the same `hardware.devices`** as the UI.
4. Any discrepancy between UI data and Reanimator export data is a **bug**.
5. New hardware attributes must be added to the canonical device schema **first**,
   then mapped to Reanimator/UI — never the other way around.
6. The exporter should group/filter canonical records by section, not rebuild data
   from multiple sources.

### Deduplication logic (applied once by the repository builder)

| Priority | Key used |
|----------|----------|
| 1 | `serial_number` — usable (not empty, not `N/A`, `NA`, `NONE`, `NULL`, `UNKNOWN`, `-`) |
| 2 | `bdf` — PCI Bus:Device.Function address |
| 3 | No merge — records remain distinct if both serial and bdf are absent |
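The priority order in this table can be sketched as a key function. A sketch: field and helper names are assumptions, and the case-insensitive serial comparison is an assumed detail:

```go
package main

import "strings"

// Placeholder serials that must not be used as dedup keys, per the table above.
var unusableSerials = map[string]bool{
	"": true, "N/A": true, "NA": true, "NONE": true,
	"NULL": true, "UNKNOWN": true, "-": true,
}

// dedupKey returns the merge key for a device record: usable serial first,
// then BDF, then "" meaning "do not merge".
func dedupKey(serial, bdf string) string {
	s := strings.ToUpper(strings.TrimSpace(serial))
	if !unusableSerials[s] {
		return "sn:" + s
	}
	if bdf = strings.TrimSpace(bdf); bdf != "" {
		return "bdf:" + bdf
	}
	return "" // no usable key, so the record stays distinct
}
```

Records sharing a non-empty key are merged once by the repository builder; everything else passes through unchanged.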

### Device schema alignment

Keep the `hardware.devices` schema as close as possible to the Reanimator JSON field names.
This minimizes translation logic in the exporter and prevents drift.

---

## Source metadata fields (stored directly on `AnalysisResult`)

Carried by both `/api/status` and `/api/config`:

```json
{
  "source_type": "api",
  "protocol": "redfish",
  "target_host": "10.0.0.1",
  "collected_at": "2026-02-10T15:30:00Z"
}
```

Valid `source_type` values: `archive`, `api`
Valid `protocol` values: `redfish`, `ipmi` (empty is allowed for archive uploads)

---

## Raw Export Package (reopenable artifact)

`Export Raw Data` does not merely dump `AnalysisResult`; it emits a reopenable raw package
(JSON or ZIP bundle) that carries the source data required for re-analysis.

Design rules:
- raw source is authoritative (`redfish_tree` or original file bytes)
- imports must re-analyze from raw source
- parsed field snapshots included in bundles are diagnostic artifacts, not the source of truth
# 05 — Collectors

Collectors live in `internal/collector/`.

Core files:
- `internal/collector/registry.go` — connector registry (`redfish`, `ipmi`)
- `internal/collector/redfish.go` — real Redfish connector
- `internal/collector/ipmi_mock.go` — IPMI mock connector scaffold
- `internal/collector/types.go` — request/progress contracts

---

## Redfish Collector (`redfish`)

**Status:** Production-ready.

### Request contract (from server)

Passed through from `/api/collect` after validation:
- `host`, `port`, `username`
- `auth_type=password|token` (+ matching credential field)
- `tls_mode=strict|insecure`

### Discovery

Dynamic — does not assume fixed paths. Discovers:
- `Systems` collection → per-system resources
- `Chassis` collection → enclosure/board data
- `Managers` collection → BMC/firmware info

### Collected data

| Category | Notes |
|----------|-------|
| CPU | Model, cores, threads, socket, status |
| Memory | DIMM slot, size, type, speed, serial, manufacturer |
| Storage | Slot, type, model, serial, firmware, interface, status |
| GPU | Detected via PCIe class + NVIDIA vendor ID |
| PSU | Model, serial, wattage, firmware, telemetry (input/output power, voltage) |
| NIC | Model, serial, port count, BDF |
| PCIe | Slot, vendor_id, device_id, BDF, link width/speed |
| Firmware | BIOS, BMC versions |

### Raw snapshot

The full Redfish response tree is stored in `result.RawPayloads["redfish_tree"]`.
This allows future offline re-analysis without re-collecting from a live BMC.

### Unified Redfish analysis pipeline (live == replay)

LOGPile uses a **single Redfish analyzer path**:

1. The live collector crawls the Redfish API and builds `raw_payloads.redfish_tree`.
2. The parsed result is produced by replaying that tree through the same analyzer used by raw import.

This guarantees that live collection and `Export Raw Data` re-open/re-analyze produce the same
normalized output for the same `redfish_tree`.

### Snapshot crawler behavior (important)

The Redfish snapshot crawler is intentionally:
- **bounded** (`LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`)
- **prioritized** (PCIe, Fabrics, FirmwareInventory, Storage, PowerSubsystem, ThermalSubsystem)
- **tolerant** (skips noisy expected failures, strips `#fragment` from `@odata.id`)

Design notes:
- Queue capacity is sized to the snapshot cap to avoid worker deadlocks on large trees.
- UI progress is coarse and human-readable; detailed per-request diagnostics are available via debug logs.
- `LOGPILE_REDFISH_DEBUG=1` and `LOGPILE_REDFISH_SNAPSHOT_DEBUG=1` enable console diagnostics.

### Parsing guidelines

When adding Redfish mappings, follow these principles:
- Support alternate collection paths (resources may appear at different odata URLs).
- Follow `@odata.id` references and handle embedded `Members` arrays.
- Prefer **raw-tree replay compatibility**: if the live collector adds a fallback/probe, the replay analyzer must mirror it.
- Deduplicate by serial / BDF / slot+model (in that priority order).
- Prefer tolerant/fallback parsing — missing fields should be silently skipped,
  not cause the whole collection to fail.
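The `@odata.id` handling mentioned above (fragment stripping, `Members` traversal) can be sketched as follows; the function names are assumptions:

```go
package main

import "strings"

// normalizeODataID strips a trailing #fragment, as the snapshot crawler does,
// so "/redfish/v1/Chassis/1#/Oem" and "/redfish/v1/Chassis/1" dedupe to one
// document; a trailing slash is trimmed for the same reason.
func normalizeODataID(id string) string {
	if i := strings.IndexByte(id, '#'); i >= 0 {
		id = id[:i]
	}
	return strings.TrimSuffix(id, "/")
}

// memberRefs extracts @odata.id references from a decoded Members array.
func memberRefs(members []map[string]any) []string {
	var refs []string
	for _, m := range members {
		if ref, ok := m["@odata.id"].(string); ok && ref != "" {
			refs = append(refs, normalizeODataID(ref))
		}
	}
	return refs
}
```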

### Vendor-specific storage fallbacks (Supermicro and similar)

When standard `Storage/.../Drives` collections are empty, the collector/replay may recover drives via:
- `Storage.Links.Enclosures[*] -> .../Drives`
- direct probing of finite `Disk.Bay` candidates (`Disk.Bay.0`, `Disk.Bay0`, `.../0`)

This is required for some BMCs that publish drive inventory in vendor-specific paths while leaving
the standard collections empty.

### PSU source preference (newer Redfish)

PSU inventory source order:
1. `Chassis/*/PowerSubsystem/PowerSupplies` (preferred on X14+ / newer Redfish)
2. `Chassis/*/Power` (legacy fallback)

### Progress reporting

The collector emits progress log entries at each stage (connecting, enumerating systems,
collecting CPUs, etc.) so the UI can display meaningful status.
Current progress message strings are user-facing and may be localized.

---

## IPMI Collector (`ipmi`)

**Status:** Mock scaffold only — not implemented.

Registered in the collector registry but returns placeholder data.
Real IPMI support is a future work item.
# 06 — Parsers

## Framework

### Registration

Each vendor parser registers itself via Go's `init()` side-effect import pattern.

All registrations are collected in `internal/parser/vendors/vendors.go`:
```go
import (
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/supermicro"
	// etc.
)
```

### VendorParser interface

```go
type VendorParser interface {
	Name() string                      // human-readable name
	Vendor() string                    // vendor identifier string
	Version() string                   // parser version (increment on logic changes)
	Detect(files []ExtractedFile) int  // confidence 0–100
	Parse(files []ExtractedFile) (*models.AnalysisResult, error)
}
```
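A minimal skeleton implementing this interface might look like the following. This is a sketch only: `ExtractedFile` and `AnalysisResult` are simplified stand-ins for the real types, and `ACMEParser` is a hypothetical vendor, not part of the project:

```go
package main

import "strings"

// Simplified stand-ins for the real parser package types.
type ExtractedFile struct {
	Name    string
	Content []byte
}

type AnalysisResult struct {
	Vendor string
}

// ACMEParser is a hypothetical vendor parser.
type ACMEParser struct{}

func (p *ACMEParser) Name() string    { return "ACME Diagnostics" }
func (p *ACMEParser) Vendor() string  { return "acme" }
func (p *ACMEParser) Version() string { return "1.0.0" }

// Detect returns a 0–100 confidence score based on vendor-unique markers.
func (p *ACMEParser) Detect(files []ExtractedFile) int {
	for _, f := range files {
		if strings.HasSuffix(f.Name, "acme_diag.json") {
			return 85 // unique filename: high confidence
		}
	}
	return 0 // clearly not a match
}

func (p *ACMEParser) Parse(files []ExtractedFile) (*AnalysisResult, error) {
	return &AnalysisResult{Vendor: p.Vendor()}, nil
}
```

The real implementation would also call the registry's registration function from `init()` so the blank import in `vendors.go` wires it up.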

### Selection logic

All registered parsers run `Detect()` against the uploaded archive's file list.
The parser with the **highest confidence score** is selected.
Multiple parsers may return >0; only the top scorer is used.
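Highest-confidence-wins reduces to a single scan over the registry. A sketch with assumed names:

```go
package main

// detector is the minimal slice of parser state needed for selection;
// score would come from calling parser.Detect(files).
type detector struct {
	vendor string
	score  int
}

// selectParser picks the highest-scoring candidate, or "" if none scored > 0.
func selectParser(candidates []detector) string {
	best, bestScore := "", 0
	for _, c := range candidates {
		if c.score > bestScore {
			best, bestScore = c.vendor, c.score
		}
	}
	return best
}
```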

### Adding a new vendor parser

1. `mkdir -p internal/parser/vendors/VENDORNAME`
2. Copy `internal/parser/vendors/template/parser.go.template` as a starting point.
3. Implement `Detect()` and `Parse()`.
4. Add a blank import to `vendors/vendors.go`.

`Detect()` tips:
- Look for unique filenames or directory names.
- Check file content for vendor-specific markers.
- Return 70+ only when confident; return 0 if clearly not a match.

### Parser versioning

Each parser file contains a `parserVersion` constant.
Increment the version whenever parsing logic changes — this helps trace which
version produced a given result.

---

## Vendor parsers

### Inspur / Kaytus (`inspur`)

**Status:** Ready. Tested on KR4268X2 (onekeylog format).

**Archive format:** `.tar.gz` onekeylog

**Primary source files:**

| File | Content |
|------|---------|
| `asset.json` | Base hardware inventory |
| `component.log` | Component list |
| `devicefrusdr.log` | FRU and SDR data |
| `onekeylog/runningdata/redis-dump.rdb` | Runtime enrichment (optional) |

**Redis RDB enrichment** (applied conservatively — fills missing fields only):
- GPU: `serial_number`, `firmware` (VBIOS/FW), runtime telemetry
- NIC: firmware, serial, part number (when text logs leave fields empty)

**Module structure:**
```
inspur/
  parser.go  — main parser + registration
  sdr.go     — sensor/SDR parsing
  fru.go     — FRU serial parsing
  asset.go   — asset.json parsing
  syslog.go  — syslog parsing
```

---

### Supermicro (`supermicro`)

**Status:** Ready (v1.0.0). Tested on SYS-821GE-TNHR crash dumps.

**Archive format:** `.tgz` / `.tar.gz` / `.tar`

**Primary source file:** `CDump.txt` — JSON crashdump file

**Confidence:** +80 when `CDump.txt` contains the `crash_data`, `METADATA`, and `bmc_fw_ver` markers.

**Extracted data:**
- CPUs: CPUID, core count, manufacturer (Intel), microcode version (as firmware field)
- FRU: BMC firmware version, BIOS version, ME firmware version, CPU PPIN
- Events: crashdump collection event + MCA errors

**MCA error detection:**
- Bit 63 (Valid), Bit 61 (UC — uncorrected), Bit 60 (EN — enabled)
- Corrected MCA errors → `Warning` severity
- Uncorrected MCA errors → `Critical` severity
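The status-register bit checks above can be sketched directly; the bit positions come from the list, while the helper names are assumptions:

```go
package main

const (
	mcaValid = uint64(1) << 63 // Bit 63: entry is valid
	mcaUC    = uint64(1) << 61 // Bit 61: uncorrected error
	mcaEN    = uint64(1) << 60 // Bit 60: error reporting enabled
)

// mcaSeverity maps an MCA status register value to an event severity:
// uncorrected errors are Critical, corrected ones are Warning, and
// invalid entries produce no event.
func mcaSeverity(status uint64) string {
	if status&mcaValid == 0 {
		return "" // not a valid MCA entry
	}
	if status&mcaUC != 0 {
		return "Critical"
	}
	return "Warning"
}
```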

**Known limitations:**
- TOR dump and extended MCA register data are not yet parsed.
- No CPU model name (only the CPUID hex code is available in the crashdump format).

---

### NVIDIA HGX Field Diagnostics (`nvidia`)

**Status:** Ready (v1.1.0). Works with any server vendor.

**Archive format:** `.tar` / `.tar.gz`

**Confidence scoring:**

| File | Score |
|------|-------|
| `unified_summary.json` with "HGX Field Diag" marker | +40 |
| `summary.json` | +20 |
| `summary.csv` | +15 |
| `gpu_fieldiag/` directory | +15 |

**Source files:**

| File | Content |
|------|---------|
| `output.log` | dmidecode — server manufacturer, model, serial number |
| `unified_summary.json` | GPU details, NVSwitch devices, PCI addresses |
| `summary.json` | Diagnostic test results and error codes |
| `summary.csv` | Alternative test results format |

**Extracted data:**
- GPUs: slot, model, manufacturer, firmware (VBIOS), BDF
- NVSwitch devices: slot, device_class, vendor_id, device_id, BDF, link speed/width
- Events: diagnostic test failures (connectivity, gpumem, gpustress, pcie, nvlink, nvswitch, power)

**Severity mapping:**
- `info` — tests passed
- `warning` — e.g. "Row remapping failed"
- `critical` — error codes 300+

**Known limitations:**
- Detailed logs in `gpu_fieldiag/*.log` are not parsed.
- No CPU, memory, or storage extraction (not present in field diag archives).

---

### NVIDIA Bug Report (`nvidia_bug_report`)

**Status:** Ready (v1.0.0).

**File format:** `nvidia-bug-report-*.log.gz` (gzip-compressed text)

**Confidence:** 85 (high priority for a matching filename pattern)

**Source sections parsed:**

| dmidecode section | Extracts |
|-------------------|----------|
| System Information | server serial, UUID, manufacturer, product name |
| Processor Information | CPU model, serial, core/thread count, frequency |
| Memory Device | DIMM slot, size, type, manufacturer, serial, part number, speed |
| System Power Supply | PSU location, manufacturer, model, serial, wattage, firmware, status |

| Other source | Extracts |
|--------------|----------|
| `lspci -vvv` (Ethernet/Network/IB) | NIC model (from VPD), BDF, slot, P/N, S/N, port count, port type |
| `/proc/driver/nvidia/gpus/*/information` | GPU model, BDF, UUID, VBIOS version, IRQ |
| NVRM version line | NVIDIA driver version |

**Known limitations:**
- Driver error/warning log lines are not yet extracted.
- GPU temperature/utilization metrics require additional parsing sections.

---

### XigmaNAS (`xigmanas`)

**Status:** Ready.

**Archive format:** Plain log files (FreeBSD-based NAS system)

**Detection:** Files named `xigmanas`, `system`, or `dmesg`; content containing "XigmaNAS" or "FreeBSD"; SMART data presence.

**Extracted data:**
- System: firmware version, uptime, CPU model, memory configuration, hardware platform
- Storage: disk models, serial numbers, capacity, health, SMART temperatures
- Populates: `Hardware.Firmware`, `Hardware.CPUs`, `Hardware.Memory`, `Hardware.Storage`, `Sensors`

---

### Unraid (`unraid`)

**Status:** Ready (v1.0.0).

**Archive format:** Unraid diagnostics archive contents (text-heavy diagnostics directories).

**Detection:** Combines filename/path markers (`diagnostics-*`, `unraid-*.txt`, `vars.txt`)
with content markers (e.g. `Unraid kernel build`, parity data markers).

**Extracted data (current):**
- Board / BIOS metadata (from motherboard/system files)
- CPU summary (from `lscpu.txt`)
- Memory modules (from the diagnostics memory file)
- Storage devices (from `vars.txt` + SMART files)
- Syslog events

---

### Generic text fallback (`generic`)

**Status:** Ready (v1.0.0).

**Confidence:** 15 (lowest — only matches if no other parser scores higher)

**Purpose:** Fallback for any text file or single `.gz` file not matching a specific vendor.

**Behavior:**
- If the filename matches `nvidia-bug-report-*.log.gz`: extracts the driver version and GPU list.
- Otherwise: confirms the file is text (not binary) and records a basic "Text File" event.

---

## Supported vendor matrix

| Vendor | ID | Status | Tested on |
|--------|----|--------|-----------|
| Inspur / Kaytus | `inspur` | Ready | KR4268X2 onekeylog |
| Supermicro | `supermicro` | Ready | SYS-821GE-TNHR crashdump |
| NVIDIA HGX Field Diag | `nvidia` | Ready | Various HGX servers |
| NVIDIA Bug Report | `nvidia_bug_report` | Ready | H100 systems |
| Unraid | `unraid` | Ready | Unraid diagnostics archives |
| XigmaNAS | `xigmanas` | Ready | FreeBSD NAS logs |
| Generic fallback | `generic` | Ready | Any text file |
@@ -1,366 +0,0 @@
|
||||
# 07 — Exporters & Reanimator Integration
|
||||
|
||||
## Export endpoints summary
|
||||
|
||||
| Endpoint | Format | Filename pattern |
|
||||
|----------|--------|-----------------|
|
||||
| `GET /api/export/csv` | CSV — serial numbers | `YYYY-MM-DD (MODEL) - SN.csv` |
|
||||
| `GET /api/export/json` | **Raw export package** (JSON or ZIP bundle) for reopen/re-analysis | `YYYY-MM-DD (MODEL) - SN.(json|zip)` |
|
||||
| `GET /api/export/reanimator` | Reanimator hardware JSON | `YYYY-MM-DD (MODEL) - SN.json` |
|
||||
|
||||
---
|
||||
|
||||
## Raw Export (`Export Raw Data`)
|
||||
|
||||
### Purpose
|
||||
|
||||
Preserve enough source data to reproduce parsing later after parser fixes, without requiring
|
||||
another live collection from the target system.
|
||||
|
||||
### Format
|
||||
|
||||
`/api/export/json` returns a **raw export package**:
|
||||
- JSON package (machine-readable), or
|
||||
- ZIP bundle containing:
|
||||
- `raw_export.json` — machine-readable package
|
||||
- `collect.log` — human-readable collection + parsing summary
|
||||
- `parser_fields.json` — structured parsed field snapshot for diffs between parser versions
|
||||
|
||||
### Import / reopen behavior
|
||||
|
||||
When a raw export package is uploaded back into LOGPile:
|
||||
- the app **re-analyzes from raw source**
|
||||
- it does **not** trust embedded parsed output as source of truth
|
||||
|
||||
For Redfish, this means replay from `raw_payloads.redfish_tree`.
|
||||
|
||||
### Design rule
|
||||
|
||||
Raw export is a **re-analysis artifact**, not a final report dump. Keep it self-contained and
|
||||
forward-compatible where possible (versioned package format, additive fields only).
|
||||
|
||||
---
|
||||
|
||||
## Reanimator Export

### Purpose

Exports hardware inventory data in the format expected by the Reanimator asset-tracking
system. Enables one-click push from LOGPile to an external asset-management platform.

### Implementation files

| File | Role |
|------|------|
| `internal/exporter/reanimator_models.go` | Go structs for Reanimator JSON |
| `internal/exporter/reanimator_converter.go` | `ConvertToReanimator()` and helpers |
| `internal/server/handlers.go` | `handleExportReanimator()` HTTP handler |

### Conversion rules

- Source: canonical `hardware.devices` repository (see [`04-data-models.md`](04-data-models.md))
- CPU manufacturer inferred from the model string (Intel / AMD / ARM / Ampere)
- PCIe serial number generated when absent: `{board_serial}-PCIE-{slot}`
- Status values normalized to `OK`, `Warning`, `Critical`, `Unknown` (`Empty` only for memory slots)
- Timestamps in RFC3339 format
- `target_host` derived from the `filename` field (`redfish://…`, `ipmi://…`) if not in the source; omitted if undeterminable
- `board.manufacturer` and `board.product_name` values of `"NULL"` treated as absent

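The manufacturer-inference and serial-fallback rules above can be sketched as follows. This is a minimal illustration; the helper names and exact token lists are assumptions, not the real converter code.

```go
package main

import (
	"fmt"
	"strings"
)

// inferCPUManufacturer sketches the manufacturer-from-model rule.
// The token list is an assumption based on the vendors named above.
func inferCPUManufacturer(model string) string {
	m := strings.ToUpper(model)
	switch {
	case strings.Contains(m, "INTEL") || strings.Contains(m, "XEON"):
		return "Intel"
	case strings.Contains(m, "AMD") || strings.Contains(m, "EPYC"):
		return "AMD"
	case strings.Contains(m, "AMPERE"):
		return "Ampere"
	case strings.Contains(m, "ARM"):
		return "ARM"
	default:
		return ""
	}
}

// pcieSerial applies the documented fallback pattern {board_serial}-PCIE-{slot}.
func pcieSerial(boardSerial, slot, existing string) string {
	if existing != "" {
		return existing
	}
	return fmt.Sprintf("%s-PCIE-%s", boardSerial, slot)
}

func main() {
	fmt.Println(inferCPUManufacturer("INTEL(R) XEON(R) GOLD 6530")) // Intel
	fmt.Println(pcieSerial("21D634101", "PCIeCard1", ""))           // 21D634101-PCIE-PCIeCard1
}
```
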
### LOGPile → Reanimator field mapping

| LOGPile type | Reanimator section | Notes |
|---|---|---|
| `BoardInfo` | `board` | Direct mapping |
| `CPU` | `cpus` | Plus inferred `manufacturer` |
| `MemoryDIMM` | `memory` | Direct; empty slots included (`present=false`) |
| `Storage` | `storage` | Excluded if no `serial_number` |
| `PCIeDevice` | `pcie_devices` | Serial generated if missing |
| `GPU` | `pcie_devices` | `device_class=DisplayController` |
| `NetworkAdapter` | `pcie_devices` | `device_class=NetworkController` |
| `PSU` | `power_supplies` | Excluded if no serial or `present=false` |
| `FirmwareInfo` | `firmware` | Direct mapping |

### Inclusion / exclusion rules

**Included:**
- Memory slots with `present=false` (as `Empty` slots)
- PCIe devices without a serial number (serial is generated)

**Excluded:**
- Storage without `serial_number`
- PSUs without `serial_number` or with `present=false`
- Network adapters with `present=false`

---

## Reanimator Integration Guide

This section documents the Reanimator receiver-side JSON format (what the Reanimator
system expects when it ingests a LOGPile export).

> **Important:** The Reanimator endpoint uses a strict JSON decoder (`DisallowUnknownFields`).
> Any unknown field — including nested ones — causes `400 Bad Request`.
> Use only the `snake_case` keys listed here.

### Top-level structure

```json
{
  "filename": "redfish://10.10.10.103",
  "source_type": "api",
  "protocol": "redfish",
  "target_host": "10.10.10.103",
  "collected_at": "2026-02-10T15:30:00Z",
  "hardware": {
    "board": {...},
    "firmware": [...],
    "cpus": [...],
    "memory": [...],
    "storage": [...],
    "pcie_devices": [...],
    "power_supplies": [...]
  }
}
```

**Required:** `collected_at`, `hardware.board.serial_number`
**Optional:** `target_host`, `source_type`, `protocol`, `filename`

`source_type` values: `api`, `logfile`, `manual`
`protocol` values: `redfish`, `ipmi`, `snmp`, `ssh`

### Component status fields (all component sections)

Each component may carry:

| Field | Type | Description |
|-------|------|-------------|
| `status` | string | `OK`, `Warning`, `Critical`, `Unknown`, `Empty` |
| `status_checked_at` | RFC3339 | When the status was last verified |
| `status_changed_at` | RFC3339 | When the status last changed |
| `status_at_collection` | object | `{ "status": "...", "at": "..." }` — snapshot-time status |
| `status_history` | array | `[{ "status": "...", "changed_at": "...", "details": "..." }]` |
| `error_description` | string | Human-readable error for `Warning`/`Critical` |

### Board

```json
{
  "board": {
    "manufacturer": "Supermicro",
    "product_name": "X12DPG-QT6",
    "serial_number": "21D634101",
    "part_number": "X12DPG-QT6-REV1.01",
    "uuid": "d7ef2fe5-2fd0-11f0-910a-346f11040868"
  }
}
```

`serial_number` is required. `manufacturer` / `product_name` values of `"NULL"` are treated as absent.

### CPUs

```json
{
  "socket": 0,
  "model": "INTEL(R) XEON(R) GOLD 6530",
  "cores": 32,
  "threads": 64,
  "frequency_mhz": 2100,
  "max_frequency_mhz": 4000,
  "manufacturer": "Intel",
  "status": "OK"
}
```

`socket` (int) and `model` are required. Serial generated as `{board_serial}-CPU-{socket}`.

LOT format: `CPU_{VENDOR}_{MODEL_NORMALIZED}` → e.g. `CPU_INTEL_XEON_GOLD_6530`

### Memory

```json
{
  "slot": "CPU0_C0D0",
  "location": "CPU0_C0D0",
  "present": true,
  "size_mb": 32768,
  "type": "DDR5",
  "max_speed_mhz": 4800,
  "current_speed_mhz": 4800,
  "manufacturer": "Hynix",
  "serial_number": "80AD032419E17CEEC1",
  "part_number": "HMCG88AGBRA191N",
  "status": "OK"
}
```

`slot` and `present` are required; `serial_number` is required when `present=true`.
Empty slots (`present=false`, `status="Empty"`) are included in the payload, but no component record is created.

LOT format: `DIMM_{TYPE}_{SIZE_GB}GB` → e.g. `DIMM_DDR5_32GB`

### Storage

```json
{
  "slot": "OB01",
  "type": "NVMe",
  "model": "INTEL SSDPF2KX076T1",
  "size_gb": 7680,
  "serial_number": "BTAX41900GF87P6DGN",
  "manufacturer": "Intel",
  "firmware": "9CV10510",
  "interface": "NVMe",
  "present": true,
  "status": "OK"
}
```

`slot`, `model`, `serial_number`, and `present` are required.

LOT format: `{TYPE}_{INTERFACE}_{SIZE_TB}TB` → e.g. `SSD_NVME_07.68TB`

### Power Supplies

```json
{
  "slot": "0",
  "present": true,
  "model": "GW-CRPS3000LW",
  "vendor": "Great Wall",
  "wattage_w": 3000,
  "serial_number": "2P06C102610",
  "part_number": "V0310C9000000000",
  "firmware": "00.03.05",
  "status": "OK",
  "input_power_w": 137,
  "output_power_w": 104,
  "input_voltage": 215.25
}
```

`slot` and `present` are required; `serial_number` is required when `present=true`.
Telemetry fields (`input_power_w`, `output_power_w`, `input_voltage`) are stored in the observation only.

LOT format: `PSU_{WATTAGE}W_{VENDOR_NORMALIZED}` → e.g. `PSU_3000W_GREAT_WALL`

### PCIe Devices

```json
{
  "slot": "PCIeCard1",
  "vendor_id": 32902,
  "device_id": 2912,
  "bdf": "0000:18:00.0",
  "device_class": "MassStorageController",
  "manufacturer": "Intel",
  "model": "RAID Controller RSP3DD080F",
  "link_width": 8,
  "link_speed": "Gen3",
  "max_link_width": 8,
  "max_link_speed": "Gen3",
  "serial_number": "RAID-001-12345",
  "firmware": "50.9.1-4296",
  "status": "OK"
}
```

`slot` is required. Serial generated if absent: `{board_serial}-PCIE-{slot}`.

`device_class` values: `NetworkController`, `MassStorageController`, `DisplayController`, etc.

LOT format: `PCIE_{DEVICE_CLASS}_{MODEL_NORMALIZED}` → e.g. `PCIE_NETWORK_CONNECTX5`

### Firmware

```json
[
  { "device_name": "BIOS", "version": "06.08.05" },
  { "device_name": "BMC", "version": "5.17.00" }
]
```

Both fields are required. Version changes trigger `FIRMWARE_CHANGED` timeline events.

---

### Import process (Reanimator side)

1. Validate `collected_at` (RFC3339) and `hardware.board.serial_number`.
2. Find or create the Asset by `board.serial_number` → `vendor_serial`.
3. For each component: filter out `present=false`, auto-determine the LOT, find or create the Component,
   create an Observation, update Installations.
4. Detect removed components (present in the previous snapshot, absent in the current one) → close the Installation.
5. Generate timeline events: `LOG_COLLECTED`, `INSTALLED`, `REMOVED`, `FIRMWARE_CHANGED`.

**Idempotency:** Repeated import of the same snapshot (same content hash) returns `200 OK`
with `"duplicate": true` and does not create duplicate records.

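The content-hash idempotency above can be illustrated with a small sketch. The hashing scheme shown (SHA-256 over the raw bundle bytes) is an assumption for illustration; the document does not specify Reanimator's actual canonicalization or field set.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// bundleHash sketches duplicate detection: hash the bundle bytes and treat a
// repeated hash as "duplicate": true on import.
func bundleHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

func main() {
	seen := map[string]bool{}
	payload := []byte(`{"collected_at":"2026-02-10T15:30:00Z"}`)

	h := bundleHash(payload)
	fmt.Println(seen[h]) // false: first import creates records
	seen[h] = true
	fmt.Println(seen[h]) // true: repeated import would return "duplicate": true
}
```
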
### Reanimator API endpoint

```http
POST /ingest/hardware
Content-Type: application/json
```

**Success (201):**
```json
{
  "status": "success",
  "bundle_id": "lb_01J...",
  "asset_id": "mach_01J...",
  "collected_at": "2026-02-10T15:30:00Z",
  "duplicate": false,
  "summary": {
    "parts_observed": 15,
    "parts_created": 2,
    "installations_created": 2,
    "timeline_events_created": 9
  }
}
```

**Duplicate (200):**
```json
{ "status": "success", "duplicate": true, "message": "LogBundle with this content hash already exists" }
```

**Error (400):**
```json
{ "status": "error", "error": "validation_failed", "details": { "field": "...", "message": "..." } }
```

Common `400` causes:
- Unknown JSON field (strict decoder)
- Wrong key name (e.g. `targetHost` instead of `target_host`)
- Invalid `collected_at` format (must be RFC3339)
- Empty `hardware.board.serial_number`

### LOT normalization rules

1. Remove special chars `( ) - ® ™`; replace spaces with `_`
2. Uppercase all
3. Collapse multiple underscores to one
4. Strip common prefixes like `MODEL:`, `PN:`

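The four rules above can be sketched as a single helper. This is a minimal sketch: the function name is hypothetical, and the explicit handling of `(R)` / `(TM)` marks is an assumption inferred from the `CPU_INTEL_XEON_GOLD_6530` example.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLOT applies the documented LOT normalization rules in order.
func normalizeLOT(s string) string {
	// 4. Strip common prefixes like "MODEL:", "PN:".
	for _, p := range []string{"MODEL:", "PN:"} {
		s = strings.TrimSpace(strings.TrimPrefix(s, p))
	}
	// 1. Remove special characters ( ) - ® ™ (plus "(R)"/"(TM)" marks); spaces -> "_".
	r := strings.NewReplacer("(R)", "", "(TM)", "", "®", "", "™", "",
		"(", "", ")", "", "-", "", " ", "_")
	s = r.Replace(s)
	// 2. Uppercase all.
	s = strings.ToUpper(s)
	// 3. Collapse multiple underscores to one.
	for strings.Contains(s, "__") {
		s = strings.ReplaceAll(s, "__", "_")
	}
	return strings.Trim(s, "_")
}

func main() {
	fmt.Println(normalizeLOT("INTEL(R) XEON(R) GOLD 6530")) // INTEL_XEON_GOLD_6530
	fmt.Println(normalizeLOT("Great Wall"))                 // GREAT_WALL
}
```
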
### Status values

| Value | Meaning | Action |
|-------|---------|--------|
| `OK` | Normal | — |
| `Warning` | Degraded | Create `COMPONENT_WARNING` event (optional) |
| `Critical` | Failed | Auto-create `failure_event`, create `COMPONENT_FAILED` event |
| `Unknown` | Not determinable | Treat as working |
| `Empty` | Slot unpopulated | No component created (memory/PCIe only) |

### Missing field handling

| Field | Fallback |
|-------|----------|
| CPU serial | Generated: `{board_serial}-CPU-{socket}` |
| PCIe serial | Generated: `{board_serial}-PCIE-{slot}` |
| Other serial | Component skipped if absent |
| manufacturer (PCIe) | Looked up from `vendor_id` (8086→Intel, 10de→NVIDIA, 15b3→Mellanox…) |
| status | Treated as `Unknown` |
| firmware | No `FIRMWARE_CHANGED` event |

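The `vendor_id` fallback in the table above can be sketched as a small lookup. Only the three example mappings named in the table are shown; the real table is larger, and the helper name is hypothetical.

```go
package main

import "fmt"

// pciVendorName maps a PCI vendor ID to a manufacturer name, mirroring the
// fallback rule above. Unknown IDs return "" so the caller keeps the source value.
func pciVendorName(vendorID uint16) string {
	switch vendorID {
	case 0x8086:
		return "Intel"
	case 0x10de:
		return "NVIDIA"
	case 0x15b3:
		return "Mellanox"
	default:
		return ""
	}
}

func main() {
	// The export examples carry vendor_id in decimal: 32902 == 0x8086.
	fmt.Println(pciVendorName(32902))  // Intel
	fmt.Println(pciVendorName(0x10de)) // NVIDIA
}
```
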
@@ -1,89 +0,0 @@
# 08 — Build & Release

## CLI flags

Defined in `cmd/logpile/main.go`:

| Flag | Default | Description |
|------|---------|-------------|
| `--port` | `8082` | HTTP server port |
| `--file` | — | Reserved for archive preload (not active) |
| `--version` | — | Print version and exit |
| `--no-browser` | — | Do not open a browser on start |
| `--hold-on-crash` | `true` on Windows | Keep the console open on a fatal crash for debugging |

## Build

```bash
# Local binary (current OS/arch)
make build
# Output: bin/logpile

# Cross-platform binaries
make build-all
# Output:
#   bin/logpile-linux-amd64
#   bin/logpile-linux-arm64
#   bin/logpile-darwin-amd64
#   bin/logpile-darwin-arm64
#   bin/logpile-windows-amd64.exe
```

Both `make build` and `make build-all` run `scripts/update-pci-ids.sh --best-effort`
before compilation to sync `pci.ids` from the submodule.

To skip the PCI IDs update:
```bash
SKIP_PCI_IDS_UPDATE=1 make build
```

Build flags: `CGO_ENABLED=0` — fully static binary, no C runtime dependency.

## PCI IDs submodule

Source: `third_party/pciids` (git submodule → `github.com/pciutils/pciids`)
Local copy embedded at build time: `internal/parser/vendors/pciids/pci.ids`

```bash
# Manual update
make update-pci-ids

# Init submodule after fresh clone
git submodule update --init third_party/pciids
```

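The embedded `pci.ids` file is a plain-text database: vendor lines start at column 0 (`<id>  <name>`), device lines are indented by one tab, and subsystem lines by two. A minimal lookup over that format can be sketched as below; the real parser lives in `internal/parser`, and this sketch reads a tiny inline sample instead of the embedded file.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// A tiny excerpt in pci.ids format for illustration.
const sample = `
8086  Intel Corporation
	0b60  NVMe DC SSD [3DNAND, Sentinel Rock Controller]
10de  NVIDIA Corporation
`

// lookup resolves vendor and device names for 4-hex-digit IDs.
func lookup(db, vendor, device string) (vendorName, deviceName string) {
	var curVendor string
	sc := bufio.NewScanner(strings.NewReader(db))
	for sc.Scan() {
		line := sc.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, "\t\t") {
			continue // subsystem lines, ignored in this sketch
		}
		if strings.HasPrefix(line, "\t") {
			// "\t<device-id>  <device name>" under the current vendor
			if curVendor == vendor && strings.HasPrefix(line[1:], device+"  ") {
				deviceName = strings.TrimPrefix(line[1:], device+"  ")
			}
			continue
		}
		// "<vendor-id>  <vendor name>"
		curVendor = line[:4]
		if curVendor == vendor {
			vendorName = strings.TrimSpace(line[4:])
		}
	}
	return vendorName, deviceName
}

func main() {
	v, d := lookup(sample, "8086", "0b60")
	fmt.Println(v) // Intel Corporation
	fmt.Println(d) // NVMe DC SSD [3DNAND, Sentinel Rock Controller]
}
```
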
## Release process

```bash
scripts/release.sh
```

What it does:
1. Reads the version from `git describe --tags`
2. Validates a clean working tree (override: `ALLOW_DIRTY=1`)
3. Sets stable `GOPATH` / `GOCACHE` / `GOTOOLCHAIN` env
4. Creates the `releases/{VERSION}/` directory
5. Generates a `RELEASE_NOTES.md` template if not present
6. Builds `darwin-arm64` and `windows-amd64` binaries
7. Packages all binaries found in `bin/` as `.tar.gz` / `.zip`
8. Generates `SHA256SUMS.txt`
9. Prints the next steps (tag, push, create the release manually)

The release notes template is created in `releases/{VERSION}/RELEASE_NOTES.md`.

## Running

```bash
./bin/logpile
./bin/logpile --port 9090
./bin/logpile --no-browser
./bin/logpile --version
./bin/logpile --hold-on-crash   # keep console open on crash (default on Windows)
```

## macOS Gatekeeper

After downloading a binary, remove the quarantine attribute:

```bash
xattr -d com.apple.quarantine /path/to/logpile-darwin-arm64
```

@@ -1,43 +0,0 @@
# 09 — Testing

## Required before merge

```bash
go test ./...
```

All tests must pass before any change is merged.

## Where to add tests

| Change area | Test location |
|-------------|---------------|
| Collectors | `internal/collector/*_test.go` |
| HTTP handlers | `internal/server/*_test.go` |
| Exporters | `internal/exporter/*_test.go` |
| Parsers | `internal/parser/vendors/<vendor>/*_test.go` |

## Exporter tests

The Reanimator exporter has comprehensive coverage:

| Test file | Coverage |
|-----------|----------|
| `reanimator_converter_test.go` | Unit tests per conversion function |
| `reanimator_integration_test.go` | Full export with a realistic `AnalysisResult` |

Run exporter tests only:

```bash
go test ./internal/exporter/...
go test ./internal/exporter/... -v -run Reanimator
go test ./internal/exporter/... -cover
```

## Guidelines

- Prefer table-driven tests for parsing logic (multiple input variants).
- Do not rely on network access in unit tests.
- Test both the happy path and edge cases (missing fields, empty collections).
- When adding a new vendor parser, include at minimum:
  - a `Detect()` test with a positive and a negative sample file list.
  - a `Parse()` test with a minimal but representative archive.

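The table-driven pattern recommended above looks like this, applied to a toy `Detect()`-style scorer. The `detect` function and its filenames are hypothetical; in real parser tests the same table shape lives in a `*_test.go` file inside `func TestDetect(t *testing.T)` with `t.Errorf` instead of the print.

```go
package main

import "fmt"

// detect is a stand-in scorer: a confident vendor match returns a high score.
func detect(files []string) int {
	for _, f := range files {
		if f == "bmc/inventory.json" { // hypothetical vendor-specific marker file
			return 80
		}
	}
	return 0
}

type detectCase struct {
	name  string
	files []string
	want  int
}

var detectCases = []detectCase{
	{"positive sample", []string{"bmc/inventory.json", "sel.log"}, 80},
	{"negative sample", []string{"random.txt"}, 0},
}

func main() {
	for _, tc := range detectCases {
		got := detect(tc.files)
		status := "ok"
		if got != tc.want {
			status = "FAIL" // in a real test: t.Errorf(...)
		}
		fmt.Printf("%-16s got=%d want=%d %s\n", tc.name, got, tc.want, status)
	}
}
```

Adding a new input variant is then a one-line change to the table, which is why this style is preferred for parsers.
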
@@ -1,204 +0,0 @@
# 10 — Architectural Decision Log (ADL)

> **Rule:** Every significant architectural decision **must be recorded here** before or alongside
> the code change. This applies to humans and AI assistants alike.
>
> Format: date · title · context · decision · consequences

---

## ADL-001 — In-memory only state (no database)

**Date:** project start
**Context:** LOGPile is designed as a standalone diagnostic tool, not a persistent service.
**Decision:** All parsed/collected data lives in `Server.result` (in-memory). No database, no files written.
**Consequences:**
- Data is lost on process restart — intentional.
- Simple deployment: single binary, no setup required.
- JSON export is the persistence mechanism for users who want to save results.

---

## ADL-002 — Vendor parser auto-registration via init()

**Date:** project start
**Context:** Need an extensible parser registry without a central factory function.
**Decision:** Each vendor parser registers itself in its package's `init()` function.
`vendors/vendors.go` holds blank imports to trigger registration.
**Consequences:**
- Adding a new parser requires only: implement the interface + add one blank import.
- No central list to maintain (other than the import file).
- `go test ./...` will include new parsers automatically.

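The registration pattern, combined with the highest-score selection from ADL-003, can be sketched as below. The `Parser` interface and registry names are illustrative; the real registry lives in `internal/parser`.

```go
package main

import "fmt"

// Parser is a sketch of the parser interface: Detect returns a 0-100 confidence score.
type Parser interface {
	Name() string
	Detect(files []string) int
}

var registry []Parser

func Register(p Parser) { registry = append(registry, p) }

// genericParser is the always-on fallback with a deliberately low score.
type genericParser struct{}

func (genericParser) Name() string              { return "generic" }
func (genericParser) Detect(files []string) int { return 15 }

// In the real code each vendor package calls Register in its own init(),
// and vendors/vendors.go blank-imports those packages to trigger it.
func init() { Register(genericParser{}) }

// selectParser implements the ADL-003 rule: the highest Detect() score wins.
func selectParser(files []string) Parser {
	var best Parser
	bestScore := 0
	for _, p := range registry {
		if s := p.Detect(files); s > bestScore {
			best, bestScore = p, s
		}
	}
	return best
}

func main() {
	p := selectParser([]string{"unknown.log"})
	fmt.Println(p.Name()) // generic (no vendor parser scored higher)
}
```
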
---

## ADL-003 — Highest-confidence parser wins

**Date:** project start
**Context:** Multiple parsers may partially match an archive (e.g. generic + a specific vendor).
**Decision:** Run all parsers' `Detect()`, select the one returning the highest score (0–100).
**Consequences:**
- The generic fallback (score 15) only activates when no vendor parser scores higher.
- Parsers must be conservative with high scores (70+) to avoid false positives.

---

## ADL-004 — Canonical hardware.devices as single source of truth

**Date:** v1.5.0
**Context:** UI tabs and the Reanimator exporter were reading from different sub-fields of
`AnalysisResult`, causing potential drift.
**Decision:** Introduce `hardware.devices` as the canonical inventory repository.
All UI tabs and all exporters must read exclusively from this repository.
**Consequences:**
- Any UI vs Reanimator discrepancy is classified as a bug, not a "known difference".
- Deduplication logic runs once in the repository builder (serial → bdf → distinct).
- New hardware attributes must be added to the canonical schema first, then mapped to consumers.

---

## ADL-005 — No hardcoded PCI model strings; use pci.ids

**Date:** v1.5.0
**Context:** NVIDIA and other vendors release new GPU models frequently; hardcoded maps
required code changes for each new model ID.
**Decision:** Use the `pciutils/pciids` database (git submodule, embedded at build time).
PCI vendor/device ID → human-readable model name via lookup.
**Consequences:**
- New GPU models can be supported by updating `pci.ids` without code changes.
- `make build` auto-syncs `pci.ids` from the submodule before compilation.
- External override via the `LOGPILE_PCI_IDS_PATH` env var.

---

## ADL-006 — Reanimator export uses canonical hardware.devices (not raw sub-fields)

**Date:** v1.5.0
**Context:** The early Reanimator exporter read from `Hardware.GPUs`, `Hardware.NICs`, etc.
directly, diverging from UI data.
**Decision:** The Reanimator exporter must use `hardware.devices` — the same source as the UI.
The exporter groups/filters canonical records by section; it does not rebuild from sub-fields.
**Consequences:**
- Guarantees UI and export consistency.
- Exporter code is simpler — mainly a filter+map, not a data reconstruction.

---

## ADL-007 — Documentation language is English

**Date:** 2026-02-20
**Context:** Codebase documentation was mixed Russian/English, reducing clarity for
international contributors and AI assistants.
**Decision:** All maintained project documentation (`docs/bible/`, `README.md`,
`CLAUDE.md`, and new technical docs) must be written in English.
**Consequences:**
- The Bible is authoritative in English.
- AI assistants get consistent, unambiguous context.

---

## ADL-008 — Bible is the single source of truth for architecture docs

**Date:** 2026-02-23
**Context:** Architecture information was duplicated across `README.md`, `CLAUDE.md`,
and the Bible, creating drift risk and stale guidance for humans and AI agents.
**Decision:** Keep architecture and technical design documentation only in `docs/bible/`.
Top-level `README.md` and `CLAUDE.md` must remain minimal pointers/instructions.
**Consequences:**
- Reduces documentation drift and duplicate updates.
- AI assistants are directed to one authoritative source before making changes.
- Documentation updates that affect architecture must include Bible changes (and ADL entries when significant).

---

## ADL-009 — Redfish analysis is performed from raw snapshot replay (unified tunnel)

**Date:** 2026-02-24
**Context:** Live Redfish collection and raw export re-analysis used different parsing paths,
which caused drift and made bug fixes difficult to validate consistently.
**Decision:** Redfish live collection must produce a `raw_payloads.redfish_tree` snapshot first,
then run the same replay analyzer used for imported raw exports.
**Consequences:**
- The same `redfish_tree` input produces the same parsed result in live and offline modes.
- Debugging parser issues can be done against exported raw bundles without live BMC access.
- Snapshot completeness becomes critical; collector seeds/limits are part of analyzer correctness.

---

## ADL-010 — Raw export is a self-contained re-analysis package (not a final result dump)

**Date:** 2026-02-24
**Context:** Exporting only the normalized `AnalysisResult` loses raw source fidelity and prevents
future parser improvements from being applied to already collected data.
**Decision:** `Export Raw Data` produces a self-contained raw package (JSON or ZIP bundle)
that the application can reopen and re-analyze. Parsed data in the package is optional and not
the source of truth on import.
**Consequences:**
- Re-opening an export always re-runs analysis from the raw source (`redfish_tree` or uploaded file bytes).
- Raw bundles include collection context and diagnostics for debugging (`collect.log`, `parser_fields.json`).
- Endpoint compatibility is preserved (`/api/export/json`) while the actual payload format may be a bundle.

---

## ADL-011 — Redfish snapshot crawler is bounded, prioritized, and failure-tolerant

**Date:** 2026-02-24
**Context:** Full Redfish trees on modern GPU systems are large, noisy, and contain many
vendor-specific or non-fetchable links. Unbounded crawling and naive queue design caused hangs
and incomplete snapshots.
**Decision:** Use a bounded snapshot crawler with:
- an explicit document cap (`LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`)
- priority seed paths (PCIe/Fabrics/Firmware/Storage/PowerSubsystem/ThermalSubsystem)
- normalized `@odata.id` paths (strip `#fragment`)
- filtering of noisy expected errors (404/405/410/501 hidden from the UI)
- queue capacity sized to the crawl cap to avoid producer/consumer deadlock
**Consequences:**
- Snapshot collection remains stable on large BMC trees.
- Most high-value inventory paths are reached before the cap.
- UI progress remains useful while debug logs retain low-level fetch failures.

---

## ADL-012 — Vendor-specific storage inventory probing is allowed as fallback

**Date:** 2026-02-24
**Context:** Some Supermicro BMCs expose empty standard `Storage/.../Drives` collections while
real disk inventory exists under vendor-specific `Disk.Bay` endpoints and enclosure links.
**Decision:** When standard drive collections are empty, the collector/replay may probe vendor-style
`.../Drives/Disk.Bay.*` endpoints and follow `Storage.Links.Enclosures[*]` to recover physical drives.
**Consequences:**
- Higher storage inventory coverage on Supermicro HBA/HA-RAID/MRVL/NVMe backplane implementations.
- Replay must mirror the same probing behavior to preserve deterministic results.
- Probing remains bounded (finite candidate set) to avoid runaway requests.

---

## ADL-013 — PowerSubsystem is preferred over legacy Power on newer Redfish implementations

**Date:** 2026-02-24
**Context:** X14+ and newer Redfish implementations increasingly expose authoritative PSU data in
`PowerSubsystem/PowerSupplies`, while legacy `/Power` may be incomplete or schema-shifted.
**Decision:** Prefer `Chassis/*/PowerSubsystem/PowerSupplies` as the primary PSU source and use
legacy `Chassis/*/Power` as fallback.
**Consequences:**
- Better compatibility with newer BMC firmware generations.
- Legacy systems remain supported without special-case collector selection.
- Snapshot priority seeds must include `PowerSubsystem` resources.

---

## ADL-014 — Threshold logic lives on the server; UI reflects status only

**Date:** 2026-02-24
**Context:** Duplicating threshold math in the frontend and backend creates drift and inconsistent
highlighting (e.g. PSU mains voltage range checks).
**Decision:** Business threshold evaluation (e.g. the PSU voltage nominal range) must be computed on
the server; the frontend only renders status/flags returned by the API.
**Consequences:**
- Single source of truth for threshold policies.
- UI can evolve visually without re-implementing domain logic.
- API payloads may carry richer status semantics over time.

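The decision can be illustrated with a server-side check the UI merely renders. Both the helper name and the nominal mains range below are assumptions for illustration, not the project's actual policy values.

```go
package main

import "fmt"

// psuVoltageStatus computes a status flag on the server; the frontend only
// displays the returned string. The 180-264 V range is an assumed example.
func psuVoltageStatus(inputVoltage float64) string {
	const lo, hi = 180.0, 264.0
	if inputVoltage < lo || inputVoltage > hi {
		return "Warning"
	}
	return "OK"
}

func main() {
	fmt.Println(psuVoltageStatus(215.25)) // OK (value from the PSU example payload)
	fmt.Println(psuVoltageStatus(120.0))  // Warning (outside the assumed range)
}
```
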
---

<!-- Add new decisions below this line using the format above -->

@@ -1,59 +0,0 @@
# LOGPile Bible

> **Documentation language:** English only. All maintained project documentation must be written in English.
>
> **Architectural decisions:** Every significant architectural decision **must** be recorded in
> [`10-decisions.md`](10-decisions.md) before or alongside the code change.
>
> **Single source of truth:** Architecture and technical design documentation belongs in `docs/bible/`.
> Keep `README.md` and `CLAUDE.md` minimal to avoid duplicate documentation.

This directory is the single source of truth for LOGPile's architecture, design, and integration contracts.
It is structured so that both humans and AI assistants can navigate it quickly.

---

## Reading Map (Hierarchical)

### 1. Foundations (read first)

| File | What it covers |
|------|----------------|
| [01-overview.md](01-overview.md) | Product purpose, operating modes, scope |
| [02-architecture.md](02-architecture.md) | Runtime structure, control flow, in-memory state |
| [04-data-models.md](04-data-models.md) | Core contracts (`AnalysisResult`, canonical `hardware.devices`) |

### 2. Runtime Interfaces

| File | What it covers |
|------|----------------|
| [03-api.md](03-api.md) | HTTP API contracts and endpoint behavior |
| [05-collectors.md](05-collectors.md) | Live collection connectors (Redfish, IPMI mock) |
| [06-parsers.md](06-parsers.md) | Archive parser framework and vendor parsers |
| [07-exporters.md](07-exporters.md) | CSV / JSON / Reanimator exports and integration mapping |

### 3. Delivery & Quality

| File | What it covers |
|------|----------------|
| [08-build-release.md](08-build-release.md) | Build, packaging, release workflow |
| [09-testing.md](09-testing.md) | Testing expectations and verification guidance |

### 4. Governance (always current)

| File | What it covers |
|------|----------------|
| [10-decisions.md](10-decisions.md) | Architectural Decision Log (ADL) |

---

## Quick orientation for AI assistants

- Read order for most changes: `01` → `02` → `04` → relevant interface doc(s) → `10`
- Entry point: `cmd/logpile/main.go`
- HTTP server: `internal/server/` — handlers in `handlers.go`, routes in `server.go`
- Data contracts: `internal/models/` — never break the `AnalysisResult` JSON shape
- Frontend contract: `web/static/js/app.js` — keep API responses stable
- Canonical inventory: `hardware.devices` in `AnalysisResult` — the source of truth for UI and exports
- Parser registry: `internal/parser/vendors/` — `init()` auto-registration pattern
- Collector registry: `internal/collector/registry.go`