docs: refresh project documentation

This commit is contained in:
Mikhail Chusavitin
2026-03-15 16:35:16 +03:00
parent 47bb0ee939
commit 0acdc2b202
14 changed files with 508 additions and 1224 deletions

@@ -1,35 +1,43 @@
# 01 — Overview
## What is LOGPile?
## Purpose
LOGPile is a standalone Go application for BMC (Baseboard Management Controller)
diagnostics analysis with an embedded web UI.
It runs as a single binary with no external file dependencies.
LOGPile is a standalone Go application for BMC diagnostics analysis with an embedded web UI.
It runs as a single binary and normalizes hardware data from archives or live Redfish collection.
## Operating modes
| Mode | Entry point | Description |
|------|-------------|-------------|
| **Offline / archive** | `POST /api/upload` | Upload a vendor diagnostic archive or a JSON snapshot; parse and display in UI |
| **Live / Redfish** | `POST /api/collect` | Connect to a live BMC via Redfish API, collect hardware inventory, display and export |
| Mode | Entry point | Outcome |
|------|-------------|---------|
| Archive upload | `POST /api/upload` | Parse a supported archive, raw export bundle, or JSON snapshot into `AnalysisResult` |
| Live collection | `POST /api/collect` | Collect from a live BMC via Redfish and store the result in memory |
| Batch convert | `POST /api/convert` | Convert multiple supported input files into Reanimator JSON in a ZIP artifact |
Both modes produce the same in-memory `AnalysisResult` structure and expose it
through the same API and UI.
All modes converge on the same normalized hardware model and exporter pipeline.
## Key capabilities
## In scope
- Single self-contained binary with embedded HTML/JS/CSS (no static file serving required).
- Vendor archive parsing: Inspur/Kaytus, Dell TSR, NVIDIA HGX Field Diagnostics,
NVIDIA Bug Report, Unraid, XigmaNAS, Generic text fallback.
- Live Redfish collection with async progress tracking.
- Normalized hardware inventory: CPU / RAM / Storage / GPU / PSU / NIC / PCIe / Firmware.
- Raw `redfish_tree` snapshot stored in `RawPayloads` for future offline re-analysis.
- Re-upload of a JSON snapshot for offline work (`/api/upload` accepts `AnalysisResult` JSON).
- Export in CSV, JSON (full `AnalysisResult`), and Reanimator format.
- PCI device model resolution via embedded `pci.ids` (no hardcoded model strings).
- Single-binary desktop/server utility with embedded UI
- Vendor archive parsing and live Redfish collection
- Canonical hardware inventory across UI and exports
- Reopenable raw export bundles for future re-analysis
- Reanimator export and batch conversion workflows
- Embedded `pci.ids` lookup for vendor/device name enrichment
## Non-goals (current scope)
## Current vendor coverage
- No persistent storage — all state is in-memory per process lifetime.
- IPMI collector is a mock scaffold only; real IPMI support is not implemented.
- No authentication layer on the HTTP server.
- Dell TSR
- H3C SDS G5/G6
- Inspur / Kaytus
- NVIDIA HGX Field Diagnostics
- NVIDIA Bug Report
- Unraid
- XigmaNAS
- Generic fallback parser
## Non-goals
- Persistent storage or multi-user state
- Production IPMI collection
- Authentication/authorization on the built-in HTTP server
- Long-term server-side job history beyond in-memory process lifetime

@@ -2,114 +2,85 @@
## Runtime stack
| Layer | Technology |
|-------|------------|
| Layer | Implementation |
|-------|----------------|
| Language | Go 1.22+ |
| HTTP | `net/http`, `http.ServeMux` |
| UI | Embedded via `//go:embed` in `web/embed.go` (templates + static assets) |
| State | In-memory only — no database |
| Build | `CGO_ENABLED=0`, single static binary |
| HTTP | `net/http` + `http.ServeMux` |
| UI | Embedded templates and static assets via `go:embed` |
| State | In-memory only |
| Build | `CGO_ENABLED=0`, single binary |
Default port: **8082**
Default port: `8082`
## Directory structure
## Code map
```
cmd/logpile/main.go # Binary entry point, CLI flag parsing
internal/
collector/ # Live data collectors
registry.go # Collector registration
redfish.go # Redfish connector (real implementation)
ipmi_mock.go # IPMI mock connector (scaffold)
types.go # Connector request/progress contracts
parser/ # Archive parsers
parser.go # BMCParser (dispatcher) + parse orchestration
archive.go # Archive extraction helpers
registry.go # Parser registry + detect/selection
interface.go # VendorParser interface
vendors/ # Vendor-specific parser modules
vendors.go # Import-side-effect registrations
dell/
inspur/
nvidia/
nvidia_bug_report/
unraid/
xigmanas/
generic/
pciids/ # PCI IDs lookup (embedded pci.ids)
server/ # HTTP layer
server.go # Server struct, route registration
handlers.go # All HTTP handler functions
exporter/ # Export formatters
exporter.go # CSV + JSON exporters
reanimator_models.go
reanimator_converter.go
models/ # Shared data contracts
web/
embed.go # go:embed directive
templates/ # HTML templates
static/ # JS / CSS
js/app.js # Frontend — API contract consumer
```text
cmd/logpile/main.go entrypoint and CLI flags
internal/server/ HTTP handlers, jobs, upload/export flows
internal/collector/ live collection and Redfish replay
internal/analyzer/ shared analysis helpers
internal/parser/ archive extraction and parser dispatch
internal/exporter/ CSV and Reanimator conversion
internal/models/ stable data contracts
web/ embedded UI assets
```
## In-memory state
## Server state
The `Server` struct in `internal/server/server.go` holds:
`internal/server.Server` stores:
| Field | Type | Description |
|-------|------|-------------|
| `result` | `*models.AnalysisResult` | Current parsed/collected dataset |
| `detectedVendor` | `string` | Vendor identifier from last parse |
| `jobManager` | `*JobManager` | Tracks live collect job status/logs |
| `collectors` | `*collector.Registry` | Registered live collection connectors |
| Field | Purpose |
|------|---------|
| `result` | Current `AnalysisResult` shown in UI and used by exports |
| `detectedVendor` | Parser/collector identity for the current dataset |
| `rawExport` | Reopenable raw-export package associated with current result |
| `jobManager` | Shared async job state for collect and convert flows |
| `collectors` | Registered live collectors (`redfish`, `ipmi`) |
| `convertOutput` | Temporary ZIP artifacts for batch convert downloads |
State is replaced atomically on successful upload or collect.
On a failed/canceled collect, the previous `result` is preserved unchanged.
State is replaced only on successful upload or successful live collection.
Failed or canceled jobs do not overwrite the previous dataset.
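The replace-on-success rule can be sketched as a mutex-guarded holder that swaps the dataset only when a job finishes successfully. `datasetHolder`, `jobOutcome`, `analysisResult`, and `commit` are hypothetical names for illustration. They are not the actual `internal/server.Server` fields.

```go
package main

import "sync"

// jobOutcome mirrors the terminal job states (illustrative only).
type jobOutcome string

const (
	outcomeSuccess  jobOutcome = "success"
	outcomeFailed   jobOutcome = "failed"
	outcomeCanceled jobOutcome = "canceled"
)

// analysisResult is a hypothetical stand-in for models.AnalysisResult.
type analysisResult struct {
	Filename string
}

// datasetHolder is a hypothetical stand-in for the dataset state on Server.
type datasetHolder struct {
	mu     sync.RWMutex
	result *analysisResult
}

// commit replaces the current dataset only for a successful job with a result.
// Failed or canceled jobs leave the previous dataset untouched.
// It reports whether the dataset was replaced.
func (h *datasetHolder) commit(outcome jobOutcome, res *analysisResult) bool {
	if outcome != outcomeSuccess || res == nil {
		return false
	}
	h.mu.Lock()
	defer h.mu.Unlock()
	h.result = res
	return true
}

// current returns the dataset shown in the UI and used by exports.
func (h *datasetHolder) current() *analysisResult {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.result
}
```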
## Upload flow (`POST /api/upload`)
## Main flows
```
multipart form field: "archive"
├─ file looks like JSON?
│ └─ parse as models.AnalysisResult snapshot → store in Server.result
└─ otherwise
└─ parser.NewBMCParser().ParseFromReader(...)
├─ try all registered vendor parsers (highest confidence wins)
└─ result → store in Server.result
```
### Upload
## Live collect flow (`POST /api/collect`)
1. `POST /api/upload` receives multipart field `archive`
2. JSON inputs are checked for raw-export package or `AnalysisResult` snapshot
3. Non-JSON inputs go through `parser.BMCParser`
4. Archive metadata is normalized onto `AnalysisResult`
5. Result becomes the current in-memory dataset
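The JSON/non-JSON split in steps 2 and 3 could be decided with a small heuristic like the one below. The BOM handling and the "starts with `{` and is valid JSON" test are assumptions of this sketch. The helpers `looksLikeJSONObject` and `routeUpload` are hypothetical, and telling a raw-export package apart from an `AnalysisResult` snapshot is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// utf8BOM is stripped before inspecting the payload.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// looksLikeJSONObject treats an upload as JSON when, after an optional BOM and
// leading whitespace, it starts with '{' and the whole payload is valid JSON.
func looksLikeJSONObject(data []byte) bool {
	data = bytes.TrimPrefix(data, utf8BOM)
	trimmed := bytes.TrimLeft(data, " \t\r\n")
	if len(trimmed) == 0 || trimmed[0] != '{' {
		return false
	}
	return json.Valid(trimmed)
}

// routeUpload returns which path an upload would take in this sketch.
func routeUpload(data []byte) string {
	if looksLikeJSONObject(data) {
		return "json" // raw-export package or AnalysisResult snapshot
	}
	return "parser" // handed to the archive parser dispatcher
}
```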
```
validate request (host / protocol / port / username / auth_type / tls_mode)
└─ launch async job
├─ progress callback → job log (queryable via GET /api/collect/{id})
├─ success:
│ set source metadata (source_type=api, protocol, host, date)
│ store result in Server.result
└─ failure / cancel:
previous Server.result unchanged
```
### Live collect
Job lifecycle states: `queued → running → success | failed | canceled`
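The lifecycle above can be expressed as a transition table. This is a sketch: `transition` and `isTerminal` are hypothetical helpers, and allowing a queued job to be canceled before it starts is an assumption not stated in these docs.

```go
package main

import "fmt"

// jobStatus values follow the documented lifecycle.
type jobStatus string

const (
	statusQueued   jobStatus = "queued"
	statusRunning  jobStatus = "running"
	statusSuccess  jobStatus = "success"
	statusFailed   jobStatus = "failed"
	statusCanceled jobStatus = "canceled"
)

// allowedTransitions encodes queued -> running -> success | failed | canceled.
// queued -> canceled is an assumption of this sketch.
var allowedTransitions = map[jobStatus][]jobStatus{
	statusQueued:  {statusRunning, statusCanceled},
	statusRunning: {statusSuccess, statusFailed, statusCanceled},
}

// isTerminal reports whether no further transitions are allowed.
func isTerminal(s jobStatus) bool {
	return len(allowedTransitions[s]) == 0
}

// transition validates a status change and returns the resulting status.
func transition(from, to jobStatus) (jobStatus, error) {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("invalid job transition %s -> %s", from, to)
}
```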
1. `POST /api/collect` validates request fields
2. Server creates an async job and returns `202 Accepted`
3. Selected collector gathers raw data
4. For Redfish, collector saves `raw_payloads.redfish_tree`
5. Result is normalized, source metadata applied, and state replaced on success
### Batch convert
1. `POST /api/convert` accepts multiple files
2. Each supported file is analyzed independently
3. Successful results are converted to Reanimator JSON
4. Outputs are packaged into a temporary ZIP artifact
5. Client polls job status and downloads the artifact when ready
## Redfish design rule
Live Redfish collection and offline Redfish re-analysis must use the same replay path.
The collector first captures `raw_payloads.redfish_tree`, then the replay logic builds the normalized result.
## PCI IDs lookup
Load/override order (`LOGPILE_PCI_IDS_PATH` has highest priority because it is loaded last):
Lookup order:
1. Embedded `internal/parser/vendors/pciids/pci.ids` (base dataset compiled into binary)
1. Embedded `internal/parser/vendors/pciids/pci.ids`
2. `./pci.ids`
3. `/usr/share/hwdata/pci.ids`
4. `/usr/share/misc/pci.ids`
5. `/opt/homebrew/share/pciids/pci.ids`
6. Paths from `LOGPILE_PCI_IDS_PATH` (colon-separated on Unix; loaded last, so they override earlier entries for the same IDs)
6. Extra paths from `LOGPILE_PCI_IDS_PATH`
This means unknown GPU/NIC model strings can be updated by refreshing `pci.ids`
without any code change.
Later sources override earlier ones for the same IDs.

@@ -2,38 +2,37 @@
## Conventions
- All endpoints under `/api/`.
- Request bodies: `application/json` or `multipart/form-data` where noted.
- Responses: `application/json` unless file download.
- Export filenames follow the pattern: `YYYY-MM-DD (SERVER MODEL) - SERVER SN.<ext>`
- All endpoints are under `/api/`
- JSON responses are used unless the endpoint downloads a file
- Async jobs share the same status model: `queued`, `running`, `success`, `failed`, `canceled`
- Export filenames use `YYYY-MM-DD (MODEL) - SERIAL.<ext>` when board metadata exists
---
## Upload & Data Input
## Input endpoints
### `POST /api/upload`
Upload a vendor diagnostic archive or a JSON snapshot.
**Request:** `multipart/form-data`, field name `archive`.
Server-side multipart limit: **100 MiB**.
Uploads one file in multipart field `archive`.
Accepted inputs:
- `.tar`, `.tar.gz`, `.tgz` — vendor diagnostic archives
- `.txt` — plain text files
- JSON file containing a serialized `AnalysisResult` — re-loaded as-is
- supported archive/log formats from the parser registry
- `.json` `AnalysisResult` snapshots
- raw-export JSON packages
- raw-export ZIP bundles
**Response:** `200 OK` with parsed result summary, or `4xx`/`5xx` on error.
Result:
- parses or replays the input
- stores the result as current in-memory state
- returns parsed summary JSON
---
## Live Collection
Related helper:
- `GET /api/file-types` returns `archive_extensions`, `upload_extensions`, and `convert_extensions`
### `POST /api/collect`
Start a live collection job (`redfish` or `ipmi`).
Starts a live collection job.
Request body:
**Request body:**
```json
{
"host": "bmc01.example.local",
@@ -47,138 +46,125 @@ Start a live collection job (`redfish` or `ipmi`).
```
Supported values:
- `protocol`: `redfish` | `ipmi`
- `auth_type`: `password` | `token`
- `tls_mode`: `strict` | `insecure`
- `protocol`: `redfish` or `ipmi`
- `auth_type`: `password` or `token`
- `tls_mode`: `strict` or `insecure`
**Response:** `202 Accepted`
```json
{
"job_id": "job_a1b2c3d4e5f6",
"status": "queued",
"message": "Collection job accepted",
"created_at": "2026-02-23T12:00:00Z"
}
```
Validation behavior:
- `400 Bad Request` for invalid JSON
- `422 Unprocessable Entity` for semantic validation errors (missing/invalid fields)
Responses:
- `202` on accepted job creation
- `400` on malformed JSON
- `422` on validation errors
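The 400-versus-422 split can be illustrated with a validator that first decodes and then checks semantics. The field names come from the request description in these docs. The credential JSON keys (`password`, `token`), the requirement of a username for password auth, and the port range check are assumptions, and `validateCollect` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"net/http"
	"strings"
)

// collectRequest mirrors the documented /api/collect fields.
type collectRequest struct {
	Host     string `json:"host"`
	Protocol string `json:"protocol"`
	Port     int    `json:"port"`
	Username string `json:"username"`
	AuthType string `json:"auth_type"`
	Password string `json:"password"` // assumed key
	Token    string `json:"token"`    // assumed key
	TLSMode  string `json:"tls_mode"`
}

// validateCollect returns the status a handler would use:
// 400 for malformed JSON, 422 with messages for semantic errors, 202 otherwise.
func validateCollect(body []byte) (int, []string) {
	var req collectRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return http.StatusBadRequest, []string{"invalid JSON"}
	}
	var problems []string
	if strings.TrimSpace(req.Host) == "" {
		problems = append(problems, "host is required")
	}
	if req.Protocol != "redfish" && req.Protocol != "ipmi" {
		problems = append(problems, "protocol must be redfish or ipmi")
	}
	if req.Port < 1 || req.Port > 65535 {
		problems = append(problems, "port must be 1-65535")
	}
	switch req.AuthType {
	case "password":
		if req.Username == "" || req.Password == "" {
			problems = append(problems, "username and password are required")
		}
	case "token":
		if req.Token == "" {
			problems = append(problems, "token is required")
		}
	default:
		problems = append(problems, "auth_type must be password or token")
	}
	if req.TLSMode != "strict" && req.TLSMode != "insecure" {
		problems = append(problems, "tls_mode must be strict or insecure")
	}
	if len(problems) > 0 {
		return http.StatusUnprocessableEntity, problems
	}
	return http.StatusAccepted, nil
}
```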
### `GET /api/collect/{id}`
Poll job status and progress log.
**Response:**
```json
{
"job_id": "job_a1b2c3d4e5f6",
"status": "running",
"progress": 55,
"logs": ["..."],
"created_at": "2026-02-23T12:00:00Z",
"updated_at": "2026-02-23T12:00:10Z"
}
```
Status values: `queued` | `running` | `success` | `failed` | `canceled`
Returns async collection job status, progress, timestamps, and accumulated logs.
### `POST /api/collect/{id}/cancel`
Cancel a running job.
Requests cancellation for a running collection job.
---
### `POST /api/convert`
## Data Queries
Starts a batch conversion job that accepts multiple files under `files[]` or `files`.
Each supported file is parsed independently and converted to Reanimator JSON.
Response fields:
- `job_id`
- `status`
- `accepted`
- `skipped`
- `total_files`
### `GET /api/convert/{id}`
Returns batch convert job status using the same async job envelope as collection.
### `GET /api/convert/{id}/download`
Downloads the ZIP artifact produced by a successful convert job.
## Read endpoints
### `GET /api/status`
Returns source metadata for the current dataset.
If nothing is loaded, response is `{ "loaded": false }`.
```json
{
"loaded": true,
"filename": "redfish://bmc01.example.local",
"vendor": "redfish",
"source_type": "api",
"protocol": "redfish",
"target_host": "bmc01.example.local",
"collected_at": "2026-02-10T15:30:00Z",
"stats": { "events": 0, "sensors": 0, "fru": 0 }
}
```
`source_type`: `archive` | `api`
When no dataset is loaded, response is `{ "loaded": false }`.
Typical fields:
- `loaded`
- `filename`
- `vendor`
- `source_type`
- `protocol`
- `target_host`
- `source_timezone`
- `collected_at`
- `stats`
### `GET /api/config`
Returns source metadata plus:
Returns the main UI configuration payload, including:
- source metadata
- `hardware.board`
- `hardware.firmware`
- canonical `hardware.devices`
- computed `specification` summary lines
- computed specification lines
### `GET /api/events`
Returns parsed diagnostic events.
Returns events sorted newest first.
### `GET /api/sensors`
Returns sensor readings (temperatures, voltages, fan speeds).
Returns parsed sensors plus synthesized PSU voltage sensors when telemetry is available.
### `GET /api/serials`
Returns serial numbers built from canonical `hardware.devices`.
Returns serial-oriented inventory built from canonical devices.
### `GET /api/firmware`
Returns firmware versions built from canonical `hardware.devices`.
Returns firmware-oriented inventory built from canonical devices.
### `GET /api/parse-errors`
Returns normalized parse and collection issues combined from:
- Redfish fetch errors in `raw_payloads`
- raw-export collect logs
- derived partial-inventory warnings
### `GET /api/parsers`
Returns the list of registered vendor parsers with their identifiers.
Returns registered parser metadata.
---
### `GET /api/file-types`
## Export
Returns supported file extensions for upload and batch convert.
## Export endpoints
### `GET /api/export/csv`
Download serial numbers as CSV.
Downloads serial-number CSV.
### `GET /api/export/json`
Download full `AnalysisResult` as JSON (includes `raw_payloads`).
Downloads a raw-export artifact for reopen and re-analysis.
Current implementation emits a ZIP bundle containing:
- `raw_export.json`
- `collect.log`
- `parser_fields.json`
### `GET /api/export/reanimator`
Download hardware data in Reanimator format for asset tracking integration.
See [`07-exporters.md`](07-exporters.md) for full format spec.
Downloads Reanimator JSON built from the current normalized result.
---
## Management
## Management endpoints
### `DELETE /api/clear`
Clear current in-memory dataset.
Clears current in-memory dataset, raw export state, and temporary convert artifacts.
### `POST /api/shutdown`
Gracefully shut down the server process.
This endpoint terminates the current process after responding.
---
## Source metadata fields
Fields present in `/api/status` and `/api/config`:
| Field | Values |
|-------|--------|
| `source_type` | `archive` \| `api` |
| `protocol` | `redfish` \| `ipmi` (may be empty for archive uploads) |
| `target_host` | IP or hostname |
| `collected_at` | RFC3339 timestamp |
Gracefully shuts down the process after responding.

@@ -1,104 +1,87 @@
# 04 — Data Models
## AnalysisResult
## Core contract: `AnalysisResult`
`internal/models/` — the central data contract shared by parsers, collectors, exporters, and the HTTP layer.
`internal/models/models.go` defines the shared result passed between parsers, collectors, server handlers, and exporters.
**Stability rule:** Never break the JSON shape of `AnalysisResult`.
Backward-compatible additions are allowed; removals or renames are not.
Stability rule:
- do not rename or remove JSON fields from `AnalysisResult`
- additive fields are allowed
- UI and exporter compatibility depends on this shape remaining stable
Key top-level fields:
Key fields:
| Field | Type | Description |
|-------|------|-------------|
| `filename` | `string` | Uploaded filename or generated live source identifier |
| `source_type` | `string` | `archive` or `api` |
| `protocol` | `string` | `redfish`, `ipmi`, or empty for archive uploads |
| `target_host` | `string` | BMC host for live collection |
| `collected_at` | `time.Time` | Upload/collection timestamp |
| `hardware` | `*HardwareConfig` | All normalized hardware inventory |
| `events` | `[]Event` | Diagnostic events from parsers |
| `fru` | `[]FRUInfo` | FRU/SDR-derived inventory details |
| `sensors` | `[]SensorReading` | Sensor readings |
| `raw_payloads` | `map[string]any` | Raw vendor data (e.g. `redfish_tree`) |
| Field | Meaning |
|------|---------|
| `filename` | Original upload name or synthesized live source name |
| `source_type` | `archive` or `api` |
| `protocol` | `redfish`, `ipmi`, or empty for archive uploads |
| `target_host` | Hostname or IP for live collection |
| `source_timezone` | Source timezone/offset if known |
| `collected_at` | Canonical collection/upload time |
| `raw_payloads` | Raw source data used for replay or diagnostics |
| `events` | Parsed event timeline |
| `fru` | FRU-derived inventory details |
| `sensors` | Sensor readings |
| `hardware` | Normalized hardware inventory |
`raw_payloads` is the durable source for offline re-analysis (especially for Redfish).
Normalized fields should be treated as derivable output from raw source data.
## `HardwareConfig`
### Hardware sub-structure
Main sections:
```
HardwareConfig
├── board BoardInfo — server/motherboard identity
├── devices []HardwareDevice — CANONICAL INVENTORY (see below)
├── cpus []CPU
├── memory []MemoryDIMM
├── storage []Storage
├── volumes []StorageVolume — logical RAID/VROC volumes
├── pcie_devices []PCIeDevice
├── gpus []GPU
├── network_adapters []NetworkAdapter
├── network_cards []NIC (legacy/alternate source field)
├── power_supplies []PSU
└── firmware []FirmwareInfo
```text
hardware.board
hardware.devices
hardware.cpus
hardware.memory
hardware.storage
hardware.volumes
hardware.pcie_devices
hardware.gpus
hardware.network_adapters
hardware.network_cards
hardware.power_supplies
hardware.firmware
```
---
`network_cards` is legacy/alternate source data.
`hardware.devices` is the canonical cross-section inventory.
## Canonical Device Repository (`hardware.devices`)
## Canonical inventory: `hardware.devices`
`hardware.devices` is the **single source of truth** for hardware inventory.
`hardware.devices` is the single source of truth for device-oriented UI and Reanimator export.
### Rules — must not be violated
Required rules:
1. All UI tabs displaying hardware components **must read from `hardware.devices`**.
2. The Device Inventory tab shows kinds: `pcie`, `storage`, `gpu`, `network`.
3. The Reanimator exporter **must use the same `hardware.devices`** as the UI.
4. Any discrepancy between UI data and Reanimator export data is a **bug**.
5. New hardware attributes must be added to the canonical device schema **first**,
then mapped to Reanimator/UI — never the other way around.
6. The exporter should group/filter canonical records by section, not rebuild data
from multiple sources.
1. UI hardware views must read from `hardware.devices`
2. Reanimator conversion must derive device sections from `hardware.devices`
3. UI/export mismatches are bugs, not accepted divergence
4. New shared device fields belong in `HardwareDevice` first
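Grouping by section from a single slice is what keeps UI views and Reanimator output in agreement. A minimal sketch follows. `device` is a hypothetical trimmed stand-in for `HardwareDevice`, and `groupByKind` is illustrative rather than the exporter's actual code.

```go
package main

// device is a hypothetical trimmed stand-in for models.HardwareDevice.
type device struct {
	Kind   string // e.g. "pcie", "storage", "gpu", "network"
	Model  string
	Serial string
}

// groupByKind partitions canonical devices into sections without consulting
// any other source, preserving input order within each section.
func groupByKind(devices []device) map[string][]device {
	sections := make(map[string][]device)
	for _, d := range devices {
		sections[d.Kind] = append(sections[d.Kind], d)
	}
	return sections
}
```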
### Deduplication logic (applied once by repository builder)
Deduplication priority:
| Priority | Key used |
|----------|----------|
| 1 | `serial_number` — usable (not empty, not `N/A`, `NA`, `NONE`, `NULL`, `UNKNOWN`, `-`) |
| 2 | `bdf` — PCI Bus:Device.Function address |
| 3 | No merge — records remain distinct if both serial and bdf are absent |
| Priority | Key |
|----------|-----|
| 1 | usable `serial_number` |
| 2 | `bdf` |
| 3 | keep records separate |
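The priority table translates into a key function: a usable serial first, then BDF, otherwise no key at all. In this sketch `hwDevice`, `dedupKey`, and `dedupe` are hypothetical. Case-insensitive placeholder matching, lowercasing BDFs, and keeping the first record (rather than merging fields) are simplifying assumptions.

```go
package main

import "strings"

// hwDevice is a hypothetical trimmed record.
type hwDevice struct {
	Serial string
	BDF    string
	Model  string
}

// placeholderSerials are values that do not count as a usable serial.
var placeholderSerials = map[string]bool{
	"N/A": true, "NA": true, "NONE": true, "NULL": true, "UNKNOWN": true, "-": true,
}

// usableSerial reports whether s can serve as a merge key.
func usableSerial(s string) bool {
	s = strings.ToUpper(strings.TrimSpace(s))
	return s != "" && !placeholderSerials[s]
}

// dedupKey returns the merge key by priority: serial, then BDF.
// An empty key means the record stays separate.
func dedupKey(d hwDevice) string {
	if usableSerial(d.Serial) {
		return "sn:" + strings.TrimSpace(d.Serial)
	}
	if bdf := strings.TrimSpace(d.BDF); bdf != "" {
		return "bdf:" + strings.ToLower(bdf)
	}
	return ""
}

// dedupe keeps the first record per key; records without a key are all kept.
func dedupe(in []hwDevice) []hwDevice {
	seen := make(map[string]bool)
	var out []hwDevice
	for _, d := range in {
		if key := dedupKey(d); key != "" {
			if seen[key] {
				continue
			}
			seen[key] = true
		}
		out = append(out, d)
	}
	return out
}
```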
### Device schema alignment
## Raw payloads
Keep `hardware.devices` schema as close as possible to Reanimator JSON field names.
This minimizes translation logic in the exporter and prevents drift.
`raw_payloads` is authoritative for replayable sources.
---
Current important payloads:
- `redfish_tree`
- `redfish_fetch_errors`
- `source_timezone`
## Source metadata fields (stored directly on `AnalysisResult`)
Normalized hardware fields are derived output, not the long-term source of truth.
Carried by both `/api/status` and `/api/config`:
## Raw export package
```json
{
"source_type": "api",
"protocol": "redfish",
"target_host": "10.0.0.1",
"collected_at": "2026-02-10T15:30:00Z"
}
```
Valid `source_type` values: `archive`, `api`
Valid `protocol` values: `redfish`, `ipmi` (empty is allowed for archive uploads)
---
## Raw Export Package (reopenable artifact)
`Export Raw Data` does not merely dump `AnalysisResult`; it emits a reopenable raw package
(JSON or ZIP bundle) that carries source data required for re-analysis.
`/api/export/json` produces a reopenable raw-export artifact.
Design rules:
- raw source is authoritative (`redfish_tree` or original file bytes)
- imports must re-analyze from raw source
- parsed field snapshots included in bundles are diagnostic artifacts, not the source of truth
- raw source stays authoritative
- uploads of raw-export artifacts must re-analyze from raw source
- parsed snapshots inside the bundle are diagnostic only

@@ -3,107 +3,69 @@
Collectors live in `internal/collector/`.
Core files:
- `internal/collector/registry.go` — connector registry (`redfish`, `ipmi`)
- `internal/collector/redfish.go` — real Redfish connector
- `internal/collector/ipmi_mock.go` — IPMI mock connector scaffold
- `internal/collector/types.go` — request/progress contracts
- `registry.go` for protocol registration
- `redfish.go` for live collection
- `redfish_replay.go` for replay from raw payloads
- `ipmi_mock.go` for the placeholder IPMI implementation
- `types.go` for request/progress contracts
---
## Redfish collector
## Redfish Collector (`redfish`)
Status: active production path.
**Status:** Production-ready.
Request fields passed from the server:
- `host`
- `port`
- `username`
- `auth_type`
- credential field (`password` or token)
- `tls_mode`
### Request contract (from server)
### Core rule
Passed through from `/api/collect` after validation:
- `host`, `port`, `username`
- `auth_type=password|token` (+ matching credential field)
- `tls_mode=strict|insecure`
Live collection and replay must stay behaviorally aligned.
If the collector adds a fallback, probe, or normalization rule, replay must mirror it.
### Discovery
### Discovery model
Dynamic — does not assume fixed paths. Discovers:
- `Systems` collection → per-system resources
- `Chassis` collection → enclosure/board data
- `Managers` collection → BMC/firmware info
The collector does not rely on one fixed vendor tree.
It discovers and follows Redfish resources dynamically from root collections such as:
- `Systems`
- `Chassis`
- `Managers`
### Collected data
### Stored raw data
| Category | Notes |
|----------|-------|
| CPU | Model, cores, threads, socket, status |
| Memory | DIMM slot, size, type, speed, serial, manufacturer |
| Storage | Slot, type, model, serial, firmware, interface, status |
| GPU | Detected via PCIe class + NVIDIA vendor ID |
| PSU | Model, serial, wattage, firmware, telemetry (input/output power, voltage) |
| NIC | Model, serial, port count, BDF |
| PCIe | Slot, vendor_id, device_id, BDF, link width/speed |
| Firmware | BIOS, BMC versions |
Important raw payloads:
- `raw_payloads.redfish_tree`
- `raw_payloads.redfish_fetch_errors`
- `raw_payloads.source_timezone` when available
### Raw snapshot
### Snapshot crawler rules
Full Redfish response tree is stored in `result.RawPayloads["redfish_tree"]`.
This allows future offline re-analysis without re-collecting from a live BMC.
- bounded by `LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`
- prioritized toward high-value inventory paths
- tolerant of expected vendor-specific failures
- normalizes `@odata.id` values before queueing
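The four crawler rules can be illustrated with a sequential, bounded breadth-first walk. This is a sketch, not the collector's concurrent worker pool. `crawl`, `normalizeODataID`, and `isHighValue` are hypothetical. `maxDocs` stands in for `LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`, and matching priority paths by substring is an assumption.

```go
package main

import "strings"

// normalizeODataID strips a "#fragment" so "/x#/Members/0" and "/x" are fetched once.
func normalizeODataID(id string) string {
	if i := strings.IndexByte(id, '#'); i >= 0 {
		id = id[:i]
	}
	return id
}

// highValue echoes the documented priority areas (substring match assumed).
var highValue = []string{"PCIe", "Fabrics", "FirmwareInventory", "Storage", "PowerSubsystem", "ThermalSubsystem"}

func isHighValue(path string) bool {
	for _, h := range highValue {
		if strings.Contains(path, h) {
			return true
		}
	}
	return false
}

// crawl visits at most maxDocs documents, serving high-value links first.
// fetch returns the @odata.id links of a document; fetch errors are recorded
// and skipped instead of aborting the crawl.
func crawl(root string, maxDocs int, fetch func(string) ([]string, error)) (visited []string, fetchErrs map[string]error) {
	fetchErrs = make(map[string]error)
	start := normalizeODataID(root)
	seen := map[string]bool{start: true}
	var high []string
	normal := []string{start}
	for len(visited) < maxDocs && len(high)+len(normal) > 0 {
		var path string
		if len(high) > 0 {
			path, high = high[0], high[1:]
		} else {
			path, normal = normal[0], normal[1:]
		}
		visited = append(visited, path)
		links, err := fetch(path)
		if err != nil {
			fetchErrs[path] = err
			continue
		}
		for _, link := range links {
			link = normalizeODataID(link)
			if link == "" || seen[link] {
				continue
			}
			seen[link] = true
			if isHighValue(link) {
				high = append(high, link)
			} else {
				normal = append(normal, link)
			}
		}
	}
	return visited, fetchErrs
}
```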
### Unified Redfish analysis pipeline (live == replay)
### Redfish implementation guidance
LOGPile uses a **single Redfish analyzer path**:
When changing collection logic:
1. Live collector crawls the Redfish API and builds `raw_payloads.redfish_tree`
2. Parsed result is produced by replaying that tree through the same analyzer used by raw import
1. Prefer alternate-path support over vendor hardcoding
2. Keep expensive probing bounded
3. Deduplicate by serial, then BDF, then location/model fallbacks
4. Preserve replay determinism from saved raw payloads
5. Add tests for both the motivating topology and a negative case
This guarantees that live collection and `Export Raw Data` re-open/re-analyze produce the same
normalized output for the same `redfish_tree`.
### Known vendor fallbacks
### Snapshot crawler behavior (important)
- empty standard drive collections may trigger bounded `Disk.Bay` probing
- `Storage.Links.Enclosures[*]` may be followed to recover physical drives
- `PowerSubsystem/PowerSupplies` is preferred over legacy `Power` when available
The Redfish snapshot crawler is intentionally:
- **bounded** (`LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`)
- **prioritized** (PCIe, Fabrics, FirmwareInventory, Storage, PowerSubsystem, ThermalSubsystem)
- **tolerant** (skips noisy expected failures, strips `#fragment` from `@odata.id`)
## IPMI collector
Design notes:
- Queue capacity is sized to snapshot cap to avoid worker deadlocks on large trees.
- UI progress is coarse and human-readable; detailed per-request diagnostics are available via debug logs.
- `LOGPILE_REDFISH_DEBUG=1` and `LOGPILE_REDFISH_SNAPSHOT_DEBUG=1` enable console diagnostics.
Status: mock scaffold only.
### Parsing guidelines
When adding Redfish mappings, follow these principles:
- Support alternate collection paths (resources may appear at different odata URLs).
- Follow `@odata.id` references and handle embedded `Members` arrays.
- Prefer **raw-tree replay compatibility**: if live collector adds a fallback/probe, replay analyzer must mirror it.
- Deduplicate by serial / BDF / slot+model (in that priority order).
- Prefer tolerant/fallback parsing — missing fields should be silently skipped,
not cause the whole collection to fail.
### Vendor-specific storage fallbacks (Supermicro and similar)
When standard `Storage/.../Drives` collections are empty, collector/replay may recover drives via:
- `Storage.Links.Enclosures[*] -> .../Drives`
- direct probing of finite `Disk.Bay` candidates (`Disk.Bay.0`, `Disk.Bay0`, `.../0`)
This is required for some BMCs that publish drive inventory in vendor-specific paths while leaving
standard collections empty.
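The finite `Disk.Bay` probe list can be generated up front so probing stays bounded. `diskBayCandidates` is a hypothetical helper. The `maxBays` limit and appending candidates under the empty `Drives` collection path are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// diskBayCandidates lists drive URLs to probe when a Drives collection is
// empty, using the three naming variants per bay index.
func diskBayCandidates(drivesCollection string, maxBays int) []string {
	base := strings.TrimRight(drivesCollection, "/")
	var out []string
	for i := 0; i < maxBays; i++ {
		out = append(out,
			fmt.Sprintf("%s/Disk.Bay.%d", base, i),
			fmt.Sprintf("%s/Disk.Bay%d", base, i),
			fmt.Sprintf("%s/%d", base, i),
		)
	}
	return out
}
```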
### PSU source preference (newer Redfish)
PSU inventory source order:
1. `Chassis/*/PowerSubsystem/PowerSupplies` (preferred on X14+/newer Redfish)
2. `Chassis/*/Power` (legacy fallback)
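The source order can be applied against the saved snapshot tree, which keeps live collection and replay consistent. In this sketch `psuSource` is hypothetical, the tree is modeled as a path-to-document map, and treating `PowerSubsystem/PowerSupplies` as available only when it has at least one member is an assumption.

```go
package main

// psuSource picks the PSU inventory document for a chassis from a snapshot
// keyed by path, preferring PowerSubsystem and falling back to legacy Power.
func psuSource(tree map[string]map[string]any, chassis string) (string, bool) {
	preferred := chassis + "/PowerSubsystem/PowerSupplies"
	if doc, ok := tree[preferred]; ok {
		if members, _ := doc["Members"].([]any); len(members) > 0 {
			return preferred, true
		}
	}
	legacy := chassis + "/Power"
	if _, ok := tree[legacy]; ok {
		return legacy, true
	}
	return "", false
}
```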
### Progress reporting
The collector emits progress log entries at each stage (connecting, enumerating systems,
collecting CPUs, etc.) so the UI can display meaningful status.
Current progress message strings are user-facing and may be localized.
---
## IPMI Collector (`ipmi`)
**Status:** Mock scaffold only — not implemented.
Registered in the collector registry but returns placeholder data.
Real IPMI support is a future work item.
It remains registered for protocol completeness, but it is not a real collection path.

@@ -2,261 +2,69 @@
## Framework
### Registration
Parsers live in `internal/parser/` and vendor implementations live in `internal/parser/vendors/`.
Each vendor parser registers itself via Go's `init()` side-effect import pattern.
Core behavior:
- registration uses `init()` side effects
- all registered parsers run `Detect()`
- the highest-confidence parser wins
- generic fallback stays last and low-confidence
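Registration and highest-confidence selection can be sketched with a package-level slice filled from `init()`. `detector`, `register`, `selectParser`, and `fixedScore` are hypothetical. Keeping the earlier registration on ties and returning `nil` when every score is 0 are assumptions of this example.

```go
package main

// extractedFile is a hypothetical stand-in for parser.ExtractedFile.
type extractedFile struct {
	Path    string
	Content []byte
}

// detector is the subset of VendorParser needed for selection.
type detector interface {
	Vendor() string
	Detect(files []extractedFile) int
}

// registry is filled from init() functions in vendor packages.
var registry []detector

// register is what each vendor package's init() would call.
func register(p detector) { registry = append(registry, p) }

// selectParser returns the parser with the highest Detect score.
// Ties keep the earlier registration; nil means nothing scored above 0.
func selectParser(files []extractedFile) detector {
	var best detector
	bestScore := 0
	for _, p := range registry {
		if score := p.Detect(files); score > bestScore {
			best, bestScore = p, score
		}
	}
	return best
}

// fixedScore is a toy parser that always returns the same confidence.
type fixedScore struct {
	vendor string
	score  int
}

func (f fixedScore) Vendor() string { return f.vendor }

func (f fixedScore) Detect(files []extractedFile) int { return f.score }

func init() {
	// Real vendor packages register via blank imports; toy parsers register here.
	register(fixedScore{"generic", 10})
	register(fixedScore{"dell", 85})
	register(fixedScore{"inspur", 0})
}
```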
All registrations are collected in `internal/parser/vendors/vendors.go`:
```go
import (
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/dell"
// etc.
)
```
### VendorParser interface
`VendorParser` contract:
```go
type VendorParser interface {
Name() string // human-readable name
Vendor() string // vendor identifier string
Version() string // parser version (increment on logic changes)
Detect(files []ExtractedFile) int // confidence 0-100
Name() string
Vendor() string
Version() string
Detect(files []ExtractedFile) int
Parse(files []ExtractedFile) (*models.AnalysisResult, error)
}
```
### Selection logic
## Adding a parser
All registered parsers run `Detect()` against the uploaded archive's file list.
The parser with the **highest confidence score** is selected.
Multiple parsers may return >0; only the top scorer is used.
1. Create `internal/parser/vendors/<vendor>/`
2. Start from `internal/parser/vendors/template/parser.go.template`
3. Implement `Detect()` and `Parse()`
4. Add a blank import in `internal/parser/vendors/vendors.go`
5. Add at least one positive and one negative detection test
### Adding a new vendor parser
## Data quality rules
1. `mkdir -p internal/parser/vendors/VENDORNAME`
2. Copy `internal/parser/vendors/template/parser.go.template` as starting point.
3. Implement `Detect()` and `Parse()`.
4. Add blank import to `vendors/vendors.go`.
### System firmware only in `hardware.firmware`
`Detect()` tips:
- Look for unique filenames or directory names.
- Check file content for vendor-specific markers.
- Return 70+ only when confident; return 0 if clearly not a match.
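These tips suggest combining a filename signal with a content signal. The `detectAcme` function below scores input for a hypothetical "acme" vendor. The `acme_diag/` directory, the `version.txt` marker, and the 90/40/0 scores are purely illustrative.

```go
package main

import (
	"bytes"
	"path"
	"strings"
)

// extractedFile is a hypothetical stand-in for parser.ExtractedFile.
type extractedFile struct {
	Path    string
	Content []byte
}

// detectAcme returns high confidence only when both signals are present,
// a weak score below 70 for one signal, and 0 when nothing matches.
func detectAcme(files []extractedFile) int {
	hasDir, hasMarker := false, false
	for _, f := range files {
		p := strings.ToLower(f.Path)
		if strings.HasPrefix(p, "acme_diag/") || strings.Contains(p, "/acme_diag/") {
			hasDir = true
		}
		if path.Base(p) == "version.txt" && bytes.Contains(f.Content, []byte("ACME BMC")) {
			hasMarker = true
		}
	}
	switch {
	case hasDir && hasMarker:
		return 90
	case hasDir || hasMarker:
		return 40
	default:
		return 0
	}
}
```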
`hardware.firmware` must contain system-level firmware only.
Device-bound firmware belongs on the device record and must not be duplicated at the top level.
### Parser versioning
### Strip embedded MAC addresses from model names
Each parser file contains a `parserVersion` constant.
Increment the version whenever parsing logic changes — this helps trace which
version produced a given result.
If a source embeds ` - XX:XX:XX:XX:XX:XX` in a model/name field, remove that suffix before storing it.
---
### Use `pci.ids` for empty or generic PCI model names
## Parser data quality rules
When `vendor_id` and `device_id` are known but the model name is missing or generic, resolve the name via `internal/parser/vendors/pciids`.
### FirmwareInfo — system-level only
## Active vendor coverage
`Hardware.Firmware` must contain **only system-level firmware**: BIOS, BMC/iDRAC,
Lifecycle Controller, CPLD, storage controllers, BOSS adapters.
| Vendor ID | Input family | Notes |
|-----------|--------------|-------|
| `dell` | TSR ZIP archives | Broad hardware, firmware, sensors, lifecycle events |
| `h3c_g5` | H3C SDS G5 bundles | INI/XML/CSV-driven hardware and event parsing |
| `h3c_g6` | H3C SDS G6 bundles | Similar flow with G6-specific files |
| `inspur` | onekeylog archives | FRU/SDR plus optional Redis enrichment |
| `nvidia` | HGX Field Diagnostics | GPU- and fabric-heavy diagnostic input |
| `nvidia_bug_report` | `nvidia-bug-report-*.log.gz` | dmidecode, lspci, NVIDIA driver sections |
| `unraid` | Unraid diagnostics/log bundles | Server and storage-focused parsing |
| `xigmanas` | XigmaNAS plain logs | FreeBSD/NAS-oriented inventory |
| `generic` | fallback | Low-confidence text fallback when nothing else matches |
**Device-bound firmware** (NIC, GPU, PSU, disk, backplane) **must NOT be added to
`Hardware.Firmware`**. It belongs to the device's own `Firmware` field and is already
present there. Duplicating it in `Hardware.Firmware` causes double entries in Reanimator.
## Practical guidance
The Reanimator exporter filters by `FirmwareInfo.DeviceName` prefix and by
`FirmwareInfo.Description` (FQDD prefix). Parsers must cooperate:
- Store the device's FQDD (or equivalent slot identifier) in `FirmwareInfo.Description`
for all firmware entries that come from a per-device inventory source (e.g. Dell
`DCIM_SoftwareIdentity`).
- FQDD prefixes that are device-bound: `NIC.`, `PSU.`, `Disk.`, `RAID.Backplane.`, `GPU.`
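A minimal sketch of the FQDD prefix check described above. The helper name `isDeviceBoundFQDD` is hypothetical and is not the exporter's actual function. The sample FQDD strings are illustrative Dell-style values. Matching is case-sensitive, on the assumption that FQDDs keep their canonical casing.

```go
package main

import (
	"fmt"
	"strings"
)

// deviceBoundFQDDPrefixes lists the FQDD prefixes documented as device-bound.
var deviceBoundFQDDPrefixes = []string{"NIC.", "PSU.", "Disk.", "RAID.Backplane.", "GPU."}

// isDeviceBoundFQDD (hypothetical) reports whether a firmware entry's
// Description (FQDD) ties it to a specific device rather than the system.
func isDeviceBoundFQDD(description string) bool {
	d := strings.TrimSpace(description)
	for _, p := range deviceBoundFQDDPrefixes {
		if strings.HasPrefix(d, p) {
			return true
		}
	}
	return false
}

func main() {
	for _, fqdd := range []string{"NIC.Integrated.1-1-1", "BIOS.Setup.1-1", "RAID.Integrated.1-1"} {
		fmt.Printf("%-22s device-bound=%v\n", fqdd, isDeviceBoundFQDD(fqdd))
	}
}
```

`RAID.Integrated.*` stays system-level in this sketch, because only `RAID.Backplane.` is listed as device-bound. Storage controllers belong in `Hardware.Firmware`.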
### NIC/device model names — strip embedded MAC addresses
Some vendors (confirmed: Dell TSR) embed the MAC address in the device model name field,
e.g. `ProductName = "NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"`.
**Rule:** Strip any ` - XX:XX:XX:XX:XX:XX` suffix from model/name strings before storing
them in `FirmwareInfo.DeviceName`, `NetworkAdapter.Model`, or any other model field.
Use `nicMACInModelRE` (defined in the Dell parser) or an equivalent regex:
```
\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$
```
This applies to **all** string fields used as device names or model identifiers.
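A minimal sketch of applying the documented regex with Go's standard `regexp` package. The names `macSuffixRE` and `stripMACSuffix` are hypothetical stand-ins for `nicMACInModelRE` and whatever helper a parser uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// macSuffixRE is the documented pattern: " - XX:XX:XX:XX:XX:XX" at end of string.
var macSuffixRE = regexp.MustCompile(`\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`)

// stripMACSuffix removes a trailing MAC suffix and leaves other hyphens
// (e.g. "ConnectX-6") untouched, since those have no surrounding whitespace.
func stripMACSuffix(s string) string {
	return macSuffixRE.ReplaceAllString(s, "")
}

func main() {
	fmt.Println(stripMACSuffix("NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"))
	// prints: NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF
}
```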
### PCI device name enrichment via pci.ids
If a PCIe device, GPU, NIC, or any hardware component has a `vendor_id` + `device_id`
but its model/name field is **empty or generic** (e.g. blank, equals the description,
or is just a raw hex ID), the parser **must** attempt to resolve the human-readable
model name from the embedded `pci.ids` database before storing the result.
**Rule:** When `Model` (or equivalent name field) is empty and both `VendorID` and
`DeviceID` are non-zero, call the pciids lookup and use the result as the model name.
```go
// Example pattern — use in any parser that handles PCIe/GPU/NIC devices:
if strings.TrimSpace(device.Model) == "" && device.VendorID != 0 && device.DeviceID != 0 {
if name := pciids.Lookup(device.VendorID, device.DeviceID); name != "" {
device.Model = name
}
}
```
This rule applies to all vendor parsers. The pciids package is available at
`internal/parser/vendors/pciids`. See ADL-005 for the rationale.
**Do not hardcode model name strings.** If a device is unknown today, it will be
resolved automatically once `pci.ids` is updated.
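The example above only covers an empty model. This sketch also covers the other two "generic" cases the rule lists: a model equal to the description, or a raw hex ID.

- The pciids lookup is injected as a function, so the sketch does not assume the real package's signature.
- `fakeLookup` and its returned name are hypothetical test doubles.
- Raw IDs are recognised only as `0xNNNN` or `vvvv:dddd`. This is so that real short model names such as `A100`, which are also four hex characters, are not mistaken for IDs.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rawHexIDRE matches names that are only a raw PCI ID, e.g. "0x2330" or "10de:2330".
var rawHexIDRE = regexp.MustCompile(`^(0x[0-9A-Fa-f]{4}|[0-9A-Fa-f]{4}:[0-9A-Fa-f]{4})$`)

// isGenericModel: blank, equal to the description, or a raw hex ID.
func isGenericModel(model, description string) bool {
	m := strings.TrimSpace(model)
	switch {
	case m == "":
		return true
	case strings.EqualFold(m, strings.TrimSpace(description)):
		return true
	default:
		return rawHexIDRE.MatchString(m)
	}
}

// resolveModel calls lookup (standing in for pciids) only for generic names
// with non-zero vendor and device IDs; otherwise the model is kept as-is.
func resolveModel(model, description string, vendorID, deviceID int, lookup func(v, d int) string) string {
	if !isGenericModel(model, description) || vendorID == 0 || deviceID == 0 {
		return model
	}
	if name := lookup(vendorID, deviceID); name != "" {
		return name
	}
	return model
}

// fakeLookup is a hypothetical test double; real names come from pci.ids.
func fakeLookup(v, d int) string {
	if v == 0x10de && d == 0x2330 {
		return "Example NVIDIA GPU"
	}
	return ""
}

func main() {
	fmt.Println(resolveModel("0x2330", "3D controller", 0x10de, 0x2330, fakeLookup))
	fmt.Println(resolveModel("A100", "3D controller", 0x10de, 0x2330, fakeLookup))
}
```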
---
## Vendor parsers
### Inspur / Kaytus (`inspur`)
**Status:** Ready. Tested on KR4268X2 (onekeylog format).
**Archive format:** `.tar.gz` onekeylog
**Primary source files:**
| File | Content |
|------|---------|
| `asset.json` | Base hardware inventory |
| `component.log` | Component list |
| `devicefrusdr.log` | FRU and SDR data |
| `onekeylog/runningdata/redis-dump.rdb` | Runtime enrichment (optional) |
**Redis RDB enrichment** (applied conservatively — fills missing fields only):
- GPU: `serial_number`, `firmware` (VBIOS/FW), runtime telemetry
- NIC: firmware, serial, part number (when text logs leave fields empty)
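"Fills missing fields only" can be sketched as a guard that writes a value only into a blank field. The `GPU` struct and both helper names below are hypothetical and trimmed for illustration. The values are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// GPU is a trimmed, hypothetical stand-in for the real model struct.
type GPU struct {
	Slot         string
	SerialNumber string
	Firmware     string
}

// fillIfEmpty copies src into *dst only when *dst is blank and src is not,
// so values already taken from text logs are never overwritten.
func fillIfEmpty(dst *string, src string) {
	if strings.TrimSpace(*dst) == "" && strings.TrimSpace(src) != "" {
		*dst = src
	}
}

// enrichGPUFromRedis applies conservative enrichment from a Redis-derived record.
func enrichGPUFromRedis(g *GPU, redis GPU) {
	fillIfEmpty(&g.SerialNumber, redis.SerialNumber)
	fillIfEmpty(&g.Firmware, redis.Firmware)
}

func main() {
	g := GPU{Slot: "GPU0", Firmware: "FW-FROM-LOG"}
	enrichGPUFromRedis(&g, GPU{SerialNumber: "SN-FROM-REDIS", Firmware: "FW-FROM-REDIS"})
	fmt.Printf("%+v\n", g) // serial filled, firmware kept from the log
}
```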
**Module structure:**
```
inspur/
parser.go — main parser + registration
sdr.go — sensor/SDR parsing
fru.go — FRU serial parsing
asset.go — asset.json parsing
syslog.go — syslog parsing
```
---
### Dell TSR (`dell`)
**Status:** Ready (v3.0). Tested on nested TSR archives with embedded `*.pl.zip`.
**Archive format:** `.zip` (outer archive + nested `*.pl.zip`)
**Primary source files:**
- `tsr/metadata.json`
- `tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml`
- `tsr/hardware/sysinfo/inventory/sysinfo_DCIM_SoftwareIdentity.xml`
- `tsr/hardware/sysinfo/inventory/sysinfo_CIM_Sensor.xml`
- `tsr/hardware/sysinfo/lcfiles/curr_lclog.xml`
**Extracted data:**
- Board/system identity and BIOS/iDRAC firmware
- CPU, memory, physical disks, virtual disks, PSU, NIC, PCIe
- GPU inventory (`DCIM_VideoView`) + GPU sensor enrichment (`DCIM_GPUSensor`)
- Controller/backplane inventory (`DCIM_ControllerView`, `DCIM_EnclosureView`)
- Sensor readings (temperature/voltage/current/power/fan/utilization)
- Lifecycle events (`curr_lclog.xml`)
---
### NVIDIA HGX Field Diagnostics (`nvidia`)
**Status:** Ready (v1.1.0). Works with any server vendor.
**Archive format:** `.tar` / `.tar.gz`
**Confidence scoring:**
| File | Score |
|------|-------|
| `unified_summary.json` with "HGX Field Diag" marker | +40 |
| `summary.json` | +20 |
| `summary.csv` | +15 |
| `gpu_fieldiag/` directory | +15 |
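The scoring table can be read as independent signals that each count once. This is a sketch under that assumption. `hgxDetectScore` is a hypothetical name, and the input is assumed to be a map of archive-relative paths to contents. The real parser's `Detect()` signature may differ.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// hgxDetectScore sums the documented signals; each signal counts once,
// even if the archive contains several matching files.
func hgxDetectScore(files map[string]string) int {
	var unified, summaryJSON, summaryCSV, fieldiagDir bool
	for name, content := range files {
		switch path.Base(name) {
		case "unified_summary.json":
			unified = unified || strings.Contains(content, "HGX Field Diag")
		case "summary.json":
			summaryJSON = true
		case "summary.csv":
			summaryCSV = true
		}
		if strings.Contains("/"+name, "/gpu_fieldiag/") {
			fieldiagDir = true
		}
	}
	score := 0
	if unified {
		score += 40
	}
	if summaryJSON {
		score += 20
	}
	if summaryCSV {
		score += 15
	}
	if fieldiagDir {
		score += 15
	}
	return score
}

func main() {
	files := map[string]string{
		"diag/unified_summary.json":  `{"tool":"HGX Field Diag"}`,
		"diag/summary.json":          "{}",
		"diag/gpu_fieldiag/run.log":  "",
	}
	fmt.Println(hgxDetectScore(files)) // 75
}
```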
**Source files:**
| File | Content |
|------|---------|
| `output.log` | dmidecode — server manufacturer, model, serial number |
| `unified_summary.json` | GPU details, NVSwitch devices, PCI addresses |
| `summary.json` | Diagnostic test results and error codes |
| `summary.csv` | Alternative test results format |
**Extracted data:**
- GPUs: slot, model, manufacturer, firmware (VBIOS), BDF
- NVSwitch devices: slot, device_class, vendor_id, device_id, BDF, link speed/width
- Events: diagnostic test failures (connectivity, gpumem, gpustress, pcie, nvlink, nvswitch, power)
**Severity mapping:**
- `info` — tests passed
- `warning` — e.g. "Row remapping failed"
- `critical` — error codes 300+
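A sketch of the severity mapping. The `TestResult` type and `severityFor` are hypothetical. Mapping every failure with a code below 300 to `warning` is an assumption: the source only gives "Row remapping failed" as a warning example.

```go
package main

import "fmt"

// TestResult is a hypothetical, trimmed view of one field-diag test outcome.
type TestResult struct {
	Name      string
	Passed    bool
	ErrorCode int
}

// severityFor maps a result to the documented levels; treating all
// failures below code 300 as "warning" is an assumption.
func severityFor(r TestResult) string {
	switch {
	case r.Passed:
		return "info"
	case r.ErrorCode >= 300:
		return "critical"
	default:
		return "warning"
	}
}

func main() {
	for _, r := range []TestResult{
		{Name: "pcie", Passed: true},
		{Name: "gpumem", ErrorCode: 120},
		{Name: "nvlink", ErrorCode: 301},
	} {
		fmt.Println(r.Name, severityFor(r))
	}
}
```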
**Known limitations:**
- Detailed logs in `gpu_fieldiag/*.log` are not parsed.
- No CPU, memory, or storage extraction (not present in field diag archives).
---
### NVIDIA Bug Report (`nvidia_bug_report`)
**Status:** Ready (v1.0.0).
**File format:** `nvidia-bug-report-*.log.gz` (gzip-compressed text)
**Confidence:** 85 (high priority for matching filename pattern)
**Source sections parsed:**
| dmidecode section | Extracts |
|-------------------|---------|
| System Information | server serial, UUID, manufacturer, product name |
| Processor Information | CPU model, serial, core/thread count, frequency |
| Memory Device | DIMM slot, size, type, manufacturer, serial, part number, speed |
| System Power Supply | PSU location, manufacturer, model, serial, wattage, firmware, status |
| Other source | Extracts |
|--------------|---------|
| `lspci -vvv` (Ethernet/Network/IB) | NIC model (from VPD), BDF, slot, P/N, S/N, port count, port type |
| `/proc/driver/nvidia/gpus/*/information` | GPU model, BDF, UUID, VBIOS version, IRQ |
| NVRM version line | NVIDIA driver version |
**Known limitations:**
- Driver error/warning log lines not yet extracted.
- GPU temperature/utilization metrics require additional parsing sections.
---
### XigmaNAS (`xigmanas`)
**Status:** Ready.
**Archive format:** Plain log files (FreeBSD-based NAS system)
**Detection:** Files named `xigmanas`, `system`, or `dmesg`; content containing "XigmaNAS" or "FreeBSD"; SMART data presence.
**Extracted data:**
- System: firmware version, uptime, CPU model, memory configuration, hardware platform
- Storage: disk models, serial numbers, capacity, health, SMART temperatures
- Populates: `Hardware.Firmware`, `Hardware.CPUs`, `Hardware.Memory`, `Hardware.Storage`, `Sensors`
---
### Unraid (`unraid`)
**Status:** Ready (v1.0.0).
- Be conservative with high detect scores
- Prefer filling missing fields over overwriting stronger source data
- Keep parser version constants current when behavior changes
- Any new vendor-specific filtering or dedup logic must ship with tests for that vendor format
**Archive format:** Unraid diagnostics archive contents (text-heavy diagnostics directories).

View File

@@ -1,366 +1,63 @@
# 07 — Exporters & Reanimator Integration
# 07 — Exporters
## Export endpoints summary
## Export surfaces
| Endpoint | Format | Filename pattern |
|----------|--------|-----------------|
| `GET /api/export/csv` | CSV — serial numbers | `YYYY-MM-DD (MODEL) - SN.csv` |
| `GET /api/export/json` | **Raw export package** (JSON or ZIP bundle) for reopen/re-analysis | `YYYY-MM-DD (MODEL) - SN.(json|zip)` |
| `GET /api/export/reanimator` | Reanimator hardware JSON | `YYYY-MM-DD (MODEL) - SN.json` |
| Endpoint | Output | Purpose |
|----------|--------|---------|
| `GET /api/export/csv` | CSV | Serial-number export |
| `GET /api/export/json` | raw-export ZIP bundle | Reopen and re-analyze later |
| `GET /api/export/reanimator` | JSON | Reanimator hardware payload |
| `POST /api/convert` | async ZIP artifact | Batch archive-to-Reanimator conversion |
---
## Raw export
## Raw Export (`Export Raw Data`)
Raw export is not a final report dump.
It is a replayable artifact that preserves enough source data for future parser improvements.
### Purpose
Current bundle contents:
- `raw_export.json`
- `collect.log`
- `parser_fields.json`
Preserve enough source data to reproduce parsing later after parser fixes, without requiring
another live collection from the target system.
Design rules:
- raw source is authoritative
- uploads of raw export must replay from raw source
- parsed snapshots inside the bundle are diagnostic only
### Format
## Reanimator export
`/api/export/json` returns a **raw export package**:
- JSON package (machine-readable), or
- ZIP bundle containing:
- `raw_export.json` — machine-readable package
- `collect.log` — human-readable collection + parsing summary
- `parser_fields.json` — structured parsed field snapshot for diffs between parser versions
Implementation files:
- `internal/exporter/reanimator_models.go`
- `internal/exporter/reanimator_converter.go`
- `internal/server/handlers.go`
### Import / reopen behavior
Conversion rules:
- canonical source is `hardware.devices`
- timestamps are RFC3339
- status is normalized to Reanimator-friendly values
- missing PCIe serials may be generated from board serial + slot
- `NULL`-style board manufacturer/product values are treated as absent
When a raw export package is uploaded back into LOGPile:
- the app **re-analyzes from raw source**
- it does **not** trust embedded parsed output as source of truth
## Inclusion rules
For Redfish, this means replay from `raw_payloads.redfish_tree`.
Included:
- empty memory slots (`present=false`) for topology visibility
- PCIe-class devices even when serial must be synthesized
### Design rule
Excluded:
- storage without `serial_number`
- power supplies without `serial_number`
- non-present network adapters
- device-bound firmware duplicated at top-level firmware list
Raw export is a **re-analysis artifact**, not a final report dump. Keep it self-contained and
forward-compatible where possible (versioned package format, additive fields only).
## Batch convert
---
`POST /api/convert` accepts multiple supported files and produces a ZIP with:
- one `*.reanimator.json` file per successful input
- `convert-summary.txt`
## Reanimator Export
### Purpose
Exports hardware inventory data in the format expected by the Reanimator asset tracking
system. Enables one-click push from LOGPile to an external asset management platform.
### Implementation files
| File | Role |
|------|------|
| `internal/exporter/reanimator_models.go` | Go structs for Reanimator JSON |
| `internal/exporter/reanimator_converter.go` | `ConvertToReanimator()` and helpers |
| `internal/server/handlers.go` | `handleExportReanimator()` HTTP handler |
### Conversion rules
- Source: canonical `hardware.devices` repository (see [`04-data-models.md`](04-data-models.md))
- CPU manufacturer inferred from model string (Intel / AMD / ARM / Ampere)
- PCIe serial number generated when absent: `{board_serial}-PCIE-{slot}`
- Status values normalized to: `OK`, `Warning`, `Critical`, `Unknown` (`Empty` only for memory slots)
- Timestamps in RFC3339 format
- `target_host` derived from `filename` field (`redfish://…`, `ipmi://…`) if not in source; omitted if undeterminable
- `board.manufacturer` and `board.product_name` values of `"NULL"` treated as absent
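Three of the conversion rules above (PCIe serial fallback, `target_host` derivation, `"NULL"` handling) fit in small helpers. The helper names are hypothetical, not the functions in `reanimator_converter.go`. `net/url` is used to pull the host out of `redfish://…` / `ipmi://…` filenames, and `Hostname()` drops any port.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// pcieSerial returns the device serial, or "{board_serial}-PCIE-{slot}" when absent.
func pcieSerial(serial, boardSerial, slot string) string {
	if s := strings.TrimSpace(serial); s != "" {
		return s
	}
	return fmt.Sprintf("%s-PCIE-%s", boardSerial, slot)
}

// targetHostFromFilename derives target_host from "redfish://host" or
// "ipmi://host"; "" means undeterminable, so the field should be omitted.
func targetHostFromFilename(filename string) string {
	u, err := url.Parse(filename)
	if err != nil {
		return ""
	}
	switch u.Scheme {
	case "redfish", "ipmi":
		return u.Hostname()
	}
	return ""
}

// nullAsAbsent treats the literal placeholder "NULL" (and blanks) as missing.
func nullAsAbsent(v string) string {
	v = strings.TrimSpace(v)
	if v == "NULL" {
		return ""
	}
	return v
}

func main() {
	fmt.Println(pcieSerial("", "21D634101", "PCIeCard1"))
	fmt.Println(targetHostFromFilename("redfish://10.10.10.103"))
	fmt.Printf("%q\n", nullAsAbsent("NULL"))
}
```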
### LOGPile → Reanimator field mapping
| LOGPile type | Reanimator section | Notes |
|---|---|---|
| `BoardInfo` | `board` | Direct mapping |
| `CPU` | `cpus` | + manufacturer (inferred) |
| `MemoryDIMM` | `memory` | Direct; empty slots included (`present=false`) |
| `Storage` | `storage` | Excluded if no `serial_number` |
| `PCIeDevice` | `pcie_devices` | Serial generated if missing |
| `GPU` | `pcie_devices` | `device_class=DisplayController` |
| `NetworkAdapter` | `pcie_devices` | `device_class=NetworkController` |
| `PSU` | `power_supplies` | Excluded if no serial or `present=false` |
| `FirmwareInfo` | `firmware` | Direct mapping |
### Inclusion / exclusion rules
**Included:**
- Memory slots with `present=false` (as Empty slots)
- PCIe devices without serial number (serial is generated)
**Excluded:**
- Storage without `serial_number`
- PSU without `serial_number` or with `present=false`
- NetworkAdapters with `present=false`
---
## Reanimator Integration Guide
This section documents the Reanimator receiver-side JSON format (what the Reanimator
system expects when it ingests a LOGPile export).
> **Important:** The Reanimator endpoint uses a strict JSON decoder (`DisallowUnknownFields`).
> Any unknown field — including nested ones — causes `400 Bad Request`.
> Use only `snake_case` keys listed here.
### Top-level structure
```json
{
"filename": "redfish://10.10.10.103",
"source_type": "api",
"protocol": "redfish",
"target_host": "10.10.10.103",
"collected_at": "2026-02-10T15:30:00Z",
"hardware": {
"board": {...},
"firmware": [...],
"cpus": [...],
"memory": [...],
"storage": [...],
"pcie_devices": [...],
"power_supplies": [...]
}
}
```
**Required:** `collected_at`, `hardware.board.serial_number`
**Optional:** `target_host`, `source_type`, `protocol`, `filename`
`source_type` values: `api`, `logfile`, `manual`
`protocol` values: `redfish`, `ipmi`, `snmp`, `ssh`
### Component status fields (all component sections)
Each component may carry:
| Field | Type | Description |
|-------|------|-------------|
| `status` | string | `OK`, `Warning`, `Critical`, `Unknown`, `Empty` |
| `status_checked_at` | RFC3339 | When status was last verified |
| `status_changed_at` | RFC3339 | When status last changed |
| `status_at_collection` | object | `{ "status": "...", "at": "..." }` — snapshot-time status |
| `status_history` | array | `[{ "status": "...", "changed_at": "...", "details": "..." }]` |
| `error_description` | string | Human-readable error for Warning/Critical |
### Board
```json
{
"board": {
"manufacturer": "Supermicro",
"product_name": "X12DPG-QT6",
"serial_number": "21D634101",
"part_number": "X12DPG-QT6-REV1.01",
"uuid": "d7ef2fe5-2fd0-11f0-910a-346f11040868"
}
}
```
`serial_number` required. `manufacturer` / `product_name` of `"NULL"` treated as absent.
### CPUs
```json
{
"socket": 0,
"model": "INTEL(R) XEON(R) GOLD 6530",
"cores": 32,
"threads": 64,
"frequency_mhz": 2100,
"max_frequency_mhz": 4000,
"manufacturer": "Intel",
"status": "OK"
}
```
`socket` (int) and `model` required. Serial generated: `{board_serial}-CPU-{socket}`.
LOT format: `CPU_{VENDOR}_{MODEL_NORMALIZED}` → e.g. `CPU_INTEL_XEON_GOLD_6530`
### Memory
```json
{
"slot": "CPU0_C0D0",
"location": "CPU0_C0D0",
"present": true,
"size_mb": 32768,
"type": "DDR5",
"max_speed_mhz": 4800,
"current_speed_mhz": 4800,
"manufacturer": "Hynix",
"serial_number": "80AD032419E17CEEC1",
"part_number": "HMCG88AGBRA191N",
"status": "OK"
}
```
`slot` and `present` required. `serial_number` required when `present=true`.
Empty slots (`present=false`, `status="Empty"`) are accepted in the payload, but no component is created for them.
LOT format: `DIMM_{TYPE}_{SIZE_GB}GB` → e.g. `DIMM_DDR5_32GB`
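A sketch of the memory rules: empty slots produce no component, and the LOT is built from `type` and `size_mb`. `MemorySlot` and `memoryLOT` are hypothetical names. The sketch assumes 1 GB = 1024 MB, which matches the `32768` → `32GB` example.

```go
package main

import (
	"fmt"
	"strings"
)

// MemorySlot is a trimmed, hypothetical view of one memory entry.
type MemorySlot struct {
	Slot    string
	Present bool
	SizeMB  int
	Type    string
}

// memoryLOT returns the DIMM LOT and true, or "", false for an empty slot
// (no component is created for present=false).
func memoryLOT(m MemorySlot) (string, bool) {
	if !m.Present {
		return "", false
	}
	return fmt.Sprintf("DIMM_%s_%dGB", strings.ToUpper(strings.TrimSpace(m.Type)), m.SizeMB/1024), true
}

func main() {
	lot, ok := memoryLOT(MemorySlot{Slot: "CPU0_C0D0", Present: true, SizeMB: 32768, Type: "DDR5"})
	fmt.Println(lot, ok) // DIMM_DDR5_32GB true
}
```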
### Storage
```json
{
"slot": "OB01",
"type": "NVMe",
"model": "INTEL SSDPF2KX076T1",
"size_gb": 7680,
"serial_number": "BTAX41900GF87P6DGN",
"manufacturer": "Intel",
"firmware": "9CV10510",
"interface": "NVMe",
"present": true,
"status": "OK"
}
```
`slot`, `model`, `serial_number`, `present` required.
LOT format: `{TYPE}_{INTERFACE}_{SIZE_TB}TB` → e.g. `SSD_NVME_07.68TB`
### Power Supplies
```json
{
"slot": "0",
"present": true,
"model": "GW-CRPS3000LW",
"vendor": "Great Wall",
"wattage_w": 3000,
"serial_number": "2P06C102610",
"part_number": "V0310C9000000000",
"firmware": "00.03.05",
"status": "OK",
"input_power_w": 137,
"output_power_w": 104,
"input_voltage": 215.25
}
```
`slot`, `present` required. `serial_number` required when `present=true`.
Telemetry fields (`input_power_w`, `output_power_w`, `input_voltage`) are stored on the observation only.
LOT format: `PSU_{WATTAGE}W_{VENDOR_NORMALIZED}` → e.g. `PSU_3000W_GREAT_WALL`
### PCIe Devices
```json
{
"slot": "PCIeCard1",
"vendor_id": 32902,
"device_id": 2912,
"bdf": "0000:18:00.0",
"device_class": "MassStorageController",
"manufacturer": "Intel",
"model": "RAID Controller RSP3DD080F",
"link_width": 8,
"link_speed": "Gen3",
"max_link_width": 8,
"max_link_speed": "Gen3",
"serial_number": "RAID-001-12345",
"firmware": "50.9.1-4296",
"status": "OK"
}
```
`slot` required. Serial generated if absent: `{board_serial}-PCIE-{slot}`.
`device_class` values: `NetworkController`, `MassStorageController`, `DisplayController`, etc.
LOT format: `PCIE_{DEVICE_CLASS}_{MODEL_NORMALIZED}` → e.g. `PCIE_NETWORK_CONNECTX5`
### Firmware
```json
[
{ "device_name": "BIOS", "version": "06.08.05" },
{ "device_name": "BMC", "version": "5.17.00" }
]
```
Both fields required. Changes trigger `FIRMWARE_CHANGED` timeline events.
---
### Import process (Reanimator side)
1. Validate `collected_at` (RFC3339) and `hardware.board.serial_number`.
2. Find or create the Asset by matching `board.serial_number` → `vendor_serial`.
3. For each component: filter `present=false`, auto-determine LOT, find or create Component,
create Observation, update Installations.
4. Detect removed components (present in previous snapshot, absent in current) → close Installation.
5. Generate timeline events: `LOG_COLLECTED`, `INSTALLED`, `REMOVED`, `FIRMWARE_CHANGED`.
**Idempotency:** Repeated import of the same snapshot (same content hash) returns `200 OK`
with `"duplicate": true` and does not create duplicate records.
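The idempotency rule can be sketched as a content-hash index. This is an assumption-heavy illustration: the source only says "content hash". SHA-256 over the raw request body and the in-memory `bundleStore` are hypothetical choices, not Reanimator's documented implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// bundleStore is a hypothetical in-memory index of ingested bundles by content hash.
type bundleStore struct {
	seen map[string]bool
}

// ingest reports duplicate=true when identical payload bytes were seen before.
func (s *bundleStore) ingest(body []byte) (hash string, duplicate bool) {
	sum := sha256.Sum256(body)
	hash = hex.EncodeToString(sum[:])
	if s.seen[hash] {
		return hash, true
	}
	s.seen[hash] = true
	return hash, false
}

func main() {
	s := &bundleStore{seen: map[string]bool{}}
	payload := []byte(`{"collected_at":"2026-02-10T15:30:00Z"}`)
	_, dup1 := s.ingest(payload)
	_, dup2 := s.ingest(payload)
	fmt.Println(dup1, dup2) // false true
}
```

Hashing raw bytes means a re-serialized but semantically identical payload would not be treated as a duplicate. A receiver that needs that behavior would have to hash a canonical form instead.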
### Reanimator API endpoint
```http
POST /ingest/hardware
Content-Type: application/json
```
**Success (201):**
```json
{
"status": "success",
"bundle_id": "lb_01J...",
"asset_id": "mach_01J...",
"collected_at": "2026-02-10T15:30:00Z",
"duplicate": false,
"summary": {
"parts_observed": 15,
"parts_created": 2,
"installations_created": 2,
"timeline_events_created": 9
}
}
```
**Duplicate (200):**
```json
{ "status": "success", "duplicate": true, "message": "LogBundle with this content hash already exists" }
```
**Error (400):**
```json
{ "status": "error", "error": "validation_failed", "details": { "field": "...", "message": "..." } }
```
Common `400` causes:
- Unknown JSON field (strict decoder)
- Wrong key name (e.g. `targetHost` instead of `target_host`)
- Invalid `collected_at` format (must be RFC3339)
- Empty `hardware.board.serial_number`
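The common `400` causes can be reproduced on the sending side with `encoding/json`'s `Decoder.DisallowUnknownFields`. This sketch models only the top-level fields and `board`. With the strict decoder, any section not modelled here (`cpus`, `memory`, …) would also be rejected, so a real pre-flight check would need the full schema. `ingestPayload` and `validateStrict` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// ingestPayload mirrors a trimmed subset of the documented top-level shape.
type ingestPayload struct {
	Filename    string `json:"filename,omitempty"`
	SourceType  string `json:"source_type,omitempty"`
	Protocol    string `json:"protocol,omitempty"`
	TargetHost  string `json:"target_host,omitempty"`
	CollectedAt string `json:"collected_at"`
	Hardware    struct {
		Board struct {
			Manufacturer string `json:"manufacturer,omitempty"`
			ProductName  string `json:"product_name,omitempty"`
			SerialNumber string `json:"serial_number"`
			PartNumber   string `json:"part_number,omitempty"`
			UUID         string `json:"uuid,omitempty"`
		} `json:"board"`
	} `json:"hardware"`
}

// validateStrict rejects unknown keys, non-RFC3339 timestamps, and empty board serials.
func validateStrict(data []byte) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	var p ingestPayload
	if err := dec.Decode(&p); err != nil {
		return fmt.Errorf("decode: %w", err)
	}
	if _, err := time.Parse(time.RFC3339, p.CollectedAt); err != nil {
		return fmt.Errorf("collected_at must be RFC3339: %w", err)
	}
	if strings.TrimSpace(p.Hardware.Board.SerialNumber) == "" {
		return errors.New("hardware.board.serial_number is required")
	}
	return nil
}

func main() {
	good := `{"collected_at":"2026-02-10T15:30:00Z","target_host":"10.10.10.103","hardware":{"board":{"serial_number":"21D634101"}}}`
	bad := `{"collected_at":"2026-02-10T15:30:00Z","targetHost":"10.10.10.103","hardware":{"board":{"serial_number":"21D634101"}}}`
	fmt.Println(validateStrict([]byte(good)))
	fmt.Println(validateStrict([]byte(bad)))
}
```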
### LOT normalization rules
1. Remove special chars `( ) - ® ™`; replace spaces with `_`
2. Uppercase all
3. Collapse multiple underscores to one
4. Strip common prefixes like `MODEL:`, `PN:`
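A minimal sketch of the LOT normalization rules. `normalizeLOT` is a hypothetical name, and the sketch makes two assumptions:

- Prefixes are stripped right after uppercasing, not last, so that `PN:` is still recognisable.
- Leading and trailing underscores are trimmed, which the rules above do not state.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// lotSpecial removes the documented special characters.
var lotSpecial = strings.NewReplacer("(", "", ")", "", "-", "", "®", "", "™", "")

// multiUnderscore collapses runs of underscores into one.
var multiUnderscore = regexp.MustCompile(`_+`)

func normalizeLOT(s string) string {
	s = strings.ToUpper(strings.TrimSpace(s))
	for _, p := range []string{"MODEL:", "PN:"} {
		s = strings.TrimSpace(strings.TrimPrefix(s, p))
	}
	s = lotSpecial.Replace(s)
	s = strings.ReplaceAll(s, " ", "_")
	s = multiUnderscore.ReplaceAllString(s, "_")
	return strings.Trim(s, "_") // assumption: no leading/trailing underscores
}

func main() {
	fmt.Println(normalizeLOT("Great Wall"))     // GREAT_WALL
	fmt.Println(normalizeLOT("PN: ConnectX-5")) // CONNECTX5
}
```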
### Status values
| Value | Meaning | Action |
|-------|---------|--------|
| `OK` | Normal | — |
| `Warning` | Degraded | Create `COMPONENT_WARNING` event (optional) |
| `Critical` | Failed | Auto-create `failure_event`, create `COMPONENT_FAILED` event |
| `Unknown` | Not determinable | Treat as working |
| `Empty` | Slot unpopulated | No component created (memory/PCIe only) |
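A sketch of normalizing a source status into these values. `normalizeStatus` is hypothetical. It accepts the Redfish `Health` values `OK`, `Warning`, and `Critical` case-insensitively. Anything else becomes `Unknown`, and `Empty` is returned only when the caller marks the section as slot-based (memory or PCIe).

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeStatus maps a raw status to OK/Warning/Critical/Unknown, or Empty
// for an unpopulated slot when allowEmpty is set (memory/PCIe sections only).
func normalizeStatus(raw string, present, allowEmpty bool) string {
	if allowEmpty && !present {
		return "Empty"
	}
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "ok":
		return "OK"
	case "warning":
		return "Warning"
	case "critical":
		return "Critical"
	default:
		return "Unknown"
	}
}

func main() {
	fmt.Println(normalizeStatus("ok", true, false))
	fmt.Println(normalizeStatus("", false, true))
	fmt.Println(normalizeStatus("", true, false))
}
```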
### Missing field handling
| Field | Fallback |
|-------|---------|
| CPU serial | Generated: `{board_serial}-CPU-{socket}` |
| PCIe serial | Generated: `{board_serial}-PCIE-{slot}` |
| Other serial | Component skipped if absent |
| manufacturer (PCIe) | Looked up from `vendor_id` (8086→Intel, 10de→NVIDIA, 15b3→Mellanox…) |
| status | Treated as `Unknown` |
| firmware | No `FIRMWARE_CHANGED` event |
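The PCIe manufacturer fallback amounts to a lookup keyed by `vendor_id`. This sketch includes only the three mappings listed above, and `manufacturerFor` is a hypothetical name. `vendor_id` arrives as a decimal integer in the payload, so `32902` from the PCIe example is `0x8086`.

```go
package main

import (
	"fmt"
	"strings"
)

// pciVendorNames holds only the mappings named in the table above.
var pciVendorNames = map[int]string{
	0x8086: "Intel",
	0x10de: "NVIDIA",
	0x15b3: "Mellanox",
}

// manufacturerFor keeps an explicit manufacturer, else falls back to vendor_id.
func manufacturerFor(current string, vendorID int) string {
	if strings.TrimSpace(current) != "" {
		return current
	}
	return pciVendorNames[vendorID] // "" when the vendor is unknown
}

func main() {
	fmt.Println(manufacturerFor("", 32902)) // Intel (32902 == 0x8086)
}
```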
Behavior:
- unsupported filenames are skipped
- each file is parsed independently
- one bad file must not fail the whole batch if at least one conversion succeeds
- result artifact is temporary and deleted after download

View File

@@ -4,86 +4,74 @@
Defined in `cmd/logpile/main.go`:
| Flag | Default | Description |
|------|---------|-------------|
| Flag | Default | Purpose |
|------|---------|---------|
| `--port` | `8082` | HTTP server port |
| `--file` | — | Reserved for archive preload (not active) |
| `--version` | | Print version and exit |
| `--no-browser` | | Do not open browser on start |
| `--hold-on-crash` | `true` on Windows | Keep console open on fatal crash for debugging |
| `--file` | empty | Preload archive file |
| `--version` | `false` | Print version and exit |
| `--no-browser` | `false` | Do not auto-open browser |
| `--hold-on-crash` | `true` on Windows | Keep console open after fatal crash |
## Build
## Common commands
```bash
# Local binary (current OS/arch)
make build
# Output: bin/logpile
# Cross-platform binaries
make build-all
# Output:
# bin/logpile-linux-amd64
# bin/logpile-linux-arm64
# bin/logpile-darwin-amd64
# bin/logpile-darwin-arm64
# bin/logpile-windows-amd64.exe
```
Both `make build` and `make build-all` run `scripts/update-pci-ids.sh --best-effort`
before compilation to sync `pci.ids` from the submodule.
To skip PCI IDs update:
```bash
SKIP_PCI_IDS_UPDATE=1 make build
```
Build flags: `CGO_ENABLED=0` — fully static binary, no C runtime dependency.
## PCI IDs submodule
Source: `third_party/pciids` (git submodule → `github.com/pciutils/pciids`)
Local copy embedded at build time: `internal/parser/vendors/pciids/pci.ids`
```bash
# Manual update
make test
make fmt
make update-pci-ids
```
# Init submodule after fresh clone
Notes:
- `make build` outputs `bin/logpile`
- `make build-all` builds the supported cross-platform binaries
- `make build` and `make build-all` run `scripts/update-pci-ids.sh --best-effort` unless `SKIP_PCI_IDS_UPDATE=1`
## PCI IDs
Source submodule: `third_party/pciids`
Embedded copy: `internal/parser/vendors/pciids/pci.ids`
Typical setup after clone:
```bash
git submodule update --init third_party/pciids
```
## Release process
## Release script
Run:
```bash
scripts/release.sh
./scripts/release.sh
```
What it does:
Current behavior:
1. Reads version from `git describe --tags`
2. Validates clean working tree (override: `ALLOW_DIRTY=1`)
3. Sets stable `GOPATH` / `GOCACHE` / `GOTOOLCHAIN` env
4. Creates `releases/{VERSION}/` directory
5. Generates `RELEASE_NOTES.md` template if not present
6. Builds `darwin-arm64` and `windows-amd64` binaries
7. Packages all binaries found in `bin/` as `.tar.gz` / `.zip`
2. Refuses a dirty tree unless `ALLOW_DIRTY=1`
3. Sets stable Go cache/toolchain environment
4. Creates `releases/{VERSION}/`
5. Creates a release-notes template if missing
6. Builds `darwin-arm64` and `windows-amd64`
7. Packages any already-present binaries from `bin/`
8. Generates `SHA256SUMS.txt`
9. Prints next steps (tag, push, create release manually)
Release notes template is created in `releases/{VERSION}/RELEASE_NOTES.md`.
Important limitation:
- `scripts/release.sh` does not run `make build-all` for you
- if you want Linux or additional macOS archives in the release directory, build them before running the script
## Running
## Run locally
```bash
./bin/logpile
./bin/logpile --port 9090
./bin/logpile --no-browser
./bin/logpile --version
./bin/logpile --hold-on-crash # keep console open on crash (default on Windows)
```
## macOS Gatekeeper
After downloading a binary, remove the quarantine attribute:
```bash
xattr -d com.apple.quarantine /path/to/logpile-darwin-arm64
```

View File

@@ -1,134 +1,54 @@
# 09 — Testing
## Required before merge
## Baseline
Required before merge:
```bash
go test ./...
```
All tests must pass before any change is merged.
## Test locations
## Where to add tests
| Change area | Test location |
|-------------|---------------|
| Collectors | `internal/collector/*_test.go` |
| HTTP handlers | `internal/server/*_test.go` |
| Area | Location |
|------|----------|
| Collectors and replay | `internal/collector/*_test.go` |
| HTTP handlers and jobs | `internal/server/*_test.go` |
| Exporters | `internal/exporter/*_test.go` |
| Parsers | `internal/parser/vendors/<vendor>/*_test.go` |
| Vendor parsers | `internal/parser/vendors/<vendor>/*_test.go` |
## Exporter tests
## General rules
The Reanimator exporter has comprehensive coverage:
- Prefer table-driven tests
- No network access in unit tests
- Cover happy path and realistic failure/partial-data cases
- New vendor parsers need both detection and parse coverage
| Test file | Coverage |
|-----------|----------|
| `reanimator_converter_test.go` | Unit tests per conversion function |
| `reanimator_integration_test.go` | Full export with realistic `AnalysisResult` |
## Mandatory coverage for dedup/filter/classify logic
Any new deduplication, filtering, or classification function must have:
1. A true-positive case
2. A true-negative case
3. A regression case for the vendor or topology that motivated the change
This is mandatory for inventory logic, firmware filtering, and similar code paths where silent data drift is likely.
## Mandatory coverage for expensive path selection
Any function that decides whether to crawl or probe an expensive path must have:
1. A positive selection case
2. A negative exclusion case
3. A topology-level count/integration case
The goal is to catch runaway I/O regressions before they ship.
## Useful focused commands
Run exporter tests only:
```bash
go test ./internal/exporter/...
go test ./internal/exporter/... -v -run Reanimator
go test ./internal/exporter/... -cover
```
## Guidelines
- Prefer table-driven tests for parsing logic (multiple input variants).
- Do not rely on network access in unit tests.
- Test both the happy path and edge cases (missing fields, empty collections).
- When adding a new vendor parser, include at minimum:
- `Detect()` test with a positive and a negative sample file list.
- `Parse()` test with a minimal but representative archive.
## Dedup and filtering functions — mandatory coverage
Any function that deduplicates, filters, or classifies hardware inventory items
**must** have tests covering all three axes before the code is considered done:
| Axis | What to test | Why |
|------|-------------|-----|
| **True positive** | Items that ARE duplicates are collapsed to one | Proves the function works |
| **True negative** | Items that are NOT duplicates are kept separate | Proves the function doesn't over-collapse |
| **Counter-case** | The scenario that motivated the original code still works after changes | Prevents regression from future fixes |
### Worked example — GPU dedup regression (2026-03-11)
`collectGPUsFromProcessors` was added for MSI (chassis Id matches processor Id).
No tests → when Supermicro HGX arrived (chassis Id = "HGX_GPU_SXM_1", processor Id = "GPU_SXM_1"),
the chassis lookup silently returned nothing, serial stayed empty, UUID was new → 8 duplicate GPUs.
Meanwhile, changing `gpuDocDedupKey` to try `slot|model` before the path fallback collapsed two distinct
GraphicsControllers GPUs with the same model into one, breaking an existing test that had no
counter-case for the path-fallback scenario.
**Required test matrix for any dedup function:**
```
TestXxx_CollapsesDuplicates — same item via two sources → 1 result
TestXxx_KeepsDistinct — two different items with same model → 2 results
TestXxx_<VendorThatMotivated> — the specific vendor/setup that triggered the code
```
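A table-driven sketch of the three-case matrix, written as a runnable program. In the repository, the same table would live in a `*_test.go` file and use `t.Run`. `dedupGPUs` and its key policy (serial first, then resource path, never model alone) are hypothetical and only illustrate the axes. They are not the collector's actual dedup logic.

```go
package main

import "fmt"

// GPU is a trimmed, hypothetical record; Path is the Redfish resource path.
type GPU struct {
	Serial, Model, Path string
}

// dedupGPUs keys by serial when present, otherwise by path (never by model alone).
func dedupGPUs(in []GPU) []GPU {
	seen := map[string]bool{}
	var out []GPU
	for _, g := range in {
		key := "path:" + g.Path
		if g.Serial != "" {
			key = "sn:" + g.Serial
		}
		if seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, g)
	}
	return out
}

func main() {
	cases := []struct {
		name string
		in   []GPU
		want int
	}{
		{"CollapsesDuplicates", []GPU{{"SN1", "H100", "/Systems/1/Processors/GPU1"}, {"SN1", "H100", "/Chassis/GPU1"}}, 1},
		{"KeepsDistinct", []GPU{{"SN1", "H100", "/a"}, {"SN2", "H100", "/b"}}, 2},
		{"PathFallbackCounterCase", []GPU{{"", "H100", "/gc/1"}, {"", "H100", "/gc/2"}}, 2},
	}
	for _, tc := range cases {
		got := len(dedupGPUs(tc.in))
		status := "ok"
		if got != tc.want {
			status = "FAIL"
		}
		fmt.Printf("%-24s got=%d want=%d %s\n", tc.name, got, tc.want, status)
	}
}
```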
### Worked example — firmware filter regression (2026-03-12)
`collectFirmwareInventory` was added in `6c19a58` without coverage for Supermicro naming.
`isDeviceBoundFirmwareName` had patterns for Dell-style names (`"GPU SomeDevice"`, `"NIC OnboardLAN"`)
but Supermicro Redfish uses `"GPU1 System Slot0"` and `"NIC1 System Slot0 ..."` — digit follows
immediately after the type prefix. 29 device-bound entries leaked into `hardware.firmware`.
`9c5512d` attempted to fix this with HGX ID patterns (`_fw_gpu_`, etc.) in the wrong field:
the filter checked `DeviceName` but `collectFirmwareInventory` populates it from `Name` first
(`"Software Inventory"` for all HGX per-component slots), not from the `Id` field that contains
the firmware ID like `"HGX_FW_GPU_SXM_1"`. The patterns were effectively dead code from day one.
**Required test matrix for any filter function:**
```
TestXxx_FiltersDeviceBound_Dell — Dell-style names that motivated the original code
TestXxx_FiltersDeviceBound_Supermicro — Supermicro names with digit suffix (GPU1/NIC1)
TestXxx_KeepsSystemLevel — BIOS, BMC, CPLD names must NOT be filtered
```
### Practical rule
When you write a new filter/dedup/classify function, ask:
1. Does my test cover the vendor that motivated this code?
2. Does my test cover a *different* vendor or naming convention where the function must NOT fire?
3. If I change the dedup key logic, do existing tests still exercise the old correct behavior?
4. When the filter checks a field on a model struct, does my test verify that the field is
actually populated by the collector? (Dead-code filter pattern: `9c5512d` `_fw_gpu_` check.)
If any answer is "no" — add the missing test before committing.
## Collector candidate-selection functions — mandatory coverage
Any function that selects paths for an expensive operation (probing, crawling, plan-B retry)
**must** have tests covering:
| Axis | What to test | Why |
|------|-------------|-----|
| **Positive** | Paths that should be selected ARE selected | Proves the feature works |
| **Negative** | Paths that should be excluded ARE excluded | Prevents runaway I/O |
| **Topology integration** | Given a realistic `out` map, the count of selected paths matches expectations | Catches implicit coupling between the selector and the surrounding data shape |
### Worked example — NVMe post-probe regression (2026-03-12)
`shouldAdaptiveNVMeProbe` was added in `2fa4a12` for Supermicro NVMe backplanes that return
`Members: []` but serve disks at `Disk.Bay.N` paths. No topology-level test was added.
When SYS-A21GE-NBRT (HGX B200) arrived, its 35 sub-chassis (GPU, NVSwitch, PCIeRetimer,
ERoT, IRoT, BMC, FPGA) all have `ChassisType=Module/Component/Zone` and empty `/Drives`, so
all 35 passed the filter → 35 × 384 = 13 440 HTTP requests → 22 min extra per collection.
A topology integration test (`TestNVMePostProbeSkipsNonStorageChassis`) would have caught
this at commit time: given GPU chassis + backplane, exactly 1 candidate must be selected.
**Required test matrix for any path-selection function:**
```
TestXxx_SelectsTargetPath — the path that motivated the code IS selected
TestXxx_SkipsIrrelevantPath — a path that must never be selected IS skipped
TestXxx_TopologyCount — given a realistic multi-chassis map, selected count = N
go test ./internal/collector/...
go test ./internal/server/...
go test ./internal/parser/vendors/...
```
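A sketch of the topology-count idea: build a realistic multi-chassis map and assert that exactly one candidate is selected. The selection criterion is hypothetical: storage-type `ChassisType` plus an empty `/Drives` collection. It is not the logic of `shouldAdaptiveNVMeProbe`. `StorageEnclosure` and `Module` are standard Redfish `ChassisType` values. The chassis IDs are made up.

```go
package main

import "fmt"

// Chassis is a trimmed, hypothetical view of a Redfish Chassis resource.
type Chassis struct {
	ID          string
	ChassisType string
	DriveCount  int // members of the chassis /Drives collection
}

// selectNVMeProbeCandidates (hypothetical) selects only storage-style
// enclosures whose Drives collection is empty.
func selectNVMeProbeCandidates(all []Chassis) []string {
	var out []string
	for _, c := range all {
		if c.DriveCount != 0 {
			continue
		}
		switch c.ChassisType {
		case "StorageEnclosure", "Enclosure":
			out = append(out, c.ID)
		}
	}
	return out
}

// hgxLikeTopology mimics the regression: 35 module sub-chassis plus one backplane.
func hgxLikeTopology() []Chassis {
	var topo []Chassis
	for i := 1; i <= 35; i++ {
		topo = append(topo, Chassis{ID: fmt.Sprintf("HGX_Module_%d", i), ChassisType: "Module"})
	}
	return append(topo, Chassis{ID: "NVMeBackplane_1", ChassisType: "StorageEnclosure"})
}

func main() {
	fmt.Println(selectNVMeProbeCandidates(hgxLikeTopology())) // [NVMeBackplane_1]
}
```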

View File

@@ -1,59 +1,41 @@
# LOGPile Bible
> **Documentation language:** English only. All maintained project documentation must be written in English.
>
> **Architectural decisions:** Every significant architectural decision **must** be recorded in
> [`10-decisions.md`](10-decisions.md) before or alongside the code change.
>
> **Single source of truth:** Architecture and technical design documentation belongs in `docs/bible/`.
> Keep `README.md` and `CLAUDE.md` minimal to avoid duplicate documentation.
`bible-local/` is the project-specific source of truth for LOGPile.
Keep top-level docs minimal and put maintained architecture/API contracts here.
This directory is the single source of truth for LOGPile's architecture, design, and integration contracts.
It is structured so that both humans and AI assistants can navigate it quickly.
## Rules
---
- Documentation language: English only
- Update relevant bible files in the same change as the code
- Record significant architectural decisions in [`10-decisions.md`](10-decisions.md)
- Do not duplicate shared rules from `bible/`
## Reading Map (Hierarchical)
## Read order
### 1. Foundations (read first)
| File | Purpose |
|------|---------|
| [01-overview.md](01-overview.md) | Product scope, modes, non-goals |
| [02-architecture.md](02-architecture.md) | Runtime structure, state, main flows |
| [04-data-models.md](04-data-models.md) | Stable data contracts and canonical inventory |
| [03-api.md](03-api.md) | HTTP endpoints and response contracts |
| [05-collectors.md](05-collectors.md) | Live collection behavior |
| [06-parsers.md](06-parsers.md) | Archive parser framework and vendor coverage |
| [07-exporters.md](07-exporters.md) | Raw export, Reanimator export, batch convert |
| [08-build-release.md](08-build-release.md) | Build and release workflow |
| [09-testing.md](09-testing.md) | Test expectations and regression rules |
| [10-decisions.md](10-decisions.md) | Architectural Decision Log |
| File | What it covers |
|------|----------------|
| [01-overview.md](01-overview.md) | Product purpose, operating modes, scope |
| [02-architecture.md](02-architecture.md) | Runtime structure, control flow, in-memory state |
| [04-data-models.md](04-data-models.md) | Core contracts (`AnalysisResult`, canonical `hardware.devices`) |
## Fast orientation
### 2. Runtime Interfaces
| File | What it covers |
|------|----------------|
| [03-api.md](03-api.md) | HTTP API contracts and endpoint behavior |
| [05-collectors.md](05-collectors.md) | Live collection connectors (Redfish, IPMI mock) |
| [06-parsers.md](06-parsers.md) | Archive parser framework and vendor parsers |
| [07-exporters.md](07-exporters.md) | CSV / JSON / Reanimator exports and integration mapping |
### 3. Delivery & Quality
| File | What it covers |
|------|----------------|
| [08-build-release.md](08-build-release.md) | Build, packaging, release workflow |
| [09-testing.md](09-testing.md) | Testing expectations and verification guidance |
### 4. Governance (always current)
| File | What it covers |
|------|----------------|
| [10-decisions.md](10-decisions.md) | Architectural Decision Log (ADL) |
---
## Quick orientation for AI assistants
- Read order for most changes: `01` → `02` → `04` → relevant interface doc(s) → `10`
- Entry point: `cmd/logpile/main.go`
- HTTP server: `internal/server/` — handlers in `handlers.go`, routes in `server.go`
- Data contracts: `internal/models/` — never break `AnalysisResult` JSON shape
- Frontend contract: `web/static/js/app.js` — keep API responses stable
- Canonical inventory: `hardware.devices` in `AnalysisResult` — source of truth for UI and exports
- Parser registry: `internal/parser/vendors/` (`init()` auto-registration pattern)
- Collector registry: `internal/collector/registry.go`
- HTTP layer: `internal/server/`
- Core contracts: `internal/models/models.go`
- Live collection: `internal/collector/`
- Archive parsing: `internal/parser/`
- Export conversion: `internal/exporter/`
- Frontend consumer: `web/static/js/app.js`
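The parser registry entry refers to Go's `init()` auto-registration pattern. What follows is a minimal single-file sketch of that pattern, not LOGPile's actual code. The `Parser` interface, `Register`, `Registered`, `Find`, and the `exampleVendor` type are hypothetical stand-ins, and the real API under `internal/parser/` may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Parser is a hypothetical, simplified parser contract for this sketch;
// the real interface under internal/parser/ may differ.
type Parser interface {
	Name() string
	// CanParse reports whether the parser recognizes the given file name.
	CanParse(filename string) bool
}

var (
	registryMu sync.RWMutex
	registry   = map[string]Parser{}
)

// Register adds a parser to the package-level registry. It panics on a
// duplicate name so conflicting registrations fail at process start.
func Register(p Parser) {
	registryMu.Lock()
	defer registryMu.Unlock()
	if _, dup := registry[p.Name()]; dup {
		panic("parser already registered: " + p.Name())
	}
	registry[p.Name()] = p
}

// sortedNamesLocked returns registry keys in sorted order.
// The caller must hold registryMu.
func sortedNamesLocked() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// Registered returns the names of all registered parsers, sorted.
func Registered() []string {
	registryMu.RLock()
	defer registryMu.RUnlock()
	return sortedNamesLocked()
}

// Find returns the first parser, in name order, that accepts filename.
func Find(filename string) (Parser, bool) {
	registryMu.RLock()
	defer registryMu.RUnlock()
	for _, name := range sortedNamesLocked() {
		if p := registry[name]; p.CanParse(filename) {
			return p, true
		}
	}
	return nil, false
}

// exampleVendor stands in for a vendor parser file under
// internal/parser/vendors/; its name and file-matching rule are invented.
type exampleVendor struct{}

func (exampleVendor) Name() string { return "example-vendor" }

func (exampleVendor) CanParse(filename string) bool {
	return strings.HasSuffix(filename, ".example.tar.gz")
}

// init runs automatically during package initialization, before main,
// so importing the package is enough to make the parser available.
func init() { Register(exampleVendor{}) }

func main() {
	fmt.Println(Registered()) // prints [example-vendor]
	if p, ok := Find("bmc-dump.example.tar.gz"); ok {
		fmt.Println("matched:", p.Name())
	}
}
```

In a multi-package layout, each vendor file would call the registry's register function from its own `init()`, and the binary would pull the vendors package in with a blank import, which is the usual Go idiom. Check `internal/parser/` for how LOGPile actually wires this. Panicking on a duplicate name surfaces conflicting registrations at startup rather than at upload time.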
## Maintenance rule
If a document becomes stale, either fix it immediately or delete it.
Stale docs are worse than missing docs.