docs: refresh project documentation
# 01 — Overview

## Purpose

LOGPile is a standalone Go application for BMC (Baseboard Management Controller) diagnostics analysis with an embedded web UI.
It runs as a single binary and normalizes hardware data from archives or live Redfish collection.

## Operating modes

| Mode | Entry point | Outcome |
|------|-------------|---------|
| Archive upload | `POST /api/upload` | Parse a supported archive, raw export bundle, or JSON snapshot into `AnalysisResult` |
| Live collection | `POST /api/collect` | Collect from a live BMC via Redfish and store the result in memory |
| Batch convert | `POST /api/convert` | Convert multiple supported input files into Reanimator JSON in a ZIP artifact |

All modes converge on the same normalized hardware model and exporter pipeline.

## In scope

- Single-binary desktop/server utility with embedded UI
- Vendor archive parsing and live Redfish collection
- Canonical hardware inventory across UI and exports
- Reopenable raw export bundles for future re-analysis
- Reanimator export and batch conversion workflows
- Embedded `pci.ids` lookup for vendor/device name enrichment

## Current vendor coverage

- Dell TSR
- H3C SDS G5/G6
- Inspur / Kaytus
- NVIDIA HGX Field Diagnostics
- NVIDIA Bug Report
- Unraid
- XigmaNAS
- Generic fallback parser

## Non-goals

- Persistent storage or multi-user state
- Production IPMI collection
- Authentication/authorization on the built-in HTTP server
- Long-term server-side job history beyond in-memory process lifetime

## Runtime stack

| Layer | Implementation |
|-------|----------------|
| Language | Go 1.22+ |
| HTTP | `net/http` + `http.ServeMux` |
| UI | Embedded templates and static assets via `go:embed` |
| State | In-memory only |
| Build | `CGO_ENABLED=0`, single binary |

Default port: `8082`

## Code map

```text
cmd/logpile/main.go    entrypoint and CLI flags
internal/server/       HTTP handlers, jobs, upload/export flows
internal/collector/    live collection and Redfish replay
internal/analyzer/     shared analysis helpers
internal/parser/       archive extraction and parser dispatch
internal/exporter/     CSV and Reanimator conversion
internal/models/       stable data contracts
web/                   embedded UI assets
```

## Server state

`internal/server.Server` stores:

| Field | Purpose |
|-------|---------|
| `result` | Current `AnalysisResult` shown in UI and used by exports |
| `detectedVendor` | Parser/collector identity for the current dataset |
| `rawExport` | Reopenable raw-export package associated with current result |
| `jobManager` | Shared async job state for collect and convert flows |
| `collectors` | Registered live collectors (`redfish`, `ipmi`) |
| `convertOutput` | Temporary ZIP artifacts for batch convert downloads |

State is replaced only on successful upload or successful live collection.
Failed or canceled jobs do not overwrite the previous dataset.
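
The replace-on-success rule can be sketched in Go. The struct and field types below are illustrative stand-ins, not the actual definitions in `internal/server/server.go`:

```go
package main

import "fmt"

// Illustrative stand-ins; the real types live in internal/models and
// internal/server and carry far more fields.
type AnalysisResult struct{ Filename string }

type Server struct {
	result         *AnalysisResult // current dataset shown in UI and used by exports
	detectedVendor string          // parser/collector identity for the dataset
}

// setResult replaces state only on success: failed or canceled jobs
// leave the previous dataset untouched.
func (s *Server) setResult(r *AnalysisResult, vendor string, ok bool) {
	if !ok || r == nil {
		return // keep the previous dataset unchanged
	}
	s.result = r
	s.detectedVendor = vendor
}

func main() {
	s := &Server{result: &AnalysisResult{Filename: "old.tar.gz"}}
	s.setResult(nil, "redfish", false) // a failed collect
	fmt.Println(s.result.Filename)     // still the previous dataset
}
```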

## Main flows

### Upload

1. `POST /api/upload` receives multipart field `archive`
2. JSON inputs are checked for raw-export package or `AnalysisResult` snapshot
3. Non-JSON inputs go through `parser.BMCParser`
4. Archive metadata is normalized onto `AnalysisResult`
5. Result becomes the current in-memory dataset
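
A client for the upload flow can be sketched in Go. The endpoint and the `archive` field name come from the docs above; the helper name and minimal error handling are illustrative:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

// uploadArchive posts one file as multipart field "archive" to a running
// LOGPile instance and returns the parsed summary JSON as a string.
func uploadArchive(baseURL, path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("archive", path)
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(part, f); err != nil {
		return "", err
	}
	w.Close() // finalize the multipart boundary before sending

	resp, err := http.Post(baseURL+"/api/upload", w.FormDataContentType(), &body)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	summary, err := io.ReadAll(resp.Body)
	return string(summary), err
}

func main() {
	// Real use: uploadArchive("http://localhost:8082", "diag.tar.gz")
	fmt.Println("upload helper defined")
}
```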

### Live collect

1. `POST /api/collect` validates request fields
2. Server creates an async job and returns `202 Accepted`
3. The selected collector gathers raw data
4. For Redfish, the collector saves `raw_payloads.redfish_tree`
5. Result is normalized, source metadata applied, and state replaced on success

### Batch convert

1. `POST /api/convert` accepts multiple files
2. Each supported file is analyzed independently
3. Successful results are converted to Reanimator JSON
4. Outputs are packaged into a temporary ZIP artifact
5. Client polls job status and downloads the artifact when ready

## Redfish design rule

Live Redfish collection and offline Redfish re-analysis must use the same replay path.
The collector first captures `raw_payloads.redfish_tree`, then the replay logic builds the normalized result.

## PCI IDs lookup

Lookup order:

1. Embedded `internal/parser/vendors/pciids/pci.ids`
2. `./pci.ids`
3. `/usr/share/hwdata/pci.ids`
4. `/usr/share/misc/pci.ids`
5. `/opt/homebrew/share/pciids/pci.ids`
6. Extra paths from `LOGPILE_PCI_IDS_PATH`

Later sources override earlier ones for the same IDs.
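
The override behavior can be sketched as a layered map merge. The `pciDB` type, the key format, and the example entries are illustrative, not the actual `pciids` package API:

```go
package main

import "fmt"

// pciDB maps a vendor:device key to a human-readable model name.
type pciDB map[string]string

// loadLayered applies sources in order; later sources overwrite
// earlier entries for the same ID, matching the documented rule.
func loadLayered(layers []map[string]string) pciDB {
	db := pciDB{}
	for _, layer := range layers { // earliest source first
		for id, name := range layer {
			db[id] = name // later source wins for the same ID
		}
	}
	return db
}

func main() {
	embedded := map[string]string{"10de:2330": "GH100 (embedded name)"}
	override := map[string]string{"10de:2330": "GH100 [H100 SXM5 80GB]"}
	db := loadLayered([]map[string]string{embedded, override})
	fmt.Println(db["10de:2330"])
}
```

This is why refreshing a local `pci.ids` or pointing `LOGPILE_PCI_IDS_PATH` at a newer file updates model names without a code change.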

## Conventions

- All endpoints are under `/api/`
- Request bodies are `application/json` or `multipart/form-data` where noted
- JSON responses are used unless the endpoint downloads a file
- Async jobs share the same status model: `queued`, `running`, `success`, `failed`, `canceled`
- Export filenames use `YYYY-MM-DD (MODEL) - SERIAL.<ext>` when board metadata exists

---

## Input endpoints

### `POST /api/upload`

Uploads one file in multipart field `archive`.
Server-side multipart limit: 100 MiB.

Accepted inputs:
- supported archive/log formats from the parser registry
- `.json` `AnalysisResult` snapshots
- raw-export JSON packages
- raw-export ZIP bundles

Result:
- parses or replays the input
- stores the result as current in-memory state
- returns parsed summary JSON

Related helper:
- `GET /api/file-types` returns `archive_extensions`, `upload_extensions`, and `convert_extensions`

### `POST /api/collect`

Starts a live collection job.

Request body (remaining fields elided here):

```json
{
  "host": "bmc01.example.local",
  ...
}
```

Supported values:
- `protocol`: `redfish` or `ipmi`
- `auth_type`: `password` or `token`
- `tls_mode`: `strict` or `insecure`

**Response:** `202 Accepted`

```json
{
  "job_id": "job_a1b2c3d4e5f6",
  "status": "queued",
  "message": "Collection job accepted",
  "created_at": "2026-02-23T12:00:00Z"
}
```

Responses:
- `202` on accepted job creation
- `400` on malformed JSON
- `422` on validation errors

### `GET /api/collect/{id}`

```json
{
  "job_id": "job_a1b2c3d4e5f6",
  "status": "running",
  "progress": 55,
  "logs": ["..."],
  "created_at": "2026-02-23T12:00:00Z",
  "updated_at": "2026-02-23T12:00:10Z"
}
```

Returns async collection job status, progress, timestamps, and accumulated logs.

### `POST /api/collect/{id}/cancel`

Requests cancellation for a running collection job.

### `POST /api/convert`

Starts a batch conversion job that accepts multiple files under `files[]` or `files`.
Each supported file is parsed independently and converted to Reanimator JSON.

Response fields:
- `job_id`
- `status`
- `accepted`
- `skipped`
- `total_files`

### `GET /api/convert/{id}`

Returns batch convert job status using the same async job envelope as collection.

### `GET /api/convert/{id}/download`

Downloads the ZIP artifact produced by a successful convert job.

## Read endpoints

### `GET /api/status`

Returns source metadata for the current dataset.
When no dataset is loaded, the response is `{ "loaded": false }`.

Typical fields:
- `loaded`
- `filename`
- `vendor`
- `source_type`
- `protocol`
- `target_host`
- `source_timezone`
- `collected_at`
- `stats`

### `GET /api/config`

Returns the main UI configuration payload, including:
- source metadata
- `hardware.board`
- `hardware.firmware`
- canonical `hardware.devices`
- computed specification lines

### `GET /api/events`

Returns events sorted newest first.

### `GET /api/sensors`

Returns parsed sensors plus synthesized PSU voltage sensors when telemetry is available.

### `GET /api/serials`

Returns serial-oriented inventory built from canonical `hardware.devices`.

### `GET /api/firmware`

Returns firmware-oriented inventory built from canonical `hardware.devices`.

### `GET /api/parse-errors`

Returns normalized parse and collection issues combined from:
- Redfish fetch errors in `raw_payloads`
- raw-export collect logs
- derived partial-inventory warnings

### `GET /api/parsers`

Returns registered parser metadata.

### `GET /api/file-types`

Returns supported file extensions for upload and batch convert.

## Export endpoints

### `GET /api/export/csv`

Downloads serial-number CSV.

### `GET /api/export/json`

Downloads a raw-export artifact for reopen and re-analysis.
The current implementation emits a ZIP bundle containing:
- `raw_export.json`
- `collect.log`
- `parser_fields.json`

### `GET /api/export/reanimator`

Downloads Reanimator JSON built from the current normalized result.

## Management endpoints

### `DELETE /api/clear`

Clears the current in-memory dataset, raw export state, and temporary convert artifacts.

### `POST /api/shutdown`

Gracefully shuts down the process after responding.

# 04 — Data Models

## Core contract: `AnalysisResult`

`internal/models/models.go` defines the shared result passed between parsers, collectors, server handlers, and exporters.

Stability rule:
- do not rename or remove JSON fields from `AnalysisResult`
- additive fields are allowed
- UI and exporter compatibility depends on this shape remaining stable

Key fields:

| Field | Meaning |
|-------|---------|
| `filename` | Original upload name or synthesized live source name |
| `source_type` | `archive` or `api` |
| `protocol` | `redfish`, `ipmi`, or empty for archive uploads |
| `target_host` | Hostname or IP for live collection |
| `source_timezone` | Source timezone/offset if known |
| `collected_at` | Canonical collection/upload time |
| `raw_payloads` | Raw source data used for replay or diagnostics |
| `events` | Parsed event timeline |
| `fru` | FRU-derived inventory details |
| `sensors` | Sensor readings |
| `hardware` | Normalized hardware inventory |

## `HardwareConfig`

Main sections:

```text
hardware.board
hardware.devices
hardware.cpus
hardware.memory
hardware.storage
hardware.volumes
hardware.pcie_devices
hardware.gpus
hardware.network_adapters
hardware.network_cards
hardware.power_supplies
hardware.firmware
```

`network_cards` is legacy/alternate source data.
`hardware.devices` is the canonical cross-section inventory.

## Canonical inventory: `hardware.devices`

`hardware.devices` is the single source of truth for device-oriented UI and Reanimator export.

Required rules:

1. UI hardware views must read from `hardware.devices`
2. Reanimator conversion must derive device sections from `hardware.devices`
3. UI/export mismatches are bugs, not accepted divergence
4. New shared device fields belong in `HardwareDevice` first

Deduplication priority:

| Priority | Key |
|----------|-----|
| 1 | usable `serial_number` (not empty, `N/A`, `NA`, `NONE`, `NULL`, `UNKNOWN`, or `-`) |
| 2 | `bdf` (PCI Bus:Device.Function address) |
| 3 | keep records separate |
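
The priority order can be sketched as a key function; the sentinel list follows the rules above, while the function names and key prefixes are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// usableSerial rejects empty serials and the documented placeholder values.
func usableSerial(s string) bool {
	switch strings.ToUpper(strings.TrimSpace(s)) {
	case "", "N/A", "NA", "NONE", "NULL", "UNKNOWN", "-":
		return false
	}
	return true
}

// mergeKey returns the dedup key and whether merging applies:
// usable serial first, then BDF, otherwise keep records separate.
func mergeKey(serial, bdf string) (string, bool) {
	if usableSerial(serial) {
		return "sn:" + strings.TrimSpace(serial), true
	}
	if bdf != "" {
		return "bdf:" + bdf, true
	}
	return "", false // no merge key — records remain distinct
}

func main() {
	fmt.Println(mergeKey("N/A", "0000:3b:00.0")) // falls through to the BDF key
}
```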

## Raw payloads

`raw_payloads` is authoritative for replayable sources.

Current important payloads:
- `redfish_tree`
- `redfish_fetch_errors`
- `source_timezone`

Normalized hardware fields are derived output, not the long-term source of truth.

## Raw export package

`/api/export/json` produces a reopenable raw-export artifact.

Design rules:
- raw source stays authoritative
- uploads of raw-export artifacts must re-analyze from raw source
- parsed snapshots inside the bundle are diagnostic only

Collectors live in `internal/collector/`.

Core files:
- `registry.go` for protocol registration
- `redfish.go` for live collection
- `redfish_replay.go` for replay from raw payloads
- `ipmi_mock.go` for the placeholder IPMI implementation
- `types.go` for request/progress contracts

## Redfish collector

Status: active production path.

Request fields passed from the server:
- `host`
- `port`
- `username`
- `auth_type`
- credential field (`password` or token)
- `tls_mode`

### Core rule

Live collection and replay must stay behaviorally aligned.
If the collector adds a fallback, probe, or normalization rule, replay must mirror it.

### Discovery model

The collector does not rely on one fixed vendor tree.
It discovers and follows Redfish resources dynamically from root collections such as:
- `Systems`
- `Chassis`
- `Managers`

### Stored raw data

Important raw payloads:
- `raw_payloads.redfish_tree`
- `raw_payloads.redfish_fetch_errors`
- `raw_payloads.source_timezone` when available

### Snapshot crawler rules

- bounded by `LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`
- prioritized toward high-value inventory paths
- tolerant of expected vendor-specific failures
- normalizes `@odata.id` values before queueing

### Redfish implementation guidance

When changing collection logic:

1. Prefer alternate-path support over vendor hardcoding
2. Keep expensive probing bounded
3. Deduplicate by serial, then BDF, then location/model fallbacks
4. Preserve replay determinism from saved raw payloads
5. Add tests for both the motivating topology and a negative case

### Known vendor fallbacks

- empty standard drive collections may trigger bounded `Disk.Bay` probing
- `Storage.Links.Enclosures[*]` may be followed to recover physical drives
- `PowerSubsystem/PowerSupplies` is preferred over legacy `Power` when available
|
||||
|
||||
The Redfish snapshot crawler is intentionally:
|
||||
- **bounded** (`LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`)
|
||||
- **prioritized** (PCIe, Fabrics, FirmwareInventory, Storage, PowerSubsystem, ThermalSubsystem)
|
||||
- **tolerant** (skips noisy expected failures, strips `#fragment` from `@odata.id`)
|
||||
## IPMI collector
|
||||
|
||||
Design notes:
|
||||
- Queue capacity is sized to snapshot cap to avoid worker deadlocks on large trees.
|
||||
- UI progress is coarse and human-readable; detailed per-request diagnostics are available via debug logs.
|
||||
- `LOGPILE_REDFISH_DEBUG=1` and `LOGPILE_REDFISH_SNAPSHOT_DEBUG=1` enable console diagnostics.
|
||||
Status: mock scaffold only.
|
||||
|
||||
### Parsing guidelines

When adding Redfish mappings, follow these principles:

- Support alternate collection paths (resources may appear at different OData URLs).
- Follow `@odata.id` references and handle embedded `Members` arrays.
- Prefer **raw-tree replay compatibility**: if the live collector adds a fallback/probe, the replay analyzer must mirror it.
- Deduplicate by serial / BDF / slot+model (in that priority order).
- Prefer tolerant/fallback parsing: missing fields should be silently skipped, not cause the whole collection to fail.
### Vendor-specific storage fallbacks (Supermicro and similar)

When standard `Storage/.../Drives` collections are empty, the collector/replay may recover drives via:

- `Storage.Links.Enclosures[*] -> .../Drives`
- direct probing of finite `Disk.Bay` candidates (`Disk.Bay.0`, `Disk.Bay0`, `.../0`)

This is required for some BMCs that publish drive inventory in vendor-specific paths while leaving
the standard collections empty.
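The `Disk.Bay` probing above can be sketched as a loop over a finite candidate list. All names here are illustrative, and `fetch` stands in for an authenticated Redfish GET:

```go
package main

import "fmt"

// probeDiskBays is a sketch of the bounded Disk.Bay fallback: when the
// standard Drives collection is empty, try a finite list of vendor-specific
// candidate paths. fetch returns (document, ok); missing bays are skipped.
func probeDiskBays(storagePath string, fetch func(string) (map[string]any, bool)) []map[string]any {
	candidates := []string{"Disk.Bay.0", "Disk.Bay.1", "Disk.Bay0", "Disk.Bay1"}
	var drives []map[string]any
	for _, c := range candidates {
		if doc, ok := fetch(storagePath + "/Drives/" + c); ok {
			drives = append(drives, doc) // absent bays are silently skipped
		}
	}
	return drives
}

func main() {
	// Fake BMC that only publishes one vendor-specific bay path.
	fetch := func(path string) (map[string]any, bool) {
		if path == "/redfish/v1/Systems/1/Storage/0/Drives/Disk.Bay.0" {
			return map[string]any{"SerialNumber": "S1"}, true
		}
		return nil, false
	}
	fmt.Println(len(probeDiskBays("/redfish/v1/Systems/1/Storage/0", fetch)))
}
```

The candidate list stays short by design, which keeps the probing bounded per the principles above.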
### PSU source preference (newer Redfish)

PSU inventory source order:

1. `Chassis/*/PowerSubsystem/PowerSupplies` (preferred on X14+/newer Redfish)
2. `Chassis/*/Power` (legacy fallback)
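The preference order can be sketched as a simple two-step lookup over a decoded chassis document. The `PowerSubsystem`/`Power` keys are from the Redfish schema; the helper itself is an assumption, not the project's real function:

```go
package main

import "fmt"

// psuSource picks the PSU inventory path from a decoded chassis document:
// prefer the newer PowerSubsystem/PowerSupplies link, fall back to legacy
// Power. The map stands in for unmarshalled chassis JSON.
func psuSource(chassis map[string]any) string {
	if ps, ok := chassis["PowerSubsystem"].(map[string]any); ok {
		if id, ok := ps["@odata.id"].(string); ok {
			return id + "/PowerSupplies"
		}
	}
	if p, ok := chassis["Power"].(map[string]any); ok {
		if id, ok := p["@odata.id"].(string); ok {
			return id // legacy fallback
		}
	}
	return ""
}

func main() {
	modern := map[string]any{
		"PowerSubsystem": map[string]any{"@odata.id": "/redfish/v1/Chassis/1/PowerSubsystem"},
		"Power":          map[string]any{"@odata.id": "/redfish/v1/Chassis/1/Power"},
	}
	fmt.Println(psuSource(modern))
}
```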
### Progress reporting

The collector emits progress log entries at each stage (connecting, enumerating systems,
collecting CPUs, etc.) so the UI can display meaningful status.
Progress message strings are user-facing and may be localized.

---
## IPMI Collector (`ipmi`)

**Status:** mock scaffold only — not implemented.

The collector is registered in the collector registry but returns placeholder data.
It remains registered for protocol completeness; real IPMI support is a future work item.
## Framework

### Registration

Parsers live in `internal/parser/` and vendor implementations live in `internal/parser/vendors/`.
Each vendor parser registers itself via Go's `init()` side-effect import pattern.

Core behavior:

- registration uses `init()` side effects
- all registered parsers run `Detect()`
- the highest-confidence parser wins
- the generic fallback stays last and low-confidence

All registrations are collected in `internal/parser/vendors/vendors.go`:

```go
import (
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/dell"
	// etc.
)
```
### VendorParser interface

`VendorParser` contract:

```go
type VendorParser interface {
	Name() string                     // human-readable name
	Vendor() string                   // vendor identifier string
	Version() string                  // parser version (increment on logic changes)
	Detect(files []ExtractedFile) int // confidence 0–100
	Parse(files []ExtractedFile) (*models.AnalysisResult, error)
}
```
### Selection logic

All registered parsers run `Detect()` against the uploaded archive's file list.
The parser with the **highest confidence score** is selected.
Multiple parsers may return a score above zero; only the top scorer is used.

### Adding a new vendor parser

1. Create `internal/parser/vendors/<vendor>/`
2. Start from `internal/parser/vendors/template/parser.go.template`
3. Implement `Detect()` and `Parse()`
4. Add a blank import in `internal/parser/vendors/vendors.go`
5. Add at least one positive and one negative detection test

`Detect()` tips:

- Look for unique filenames or directory names.
- Check file content for vendor-specific markers.
- Return 70+ only when confident; return 0 if clearly not a match.

### Parser versioning

Each parser file contains a `parserVersion` constant.
Increment the version whenever parsing logic changes — this helps trace which
parser version produced a given result.
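A `Detect()` following these tips might look like the sketch below. "acme" is a made-up vendor, and the file names, markers, and weights are illustrative rather than taken from a real parser:

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// Trimmed stand-in for the real extracted-file type.
type ExtractedFile struct {
	Name    string
	Content []byte
}

// detectAcme sketches a Detect() implementation: score unique filenames and
// vendor-specific content markers, clamp to the 0-100 confidence scale.
func detectAcme(files []ExtractedFile) int {
	score := 0
	for _, f := range files {
		if strings.HasSuffix(f.Name, "acme_inventory.json") {
			score += 50 // unique filename is a strong signal
		}
		if bytes.Contains(f.Content, []byte("ACME BMC")) {
			score += 30 // vendor-specific content marker
		}
	}
	if score > 100 {
		score = 100
	}
	return score
}

func main() {
	files := []ExtractedFile{{Name: "logs/acme_inventory.json", Content: []byte(`{"bmc":"ACME BMC"}`)}}
	fmt.Println(detectAcme(files), detectAcme(nil))
}
```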
---
## Active vendor coverage

| Vendor ID | Input family | Notes |
|-----------|--------------|-------|
| `dell` | TSR ZIP archives | Broad hardware, firmware, sensors, lifecycle events |
| `h3c_g5` | H3C SDS G5 bundles | INI/XML/CSV-driven hardware and event parsing |
| `h3c_g6` | H3C SDS G6 bundles | Similar flow with G6-specific files |
| `inspur` | onekeylog archives | FRU/SDR plus optional Redis enrichment |
| `nvidia` | HGX Field Diagnostics | GPU- and fabric-heavy diagnostic input |
| `nvidia_bug_report` | `nvidia-bug-report-*.log.gz` | dmidecode, lspci, NVIDIA driver sections |
| `unraid` | Unraid diagnostics/log bundles | Server and storage-focused parsing |
| `xigmanas` | XigmaNAS plain logs | FreeBSD/NAS-oriented inventory |
| `generic` | fallback | Low-confidence text fallback when nothing else matches |

## Parser data quality rules

### FirmwareInfo — system-level only

`Hardware.Firmware` must contain **only system-level firmware**: BIOS, BMC/iDRAC,
Lifecycle Controller, CPLD, storage controllers, BOSS adapters.

**Device-bound firmware** (NIC, GPU, PSU, disk, backplane) **must NOT be added to
`Hardware.Firmware`**. It belongs to the device's own `Firmware` field and is already
present there. Duplicating it in `Hardware.Firmware` causes double entries in Reanimator.

The Reanimator exporter filters by `FirmwareInfo.DeviceName` prefix and by
`FirmwareInfo.Description` (FQDD prefix). Parsers must cooperate:

- Store the device's FQDD (or equivalent slot identifier) in `FirmwareInfo.Description`
  for all firmware entries that come from a per-device inventory source (e.g. Dell
  `DCIM_SoftwareIdentity`).
- FQDD prefixes that are device-bound: `NIC.`, `PSU.`, `Disk.`, `RAID.Backplane.`, `GPU.`
### NIC/device model names — strip embedded MAC addresses

Some vendors (confirmed: Dell TSR) embed the MAC address in the device model name field,
e.g. `ProductName = "NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"`.

**Rule:** Strip any ` - XX:XX:XX:XX:XX:XX` suffix from model/name strings before storing
them in `FirmwareInfo.DeviceName`, `NetworkAdapter.Model`, or any other model field.

Use `nicMACInModelRE` (defined in the Dell parser) or an equivalent regex:

```
\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$
```

This applies to **all** string fields used as device names or model identifiers.
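Applied in Go, the rule is a one-line anchored replacement. The helper name is illustrative; the regex is the one documented above:

```go
package main

import (
	"fmt"
	"regexp"
)

// macSuffixRE is equivalent to the Dell parser's nicMACInModelRE:
// a trailing " - XX:XX:XX:XX:XX:XX" on a model/name string.
var macSuffixRE = regexp.MustCompile(`\s+-\s+([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$`)

// stripMACSuffix applies the rule above before a model name is stored.
func stripMACSuffix(model string) string {
	return macSuffixRE.ReplaceAllString(model, "")
}

func main() {
	fmt.Println(stripMACSuffix("NVIDIA ConnectX-6 Lx 2x 25G SFP28 OCP3.0 SFF - C4:70:BD:DB:56:08"))
}
```

The `$` anchor means only a trailing MAC is removed; hyphens inside model names such as `ConnectX-6` are untouched.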
### PCI device name enrichment via pci.ids

If a PCIe device, GPU, NIC, or any hardware component has a `vendor_id` + `device_id`
but its model/name field is **empty or generic** (e.g. blank, equals the description,
or is just a raw hex ID), the parser **must** attempt to resolve the human-readable
model name from the embedded `pci.ids` database before storing the result.

**Rule:** When `Model` (or equivalent name field) is empty and both `VendorID` and
`DeviceID` are non-zero, call the pciids lookup and use the result as the model name.

```go
// Example pattern — use in any parser that handles PCIe/GPU/NIC devices:
if strings.TrimSpace(device.Model) == "" && device.VendorID != 0 && device.DeviceID != 0 {
	if name := pciids.Lookup(device.VendorID, device.DeviceID); name != "" {
		device.Model = name
	}
}
```

This rule applies to all vendor parsers. The pciids package is available at
`internal/parser/vendors/pciids`. See ADL-005 for the rationale.

**Do not hardcode model name strings.** If a device is unknown today, it will be
resolved automatically once `pci.ids` is updated.

---
## Vendor parsers

### Inspur / Kaytus (`inspur`)

**Status:** Ready. Tested on KR4268X2 (onekeylog format).

**Archive format:** `.tar.gz` onekeylog

**Primary source files:**

| File | Content |
|------|---------|
| `asset.json` | Base hardware inventory |
| `component.log` | Component list |
| `devicefrusdr.log` | FRU and SDR data |
| `onekeylog/runningdata/redis-dump.rdb` | Runtime enrichment (optional) |

**Redis RDB enrichment** (applied conservatively — fills missing fields only):

- GPU: `serial_number`, `firmware` (VBIOS/FW), runtime telemetry
- NIC: firmware, serial, part number (when text logs leave fields empty)

**Module structure:**

```
inspur/
  parser.go — main parser + registration
  sdr.go    — sensor/SDR parsing
  fru.go    — FRU serial parsing
  asset.go  — asset.json parsing
  syslog.go — syslog parsing
```

---
### Dell TSR (`dell`)

**Status:** Ready (v3.0). Tested on nested TSR archives with embedded `*.pl.zip`.

**Archive format:** `.zip` (outer archive + nested `*.pl.zip`)

**Primary source files:**

- `tsr/metadata.json`
- `tsr/hardware/sysinfo/inventory/sysinfo_DCIM_View.xml`
- `tsr/hardware/sysinfo/inventory/sysinfo_DCIM_SoftwareIdentity.xml`
- `tsr/hardware/sysinfo/inventory/sysinfo_CIM_Sensor.xml`
- `tsr/hardware/sysinfo/lcfiles/curr_lclog.xml`

**Extracted data:**

- Board/system identity and BIOS/iDRAC firmware
- CPU, memory, physical disks, virtual disks, PSU, NIC, PCIe
- GPU inventory (`DCIM_VideoView`) + GPU sensor enrichment (`DCIM_GPUSensor`)
- Controller/backplane inventory (`DCIM_ControllerView`, `DCIM_EnclosureView`)
- Sensor readings (temperature/voltage/current/power/fan/utilization)
- Lifecycle events (`curr_lclog.xml`)

---
### NVIDIA HGX Field Diagnostics (`nvidia`)

**Status:** Ready (v1.1.0). Works with any server vendor.

**Archive format:** `.tar` / `.tar.gz`

**Confidence scoring:**

| File | Score |
|------|-------|
| `unified_summary.json` with "HGX Field Diag" marker | +40 |
| `summary.json` | +20 |
| `summary.csv` | +15 |
| `gpu_fieldiag/` directory | +15 |

**Source files:**

| File | Content |
|------|---------|
| `output.log` | dmidecode — server manufacturer, model, serial number |
| `unified_summary.json` | GPU details, NVSwitch devices, PCI addresses |
| `summary.json` | Diagnostic test results and error codes |
| `summary.csv` | Alternative test results format |

**Extracted data:**

- GPUs: slot, model, manufacturer, firmware (VBIOS), BDF
- NVSwitch devices: slot, device_class, vendor_id, device_id, BDF, link speed/width
- Events: diagnostic test failures (connectivity, gpumem, gpustress, pcie, nvlink, nvswitch, power)

**Severity mapping:**

- `info` — tests passed
- `warning` — e.g. "Row remapping failed"
- `critical` — error codes 300+

**Known limitations:**

- Detailed logs in `gpu_fieldiag/*.log` are not parsed.
- No CPU, memory, or storage extraction (not present in field diag archives).

---
### NVIDIA Bug Report (`nvidia_bug_report`)

**Status:** Ready (v1.0.0).

**File format:** `nvidia-bug-report-*.log.gz` (gzip-compressed text)

**Confidence:** 85 (high priority for a matching filename pattern)

**Source sections parsed:**

| dmidecode section | Extracts |
|-------------------|----------|
| System Information | server serial, UUID, manufacturer, product name |
| Processor Information | CPU model, serial, core/thread count, frequency |
| Memory Device | DIMM slot, size, type, manufacturer, serial, part number, speed |
| System Power Supply | PSU location, manufacturer, model, serial, wattage, firmware, status |

| Other source | Extracts |
|--------------|----------|
| `lspci -vvv` (Ethernet/Network/IB) | NIC model (from VPD), BDF, slot, P/N, S/N, port count, port type |
| `/proc/driver/nvidia/gpus/*/information` | GPU model, BDF, UUID, VBIOS version, IRQ |
| NVRM version line | NVIDIA driver version |

**Known limitations:**

- Driver error/warning log lines are not yet extracted.
- GPU temperature/utilization metrics require additional parsing sections.

---
### XigmaNAS (`xigmanas`)

**Status:** Ready.

**Archive format:** Plain log files (FreeBSD-based NAS system)

**Detection:** Files named `xigmanas`, `system`, or `dmesg`; content containing "XigmaNAS" or "FreeBSD"; SMART data presence.

**Extracted data:**

- System: firmware version, uptime, CPU model, memory configuration, hardware platform
- Storage: disk models, serial numbers, capacity, health, SMART temperatures
- Populates: `Hardware.Firmware`, `Hardware.CPUs`, `Hardware.Memory`, `Hardware.Storage`, `Sensors`

---
### Unraid (`unraid`)

**Status:** Ready (v1.0.0).

**Archive format:** Unraid diagnostics archive contents (text-heavy diagnostics directories).

## Practical guidance

- Be conservative with high detect scores.
- Prefer filling missing fields over overwriting stronger source data.
- Keep parser version constants current when behavior changes.
- Any new vendor-specific filtering or dedup logic must ship with tests for that vendor format.
# 07 — Exporters & Reanimator Integration

## Export endpoints summary

| Endpoint | Format | Filename pattern |
|----------|--------|------------------|
| `GET /api/export/csv` | CSV — serial numbers | `YYYY-MM-DD (MODEL) - SN.csv` |
| `GET /api/export/json` | **Raw export package** (JSON or ZIP bundle) for reopen/re-analysis | `YYYY-MM-DD (MODEL) - SN.(json\|zip)` |
| `GET /api/export/reanimator` | Reanimator hardware JSON | `YYYY-MM-DD (MODEL) - SN.json` |
| `POST /api/convert` | Async ZIP artifact — batch archive-to-Reanimator conversion | — |

---

## Raw Export (`Export Raw Data`)

### Purpose

Raw export is not a final report dump.
It is a replayable artifact that preserves enough source data to reproduce parsing later,
after parser fixes, without requiring another live collection from the target system.

Design rules:

- raw source is authoritative
- uploads of a raw export must replay from raw source
- parsed snapshots inside the bundle are diagnostic only

### Format

`/api/export/json` returns a **raw export package**:

- JSON package (machine-readable), or
- ZIP bundle containing:
  - `raw_export.json` — machine-readable package
  - `collect.log` — human-readable collection + parsing summary
  - `parser_fields.json` — structured parsed field snapshot for diffs between parser versions
### Import / reopen behavior

When a raw export package is uploaded back into LOGPile:

- the app **re-analyzes from raw source**
- it does **not** trust embedded parsed output as source of truth

For Redfish, this means replay from `raw_payloads.redfish_tree`.

### Design rule

Raw export is a **re-analysis artifact**, not a final report dump. Keep it self-contained and
forward-compatible where possible (versioned package format, additive fields only).

## Batch convert

`POST /api/convert` accepts multiple supported files and produces a ZIP with:

- one `*.reanimator.json` file per successful input
- `convert-summary.txt`

---
## Reanimator Export

### Purpose

Exports hardware inventory data in the format expected by the Reanimator asset tracking
system. Enables one-click push from LOGPile to an external asset management platform.

### Implementation files

| File | Role |
|------|------|
| `internal/exporter/reanimator_models.go` | Go structs for Reanimator JSON |
| `internal/exporter/reanimator_converter.go` | `ConvertToReanimator()` and helpers |
| `internal/server/handlers.go` | `handleExportReanimator()` HTTP handler |

### Conversion rules

- Source: canonical `hardware.devices` repository (see [`04-data-models.md`](04-data-models.md))
- CPU manufacturer inferred from model string (Intel / AMD / ARM / Ampere)
- PCIe serial number generated when absent: `{board_serial}-PCIE-{slot}`
- Status values normalized to: `OK`, `Warning`, `Critical`, `Unknown` (`Empty` only for memory slots)
- Timestamps in RFC3339 format
- `target_host` derived from `filename` field (`redfish://…`, `ipmi://…`) if not in source; omitted if undeterminable
- `board.manufacturer` and `board.product_name` values of `"NULL"` treated as absent
### LOGPile → Reanimator field mapping

| LOGPile type | Reanimator section | Notes |
|---|---|---|
| `BoardInfo` | `board` | Direct mapping |
| `CPU` | `cpus` | + manufacturer (inferred) |
| `MemoryDIMM` | `memory` | Direct; empty slots included (`present=false`) |
| `Storage` | `storage` | Excluded if no `serial_number` |
| `PCIeDevice` | `pcie_devices` | Serial generated if missing |
| `GPU` | `pcie_devices` | `device_class=DisplayController` |
| `NetworkAdapter` | `pcie_devices` | `device_class=NetworkController` |
| `PSU` | `power_supplies` | Excluded if no serial or `present=false` |
| `FirmwareInfo` | `firmware` | Direct mapping |
### Inclusion / exclusion rules

**Included:**

- Memory slots with `present=false` (as Empty slots)
- PCIe devices without serial number (serial is generated)

**Excluded:**

- Storage without `serial_number`
- PSU without `serial_number` or with `present=false`
- NetworkAdapters with `present=false`
- Device-bound firmware duplicated in the top-level firmware list

---
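The exclusion rules are simple presence/serial filters. A sketch for the storage case, using a trimmed stand-in struct rather than the real model type:

```go
package main

import "fmt"

// storageRec is a trimmed stand-in for the real storage model.
type storageRec struct {
	Slot         string
	SerialNumber string
	Present      bool
}

// exportableStorage applies the rule above: storage entries without a
// serial_number never reach the Reanimator payload.
func exportableStorage(in []storageRec) []storageRec {
	var out []storageRec
	for _, s := range in {
		if s.SerialNumber == "" {
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	in := []storageRec{
		{Slot: "OB01", SerialNumber: "BTAX41900GF87P6DGN", Present: true},
		{Slot: "OB02", SerialNumber: "", Present: true}, // excluded
	}
	fmt.Println(len(exportableStorage(in)))
}
```

PSUs and network adapters get analogous filters, additionally checking `present`.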
## Reanimator Integration Guide

This section documents the Reanimator receiver-side JSON format (what the Reanimator
system expects when it ingests a LOGPile export).

> **Important:** The Reanimator endpoint uses a strict JSON decoder (`DisallowUnknownFields`).
> Any unknown field — including nested ones — causes `400 Bad Request`.
> Use only the `snake_case` keys listed here.
### Top-level structure

```json
{
  "filename": "redfish://10.10.10.103",
  "source_type": "api",
  "protocol": "redfish",
  "target_host": "10.10.10.103",
  "collected_at": "2026-02-10T15:30:00Z",
  "hardware": {
    "board": {...},
    "firmware": [...],
    "cpus": [...],
    "memory": [...],
    "storage": [...],
    "pcie_devices": [...],
    "power_supplies": [...]
  }
}
```

**Required:** `collected_at`, `hardware.board.serial_number`
**Optional:** `target_host`, `source_type`, `protocol`, `filename`

`source_type` values: `api`, `logfile`, `manual`
`protocol` values: `redfish`, `ipmi`, `snmp`, `ssh`
### Component status fields (all component sections)

Each component may carry:

| Field | Type | Description |
|-------|------|-------------|
| `status` | string | `OK`, `Warning`, `Critical`, `Unknown`, `Empty` |
| `status_checked_at` | RFC3339 | When the status was last verified |
| `status_changed_at` | RFC3339 | When the status last changed |
| `status_at_collection` | object | `{ "status": "...", "at": "..." }` — snapshot-time status |
| `status_history` | array | `[{ "status": "...", "changed_at": "...", "details": "..." }]` |
| `error_description` | string | Human-readable error for Warning/Critical |
### Board

```json
{
  "board": {
    "manufacturer": "Supermicro",
    "product_name": "X12DPG-QT6",
    "serial_number": "21D634101",
    "part_number": "X12DPG-QT6-REV1.01",
    "uuid": "d7ef2fe5-2fd0-11f0-910a-346f11040868"
  }
}
```

`serial_number` is required. `manufacturer` / `product_name` values of `"NULL"` are treated as absent.
### CPUs

```json
{
  "socket": 0,
  "model": "INTEL(R) XEON(R) GOLD 6530",
  "cores": 32,
  "threads": 64,
  "frequency_mhz": 2100,
  "max_frequency_mhz": 4000,
  "manufacturer": "Intel",
  "status": "OK"
}
```

`socket` (int) and `model` are required. Serial is generated: `{board_serial}-CPU-{socket}`.

LOT format: `CPU_{VENDOR}_{MODEL_NORMALIZED}` → e.g. `CPU_INTEL_XEON_GOLD_6530`
### Memory

```json
{
  "slot": "CPU0_C0D0",
  "location": "CPU0_C0D0",
  "present": true,
  "size_mb": 32768,
  "type": "DDR5",
  "max_speed_mhz": 4800,
  "current_speed_mhz": 4800,
  "manufacturer": "Hynix",
  "serial_number": "80AD032419E17CEEC1",
  "part_number": "HMCG88AGBRA191N",
  "status": "OK"
}
```

`slot` and `present` are required. `serial_number` is required when `present=true`.
Empty slots (`present=false`, `status="Empty"`) are included in the payload, but no component is created.

LOT format: `DIMM_{TYPE}_{SIZE_GB}GB` → e.g. `DIMM_DDR5_32GB`
### Storage

```json
{
  "slot": "OB01",
  "type": "NVMe",
  "model": "INTEL SSDPF2KX076T1",
  "size_gb": 7680,
  "serial_number": "BTAX41900GF87P6DGN",
  "manufacturer": "Intel",
  "firmware": "9CV10510",
  "interface": "NVMe",
  "present": true,
  "status": "OK"
}
```

`slot`, `model`, `serial_number`, and `present` are required.

LOT format: `{TYPE}_{INTERFACE}_{SIZE_TB}TB` → e.g. `SSD_NVME_07.68TB`
### Power Supplies

```json
{
  "slot": "0",
  "present": true,
  "model": "GW-CRPS3000LW",
  "vendor": "Great Wall",
  "wattage_w": 3000,
  "serial_number": "2P06C102610",
  "part_number": "V0310C9000000000",
  "firmware": "00.03.05",
  "status": "OK",
  "input_power_w": 137,
  "output_power_w": 104,
  "input_voltage": 215.25
}
```

`slot` and `present` are required. `serial_number` is required when `present=true`.
Telemetry fields (`input_power_w`, `output_power_w`, `input_voltage`) are stored in the observation only.

LOT format: `PSU_{WATTAGE}W_{VENDOR_NORMALIZED}` → e.g. `PSU_3000W_GREAT_WALL`
### PCIe Devices

```json
{
  "slot": "PCIeCard1",
  "vendor_id": 32902,
  "device_id": 2912,
  "bdf": "0000:18:00.0",
  "device_class": "MassStorageController",
  "manufacturer": "Intel",
  "model": "RAID Controller RSP3DD080F",
  "link_width": 8,
  "link_speed": "Gen3",
  "max_link_width": 8,
  "max_link_speed": "Gen3",
  "serial_number": "RAID-001-12345",
  "firmware": "50.9.1-4296",
  "status": "OK"
}
```

`slot` is required. Serial is generated if absent: `{board_serial}-PCIE-{slot}`.

`device_class` values: `NetworkController`, `MassStorageController`, `DisplayController`, etc.

LOT format: `PCIE_{DEVICE_CLASS}_{MODEL_NORMALIZED}` → e.g. `PCIE_NETWORK_CONNECTX5`
### Firmware

```json
[
  { "device_name": "BIOS", "version": "06.08.05" },
  { "device_name": "BMC", "version": "5.17.00" }
]
```

Both fields are required. Version changes trigger `FIRMWARE_CHANGED` timeline events.

---
### Import process (Reanimator side)

1. Validate `collected_at` (RFC3339) and `hardware.board.serial_number`.
2. Find or create the Asset by `board.serial_number` → `vendor_serial`.
3. For each component: filter out `present=false`, auto-determine the LOT, find or create the Component,
   create an Observation, update Installations.
4. Detect removed components (present in the previous snapshot, absent in the current one) → close the Installation.
5. Generate timeline events: `LOG_COLLECTED`, `INSTALLED`, `REMOVED`, `FIRMWARE_CHANGED`.

**Idempotency:** Repeated import of the same snapshot (same content hash) returns `200 OK`
with `"duplicate": true` and does not create duplicate records.
### Reanimator API endpoint

```http
POST /ingest/hardware
Content-Type: application/json
```

**Success (201):**

```json
{
  "status": "success",
  "bundle_id": "lb_01J...",
  "asset_id": "mach_01J...",
  "collected_at": "2026-02-10T15:30:00Z",
  "duplicate": false,
  "summary": {
    "parts_observed": 15,
    "parts_created": 2,
    "installations_created": 2,
    "timeline_events_created": 9
  }
}
```

**Duplicate (200):**

```json
{ "status": "success", "duplicate": true, "message": "LogBundle with this content hash already exists" }
```

**Error (400):**

```json
{ "status": "error", "error": "validation_failed", "details": { "field": "...", "message": "..." } }
```

Common `400` causes:

- an unknown JSON field (strict decoder)
- a wrong key name (e.g. `targetHost` instead of `target_host`)
- an invalid `collected_at` format (must be RFC3339)
- an empty `hardware.board.serial_number`
### LOT normalization rules

1. Remove the special characters `( ) - ® ™`; replace spaces with `_`
2. Uppercase everything
3. Collapse runs of underscores to one
4. Strip common prefixes such as `MODEL:`, `PN:`
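The four rules can be sketched as below. One assumption is called out in the comments: `(R)`/`(TM)` are dropped as whole tokens (not just their parentheses), which is what makes the documented `CPU_INTEL_XEON_GOLD_6530` example come out:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLOT is an illustrative implementation of the four rules above.
// Assumption: "(R)" and "(TM)" are removed as whole tokens.
func normalizeLOT(s string) string {
	s = strings.TrimSpace(s)
	// Rule 4: strip common prefixes while the case is still original.
	for _, p := range []string{"MODEL:", "PN:"} {
		s = strings.TrimSpace(strings.TrimPrefix(s, p))
	}
	// Rule 1: drop special characters, map spaces to underscores.
	repl := strings.NewReplacer("(R)", "", "(TM)", "", "(", "", ")", "", "-", "", "®", "", "™", "", " ", "_")
	s = repl.Replace(s)
	// Rule 2: uppercase.
	s = strings.ToUpper(s)
	// Rule 3: collapse runs of underscores.
	for strings.Contains(s, "__") {
		s = strings.ReplaceAll(s, "__", "_")
	}
	return strings.Trim(s, "_")
}

func main() {
	fmt.Println(normalizeLOT("INTEL(R) XEON(R) GOLD 6530"))
}
```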
### Status values

| Value | Meaning | Action |
|-------|---------|--------|
| `OK` | Normal | — |
| `Warning` | Degraded | Create `COMPONENT_WARNING` event (optional) |
| `Critical` | Failed | Auto-create `failure_event`, create `COMPONENT_FAILED` event |
| `Unknown` | Not determinable | Treat as working |
| `Empty` | Slot unpopulated | No component created (memory/PCIe only) |
### Missing field handling

| Field | Fallback |
|-------|----------|
| CPU serial | Generated: `{board_serial}-CPU-{socket}` |
| PCIe serial | Generated: `{board_serial}-PCIE-{slot}` |
| Other serial | Component skipped if absent |
| manufacturer (PCIe) | Looked up from `vendor_id` (8086 → Intel, 10de → NVIDIA, 15b3 → Mellanox…) |
| status | Treated as `Unknown` |
| firmware | No `FIRMWARE_CHANGED` event |
### Batch convert behavior (`POST /api/convert`)

- unsupported filenames are skipped
- each file is parsed independently
- one bad file must not fail the whole batch if at least one conversion succeeds
- the result artifact is temporary and deleted after download
---
Defined in `cmd/logpile/main.go`:

| Flag | Default | Purpose |
|------|---------|---------|
| `--port` | `8082` | HTTP server port |
| `--file` | empty | Preload an archive file |
| `--version` | `false` | Print version and exit |
| `--no-browser` | `false` | Do not auto-open the browser on start |
| `--hold-on-crash` | `true` on Windows | Keep the console open after a fatal crash, for debugging |
## Build

```bash
# Local binary (current OS/arch)
make build
# Output: bin/logpile

# Cross-platform binaries
make build-all
# Output:
#   bin/logpile-linux-amd64
#   bin/logpile-linux-arm64
#   bin/logpile-darwin-amd64
#   bin/logpile-darwin-arm64
#   bin/logpile-windows-amd64.exe
```

Both `make build` and `make build-all` run `scripts/update-pci-ids.sh --best-effort`
before compilation to sync `pci.ids` from the submodule.

To skip the PCI IDs update:

```bash
SKIP_PCI_IDS_UPDATE=1 make build
```

Build flags: `CGO_ENABLED=0` — fully static binary, no C runtime dependency.

## Common commands

```bash
make test            # run tests
make fmt             # format sources
make update-pci-ids  # manual pci.ids update
```

## PCI IDs submodule

Source: `third_party/pciids` (git submodule → `github.com/pciutils/pciids`)
Local copy embedded at build time: `internal/parser/vendors/pciids/pci.ids`

Init the submodule after a fresh clone:

```bash
git submodule update --init third_party/pciids
```

## Release script

Run:

```bash
./scripts/release.sh
```

Current behavior:
1. Reads version from `git describe --tags`
2. Refuses a dirty tree unless `ALLOW_DIRTY=1`
3. Sets a stable `GOPATH` / `GOCACHE` / `GOTOOLCHAIN` environment
4. Creates `releases/{VERSION}/`
5. Creates a release-notes template if missing
6. Builds `darwin-arm64` and `windows-amd64` binaries
7. Packages any already-present binaries from `bin/` as `.tar.gz` / `.zip`
8. Generates `SHA256SUMS.txt`
9. Prints next steps (tag, push, create the release manually)

The release notes template is created in `releases/{VERSION}/RELEASE_NOTES.md`.

Important limitation:

- `scripts/release.sh` does not run `make build-all` for you
- if you want Linux or additional macOS archives in the release directory, build them before running the script

## Run locally

```bash
./bin/logpile
./bin/logpile --port 9090
./bin/logpile --no-browser
./bin/logpile --version
./bin/logpile --hold-on-crash   # keep console open on crash (default on Windows)
```

## macOS Gatekeeper

After downloading a binary, remove the quarantine attribute:

```bash
xattr -d com.apple.quarantine /path/to/logpile-darwin-arm64
```
# 09 — Testing

## Baseline

Required before merge:

```bash
go test ./...
```

All tests must pass before any change is merged.

## Where to add tests

| Area | Location |
|------|----------|
| Collectors and replay | `internal/collector/*_test.go` |
| HTTP handlers and jobs | `internal/server/*_test.go` |
| Exporters | `internal/exporter/*_test.go` |
| Vendor parsers | `internal/parser/vendors/<vendor>/*_test.go` |

## General rules

- Prefer table-driven tests
- No network access in unit tests
- Cover the happy path and realistic failure/partial-data cases
- New vendor parsers need both detection and parse coverage

The Reanimator exporter has comprehensive coverage:

| Test file | Coverage |
|-----------|----------|
| `reanimator_converter_test.go` | Unit tests per conversion function |
| `reanimator_integration_test.go` | Full export with a realistic `AnalysisResult` |

## Mandatory coverage for dedup/filter/classify logic

Any new deduplication, filtering, or classification function must have:

1. A true-positive case
2. A true-negative case
3. A regression case for the vendor or topology that motivated the change

This is mandatory for inventory logic, firmware filtering, and similar code paths where silent data drift is likely.
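The three required cases map naturally onto one table-driven test. The sketch below uses a toy `gpu` struct and a hypothetical `dedupKey`/`dedup` pair as stand-ins for the real inventory code, just to show the shape of the test:

```go
package main

import "fmt"

// gpu is a minimal stand-in for the real inventory model.
type gpu struct {
	Slot, Model, Serial string
}

// dedupKey is a hypothetical key function: slot|model, serial fallback.
func dedupKey(g gpu) string {
	if g.Slot != "" {
		return g.Slot + "|" + g.Model
	}
	return "serial|" + g.Serial
}

// dedup keeps the first item seen for each key.
func dedup(in []gpu) []gpu {
	seen := map[string]bool{}
	var out []gpu
	for _, g := range in {
		k := dedupKey(g)
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, g)
	}
	return out
}

func main() {
	cases := []struct {
		name string
		in   []gpu
		want int
	}{
		// True positive: same GPU via two sources collapses to one.
		{"CollapsesDuplicates", []gpu{{"SXM_1", "H100", "S1"}, {"SXM_1", "H100", "S1"}}, 1},
		// True negative: two distinct GPUs with the same model stay separate.
		{"KeepsDistinct", []gpu{{"SXM_1", "H100", "S1"}, {"SXM_2", "H100", "S2"}}, 2},
	}
	for _, c := range cases {
		if got := len(dedup(c.in)); got != c.want {
			panic(fmt.Sprintf("%s: got %d, want %d", c.name, got, c.want))
		}
	}
	fmt.Println("ok")
}
```

The third (regression) case is vendor-specific and is added as another row in the same table when the motivating topology is known.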

## Mandatory coverage for expensive path selection

Any function that decides whether to crawl or probe an expensive path must have:

1. A positive selection case
2. A negative exclusion case
3. A topology-level count/integration case

The goal is to catch runaway I/O regressions before they ship.

## Useful focused commands

Run exporter tests only:

```bash
go test ./internal/exporter/...
go test ./internal/exporter/... -v -run Reanimator
go test ./internal/exporter/... -cover
```

When adding a new vendor parser, include at minimum:

- A `Detect()` test with a positive and a negative sample file list.
- A `Parse()` test with a minimal but representative archive.
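A detection test can follow the same positive/negative shape. The `detect` function below is a toy stand-in (the real vendor parsers register via `init()` and have their own signatures); only the test pattern is the point:

```go
package main

import (
	"fmt"
	"strings"
)

// detect is a toy stand-in for a vendor parser's Detect(): it claims the
// archive only when a vendor-specific marker file is present.
func detect(files []string) bool {
	for _, f := range files {
		if strings.HasSuffix(f, "vendor_marker.txt") {
			return true
		}
	}
	return false
}

func main() {
	cases := []struct {
		name  string
		files []string
		want  bool
	}{
		// Positive sample file list: the marker is present.
		{"Detect_Positive", []string{"logs/dmesg", "meta/vendor_marker.txt"}, true},
		// Negative sample file list: another vendor's layout must not match.
		{"Detect_Negative", []string{"logs/dmesg", "tsr/manifest.xml"}, false},
	}
	for _, c := range cases {
		if got := detect(c.files); got != c.want {
			panic(fmt.Sprintf("%s: got %v, want %v", c.name, got, c.want))
		}
	}
	fmt.Println("ok")
}
```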

## Dedup and filtering functions — mandatory coverage

Any function that deduplicates, filters, or classifies hardware inventory items
**must** have tests covering all three axes before the code is considered done:

| Axis | What to test | Why |
|------|--------------|-----|
| **True positive** | Items that ARE duplicates are collapsed to one | Proves the function works |
| **True negative** | Items that are NOT duplicates are kept separate | Proves the function doesn't over-collapse |
| **Counter-case** | The scenario that motivated the original code still works after changes | Prevents regression from future fixes |

### Worked example — GPU dedup regression (2026-03-11)

`collectGPUsFromProcessors` was added for MSI (chassis Id matches processor Id).
No tests → when Supermicro HGX arrived (chassis Id = "HGX_GPU_SXM_1", processor Id = "GPU_SXM_1"),
the chassis lookup silently returned nothing, serial stayed empty, UUID was new → 8 duplicate GPUs.

Simultaneously, fixing `gpuDocDedupKey` to use `slot|model` before path collapsed two distinct
GraphicsControllers GPUs with the same model into one — breaking an existing test that had no
counter-case for the path-fallback scenario.

**Required test matrix for any dedup function:**

```
TestXxx_CollapsesDuplicates   — same item via two sources → 1 result
TestXxx_KeepsDistinct         — two different items with same model → 2 results
TestXxx_<VendorThatMotivated> — the specific vendor/setup that triggered the code
```

### Worked example — firmware filter regression (2026-03-12)

`collectFirmwareInventory` was added in `6c19a58` without coverage for Supermicro naming.
`isDeviceBoundFirmwareName` had patterns for Dell-style names (`"GPU SomeDevice"`, `"NIC OnboardLAN"`)
but Supermicro Redfish uses `"GPU1 System Slot0"` and `"NIC1 System Slot0 ..."` — the digit follows
immediately after the type prefix. 29 device-bound entries leaked into `hardware.firmware`.

`9c5512d` attempted to fix this with HGX ID patterns (`_fw_gpu_`, etc.) in the wrong field:
the filter checked `DeviceName`, but `collectFirmwareInventory` populates it from `Name` first
(`"Software Inventory"` for all HGX per-component slots), not from the `Id` field that contains
the firmware ID like `"HGX_FW_GPU_SXM_1"`. The patterns were effectively dead code from day one.

**Required test matrix for any filter function:**

```
TestXxx_FiltersDeviceBound_Dell       — Dell-style names that motivated the original code
TestXxx_FiltersDeviceBound_Supermicro — Supermicro names with digit suffix (GPU1/NIC1)
TestXxx_KeepsSystemLevel              — BIOS, BMC, CPLD names must NOT be filtered
```
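A minimal sketch of that matrix, assuming a toy name filter (the real `isDeviceBoundFirmwareName` has more patterns; here a single regex covers both the Dell-style space-after-prefix form and the Supermicro digit-suffix form):

```go
package main

import (
	"fmt"
	"regexp"
)

// deviceBound: a type prefix (GPU/NIC), an optional digit run, then a space.
// This is a simplified stand-in, not the production pattern set.
var deviceBound = regexp.MustCompile(`^(GPU|NIC)\d*\s`)

func isDeviceBound(name string) bool {
	return deviceBound.MatchString(name)
}

func main() {
	cases := []struct {
		name string
		want bool
	}{
		{"GPU SomeDevice", true},          // Dell-style: space right after prefix
		{"NIC OnboardLAN", true},          // Dell-style
		{"GPU1 System Slot0", true},       // Supermicro: digit right after prefix
		{"NIC1 System Slot0 Port1", true}, // Supermicro
		{"BIOS", false},                   // system-level entries must survive
		{"BMC Firmware", false},
		{"CPLD Motherboard", false},
	}
	for _, c := range cases {
		if got := isDeviceBound(c.name); got != c.want {
			panic(fmt.Sprintf("%q: got %v, want %v", c.name, got, c.want))
		}
	}
	fmt.Println("ok")
}
```

All three matrix rows live in one table; a new vendor's naming convention is one more row, not a new test file.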

### Practical rule

When you write a new filter/dedup/classify function, ask:

1. Does my test cover the vendor that motivated this code?
2. Does my test cover a *different* vendor or naming convention where the function must NOT fire?
3. If I change the dedup key logic, do existing tests still exercise the old correct behavior?
4. When the filter checks a field on a model struct, does my test verify that the field is
   actually populated by the collector? (Dead-code filter pattern: `9c5512d` `_fw_gpu_` check.)

If any answer is "no" — add the missing test before committing.

## Collector candidate-selection functions — mandatory coverage

Any function that selects paths for an expensive operation (probing, crawling, plan-B retry)
**must** have tests covering:

| Axis | What to test | Why |
|------|--------------|-----|
| **Positive** | Paths that should be selected ARE selected | Proves the feature works |
| **Negative** | Paths that should be excluded ARE excluded | Prevents runaway I/O |
| **Topology integration** | Given a realistic `out` map, the count of selected paths matches expectations | Catches implicit coupling between the selector and the surrounding data shape |

### Worked example — NVMe post-probe regression (2026-03-12)

`shouldAdaptiveNVMeProbe` was added in `2fa4a12` for Supermicro NVMe backplanes that return
`Members: []` but serve disks at `Disk.Bay.N` paths. No topology-level test was added.

When SYS-A21GE-NBRT (HGX B200) arrived, its 35 sub-chassis (GPU, NVSwitch, PCIeRetimer,
ERoT, IRoT, BMC, FPGA) all have `ChassisType=Module/Component/Zone` and empty `/Drives`, so
all 35 passed the filter: 35 × 384 = 13 440 HTTP requests, 22 extra minutes per collection.

A topology integration test (`TestNVMePostProbeSkipsNonStorageChassis`) would have caught
this at commit time: given GPU chassis + backplane, exactly 1 candidate must be selected.

**Required test matrix for any path-selection function:**

```
TestXxx_SelectsTargetPath   — the path that motivated the code IS selected
TestXxx_SkipsIrrelevantPath — a path that must never be selected IS skipped
TestXxx_TopologyCount       — given a realistic multi-chassis map, selected count = N
```

Other focused test commands:

```bash
go test ./internal/collector/...
go test ./internal/server/...
go test ./internal/parser/vendors/...
```
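A topology-count test can be sketched without any Redfish client at all. The `chassis` struct and `selectNVMeProbeCandidates` below are hypothetical simplifications (the real selector inspects more fields); the point is asserting an exact candidate count over a realistic multi-chassis map:

```go
package main

import "fmt"

// chassis is a minimal stand-in for the Redfish chassis model.
type chassis struct {
	ID          string
	ChassisType string
	HasDrives   bool
}

// selectNVMeProbeCandidates is a hypothetical selector: only enclosure-type
// chassis without listed drives qualify for the expensive Disk.Bay.N probe.
func selectNVMeProbeCandidates(all []chassis) []string {
	var out []string
	for _, c := range all {
		if c.ChassisType == "Enclosure" && !c.HasDrives {
			out = append(out, c.ID)
		}
	}
	return out
}

func main() {
	// Realistic HGX-style topology: one NVMe backplane plus GPU/NVSwitch modules.
	topo := []chassis{
		{"NVMeBackplane", "Enclosure", false}, // must be selected
		{"HGX_GPU_SXM_1", "Module", false},    // must be skipped
		{"NVSwitch_0", "Component", false},    // must be skipped
		{"MainChassis", "RackMount", true},    // drives already listed: skip
	}
	got := selectNVMeProbeCandidates(topo)
	if len(got) != 1 || got[0] != "NVMeBackplane" {
		panic(fmt.Sprintf("selected %v, want exactly [NVMeBackplane]", got))
	}
	fmt.Println("selected:", got) // selected: [NVMeBackplane]
}
```

Had such a count assertion existed, the 35-sub-chassis regression would have failed at commit time instead of in the field.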
# LOGPile Bible

`bible-local/` is the project-specific source of truth for LOGPile.
Keep top-level docs minimal and put maintained architecture/API contracts here.
It is structured so that both humans and AI assistants can navigate it quickly.

## Rules

- Documentation language: English only
- Update relevant bible files in the same change as the code
- Record significant architectural decisions in [`10-decisions.md`](10-decisions.md)
- Do not duplicate shared rules from `bible/`

## Read order

| File | Purpose |
|------|---------|
| [01-overview.md](01-overview.md) | Product scope, modes, non-goals |
| [02-architecture.md](02-architecture.md) | Runtime structure, state, main flows |
| [04-data-models.md](04-data-models.md) | Stable data contracts and canonical inventory |
| [03-api.md](03-api.md) | HTTP endpoints and response contracts |
| [05-collectors.md](05-collectors.md) | Live collection behavior |
| [06-parsers.md](06-parsers.md) | Archive parser framework and vendor coverage |
| [07-exporters.md](07-exporters.md) | Raw export, Reanimator export, batch convert |
| [08-build-release.md](08-build-release.md) | Build and release workflow |
| [09-testing.md](09-testing.md) | Test expectations and regression rules |
| [10-decisions.md](10-decisions.md) | Architectural Decision Log |
## Fast orientation
- Read order for most changes: `01` → `02` → `04` → relevant interface doc(s) → `10`
- Entry point: `cmd/logpile/main.go`
- HTTP layer: `internal/server/` — handlers in `handlers.go`, routes in `server.go`
- Core contracts: `internal/models/models.go` — never break the `AnalysisResult` JSON shape
- Canonical inventory: `hardware.devices` in `AnalysisResult` — source of truth for UI and exports
- Live collection: `internal/collector/` — registry in `registry.go`
- Archive parsing: `internal/parser/` — vendor parsers in `vendors/` with `init()` auto-registration
- Export conversion: `internal/exporter/`
- Frontend consumer: `web/static/js/app.js` — keep API responses stable

## Maintenance rule

If a document becomes stale, either fix it immediately or delete it.
Stale docs are worse than missing docs.