Update docs and add release artifacts

2026-02-25 12:17:17 +03:00
parent a4a1a19a94
commit 693b7346ab
9 changed files with 241 additions and 1 deletion
--- a/docs/bible/04-data-models.md
+++ b/docs/bible/04-data-models.md
@@ -22,6 +22,9 @@ Key top-level fields:
 | `sensors` | `[]SensorReading` | Sensor readings |
 | `raw_payloads` | `map[string]any` | Raw vendor data (e.g. `redfish_tree`) |

+`raw_payloads` is the durable source for offline re-analysis (especially for Redfish).
+Normalized fields should be treated as output derived from the raw source data.
+
 ### Hardware sub-structure

 ```
@@ -31,6 +34,7 @@ HardwareConfig
  ├── cpus         []CPU
  ├── memory       []MemoryDIMM
  ├── storage      []Storage
+  ├── volumes      []StorageVolume  — logical RAID/VROC volumes
  ├── pcie_devices []PCIeDevice
  ├── gpus         []GPU
  ├── network_adapters []NetworkAdapter
@@ -86,3 +90,15 @@ Carried by both `/api/status` and `/api/config`:

 Valid `source_type` values: `archive`, `api`
 Valid `protocol` values: `redfish`, `ipmi` (empty is allowed for archive uploads)
+
+---
+
+## Raw Export Package (reopenable artifact)
+
+`Export Raw Data` does not merely dump `AnalysisResult`; it emits a reopenable raw package
+(JSON or ZIP bundle) that carries the source data required for re-analysis.
+
+Design rules:
+- raw source is authoritative (`redfish_tree` or original file bytes)
+- imports must re-analyze from raw source
+- parsed field snapshots included in bundles are diagnostic artifacts, not the source of truth
--- a/docs/bible/05-collectors.md
+++ b/docs/bible/05-collectors.md
@@ -46,15 +46,53 @@ Dynamic — does not assume fixed paths. Discovers:
 Full Redfish response tree is stored in `result.RawPayloads["redfish_tree"]`.
 This allows future offline re-analysis without re-collecting from a live BMC.

+### Unified Redfish analysis pipeline (live == replay)
+
+LOGPile uses a **single Redfish analyzer path**:
+
+1. Live collector crawls the Redfish API and builds `raw_payloads.redfish_tree`
+2. The parsed result is produced by replaying that tree through the same analyzer used for raw import
+
+This guarantees that live collection and re-opening an `Export Raw Data` package produce the same
+normalized output for the same `redfish_tree`.
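A minimal sketch of the single-analyzer shape, assuming a tree keyed by `@odata.id`. `RedfishTree`, `Summary`, `analyzeRedfishTree`, `collectLive` and `reopenExport` are hypothetical names, and the "analysis" only lists PowerSupply resources. What matters is that both entry points call the same function on the same tree.

```go
package main

import (
	"sort"
	"strings"
)

// RedfishTree maps @odata.id paths to decoded JSON documents (hypothetical type).
type RedfishTree map[string]map[string]any

// Summary stands in for the normalized AnalysisResult.
type Summary struct {
	PowerSupplies []string
}

// analyzeRedfishTree is the one analyzer used by both live and replay paths.
func analyzeRedfishTree(tree RedfishTree) Summary {
	var s Summary
	for path, doc := range tree {
		if t, _ := doc["@odata.type"].(string); strings.HasPrefix(t, "#PowerSupply.") {
			s.PowerSupplies = append(s.PowerSupplies, path)
		}
	}
	sort.Strings(s.PowerSupplies) // map order is random; keep output deterministic
	return s
}

// collectLive crawls into a tree (stored as raw_payloads.redfish_tree), then replays it.
func collectLive(crawl func() RedfishTree) (RedfishTree, Summary) {
	tree := crawl()
	return tree, analyzeRedfishTree(tree)
}

// reopenExport replays a tree loaded from a raw export package.
func reopenExport(tree RedfishTree) Summary {
	return analyzeRedfishTree(tree)
}
```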
+
+### Snapshot crawler behavior (important)
+
+The Redfish snapshot crawler is intentionally:
+- **bounded** (`LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`)
+- **prioritized** (PCIe, Fabrics, FirmwareInventory, Storage, PowerSubsystem, ThermalSubsystem)
+- **tolerant** (skips noisy expected failures, strips `#fragment` from `@odata.id`)
+
+Design notes:
+- Queue capacity is sized to the snapshot cap, so enqueueing a discovered link never blocks a worker (avoids deadlocks on large trees).
+- UI progress is coarse and human-readable; detailed per-request diagnostics are available via debug logs.
+- `LOGPILE_REDFISH_DEBUG=1` and `LOGPILE_REDFISH_SNAPSHOT_DEBUG=1` enable console diagnostics.
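The crawler properties above can be sketched in Go under stated assumptions: `crawlSnapshot` and `normalizeODataID` are hypothetical names, and `fetch` is an injected function returning a document's `@odata.id` links (or `ok=false` for a skipped resource). Seeds are enqueued first, the cap is reserved under a lock, and the buffered queue holds `maxDocs` items, so no send can block a worker. `workers` must be at least 1.

```go
package main

import (
	"sort"
	"strings"
	"sync"
)

// normalizeODataID strips a "#fragment" so "/Chassis/1#/Power" and "/Chassis/1" dedupe.
func normalizeODataID(id string) string {
	if i := strings.IndexByte(id, '#'); i >= 0 {
		return id[:i]
	}
	return id
}

// crawlSnapshot fetches at most maxDocs documents, seeds first.
func crawlSnapshot(seeds []string, maxDocs, workers int, fetch func(id string) (links []string, ok bool)) []string {
	queue := make(chan string, maxDocs) // capacity == cap, so a send below never blocks
	var (
		mu      sync.Mutex
		seen    = make(map[string]bool)
		fetched []string
		pending sync.WaitGroup // documents enqueued but not yet processed
	)
	enqueue := func(raw string) {
		id := normalizeODataID(raw)
		mu.Lock()
		defer mu.Unlock()
		if id == "" || seen[id] || len(seen) >= maxDocs {
			return // duplicate, or the document cap is already reserved
		}
		seen[id] = true
		pending.Add(1)
		queue <- id // at most maxDocs sends ever happen
	}
	for _, s := range seeds {
		enqueue(s) // priority seeds claim slots before discovered links
	}
	var done sync.WaitGroup
	for w := 0; w < workers; w++ {
		done.Add(1)
		go func() {
			defer done.Done()
			for id := range queue {
				if links, ok := fetch(id); ok { // failed fetches are skipped, not fatal
					mu.Lock()
					fetched = append(fetched, id)
					mu.Unlock()
					for _, l := range links {
						enqueue(l)
					}
				}
				pending.Done()
			}
		}()
	}
	pending.Wait() // every reserved document has been processed
	close(queue)
	done.Wait()
	sort.Strings(fetched)
	return fetched
}
```

Reserving the slot before sending is what makes the capacity argument hold: the total number of sends is bounded by `len(seen)`, which never exceeds the channel buffer.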
+
 ### Parsing guidelines

 When adding Redfish mappings, follow these principles:
 - Support alternate collection paths (resources may appear at different odata URLs).
 - Follow `@odata.id` references and handle embedded `Members` arrays.
+- Prefer **raw-tree replay compatibility**: if the live collector adds a fallback or probe, the replay analyzer must mirror it.
 - Deduplicate by serial / BDF / slot+model (in that priority order).
 - Prefer tolerant/fallback parsing — missing fields should be silently skipped,
  not cause the whole collection to fail.
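The deduplication priority (serial, then BDF, then slot+model) can be sketched as a key function. `Device`, `dedupKey` and `dedupe` are hypothetical names. Trimming whitespace and requiring *both* slot and model are assumptions made here so that identical models in unknown slots are not merged.

```go
package main

import "strings"

// Device holds only the identity fields used for deduplication (hypothetical subset).
type Device struct {
	Serial string
	BDF    string // PCI domain:bus:device.function, e.g. "0000:3b:00.0"
	Slot   string
	Model  string
}

// dedupKey returns the strongest identity available: serial, then BDF, then slot+model.
// An empty key means the device cannot be matched safely and is always kept.
func dedupKey(d Device) string {
	serial, bdf := strings.TrimSpace(d.Serial), strings.TrimSpace(d.BDF)
	slot, model := strings.TrimSpace(d.Slot), strings.TrimSpace(d.Model)
	switch {
	case serial != "":
		return "sn:" + serial
	case bdf != "":
		return "bdf:" + bdf
	case slot != "" && model != "":
		return "slot:" + slot + "|model:" + model
	}
	return ""
}

// dedupe keeps the first device seen for each key, preserving input order.
func dedupe(in []Device) []Device {
	seen := make(map[string]bool, len(in))
	out := make([]Device, 0, len(in))
	for _, d := range in {
		if k := dedupKey(d); k != "" {
			if seen[k] {
				continue
			}
			seen[k] = true
		}
		out = append(out, d)
	}
	return out
}
```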

+### Vendor-specific storage fallbacks (Supermicro and similar)
+
+When standard `Storage/.../Drives` collections are empty, the collector and replay analyzer may recover drives via:
+- `Storage.Links.Enclosures[*] -> .../Drives`
+- direct probing of a finite set of `Disk.Bay` candidates (`Disk.Bay.0`, `Disk.Bay0`, `.../0`)
+
+This is required for some BMCs that publish drive inventory in vendor-specific paths while leaving
+standard collections empty.
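A sketch of how the bounded candidate set might be generated, assuming a caller-chosen `maxBays` limit (hypothetical) and the three naming styles listed above. `diskBayCandidates` and `enclosureDrivePaths` are hypothetical helpers. Treating `<enclosure @odata.id>/Drives` as the enclosure's drive collection is an assumption of this sketch.

```go
package main

import "fmt"

// diskBayCandidates lists a finite set of vendor-style drive URLs to probe
// under a drives collection whose standard Members array is empty.
func diskBayCandidates(drivesPath string, maxBays int) []string {
	out := make([]string, 0, maxBays*3)
	for i := 0; i < maxBays; i++ {
		out = append(out,
			fmt.Sprintf("%s/Disk.Bay.%d", drivesPath, i),
			fmt.Sprintf("%s/Disk.Bay%d", drivesPath, i),
			fmt.Sprintf("%s/%d", drivesPath, i),
		)
	}
	return out
}

// enclosureDrivePaths follows Storage.Links.Enclosures[*] and ASSUMES each
// enclosure exposes its drives at "<enclosure @odata.id>/Drives".
func enclosureDrivePaths(storage map[string]any) []string {
	links, _ := storage["Links"].(map[string]any)
	encl, _ := links["Enclosures"].([]any) // reading a nil map is safe in Go
	var out []string
	for _, e := range encl {
		ref, _ := e.(map[string]any)
		if id, ok := ref["@odata.id"].(string); ok && id != "" {
			out = append(out, id+"/Drives")
		}
	}
	return out
}
```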
+
+### PSU source preference (newer Redfish)
+
+PSU inventory source order:
+1. `Chassis/*/PowerSubsystem/PowerSupplies` (preferred on X14+ and other newer Redfish implementations)
+2. `Chassis/*/Power` (legacy fallback)
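The source order can be sketched as a lookup against the snapshot tree. `psuSource` is a hypothetical helper. It relies on the `PowerSupplies` collection exposing `Members` and on the legacy `Power` resource embedding a `PowerSupplies` array. Falling through when the preferred collection is *empty* (not only absent) is an assumption.

```go
package main

// psuSource returns the tree path to read PSUs from for one chassis,
// preferring PowerSubsystem and falling back to legacy Power.
func psuSource(tree map[string]map[string]any, chassis string) (string, bool) {
	modern := chassis + "/PowerSubsystem/PowerSupplies"
	if doc, ok := tree[modern]; ok {
		if members, _ := doc["Members"].([]any); len(members) > 0 {
			return modern, true
		}
	}
	legacy := chassis + "/Power"
	if doc, ok := tree[legacy]; ok {
		if psus, _ := doc["PowerSupplies"].([]any); len(psus) > 0 {
			return legacy, true
		}
	}
	return "", false
}
```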
+
 ### Progress reporting

 The collector emits progress log entries at each stage (connecting, enumerating systems,
--- a/docs/bible/07-exporters.md
+++ b/docs/bible/07-exporters.md
@@ -5,11 +5,42 @@
 | Endpoint | Format | Filename pattern |
 |----------|--------|-----------------|
 | `GET /api/export/csv` | CSV — serial numbers | `YYYY-MM-DD (MODEL) - SN.csv` |
-| `GET /api/export/json` | Full `AnalysisResult` JSON (incl. `raw_payloads`) | `YYYY-MM-DD (MODEL) - SN.json` |
+| `GET /api/export/json` | **Raw export package** (JSON or ZIP bundle) for reopen/re-analysis | `YYYY-MM-DD (MODEL) - SN.json` or `.zip` |
 | `GET /api/export/reanimator` | Reanimator hardware JSON | `YYYY-MM-DD (MODEL) - SN.json` |

 ---

+## Raw Export (`Export Raw Data`)
+
+### Purpose
+
+Preserve enough source data to re-run parsing after parser fixes, without requiring
+another live collection from the target system.
+
+### Format
+
+`/api/export/json` returns a **raw export package**:
+- JSON package (machine-readable), or
+- ZIP bundle containing:
+  - `raw_export.json` — machine-readable package
+  - `collect.log` — human-readable collection + parsing summary
+  - `parser_fields.json` — structured parsed field snapshot for diffs between parser versions
+
+### Import / reopen behavior
+
+When a raw export package is uploaded back into LOGPile:
+- the app **re-analyzes from raw source**
+- it does **not** trust embedded parsed output as source of truth
+
+For Redfish, this means replay from `raw_payloads.redfish_tree`.
+
+### Design rule
+
+Raw export is a **re-analysis artifact**, not a final report dump. Keep it self-contained and
+forward-compatible where possible (versioned package format, additive fields only).
+
+---
+
 ## Reanimator Export

 ### Purpose
--- a/docs/bible/10-decisions.md
+++ b/docs/bible/10-decisions.md
@@ -111,4 +111,94 @@ Top-level `README.md` and `CLAUDE.md` must remain minimal pointers/instructions.

 ---

+## ADL-009 — Redfish analysis is performed from raw snapshot replay (unified pipeline)
+
+**Date:** 2026-02-24
+**Context:** Live Redfish collection and raw export re-analysis used different parsing paths,
+which caused drift and made bug fixes difficult to validate consistently.
+**Decision:** Redfish live collection must produce a `raw_payloads.redfish_tree` snapshot first,
+then run the same replay analyzer used for imported raw exports.
+**Consequences:**
+- Same `redfish_tree` input produces the same parsed result in live and offline modes.
+- Debugging parser issues can be done against exported raw bundles without live BMC access.
+- Snapshot completeness becomes critical: collector seed paths and limits are part of analyzer correctness.
+
+---
+
+## ADL-010 — Raw export is a self-contained re-analysis package (not a final result dump)
+
+**Date:** 2026-02-24
+**Context:** Exporting only normalized `AnalysisResult` loses raw source fidelity and prevents
+future parser improvements from being applied to already collected data.
+**Decision:** `Export Raw Data` produces a self-contained raw package (JSON or ZIP bundle)
+that the application can reopen and re-analyze. Parsed data in the package is optional and not
+the source of truth on import.
+**Consequences:**
+- Re-opening an export always re-runs analysis from raw source (`redfish_tree` or uploaded file bytes).
+- Raw bundles include collection context and diagnostics for debugging (`collect.log`, `parser_fields.json`).
+- Endpoint compatibility is preserved (`/api/export/json`), while the actual payload may be a ZIP bundle.
+
+---
+
+## ADL-011 — Redfish snapshot crawler is bounded, prioritized, and failure-tolerant
+
+**Date:** 2026-02-24
+**Context:** Full Redfish trees on modern GPU systems are large, noisy, and contain many
+vendor-specific or non-fetchable links. Unbounded crawling and naive queue design caused hangs
+and incomplete snapshots.
+**Decision:** Use a bounded snapshot crawler with:
+- explicit document cap (`LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`)
+- priority seed paths (PCIe/Fabrics/Firmware/Storage/PowerSubsystem/ThermalSubsystem)
+- normalized `@odata.id` paths (strip `#fragment`)
+- filtering of noisy, expected errors (HTTP 404/405/410/501 are hidden from the UI)
+- queue capacity sized to crawl cap to avoid producer/consumer deadlock
+**Consequences:**
+- Snapshot collection remains stable on large BMC trees.
+- Most high-value inventory paths are reached before the cap.
+- UI progress remains useful while debug logs retain low-level fetch failures.
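The error-filtering rule can be sketched with the standard `net/http` status constants. `isExpectedRedfishNoise` and `routeFetchFailure` are hypothetical helpers. Routing every failure to the debug log only when debugging is enabled mirrors the `LOGPILE_REDFISH_DEBUG` description but is an assumption about the exact behavior.

```go
package main

import "net/http"

// isExpectedRedfishNoise reports whether a failed fetch is an expected,
// noisy status that should be hidden from UI progress.
func isExpectedRedfishNoise(status int) bool {
	switch status {
	case http.StatusNotFound, http.StatusMethodNotAllowed, http.StatusGone, http.StatusNotImplemented:
		return true
	}
	return false
}

// routeFetchFailure decides where a failure is reported: unexpected errors
// reach the UI; every failure reaches the debug log when debugging is on.
func routeFetchFailure(status int, debug bool) (toUI, toDebugLog bool) {
	return !isExpectedRedfishNoise(status), debug
}
```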
+
+---
+
+## ADL-012 — Vendor-specific storage inventory probing is allowed as fallback
+
+**Date:** 2026-02-24
+**Context:** Some Supermicro BMCs expose empty standard `Storage/.../Drives` collections while
+real disk inventory exists under vendor-specific `Disk.Bay` endpoints and enclosure links.
+**Decision:** When standard drive collections are empty, the collector and replay analyzer may probe vendor-style
+`.../Drives/Disk.Bay.*` endpoints and follow `Storage.Links.Enclosures[*]` to recover physical drives.
+**Consequences:**
+- Higher storage inventory coverage on Supermicro HBA/HA-RAID/MRVL/NVMe backplane implementations.
+- Replay must mirror the same probing behavior to preserve deterministic results.
+- Probing remains bounded (finite candidate set) to avoid runaway requests.
+
+---
+
+## ADL-013 — PowerSubsystem is preferred over legacy Power on newer Redfish implementations
+
+**Date:** 2026-02-24
+**Context:** X14+/newer Redfish implementations increasingly expose authoritative PSU data in
+`PowerSubsystem/PowerSupplies`, while legacy `/Power` may be incomplete or schema-shifted.
+**Decision:** Prefer `Chassis/*/PowerSubsystem/PowerSupplies` as the primary PSU source and use
+legacy `Chassis/*/Power` as fallback.
+**Consequences:**
+- Better compatibility with newer BMC firmware generations.
+- Legacy systems remain supported without special-case collector selection.
+- Snapshot priority seeds must include `PowerSubsystem` resources.
+
+---
+
+## ADL-014 — Threshold logic lives on the server; UI reflects status only
+
+**Date:** 2026-02-24
+**Context:** Duplicating threshold math in frontend and backend creates drift and inconsistent
+highlighting (e.g. PSU mains voltage range checks).
+**Decision:** Business threshold evaluation (e.g. PSU voltage nominal range) must be computed on
+the server; frontend only renders status/flags returned by the API.
+**Consequences:**
+- Single source of truth for threshold policies.
+- UI can evolve visually without re-implementing domain logic.
+- API payloads may carry richer status semantics over time.
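A sketch of a server-side threshold check whose result the UI would only render. `PSUInput`, `VoltageRange` and `psuVoltageStatus` are hypothetical names. The status strings and any concrete range (the test uses 200 to 250 V purely for illustration) are placeholders, not LOGPile's policy.

```go
package main

// PSUInput is a hypothetical slice of a PSU reading.
type PSUInput struct {
	LineInputVoltage float64 // volts; 0 means "not reported"
}

// VoltageRange is a HYPOTHETICAL nominal mains range; real values belong in server policy.
type VoltageRange struct{ Min, Max float64 }

// psuVoltageStatus runs on the server; the API returns its result and the UI
// only maps the string to a highlight, without re-implementing the math.
func psuVoltageStatus(in PSUInput, nominal VoltageRange) string {
	switch {
	case in.LineInputVoltage == 0:
		return "unknown"
	case in.LineInputVoltage < nominal.Min || in.LineInputVoltage > nominal.Max:
		return "warning"
	default:
		return "ok"
	}
}
```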
+
+---
+
 <!-- Add new decisions below this line using the format above -->