logpile/bible-local/10-decisions.md
Michael Chus 78806f9fa0 Add shared bible submodule, rename local bible to bible-local
- Add bible.git as submodule at bible/
- Move docs/bible/ → bible-local/ (project-specific architecture)
- Update CLAUDE.md to reference both bible/ and bible-local/
- Add AGENTS.md for Codex with same structure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 16:38:57 +03:00

# 10 — Architectural Decision Log (ADL)
> **Rule:** Every significant architectural decision **must be recorded here** before or alongside
> the code change. This applies to humans and AI assistants alike.
>
> Format: date · title · context · decision · consequences
---
## ADL-001 — In-memory only state (no database)
**Date:** project start
**Context:** LOGPile is designed as a standalone diagnostic tool, not a persistent service.
**Decision:** All parsed/collected data lives in `Server.result` (in-memory). No database, no files written.
**Consequences:**
- Data is lost on process restart — intentional.
- Simple deployment: single binary, no setup required.
- JSON export is the persistence mechanism for users who want to save results.
---
## ADL-002 — Vendor parser auto-registration via init()
**Date:** project start
**Context:** Need an extensible parser registry without a central factory function.
**Decision:** Each vendor parser registers itself in its package's `init()` function.
`vendors/vendors.go` holds blank imports to trigger registration.
**Consequences:**
- Adding a new parser requires only: implement interface + add one blank import.
- No central list to maintain (other than the import file).
- `go test ./...` will include new parsers automatically.
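
The registration pattern above can be sketched in a single file. This is a minimal illustration, not LOGPile's actual code: the `Parser` interface, `Register`, `RegisteredNames`, and `exampleParser` are hypothetical names. In a real layout the `init()` would live in the vendor's own package and run because of the blank import; here it sits in the same package so the example stays self-contained. Panicking on a duplicate name is a design choice of this sketch so that conflicting registrations fail at startup.

```go
package main

import (
	"fmt"
	"sort"
)

// Parser is a hypothetical minimal vendor parser interface.
type Parser interface {
	Name() string
}

// registry holds every parser that registered itself during init().
var registry = map[string]Parser{}

// Register adds a parser to the registry and panics on duplicate names,
// so a conflicting registration fails loudly at startup.
func Register(p Parser) {
	if _, dup := registry[p.Name()]; dup {
		panic("parser registered twice: " + p.Name())
	}
	registry[p.Name()] = p
}

// RegisteredNames returns the registered parser names in sorted order.
func RegisteredNames() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// exampleParser stands in for a vendor package (hypothetical).
type exampleParser struct{}

func (exampleParser) Name() string { return "example" }

// init runs before main and registers the parser without any central factory.
func init() { Register(exampleParser{}) }

func main() {
	fmt.Println(RegisteredNames())
}
```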
---
## ADL-003 — Highest-confidence parser wins
**Date:** project start
**Context:** Multiple parsers may partially match an archive (e.g. generic + specific vendor).
**Decision:** Run all parsers' `Detect()`, select the one returning the highest score (0-100).
**Consequences:**
- Generic fallback (score 15) only activates when no vendor parser scores higher.
- Parsers must be conservative with high scores (70+) to avoid false positives.
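
A minimal sketch of the selection rule, under stated assumptions: the `Detector` interface, the `files []string` argument to `Detect`, and the `fixedScore` stub are hypothetical, and the tie-breaking (the earlier parser wins) and the "score 0 means no match" convention are choices made for this example only.

```go
package main

import "fmt"

// Detector is a hypothetical interface; only the idea of a 0-100 score
// returned by Detect comes from the decision above.
type Detector interface {
	Name() string
	Detect(files []string) int
}

// fixedScore is a stub parser that always reports the same confidence.
type fixedScore struct {
	name  string
	score int
}

func (f fixedScore) Name() string        { return f.name }
func (f fixedScore) Detect([]string) int { return f.score }

// SelectParser runs Detect on every parser and returns the highest scorer.
// It returns nil when no parser scores above zero; ties keep the earlier parser.
func SelectParser(parsers []Detector, files []string) Detector {
	var best Detector
	bestScore := 0
	for _, p := range parsers {
		if s := p.Detect(files); s > bestScore {
			best, bestScore = p, s
		}
	}
	return best
}

func main() {
	parsers := []Detector{
		fixedScore{"generic", 15},  // fallback score from the log entry
		fixedScore{"vendor-x", 80}, // hypothetical vendor parser
	}
	fmt.Println(SelectParser(parsers, nil).Name())
}
```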
---
## ADL-004 — Canonical hardware.devices as single source of truth
**Date:** v1.5.0
**Context:** UI tabs and Reanimator exporter were reading from different sub-fields of
`AnalysisResult`, causing potential drift.
**Decision:** Introduce `hardware.devices` as the canonical inventory repository.
All UI tabs and all exporters must read exclusively from this repository.
**Consequences:**
- Any UI vs Reanimator discrepancy is classified as a bug, not a "known difference".
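- Deduplication logic runs once, in the repository builder, matching records by serial number first, then by PCI BDF, and otherwise treating them as distinct (serial → bdf → distinct).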
- New hardware attributes must be added to canonical schema first, then mapped to consumers.
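
One possible reading of the `serial → bdf → distinct` order, sketched with hypothetical names (`Device`, `dedupKey`, `Dedup`). It assumes a record is keyed by its serial number when present, by its PCI BDF otherwise, and is always kept when it has neither. The real builder may merge records differently, for example by combining fields from duplicates rather than dropping them.

```go
package main

import "fmt"

// Device is a hypothetical, trimmed-down canonical inventory record.
type Device struct {
	Serial string
	BDF    string
	Model  string
}

// dedupKey picks the identity used for deduplication: serial first,
// then BDF. An empty key means the record is treated as distinct.
func dedupKey(d Device) string {
	switch {
	case d.Serial != "":
		return "serial:" + d.Serial
	case d.BDF != "":
		return "bdf:" + d.BDF
	default:
		return ""
	}
}

// Dedup keeps the first record for each key and every record without a key.
func Dedup(devices []Device) []Device {
	seen := make(map[string]bool)
	out := make([]Device, 0, len(devices))
	for _, d := range devices {
		key := dedupKey(d)
		if key != "" {
			if seen[key] {
				continue
			}
			seen[key] = true
		}
		out = append(out, d)
	}
	return out
}

func main() {
	devices := []Device{
		{Serial: "S1", BDF: "0000:17:00.0"},
		{Serial: "S1"},          // same serial: dropped
		{BDF: "0000:65:00.0"},   // no serial: keyed by BDF
		{BDF: "0000:65:00.0"},   // same BDF: dropped
		{Model: "unknown-card"}, // no identity: kept as distinct
	}
	fmt.Println(len(Dedup(devices)))
}
```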
---
## ADL-005 — No hardcoded PCI model strings; use pci.ids
**Date:** v1.5.0
**Context:** NVIDIA and other vendors release new GPU models frequently; hardcoded maps
required code changes for each new model ID.
**Decision:** Use the `pciutils/pciids` database (git submodule, embedded at build time).
PCI vendor/device ID → human-readable model name via lookup.
**Consequences:**
- New GPU models can be supported by updating `pci.ids` without code changes.
- `make build` auto-syncs `pci.ids` from submodule before compilation.
- External override via `LOGPILE_PCI_IDS_PATH` env var.
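
To illustrate the lookup, here is a small sketch that parses text in the `pci.ids` layout (vendor lines at column zero, device lines indented by one tab, subsystem lines by two tabs, ID and name separated by two spaces). `PCIDB`, `ParsePCIIDs`, and `Lookup` are hypothetical names, the sample IDs are made up, and the embedding and `LOGPILE_PCI_IDS_PATH` override are deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// PCIDB maps "vendor:device" (lowercase hex) to a device name.
type PCIDB map[string]string

// ParsePCIIDs reads vendor and device entries from pci.ids-formatted text.
// Subsystem lines and the device-class section are skipped in this sketch.
func ParsePCIIDs(text string) PCIDB {
	db := PCIDB{}
	vendor := ""
	for _, line := range strings.Split(text, "\n") {
		switch {
		case line == "" || strings.HasPrefix(line, "#"):
			// blank line or comment
		case strings.HasPrefix(line, "\t\t"):
			// subsystem entry, ignored here
		case strings.HasPrefix(line, "\t"):
			id, name, ok := strings.Cut(line[1:], "  ")
			if ok && vendor != "" {
				db[vendor+":"+strings.ToLower(id)] = strings.TrimSpace(name)
			}
		case strings.HasPrefix(line, "C "):
			vendor = "" // device-class section: stop attributing entries to a vendor
		default:
			id, _, _ := strings.Cut(line, "  ")
			vendor = strings.ToLower(id)
		}
	}
	return db
}

// Lookup resolves a vendor/device ID pair to a human-readable name.
func (db PCIDB) Lookup(vendorID, deviceID uint16) (string, bool) {
	name, ok := db[fmt.Sprintf("%04x:%04x", vendorID, deviceID)]
	return name, ok
}

// sample uses made-up IDs, not real pci.ids entries.
const sample = "abcd  Example Vendor\n\t1234  Example Device\n"

func main() {
	db := ParsePCIIDs(sample)
	name, ok := db.Lookup(0xabcd, 0x1234)
	fmt.Println(name, ok)
}
```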
---
## ADL-006 — Reanimator export uses canonical hardware.devices (not raw sub-fields)
**Date:** v1.5.0
**Context:** Early Reanimator exporter read from `Hardware.GPUs`, `Hardware.NICs`, etc.
directly, diverging from UI data.
**Decision:** Reanimator exporter must use `hardware.devices` — the same source as the UI.
Exporter groups/filters canonical records by section; does not rebuild from sub-fields.
**Consequences:**
- Guarantees UI and export consistency.
- Exporter code is simpler — mainly a filter+map, not a data reconstruction.
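
The "filter+map" shape can be sketched as follows. The `Device` and `ExportRow` types, the `Kind` field, and `ExportSection` are hypothetical; the point is only that the exporter selects canonical records per section and projects them, without consulting any other source.

```go
package main

import "fmt"

// Device is a hypothetical canonical record; Kind names its section.
type Device struct {
	Kind   string
	Model  string
	Serial string
}

// ExportRow is a hypothetical row in one exported section.
type ExportRow struct {
	Model  string
	Serial string
}

// ExportSection filters canonical devices by kind and maps each match
// to an export row. It never rebuilds data from other fields.
func ExportSection(devices []Device, kind string) []ExportRow {
	var rows []ExportRow
	for _, d := range devices {
		if d.Kind != kind {
			continue
		}
		rows = append(rows, ExportRow{Model: d.Model, Serial: d.Serial})
	}
	return rows
}

func main() {
	devices := []Device{
		{Kind: "gpu", Model: "Example GPU", Serial: "G1"},
		{Kind: "nic", Model: "Example NIC", Serial: "N1"},
		{Kind: "gpu", Model: "Example GPU", Serial: "G2"},
	}
	fmt.Println(ExportSection(devices, "gpu"))
}
```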
---
## ADL-007 — Documentation language is English
**Date:** 2026-02-20
**Context:** Codebase documentation was mixed Russian/English, reducing clarity for
international contributors and AI assistants.
**Decision:** All maintained project documentation (`docs/bible/`, `README.md`,
`CLAUDE.md`, and new technical docs) must be written in English.
**Consequences:**
- Bible is authoritative in English.
- AI assistants get consistent, unambiguous context.
---
## ADL-008 — Bible is the single source of truth for architecture docs
**Date:** 2026-02-23
**Context:** Architecture information was duplicated across `README.md`, `CLAUDE.md`,
and the Bible, creating drift risk and stale guidance for humans and AI agents.
**Decision:** Keep architecture and technical design documentation only in `docs/bible/`.
Top-level `README.md` and `CLAUDE.md` must remain minimal pointers/instructions.
**Consequences:**
- Reduces documentation drift and duplicate updates.
- AI assistants are directed to one authoritative source before making changes.
- Documentation updates that affect architecture must include Bible changes (and ADL entries when significant).
---
## ADL-009 — Redfish analysis is performed from raw snapshot replay (unified tunnel)
**Date:** 2026-02-24
**Context:** Live Redfish collection and raw export re-analysis used different parsing paths,
which caused drift and made bug fixes difficult to validate consistently.
**Decision:** Redfish live collection must produce a `raw_payloads.redfish_tree` snapshot first,
then run the same replay analyzer used for imported raw exports.
**Consequences:**
- Same `redfish_tree` input produces the same parsed result in live and offline modes.
- Debugging parser issues can be done against exported raw bundles without live BMC access.
- Snapshot completeness becomes critical; collector seeds/limits are part of analyzer correctness.
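
A minimal sketch of the single analysis path, with hypothetical names throughout (`Tree`, `Result`, `Replay`, `AnalyzeLive`, `AnalyzeImported`). The analyzer body is a placeholder; what matters is that the live path only produces a snapshot and then calls the same `Replay` function as the import path.

```go
package main

import "fmt"

// Tree is a hypothetical snapshot: Redfish path -> raw JSON document.
type Tree map[string][]byte

// Result is a placeholder for the parsed analysis result.
type Result struct {
	DocCount int
}

// Replay is the only place where a snapshot is turned into a result.
func Replay(tree Tree) Result {
	return Result{DocCount: len(tree)}
}

// AnalyzeLive collects a snapshot first, then replays it.
// It contains no parsing logic of its own.
func AnalyzeLive(collect func() (Tree, error)) (Result, error) {
	tree, err := collect()
	if err != nil {
		return Result{}, err
	}
	return Replay(tree), nil
}

// AnalyzeImported replays a snapshot loaded from a raw export.
func AnalyzeImported(tree Tree) Result {
	return Replay(tree)
}

func main() {
	snapshot := Tree{"/redfish/v1": []byte(`{}`)}
	live, _ := AnalyzeLive(func() (Tree, error) { return snapshot, nil })
	fmt.Println(live == AnalyzeImported(snapshot))
}
```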
---
## ADL-010 — Raw export is a self-contained re-analysis package (not a final result dump)
**Date:** 2026-02-24
**Context:** Exporting only normalized `AnalysisResult` loses raw source fidelity and prevents
future parser improvements from being applied to already collected data.
**Decision:** `Export Raw Data` produces a self-contained raw package (JSON or ZIP bundle)
that the application can reopen and re-analyze. Parsed data in the package is optional and not
the source of truth on import.
**Consequences:**
- Re-opening an export always re-runs analysis from raw source (`redfish_tree` or uploaded file bytes).
- Raw bundles include collection context and diagnostics for debugging (`collect.log`, `parser_fields.json`).
- Endpoint compatibility is preserved (`/api/export/json`) while actual payload format may be a bundle.
---
## ADL-011 — Redfish snapshot crawler is bounded, prioritized, and failure-tolerant
**Date:** 2026-02-24
**Context:** Full Redfish trees on modern GPU systems are large, noisy, and contain many
vendor-specific or non-fetchable links. Unbounded crawling and naive queue design caused hangs
and incomplete snapshots.
**Decision:** Use a bounded snapshot crawler with:
- explicit document cap (`LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`)
- priority seed paths (PCIe/Fabrics/Firmware/Storage/PowerSubsystem/ThermalSubsystem)
- normalized `@odata.id` paths (strip `#fragment`)
- filtering of noisy, expected errors (404/405/410/501 are hidden from the UI)
- queue capacity sized to crawl cap to avoid producer/consumer deadlock
**Consequences:**
- Snapshot collection remains stable on large BMC trees.
- Most high-value inventory paths are reached before the cap.
- UI progress remains useful while debug logs retain low-level fetch failures.
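
The crawler properties above can be sketched in a simplified, single-goroutine form. All names (`Crawl`, `normalize`, `ErrNotFetchable`, the `fetch` callback) are hypothetical, and the real crawler is presumably concurrent. The sketch shows three of the listed ideas: fragments are stripped before deduplication, at most `maxDocs` paths are ever queued, and the queue capacity equals that cap so a send can never block. Fetch failures are skipped rather than aborting the crawl.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrNotFetchable stands in for expected failures such as 404 or 405.
var ErrNotFetchable = errors.New("not fetchable")

// normalize strips a "#fragment" suffix so equivalent links deduplicate.
func normalize(path string) string {
	if i := strings.IndexByte(path, '#'); i >= 0 {
		path = path[:i]
	}
	return path
}

// Crawl fetches documents breadth-first, seeds first, and returns the paths
// that were fetched successfully. At most maxDocs paths are ever queued.
func Crawl(seeds []string, maxDocs int, fetch func(path string) ([]string, error)) []string {
	if maxDocs <= 0 {
		return nil
	}
	queue := make(chan string, maxDocs) // capacity == cap, so sends never block
	queued := make(map[string]bool)
	enqueue := func(p string) {
		p = normalize(p)
		if p == "" || queued[p] || len(queued) >= maxDocs {
			return
		}
		queued[p] = true
		queue <- p
	}
	for _, s := range seeds {
		enqueue(s)
	}
	var visited []string
	for len(queue) > 0 {
		p := <-queue
		links, err := fetch(p)
		if err != nil {
			continue // tolerate the failure and keep crawling
		}
		visited = append(visited, p)
		for _, l := range links {
			enqueue(l)
		}
	}
	return visited
}

func main() {
	graph := map[string][]string{
		"/redfish/v1/Chassis":   {"/redfish/v1/Chassis/1#frag", "/redfish/v1/Bad"},
		"/redfish/v1/Chassis/1": {"/redfish/v1/Chassis"},
	}
	fetch := func(p string) ([]string, error) {
		if p == "/redfish/v1/Bad" {
			return nil, ErrNotFetchable
		}
		return graph[p], nil
	}
	fmt.Println(Crawl([]string{"/redfish/v1/Chassis"}, 10, fetch))
}
```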
---
## ADL-012 — Vendor-specific storage inventory probing is allowed as fallback
**Date:** 2026-02-24
**Context:** Some Supermicro BMCs expose empty standard `Storage/.../Drives` collections while
real disk inventory exists under vendor-specific `Disk.Bay` endpoints and enclosure links.
**Decision:** When standard drive collections are empty, collector/replay may probe vendor-style
`.../Drives/Disk.Bay.*` endpoints and follow `Storage.Links.Enclosures[*]` to recover physical drives.
**Consequences:**
- Higher storage inventory coverage on Supermicro HBA/HA-RAID/MRVL/NVMe backplane implementations.
- Replay must mirror the same probing behavior to preserve deterministic results.
- Probing remains bounded (finite candidate set) to avoid runaway requests.
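
A sketch of the bounded fallback, with several assumptions: `DrivePaths`, `maxBays`, and the `exists` callback are hypothetical, and the numeric, zero-based bay suffix is a guess at how the `Disk.Bay.*` wildcard expands. Following `Storage.Links.Enclosures[*]` is not shown. Because the same function can be driven by a live probe or by a lookup into a stored snapshot, collector and replay would take the same decisions.

```go
package main

import "fmt"

// DrivePaths returns the standard drive members when any exist. Otherwise it
// probes a finite set of vendor-style Disk.Bay.N candidates under storagePath.
// The numeric, zero-based suffix is an assumption of this sketch.
func DrivePaths(storagePath string, standard []string, maxBays int, exists func(path string) bool) []string {
	if len(standard) > 0 {
		return standard
	}
	var found []string
	for bay := 0; bay < maxBays; bay++ { // bounded: at most maxBays probes
		candidate := fmt.Sprintf("%s/Drives/Disk.Bay.%d", storagePath, bay)
		if exists(candidate) {
			found = append(found, candidate)
		}
	}
	return found
}

func main() {
	// In replay mode, exists would look the path up in the stored snapshot.
	present := map[string]bool{"/storage/Drives/Disk.Bay.2": true}
	exists := func(p string) bool { return present[p] }
	fmt.Println(DrivePaths("/storage", nil, 4, exists))
}
```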
---
## ADL-013 — PowerSubsystem is preferred over legacy Power on newer Redfish implementations
**Date:** 2026-02-24
**Context:** X14+/newer Redfish implementations increasingly expose authoritative PSU data in
`PowerSubsystem/PowerSupplies`, while legacy `/Power` may be incomplete or schema-shifted.
**Decision:** Prefer `Chassis/*/PowerSubsystem/PowerSupplies` as the primary PSU source and use
legacy `Chassis/*/Power` as fallback.
**Consequences:**
- Better compatibility with newer BMC firmware generations.
- Legacy systems remain supported without special-case collector selection.
- Snapshot priority seeds must include `PowerSubsystem` resources.
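
The preference order reduces to a small selection function. `PSU` and `SelectPSUs` are hypothetical names, and treating an empty `PowerSubsystem` result as "fall back to legacy" is an assumption of this sketch.

```go
package main

import "fmt"

// PSU is a hypothetical, trimmed-down power supply record.
type PSU struct {
	Name string
}

// SelectPSUs prefers data parsed from PowerSubsystem/PowerSupplies and
// falls back to the legacy Power resource only when the former is empty.
// The returned source label records which path was used.
func SelectPSUs(fromSubsystem, fromLegacy []PSU) (psus []PSU, source string) {
	if len(fromSubsystem) > 0 {
		return fromSubsystem, "PowerSubsystem"
	}
	return fromLegacy, "Power"
}

func main() {
	legacy := []PSU{{Name: "PSU1"}}
	psus, source := SelectPSUs(nil, legacy)
	fmt.Println(len(psus), source)
}
```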
---
## ADL-014 — Threshold logic lives on the server; UI reflects status only
**Date:** 2026-02-24
**Context:** Duplicating threshold math in frontend and backend creates drift and inconsistent
highlighting (e.g. PSU mains voltage range checks).
**Decision:** Business threshold evaluation (e.g. PSU voltage nominal range) must be computed on
the server; frontend only renders status/flags returned by the API.
**Consequences:**
- Single source of truth for threshold policies.
- UI can evolve visually without re-implementing domain logic.
- API payloads may carry richer status semantics over time.
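
A minimal sketch of server-side evaluation, assuming hypothetical names (`PSUStatus`, `EvaluatePSU`) and JSON field names. The 200-240 V range in `main` is a placeholder for illustration, not LOGPile's actual policy. The frontend would render `voltage_ok` as returned and never recompute it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PSUStatus is a hypothetical API payload: the raw reading plus the
// status flag computed on the server.
type PSUStatus struct {
	InputVoltage float64 `json:"input_voltage"`
	VoltageOK    bool    `json:"voltage_ok"`
}

// EvaluatePSU applies the threshold on the server. The range bounds are
// parameters here; where the real policy is stored is not specified.
func EvaluatePSU(inputVoltage, minV, maxV float64) PSUStatus {
	return PSUStatus{
		InputVoltage: inputVoltage,
		VoltageOK:    inputVoltage >= minV && inputVoltage <= maxV,
	}
}

func main() {
	// 200-240 V is an illustrative range only.
	status := EvaluatePSU(187.5, 200, 240)
	payload, err := json.Marshal(status)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload))
}
```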
---
<!-- Add new decisions below this line using the format above -->