refactor: unified ingest pipeline + modular Redfish profile framework

Implement the full architectural plan: unified ingest.Service entry point
for archive and Redfish payloads, modular redfishprofile package with
composable profiles (generic, ami-family, msi, supermicro, dell,
hgx-topology), score-based profile matching with fallback expansion mode,
and profile-driven acquisition/analysis plans.

Vendor-specific logic moved out of common executors and into profile hooks.
GPU chassis lookup strategies and known storage recovery collections
(IntelVROC/HA-RAID/MRVL) now live in ResolvedAnalysisPlan, populated by
profiles at analysis time. Replay helpers read from the plan; no hardcoded
path lists remain in generic code.

Also splits redfish_replay.go into domain modules (gpu, storage, inventory,
fru, profiles) and adds full fixture/matcher/directive test coverage
including Dell, AMI, unknown-vendor fallback, and deterministic ordering.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mikhail Chusavitin
2026-03-18 08:48:58 +03:00
parent d8d3d8c524
commit d650a6ba1c
45 changed files with 5231 additions and 1011 deletions

@@ -20,6 +20,7 @@ LOGPile remains responsible for upload, collection, parsing, normalization, and
```text
cmd/logpile/main.go entrypoint and CLI flags
internal/server/ HTTP handlers, jobs, upload/export flows
internal/ingest/ source-family orchestration for upload and raw replay
internal/collector/ live collection and Redfish replay
internal/analyzer/ shared analysis helpers
internal/parser/ archive extraction and parser dispatch
@@ -50,18 +51,21 @@ Failed or canceled jobs do not overwrite the previous dataset.
### Upload
1. `POST /api/upload` receives multipart field `archive`
2. JSON inputs are checked for raw-export package or `AnalysisResult` snapshot
3. Non-JSON inputs go through `parser.BMCParser`
4. Archive metadata is normalized onto `AnalysisResult`
5. Result becomes the current in-memory dataset
2. `internal/ingest.Service` resolves the source family
3. JSON inputs are checked for raw-export package or `AnalysisResult` snapshot
4. Non-JSON archives go through the archive parser family
5. Archive metadata is normalized onto `AnalysisResult`
6. Result becomes the current in-memory dataset
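The source-family step in the upload flow can be sketched as a small classifier. This is a minimal illustration, not the actual `internal/ingest.Service` code: the `SourceFamily` type, the `resolveFamily` helper, and the `kind` JSON marker are all hypothetical names chosen for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// SourceFamily is a hypothetical label for the ingest path an upload takes.
type SourceFamily string

const (
	FamilyRawExport SourceFamily = "raw-export"
	FamilySnapshot  SourceFamily = "analysis-snapshot"
	FamilyArchive   SourceFamily = "archive"
)

// resolveFamily classifies an upload body. The "kind" marker field is an
// assumption made for this sketch, not the real raw-export schema.
func resolveFamily(body []byte) SourceFamily {
	trimmed := bytes.TrimSpace(body)
	if len(trimmed) == 0 || trimmed[0] != '{' {
		return FamilyArchive // not a JSON object: archive parser family
	}
	var probe struct {
		Kind string `json:"kind"`
	}
	if err := json.Unmarshal(trimmed, &probe); err != nil {
		return FamilyArchive // undecodable JSON is left to the archive parsers
	}
	if probe.Kind == "raw-export" {
		return FamilyRawExport
	}
	return FamilySnapshot
}

func main() {
	fmt.Println(resolveFamily([]byte(`{"kind":"raw-export"}`)))
	fmt.Println(resolveFamily([]byte("PK\x03\x04")))
}
```

Keeping the decision in one function means the HTTP handler only forwards bytes and never branches on parser-versus-replay details itself.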
### Live collect
1. `POST /api/collect` validates request fields
2. Server creates an async job and returns `202 Accepted`
3. Selected collector gathers raw data
4. For Redfish, collector saves `raw_payloads.redfish_tree`
5. Result is normalized, source metadata applied, and state replaced on success
4. For Redfish, collector runs minimal discovery, matches Redfish profiles, and builds an acquisition plan
5. Collector applies profile tuning hints (for example, crawl breadth, prefetch, and bounded plan-B passes)
6. Collector saves `raw_payloads.redfish_tree` plus acquisition diagnostics
7. Result is normalized, source metadata applied, and state replaced on success
### Batch convert
@@ -76,6 +80,10 @@ Failed or canceled jobs do not overwrite the previous dataset.
Live Redfish collection and offline Redfish re-analysis must use the same replay path.
The collector first captures `raw_payloads.redfish_tree`, then the replay logic builds the normalized result.
Redfish is being split into two coordinated phases:
- acquisition: profile-driven snapshot collection strategy
- analysis: replay over the saved snapshot with the same profile framework
## PCI IDs lookup
Lookup order:

@@ -6,6 +6,12 @@ Core files:
- `registry.go` for protocol registration
- `redfish.go` for live collection
- `redfish_replay.go` for replay from raw payloads
- `redfish_replay_gpu.go` for profile-driven GPU replay collectors and GPU fallback helpers
- `redfish_replay_storage.go` for profile-driven storage replay collectors and storage recovery helpers
- `redfish_replay_inventory.go` for replay inventory collectors (PCIe, NIC, BMC MAC, NIC enrichment)
- `redfish_replay_fru.go` for board fallback helpers and Assembly/FRU replay extraction
- `redfish_replay_profiles.go` for profile-driven replay helpers and vendor-aware recovery helpers
- `redfishprofile/` for Redfish profile matching and acquisition/analysis hooks
- `ipmi_mock.go` for the placeholder IPMI implementation
- `types.go` for request/progress contracts
@@ -50,11 +56,72 @@ It discovers and follows Redfish resources dynamically from root collections suc
- `Chassis`
- `Managers`
After minimal discovery, the collector builds `MatchSignals` and selects a Redfish profile mode:
- `matched` when one or more profiles score with high confidence
- `fallback` when vendor/platform confidence is low; in this mode the collector aggregates safe additive profile probes to maximize snapshot completeness
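The mode selection above could look roughly like the following. This is a sketch under assumptions: the `ModuleScore` and `MatchResult` types, the numeric threshold, and the choice to activate every module in fallback mode are illustrative and are not taken from `redfishprofile/`.

```go
package main

import (
	"fmt"
	"sort"
)

// ModuleScore pairs a profile module name with its match score (hypothetical type).
type ModuleScore struct {
	Name  string
	Score int
}

// MatchResult is a hypothetical summary of the selection outcome.
type MatchResult struct {
	Mode   string // "matched" or "fallback"
	Active []string
	Scores []ModuleScore
}

// selectMode activates modules at or above threshold. If none qualify it
// switches to fallback and activates every module, which assumes that all
// of their probes are safe and additive.
func selectMode(scores []ModuleScore, threshold int) MatchResult {
	sorted := append([]ModuleScore(nil), scores...)
	// Sort by score descending, then name ascending, so output is deterministic.
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Score != sorted[j].Score {
			return sorted[i].Score > sorted[j].Score
		}
		return sorted[i].Name < sorted[j].Name
	})
	res := MatchResult{Mode: "matched", Scores: sorted}
	for _, s := range sorted {
		if s.Score >= threshold {
			res.Active = append(res.Active, s.Name)
		}
	}
	if len(res.Active) == 0 {
		res.Mode = "fallback"
		for _, s := range sorted {
			res.Active = append(res.Active, s.Name)
		}
	}
	return res
}

func main() {
	r := selectMode([]ModuleScore{{"generic", 10}, {"msi", 80}, {"dell", 0}}, 60)
	fmt.Println(r.Mode, r.Active)
}
```

Sorting ties by name keeps the active module order stable across runs, which matters for replay determinism and for fixture-based regression tests.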
Profile modules may contribute:
- primary acquisition seeds
- bounded `PlanBPaths` for secondary recovery
- critical paths
- acquisition notes/diagnostics
- tuning hints such as snapshot document cap, prefetch behavior, and expensive post-probe toggles
- post-probe policy for numeric collection recovery, direct NVMe `Disk.Bay` recovery, and sensor post-probe enablement
- recovery policy for critical collection member retry, slow numeric plan-B probing, and profile-specific plan-B activation
- scoped path policy for discovered `Systems/*`, `Chassis/*`, and `Managers/*` branches when a profile needs extra seeds/critical targets beyond the vendor-neutral core set
- prefetch policy for which critical paths are eligible for adaptive prefetch and which path shapes are explicitly excluded
Model- or topology-specific `CriticalPaths` and profile `PlanBPaths` must live in the profile
module that owns the behavior. The collector core may execute those paths, but it should not
hardcode vendor-specific recovery targets.
The same rule applies to expensive post-probe decisions: the collector core may execute bounded
post-probe loops, but profiles own whether those loops are enabled for a given platform shape.
The same rule applies to critical recovery passes: the collector core may run bounded plan-B
loops, but profiles own whether member retry, slow numeric recovery, and profile-specific plan-B
passes are enabled.
When a profile needs extra discovered-path branches such as storage controller subtrees, it must
provide them as scoped suffix policy rather than by hardcoding platform-shaped suffixes into the
collector core baseline seed list.
The same applies to prefetch shaping: the collector core may execute adaptive prefetch, but
profiles own the include/exclude rules for which critical paths should participate.
The same applies to critical inventory shaping: the collector core should keep only a minimal
vendor-neutral critical baseline, while profiles own additional system/chassis/manager critical
suffixes and top-level critical targets.
Resolved live acquisition plans should be built inside `redfishprofile/`, not by hand in
`redfish.go`. The collector core should receive discovered resources plus the selected profile
plan and then execute the resolved seed/critical paths.
When profile behavior depends on what discovery actually returned, use a post-discovery
refinement hook in `redfishprofile/` instead of hardcoding guessed absolute paths in the static
plan. MSI GPU chassis refinement is the reference example.
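As an illustration of the refinement idea, the sketch below derives GPU chassis seeds and `.../Sensors` critical paths from whatever chassis discovery returned. The `Plan` type and `refineGPUChassis` function are hypothetical and much simpler than a real profile hook.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// Plan is a hypothetical, reduced acquisition plan.
type Plan struct {
	Seeds    []string
	Critical []string
}

// refineGPUChassis adds one seed and one Sensors critical path for each
// discovered chassis whose last path segment starts with "GPU".
// Nothing is added when discovery returned no such chassis.
func refineGPUChassis(plan Plan, discoveredChassis []string) Plan {
	out := Plan{
		Seeds:    append([]string(nil), plan.Seeds...),
		Critical: append([]string(nil), plan.Critical...),
	}
	for _, c := range discoveredChassis {
		c = strings.TrimRight(c, "/")
		if !strings.HasPrefix(path.Base(c), "GPU") {
			continue
		}
		out.Seeds = append(out.Seeds, c)
		out.Critical = append(out.Critical, c+"/Sensors")
	}
	sort.Strings(out.Seeds)
	sort.Strings(out.Critical)
	return out
}

func main() {
	p := refineGPUChassis(Plan{}, []string{
		"/redfish/v1/Chassis/GPU2",
		"/redfish/v1/Chassis/Self",
		"/redfish/v1/Chassis/GPU1",
	})
	fmt.Println(p.Seeds)
	fmt.Println(p.Critical)
}
```

Because the paths come from discovery, a platform with two GPU chassis and one with eight use the same hook without any guessed absolute path list.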
Live Redfish collection must expose profile-match diagnostics:
- collector logs must include the selected modules and the score for every known module
- job status responses must carry structured `active_modules` and `module_scores`
- the collect page should render active modules as chips from structured status data, not by
parsing log lines
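A structured status payload of this kind might be encoded as follows. Only the `active_modules` and `module_scores` field names come from the text above; the surrounding `CollectStatus` type and the per-score object shape (`name`, `score`) are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModuleScore is an assumed shape for one entry of module_scores.
type ModuleScore struct {
	Name  string `json:"name"`
	Score int    `json:"score"`
}

// CollectStatus is a hypothetical slice of the job status response.
type CollectStatus struct {
	ActiveModules []string      `json:"active_modules"`
	ModuleScores  []ModuleScore `json:"module_scores"`
}

// encodeStatus returns the JSON form of the status, or "" on error.
func encodeStatus(s CollectStatus) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encodeStatus(CollectStatus{
		ActiveModules: []string{"msi"},
		ModuleScores:  []ModuleScore{{"msi", 80}, {"generic", 10}},
	}))
}
```

With a payload like this, the collect page can render chips from `active_modules` directly instead of scraping log output.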
On replay, profile-derived analysis directives may enable vendor-specific inventory linking
helpers such as processor-GPU fallback, chassis-ID alias resolution, and bounded storage recovery.
Replay should now resolve a structured analysis plan inside `redfishprofile/`, analogous to the
live acquisition plan. The replay core may execute collectors against the resolved directives, but
snapshot-aware vendor decisions should live in profile analysis hooks, not in `redfish_replay.go`.
GPU and storage replay executors should consume the resolved analysis plan directly, not a raw
`AnalysisDirectives` struct, so the boundary between planning and execution stays explicit.
Profile matching and acquisition tuning must be regression-tested against repo-owned compact
fixtures under `internal/collector/redfishprofile/testdata/`, derived from representative
raw-export snapshots, for at least MSI and Supermicro shapes.
When multiple raw-export snapshots exist for the same platform, profile selection must remain
stable across those sibling fixtures unless the topology actually changes.
Analysis-plan metadata should be stored in replay raw payloads so vendor hook activation is
debuggable offline.
### Stored raw data
Important raw payloads:
- `raw_payloads.redfish_tree`
- `raw_payloads.redfish_fetch_errors`
- `raw_payloads.redfish_profiles`
- `raw_payloads.source_timezone` when available
### Snapshot crawler rules
@@ -68,7 +135,7 @@ Important raw payloads:
When changing collection logic:
1. Prefer alternate-path support over vendor hardcoding
1. Prefer profile modules over ad-hoc vendor branches in the collector core
2. Keep expensive probing bounded
3. Deduplicate by serial, then BDF, then location/model fallbacks
4. Preserve replay determinism from saved raw payloads
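The deduplication order in rule 3 can be sketched as a key function with ordered fallbacks. The `Device` type and the key prefixes are hypothetical. The sketch also simplifies one case: a record with a serial and a second record for the same device that only has a BDF get different keys and are not merged.

```go
package main

import (
	"fmt"
	"strings"
)

// Device is a hypothetical, reduced inventory record.
type Device struct {
	Serial, BDF, Location, Model string
}

// dedupeKey prefers serial, then BDF, then location plus model.
func dedupeKey(d Device) string {
	if s := strings.TrimSpace(d.Serial); s != "" {
		return "serial:" + strings.ToUpper(s)
	}
	if b := strings.TrimSpace(d.BDF); b != "" {
		return "bdf:" + strings.ToLower(b)
	}
	return "loc:" + strings.TrimSpace(d.Location) + "|" + strings.TrimSpace(d.Model)
}

// dedupe keeps the first record seen for each key and preserves input order,
// so the same saved snapshot always yields the same result.
func dedupe(devs []Device) []Device {
	seen := make(map[string]bool)
	var out []Device
	for _, d := range devs {
		k := dedupeKey(d)
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, d)
	}
	return out
}

func main() {
	out := dedupe([]Device{
		{Serial: "abc"},
		{Serial: "ABC ", BDF: "0000:01:00.0"},
		{BDF: "0000:02:00.0"},
		{BDF: "0000:02:00.0"},
	})
	fmt.Println(len(out))
}
```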

@@ -274,6 +274,188 @@ for `Enclosure`, `RackMount`, and any unrecognised type (fail-safe).
both the excluded types and the storage-capable types (see `TestChassisTypeCanHaveNVMe`
and `TestNVMePostProbeSkipsNonStorageChassis`).
## ADL-019 — Redfish post-probe recovery is profile-owned acquisition policy
**Date:** 2026-03-18
**Context:** Numeric collection post-probe and direct NVMe `Disk.Bay` recovery were still
controlled by collector-core heuristics, which kept platform-specific acquisition behavior in
`redfish.go` and made vendor/topology refactoring incomplete.
**Decision:** Move expensive Redfish post-probe enablement into profile-owned acquisition policy.
The collector core may execute bounded post-probe loops, but profiles must explicitly enable:
- numeric collection post-probe
- direct NVMe `Disk.Bay` recovery
- sensor collection post-probe
**Consequences:**
- Generic collector flow no longer implicitly turns on storage/NVMe recovery for every platform.
- Supermicro-specific direct NVMe recovery and generic numeric collection recovery are now
regression-tested through profile fixtures.
- Future platform storage/post-probe behavior must be added through profile tuning, not new
vendor-shaped `if` branches in collector core.
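A minimal sketch of what explicit enablement can mean in code: the zero value of the policy disables every post-probe, and active profiles opt in. The `PostProbePolicy` type, its field names, and the OR-merge across profiles are assumptions for illustration, not the actual tuning structure.

```go
package main

import "fmt"

// PostProbePolicy is a hypothetical policy struct; all probes default to off.
type PostProbePolicy struct {
	NumericCollection bool
	DirectNVMeDiskBay bool
	SensorCollection  bool
}

// mergePostProbe enables a probe when at least one active profile enables it.
// With no profiles the result is the zero value, so nothing runs implicitly.
func mergePostProbe(policies ...PostProbePolicy) PostProbePolicy {
	var out PostProbePolicy
	for _, p := range policies {
		out.NumericCollection = out.NumericCollection || p.NumericCollection
		out.DirectNVMeDiskBay = out.DirectNVMeDiskBay || p.DirectNVMeDiskBay
		out.SensorCollection = out.SensorCollection || p.SensorCollection
	}
	return out
}

func main() {
	generic := PostProbePolicy{NumericCollection: true}
	supermicro := PostProbePolicy{DirectNVMeDiskBay: true}
	fmt.Printf("%+v\n", mergePostProbe(generic, supermicro))
}
```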
## ADL-020 — Redfish critical plan-B activation is profile-owned recovery policy
**Date:** 2026-03-18
**Context:** `critical plan-B` and `profile plan-B` were still effectively always-on collector
behavior once paths were present, including critical collection member retry and slow numeric
child probing. That kept acquisition recovery semantics in `redfish.go` instead of the profile
layer.
**Decision:** Move plan-B activation into profile-owned recovery policy. Profiles must explicitly
enable:
- critical collection member retry
- slow numeric probing during critical plan-B
- profile-specific plan-B pass
**Consequences:**
- Recovery behavior is now observable in raw Redfish diagnostics alongside other tuning.
- Generic/fallback recovery remains available through profile policy instead of implicit collector
defaults.
- Future platform-specific plan-B behavior must be introduced through profile tuning and tests,
not through new unconditional collector branches.
## ADL-021 — Extra discovered-path storage seeds must be profile-scoped, not core-baseline
**Date:** 2026-03-18
**Context:** The collector core baseline seed list still contained storage-specific discovered-path
suffixes such as `SimpleStorage` and `Storage/IntelVROC/*`. These are useful on some platforms,
but they are acquisition extensions layered on top of discovered `Systems/*` resources, not part
of the minimal vendor-neutral Redfish baseline.
**Decision:** Move such discovered-path expansions into profile-owned scoped path policy. The
collector core keeps the vendor-neutral baseline; profiles may add extra system/chassis/manager
suffixes that are expanded over discovered members during acquisition planning.
**Consequences:**
- Platform-shaped storage discovery no longer lives in `redfish.go` baseline seed construction.
- Extra discovered-path branches are visible in plan diagnostics and fixture regression tests.
- Future model/vendor storage path expansions must be added through scoped profile policy instead
of editing the shared baseline seed list.
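Expanding scoped suffixes over discovered members can be sketched as below. The `expandScoped` helper is hypothetical and handles only literal suffixes; wildcard forms such as `Storage/IntelVROC/*` are out of scope for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// expandScoped joins every discovered member path with every profile suffix,
// removes duplicates, and returns the result in sorted order.
func expandScoped(members, suffixes []string) []string {
	seen := make(map[string]bool)
	out := []string{}
	for _, m := range members {
		m = strings.TrimRight(m, "/")
		for _, s := range suffixes {
			s = strings.Trim(s, "/")
			if m == "" || s == "" {
				continue
			}
			p := m + "/" + s
			if !seen[p] {
				seen[p] = true
				out = append(out, p)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(expandScoped(
		[]string{"/redfish/v1/Systems/1/"},
		[]string{"SimpleStorage", "Storage/IntelVROC"},
	))
}
```

The profile supplies only the suffix list. The member paths always come from discovery, so the shared baseline seed list never needs platform-shaped entries.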
## ADL-022 — Adaptive prefetch eligibility is profile-owned policy
**Date:** 2026-03-18
**Context:** The adaptive prefetch executor was still driven by hardcoded include/exclude path
rules in `redfish.go`. That made GPU/storage/network prefetch shaping part of collector-core
knowledge rather than profile-owned acquisition policy.
**Decision:** Move prefetch eligibility rules into profile tuning. The collector core still runs
adaptive prefetch, but profiles provide:
- `IncludeSuffixes` for critical paths eligible for prefetch
- `ExcludeContains` for path shapes that must never be prefetched
**Consequences:**
- Prefetch behavior is now visible in raw Redfish diagnostics and test fixtures.
- Platform- or topology-specific prefetch shaping no longer requires editing collector-core
string lists.
- Future prefetch tuning must be introduced through profiles and regression tests.
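One way to read the two rule lists is shown below. The field names `IncludeSuffixes` and `ExcludeContains` come from the decision above; the `PrefetchPolicy` type, the `Eligible` method, and the exact matching semantics (suffix match, substring match, exclude wins) are assumptions inferred from the names.

```go
package main

import (
	"fmt"
	"strings"
)

// PrefetchPolicy is a hypothetical container for profile prefetch rules.
type PrefetchPolicy struct {
	IncludeSuffixes []string
	ExcludeContains []string
}

// Eligible reports whether a critical path may be prefetched.
// Exclusions are checked first, so an excluded shape never qualifies.
func (p PrefetchPolicy) Eligible(path string) bool {
	for _, x := range p.ExcludeContains {
		if x != "" && strings.Contains(path, x) {
			return false
		}
	}
	for _, s := range p.IncludeSuffixes {
		if s != "" && strings.HasSuffix(path, s) {
			return true
		}
	}
	return false
}

func main() {
	policy := PrefetchPolicy{
		IncludeSuffixes: []string{"/Storage", "/Sensors"},
		ExcludeContains: []string{"/LogServices/"},
	}
	fmt.Println(policy.Eligible("/redfish/v1/Systems/1/Storage"))
	fmt.Println(policy.Eligible("/redfish/v1/Systems/1/LogServices/Log1/Storage"))
}
```

An empty policy makes nothing eligible, which matches the idea that the collector core carries no prefetch knowledge of its own.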
## ADL-023 — Core critical baseline is roots-only; critical shaping is profile-owned
**Date:** 2026-03-18
**Context:** `redfishCriticalEndpoints(...)` still encoded a broad set of system/chassis/manager
critical branches directly in collector core. This mixed minimal crawl invariants with
profile-specific acquisition shaping.
**Decision:** Reduce collector-core critical baseline to vendor-neutral roots only:
- `/redfish/v1`
- discovered `Systems/*`
- discovered `Chassis/*`
- discovered `Managers/*`
Profiles now own additional critical shaping through:
- scoped critical suffix policy for discovered resources
- explicit top-level `CriticalPaths`
**Consequences:**
- Critical inventory breadth is now explained by the acquisition plan, not hidden in collector
helper defaults.
- Generic profile still provides the previous broad critical coverage, so behavior stays stable.
- Future critical-path tuning must be implemented in profiles and regression-tested there.
## ADL-024 — Live Redfish execution plans are resolved inside redfishprofile
**Date:** 2026-03-18
**Context:** Even after moving seeds, scoped paths, critical shaping, recovery, and prefetch
policy into profiles, `redfish.go` still manually merged discovered resources with those policy
fragments. That left acquisition-plan resolution logic in collector core.
**Decision:** Introduce `redfishprofile.ResolveAcquisitionPlan(...)` as the boundary between
profile planning and collector execution. `redfishprofile` now resolves:
- baseline seeds
- baseline critical roots
- scoped path expansions
- explicit profile seed/critical/plan-B paths
The collector core consumes the resolved plan and executes it.
**Consequences:**
- Acquisition planning logic is now testable in `redfishprofile` without going through the live
collector.
- `redfish.go` no longer owns path-resolution helpers for seeds/critical planning.
- This creates a clean next step toward true per-profile acquisition hooks beyond static policy
fragments.
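The resolution step can be pictured with the reduced sketch below. It does not reproduce the signature of `redfishprofile.ResolveAcquisitionPlan(...)`. The `Discovered`, `ProfilePlan`, and `ResolvedPlan` types, the `resolve` function, and the choice to reuse the critical roots as baseline seeds are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Discovered holds member paths returned by minimal discovery (hypothetical).
type Discovered struct {
	Systems, Chassis, Managers []string
}

// ProfilePlan is a hypothetical bundle of profile policy fragments.
type ProfilePlan struct {
	SystemSuffixes []string // scoped suffixes expanded over Systems/*
	SeedPaths      []string
	CriticalPaths  []string
	PlanBPaths     []string
}

// ResolvedPlan is what the collector core would execute.
type ResolvedPlan struct {
	Seeds, Critical, PlanB []string
}

// uniqueSorted merges path groups, drops duplicates and empties, and sorts.
func uniqueSorted(groups ...[]string) []string {
	seen := make(map[string]bool)
	out := []string{}
	for _, g := range groups {
		for _, p := range g {
			if p != "" && !seen[p] {
				seen[p] = true
				out = append(out, p)
			}
		}
	}
	sort.Strings(out)
	return out
}

// resolve combines baseline roots, scoped expansions, and explicit paths.
func resolve(d Discovered, p ProfilePlan) ResolvedPlan {
	roots := uniqueSorted([]string{"/redfish/v1"}, d.Systems, d.Chassis, d.Managers)
	var scoped []string
	for _, sys := range d.Systems {
		for _, suffix := range p.SystemSuffixes {
			scoped = append(scoped, sys+"/"+suffix)
		}
	}
	return ResolvedPlan{
		Seeds:    uniqueSorted(roots, scoped, p.SeedPaths),
		Critical: uniqueSorted(roots, p.CriticalPaths),
		PlanB:    uniqueSorted(p.PlanBPaths),
	}
}

func main() {
	plan := resolve(
		Discovered{Systems: []string{"/redfish/v1/Systems/1"}},
		ProfilePlan{SystemSuffixes: []string{"Storage"}},
	)
	fmt.Println(plan.Seeds)
}
```

A pure function of discovery output and profile policy is straightforward to test with fixtures and needs no live BMC.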
## ADL-025 — Post-discovery acquisition refinement belongs to profile hooks
**Date:** 2026-03-18
**Context:** Some acquisition behavior depends not only on vendor/model hints, but on what the
lightweight Redfish discovery actually returned. Static absolute path lists in profile plans are
too rigid for such cases and reintroduce guessed platform knowledge.
**Decision:** Add a post-discovery acquisition refinement hook to Redfish profiles. Profiles may
mutate the resolved execution plan after discovered `Systems/*`, `Chassis/*`, and `Managers/*`
are known.
Concrete uses:
- MSI now derives GPU chassis seeds and `.../Sensors` critical/plan-B paths from discovered
`Chassis/GPU*` resources instead of hardcoded `GPU1..GPU4` absolute paths in the static plan.
- Supermicro now derives `UpdateService/Oem/Supermicro/FirmwareInventory` critical/plan-B paths
from resource hints instead of carrying that absolute path in the static plan.
- Dell now derives `Managers/iDRAC.Embedded.*` acquisition paths from discovered manager
resources instead of carrying `Managers/iDRAC.Embedded.1` as a static absolute path.
**Consequences:**
- Profile modules can react to actual discovery results without pushing conditional logic back
into `redfish.go`.
- Diagnostics still show the final refined plan because the collector stores the refined plan,
not only the pre-refinement template.
- Future vendor-specific discovery-dependent acquisition behavior should be implemented through
this hook rather than new collector-core branches.
## ADL-026 — Replay analysis uses a resolved profile plan, not only ad-hoc directives
**Date:** 2026-03-18
**Context:** Replay still relied on a flat `AnalysisDirectives` struct assembled centrally,
while vendor-specific conditions often depended on the actual snapshot shape. That made analysis
behavior harder to explain and kept too much vendor logic in generic replay collectors.
**Decision:** Introduce `redfishprofile.ResolveAnalysisPlan(...)` for replay. The resolved
analysis plan contains:
- active match result
- resolved analysis directives
- analysis notes explaining snapshot-aware hook activation
Profiles may refine this plan using the snapshot and discovered resources before replay collectors
run.
First concrete uses:
- MSI enables processor-GPU fallback and MSI chassis lookup only when the snapshot actually
contains GPU processors and `Chassis/GPU*`
- HGX enables processor-GPU alias fallback from actual HGX/GPU_SXM topology signals in the snapshot
- Supermicro enables NVMe backplane and known-controller recovery from actual snapshot paths
**Consequences:**
- Replay behavior is now closer to the acquisition architecture: a resolved profile plan feeds the
executor.
- `redfish_analysis_plan` is stored in raw payload metadata for offline debugging.
- Future analysis-side vendor logic should move into profile refinement hooks instead of growing the
central directive builder.
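A snapshot-aware refinement in the spirit of the MSI case could look like the sketch below. Every type and name here (`Directives`, `AnalysisPlan`, `Snapshot`, `refineMSI`) is hypothetical and does not mirror the real `ResolvedAnalysisPlan`. GPU processors are detected through the standard Redfish `ProcessorType` value `GPU`, which is an assumption about how such a check might be done.

```go
package main

import (
	"fmt"
	"strings"
)

// Directives is a hypothetical subset of analysis switches.
type Directives struct {
	ProcessorGPUFallback bool
	MSIChassisLookup     bool
}

// AnalysisPlan is a hypothetical resolved plan: match result, directives, notes.
type AnalysisPlan struct {
	ActiveModules []string
	Directives    Directives
	Notes         []string
}

// Snapshot maps a Redfish path to its decoded document.
type Snapshot map[string]map[string]any

// refineMSI enables the GPU helpers only when the snapshot contains both a
// GPU processor and a Chassis/GPU* resource, and records why in Notes.
func refineMSI(plan AnalysisPlan, snap Snapshot) AnalysisPlan {
	var hasGPUProc, hasGPUChassis bool
	for p, doc := range snap {
		if strings.Contains(p, "/Processors/") && doc["ProcessorType"] == "GPU" {
			hasGPUProc = true
		}
		if strings.HasPrefix(p, "/redfish/v1/Chassis/GPU") {
			hasGPUChassis = true
		}
	}
	if hasGPUProc && hasGPUChassis {
		plan.Directives.ProcessorGPUFallback = true
		plan.Directives.MSIChassisLookup = true
		plan.Notes = append(plan.Notes,
			"msi: GPU processors and Chassis/GPU* present, GPU fallback enabled")
	}
	return plan
}

func main() {
	snap := Snapshot{
		"/redfish/v1/Systems/1/Processors/GPU1": {"ProcessorType": "GPU"},
		"/redfish/v1/Chassis/GPU1":              {"Id": "GPU1"},
	}
	plan := refineMSI(AnalysisPlan{ActiveModules: []string{"msi"}}, snap)
	fmt.Printf("%+v\n", plan.Directives)
	fmt.Println(plan.Notes)
}
```

The note travels with the plan, so an offline reader of the stored metadata can see why a vendor helper was switched on for a given snapshot.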
## ADL-027 — Replay GPU/storage executors consume resolved analysis plans
**Date:** 2026-03-18
**Context:** Even after introducing `ResolveAnalysisPlan(...)`, replay GPU/storage collectors still
accepted a raw `AnalysisDirectives` struct. That preserved an implicit shortcut from the old design
and weakened the plan/executor boundary.
**Decision:** Replay GPU/storage executors now accept `redfishprofile.ResolvedAnalysisPlan`
directly. The executor reads resolved directives from the plan instead of being passed a standalone
directive bundle.
**Consequences:**
- GPU and storage replay execution now follows the same architectural pattern as acquisition:
resolve plan first, execute second.
- Future profile-owned execution helpers can use plan notes or additional resolved fields without
changing the executor API again.
- Remaining replay areas should migrate the same way instead of continuing to accept raw directive
structs.
## ADL-019 — isDeviceBoundFirmwareName must cover vendor-specific naming patterns per vendor
**Date:** 2026-03-12
@@ -604,3 +786,39 @@ presentation drift and duplicated UI logic.
- The host UI becomes a service shell around the viewer instead of maintaining its own
field-by-field tabs.
- `internal/chart` must be updated explicitly as a git submodule when the viewer changes.
---
## ADL-031 — Redfish uses profile-driven acquisition and unified ingest entrypoints
**Date:** 2026-03-17
**Context:**
Redfish collection had accumulated platform-specific probing in the shared collector path, while
upload and raw-export replay still entered analysis through direct handler branches. This made
vendor/model tuning harder to contain and increased regression risk when one topology needed a
special acquisition strategy.
**Decision:**
- Introduce `internal/ingest.Service` as the internal source-family entrypoint for archive parsing
and Redfish raw replay.
- Introduce `internal/collector/redfishprofile/` for Redfish profile matching and modular hooks.
- Split Redfish behavior into coordinated phases:
  - acquisition planning during live collection
  - analysis hooks during snapshot replay
- Use score-based profile matching. If confidence is low, enter fallback acquisition mode and
aggregate only safe additive profile probes.
- Allow profile modules to provide bounded acquisition tuning hints such as crawl cap, prefetch
behavior, and expensive post-probe toggles.
- Allow profile modules to own model-specific `CriticalPaths` and bounded `PlanBPaths` so vendor
recovery targets stop leaking into the collector core.
- Expose Redfish profile matching as structured diagnostics during live collection: logs must
contain all module scores, and collect job status must expose active modules for the UI.
**Consequences:**
- Server handlers stop owning parser-vs-replay branching details directly.
- Vendor/model-specific Redfish logic gets an explicit module boundary.
- Unknown-vendor Redfish collection becomes slower but more complete by design.
- Tactical Redfish fixes should move into profile modules instead of widening generic replay logic.
- Repo-owned compact fixtures under `internal/collector/redfishprofile/testdata/`, derived from
representative raw-export snapshots, are used to lock profile matching and acquisition tuning
for known MSI and Supermicro-family shapes.