refactor: unified ingest pipeline + modular Redfish profile framework

Implement the full architectural plan: unified ingest.Service entry point for
archive and Redfish payloads, modular redfishprofile package with composable
profiles (generic, ami-family, msi, supermicro, dell, hgx-topology), score-based
profile matching with fallback expansion mode, and profile-driven
acquisition/analysis plans.

Vendor-specific logic moved out of common executors and into profile hooks. GPU
chassis lookup strategies and known storage recovery collections
(IntelVROC/HA-RAID/MRVL) now live in ResolvedAnalysisPlan, populated by profiles
at analysis time. Replay helpers read from the plan; no hardcoded path lists
remain in generic code.

Also splits redfish_replay.go into domain modules (gpu, storage, inventory, fru,
profiles) and adds full fixture/matcher/directive test coverage including Dell,
AMI, unknown-vendor fallback, and deterministic ordering.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -20,6 +20,7 @@ LOGPile remains responsible for upload, collection, parsing, normalization, and
 ```text
 cmd/logpile/main.go        entrypoint and CLI flags
 internal/server/           HTTP handlers, jobs, upload/export flows
+internal/ingest/           source-family orchestration for upload and raw replay
 internal/collector/        live collection and Redfish replay
 internal/analyzer/         shared analysis helpers
 internal/parser/           archive extraction and parser dispatch
@@ -50,18 +51,21 @@ Failed or canceled jobs do not overwrite the previous dataset.
 ### Upload

 1. `POST /api/upload` receives multipart field `archive`
-2. JSON inputs are checked for raw-export package or `AnalysisResult` snapshot
-3. Non-JSON inputs go through `parser.BMCParser`
-4. Archive metadata is normalized onto `AnalysisResult`
-5. Result becomes the current in-memory dataset
+2. `internal/ingest.Service` resolves the source family
+3. JSON inputs are checked for raw-export package or `AnalysisResult` snapshot
+4. Non-JSON archives go through the archive parser family
+5. Archive metadata is normalized onto `AnalysisResult`
+6. Result becomes the current in-memory dataset
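
Review note: the new step 2 can be pictured roughly as below. This is a minimal sketch, not the actual `internal/ingest.Service` code. The `SourceFamily` type, the `resolveSourceFamily` function, and the use of a top-level `raw_payloads` key to recognize a raw-export package are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// SourceFamily is a hypothetical label for the kind of uploaded input.
type SourceFamily string

const (
	FamilyRawExport SourceFamily = "raw-export"
	FamilySnapshot  SourceFamily = "analysis-snapshot"
	FamilyArchive   SourceFamily = "archive"
)

// resolveSourceFamily classifies an upload by inspecting its bytes.
// Assumption: a raw-export package is a JSON object with a top-level
// "raw_payloads" key; any other JSON object is treated as a snapshot.
func resolveSourceFamily(payload []byte) SourceFamily {
	trimmed := bytes.TrimSpace(payload)
	if len(trimmed) == 0 || trimmed[0] != '{' {
		return FamilyArchive // not a JSON object: hand over to the archive parsers
	}
	var probe map[string]json.RawMessage
	if err := json.Unmarshal(trimmed, &probe); err != nil {
		return FamilyArchive // looks like JSON but does not parse
	}
	if _, ok := probe["raw_payloads"]; ok {
		return FamilyRawExport
	}
	return FamilySnapshot
}

func main() {
	fmt.Println(resolveSourceFamily([]byte(`{"raw_payloads":{"redfish_tree":{}}}`)))
	fmt.Println(resolveSourceFamily([]byte("PK\x03\x04 zip bytes")))
}
```

The point of the sketch is only that the handler no longer branches on input type itself; it asks one resolver and dispatches on the answer.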

 ### Live collect

 1. `POST /api/collect` validates request fields
 2. Server creates an async job and returns `202 Accepted`
 3. Selected collector gathers raw data
-4. For Redfish, collector saves `raw_payloads.redfish_tree`
-5. Result is normalized, source metadata applied, and state replaced on success
+4. For Redfish, collector runs minimal discovery, matches Redfish profiles, and builds an acquisition plan
+5. Collector applies profile tuning hints (for example crawl breadth, prefetch, bounded plan-B passes)
+6. Collector saves `raw_payloads.redfish_tree` plus acquisition diagnostics
+7. Result is normalized, source metadata applied, and state replaced on success

 ### Batch convert

@@ -76,6 +80,10 @@ Failed or canceled jobs do not overwrite the previous dataset.
 Live Redfish collection and offline Redfish re-analysis must use the same replay path.
 The collector first captures `raw_payloads.redfish_tree`, then the replay logic builds the normalized result.

+Redfish is being split into two coordinated phases:
+- acquisition: profile-driven snapshot collection strategy
+- analysis: replay over the saved snapshot with the same profile framework
+
 ## PCI IDs lookup

 Lookup order:

@@ -6,6 +6,12 @@ Core files:
 - `registry.go` for protocol registration
 - `redfish.go` for live collection
 - `redfish_replay.go` for replay from raw payloads
+- `redfish_replay_gpu.go` for profile-driven GPU replay collectors and GPU fallback helpers
+- `redfish_replay_storage.go` for profile-driven storage replay collectors and storage recovery helpers
+- `redfish_replay_inventory.go` for replay inventory collectors (PCIe, NIC, BMC MAC, NIC enrichment)
+- `redfish_replay_fru.go` for board fallback helpers and Assembly/FRU replay extraction
+- `redfish_replay_profiles.go` for profile-driven replay helpers and vendor-aware recovery helpers
+- `redfishprofile/` for Redfish profile matching and acquisition/analysis hooks
 - `ipmi_mock.go` for the placeholder IPMI implementation
 - `types.go` for request/progress contracts

@@ -50,11 +56,72 @@ It discovers and follows Redfish resources dynamically from root collections suc
 - `Chassis`
 - `Managers`

+After minimal discovery the collector builds `MatchSignals` and selects a Redfish profile mode:
+- `matched` when one or more profiles score with high confidence
+- `fallback` when vendor/platform confidence is low; in this mode the collector aggregates safe additive profile probes to maximize snapshot completeness
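
Review note: the two modes above can be sketched as follows. This is an illustration under stated assumptions, not the `redfishprofile` implementation. The `Profile` and `MatchResult` types, the `SafeInFallback` flag, the scorer, and the threshold value of 60 are all hypothetical; only the `matched` / `fallback` mode names and `MatchSignals` come from the text.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// MatchSignals holds hints gathered during minimal discovery (fields assumed).
type MatchSignals struct {
	Manufacturer string
}

// Profile is a hypothetical stand-in for one profile module.
type Profile struct {
	Name           string
	SafeInFallback bool // whether its probes are safe to aggregate in fallback mode
	Score          func(MatchSignals) int
}

// MatchResult records the chosen mode, active modules, and every module score.
type MatchResult struct {
	Mode   string
	Active []string
	Scores map[string]int
}

const matchThreshold = 60 // hypothetical confidence cut-off

func matchProfiles(profiles []Profile, s MatchSignals) MatchResult {
	res := MatchResult{Mode: "fallback", Scores: make(map[string]int, len(profiles))}
	var fallback []string
	for _, p := range profiles {
		score := p.Score(s)
		res.Scores[p.Name] = score // keep the score of every known module
		if score >= matchThreshold {
			res.Active = append(res.Active, p.Name)
		}
		if p.SafeInFallback {
			fallback = append(fallback, p.Name)
		}
	}
	if len(res.Active) > 0 {
		res.Mode = "matched"
	} else {
		res.Active = fallback // low confidence: aggregate the safe additive modules
	}
	sort.Strings(res.Active) // deterministic ordering
	return res
}

func vendorScorer(vendor string) func(MatchSignals) int {
	return func(s MatchSignals) int {
		if strings.Contains(strings.ToLower(s.Manufacturer), vendor) {
			return 90
		}
		return 0
	}
}

func demoProfiles() []Profile {
	return []Profile{
		{Name: "supermicro", SafeInFallback: true, Score: vendorScorer("supermicro")},
		{Name: "msi", SafeInFallback: true, Score: vendorScorer("msi")},
		{Name: "dell", SafeInFallback: false, Score: vendorScorer("dell")},
	}
}

func main() {
	fmt.Println(matchProfiles(demoProfiles(), MatchSignals{Manufacturer: "Supermicro"}).Active)
	fmt.Println(matchProfiles(demoProfiles(), MatchSignals{Manufacturer: "Acme"}).Active)
}
```

Sorting the active list is one simple way to keep profile selection stable across runs, which the fixture tests described later depend on.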
+
+Profile modules may contribute:
+- primary acquisition seeds
+- bounded `PlanBPaths` for secondary recovery
+- critical paths
+- acquisition notes/diagnostics
+- tuning hints such as snapshot document cap, prefetch behavior, and expensive post-probe toggles
+- post-probe policy for numeric collection recovery, direct NVMe `Disk.Bay` recovery, and sensor post-probe enablement
+- recovery policy for critical collection member retry, slow numeric plan-B probing, and profile-specific plan-B activation
+- scoped path policy for discovered `Systems/*`, `Chassis/*`, and `Managers/*` branches when a profile needs extra seeds/critical targets beyond the vendor-neutral core set
+- prefetch policy for which critical paths are eligible for adaptive prefetch and which path shapes are explicitly excluded
+
+Model- or topology-specific `CriticalPaths` and profile `PlanBPaths` must live in the profile
+module that owns the behavior. The collector core may execute those paths, but it should not
+hardcode vendor-specific recovery targets.
+The same rule applies to expensive post-probe decisions: the collector core may execute bounded
+post-probe loops, but profiles own whether those loops are enabled for a given platform shape.
+The same rule applies to critical recovery passes: the collector core may run bounded plan-B
+loops, but profiles own whether member retry, slow numeric recovery, and profile-specific plan-B
+passes are enabled.
+When a profile needs extra discovered-path branches such as storage controller subtrees, it must
+provide them as scoped suffix policy rather than by hardcoding platform-shaped suffixes into the
+collector core baseline seed list.
+The same applies to prefetch shaping: the collector core may execute adaptive prefetch, but
+profiles own the include/exclude rules for which critical paths should participate.
+The same applies to critical inventory shaping: the collector core should keep only a minimal
+vendor-neutral critical baseline, while profiles own additional system/chassis/manager critical
+suffixes and top-level critical targets.
+Resolved live acquisition plans should be built inside `redfishprofile/`, not by hand in
+`redfish.go`. The collector core should receive discovered resources plus the selected profile
+plan and then execute the resolved seed/critical paths.
+When profile behavior depends on what discovery actually returned, use a post-discovery
+refinement hook in `redfishprofile/` instead of hardcoding guessed absolute paths in the static
+plan. MSI GPU chassis refinement is the reference example.
+
+Live Redfish collection must expose profile-match diagnostics:
+- collector logs must include the selected modules and score for every known module
+- job status responses must carry structured `active_modules` and `module_scores`
+- the collect page should render active modules as chips from structured status data, not by
+  parsing log lines
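
Review note: one possible shape for the structured status fields is sketched below. Only the JSON keys `active_modules` and `module_scores` come from the text. The Go type names, the `state` field, and the list-of-objects layout for scores are assumptions, and the real response may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModuleScore is a hypothetical per-module score entry.
type ModuleScore struct {
	Name  string `json:"name"`
	Score int    `json:"score"`
}

// CollectStatus is a hypothetical job status payload.
type CollectStatus struct {
	State         string        `json:"state"`
	ActiveModules []string      `json:"active_modules"`
	ModuleScores  []ModuleScore `json:"module_scores"`
}

// encodeStatus renders the status as the JSON a UI could consume directly.
func encodeStatus(s CollectStatus) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeStatus(CollectStatus{
		State:         "running",
		ActiveModules: []string{"msi"},
		ModuleScores:  []ModuleScore{{Name: "msi", Score: 90}, {Name: "dell", Score: 0}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

A slice of name/score pairs keeps the order stable in the response, which a plain JSON object would not guarantee for every consumer.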
+
+On replay, profile-derived analysis directives may enable vendor-specific inventory linking
+helpers such as processor-GPU fallback, chassis-ID alias resolution, and bounded storage recovery.
+Replay should now resolve a structured analysis plan inside `redfishprofile/`, analogous to the
+live acquisition plan. The replay core may execute collectors against the resolved directives, but
+snapshot-aware vendor decisions should live in profile analysis hooks, not in `redfish_replay.go`.
+GPU and storage replay executors should consume the resolved analysis plan directly, not a raw
+`AnalysisDirectives` struct, so the boundary between planning and execution stays explicit.
+
+Profile matching and acquisition tuning must be regression-tested against repo-owned compact
+fixtures under `internal/collector/redfishprofile/testdata/`, derived from representative
+raw-export snapshots, for at least MSI and Supermicro shapes.
+When multiple raw-export snapshots exist for the same platform, profile selection must remain
+stable across those sibling fixtures unless the topology actually changes.
+Analysis-plan metadata should be stored in replay raw payloads so vendor hook activation is
+debuggable offline.
+
 ### Stored raw data

 Important raw payloads:
 - `raw_payloads.redfish_tree`
 - `raw_payloads.redfish_fetch_errors`
+- `raw_payloads.redfish_profiles`
 - `raw_payloads.source_timezone` when available

 ### Snapshot crawler rules
@@ -68,7 +135,7 @@ Important raw payloads:

 When changing collection logic:

-1. Prefer alternate-path support over vendor hardcoding
+1. Prefer profile modules over ad-hoc vendor branches in the collector core
 2. Keep expensive probing bounded
 3. Deduplicate by serial, then BDF, then location/model fallbacks
 4. Preserve replay determinism from saved raw payloads
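
Review note: rule 3 describes a key precedence that can be sketched as below. This is a simplified illustration with hypothetical `Device`, `dedupKey`, and `dedupe` names. It picks a single key per device, so it does not merge a record that has a serial with a sibling record that only has the same BDF; real code may need that extra pass.

```go
package main

import (
	"fmt"
	"strings"
)

// Device is a hypothetical inventory record.
type Device struct {
	Serial, BDF, Location, Model string
}

// dedupKey picks the strongest available identity:
// serial first, then PCI BDF, then location plus model.
func dedupKey(d Device) string {
	if s := strings.TrimSpace(d.Serial); s != "" {
		return "serial:" + strings.ToUpper(s)
	}
	if b := strings.TrimSpace(d.BDF); b != "" {
		return "bdf:" + strings.ToLower(b)
	}
	return "loc:" + strings.TrimSpace(d.Location) + "|" + strings.TrimSpace(d.Model)
}

// dedupe keeps the first device seen for each key, preserving input order
// so the result is deterministic for a given snapshot.
func dedupe(devs []Device) []Device {
	seen := make(map[string]bool, len(devs))
	out := make([]Device, 0, len(devs))
	for _, d := range devs {
		k := dedupKey(d)
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, d)
	}
	return out
}

func main() {
	devs := []Device{
		{Serial: "abc123", Model: "NIC-A"},
		{Serial: "ABC123 ", Model: "NIC-A"},
		{BDF: "0000:3B:00.0", Model: "GPU-X"},
		{Location: "Slot 4", Model: "HBA-Z"},
	}
	fmt.Println(len(dedupe(devs)))
}
```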
@@ -274,6 +274,188 @@ for `Enclosure`, `RackMount`, and any unrecognised type (fail-safe).
 both the excluded types and the storage-capable types (see `TestChassisTypeCanHaveNVMe`
 and `TestNVMePostProbeSkipsNonStorageChassis`).

+## ADL-019 — Redfish post-probe recovery is profile-owned acquisition policy
+
+**Date:** 2026-03-18
+**Context:** Numeric collection post-probe and direct NVMe `Disk.Bay` recovery were still
+controlled by collector-core heuristics, which kept platform-specific acquisition behavior in
+`redfish.go` and made vendor/topology refactoring incomplete.
+**Decision:** Move expensive Redfish post-probe enablement into profile-owned acquisition policy.
+The collector core may execute bounded post-probe loops, but profiles must explicitly enable:
+- numeric collection post-probe
+- direct NVMe `Disk.Bay` recovery
+- sensor collection post-probe
+**Consequences:**
+- Generic collector flow no longer implicitly turns on storage/NVMe recovery for every platform.
+- Supermicro-specific direct NVMe recovery and generic numeric collection recovery are now
+  regression-tested through profile fixtures.
+- Future platform storage/post-probe behavior must be added through profile tuning, not new
+  vendor-shaped `if` branches in collector core.
+
+## ADL-020 — Redfish critical plan-B activation is profile-owned recovery policy
+
+**Date:** 2026-03-18
+**Context:** `critical plan-B` and `profile plan-B` were still effectively always-on collector
+behavior once paths were present, including critical collection member retry and slow numeric
+child probing. That kept acquisition recovery semantics in `redfish.go` instead of the profile
+layer.
+**Decision:** Move plan-B activation into profile-owned recovery policy. Profiles must explicitly
+enable:
+- critical collection member retry
+- slow numeric probing during critical plan-B
+- profile-specific plan-B pass
+**Consequences:**
+- Recovery behavior is now observable in raw Redfish diagnostics alongside other tuning.
+- Generic/fallback recovery remains available through profile policy instead of implicit collector
+  defaults.
+- Future platform-specific plan-B behavior must be introduced through profile tuning and tests,
+  not through new unconditional collector branches.
+
+## ADL-021 — Extra discovered-path storage seeds must be profile-scoped, not core-baseline
+
+**Date:** 2026-03-18
+**Context:** The collector core baseline seed list still contained storage-specific discovered-path
+suffixes such as `SimpleStorage` and `Storage/IntelVROC/*`. These are useful on some platforms,
+but they are acquisition extensions layered on top of discovered `Systems/*` resources, not part
+of the minimal vendor-neutral Redfish baseline.
+**Decision:** Move such discovered-path expansions into profile-owned scoped path policy. The
+collector core keeps the vendor-neutral baseline; profiles may add extra system/chassis/manager
+suffixes that are expanded over discovered members during acquisition planning.
+**Consequences:**
+- Platform-shaped storage discovery no longer lives in `redfish.go` baseline seed construction.
+- Extra discovered-path branches are visible in plan diagnostics and fixture regression tests.
+- Future model/vendor storage path expansions must be added through scoped profile policy instead
+  of editing the shared baseline seed list.
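
Review note: "suffixes that are expanded over discovered members" in ADL-021 can be read as a simple cross product. The sketch below is an assumption about the mechanics, not the project's code, and `expandScoped` is a hypothetical name. Wildcard suffixes such as `Storage/IntelVROC/*` are left out because the text does not say how they are resolved.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// expandScoped joins every discovered member path with every profile-provided
// suffix, drops duplicates, and sorts the result for deterministic plans.
func expandScoped(discovered, suffixes []string) []string {
	seen := make(map[string]bool)
	out := make([]string, 0, len(discovered)*len(suffixes))
	for _, base := range discovered {
		for _, suffix := range suffixes {
			p := strings.TrimRight(base, "/") + "/" + strings.TrimLeft(suffix, "/")
			if seen[p] {
				continue
			}
			seen[p] = true
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	systems := []string{"/redfish/v1/Systems/1"}
	for _, p := range expandScoped(systems, []string{"SimpleStorage", "Storage/IntelVROC"}) {
		fmt.Println(p)
	}
}
```

Because the suffix list comes from the profile, adding a storage branch for one platform changes that profile's plan only and leaves the shared baseline untouched.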
+
+## ADL-022 — Adaptive prefetch eligibility is profile-owned policy
+
+**Date:** 2026-03-18
+**Context:** The adaptive prefetch executor was still driven by hardcoded include/exclude path
+rules in `redfish.go`. That made GPU/storage/network prefetch shaping part of collector-core
+knowledge rather than profile-owned acquisition policy.
+**Decision:** Move prefetch eligibility rules into profile tuning. The collector core still runs
+adaptive prefetch, but profiles provide:
+- `IncludeSuffixes` for critical paths eligible for prefetch
+- `ExcludeContains` for path shapes that must never be prefetched
+**Consequences:**
+- Prefetch behavior is now visible in raw Redfish diagnostics and test fixtures.
+- Platform- or topology-specific prefetch shaping no longer requires editing collector-core
+  string lists.
+- Future prefetch tuning must be introduced through profiles and regression tests.
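
Review note: a minimal reading of the two rule lists in ADL-022 is sketched below. The field names `IncludeSuffixes` and `ExcludeContains` come from the text. The `PrefetchPolicy` type, the `Eligible` method, the example paths, and the choice to let exclusions win over inclusions are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// PrefetchPolicy is a hypothetical container for profile-provided rules.
type PrefetchPolicy struct {
	IncludeSuffixes []string // critical paths ending in one of these may be prefetched
	ExcludeContains []string // paths containing one of these are never prefetched
}

// Eligible reports whether a critical path may take part in adaptive prefetch.
// Exclusions are checked first so they always win over inclusions.
func (p PrefetchPolicy) Eligible(path string) bool {
	for _, fragment := range p.ExcludeContains {
		if strings.Contains(path, fragment) {
			return false
		}
	}
	for _, suffix := range p.IncludeSuffixes {
		if strings.HasSuffix(path, suffix) {
			return true
		}
	}
	return false
}

func main() {
	policy := PrefetchPolicy{
		IncludeSuffixes: []string{"/Processors", "/Storage"},
		ExcludeContains: []string{"/LogServices/"},
	}
	fmt.Println(policy.Eligible("/redfish/v1/Systems/1/Storage"))
	fmt.Println(policy.Eligible("/redfish/v1/Systems/1/LogServices/SEL/Storage"))
}
```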
+
+## ADL-023 — Core critical baseline is roots-only; critical shaping is profile-owned
+
+**Date:** 2026-03-18
+**Context:** `redfishCriticalEndpoints(...)` still encoded a broad set of system/chassis/manager
+critical branches directly in collector core. This mixed minimal crawl invariants with profile-
+specific acquisition shaping.
+**Decision:** Reduce collector-core critical baseline to vendor-neutral roots only:
+- `/redfish/v1`
+- discovered `Systems/*`
+- discovered `Chassis/*`
+- discovered `Managers/*`
+
+Profiles now own additional critical shaping through:
+- scoped critical suffix policy for discovered resources
+- explicit top-level `CriticalPaths`
+**Consequences:**
+- Critical inventory breadth is now explained by the acquisition plan, not hidden in collector
+  helper defaults.
+- Generic profile still provides the previous broad critical coverage, so behavior stays stable.
+- Future critical-path tuning must be implemented in profiles and regression-tested there.
+
+## ADL-024 — Live Redfish execution plans are resolved inside redfishprofile
+
+**Date:** 2026-03-18
+**Context:** Even after moving seeds, scoped paths, critical shaping, recovery, and prefetch
+policy into profiles, `redfish.go` still manually merged discovered resources with those policy
+fragments. That left acquisition-plan resolution logic in collector core.
+**Decision:** Introduce `redfishprofile.ResolveAcquisitionPlan(...)` as the boundary between
+profile planning and collector execution. `redfishprofile` now resolves:
+- baseline seeds
+- baseline critical roots
+- scoped path expansions
+- explicit profile seed/critical/plan-B paths
+
+The collector core consumes the resolved plan and executes it.
+**Consequences:**
+- Acquisition planning logic is now testable in `redfishprofile` without going through the live
+  collector.
+- `redfish.go` no longer owns path-resolution helpers for seeds/critical planning.
+- This creates a clean next step toward true per-profile acquisition hooks beyond static policy
+  fragments.
+
+## ADL-025 — Post-discovery acquisition refinement belongs to profile hooks
+
+**Date:** 2026-03-18
+**Context:** Some acquisition behavior depends not only on vendor/model hints, but on what the
+lightweight Redfish discovery actually returned. Static absolute path lists in profile plans are
+too rigid for such cases and reintroduce guessed platform knowledge.
+**Decision:** Add a post-discovery acquisition refinement hook to Redfish profiles. Profiles may
+mutate the resolved execution plan after discovered `Systems/*`, `Chassis/*`, and `Managers/*`
+are known.
+
+First concrete use:
+- MSI now derives GPU chassis seeds and `.../Sensors` critical/plan-B paths from discovered
+  `Chassis/GPU*` resources instead of hardcoded `GPU1..GPU4` absolute paths in the static plan.
+Additional use:
+- Supermicro now derives `UpdateService/Oem/Supermicro/FirmwareInventory` critical/plan-B paths
+  from resource hints instead of carrying that absolute path in the static plan.
+Additional use:
+- Dell now derives `Managers/iDRAC.Embedded.*` acquisition paths from discovered manager
+  resources instead of carrying `Managers/iDRAC.Embedded.1` as a static absolute path.
+**Consequences:**
+- Profile modules can react to actual discovery results without pushing conditional logic back
+  into `redfish.go`.
+- Diagnostics still show the final refined plan because the collector stores the refined plan,
+  not only the pre-refinement template.
+- Future vendor-specific discovery-dependent acquisition behavior should be implemented through
+  this hook rather than new collector-core branches.
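
Review note: the MSI case in ADL-025 can be pictured as a small hook over the discovered chassis list. The sketch below is illustrative only. `AcquisitionPlan`, its fields, and `refineGPUChassis` are hypothetical names, and the real hook signature in `redfishprofile/` is not shown in this diff.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// AcquisitionPlan is a hypothetical stand-in for the resolved execution plan.
type AcquisitionPlan struct {
	SeedPaths     []string
	CriticalPaths []string
	PlanBPaths    []string
}

// refineGPUChassis mutates the plan after discovery: every discovered chassis
// whose last path segment starts with "GPU" becomes a seed, and its Sensors
// child becomes a critical and plan-B target. No GPU1..GPU4 list is hardcoded.
func refineGPUChassis(plan *AcquisitionPlan, discoveredChassis []string) {
	for _, chassis := range discoveredChassis {
		if !strings.HasPrefix(path.Base(chassis), "GPU") {
			continue
		}
		sensors := chassis + "/Sensors"
		plan.SeedPaths = append(plan.SeedPaths, chassis)
		plan.CriticalPaths = append(plan.CriticalPaths, sensors)
		plan.PlanBPaths = append(plan.PlanBPaths, sensors)
	}
}

func main() {
	plan := &AcquisitionPlan{SeedPaths: []string{"/redfish/v1/Chassis/1"}}
	refineGPUChassis(plan, []string{
		"/redfish/v1/Chassis/1",
		"/redfish/v1/Chassis/GPU1",
		"/redfish/v1/Chassis/GPU3",
	})
	fmt.Println(plan.SeedPaths)
	fmt.Println(plan.CriticalPaths)
}
```

A board that exposes only `GPU1` and `GPU3` gets exactly those two entries, which is the behavior the static `GPU1..GPU4` list could not express.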
+
+## ADL-026 — Replay analysis uses a resolved profile plan, not ad-hoc directives only
+
+**Date:** 2026-03-18
+**Context:** Replay still relied on a flat `AnalysisDirectives` struct assembled centrally,
+while vendor-specific conditions often depended on the actual snapshot shape. That made analysis
+behavior harder to explain and kept too much vendor logic in generic replay collectors.
+**Decision:** Introduce `redfishprofile.ResolveAnalysisPlan(...)` for replay. The resolved
+analysis plan contains:
+- active match result
+- resolved analysis directives
+- analysis notes explaining snapshot-aware hook activation
+
+Profiles may refine this plan using the snapshot and discovered resources before replay collectors
+run.
+
+First concrete uses:
+- MSI enables processor-GPU fallback and MSI chassis lookup only when the snapshot actually
+  contains GPU processors and `Chassis/GPU*`
+- HGX enables processor-GPU alias fallback from actual HGX/GPU_SXM topology signals in the snapshot
+- Supermicro enables NVMe backplane and known-controller recovery from actual snapshot paths
+**Consequences:**
+- Replay behavior is now closer to the acquisition architecture: a resolved profile plan feeds the
+  executor.
+- `redfish_analysis_plan` is stored in raw payload metadata for offline debugging.
+- Future analysis-side vendor logic should move into profile refinement hooks instead of growing the
+  central directive builder.
+
+## ADL-027 — Replay GPU/storage executors consume resolved analysis plans
+
+**Date:** 2026-03-18
+**Context:** Even after introducing `ResolveAnalysisPlan(...)`, replay GPU/storage collectors still
+accepted a raw `AnalysisDirectives` struct. That preserved an implicit shortcut from the old design
+and weakened the plan/executor boundary.
+**Decision:** Replay GPU/storage executors now accept `redfishprofile.ResolvedAnalysisPlan`
+directly. The executor reads resolved directives from the plan instead of being passed a standalone
+directive bundle.
+**Consequences:**
+- GPU and storage replay execution now follows the same architectural pattern as acquisition:
+  resolve plan first, execute second.
+- Future profile-owned execution helpers can use plan notes or additional resolved fields without
+  changing the executor API again.
+- Remaining replay areas should migrate the same way instead of continuing to accept raw directive
+  structs.
+
 ## ADL-019 — isDeviceBoundFirmwareName must cover vendor-specific naming patterns per vendor

 **Date:** 2026-03-12
@@ -604,3 +786,39 @@ presentation drift and duplicated UI logic.
 - The host UI becomes a service shell around the viewer instead of maintaining its own
   field-by-field tabs.
 - `internal/chart` must be updated explicitly as a git submodule when the viewer changes.
+
+---
+
+## ADL-031 — Redfish uses profile-driven acquisition and unified ingest entrypoints
+
+**Date:** 2026-03-17
+**Context:**
+Redfish collection had accumulated platform-specific probing in the shared collector path, while
+upload and raw-export replay still entered analysis through direct handler branches. This made
+vendor/model tuning harder to contain and increased regression risk when one topology needed a
+special acquisition strategy.
+
+**Decision:**
+- Introduce `internal/ingest.Service` as the internal source-family entrypoint for archive parsing
+  and Redfish raw replay.
+- Introduce `internal/collector/redfishprofile/` for Redfish profile matching and modular hooks.
+- Split Redfish behavior into coordinated phases:
+  - acquisition planning during live collection
+  - analysis hooks during snapshot replay
+- Use score-based profile matching. If confidence is low, enter fallback acquisition mode and
+  aggregate only safe additive profile probes.
+- Allow profile modules to provide bounded acquisition tuning hints such as crawl cap, prefetch
+  behavior, and expensive post-probe toggles.
+- Allow profile modules to own model-specific `CriticalPaths` and bounded `PlanBPaths` so vendor
+  recovery targets stop leaking into the collector core.
+- Expose Redfish profile matching as structured diagnostics during live collection: logs must
+  contain all module scores, and collect job status must expose active modules for the UI.
+
+**Consequences:**
+- Server handlers stop owning parser-vs-replay branching details directly.
+- Vendor/model-specific Redfish logic gets an explicit module boundary.
+- Unknown-vendor Redfish collection becomes slower but more complete by design.
+- Tactical Redfish fixes should move into profile modules instead of widening generic replay logic.
+- Repo-owned compact fixtures under `internal/collector/redfishprofile/testdata/`, derived from
+  representative raw-export snapshots, are used to lock profile matching and acquisition tuning
+  for known MSI and Supermicro-family shapes.