Implement the full architectural plan: unified ingest.Service entry point for archive and Redfish payloads, modular redfishprofile package with composable profiles (generic, ami-family, msi, supermicro, dell, hgx-topology), score-based profile matching with fallback expansion mode, and profile-driven acquisition/analysis plans. Vendor-specific logic moved out of common executors and into profile hooks. GPU chassis lookup strategies and known storage recovery collections (IntelVROC/HA-RAID/MRVL) now live in ResolvedAnalysisPlan, populated by profiles at analysis time. Replay helpers read from the plan; no hardcoded path lists remain in generic code. Also splits redfish_replay.go into domain modules (gpu, storage, inventory, fru, profiles) and adds full fixture/matcher/directive test coverage including Dell, AMI, unknown-vendor fallback, and deterministic ordering. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
155 lines
7.7 KiB
Markdown
155 lines
7.7 KiB
Markdown
# 05 — Collectors
|
|
|
|
Collectors live in `internal/collector/`.
|
|
|
|
Core files:
|
|
- `registry.go` for protocol registration
|
|
- `redfish.go` for live collection
|
|
- `redfish_replay.go` for replay from raw payloads
|
|
- `redfish_replay_gpu.go` for profile-driven GPU replay collectors and GPU fallback helpers
|
|
- `redfish_replay_storage.go` for profile-driven storage replay collectors and storage recovery helpers
|
|
- `redfish_replay_inventory.go` for replay inventory collectors (PCIe, NIC, BMC MAC, NIC enrichment)
|
|
- `redfish_replay_fru.go` for board fallback helpers and Assembly/FRU replay extraction
|
|
- `redfish_replay_profiles.go` for profile-driven replay helpers and vendor-aware recovery helpers
|
|
- `redfishprofile/` for Redfish profile matching and acquisition/analysis hooks
|
|
- `ipmi_mock.go` for the placeholder IPMI implementation
|
|
- `types.go` for request/progress contracts
|
|
|
|
## Redfish collector
|
|
|
|
Status: active production path.
|
|
|
|
Request fields passed from the server:
|
|
- `host`
|
|
- `port`
|
|
- `username`
|
|
- `auth_type`
|
|
- credential field (`password` or token)
|
|
- `tls_mode`
|
|
- optional `power_on_if_host_off`
|
|
|
|
### Core rule
|
|
|
|
Live collection and replay must stay behaviorally aligned.
|
|
If the collector adds a fallback, probe, or normalization rule, replay must mirror it.
|
|
|
|
### Preflight and host power
|
|
|
|
- `Probe()` may be used before collection to verify API connectivity and current host `PowerState`
|
|
- if the host is off and the user chose power-on, the collector may issue `ComputerSystem.Reset`
|
|
with `ResetType=On`
|
|
- power-on attempts are bounded and logged
|
|
- after a successful power-on, the collector waits an extra stabilization window, then checks
|
|
`PowerState` again and only starts collection if the host is still on
|
|
- if the collector powered on the host itself for collection, it must attempt to power it back off
|
|
after collection completes
|
|
- if the host was already on before collection, the collector must not power it off afterward
|
|
- if power-on fails, collection still continues against the powered-off host
|
|
- all power-control decisions and attempts must be visible in the collection log so they are
|
|
preserved in raw-export bundles
|
|
|
|
### Discovery model
|
|
|
|
The collector does not rely on one fixed vendor tree.
|
|
It discovers and follows Redfish resources dynamically from root collections such as:
|
|
- `Systems`
|
|
- `Chassis`
|
|
- `Managers`
|
|
|
|
After minimal discovery the collector builds `MatchSignals` and selects a Redfish profile mode:
|
|
- `matched` when one or more profiles score with high confidence
|
|
- `fallback` when vendor/platform confidence is low; in this mode the collector aggregates safe additive profile probes to maximize snapshot completeness
|
|
|
|
Profile modules may contribute:
|
|
- primary acquisition seeds
|
|
- bounded `PlanBPaths` for secondary recovery
|
|
- critical paths
|
|
- acquisition notes/diagnostics
|
|
- tuning hints such as snapshot document cap, prefetch behavior, and expensive post-probe toggles
|
|
- post-probe policy for numeric collection recovery, direct NVMe `Disk.Bay` recovery, and sensor post-probe enablement
|
|
- recovery policy for critical collection member retry, slow numeric plan-B probing, and profile-specific plan-B activation
|
|
- scoped path policy for discovered `Systems/*`, `Chassis/*`, and `Managers/*` branches when a profile needs extra seeds/critical targets beyond the vendor-neutral core set
|
|
- prefetch policy for which critical paths are eligible for adaptive prefetch and which path shapes are explicitly excluded
|
|
|
|
Model- or topology-specific `CriticalPaths` and profile `PlanBPaths` must live in the profile
|
|
module that owns the behavior. The collector core may execute those paths, but it should not
|
|
hardcode vendor-specific recovery targets.
|
|
The same rule applies to expensive post-probe decisions: the collector core may execute bounded
|
|
post-probe loops, but profiles own whether those loops are enabled for a given platform shape.
|
|
The same rule applies to critical recovery passes: the collector core may run bounded plan-B
|
|
loops, but profiles own whether member retry, slow numeric recovery, and profile-specific plan-B
|
|
passes are enabled.
|
|
When a profile needs extra discovered-path branches such as storage controller subtrees, it must
|
|
provide them as scoped suffix policy rather than by hardcoding platform-shaped suffixes into the
|
|
collector core baseline seed list.
|
|
The same applies to prefetch shaping: the collector core may execute adaptive prefetch, but
|
|
profiles own the include/exclude rules for which critical paths should participate.
|
|
The same applies to critical inventory shaping: the collector core should keep only a minimal
|
|
vendor-neutral critical baseline, while profiles own additional system/chassis/manager critical
|
|
suffixes and top-level critical targets.
|
|
Resolved live acquisition plans should be built inside `redfishprofile/`, not by hand in
|
|
`redfish.go`. The collector core should receive discovered resources plus the selected profile
|
|
plan and then execute the resolved seed/critical paths.
|
|
When profile behavior depends on what discovery actually returned, use a post-discovery
|
|
refinement hook in `redfishprofile/` instead of hardcoding guessed absolute paths in the static
|
|
plan. MSI GPU chassis refinement is the reference example.
|
|
|
|
Live Redfish collection must expose profile-match diagnostics:
|
|
- collector logs must include the selected modules and score for every known module
|
|
- job status responses must carry structured `active_modules` and `module_scores`
|
|
- the collect page should render active modules as chips from structured status data, not by
|
|
parsing log lines
|
|
|
|
On replay, profile-derived analysis directives may enable vendor-specific inventory linking
|
|
helpers such as processor-GPU fallback, chassis-ID alias resolution, and bounded storage recovery.
|
|
Replay should now resolve a structured analysis plan inside `redfishprofile/`, analogous to the
|
|
live acquisition plan. The replay core may execute collectors against the resolved directives, but
|
|
snapshot-aware vendor decisions should live in profile analysis hooks, not in `redfish_replay.go`.
|
|
GPU and storage replay executors should consume the resolved analysis plan directly, not a raw
|
|
`AnalysisDirectives` struct, so the boundary between planning and execution stays explicit.
|
|
|
|
Profile matching and acquisition tuning must be regression-tested against repo-owned compact
|
|
fixtures under `internal/collector/redfishprofile/testdata/`, derived from representative
|
|
raw-export snapshots, for at least MSI and Supermicro shapes.
|
|
When multiple raw-export snapshots exist for the same platform, profile selection must remain
|
|
stable across those sibling fixtures unless the topology actually changes.
|
|
Analysis-plan metadata should be stored in replay raw payloads so vendor hook activation is
|
|
debuggable offline.
|
|
|
|
### Stored raw data
|
|
|
|
Important raw payloads:
|
|
- `raw_payloads.redfish_tree`
|
|
- `raw_payloads.redfish_fetch_errors`
|
|
- `raw_payloads.redfish_profiles`
|
|
- `raw_payloads.source_timezone` when available
|
|
|
|
### Snapshot crawler rules
|
|
|
|
- bounded by `LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`
|
|
- prioritized toward high-value inventory paths
|
|
- tolerant of expected vendor-specific failures
|
|
- normalizes `@odata.id` values before queueing
|
|
|
|
### Redfish implementation guidance
|
|
|
|
When changing collection logic:
|
|
|
|
1. Prefer profile modules over ad-hoc vendor branches in the collector core
|
|
2. Keep expensive probing bounded
|
|
3. Deduplicate by serial, then BDF, then location/model fallbacks
|
|
4. Preserve replay determinism from saved raw payloads
|
|
5. Add tests for both the motivating topology and a negative case
|
|
|
|
### Known vendor fallbacks
|
|
|
|
- empty standard drive collections may trigger bounded `Disk.Bay` probing
|
|
- `Storage.Links.Enclosures[*]` may be followed to recover physical drives
|
|
- `PowerSubsystem/PowerSupplies` is preferred over legacy `Power` when available
|
|
|
|
## IPMI collector
|
|
|
|
Status: mock scaffold only.
|
|
|
|
It remains registered for protocol completeness, but it is not a real collection path.
|