05 — Collectors
Collectors live in internal/collector/.
Core files:
- registry.go for protocol registration
- redfish.go for live collection
- redfish_replay.go for replay from raw payloads
- redfish_replay_gpu.go for profile-driven GPU replay collectors and GPU fallback helpers
- redfish_replay_storage.go for profile-driven storage replay collectors and storage recovery helpers
- redfish_replay_inventory.go for replay inventory collectors (PCIe, NIC, BMC MAC, NIC enrichment)
- redfish_replay_fru.go for board fallback helpers and Assembly/FRU replay extraction
- redfish_replay_profiles.go for profile-driven replay helpers and vendor-aware recovery helpers
- redfishprofile/ for Redfish profile matching and acquisition/analysis hooks
- ipmi_mock.go for the placeholder IPMI implementation
- types.go for request/progress contracts
Redfish collector
Status: active production path.
Request fields passed from the server:
- host
- port
- username
- auth_type
- credential field (password or token)
- tls_mode
- optional power_on_if_host_off
- optional debug_payloads for extended diagnostics
Core rule
Live collection and replay must stay behaviorally aligned. If the collector adds a fallback, probe, or normalization rule, replay must mirror it.
Preflight and host power
- Probe() is used before collection to verify API connectivity and report the current host PowerState
- if the host is off, the collector logs a warning and proceeds with collection; inventory data may be incomplete when the host is powered off
- power-on and power-off are not performed by the collector
Skip hung requests
Redfish collection uses a two-level context model:
- ctx: job lifetime context, cancelled only on explicit job cancel
- collectCtx: collection phase context, derived from ctx; covers snapshot, prefetch, and plan-B
collectCtx is cancelled when the user presses "Пропустить зависшие" (skip hung).
On skip, all in-flight HTTP requests in the current phase are aborted immediately via context
cancellation, the crawler and plan-B loops exit, and execution proceeds to the replay phase using
whatever was collected in rawTree. The result is partial but valid.
The skip signal travels: UI button → POST /api/collect/{id}/skip → JobManager.SkipJob() →
closes skipCh → goroutine in Collect() → cancelCollect().
The skip button is visible during running state and hidden once the job reaches a terminal state.
Extended diagnostics toggle
The live collect form exposes a user-facing checkbox for extended diagnostics.
- default collection prioritizes inventory completeness and bounded runtime
- when extended diagnostics is off, heavy HGX component-chassis critical plan-B retries (Assembly, Accelerators, Drives, NetworkAdapters, PCIeDevices) are skipped
- when extended diagnostics is on, those retries are allowed and extra debug payloads are collected
This toggle is intended for operator-driven deep diagnostics on problematic hosts, not for the default path.
Discovery model
The collector does not rely on one fixed vendor tree. It discovers and follows Redfish resources dynamically from root collections such as:
- Systems
- Chassis
- Managers
After minimal discovery the collector builds MatchSignals and selects a Redfish profile mode:
- matched when one or more profiles score with high confidence
- fallback when vendor/platform confidence is low; in this mode the collector aggregates safe additive profile probes to maximize snapshot completeness
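
As a rough sketch of the mode decision, the selection could look like the code below. The threshold value, the integer score type, the module names, and the choice to activate every module in fallback mode are all assumptions made for illustration; the actual scoring lives in redfishprofile/ and is not described here.

```go
package main

import (
	"fmt"
	"sort"
)

// matchThreshold is a hypothetical cutoff; the real confidence rule is
// not stated in this document.
const matchThreshold = 60

// selectMode returns "matched" with the high-confidence modules, or
// "fallback" with every module so their additive probes can be aggregated.
func selectMode(scores map[string]int) (mode string, active []string) {
	for name, s := range scores {
		if s >= matchThreshold {
			active = append(active, name)
		}
	}
	if len(active) > 0 {
		sort.Strings(active)
		return "matched", active
	}
	for name := range scores {
		active = append(active, name)
	}
	sort.Strings(active)
	return "fallback", active
}

func main() {
	fmt.Println(selectMode(map[string]int{"msi": 85, "supermicro": 10}))
	fmt.Println(selectMode(map[string]int{"msi": 20, "supermicro": 10}))
}
```

Sorting the active module list keeps log output and status responses stable across runs, since Go map iteration order is randomized.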
Profile modules may contribute:
- primary acquisition seeds
- bounded PlanBPaths for secondary recovery
- critical paths
- acquisition notes/diagnostics
- tuning hints such as snapshot document cap, prefetch behavior, and expensive post-probe toggles
- post-probe policy for numeric collection recovery, direct NVMe Disk.Bay recovery, and sensor post-probe enablement
- recovery policy for critical collection member retry, slow numeric plan-B probing, and profile-specific plan-B activation
- scoped path policy for discovered Systems/*, Chassis/*, and Managers/* branches when a profile needs extra seeds/critical targets beyond the vendor-neutral core set
- prefetch policy for which critical paths are eligible for adaptive prefetch and which path shapes are explicitly excluded
Model- or topology-specific CriticalPaths and profile PlanBPaths must live in the profile
module that owns the behavior. The collector core may execute those paths, but it should not
hardcode vendor-specific recovery targets.
The same rule applies to expensive post-probe decisions: the collector core may execute bounded
post-probe loops, but profiles own whether those loops are enabled for a given platform shape.
The same rule applies to critical recovery passes: the collector core may run bounded plan-B
loops, but profiles own whether member retry, slow numeric recovery, and profile-specific plan-B
passes are enabled.
When a profile needs extra discovered-path branches such as storage controller subtrees, it must
provide them as scoped suffix policy rather than by hardcoding platform-shaped suffixes into the
collector core baseline seed list.
The same applies to prefetch shaping: the collector core may execute adaptive prefetch, but
profiles own the include/exclude rules for which critical paths should participate.
The same applies to critical inventory shaping: the collector core should keep only a minimal
vendor-neutral critical baseline, while profiles own additional system/chassis/manager critical
suffixes and top-level critical targets.
Resolved live acquisition plans should be built inside redfishprofile/, not by hand in
redfish.go. The collector core should receive discovered resources plus the selected profile
plan and then execute the resolved seed/critical paths.
When profile behavior depends on what discovery actually returned, use a post-discovery
refinement hook in redfishprofile/ instead of hardcoding guessed absolute paths in the static
plan. MSI GPU chassis refinement is the reference example.
Live Redfish collection must expose profile-match diagnostics:
- collector logs must include the selected modules and score for every known module
- job status responses must carry structured active_modules and module_scores
- the collect page should render active modules as chips from structured status data, not by parsing log lines
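
A status payload carrying these diagnostics might be shaped like the sketch below. Only the active_modules and module_scores field names come from the requirement above; the JobStatus type, the state field, and the integer score type are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JobStatus is a hypothetical response shape used only for illustration.
type JobStatus struct {
	State         string         `json:"state"`
	ActiveModules []string       `json:"active_modules"`
	ModuleScores  map[string]int `json:"module_scores"`
}

// encodeStatus serializes the status; encoding/json emits map keys in
// sorted order, so the output is deterministic.
func encodeStatus(s JobStatus) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

func main() {
	out, err := encodeStatus(JobStatus{
		State:         "running",
		ActiveModules: []string{"msi"},
		ModuleScores:  map[string]int{"msi": 85, "supermicro": 10},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

With structured fields like these, the collect page can render chips directly from active_modules instead of scraping log text.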
Profile matching may use stable platform grammar signals in addition to vendor strings:
- discovered member/resource naming from lightweight discovery collections
- firmware inventory member IDs
- OEM action names and linked target paths embedded in discovery documents
- replay-only snapshot hints such as OEM assembly/type markers when they are present in
raw_payloads.redfish_tree
On replay, profile-derived analysis directives may enable vendor-specific inventory linking
helpers such as processor-GPU fallback, chassis-ID alias resolution, and bounded storage recovery.
Replay should resolve a structured analysis plan inside redfishprofile/, analogous to the
live acquisition plan. The replay core may execute collectors against the resolved directives, but
snapshot-aware vendor decisions should live in profile analysis hooks, not in redfish_replay.go.
GPU and storage replay executors should consume the resolved analysis plan directly, not a raw
AnalysisDirectives struct, so the boundary between planning and execution stays explicit.
Profile matching and acquisition tuning must be regression-tested against repo-owned compact
fixtures under internal/collector/redfishprofile/testdata/, derived from representative
raw-export snapshots, for at least MSI and Supermicro shapes.
When multiple raw-export snapshots exist for the same platform, profile selection must remain
stable across those sibling fixtures unless the topology actually changes.
Analysis-plan metadata should be stored in replay raw payloads so vendor hook activation is
debuggable offline.
Stored raw data
Important raw payloads:
- raw_payloads.redfish_tree
- raw_payloads.redfish_fetch_errors
- raw_payloads.redfish_profiles
- raw_payloads.source_timezone when available
Snapshot crawler rules
- bounded by LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS
- prioritized toward high-value inventory paths
- tolerant of expected vendor-specific failures
- normalizes @odata.id values before queueing
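
Two of these rules, the document cap and @odata.id normalization, can be illustrated with the sketch below. The normalization steps (trim whitespace, drop any fragment, drop a trailing slash) are an assumption about what normalization means here, and normalizeODataID and crawl are hypothetical names. Prioritization and failure tolerance are left out for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeODataID is a hypothetical normalizer: it trims whitespace, drops
// any #fragment, and removes one trailing slash so equivalent links dedupe.
func normalizeODataID(id string) string {
	id = strings.TrimSpace(id)
	if i := strings.IndexByte(id, '#'); i >= 0 {
		id = id[:i]
	}
	if len(id) > 1 {
		id = strings.TrimSuffix(id, "/")
	}
	return id
}

// crawl visits paths breadth-first and stops once maxDocs documents have
// been visited. links maps a path to the @odata.id values in that document.
func crawl(seeds []string, links map[string][]string, maxDocs int) []string {
	seen := map[string]bool{}
	var queue, visited []string

	enqueue := func(raw string) {
		p := normalizeODataID(raw)
		if p == "" || seen[p] {
			return
		}
		seen[p] = true
		queue = append(queue, p)
	}

	for _, s := range seeds {
		enqueue(s)
	}
	for len(queue) > 0 && len(visited) < maxDocs {
		p := queue[0]
		queue = queue[1:]
		visited = append(visited, p)
		for _, l := range links[p] {
			enqueue(l)
		}
	}
	return visited
}

func main() {
	links := map[string][]string{
		"/redfish/v1":         {"/redfish/v1/Systems/", "/redfish/v1/Chassis", "/redfish/v1/Systems#frag"},
		"/redfish/v1/Systems": {"/redfish/v1/Systems/1"},
	}
	fmt.Println(crawl([]string{"/redfish/v1/"}, links, 3))
}
```

Normalizing before the seen check is what prevents the same resource from consuming the document budget twice under slightly different spellings.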
Redfish implementation guidance
When changing collection logic:
- Prefer profile modules over ad-hoc vendor branches in the collector core
- Keep expensive probing bounded
- Deduplicate by serial, then BDF, then location/model fallbacks
- Preserve replay determinism from saved raw payloads
- Add tests for both the motivating topology and a negative case
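
The deduplication order above (serial, then BDF, then location/model) could be expressed as a key function, sketched here under assumptions: the Device type and its fields are hypothetical, and the case folding is an illustrative choice rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// Device is a hypothetical inventory record.
type Device struct {
	Serial, BDF, Location, Model string
}

// dedupKey picks the strongest identity available: serial first, then
// PCI BDF, then a location/model pair as the last resort.
func dedupKey(d Device) string {
	if s := strings.TrimSpace(d.Serial); s != "" {
		return "sn:" + strings.ToUpper(s)
	}
	if b := strings.TrimSpace(d.BDF); b != "" {
		return "bdf:" + strings.ToLower(b)
	}
	return "loc:" + strings.TrimSpace(d.Location) + "|" + strings.TrimSpace(d.Model)
}

// dedupe keeps the first device seen for each key, preserving input order.
func dedupe(in []Device) []Device {
	seen := map[string]bool{}
	var out []Device
	for _, d := range in {
		k := dedupKey(d)
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, d)
	}
	return out
}

func main() {
	devices := []Device{
		{Serial: "abc123", BDF: "0000:01:00.0"},
		{Serial: "ABC123"},
		{BDF: "0000:02:00.0"},
		{Location: "Slot1", Model: "X"},
	}
	fmt.Println(len(dedupe(devices)))
}
```

This simplified version does not merge a record that has only a BDF with another record for the same device that has only a serial; handling that case would need a cross-key merge step.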
Known vendor fallbacks
- empty standard drive collections may trigger bounded Disk.Bay probing
- Storage.Links.Enclosures[*] may be followed to recover physical drives
- PowerSubsystem/PowerSupplies is preferred over legacy Power when available
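
The power-supply preference can be sketched as a simple ordered lookup. pickPowerPath and the present set are hypothetical; the sketch assumes the caller already knows which paths exist in the fetched tree.

```go
package main

import "fmt"

// pickPowerPath returns the preferred power resource under a chassis path:
// PowerSubsystem/PowerSupplies when present, otherwise the legacy Power
// resource. present is a hypothetical set of paths found in the tree.
func pickPowerPath(chassis string, present map[string]bool) (string, bool) {
	modern := chassis + "/PowerSubsystem/PowerSupplies"
	if present[modern] {
		return modern, true
	}
	legacy := chassis + "/Power"
	if present[legacy] {
		return legacy, true
	}
	return "", false
}

func main() {
	present := map[string]bool{
		"/redfish/v1/Chassis/1/Power":                        true,
		"/redfish/v1/Chassis/1/PowerSubsystem/PowerSupplies": true,
		"/redfish/v1/Chassis/2/Power":                        true,
	}
	fmt.Println(pickPowerPath("/redfish/v1/Chassis/1", present))
	fmt.Println(pickPowerPath("/redfish/v1/Chassis/2", present))
}
```

Per the core rule, the same preference order would have to be applied on replay so that live and replayed results agree.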
IPMI collector
Status: mock scaffold only.
It remains registered for protocol completeness, but it is not a real collection path.