Remove power-on and power-off functionality from the Redfish collector;
keep host power-state detection and show a warning in the UI when the
host is powered off before collection starts.
Add a "Пропустить зависшие" (skip hung) button that lets the user abort
stuck Redfish collection phases without losing already-collected data.
Introduces a two-level context model in Collect(): the outer job context
covers the full lifecycle including replay; an inner collectCtx covers
snapshot, prefetch, and plan-B phases only. Closing the skipCh cancels
collectCtx immediately — aborts all in-flight HTTP requests and exits
plan-B loops — then replay runs on whatever rawTree was collected.
Signal path: UI → POST /api/collect/{id}/skip → JobManager.SkipJob()
→ close(skipCh) → goroutine in Collect() → cancelCollect().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
171 lines
8.5 KiB
Markdown
171 lines
8.5 KiB
Markdown
# 05 — Collectors
|
|
|
|
Collectors live in `internal/collector/`.
|
|
|
|
Core files:
|
|
- `registry.go` for protocol registration
|
|
- `redfish.go` for live collection
|
|
- `redfish_replay.go` for replay from raw payloads
|
|
- `redfish_replay_gpu.go` for profile-driven GPU replay collectors and GPU fallback helpers
|
|
- `redfish_replay_storage.go` for profile-driven storage replay collectors and storage recovery helpers
|
|
- `redfish_replay_inventory.go` for replay inventory collectors (PCIe, NIC, BMC MAC, NIC enrichment)
|
|
- `redfish_replay_fru.go` for board fallback helpers and Assembly/FRU replay extraction
|
|
- `redfish_replay_profiles.go` for profile-driven replay helpers and vendor-aware recovery helpers
|
|
- `redfishprofile/` for Redfish profile matching and acquisition/analysis hooks
|
|
- `ipmi_mock.go` for the placeholder IPMI implementation
|
|
- `types.go` for request/progress contracts
|
|
|
|
## Redfish collector
|
|
|
|
Status: active production path.
|
|
|
|
Request fields passed from the server:
|
|
- `host`
|
|
- `port`
|
|
- `username`
|
|
- `auth_type`
|
|
- credential field (`password` or token)
|
|
- `tls_mode`
|
|
- optional `power_on_if_host_off`
|
|
|
|
### Core rule
|
|
|
|
Live collection and replay must stay behaviorally aligned.
|
|
If the collector adds a fallback, probe, or normalization rule, replay must mirror it.
|
|
|
|
### Preflight and host power
|
|
|
|
- `Probe()` is used before collection to verify API connectivity and report current host `PowerState`
|
|
- if the host is off, the collector logs a warning and proceeds with collection; inventory data may
|
|
be incomplete when the host is powered off
|
|
- power-on and power-off are not performed by the collector
|
|
|
|
### Skip hung requests
|
|
|
|
Redfish collection uses a two-level context model:
|
|
|
|
- `ctx` — job lifetime context, cancelled only on explicit job cancel
|
|
- `collectCtx` — collection phase context, derived from `ctx`; covers snapshot, prefetch, and plan-B
|
|
|
|
`collectCtx` is cancelled when the user presses "Пропустить зависшие" (skip hung).
|
|
On skip, all in-flight HTTP requests in the current phase are aborted immediately via context
|
|
cancellation, the crawler and plan-B loops exit, and execution proceeds to the replay phase using
|
|
whatever was collected in `rawTree`. The result is partial but valid.
|
|
|
|
The skip signal travels: UI button → `POST /api/collect/{id}/skip` → `JobManager.SkipJob()` →
|
|
closes `skipCh` → goroutine in `Collect()` → `cancelCollect()`.
|
|
|
|
The skip button is visible during `running` state and hidden once the job reaches a terminal state.
|
|
|
|
### Discovery model
|
|
|
|
The collector does not rely on one fixed vendor tree.
|
|
It discovers and follows Redfish resources dynamically from root collections such as:
|
|
- `Systems`
|
|
- `Chassis`
|
|
- `Managers`
|
|
|
|
After minimal discovery the collector builds `MatchSignals` and selects a Redfish profile mode:
|
|
- `matched` when one or more profiles score with high confidence
|
|
- `fallback` when vendor/platform confidence is low; in this mode the collector aggregates safe additive profile probes to maximize snapshot completeness
|
|
|
|
Profile modules may contribute:
|
|
- primary acquisition seeds
|
|
- bounded `PlanBPaths` for secondary recovery
|
|
- critical paths
|
|
- acquisition notes/diagnostics
|
|
- tuning hints such as snapshot document cap, prefetch behavior, and expensive post-probe toggles
|
|
- post-probe policy for numeric collection recovery, direct NVMe `Disk.Bay` recovery, and sensor post-probe enablement
|
|
- recovery policy for critical collection member retry, slow numeric plan-B probing, and profile-specific plan-B activation
|
|
- scoped path policy for discovered `Systems/*`, `Chassis/*`, and `Managers/*` branches when a profile needs extra seeds/critical targets beyond the vendor-neutral core set
|
|
- prefetch policy for which critical paths are eligible for adaptive prefetch and which path shapes are explicitly excluded
|
|
|
|
Model- or topology-specific `CriticalPaths` and profile `PlanBPaths` must live in the profile
|
|
module that owns the behavior. The collector core may execute those paths, but it should not
|
|
hardcode vendor-specific recovery targets.
|
|
The same rule applies to expensive post-probe decisions: the collector core may execute bounded
|
|
post-probe loops, but profiles own whether those loops are enabled for a given platform shape.
|
|
The same rule applies to critical recovery passes: the collector core may run bounded plan-B
|
|
loops, but profiles own whether member retry, slow numeric recovery, and profile-specific plan-B
|
|
passes are enabled.
|
|
When a profile needs extra discovered-path branches such as storage controller subtrees, it must
|
|
provide them as scoped suffix policy rather than by hardcoding platform-shaped suffixes into the
|
|
collector core baseline seed list.
|
|
The same applies to prefetch shaping: the collector core may execute adaptive prefetch, but
|
|
profiles own the include/exclude rules for which critical paths should participate.
|
|
The same applies to critical inventory shaping: the collector core should keep only a minimal
|
|
vendor-neutral critical baseline, while profiles own additional system/chassis/manager critical
|
|
suffixes and top-level critical targets.
|
|
Resolved live acquisition plans should be built inside `redfishprofile/`, not by hand in
|
|
`redfish.go`. The collector core should receive discovered resources plus the selected profile
|
|
plan and then execute the resolved seed/critical paths.
|
|
When profile behavior depends on what discovery actually returned, use a post-discovery
|
|
refinement hook in `redfishprofile/` instead of hardcoding guessed absolute paths in the static
|
|
plan. MSI GPU chassis refinement is the reference example.
|
|
|
|
Live Redfish collection must expose profile-match diagnostics:
|
|
- collector logs must include the selected modules and score for every known module
|
|
- job status responses must carry structured `active_modules` and `module_scores`
|
|
- the collect page should render active modules as chips from structured status data, not by
|
|
parsing log lines
|
|
|
|
Profile matching may use stable platform grammar signals in addition to vendor strings:
|
|
- discovered member/resource naming from lightweight discovery collections
|
|
- firmware inventory member IDs
|
|
- OEM action names and linked target paths embedded in discovery documents
|
|
- replay-only snapshot hints such as OEM assembly/type markers when they are present in
|
|
`raw_payloads.redfish_tree`
|
|
|
|
On replay, profile-derived analysis directives may enable vendor-specific inventory linking
|
|
helpers such as processor-GPU fallback, chassis-ID alias resolution, and bounded storage recovery.
|
|
Replay should now resolve a structured analysis plan inside `redfishprofile/`, analogous to the
|
|
live acquisition plan. The replay core may execute collectors against the resolved directives, but
|
|
snapshot-aware vendor decisions should live in profile analysis hooks, not in `redfish_replay.go`.
|
|
GPU and storage replay executors should consume the resolved analysis plan directly, not a raw
|
|
`AnalysisDirectives` struct, so the boundary between planning and execution stays explicit.
|
|
|
|
Profile matching and acquisition tuning must be regression-tested against repo-owned compact
|
|
fixtures under `internal/collector/redfishprofile/testdata/`, derived from representative
|
|
raw-export snapshots, for at least MSI and Supermicro shapes.
|
|
When multiple raw-export snapshots exist for the same platform, profile selection must remain
|
|
stable across those sibling fixtures unless the topology actually changes.
|
|
Analysis-plan metadata should be stored in replay raw payloads so vendor hook activation is
|
|
debuggable offline.
|
|
|
|
### Stored raw data
|
|
|
|
Important raw payloads:
|
|
- `raw_payloads.redfish_tree`
|
|
- `raw_payloads.redfish_fetch_errors`
|
|
- `raw_payloads.redfish_profiles`
|
|
- `raw_payloads.source_timezone` when available
|
|
|
|
### Snapshot crawler rules
|
|
|
|
- bounded by `LOGPILE_REDFISH_SNAPSHOT_MAX_DOCS`
|
|
- prioritized toward high-value inventory paths
|
|
- tolerant of expected vendor-specific failures
|
|
- normalizes `@odata.id` values before queueing
|
|
|
|
### Redfish implementation guidance
|
|
|
|
When changing collection logic:
|
|
|
|
1. Prefer profile modules over ad-hoc vendor branches in the collector core
|
|
2. Keep expensive probing bounded
|
|
3. Deduplicate by serial, then BDF, then location/model fallbacks
|
|
4. Preserve replay determinism from saved raw payloads
|
|
5. Add tests for both the motivating topology and a negative case
|
|
|
|
### Known vendor fallbacks
|
|
|
|
- empty standard drive collections may trigger bounded `Disk.Bay` probing
|
|
- `Storage.Links.Enclosures[*]` may be followed to recover physical drives
|
|
- `PowerSubsystem/PowerSupplies` is preferred over legacy `Power` when available
|
|
|
|
## IPMI collector
|
|
|
|
Status: mock scaffold only.
|
|
|
|
It remains registered for protocol completeness, but it is not a real collection path.
|