On Lenovo ThinkSystem SR650 V3 (and similar XCC-based servers),
collection runs took 23+ minutes because the BMC exposes two large,
high-error-rate subtrees to the snapshot BFS:
- Chassis/1/Sensors: 315 individual sensor members, 282/315 failing,
~3.7s per request → ~19 minutes wasted. These documents are never
read by any LOGPile parser (thermal/power data comes from aggregate
Chassis/*/Thermal and Chassis/*/Power endpoints).
- Chassis/1/Oem/Lenovo: 75 requests (LEDs×47, Slots×26, etc.),
68/75 failing → 8+ minutes wasted on non-inventory data.
Add a Lenovo profile (matched on SystemManufacturer/OEMNamespace "Lenovo")
that sets SnapshotExcludeContains to keep individual sensor documents and
non-inventory Lenovo OEM subtrees out of the snapshot BFS queue. The
profile also sets rate policy thresholds suited to XCC BMC latency
(p95 is often 3-5s).
Add SnapshotExcludeContains []string to AcquisitionTuning and check it
in the snapshot enqueue closure in redfish.go.
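A minimal sketch of the exclusion check, assuming a plain substring match
against the resource path. Only AcquisitionTuning and
SnapshotExcludeContains come from this change; the shouldEnqueue helper
and the example patterns are hypothetical and may not match what
redfish.go or the Lenovo profile actually use.

```go
package main

import (
	"fmt"
	"strings"
)

// AcquisitionTuning shows only the field added by this commit; any other
// fields of the real struct are omitted.
type AcquisitionTuning struct {
	SnapshotExcludeContains []string
}

// shouldEnqueue is a hypothetical helper. It reports whether a Redfish path
// may be added to the snapshot BFS queue: a path is skipped when it contains
// any configured substring. Empty substrings are ignored so that a stray ""
// cannot exclude everything.
func shouldEnqueue(path string, tuning AcquisitionTuning) bool {
	for _, needle := range tuning.SnapshotExcludeContains {
		if needle != "" && strings.Contains(path, needle) {
			return false
		}
	}
	return true
}

func main() {
	// Illustrative patterns only (assumption); the real profile may differ.
	tuning := AcquisitionTuning{SnapshotExcludeContains: []string{
		"/Sensors/",
		"/Oem/Lenovo/LEDs",
		"/Oem/Lenovo/Slots",
	}}
	paths := []string{
		"/redfish/v1/Chassis/1/Sensors/12",
		"/redfish/v1/Chassis/1/Thermal",
		"/redfish/v1/Chassis/1/Oem/Lenovo/LEDs/3",
	}
	for _, p := range paths {
		fmt.Printf("%-42s enqueue=%v\n", p, shouldEnqueue(p, tuning))
	}
}
```

With the trailing slash in "/Sensors/", the individual sensor members are
skipped while the Sensors collection document itself can still be fetched.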
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implement the full architectural plan: unified ingest.Service entry point
for archive and Redfish payloads, modular redfishprofile package with
composable profiles (generic, ami-family, msi, supermicro, dell,
hgx-topology), score-based profile matching with fallback expansion mode,
and profile-driven acquisition/analysis plans.
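A rough sketch of how score-based matching with a fallback could work.
Everything here is an assumption made for illustration: the Signals and
Profile types, the score weights, the threshold, and the reading of
"fallback expansion mode" as "activate every profile when none scores high
enough". The redfishprofile package may define all of these differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Signals is a hypothetical summary of what discovery learned about the BMC.
type Signals struct {
	SystemManufacturer string
	OEMNamespaces      []string
}

// Profile is a hypothetical minimal profile: a name plus a scoring function.
type Profile struct {
	Name  string
	Score func(Signals) int // 0 means "no evidence for this profile"
}

// matchProfiles returns the profiles scoring at least threshold, ordered by
// score (highest first) and then by name so the result is deterministic.
// If nothing reaches the threshold it returns every profile and sets
// fallback to true (the assumed meaning of "expansion mode").
func matchProfiles(profiles []Profile, s Signals, threshold int) (names []string, fallback bool) {
	type scored struct {
		name  string
		score int
	}
	var hits []scored
	for _, p := range profiles {
		if sc := p.Score(s); sc >= threshold {
			hits = append(hits, scored{p.Name, sc})
		}
	}
	if len(hits) == 0 {
		fallback = true
		for _, p := range profiles {
			hits = append(hits, scored{p.Name, 0})
		}
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].score != hits[j].score {
			return hits[i].score > hits[j].score
		}
		return hits[i].name < hits[j].name
	})
	for _, h := range hits {
		names = append(names, h.name)
	}
	return names, fallback
}

// vendorScore builds a scorer from a lowercase vendor keyword. The weights
// (60 for manufacturer, 40 for OEM namespace) are arbitrary example values.
func vendorScore(vendor string) func(Signals) int {
	return func(s Signals) int {
		score := 0
		if strings.Contains(strings.ToLower(s.SystemManufacturer), vendor) {
			score += 60
		}
		for _, ns := range s.OEMNamespaces {
			if strings.Contains(strings.ToLower(ns), vendor) {
				score += 40
				break
			}
		}
		return score
	}
}

func main() {
	profiles := []Profile{
		{Name: "supermicro", Score: vendorScore("supermicro")},
		{Name: "dell", Score: vendorScore("dell")},
	}
	fmt.Println(matchProfiles(profiles, Signals{SystemManufacturer: "Dell Inc."}, 50))
	fmt.Println(matchProfiles(profiles, Signals{SystemManufacturer: "Unknown"}, 50))
}
```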
Move vendor-specific logic out of common executors and into profile hooks.
GPU chassis lookup strategies and known storage recovery collections
(IntelVROC/HA-RAID/MRVL) now live in ResolvedAnalysisPlan, populated by
profiles at analysis time. Replay helpers read from the plan; no hardcoded
path lists remain in generic code.
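To illustrate the idea of a plan populated by profile hooks, here is a
small sketch. Only the ResolvedAnalysisPlan name and the collection names
IntelVROC, HA-RAID and MRVL come from this change; the field names, the
AnalysisHook signature, the helper functions and the "by-chassis-id"
strategy label are hypothetical.

```go
package main

import "fmt"

// ResolvedAnalysisPlan is named in the commit; these fields are hypothetical
// stand-ins for whatever the real struct carries.
type ResolvedAnalysisPlan struct {
	GPUChassisLookups          []string
	StorageRecoveryCollections []string
}

// AnalysisHook is a hypothetical hook signature: each matched profile adds
// its vendor-specific entries to the shared plan.
type AnalysisHook func(plan *ResolvedAnalysisPlan)

// appendUnique appends values that are not already present, preserving the
// order in which they were first added.
func appendUnique(dst []string, values ...string) []string {
	seen := make(map[string]bool, len(dst))
	for _, v := range dst {
		seen[v] = true
	}
	for _, v := range values {
		if !seen[v] {
			seen[v] = true
			dst = append(dst, v)
		}
	}
	return dst
}

// resolvePlan runs the hooks in the given order and returns the merged plan.
func resolvePlan(hooks ...AnalysisHook) ResolvedAnalysisPlan {
	var plan ResolvedAnalysisPlan
	for _, hook := range hooks {
		hook(&plan)
	}
	return plan
}

// genericHook and vendorHook are example hooks, not real profiles.
func genericHook(plan *ResolvedAnalysisPlan) {
	plan.StorageRecoveryCollections = appendUnique(plan.StorageRecoveryCollections, "IntelVROC")
}

func vendorHook(plan *ResolvedAnalysisPlan) {
	plan.StorageRecoveryCollections = appendUnique(plan.StorageRecoveryCollections,
		"IntelVROC", "HA-RAID", "MRVL")
	plan.GPUChassisLookups = appendUnique(plan.GPUChassisLookups, "by-chassis-id")
}

func main() {
	plan := resolvePlan(genericHook, vendorHook)
	// A replay helper would range over the plan instead of a hardcoded list.
	for _, name := range plan.StorageRecoveryCollections {
		fmt.Println("recover storage collection:", name)
	}
	fmt.Println("gpu chassis lookups:", plan.GPUChassisLookups)
}
```

Deduplicating while keeping insertion order means two profiles can name the
same collection without the replay step visiting it twice.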
Also split redfish_replay.go into domain modules (gpu, storage, inventory,
fru, profiles) and add fixture/matcher/directive test coverage, including
Dell, AMI, unknown-vendor fallback, and deterministic ordering.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>