redfish: skip NVMe bay probe for non-storage chassis types (Module/Component/Zone)

On Supermicro HGX systems (SYS-A21GE-NBRT) ~35 sub-chassis (GPU, NVSwitch,
PCIeRetimer, ERoT/IRoT, BMC, FPGA) all carry ChassisType=Module/Component/Zone
and expose empty /Drives collections. shouldAdaptiveNVMeProbe returned true for
all of them, triggering 35 × 384 = 13 440 HTTP requests → ~22 min wasted per
collection (more than half of total 35 min collection time).

Fix: chassisTypeCanHaveNVMe returns false for Module, Component, Zone.
The candidate selection loop in collectRawRedfishTree now checks the parent
chassis doc before adding a /Drives path to the probe list.
Enclosure (NVMe backplane), RackMount, and unknown types are unaffected.

Tests:
- TestChassisTypeCanHaveNVMe: table-driven, covers excluded and storage-capable types
- TestNVMePostProbeSkipsNonStorageChassis: topology integration, GPU chassis +
  backplane with empty /Drives → exactly 1 candidate selected (backplane only)

Docs:
- ADL-018 in bible-local/10-decisions.md
- Candidate-selection test matrix in bible-local/09-testing.md
- SYS-A21GE-NBRT baseline row in docs/test_server_collection_memory.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@@ -253,4 +253,25 @@ at parse time before storing in any model struct. Use the regex

---

## ADL-018 — NVMe bay probe must be restricted to storage-capable chassis types

**Date:** 2026-03-12

**Context:** `shouldAdaptiveNVMeProbe` was introduced in `2fa4a12` to recover NVMe drives on
Supermicro BMCs that expose empty `Drives` collections but serve disks at direct `Disk.Bay.N`
paths. The function returns `true` for any chassis with an empty `Members` array. On
Supermicro HGX systems (SYS-A21GE-NBRT and similar), ~35 sub-chassis (GPU, NVSwitch,
PCIeRetimer, ERoT, IRoT, BMC, FPGA) all carry a `ChassisType` of `Module`, `Component`, or
`Zone` and expose empty `/Drives` collections. Without filtering, each triggered 384 HTTP
requests: 35 × 384 = 13 440 requests ≈ 22 minutes of pure I/O waste per collection.

**Decision:** Before probing `Disk.Bay.N` candidates for a chassis, check its `ChassisType`
via `chassisTypeCanHaveNVMe`. Skip the probe if the type is `Module`, `Component`, or `Zone`.
Keep probing for `Enclosure`, `RackMount`, and any unrecognised type (fail-safe).
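
The decision above amounts to a deny-list with a permissive default. A minimal sketch of such a predicate follows. It assumes the function takes the raw `ChassisType` string and compares it exactly; the real signature and any normalisation (trimming, case folding) are not shown in this record and may differ.

```go
package main

// chassisTypeCanHaveNVMe reports whether a chassis with the given Redfish
// ChassisType should be considered for the Disk.Bay.N probe.
//
// Sketch only: the string parameter and the exact-match comparison are
// assumptions made for illustration, not the project's confirmed code.
func chassisTypeCanHaveNVMe(chassisType string) bool {
	switch chassisType {
	case "Module", "Component", "Zone":
		// Sub-chassis such as GPU, NVSwitch, or retimer modules:
		// never probe these for NVMe bays.
		return false
	default:
		// Enclosure, RackMount, empty, and unrecognised values all
		// fall through to true, so the probe stays fail-safe.
		return true
	}
}
```

Using a deny-list rather than an allow-list means a vendor-specific or future chassis type keeps the old probing behaviour instead of silently losing drives.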

**Consequences:**

- On HGX systems, the NVMe post-probe time drops from ~22 min to effectively zero.
- NVMe backplane recovery (`Enclosure` type) is unaffected.
- Any new chassis type that hosts NVMe storage is covered by the default `true` path.
- `chassisTypeCanHaveNVMe` and the candidate-selection loop must have unit tests covering
  both the excluded types and the storage-capable types (see `TestChassisTypeCanHaveNVMe`
  and `TestNVMePostProbeSkipsNonStorageChassis`).
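
To illustrate the candidate-selection behaviour the second test is meant to pin down, here is a rough sketch of a filtering loop. The `chassisDoc` type, the `selectNVMeProbeCandidates` name, and the injected predicate are hypothetical stand-ins; the actual loop lives inside `collectRawRedfishTree` and works on the raw Redfish tree.

```go
package main

// chassisDoc is a hypothetical, simplified view of a chassis resource.
type chassisDoc struct {
	Path         string // chassis resource path
	ChassisType  string // Redfish ChassisType value
	DriveMembers int    // number of entries in the /Drives Members array
}

// selectNVMeProbeCandidates returns the /Drives paths that should be probed
// for Disk.Bay.N entries. A chassis qualifies only if its /Drives collection
// is empty and the predicate accepts its ChassisType.
func selectNVMeProbeCandidates(chassis []chassisDoc, canHaveNVMe func(string) bool) []string {
	var candidates []string
	for _, c := range chassis {
		if c.DriveMembers > 0 {
			continue // drives are already listed, nothing to recover
		}
		if !canHaveNVMe(c.ChassisType) {
			continue // non-storage sub-chassis, skip the probe
		}
		candidates = append(candidates, c.Path+"/Drives")
	}
	return candidates
}
```

With one GPU sub-chassis of type `Module` and one backplane of type `Enclosure`, both with empty `/Drives`, only the backplane path should come back, which is the "exactly 1 candidate" expectation described for `TestNVMePostProbeSkipsNonStorageChassis`.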

<!-- Add new decisions below this line using the format above -->
