redfish: skip NVMe bay probe for non-storage chassis types (Module/Component/Zone)

On Supermicro HGX systems (SYS-A21GE-NBRT), ~35 sub-chassis (GPU, NVSwitch, PCIeRetimer,
ERoT/IRoT, BMC, FPGA) all carry ChassisType=Module/Component/Zone and expose empty /Drives
collections. shouldAdaptiveNVMeProbe returned true for every one of them, triggering
35 × 384 = 13 440 HTTP requests and wasting ~22 min per collection (more than half of the
35 min total collection time).

Fix: chassisTypeCanHaveNVMe returns false for Module, Component and Zone, and the candidate
selection loop in collectRawRedfishTree now checks the parent chassis doc before adding a
/Drives path to the probe list. Enclosure (NVMe backplane), RackMount and unknown types are
unaffected.

Tests:
- TestChassisTypeCanHaveNVMe: table-driven; covers excluded and storage-capable types
- TestNVMePostProbeSkipsNonStorageChassis: topology integration; GPU chassis + backplane with
  empty /Drives → exactly 1 candidate selected (backplane only)

Docs:
- ADL-018 in bible-local/10-decisions.md
- Candidate-selection test matrix in bible-local/09-testing.md
- SYS-A21GE-NBRT baseline row in docs/test_server_collection_memory.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -79,3 +79,34 @@ When you write a new filter/dedup/classify function, ask:
3. If I change the dedup key logic, do existing tests still exercise the old correct behavior?

If any answer is "no", add the missing test before committing.

## Collector candidate-selection functions — mandatory coverage

Any function that selects paths for an expensive operation (probing, crawling, plan-B retry)
**must** have tests covering:

| Axis | What to test | Why |
|------|-------------|-----|
| **Positive** | Paths that should be selected ARE selected | Proves the feature works |
| **Negative** | Paths that should be excluded ARE excluded | Prevents runaway I/O |
| **Topology integration** | Given a realistic `out` map, the count of selected paths matches expectations | Catches implicit coupling between the selector and the surrounding data shape |

### Worked example — NVMe post-probe regression (2026-03-12)

`shouldAdaptiveNVMeProbe` was added in `2fa4a12` for Supermicro NVMe backplanes that return
`Members: []` but serve disks at `Disk.Bay.N` paths. No topology-level test was added.
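
The following is a rough illustration of the mechanism, not the project's code. It reduces the pre-fix check to the one condition the text mentions, an empty `Members` array. It then expands a candidate `/Drives` path into `Disk.Bay.N` probe URLs. The names `shouldProbeEmptyDrives` and `bayProbePaths`, the `bayCount` parameter and the 0-based bay numbering are all assumptions made for this sketch.

```go
package main

import "strconv"

// shouldProbeEmptyDrives is a hypothetical reduction of the pre-fix check:
// a /Drives collection qualifies whenever its Members array is empty or
// missing. Nothing here looks at which chassis owns the collection.
func shouldProbeEmptyDrives(drivesDoc map[string]any) bool {
	members, _ := drivesDoc["Members"].([]any)
	return len(members) == 0
}

// bayProbePaths expands one candidate /Drives path into the Disk.Bay.N URLs
// that would be requested. bayCount and 0-based numbering are assumptions.
func bayProbePaths(drivesPath string, bayCount int) []string {
	paths := make([]string, 0, bayCount)
	for n := 0; n < bayCount; n++ {
		paths = append(paths, drivesPath+"/Disk.Bay."+strconv.Itoa(n))
	}
	return paths
}
```

Because such a predicate ignores the owning chassis, every empty `/Drives` collection on the system becomes a candidate. The request count therefore grows as candidates × probes per candidate.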

When SYS-A21GE-NBRT (HGX B200) arrived, its 35 sub-chassis (GPU, NVSwitch, PCIeRetimer,
ERoT, IRoT, BMC, FPGA) all have `ChassisType=Module/Component/Zone` and empty `/Drives`
collections, so all 35 passed the filter: 35 × 384 = 13 440 HTTP requests and 22 min of extra
time per collection.

A topology integration test (`TestNVMePostProbeSkipsNonStorageChassis`) would have caught
this at commit time: given a GPU chassis and a backplane, exactly 1 candidate must be selected.
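
Below is a minimal sketch of the fixed selection and of that topology check. It assumes that `out` maps each crawled path to its decoded JSON document. It also assumes that a chassis's drive collection lives at `<chassis>/Drives`. `chassisTypeCanHaveNVMe` follows the rule stated in the commit message: false for Module, Component and Zone. `selectNVMeProbeCandidates`, `topologyFixture` and the chassis paths are hypothetical. Treating a missing parent doc as an unknown (still eligible) type is also an assumption.

```go
package main

import (
	"sort"
	"strings"
)

// chassisTypeCanHaveNVMe: Module, Component and Zone never host NVMe bays;
// Enclosure, RackMount and unknown/empty types remain eligible.
func chassisTypeCanHaveNVMe(chassisType string) bool {
	switch chassisType {
	case "Module", "Component", "Zone":
		return false
	default:
		return true
	}
}

// selectNVMeProbeCandidates (hypothetical) returns every empty /Drives
// collection whose parent chassis type can carry NVMe drives.
func selectNVMeProbeCandidates(out map[string]map[string]any) []string {
	var candidates []string
	for path, doc := range out {
		if !strings.HasSuffix(path, "/Drives") {
			continue
		}
		if members, _ := doc["Members"].([]any); len(members) > 0 {
			continue // drives already enumerated, nothing to probe
		}
		// Check the parent chassis doc; a missing doc or ChassisType leaves
		// chassisType empty, which counts as unknown (still eligible).
		chassisType := ""
		if parent, ok := out[strings.TrimSuffix(path, "/Drives")]; ok {
			chassisType, _ = parent["ChassisType"].(string)
		}
		if chassisTypeCanHaveNVMe(chassisType) {
			candidates = append(candidates, path)
		}
	}
	sort.Strings(candidates) // map order is random; keep results stable for tests
	return candidates
}

// topologyFixture (hypothetical paths): one GPU sub-chassis and one NVMe
// backplane, both exposing an empty /Drives collection.
func topologyFixture() map[string]map[string]any {
	return map[string]map[string]any{
		"/redfish/v1/Chassis/GPU0":              {"ChassisType": "Module"},
		"/redfish/v1/Chassis/GPU0/Drives":       {"Members": []any{}},
		"/redfish/v1/Chassis/Backplane0":        {"ChassisType": "Enclosure"},
		"/redfish/v1/Chassis/Backplane0/Drives": {"Members": []any{}},
	}
}
```

Sorting the result is a small design choice. It lets a topology test compare exact paths (only `/redfish/v1/Chassis/Backplane0/Drives` here) rather than only a count.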

**Required test matrix for any path-selection function:**

```
TestXxx_SelectsTargetPath — the path that motivated the code IS selected
TestXxx_SkipsIrrelevantPath — a path that must never be selected IS skipped
TestXxx_TopologyCount — given a realistic multi-chassis map, selected count = N
```
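
One way to make the three tests cheap to write for every new selector is a set of shared checks. This is a sketch under assumptions. `Selector`, the `Check*` helpers and the `map[string]map[string]any` shape of `out` are hypothetical. Each helper returns an error, so a `TestXxx_*` function can call it as `if err := CheckTopologyCount(sel, fixture, 1); err != nil { t.Fatal(err) }`.

```go
package main

import "fmt"

// Selector is any function that picks paths for an expensive operation
// (probing, crawling, plan-B retry) from the crawled out map.
type Selector func(out map[string]map[string]any) []string

func contains(paths []string, p string) bool {
	for _, q := range paths {
		if q == p {
			return true
		}
	}
	return false
}

// CheckSelectsTargetPath backs TestXxx_SelectsTargetPath (positive axis).
func CheckSelectsTargetPath(sel Selector, out map[string]map[string]any, target string) error {
	if got := sel(out); !contains(got, target) {
		return fmt.Errorf("target %q not selected; got %v", target, got)
	}
	return nil
}

// CheckSkipsIrrelevantPath backs TestXxx_SkipsIrrelevantPath (negative axis).
func CheckSkipsIrrelevantPath(sel Selector, out map[string]map[string]any, irrelevant string) error {
	if got := sel(out); contains(got, irrelevant) {
		return fmt.Errorf("irrelevant path %q was selected; got %v", irrelevant, got)
	}
	return nil
}

// CheckTopologyCount backs TestXxx_TopologyCount (topology integration axis).
func CheckTopologyCount(sel Selector, out map[string]map[string]any, want int) error {
	if got := sel(out); len(got) != want {
		return fmt.Errorf("selected %d paths, want %d: %v", len(got), want, got)
	}
	return nil
}
```

Returning errors instead of taking `*testing.T` keeps the helpers usable outside `go test`. The cost is one extra line per test.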