redfish: skip NVMe bay probe for non-storage chassis types (Module/Component/Zone)

On Supermicro HGX systems (SYS-A21GE-NBRT), ~35 sub-chassis (GPU, NVSwitch,
PCIeRetimer, ERoT/IRoT, BMC, FPGA) all carry ChassisType=Module/Component/Zone
and expose empty /Drives collections. shouldAdaptiveNVMeProbe returned true for
all of them, triggering 35 × 384 = 13 440 HTTP requests and wasting ~22 min per
collection (more than half of the total 35 min collection time).
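
The cost figures above can be checked with a short calculation. This is only a sketch: `probeCost` is a hypothetical helper, not part of the commit, and it assumes the ~22 min was spent entirely on these probe requests, issued sequentially.

```go
package main

import "fmt"

// probeCost is a hypothetical helper (not part of the commit). It returns the
// total number of probe requests and the average time per request in
// milliseconds, assuming the wasted minutes were spent only on those requests.
func probeCost(chassis, probesPerChassis int, wastedMinutes float64) (int, float64) {
	total := chassis * probesPerChassis
	msPerRequest := wastedMinutes * 60 * 1000 / float64(total)
	return total, msPerRequest
}

func main() {
	// 35 sub-chassis, 384 probe URLs each, ~22 min wasted of a 35 min run.
	total, ms := probeCost(35, 384, 22)
	share := 22.0 / 35.0 * 100
	fmt.Printf("requests=%d avg=%.0f ms/request share=%.0f%%\n", total, ms, share)
}
```

Under those assumptions each probe costs roughly 98 ms, and the wasted time is about 63% of the collection run.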

Fix: chassisTypeCanHaveNVMe returns false for Module, Component, Zone. The
candidate selection loop in collectRawRedfishTree now checks the parent chassis
doc before adding a /Drives path to the probe list. Enclosure (NVMe backplane),
RackMount, and unknown types are unaffected.

Tests:
- TestChassisTypeCanHaveNVMe: table-driven, covers excluded and storage-capable types
- TestNVMePostProbeSkipsNonStorageChassis: topology integration, GPU chassis +
  backplane with empty /Drives → exactly 1 candidate selected (backplane only)
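
The test files are not shown in this view, so the following is only a sketch of what the two checks might look like, written as a standalone program rather than a `_test.go` file. `chassisTypeCanHaveNVMe` mirrors the helper from the diff; `selectProbeCandidates`, the table entries, and the chassis paths are hypothetical simplifications (the real loop also consults shouldAdaptiveNVMeProbe and works on the raw Redfish tree).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// chassisTypeCanHaveNVMe mirrors the helper added in this commit.
func chassisTypeCanHaveNVMe(chassisType string) bool {
	switch strings.ToLower(strings.TrimSpace(chassisType)) {
	case "module", "component", "zone":
		return false
	default:
		return true
	}
}

// selectProbeCandidates is a hypothetical, simplified stand-in for the
// candidate loop: it keeps every */Drives path unless the parent chassis
// document is present and has an excluded ChassisType.
func selectProbeCandidates(tree map[string]map[string]interface{}) []string {
	var candidates []string
	for path := range tree {
		if !strings.HasSuffix(path, "/Drives") {
			continue
		}
		if parent, ok := tree[strings.TrimSuffix(path, "/Drives")]; ok {
			chassisType, _ := parent["ChassisType"].(string)
			if !chassisTypeCanHaveNVMe(chassisType) {
				continue
			}
		}
		candidates = append(candidates, path)
	}
	sort.Strings(candidates)
	return candidates
}

func main() {
	// Table of assumed cases: excluded types, storage-capable types, unknown.
	cases := []struct {
		chassisType string
		want        bool
	}{
		{"Module", false},
		{"component", false},
		{" Zone ", false},
		{"Enclosure", true},
		{"RackMount", true},
		{"", true},
	}
	for _, tc := range cases {
		got := chassisTypeCanHaveNVMe(tc.chassisType)
		fmt.Printf("%q: got=%v want=%v\n", tc.chassisType, got, tc.want)
	}

	// Illustrative topology: one GPU module and one backplane, both with
	// empty /Drives collections. The paths are made up for this sketch.
	tree := map[string]map[string]interface{}{
		"/redfish/v1/Chassis/GPU_1":            {"ChassisType": "Module"},
		"/redfish/v1/Chassis/GPU_1/Drives":     {},
		"/redfish/v1/Chassis/Backplane":        {"ChassisType": "Enclosure"},
		"/redfish/v1/Chassis/Backplane/Drives": {},
	}
	fmt.Println(selectProbeCandidates(tree))
}
```

With this topology only the backplane's /Drives path survives selection, which matches the "exactly 1 candidate" expectation described above.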

Docs:
- ADL-018 in bible-local/10-decisions.md
- Candidate-selection test matrix in bible-local/09-testing.md
- SYS-A21GE-NBRT baseline row in docs/test_server_collection_memory.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mikhail Chusavitin
2026-03-12 13:38:29 +03:00
parent a9f58b3cf4
commit 1eb639e6bf
5 changed files with 187 additions and 0 deletions

@@ -999,6 +999,17 @@ func (c *RedfishConnector) collectRawRedfishTree(ctx context.Context, client *ht
		if !shouldAdaptiveNVMeProbe(doc) {
			continue
		}
		// Skip chassis types that cannot contain NVMe storage (e.g. GPU modules,
		// RoT components, NVSwitch zones on HGX systems) to avoid probing hundreds
		// of Disk.Bay.N URLs against chassis that will never have drives.
		chassisPath := strings.TrimSuffix(normalized, "/Drives")
		if chassisDocAny, ok := out[chassisPath]; ok {
			if chassisDoc, ok := chassisDocAny.(map[string]interface{}); ok {
				if !chassisTypeCanHaveNVMe(asString(chassisDoc["ChassisType"])) {
					continue
				}
			}
		}
		driveCollections = append(driveCollections, normalized)
	}
	sort.Strings(driveCollections)
@@ -1316,6 +1327,21 @@ func shouldAdaptiveNVMeProbe(collectionDoc map[string]interface{}) bool {
	return !redfishCollectionHasExplicitMembers(collectionDoc)
}

// chassisTypeCanHaveNVMe returns false for Redfish ChassisType values that
// represent compute/network/management sub-modules with no storage capability.
// Used to skip expensive Disk.Bay.N probing on HGX GPU, NVSwitch, PCIeRetimer,
// RoT and similar component chassis that expose an empty /Drives collection.
func chassisTypeCanHaveNVMe(chassisType string) bool {
	switch strings.ToLower(strings.TrimSpace(chassisType)) {
	case "module", // GPU SXM, NVLinkManagementNIC, PCIeRetimer
		"component", // ERoT, IRoT, BMC, FPGA sub-chassis
		"zone":      // HGX_Chassis_0 fabric zone
		return false
	default:
		return true
	}
}

func redfishCollectionHasNumericMemberRefs(memberRefs []string) bool {
	for _, memberPath := range memberRefs {
		if redfishPathTailIsNumeric(memberPath) {