logpile/docs/test_server_collection_memory.md
Mikhail Chusavitin 1eb639e6bf redfish: skip NVMe bay probe for non-storage chassis types (Module/Component/Zone)
On Supermicro HGX systems (SYS-A21GE-NBRT) ~35 sub-chassis (GPU, NVSwitch,
PCIeRetimer, ERoT/IRoT, BMC, FPGA) all carry ChassisType=Module/Component/Zone
and expose empty /Drives collections. shouldAdaptiveNVMeProbe returned true for
all of them, triggering 35 × 384 = 13 440 HTTP requests → ~22 min wasted per
collection (more than half of total 35 min collection time).

Fix: chassisTypeCanHaveNVMe returns false for Module, Component, Zone. The
candidate selection loop in collectRawRedfishTree now checks the parent chassis
doc before adding a /Drives path to the probe list. Enclosure (NVMe backplane),
RackMount, and unknown types are unaffected.

Tests:
- TestChassisTypeCanHaveNVMe: table-driven, covers excluded and storage-capable types
- TestNVMePostProbeSkipsNonStorageChassis: topology integration, GPU chassis +
  backplane with empty /Drives → exactly 1 candidate selected (backplane only)

Docs:
- ADL-018 in bible-local/10-decisions.md
- Candidate-selection test matrix in bible-local/09-testing.md
- SYS-A21GE-NBRT baseline row in docs/test_server_collection_memory.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 13:38:29 +03:00

Test Server Collection Memory

Keep this table updated after each test-server run.

Definitions:

  • Collection Time = total Redfish collection duration, taken from collect.log.
  • Speed = Documents / Collection Time in seconds, reported in docs/s.
  • Metrics Collected = sum of the Counts fields (cpus + memory + storage + pcie + gpus + nics + psus + firmware).
  • n/a means the log does not contain enough timestamp metadata to calculate the duration or speed.
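The definitions above can be sketched in Go as follows. This is a minimal illustration, not the project's implementation: the `Counts` struct, its field names, and the `DocsPerSecond` helper are hypothetical, and the duration is assumed to be available in the `10m10s` form used in the tables.

```go
package main

import (
	"fmt"
	"time"
)

// Counts mirrors the per-category totals named in the definition above.
// The struct and its field names are hypothetical.
type Counts struct {
	CPUs, Memory, Storage, PCIe, GPUs, NICs, PSUs, Firmware int
}

// MetricsCollected returns the sum of all Counts fields.
func (c Counts) MetricsCollected() int {
	return c.CPUs + c.Memory + c.Storage + c.PCIe +
		c.GPUs + c.NICs + c.PSUs + c.Firmware
}

// DocsPerSecond divides the document count by the elapsed seconds.
// ok is false when the duration is not positive, which corresponds to
// the "n/a" case where the log lacks usable timestamps.
func DocsPerSecond(documents int, elapsed time.Duration) (speed float64, ok bool) {
	if elapsed <= 0 {
		return 0, false
	}
	return float64(documents) / elapsed.Seconds(), true
}

func main() {
	// Duration string in the same form as the Collection Time column.
	elapsed, err := time.ParseDuration("10m10s")
	if err != nil {
		panic(err)
	}
	if speed, ok := DocsPerSecond(228, elapsed); ok {
		fmt.Printf("%.0fs, %.2f docs/s\n", elapsed.Seconds(), speed)
	}
}
```

With 228 documents collected in 10m10s (610s), this works out to about 0.37 docs/s.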

Server Model: NF5688M7

| Date (UTC) | App Version | Collection Time | Documents | Speed | Metrics Collected | Notes |
|---|---|---|---|---|---|---|
| 2026-02-28 | v1.7.1-12-g612058e (612058e) | 10m10s (610s) | 228 | 0.37 docs/s | 98 | 2026-02-28 (SERVER MODEL) - 23E100043.zip |
| 2026-02-28 | v1.7.1-11-ge0146ad (e0146ad) | 9m36s (576s) | 138 | 0.24 docs/s | 110 | 2026-02-28 (SERVER MODEL) - 23E100042.zip |
| 2026-02-28 | v1.7.1-10-g9a30705 (9a30705) | 20m47s (1247s) | 106 | 0.09 docs/s | 97 | 2026-02-28 (SERVER MODEL) - 23E100053.zip |
| 2026-02-28 | v1.7.1 (6c19a58) | 15m08s (908s) | 184 | 0.20 docs/s | 96 | 2026-02-28 (DDR5 DIMM) - 23E100051.zip |
| 2026-02-28 | v1.7.0 (ddab93a) | n/a | 193 | n/a | 61 | 2026-02-28 (NULL) - 23E100051.zip |
| 2026-02-28 | v1.7.0 (ddab93a) | n/a | 291 | n/a | 61 | 2026-02-28 (NULL) - 23E100206.zip |

Server Model: SYS-A21GE-NBRT (Supermicro HGX B200)

HGX note: this model has ~35 sub-chassis (GPU, NVSwitch, PCIeRetimer, ERoT/IRoT, BMC, FPGA), all of which report ChassisType=Module/Component/Zone and expose empty /Drives collections. The NVMe post-probe must skip these chassis types; otherwise it probes 35 × 384 = 13 440 URLs and wastes ~22 min, more than half of the 35 min collection. Fixed in the commit that added chassisTypeCanHaveNVMe (2026-03-12). Expected NVMe post-probe time after the fix: <5s.
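The filter described in the note can be sketched as follows. Only the name `chassisTypeCanHaveNVMe` and the excluded types (Module, Component, Zone) come from the note. The signature, the `chassis` struct, the `nvmeProbeCandidates` helper, and the example paths are assumptions made for illustration, and the real candidate selection may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// chassisTypeCanHaveNVMe reports whether a chassis with the given Redfish
// ChassisType should be considered for NVMe bay probing. Module, Component,
// and Zone are excluded; every other value, including an empty or unknown
// one, stays eligible. The signature is an assumption.
func chassisTypeCanHaveNVMe(chassisType string) bool {
	switch strings.TrimSpace(chassisType) {
	case "Module", "Component", "Zone":
		return false
	default:
		return true
	}
}

// chassis is a hypothetical, simplified stand-in for a parsed chassis document.
type chassis struct {
	Path        string
	ChassisType string
}

// nvmeProbeCandidates returns the /Drives paths that survive the filter.
func nvmeProbeCandidates(all []chassis) []string {
	var out []string
	for _, c := range all {
		if chassisTypeCanHaveNVMe(c.ChassisType) {
			out = append(out, c.Path+"/Drives")
		}
	}
	return out
}

func main() {
	const requestsPerChassis = 384 // per-chassis probe count from the note

	// Hypothetical two-chassis topology: one GPU module, one NVMe backplane.
	topology := []chassis{
		{Path: "/redfish/v1/Chassis/GPU_1", ChassisType: "Module"},
		{Path: "/redfish/v1/Chassis/Backplane", ChassisType: "Enclosure"},
	}
	candidates := nvmeProbeCandidates(topology)
	fmt.Println(candidates, len(candidates)*requestsPerChassis)
}
```

Treating unknown types as eligible keeps the filter conservative: only chassis types that are known not to hold drive bays are skipped, so a BMC that reports an unexpected type is still probed.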

| Date (UTC) | App Version | Collection Time | Documents | Speed | Metrics Collected | Notes |
|---|---|---|---|---|---|---|
| 2026-03-12 | v1.8.0-6-ga9f58b3 (a9f58b3) | 35m28s (2128s), before fix | 3197 | 1.50 docs/s | 140 | 2026-03-12 (SYS-A21GE-NBRT) - A936564X5C17287.zip |

Server Model: KR1280-X2-A0-R0-00

| Date (UTC) | App Version | Collection Time | Documents | Speed | Metrics Collected | Notes |
|---|---|---|---|---|---|---|
| 2026-02-28 | v1.7.1-12-g612058e (612058e) | 6m15s (375s) | 185 | 0.49 docs/s | 46 | 2026-02-28 (KR1280-X2-A0-R0-00) - 23D401657.zip |
| 2026-02-28 | v1.7.1-9-g8dbbec3-dirty (8dbbec3) | 6m16s (376s) | 165 | 0.44 docs/s | 46 | 2026-02-28 (KR1280-X2-A0-R0-00) - 23D401657-2.zip |
| 2026-02-28 | v1.7.1-7-gc52fea2 (c52fea2) | 10m51s (651s) | 227 | 0.35 docs/s | 40 | 2026-02-28 (KR1280-X2-A0-R0-00) - 23D401657 copy.zip |