On Supermicro HGX systems (SYS-A21GE-NBRT), ~35 sub-chassis (GPU, NVSwitch, PCIeRetimer, ERoT/IRoT, BMC, FPGA) all carry ChassisType=Module/Component/Zone and expose empty /Drives collections. shouldAdaptiveNVMeProbe returned true for all of them, triggering 35 × 384 = 13 440 HTTP requests → ~22 min wasted per collection (more than half of the total 35 min collection time).

Fix: chassisTypeCanHaveNVMe returns false for Module, Component, Zone. The candidate selection loop in collectRawRedfishTree now checks the parent chassis doc before adding a /Drives path to the probe list. Enclosure (NVMe backplane), RackMount, and unknown types are unaffected.

Tests:
- TestChassisTypeCanHaveNVMe: table-driven, covers excluded and storage-capable types
- TestNVMePostProbeSkipsNonStorageChassis: topology integration, GPU chassis + backplane with empty /Drives → exactly 1 candidate selected (backplane only)

Docs:
- ADL-018 in bible-local/10-decisions.md
- Candidate-selection test matrix in bible-local/09-testing.md
- SYS-A21GE-NBRT baseline row in docs/test_server_collection_memory.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
09 — Testing
Required before merge
go test ./...
All tests must pass before any change is merged.
Where to add tests
| Change area | Test location |
|---|---|
| Collectors | internal/collector/*_test.go |
| HTTP handlers | internal/server/*_test.go |
| Exporters | internal/exporter/*_test.go |
| Parsers | internal/parser/vendors/<vendor>/*_test.go |
Exporter tests
The Reanimator exporter is covered by two test files:
| Test file | Coverage |
|---|---|
| reanimator_converter_test.go | Unit tests per conversion function |
| reanimator_integration_test.go | Full export with realistic AnalysisResult |
Run exporter tests only:
go test ./internal/exporter/...
go test ./internal/exporter/... -v -run Reanimator
go test ./internal/exporter/... -cover
Guidelines
- Prefer table-driven tests for parsing logic (multiple input variants).
- Do not rely on network access in unit tests.
- Test both the happy path and edge cases (missing fields, empty collections).
- When adding a new vendor parser, include at minimum:
  - Detect() test with a positive and a negative sample file list.
  - Parse() test with a minimal but representative archive.
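The guidelines above can be illustrated with a minimal table-driven sketch. The `detect` function and the `vendor_inventory.json` marker file are hypothetical stand-ins invented for this example; a real vendor parser's Detect() will have its own signature and detection rules.

```go
package main

import (
	"strings"
	"testing"
)

// detect is a hypothetical stand-in for a vendor parser's Detect():
// it reports whether an archive's file list looks like this vendor's dump.
// The marker file name is invented for this sketch.
func detect(files []string) bool {
	for _, f := range files {
		if strings.HasSuffix(f, "vendor_inventory.json") {
			return true
		}
	}
	return false
}

// TestDetect covers a positive sample, a negative sample, and the
// empty-collection edge case in one table. No network access is needed.
func TestDetect(t *testing.T) {
	tests := []struct {
		name  string
		files []string
		want  bool
	}{
		{"positive sample", []string{"logs/sel.txt", "dump/vendor_inventory.json"}, true},
		{"negative sample", []string{"logs/sel.txt", "dump/other.xml"}, false},
		{"empty file list", nil, false},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			if got := detect(tc.files); got != tc.want {
				t.Errorf("detect(%v) = %v, want %v", tc.files, got, tc.want)
			}
		})
	}
}
```

Adding a new input variant is then a one-line change to the table rather than a new test function.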
Dedup and filtering functions — mandatory coverage
Any function that deduplicates, filters, or classifies hardware inventory items must have tests covering all three axes before the code is considered done:
| Axis | What to test | Why |
|---|---|---|
| True positive | Items that ARE duplicates are collapsed to one | Proves the function works |
| True negative | Items that are NOT duplicates are kept separate | Proves the function doesn't over-collapse |
| Counter-case | The scenario that motivated the original code still works after changes | Prevents regression from future fixes |
Worked example — GPU dedup regression (2026-03-11)
collectGPUsFromProcessors was added for MSI (chassis Id matches processor Id).
No tests → when Supermicro HGX arrived (chassis Id = "HGX_GPU_SXM_1", processor Id = "GPU_SXM_1"),
the chassis lookup silently returned nothing, serial stayed empty, UUID was new → 8 duplicate GPUs.
Simultaneously, fixing gpuDocDedupKey to use slot|model before path collapsed two distinct
GraphicsControllers GPUs with the same model into one — breaking an existing test that had no
counter-case for the path-fallback scenario.
Required test matrix for any dedup function:
TestXxx_CollapsesDuplicates — same item via two sources → 1 result
TestXxx_KeepsDistinct — two different items with same model → 2 results
TestXxx_<VendorThatMotivated> — the specific vendor/setup that triggered the code
Practical rule
When you write a new filter/dedup/classify function, ask:
- Does my test cover the vendor that motivated this code?
- Does my test cover a different vendor or naming convention where the function must NOT fire?
- If I change the dedup key logic, do existing tests still exercise the old correct behavior?
If any answer is "no" — add the missing test before committing.
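A minimal sketch of the three-test matrix, under stated assumptions: `gpuItem`, `dedupKey`, and `dedupGPUs` are hypothetical simplifications and not the repository's gpuDocDedupKey or collectGPUsFromProcessors. The key order (serial, then slot|model, then path) and all sample paths and values are assumptions chosen to show why the path fallback needs its own counter-case.

```go
package main

import "testing"

// gpuItem is a hypothetical, simplified inventory record for this sketch.
type gpuItem struct {
	Path   string // source path the item was read from
	Slot   string // may be empty when the source does not report it
	Model  string
	Serial string // may be empty
}

// dedupKey prefers the strongest identity available and falls back to the
// source path, so items lacking serial and slot are never merged by model alone.
func dedupKey(g gpuItem) string {
	switch {
	case g.Serial != "":
		return "serial:" + g.Serial
	case g.Slot != "":
		return "slot:" + g.Slot + "|" + g.Model
	default:
		return "path:" + g.Path
	}
}

// dedupGPUs keeps the first item seen for each key, preserving input order.
func dedupGPUs(items []gpuItem) []gpuItem {
	seen := make(map[string]bool, len(items))
	var out []gpuItem
	for _, g := range items {
		k := dedupKey(g)
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, g)
	}
	return out
}

// True positive: the same serial reported by two sources collapses to one.
func TestDedupGPUs_CollapsesDuplicates(t *testing.T) {
	items := []gpuItem{
		{Path: "/chassis/gpu1", Serial: "SN1", Model: "X"},
		{Path: "/processors/gpu1", Serial: "SN1", Model: "X"},
	}
	if got := len(dedupGPUs(items)); got != 1 {
		t.Fatalf("got %d items, want 1", got)
	}
}

// True negative: same model, no serial or slot, different paths stay separate.
func TestDedupGPUs_KeepsDistinct(t *testing.T) {
	items := []gpuItem{
		{Path: "/GraphicsControllers/0", Model: "X"},
		{Path: "/GraphicsControllers/1", Model: "X"},
	}
	if got := len(dedupGPUs(items)); got != 2 {
		t.Fatalf("got %d items, want 2", got)
	}
}

// Counter-case: an HGX-style layout where the serial is empty and the same
// slot is seen through both a chassis path and a processor path.
func TestDedupGPUs_SupermicroHGX(t *testing.T) {
	items := []gpuItem{
		{Path: "/Chassis/HGX_GPU_SXM_1", Slot: "GPU_SXM_1", Model: "X"},
		{Path: "/Processors/GPU_SXM_1", Slot: "GPU_SXM_1", Model: "X"},
	}
	if got := len(dedupGPUs(items)); got != 1 {
		t.Fatalf("got %d items, want 1", got)
	}
}
```

If the key logic later changes, the KeepsDistinct test is the one that would flag an over-eager collapse.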
Collector candidate-selection functions — mandatory coverage
Any function that selects paths for an expensive operation (probing, crawling, plan-B retry) must have tests covering:
| Axis | What to test | Why |
|---|---|---|
| Positive | Paths that should be selected ARE selected | Proves the feature works |
| Negative | Paths that should be excluded ARE excluded | Prevents runaway I/O |
| Topology integration | Given a realistic out map, the count of selected paths matches expectations | Catches implicit coupling between the selector and the surrounding data shape |
Worked example — NVMe post-probe regression (2026-03-12)
shouldAdaptiveNVMeProbe was added in 2fa4a12 for Supermicro NVMe backplanes that return
Members: [] but serve disks at Disk.Bay.N paths. No topology-level test was added.
When SYS-A21GE-NBRT (HGX B200) arrived, its 35 sub-chassis (GPU, NVSwitch, PCIeRetimer,
ERoT, IRoT, BMC, FPGA) all have ChassisType=Module/Component/Zone and empty /Drives →
all 35 passed the filter → 35 × 384 = 13 440 HTTP requests → 22 min extra per collection.
A topology integration test (TestNVMePostProbeSkipsNonStorageChassis) would have caught
this at commit time: given GPU chassis + backplane, exactly 1 candidate must be selected.
Required test matrix for any path-selection function:
TestXxx_SelectsTargetPath — the path that motivated the code IS selected
TestXxx_SkipsIrrelevantPath — a path that must never be selected IS skipped
TestXxx_TopologyCount — given a realistic multi-chassis map, selected count = N
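The matrix above can be sketched as follows. This is an illustration of the rule as described in this section, not the repository's implementation: the `doc` type, `selectNVMeProbeCandidates`, the shape of the `out` map, and the sample chassis paths are all assumptions made for the example.

```go
package main

import (
	"sort"
	"strings"
	"testing"
)

// doc is a hypothetical decoded Redfish resource; the real collector's
// types may differ.
type doc map[string]any

// chassisTypeCanHaveNVMe sketches the rule described above: Module,
// Component, and Zone are excluded; every other type, including an
// unknown or missing one, stays eligible.
func chassisTypeCanHaveNVMe(chassisType string) bool {
	switch chassisType {
	case "Module", "Component", "Zone":
		return false
	}
	return true
}

// selectNVMeProbeCandidates returns, sorted, every /Drives collection that
// has no members and whose parent chassis type may hold NVMe drives.
func selectNVMeProbeCandidates(out map[string]doc) []string {
	var picked []string
	for path, d := range out {
		if !strings.HasSuffix(path, "/Drives") {
			continue
		}
		if members, _ := d["Members"].([]any); len(members) > 0 {
			continue // drives already enumerated, no probe needed
		}
		parent := out[strings.TrimSuffix(path, "/Drives")]
		chassisType, _ := parent["ChassisType"].(string)
		if chassisTypeCanHaveNVMe(chassisType) {
			picked = append(picked, path)
		}
	}
	sort.Strings(picked)
	return picked
}

// Topology integration: GPU module + backplane, both with empty /Drives,
// must yield exactly one candidate (the backplane).
func TestSelectNVMeProbeCandidates_TopologyCount(t *testing.T) {
	empty := doc{"Members": []any{}}
	out := map[string]doc{
		"/redfish/v1/Chassis/HGX_GPU_SXM_1":        {"ChassisType": "Module"},
		"/redfish/v1/Chassis/HGX_GPU_SXM_1/Drives": empty,
		"/redfish/v1/Chassis/NVMeBackplane":        {"ChassisType": "Enclosure"},
		"/redfish/v1/Chassis/NVMeBackplane/Drives": empty,
	}
	got := selectNVMeProbeCandidates(out)
	if len(got) != 1 || got[0] != "/redfish/v1/Chassis/NVMeBackplane/Drives" {
		t.Fatalf("got %v, want only the backplane /Drives path", got)
	}
}
```

Asserting on the selected count, rather than only on the classifier's return value, is what ties the test to the surrounding data shape: a correct classifier wired to the wrong parent document would still fail here.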