Files

Mikhail Chusavitin a9f58b3cf4 redfish: fix GPU duplication on Supermicro HGX, exclude NVSwitch, restore path dedup

Three bugs, all related to GPU dedup in the Redfish replay pipeline:

1. collectGPUsFromProcessors (redfish_replay.go): GPU-type Processor entries
   (Systems/HGX_Baseboard_0/Processors/GPU_SXM_N) were not deduplicated against
   existing PCIeDevice GPUs on Supermicro HGX. The chassis-ID lookup keyed on
   processor Id ("GPU_SXM_1") but the chassis is named "HGX_GPU_SXM_1" — lookup
   returned nothing, serial stayed empty, UUID was unseen → 8 duplicate GPU rows.
   Fix: read SerialNumber directly from the Processor doc first; chassis lookup
   is now a fallback override (as it was designed for MSI).

2. looksLikeGPU (redfish.go): NVSwitch PCIe devices (Model="NVSwitch",
   Manufacturer="NVIDIA") were classified as GPUs because "nvidia" matched the
   GPU hint list. Fix: early return false when Model contains "nvswitch".

3. gpuDocDedupKey (redfish.go): commit 9df29b1 changed the dedup key to prefer
   slot|model before path, which collapsed two distinct GPUs with identical model
   names in GraphicsControllers into one entry. Fix: only serial and BDF are used
   as cross-path stable dedup keys; fall back to Redfish path when neither is
   present. This also restores TestReplayCollectGPUs_DedupUsesRedfishPathBeforeHeuristics
   which had been broken on main since 9df29b1.

Added tests:
- TestCollectGPUsFromProcessors_SupermicroHGX: Processor GPU dedup when
  chassis-ID naming convention does not match processor Id
- TestReplayCollectGPUs_DedupCrossChassisSerial: same GPU via two Chassis
  PCIeDevice trees with matching serials → collapsed to one
- TestLooksLikeGPU_NVSwitchExcluded: NVSwitch is not a GPU

Added rule to bible-local/09-testing.md: dedup/filter/classify functions must
cover true-positive, true-negative, and the vendor counter-case axes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-11 15:09:27 +03:00

3.1 KiB

Raw Blame History

09 — Testing

Required before merge

go test ./...

All tests must pass before any change is merged.

Where to add tests

Change area	Test location
Collectors	`internal/collector/*_test.go`
HTTP handlers	`internal/server/*_test.go`
Exporters	`internal/exporter/*_test.go`
Parsers	`internal/parser/vendors/<vendor>/*_test.go`

Exporter tests

The Reanimator exporter has comprehensive coverage:

Test file	Coverage
`reanimator_converter_test.go`	Unit tests per conversion function
`reanimator_integration_test.go`	Full export with realistic `AnalysisResult`

Run exporter tests only:

go test ./internal/exporter/...
go test ./internal/exporter/... -v -run Reanimator
go test ./internal/exporter/... -cover

Guidelines

Prefer table-driven tests for parsing logic (multiple input variants).
Do not rely on network access in unit tests.
Test both the happy path and edge cases (missing fields, empty collections).
When adding a new vendor parser, include at minimum:
- Detect() test with a positive and a negative sample file list.
- Parse() test with a minimal but representative archive.

Dedup and filtering functions — mandatory coverage

Any function that deduplicates, filters, or classifies hardware inventory items must have tests covering all three axes before the code is considered done:

Axis	What to test	Why
True positive	Items that ARE duplicates are collapsed to one	Proves the function works
True negative	Items that are NOT duplicates are kept separate	Proves the function doesn't over-collapse
Counter-case	The scenario that motivated the original code still works after changes	Prevents regression from future fixes

Worked example — GPU dedup regression (2026-03-11)

collectGPUsFromProcessors was added for MSI (chassis Id matches processor Id). No tests → when Supermicro HGX arrived (chassis Id = "HGX_GPU_SXM_1", processor Id = "GPU_SXM_1"), the chassis lookup silently returned nothing, serial stayed empty, UUID was new → 8 duplicate GPUs.

Simultaneously, fixing gpuDocDedupKey to use slot|model before path collapsed two distinct GraphicsControllers GPUs with the same model into one — breaking an existing test that had no counter-case for the path-fallback scenario.

Required test matrix for any dedup function:

TestXxx_CollapsesDuplicates   — same item via two sources → 1 result
TestXxx_KeepsDistinct          — two different items with same model → 2 results
TestXxx_<VendorThatMotivated>  — the specific vendor/setup that triggered the code

Practical rule

When you write a new filter/dedup/classify function, ask:

Does my test cover the vendor that motivated this code?
Does my test cover a different vendor or naming convention where the function must NOT fire?
If I change the dedup key logic, do existing tests still exercise the old correct behavior?

If any answer is "no" — add the missing test before committing.

3.1 KiB Raw Blame History