docs: document firmware filter regression pattern in bible (ADL-019)
Root cause analysis for device-bound firmware leaking into hardware.firmware on Supermicro Redfish (SYS-A21GE-NBRT HGX B200): - collectFirmwareInventory (6c19a58) had no coverage for Supermicro naming. isDeviceBoundFirmwareName checked "gpu " / "nic " (space-terminated) while Supermicro uses "GPU1 System Slot0" / "NIC1 System Slot0 ..." (digit suffix). -9c5512dadded _fw_gpu_ / _fw_nvswitch_ / _inforom_gpu_ patterns to fix HGX, but checked DeviceName which contains "Software Inventory" (from Redfish Name), not the firmware Id. Dead code from day one. 09-testing.md: add firmware filter worked example and rule #4 — verify the filter checks the field that the collector actually populates. 10-decisions.md: ADL-019 — isDeviceBoundFirmwareName must be extended per vendor with a test case per vendor format before shipping. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -71,12 +71,34 @@ TestXxx_KeepsDistinct — two different items with same model → 2 res
|
||||
TestXxx_<VendorThatMotivated> — the specific vendor/setup that triggered the code
|
||||
```
|
||||
|
||||
### Worked example — firmware filter regression (2026-03-12)
|
||||
|
||||
`collectFirmwareInventory` was added in `6c19a58` without coverage for Supermicro naming.
|
||||
`isDeviceBoundFirmwareName` had patterns for Dell-style names (`"GPU SomeDevice"`, `"NIC OnboardLAN"`)
|
||||
but Supermicro Redfish uses `"GPU1 System Slot0"` and `"NIC1 System Slot0 ..."` — digit follows
|
||||
immediately after the type prefix. 29 device-bound entries leaked into `hardware.firmware`.
|
||||
|
||||
`9c5512d` attempted to fix this with HGX ID patterns (`_fw_gpu_`, etc.) in the wrong field:
|
||||
the filter checked `DeviceName` but `collectFirmwareInventory` populates it from `Name` first
|
||||
(`"Software Inventory"` for all HGX per-component slots), not from the `Id` field that contains
|
||||
the firmware ID like `"HGX_FW_GPU_SXM_1"`. The patterns were effectively dead code from day one.
|
||||
|
||||
**Required test matrix for any filter function:**
|
||||
|
||||
```
|
||||
TestXxx_FiltersDeviceBound_Dell — Dell-style names that motivated the original code
|
||||
TestXxx_FiltersDeviceBound_Supermicro — Supermicro names with digit suffix (GPU1/NIC1)
|
||||
TestXxx_KeepsSystemLevel — BIOS, BMC, CPLD names must NOT be filtered
|
||||
```
|
||||
|
||||
### Practical rule
|
||||
|
||||
When you write a new filter/dedup/classify function, ask:
|
||||
1. Does my test cover the vendor that motivated this code?
|
||||
2. Does my test cover a *different* vendor or naming convention where the function must NOT fire?
|
||||
3. If I change the dedup key logic, do existing tests still exercise the old correct behavior?
|
||||
4. When the filter checks a field on a model struct, does my test verify that the field is
|
||||
actually populated by the collector? (Dead-code filter pattern: `9c5512d` `_fw_gpu_` check.)
|
||||
|
||||
If any answer is "no" — add the missing test before committing.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user