- Single "Подключиться" button flow: probe first, then show collect options
- Power management checkboxes: power on before / stop after collect
- Modal confirmation when enabling shutdown on already-powered-on host
- StopHostAfterCollect flag: host shuts down only when explicitly requested
- TCP ping (10 attempts, min 3 successes) before Redfish probe
- Debug payloads checkbox (Oem/Ami/Inventory/Crc, off by default)
- Remove platform_config BIOS settings collection (unreliable on AMI)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Collect hardware event logs (last 7 days) from Systems and Managers/SEL LogServices
- Parse AMI raw IPMI dump messages into readable descriptions (Sensor_Type: Event_Type)
- Filter out audit/journal/non-hardware log services; only SEL from Managers
- MSI ghost GPU filter: exclude processor GPU entries with temperature=0 when host is powered on
- Reanimator collected_at uses InventoryData/Status.LastModifiedTime (30-day fallback)
- Invalidate Redfish inventory CRC groups before host power-on
- Log inventory LastModifiedTime age in collection logs
- Drop SecureBoot collection (SecureBootMode, SecureBootDatabases) — not hardware inventory
- Add build version to UI footer via template
- Add MSI Redfish API reference doc to bible-local/docs/
ADL-032–ADL-035
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Matches on ServiceRootVendor "xFusion" and OEM namespace "xFusion"
(score 90+). Enables GenericGraphicsControllerDedup unconditionally and
ProcessorGPUFallback when GPU-type processors are present in the snapshot
(xFusion G5500 V7 exposes H200s simultaneously in PCIeDevices,
GraphicsControllers, and Processors/Gpu* — all three need dedup).
Without this profile, xFusion fell into fallback mode which activated all
vendor profiles (Supermicro, HGX, MSI, Dell) unnecessarily. Now resolves
to matched mode with targeted acquisition tuning (120k cap, 75s baseline).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implement the full architectural plan: unified ingest.Service entry point
for archive and Redfish payloads, modular redfishprofile package with
composable profiles (generic, ami-family, msi, supermicro, dell,
hgx-topology), score-based profile matching with fallback expansion mode,
and profile-driven acquisition/analysis plans.
Vendor-specific logic moved out of common executors and into profile hooks.
GPU chassis lookup strategies and known storage recovery collections
(IntelVROC/HA-RAID/MRVL) now live in ResolvedAnalysisPlan, populated by
profiles at analysis time. Replay helpers read from the plan; no hardcoded
path lists remain in generic code.
Also splits redfish_replay.go into domain modules (gpu, storage, inventory,
fru, profiles) and adds full fixture/matcher/directive test coverage
including Dell, AMI, unknown-vendor fallback, and deterministic ordering.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause analysis for device-bound firmware leaking into hardware.firmware
on Supermicro Redfish (SYS-A21GE-NBRT HGX B200):
- collectFirmwareInventory (6c19a58) had no coverage for Supermicro naming.
isDeviceBoundFirmwareName checked "gpu " / "nic " (space-terminated) while
Supermicro uses "GPU1 System Slot0" / "NIC1 System Slot0 ..." (digit suffix).
- 9c5512d added _fw_gpu_ / _fw_nvswitch_ / _inforom_gpu_ patterns to fix HGX,
but checked DeviceName which contains "Software Inventory" (from Redfish Name),
not the firmware Id. Dead code from day one.
09-testing.md: add firmware filter worked example and rule #4 — verify the
filter checks the field that the collector actually populates.
10-decisions.md: ADL-019 — isDeviceBoundFirmwareName must be extended per
vendor with a test case per vendor format before shipping.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
isDeviceBoundFirmwareName did not catch Supermicro FirmwareInventory naming
conventions where a digit follows the type prefix directly ("GPU1 System Slot0",
"NIC1 System Slot0 AOM-DP805-IO") instead of a space. Also missing: "Power supply N",
"NVMeController N", and "Software Inventory" (generic label for all HGX per-component
firmware slots — GPU, NVSwitch, PCIeRetimer, ERoT, InfoROM, etc.).
On SYS-A21GE-NBRT (HGX B200) this caused 29 device-bound entries to leak into
hardware.firmware: 8 GPU, 9 NIC, 1 NVMe, 6 PSU, 4 PCIeSwitch, 1 Software Inventory.
Fix: extend isDeviceBoundFirmwareName with patterns for all four new cases.
Add TestIsDeviceBoundFirmwareName covering both excluded and kept entries.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On Supermicro HGX systems (SYS-A21GE-NBRT) ~35 sub-chassis (GPU, NVSwitch,
PCIeRetimer, ERoT/IRoT, BMC, FPGA) all carry ChassisType=Module/Component/Zone
and expose empty /Drives collections. shouldAdaptiveNVMeProbe returned true for
all of them, triggering 35 × 384 = 13 440 HTTP requests → ~22 min wasted per
collection (more than half of total 35 min collection time).
Fix: chassisTypeCanHaveNVMe returns false for Module, Component, Zone. The
candidate selection loop in collectRawRedfishTree now checks the parent chassis
doc before adding a /Drives path to the probe list. Enclosure (NVMe backplane),
RackMount, and unknown types are unaffected.
Tests:
- TestChassisTypeCanHaveNVMe: table-driven, covers excluded and storage-capable types
- TestNVMePostProbeSkipsNonStorageChassis: topology integration, GPU chassis +
backplane with empty /Drives → exactly 1 candidate selected (backplane only)
Docs:
- ADL-018 in bible-local/10-decisions.md
- Candidate-selection test matrix in bible-local/09-testing.md
- SYS-A21GE-NBRT baseline row in docs/test_server_collection_memory.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three bugs, all related to GPU dedup in the Redfish replay pipeline:
1. collectGPUsFromProcessors (redfish_replay.go): GPU-type Processor entries
(Systems/HGX_Baseboard_0/Processors/GPU_SXM_N) were not deduplicated against
existing PCIeDevice GPUs on Supermicro HGX. The chassis-ID lookup keyed on
processor Id ("GPU_SXM_1") but the chassis is named "HGX_GPU_SXM_1" — lookup
returned nothing, serial stayed empty, UUID was unseen → 8 duplicate GPU rows.
Fix: read SerialNumber directly from the Processor doc first; chassis lookup
is now a fallback override (as it was designed for MSI).
2. looksLikeGPU (redfish.go): NVSwitch PCIe devices (Model="NVSwitch",
Manufacturer="NVIDIA") were classified as GPUs because "nvidia" matched the
GPU hint list. Fix: early return false when Model contains "nvswitch".
3. gpuDocDedupKey (redfish.go): commit 9df29b1 changed the dedup key to prefer
slot|model before path, which collapsed two distinct GPUs with identical model
names in GraphicsControllers into one entry. Fix: only serial and BDF are used
as cross-path stable dedup keys; fall back to Redfish path when neither is
present. This also restores TestReplayCollectGPUs_DedupUsesRedfishPathBeforeHeuristics
which had been broken on main since 9df29b1.
Added tests:
- TestCollectGPUsFromProcessors_SupermicroHGX: Processor GPU dedup when
chassis-ID naming convention does not match processor Id
- TestReplayCollectGPUs_DedupCrossChassisSerial: same GPU via two Chassis
PCIeDevice trees with matching serials → collapsed to one
- TestLooksLikeGPU_NVSwitchExcluded: NVSwitch is not a GPU
Added rule to bible-local/09-testing.md: dedup/filter/classify functions must
cover true-positive, true-negative, and the vendor counter-case axes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add /redfish/v1 to redfishCriticalEndpoints so plan-B retries the service
root if it failed during the main crawl. Also downgrade the missing-root
error in ReplayRedfishFromRawPayloads from fatal to a warning so analysis
can complete with defaults when the root doc was not recovered.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Supermicro HGX exposes each GPU under both Chassis/1/PCIeDevices and a
dedicated Chassis/HGX_GPU_SXM_N/PCIeDevices. gpuDocDedupKey was keying
by @odata.id path, so identical GPUs with the same serial were not
deduplicated across sources. Now stable identifiers (serial → BDF →
slot+model) take priority over path.
Also includes Inspur parser improvements: NVMe model/serial enrichment
from devicefrusdr.log and audit.log, RAID drive slot normalization to
BP notation, PSU slot normalization, BMC/CPLD/VR firmware from RESTful
version info section, and parser version bump to 1.8.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On folder selection, filter out duplicate files before conversion:
- First pass: same basename → skip (same filename in different subdirs)
- Second pass: same SHA-256 hash → skip (identical content, different path)
Duplicates are excluded from the convert queue and shown as a warning
in the summary with reason (same name / same content).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Skip FQDD prefixes that are internal AMD EPYC fabric or devices
already captured with richer data from other DCIM views:
- HostBridge/P2PBridge/ISABridge/SMBus.Embedded: AMD internal bus
- AHCI.Embedded: AMD FCH SATA (chipset, not a slot)
- Video.Embedded: BMC Matrox G200eW3, not user-visible
- NIC.Embedded: duplicates DCIM_NICView entries (no model/MAC in PCIe view)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- isUnidentifiablePCIeDevice: skip PCIe entries with generic class
(SingleFunction/MultiFunction) and no model/serial/VendorID — eliminates
PCH bridges, root ports and other bus infrastructure that MSI BMC
enumerates exhaustively (59→9 entries on CG480-S5063)
- collectPCIeDevices: skip entries where looksLikeGPU — prevents GPU
devices from appearing in both hw.GPUs and hw.PCIeDevices (fixed
Inspur H100 duplicate)
- dedupeCanonicalDevices: secondary model+manufacturer match for noKey
items (no serial, no BDF) — merges NetworkAdapter entries into
matching PCIe device entries; isGenericDeviceClass helper for
DeviceClass identity check (fixed Inspur ENFI1100-T4 duplicate)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Parser / archive:
- Add .sds extension as tar-format alias (archive.go)
- Add tests for multipart upload size limits (multipart_limits_test.go)
- Remove supermicro crashdump parser (ADL-015)
Dell parser:
- Remove GPU duplicates from PCIeDevices (DCIM_VideoView vs DCIM_PCIDeviceView
both list the same GPU; VideoView record is authoritative)
Server:
- Add LOGPILE_CONVERT_MAX_MB env var for independent convert batch size limit
- Improve "file too large" error message with current limit value
Web:
- Add CONVERT_MAX_FILES_PER_BATCH = 1000 cap
- Minor UI copy and CSS fixes
Bible:
- bible-local/06-parsers.md: add pci.ids enrichment rule (enrich model from
pciids when name is empty but vendor_id+device_id are present)
- Sync bible submodule and local overview/architecture docs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Dell NICView: strip " - XX:XX:XX:XX:XX:XX" suffix from ProductName
(Dell TSR embeds MAC in this field for every NIC port)
- Dell SoftwareIdentity: same strip applied to ElementName; store FQDD
in FirmwareInfo.Description so exporter can filter device-bound entries
- Exporter: add isDeviceBoundFirmwareFQDD() to filter firmware entries
whose Description matches NIC./PSU./Disk./RAID.Backplane./GPU. FQDD
prefixes (prevents device firmware from appearing in hardware.firmware)
- Exporter: extend isDeviceBoundFirmwareName() to filter HGX GPU/NVSwitch
firmware inventory IDs (_fw_gpu_, _fw_nvswitch_, _inforom_gpu_)
- Inspur: remove HDD firmware from Hardware.Firmware — already present
in Storage.Firmware, duplicating it violates ADL-016
- bible-local/06-parsers.md: document firmware and MAC stripping rules
- bible-local/10-decisions.md: add ADL-016 (device-bound firmware) and
ADL-017 (vendor-embedded MAC in model name fields)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add bible.git as submodule at bible/
- Move docs/bible/ → bible-local/ (project-specific architecture)
- Update CLAUDE.md to reference both bible/ and bible-local/
- Add AGENTS.md for Codex with same structure
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>