fix(redfish): gate hgx diagnostic plan-b by debug toggle
@@ -58,6 +58,7 @@ Responses:

Optional request fields:
- `power_on_if_host_off`: when `true`, Redfish collection may power on the host before collection if preflight found it powered off
- `debug_payloads`: when `true`, the collector keeps extra diagnostic payloads and enables extended plan-B retries for slow HGX component inventory branches (`Assembly`, `Accelerators`, `Drives`, `NetworkAdapters`, `PCIeDevices`)
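
As an illustration of the two optional fields above, here is a minimal Go sketch that serializes them into a JSON body. Only the field names `power_on_if_host_off` and `debug_payloads` come from the docs; the `collectOptions` struct and `encodeOptions` helper are hypothetical, and the sketch assumes that an omitted field is treated as `false`. The remaining request fields (target, credentials, TLS mode) are left out on purpose.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// collectOptions is a hypothetical struct that holds only the two optional
// request fields documented above. The real request carries more fields.
type collectOptions struct {
	PowerOnIfHostOff bool `json:"power_on_if_host_off,omitempty"`
	DebugPayloads    bool `json:"debug_payloads,omitempty"`
}

// encodeOptions serializes the optional fields. Because of omitempty, a
// false flag is dropped from the body (assumption: absent means "off").
func encodeOptions(o collectOptions) (string, error) {
	b, err := json.Marshal(o)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := encodeOptions(collectOptions{DebugPayloads: true})
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // {"debug_payloads":true}
}
```

With both flags left at their zero values, the encoded body is `{}`, which corresponds to the default collection path.
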

### `POST /api/collect/probe`

@@ -27,6 +27,7 @@ Request fields passed from the server:
- credential field (`password` or token)
- `tls_mode`
- optional `power_on_if_host_off`
- optional `debug_payloads` for extended diagnostics

### Core rule

@@ -57,6 +58,17 @@ closes `skipCh` → goroutine in `Collect()` → `cancelCollect()`.

The skip button is visible during `running` state and hidden once the job reaches a terminal state.

### Extended diagnostics toggle

The live collect form exposes a user-facing checkbox for extended diagnostics.

- default collection prioritizes inventory completeness and bounded runtime
- when extended diagnostics is off, heavy HGX component-chassis critical plan-B retries
  (`Assembly`, `Accelerators`, `Drives`, `NetworkAdapters`, `PCIeDevices`) are skipped
- when extended diagnostics is on, those retries are allowed and extra debug payloads are collected

This toggle is intended for operator-driven deep diagnostics on problematic hosts, not for the default path.
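
The toggle behavior described above can be sketched as a small decision function. This is not the collector's actual code: `planBDecision`, `decidePlanB`, and `heavyHGXBranches` are hypothetical names, and only the branch list and the on/off semantics are taken from the text.

```go
package main

import "fmt"

// heavyHGXBranches lists the component inventory branches named in the docs.
var heavyHGXBranches = []string{
	"Assembly", "Accelerators", "Drives", "NetworkAdapters", "PCIeDevices",
}

// planBDecision is a hypothetical result type describing what the
// collector would do for a given toggle state.
type planBDecision struct {
	RetryHeavyBranches []string // branches eligible for critical plan-B retries
	KeepDebugPayloads  bool     // whether extra debug payloads are kept
}

// decidePlanB maps the debug_payloads flag to a decision: off skips the
// heavy retries, on allows them and keeps the extra payloads.
func decidePlanB(debugPayloads bool) planBDecision {
	if !debugPayloads {
		return planBDecision{}
	}
	return planBDecision{
		// Copy the slice so callers cannot mutate the shared list.
		RetryHeavyBranches: append([]string(nil), heavyHGXBranches...),
		KeepDebugPayloads:  true,
	}
}

func main() {
	for _, debug := range []bool{false, true} {
		d := decidePlanB(debug)
		fmt.Printf("debug_payloads=%v retries=%d keep_payloads=%v\n",
			debug, len(d.RetryHeavyBranches), d.KeepDebugPayloads)
	}
}
```

Keeping the decision in one pure function makes the default path easy to verify: with the flag off, no heavy branch is ever scheduled for a retry.
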

### Discovery model

The collector does not rely on one fixed vendor tree.

@@ -1120,3 +1120,20 @@ incomplete for UI and Reanimator consumers.
- System firmware such as BIOS and iBMC versions survives xFusion file exports.
- xFusion archives participate more reliably in canonical device/export flows without special UI
  cases.

---

## ADL-043 — Extended HGX diagnostic plan-B is opt-in from the live collect form

**Date:** 2026-04-13
**Context:** Some Supermicro HGX Redfish targets expose slow or hanging component-chassis inventory
collections during critical plan-B, especially under `Chassis/HGX_*` for `Assembly`,
`Accelerators`, `Drives`, `NetworkAdapters`, and `PCIeDevices`. Default collection should not
block operators on deep diagnostic retries that are useful mainly for troubleshooting.
**Decision:** Keep the normal snapshot/replay path unchanged, but gate those heavy HGX
component-chassis critical plan-B retries behind the existing live-collect `debug_payloads` flag,
presented in the UI as "Сбор расширенных данных для диагностики" ("Collect extended diagnostic data").
**Consequences:**
- Default live collection skips those heavy diagnostic plan-B retries and reaches replay faster.
- Operators can explicitly opt into the slower diagnostic path when they need deeper collection.
- The same user-facing toggle continues to enable extra debug payload capture for troubleshooting.
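
To make the scope of the decision concrete, the following Go sketch classifies a Redfish path as one of the heavy HGX component-chassis branches. It is an assumption-laden illustration: `isHeavyHGXBranch` is a hypothetical helper, the chassis IDs in `main` are example values, and treating resources below a branch collection as "heavy" too is a choice made here, not something the ADL states. Only the `Chassis/HGX_*` pattern and the five branch names come from the text.

```go
package main

import (
	"fmt"
	"strings"
)

// chassisPrefix is the standard Redfish chassis collection root.
const chassisPrefix = "/redfish/v1/Chassis/"

// heavyBranches holds the five branch names listed in the ADL entry.
var heavyBranches = map[string]bool{
	"Assembly":        true,
	"Accelerators":    true,
	"Drives":          true,
	"NetworkAdapters": true,
	"PCIeDevices":     true,
}

// isHeavyHGXBranch reports whether path points at, or below, one of the
// heavy branches of a chassis whose ID starts with "HGX_".
func isHeavyHGXBranch(path string) bool {
	path = strings.TrimSuffix(path, "/")
	if !strings.HasPrefix(path, chassisPrefix) {
		return false
	}
	parts := strings.Split(strings.TrimPrefix(path, chassisPrefix), "/")
	if len(parts) < 2 {
		return false // the chassis resource itself, not a branch
	}
	return strings.HasPrefix(parts[0], "HGX_") && heavyBranches[parts[1]]
}

func main() {
	for _, p := range []string{
		"/redfish/v1/Chassis/HGX_GPU_SXM_1/Assembly",
		"/redfish/v1/Chassis/HGX_GPU_SXM_1/Sensors",
		"/redfish/v1/Chassis/1/Drives",
	} {
		fmt.Printf("%s heavy=%v\n", p, isHeavyHGXBranch(p))
	}
}
```

A collector could combine such a check with the `debug_payloads` flag so that only matching paths are excluded from critical plan-B retries on the default path.
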