Compare commits
12 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
1162ccd22e | ||
|
|
3887df6547 | ||
|
|
a82fb227e5 | ||
| c9969fc3da | |||
| 89b6701f43 | |||
| b04877549a | |||
| 8ca173c99b | |||
| f19a3454fa | |||
|
|
becdca1d7e | ||
|
|
e10440ae32 | ||
| 5c2a21aff1 | |||
|
|
9df13327aa |
@@ -58,6 +58,7 @@ Responses:
|
||||
|
||||
Optional request field:
|
||||
- `power_on_if_host_off`: when `true`, Redfish collection may power on the host before collection if preflight found it powered off
|
||||
- `debug_payloads`: when `true`, collector keeps extra diagnostic payloads and enables extended plan-B retries for slow HGX component inventory branches (`Assembly`, `Accelerators`, `Drives`, `NetworkAdapters`, `PCIeDevices`)
|
||||
|
||||
### `POST /api/collect/probe`
|
||||
|
||||
|
||||
@@ -27,6 +27,7 @@ Request fields passed from the server:
|
||||
- credential field (`password` or token)
|
||||
- `tls_mode`
|
||||
- optional `power_on_if_host_off`
|
||||
- optional `debug_payloads` for extended diagnostics
|
||||
|
||||
### Core rule
|
||||
|
||||
@@ -35,18 +36,38 @@ If the collector adds a fallback, probe, or normalization rule, replay must mirr
|
||||
|
||||
### Preflight and host power
|
||||
|
||||
- `Probe()` may be used before collection to verify API connectivity and current host `PowerState`
|
||||
- if the host is off and the user chose power-on, the collector may issue `ComputerSystem.Reset`
|
||||
with `ResetType=On`
|
||||
- power-on attempts are bounded and logged
|
||||
- after a successful power-on, the collector waits an extra stabilization window, then checks
|
||||
`PowerState` again and only starts collection if the host is still on
|
||||
- if the collector powered on the host itself for collection, it must attempt to power it back off
|
||||
after collection completes
|
||||
- if the host was already on before collection, the collector must not power it off afterward
|
||||
- if power-on fails, collection still continues against the powered-off host
|
||||
- all power-control decisions and attempts must be visible in the collection log so they are
|
||||
preserved in raw-export bundles
|
||||
- `Probe()` is used before collection to verify API connectivity and report current host `PowerState`
|
||||
- if the host is off, the collector logs a warning and proceeds with collection; inventory data may
|
||||
be incomplete when the host is powered off
|
||||
- power-on and power-off are not performed by the collector
|
||||
|
||||
### Skip hung requests
|
||||
|
||||
Redfish collection uses a two-level context model:
|
||||
|
||||
- `ctx` — job lifetime context, cancelled only on explicit job cancel
|
||||
- `collectCtx` — collection phase context, derived from `ctx`; covers snapshot, prefetch, and plan-B
|
||||
|
||||
`collectCtx` is cancelled when the user presses "Пропустить зависшие" (skip hung).
|
||||
On skip, all in-flight HTTP requests in the current phase are aborted immediately via context
|
||||
cancellation, the crawler and plan-B loops exit, and execution proceeds to the replay phase using
|
||||
whatever was collected in `rawTree`. The result is partial but valid.
|
||||
|
||||
The skip signal travels: UI button → `POST /api/collect/{id}/skip` → `JobManager.SkipJob()` →
|
||||
closes `skipCh` → goroutine in `Collect()` → `cancelCollect()`.
|
||||
|
||||
The skip button is visible during `running` state and hidden once the job reaches a terminal state.
|
||||
|
||||
### Extended diagnostics toggle
|
||||
|
||||
The live collect form exposes a user-facing checkbox for extended diagnostics.
|
||||
|
||||
- default collection prioritizes inventory completeness and bounded runtime
|
||||
- when extended diagnostics is off, heavy HGX component-chassis critical plan-B retries
|
||||
(`Assembly`, `Accelerators`, `Drives`, `NetworkAdapters`, `PCIeDevices`) are skipped
|
||||
- when extended diagnostics is on, those retries are allowed and extra debug payloads are collected
|
||||
|
||||
This toggle is intended for operator-driven deep diagnostics on problematic hosts, not for the default path.
|
||||
|
||||
### Discovery model
|
||||
|
||||
|
||||
@@ -55,6 +55,7 @@ When `vendor_id` and `device_id` are known but the model name is missing or gene
|
||||
| `h3c_g6` | H3C SDS G6 bundles | Similar flow with G6-specific files |
|
||||
| `hpe_ilo_ahs` | HPE iLO Active Health System (`.ahs`) | Proprietary `ABJR` container with gzip-compressed `zbb` members; parser combines SMBIOS-style inventory strings and embedded Redfish storage JSON |
|
||||
| `inspur` | onekeylog archives | FRU/SDR plus optional Redis enrichment |
|
||||
| `lenovo_xcc` | Lenovo XCC mini-log ZIP archives | JSON inventory + platform event logs |
|
||||
| `nvidia` | HGX Field Diagnostics | GPU- and fabric-heavy diagnostic input |
|
||||
| `nvidia_bug_report` | `nvidia-bug-report-*.log.gz` | dmidecode, lspci, NVIDIA driver sections |
|
||||
| `unraid` | Unraid diagnostics/log bundles | Server and storage-focused parsing |
|
||||
@@ -194,6 +195,7 @@ and `LogDump/` trees.
|
||||
| Reanimator Easy Bee | `easy_bee` | Ready | `bee-support-*.tar.gz` support bundles |
|
||||
| HPE iLO AHS | `hpe_ilo_ahs` | Ready | iLO 6 `.ahs` exports |
|
||||
| Inspur / Kaytus | `inspur` | Ready | KR4268X2 onekeylog |
|
||||
| Lenovo XCC mini-log | `lenovo_xcc` | Ready | ThinkSystem SR650 V3 XCC mini-log ZIP |
|
||||
| NVIDIA HGX Field Diag | `nvidia` | Ready | Various HGX servers |
|
||||
| NVIDIA Bug Report | `nvidia_bug_report` | Ready | H100 systems |
|
||||
| Unraid | `unraid` | Ready | Unraid diagnostics archives |
|
||||
|
||||
@@ -57,6 +57,11 @@ Current behavior:
|
||||
7. Packages any already-present binaries from `bin/`
|
||||
8. Generates `SHA256SUMS.txt`
|
||||
|
||||
Release tag format:
|
||||
- project release tags use `vN.M`
|
||||
- do not create `vN.M.P` tags for LOGPile releases
|
||||
- release artifacts and `main.version` inherit the exact git tag string
|
||||
|
||||
Important limitation:
|
||||
- `scripts/release.sh` does not run `make build-all` for you
|
||||
- if you want Linux or additional macOS archives in the release directory, build them before running the script
|
||||
|
||||
@@ -1120,3 +1120,37 @@ incomplete for UI and Reanimator consumers.
|
||||
- System firmware such as BIOS and iBMC versions survives xFusion file exports.
|
||||
- xFusion archives participate more reliably in canonical device/export flows without special UI
|
||||
cases.
|
||||
|
||||
---
|
||||
|
||||
## ADL-043 — Extended HGX diagnostic plan-B is opt-in from the live collect form
|
||||
|
||||
**Date:** 2026-04-13
|
||||
**Context:** Some Supermicro HGX Redfish targets expose slow or hanging component-chassis inventory
|
||||
collections during critical plan-B, especially under `Chassis/HGX_*` for `Assembly`,
|
||||
`Accelerators`, `Drives`, `NetworkAdapters`, and `PCIeDevices`. Default collection should not
|
||||
block operators on deep diagnostic retries that are useful mainly for troubleshooting.
|
||||
**Decision:** Keep the normal snapshot/replay path unchanged, but gate those heavy HGX
|
||||
component-chassis critical plan-B retries behind the existing live-collect `debug_payloads` flag,
|
||||
presented in the UI as "Сбор расширенных данных для диагностики".
|
||||
**Consequences:**
|
||||
- Default live collection skips those heavy diagnostic plan-B retries and reaches replay faster.
|
||||
- Operators can explicitly opt into the slower diagnostic path when they need deeper collection.
|
||||
- The same user-facing toggle continues to enable extra debug payload capture for troubleshooting.
|
||||
|
||||
---
|
||||
|
||||
## ADL-044 — LOGPile project release tags use `vN.M`
|
||||
|
||||
**Date:** 2026-04-13
|
||||
**Context:** The repository accumulated release tags in `vN.M.P` form, while the shared module
|
||||
versioning contract in `bible/rules/patterns/module-versioning/contract.md` standardizes version
|
||||
shape as `N.M`. Release tooling reads the git tag verbatim into build metadata and release
|
||||
artifacts, so inconsistent tag shape leaks directly into packaged versions.
|
||||
**Decision:** Use `vN.M` for LOGPile project release tags going forward. Do not create new
|
||||
`vN.M.P` tags for repository releases. Build metadata, release directory names, and release notes
|
||||
continue to inherit the exact git tag string from `git describe --tags`.
|
||||
**Consequences:**
|
||||
- Future project releases have a two-component version string such as `v1.12`.
|
||||
- Release artifacts and `--version` output stay aligned with the tag shape without extra mapping.
|
||||
- Existing historical `vN.M.P` tags remain as-is unless explicitly rewritten.
|
||||
|
||||
@@ -112,12 +112,11 @@ func (c *RedfishConnector) Probe(ctx context.Context, req Request) (*ProbeResult
|
||||
}
|
||||
powerState := redfishSystemPowerState(systemDoc)
|
||||
return &ProbeResult{
|
||||
Reachable: true,
|
||||
Protocol: "redfish",
|
||||
HostPowerState: powerState,
|
||||
HostPoweredOn: isRedfishHostPoweredOn(powerState),
|
||||
PowerControlAvailable: redfishResetActionTarget(systemDoc) != "",
|
||||
SystemPath: primarySystem,
|
||||
Reachable: true,
|
||||
Protocol: "redfish",
|
||||
HostPowerState: powerState,
|
||||
HostPoweredOn: isRedfishHostPoweredOn(powerState),
|
||||
SystemPath: primarySystem,
|
||||
}, nil
|
||||
}
|
||||
|
||||
@@ -160,17 +159,6 @@ func (c *RedfishConnector) Collect(ctx context.Context, req Request, emit Progre
|
||||
|
||||
systemPaths := c.discoverMemberPaths(discoveryCtx, snapshotClient, req, baseURL, "/redfish/v1/Systems", "/redfish/v1/Systems/1")
|
||||
primarySystem := firstNonEmptyPath(systemPaths, "/redfish/v1/Systems/1")
|
||||
if primarySystem != "" {
|
||||
c.ensureHostPowerForCollection(ctx, snapshotClient, req, baseURL, primarySystem, emit)
|
||||
}
|
||||
defer func() {
|
||||
if primarySystem == "" || !req.StopHostAfterCollect {
|
||||
return
|
||||
}
|
||||
shutdownCtx, cancel := context.WithTimeout(context.Background(), 45*time.Second)
|
||||
defer cancel()
|
||||
c.restoreHostPowerAfterCollection(shutdownCtx, snapshotClient, req, baseURL, primarySystem, emit)
|
||||
}()
|
||||
chassisPaths := c.discoverMemberPaths(discoveryCtx, snapshotClient, req, baseURL, "/redfish/v1/Chassis", "/redfish/v1/Chassis/1")
|
||||
managerPaths := c.discoverMemberPaths(discoveryCtx, snapshotClient, req, baseURL, "/redfish/v1/Managers", "/redfish/v1/Managers/1")
|
||||
primaryChassis := firstNonEmptyPath(chassisPaths, "/redfish/v1/Chassis/1")
|
||||
@@ -269,12 +257,35 @@ func (c *RedfishConnector) Collect(ctx context.Context, req Request, emit Progre
|
||||
emit(Progress{Status: "running", Progress: 80, Message: "Redfish: подготовка расширенного snapshot...", CurrentPhase: "snapshot", ETASeconds: acquisitionPlan.Tuning.ETABaseline.SnapshotSeconds})
|
||||
emit(Progress{Status: "running", Progress: 90, Message: "Redfish: сбор расширенного snapshot...", CurrentPhase: "snapshot", ETASeconds: acquisitionPlan.Tuning.ETABaseline.SnapshotSeconds})
|
||||
}
|
||||
// collectCtx covers all data-fetching phases (snapshot, prefetch, plan-B).
|
||||
// Cancelling it via the skip signal aborts only the collection phases while
|
||||
// leaving the replay phase intact so results from already-fetched data are preserved.
|
||||
collectCtx, cancelCollect := context.WithCancel(ctx)
|
||||
defer cancelCollect()
|
||||
if req.SkipHungCh != nil {
|
||||
go func() {
|
||||
select {
|
||||
case <-req.SkipHungCh:
|
||||
if emit != nil {
|
||||
emit(Progress{
|
||||
Status: "running",
|
||||
Progress: 97,
|
||||
Message: "Redfish: пропуск зависших запросов, анализ уже собранных данных...",
|
||||
})
|
||||
}
|
||||
log.Printf("redfish: skip-hung triggered, cancelling collection phases")
|
||||
cancelCollect()
|
||||
case <-ctx.Done():
|
||||
}
|
||||
}()
|
||||
}
|
||||
|
||||
c.debugSnapshotf("snapshot crawl start host=%s port=%d", req.Host, req.Port)
|
||||
rawTree, fetchErrors, postProbeMetrics, snapshotTimingSummary := c.collectRawRedfishTree(withRedfishTelemetryPhase(ctx, "snapshot"), snapshotClient, req, baseURL, seedPaths, acquisitionPlan.Tuning, emit)
|
||||
rawTree, fetchErrors, postProbeMetrics, snapshotTimingSummary := c.collectRawRedfishTree(withRedfishTelemetryPhase(collectCtx, "snapshot"), snapshotClient, req, baseURL, seedPaths, acquisitionPlan.Tuning, emit)
|
||||
c.debugSnapshotf("snapshot crawl done docs=%d", len(rawTree))
|
||||
fetchErrMap := redfishFetchErrorListToMap(fetchErrors)
|
||||
|
||||
prefetchedCritical, prefetchMetrics := c.prefetchCriticalRedfishDocs(withRedfishTelemetryPhase(ctx, "prefetch"), prefetchClient, req, baseURL, criticalPaths, rawTree, fetchErrMap, acquisitionPlan.Tuning, emit)
|
||||
prefetchedCritical, prefetchMetrics := c.prefetchCriticalRedfishDocs(withRedfishTelemetryPhase(collectCtx, "prefetch"), prefetchClient, req, baseURL, criticalPaths, rawTree, fetchErrMap, acquisitionPlan.Tuning, emit)
|
||||
for p, doc := range prefetchedCritical {
|
||||
if _, exists := rawTree[p]; exists {
|
||||
continue
|
||||
@@ -295,10 +306,10 @@ func (c *RedfishConnector) Collect(ctx context.Context, req Request, emit Progre
|
||||
prefetchMetrics.Duration.Round(time.Millisecond),
|
||||
firstNonEmpty(prefetchMetrics.SkipReason, "-"),
|
||||
)
|
||||
if recoveredN := c.recoverCriticalRedfishDocsPlanB(withRedfishTelemetryPhase(ctx, "critical_plan_b"), criticalClient, req, baseURL, criticalPaths, rawTree, fetchErrMap, acquisitionPlan.Tuning, emit); recoveredN > 0 {
|
||||
if recoveredN := c.recoverCriticalRedfishDocsPlanB(withRedfishTelemetryPhase(collectCtx, "critical_plan_b"), criticalClient, req, baseURL, criticalPaths, rawTree, fetchErrMap, acquisitionPlan.Tuning, emit); recoveredN > 0 {
|
||||
c.debugSnapshotf("critical plan-b recovered docs=%d", recoveredN)
|
||||
}
|
||||
if recoveredN := c.recoverProfilePlanBDocs(withRedfishTelemetryPhase(ctx, "profile_plan_b"), criticalClient, req, baseURL, acquisitionPlan, rawTree, emit); recoveredN > 0 {
|
||||
if recoveredN := c.recoverProfilePlanBDocs(withRedfishTelemetryPhase(collectCtx, "profile_plan_b"), criticalClient, req, baseURL, acquisitionPlan, rawTree, emit); recoveredN > 0 {
|
||||
c.debugSnapshotf("profile plan-b recovered docs=%d", recoveredN)
|
||||
}
|
||||
// Hide transient fetch errors for endpoints that were eventually recovered into rawTree.
|
||||
@@ -334,8 +345,9 @@ func (c *RedfishConnector) Collect(ctx context.Context, req Request, emit Progre
|
||||
"manager_critical_suffixes": acquisitionPlan.ScopedPaths.ManagerCriticalSuffixes,
|
||||
},
|
||||
"tuning": map[string]any{
|
||||
"snapshot_max_documents": acquisitionPlan.Tuning.SnapshotMaxDocuments,
|
||||
"snapshot_workers": acquisitionPlan.Tuning.SnapshotWorkers,
|
||||
"snapshot_max_documents": acquisitionPlan.Tuning.SnapshotMaxDocuments,
|
||||
"snapshot_workers": acquisitionPlan.Tuning.SnapshotWorkers,
|
||||
"snapshot_exclude_contains": acquisitionPlan.Tuning.SnapshotExcludeContains,
|
||||
"prefetch_workers": acquisitionPlan.Tuning.PrefetchWorkers,
|
||||
"prefetch_enabled": boolPointerValue(acquisitionPlan.Tuning.PrefetchEnabled),
|
||||
"nvme_post_probe": boolPointerValue(acquisitionPlan.Tuning.NVMePostProbeEnabled),
|
||||
@@ -485,231 +497,6 @@ func (c *RedfishConnector) Collect(ctx context.Context, req Request, emit Progre
|
||||
return result, nil
|
||||
}
|
||||
|
||||
func (c *RedfishConnector) ensureHostPowerForCollection(ctx context.Context, client *http.Client, req Request, baseURL, systemPath string, emit ProgressFn) (hostOn bool, poweredOnByCollector bool) {
|
||||
systemDoc, err := c.getJSON(ctx, client, req, baseURL, systemPath)
|
||||
if err != nil {
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 18, Message: "Redfish: не удалось проверить PowerState host, сбор продолжается без power-control"})
|
||||
}
|
||||
return false, false
|
||||
}
|
||||
|
||||
powerState := redfishSystemPowerState(systemDoc)
|
||||
if isRedfishHostPoweredOn(powerState) {
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 18, Message: fmt.Sprintf("Redfish: host включен (%s)", firstNonEmpty(powerState, "On"))})
|
||||
}
|
||||
return true, false
|
||||
}
|
||||
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 18, Message: fmt.Sprintf("Redfish: host выключен (%s)", firstNonEmpty(powerState, "Off"))})
|
||||
}
|
||||
if !req.PowerOnIfHostOff {
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 19, Message: "Redfish: включение host не запрошено, сбор продолжается на выключенном host"})
|
||||
}
|
||||
return false, false
|
||||
}
|
||||
|
||||
// Invalidate all inventory CRC groups before powering on so the BMC accepts
|
||||
// fresh inventory from the host after boot. Best-effort: failure is logged but
|
||||
// does not block power-on.
|
||||
c.invalidateRedfishInventory(ctx, client, req, baseURL, systemPath, emit)
|
||||
|
||||
resetTarget := redfishResetActionTarget(systemDoc)
|
||||
resetType := redfishPickResetType(systemDoc, "On", "ForceOn")
|
||||
if resetTarget == "" || resetType == "" {
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 19, Message: "Redfish: action ComputerSystem.Reset недоступен, сбор продолжается на выключенном host"})
|
||||
}
|
||||
return false, false
|
||||
}
|
||||
|
||||
waitWindows := []time.Duration{5 * time.Second, 10 * time.Second, 30 * time.Second}
|
||||
for i, waitFor := range waitWindows {
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 19, Message: fmt.Sprintf("Redfish: попытка включения host (%d/%d), ожидание %s", i+1, len(waitWindows), waitFor)})
|
||||
}
|
||||
if err := c.postJSON(ctx, client, req, baseURL, resetTarget, map[string]any{"ResetType": resetType}); err != nil {
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 19, Message: fmt.Sprintf("Redfish: включение host не удалось (%v)", err)})
|
||||
}
|
||||
continue
|
||||
}
|
||||
if c.waitForHostPowerState(ctx, client, req, baseURL, systemPath, true, waitFor) {
|
||||
if !c.waitForStablePoweredOnHost(ctx, client, req, baseURL, systemPath, emit) {
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 20, Message: "Redfish: host включился, но не подтвердил стабильное состояние; сбор продолжается на выключенном host"})
|
||||
}
|
||||
return false, false
|
||||
}
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 20, Message: "Redfish: host успешно включен и стабилен перед сбором"})
|
||||
}
|
||||
return true, true
|
||||
}
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 20, Message: fmt.Sprintf("Redfish: host не включился за %s", waitFor)})
|
||||
}
|
||||
}
|
||||
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 20, Message: "Redfish: host не удалось включить, сбор продолжается на выключенном host"})
|
||||
}
|
||||
return false, false
|
||||
}
|
||||
|
||||
func (c *RedfishConnector) waitForStablePoweredOnHost(ctx context.Context, client *http.Client, req Request, baseURL, systemPath string, emit ProgressFn) bool {
|
||||
stabilizationDelay := redfishPowerOnStabilizationDelay()
|
||||
if stabilizationDelay > 0 {
|
||||
if emit != nil {
|
||||
emit(Progress{
|
||||
Status: "running",
|
||||
Progress: 20,
|
||||
Message: fmt.Sprintf("Redfish: host включен, ожидание стабилизации %s перед началом сбора", stabilizationDelay),
|
||||
})
|
||||
}
|
||||
timer := time.NewTimer(stabilizationDelay)
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
timer.Stop()
|
||||
return false
|
||||
case <-timer.C:
|
||||
timer.Stop()
|
||||
}
|
||||
}
|
||||
if emit != nil {
|
||||
emit(Progress{
|
||||
Status: "running",
|
||||
Progress: 20,
|
||||
Message: "Redfish: повторная проверка PowerState после стабилизации host",
|
||||
})
|
||||
}
|
||||
if !c.waitForHostPowerState(ctx, client, req, baseURL, systemPath, true, 5*time.Second) {
|
||||
return false
|
||||
}
|
||||
|
||||
// After the initial stabilization wait, the BMC may still be populating its
|
||||
// hardware inventory (PCIeDevices, memory summary). Poll readiness with
|
||||
// increasing back-off (default: +60s, +120s), then warn and proceed.
|
||||
readinessWaits := redfishBMCReadinessWaits()
|
||||
for attempt, extraWait := range readinessWaits {
|
||||
ready, reason := c.isBMCInventoryReady(ctx, client, req, baseURL, systemPath)
|
||||
if ready {
|
||||
if emit != nil {
|
||||
emit(Progress{
|
||||
Status: "running",
|
||||
Progress: 20,
|
||||
Message: fmt.Sprintf("Redfish: BMC готов (%s)", reason),
|
||||
})
|
||||
}
|
||||
return true
|
||||
}
|
||||
if emit != nil {
|
||||
emit(Progress{
|
||||
Status: "running",
|
||||
Progress: 20,
|
||||
Message: fmt.Sprintf("Redfish: BMC не готов (%s), ожидание %s (попытка %d/%d)", reason, extraWait, attempt+1, len(readinessWaits)),
|
||||
})
|
||||
}
|
||||
timer := time.NewTimer(extraWait)
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
timer.Stop()
|
||||
return false
|
||||
case <-timer.C:
|
||||
timer.Stop()
|
||||
}
|
||||
if emit != nil {
|
||||
emit(Progress{
|
||||
Status: "running",
|
||||
Progress: 20,
|
||||
Message: fmt.Sprintf("Redfish: повторная проверка готовности BMC (%d/%d)...", attempt+1, len(readinessWaits)),
|
||||
})
|
||||
}
|
||||
}
|
||||
ready, reason := c.isBMCInventoryReady(ctx, client, req, baseURL, systemPath)
|
||||
if !ready {
|
||||
if emit != nil {
|
||||
emit(Progress{
|
||||
Status: "running",
|
||||
Progress: 20,
|
||||
Message: fmt.Sprintf("Redfish: WARNING — BMC не подтвердил готовность (%s), сбор может быть неполным", reason),
|
||||
})
|
||||
}
|
||||
} else if emit != nil {
|
||||
emit(Progress{
|
||||
Status: "running",
|
||||
Progress: 20,
|
||||
Message: fmt.Sprintf("Redfish: BMC готов (%s)", reason),
|
||||
})
|
||||
}
|
||||
return true
|
||||
}
|
||||
|
||||
// isBMCInventoryReady checks whether the BMC has finished populating its
|
||||
// hardware inventory after a power-on. Returns (ready, reason).
|
||||
// It considers the BMC ready if either the system memory summary reports
|
||||
// a non-zero total or the PCIeDevices collection is non-empty.
|
||||
func (c *RedfishConnector) isBMCInventoryReady(ctx context.Context, client *http.Client, req Request, baseURL, systemPath string) (bool, string) {
|
||||
systemDoc, err := c.getJSON(ctx, client, req, baseURL, systemPath)
|
||||
if err != nil {
|
||||
return false, "не удалось прочитать System"
|
||||
}
|
||||
if summary, ok := systemDoc["MemorySummary"].(map[string]interface{}); ok {
|
||||
if asFloat(summary["TotalSystemMemoryGiB"]) > 0 {
|
||||
return true, "MemorySummary заполнен"
|
||||
}
|
||||
}
|
||||
pcieDoc, err := c.getJSON(ctx, client, req, baseURL, joinPath(systemPath, "/PCIeDevices"))
|
||||
if err == nil {
|
||||
if asInt(pcieDoc["Members@odata.count"]) > 0 {
|
||||
return true, "PCIeDevices не пуст"
|
||||
}
|
||||
if members, ok := pcieDoc["Members"].([]interface{}); ok && len(members) > 0 {
|
||||
return true, "PCIeDevices не пуст"
|
||||
}
|
||||
}
|
||||
return false, "MemorySummary=0, PCIeDevices пуст"
|
||||
}
|
||||
|
||||
func (c *RedfishConnector) restoreHostPowerAfterCollection(ctx context.Context, client *http.Client, req Request, baseURL, systemPath string, emit ProgressFn) {
|
||||
systemDoc, err := c.getJSON(ctx, client, req, baseURL, systemPath)
|
||||
if err != nil {
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 100, Message: "Redfish: не удалось повторно прочитать system перед выключением host"})
|
||||
}
|
||||
return
|
||||
}
|
||||
resetTarget := redfishResetActionTarget(systemDoc)
|
||||
resetType := redfishPickResetType(systemDoc, "GracefulShutdown", "ForceOff", "PushPowerButton")
|
||||
if resetTarget == "" || resetType == "" {
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 100, Message: "Redfish: выключение host после сбора недоступно"})
|
||||
}
|
||||
return
|
||||
}
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 100, Message: "Redfish: выключаем host после завершения сбора"})
|
||||
}
|
||||
if err := c.postJSON(ctx, client, req, baseURL, resetTarget, map[string]any{"ResetType": resetType}); err != nil {
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 100, Message: fmt.Sprintf("Redfish: не удалось выключить host после сбора (%v)", err)})
|
||||
}
|
||||
return
|
||||
}
|
||||
if c.waitForHostPowerState(ctx, client, req, baseURL, systemPath, false, 20*time.Second) {
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 100, Message: "Redfish: host выключен после завершения сбора"})
|
||||
}
|
||||
return
|
||||
}
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 100, Message: "Redfish: не удалось подтвердить выключение host после сбора"})
|
||||
}
|
||||
}
|
||||
|
||||
// collectDebugPayloads fetches vendor-specific diagnostic endpoints on a best-effort basis.
|
||||
// Results are stored in rawPayloads["redfish_debug_payloads"] and exported with the bundle.
|
||||
// Enabled only when Request.DebugPayloads is true.
|
||||
@@ -724,50 +511,6 @@ func (c *RedfishConnector) collectDebugPayloads(ctx context.Context, client *htt
|
||||
return out
|
||||
}
|
||||
|
||||
// invalidateRedfishInventory POSTs to the AMI/MSI InventoryCrc endpoint to zero out
|
||||
// all known CRC groups before a host power-on. This causes the BMC to accept fresh
|
||||
// inventory from the host after boot, preventing stale inventory (ghost GPUs, wrong
|
||||
// BIOS version, etc.) from persisting across hardware changes.
|
||||
// Best-effort: any error is logged and the call silently returns.
|
||||
func (c *RedfishConnector) invalidateRedfishInventory(ctx context.Context, client *http.Client, req Request, baseURL, systemPath string, emit ProgressFn) {
|
||||
crcPath := joinPath(systemPath, "/Oem/Ami/Inventory/Crc")
|
||||
body := map[string]any{
|
||||
"GroupCrcList": []map[string]any{
|
||||
{"CPU": 0},
|
||||
{"DIMM": 0},
|
||||
{"PCIE": 0},
|
||||
},
|
||||
}
|
||||
if err := c.postJSON(ctx, client, req, baseURL, crcPath, body); err != nil {
|
||||
log.Printf("redfish: inventory invalidation skipped (not AMI/MSI or endpoint unavailable): %v", err)
|
||||
return
|
||||
}
|
||||
log.Printf("redfish: inventory CRC groups invalidated at %s before host power-on", crcPath)
|
||||
if emit != nil {
|
||||
emit(Progress{Status: "running", Progress: 19, Message: "Redfish: инвентарь BMC инвалидирован перед включением host (все CRC группы сброшены)"})
|
||||
}
|
||||
}
|
||||
|
||||
func (c *RedfishConnector) waitForHostPowerState(ctx context.Context, client *http.Client, req Request, baseURL, systemPath string, wantOn bool, timeout time.Duration) bool {
|
||||
deadline := time.Now().Add(timeout)
|
||||
for {
|
||||
systemDoc, err := c.getJSON(ctx, client, req, baseURL, systemPath)
|
||||
if err == nil {
|
||||
if isRedfishHostPoweredOn(redfishSystemPowerState(systemDoc)) == wantOn {
|
||||
return true
|
||||
}
|
||||
}
|
||||
if time.Now().After(deadline) {
|
||||
return false
|
||||
}
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
return false
|
||||
case <-time.After(1 * time.Second):
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func firstNonEmptyPath(paths []string, fallback string) string {
|
||||
for _, p := range paths {
|
||||
if strings.TrimSpace(p) != "" {
|
||||
@@ -799,49 +542,6 @@ func redfishSystemPowerState(systemDoc map[string]interface{}) string {
|
||||
return ""
|
||||
}
|
||||
|
||||
func redfishResetActionTarget(systemDoc map[string]interface{}) string {
|
||||
if systemDoc == nil {
|
||||
return ""
|
||||
}
|
||||
actions, _ := systemDoc["Actions"].(map[string]interface{})
|
||||
reset, _ := actions["#ComputerSystem.Reset"].(map[string]interface{})
|
||||
target := strings.TrimSpace(asString(reset["target"]))
|
||||
if target != "" {
|
||||
return target
|
||||
}
|
||||
odataID := strings.TrimSpace(asString(systemDoc["@odata.id"]))
|
||||
if odataID == "" {
|
||||
return ""
|
||||
}
|
||||
return joinPath(odataID, "/Actions/ComputerSystem.Reset")
|
||||
}
|
||||
|
||||
func redfishPickResetType(systemDoc map[string]interface{}, preferred ...string) string {
|
||||
actions, _ := systemDoc["Actions"].(map[string]interface{})
|
||||
reset, _ := actions["#ComputerSystem.Reset"].(map[string]interface{})
|
||||
allowedRaw, _ := reset["ResetType@Redfish.AllowableValues"].([]interface{})
|
||||
if len(allowedRaw) == 0 {
|
||||
if len(preferred) > 0 {
|
||||
return preferred[0]
|
||||
}
|
||||
return ""
|
||||
}
|
||||
allowed := make([]string, 0, len(allowedRaw))
|
||||
for _, item := range allowedRaw {
|
||||
if v := strings.TrimSpace(asString(item)); v != "" {
|
||||
allowed = append(allowed, v)
|
||||
}
|
||||
}
|
||||
for _, want := range preferred {
|
||||
for _, have := range allowed {
|
||||
if strings.EqualFold(want, have) {
|
||||
return have
|
||||
}
|
||||
}
|
||||
}
|
||||
return ""
|
||||
}
|
||||
|
||||
func (c *RedfishConnector) postJSON(ctx context.Context, client *http.Client, req Request, baseURL, resourcePath string, payload map[string]any) error {
|
||||
body, err := json.Marshal(payload)
|
||||
if err != nil {
|
||||
@@ -1644,6 +1344,11 @@ func (c *RedfishConnector) collectRawRedfishTree(ctx context.Context, client *ht
|
||||
if !shouldCrawlPath(path) {
|
||||
return
|
||||
}
|
||||
for _, pattern := range tuning.SnapshotExcludeContains {
|
||||
if pattern != "" && strings.Contains(path, pattern) {
|
||||
return
|
||||
}
|
||||
}
|
||||
mu.Lock()
|
||||
if len(seen) >= maxDocuments {
|
||||
mu.Unlock()
|
||||
@@ -2597,34 +2302,6 @@ func redfishCriticalSlowGap() time.Duration {
|
||||
return 1200 * time.Millisecond
|
||||
}
|
||||
|
||||
func redfishPowerOnStabilizationDelay() time.Duration {
|
||||
if v := strings.TrimSpace(os.Getenv("LOGPILE_REDFISH_POWERON_STABILIZATION")); v != "" {
|
||||
if d, err := time.ParseDuration(v); err == nil && d >= 0 {
|
||||
return d
|
||||
}
|
||||
}
|
||||
return 60 * time.Second
|
||||
}
|
||||
|
||||
// redfishBMCReadinessWaits returns the extra wait durations used when polling
|
||||
// BMC inventory readiness after power-on. Defaults: [60s, 120s].
|
||||
// Override with LOGPILE_REDFISH_BMC_READY_WAITS (comma-separated durations,
|
||||
// e.g. "60s,120s").
|
||||
func redfishBMCReadinessWaits() []time.Duration {
|
||||
if v := strings.TrimSpace(os.Getenv("LOGPILE_REDFISH_BMC_READY_WAITS")); v != "" {
|
||||
var out []time.Duration
|
||||
for _, part := range strings.Split(v, ",") {
|
||||
if d, err := time.ParseDuration(strings.TrimSpace(part)); err == nil && d >= 0 {
|
||||
out = append(out, d)
|
||||
}
|
||||
}
|
||||
if len(out) > 0 {
|
||||
return out
|
||||
}
|
||||
}
|
||||
return []time.Duration{60 * time.Second, 120 * time.Second}
|
||||
}
|
||||
|
||||
func redfishSnapshotMemoryRequestTimeout() time.Duration {
|
||||
if v := strings.TrimSpace(os.Getenv("LOGPILE_REDFISH_MEMORY_TIMEOUT")); v != "" {
|
||||
if d, err := time.ParseDuration(v); err == nil && d > 0 {
|
||||
@@ -3203,11 +2880,16 @@ func (c *RedfishConnector) recoverCriticalRedfishDocsPlanB(ctx context.Context,
|
||||
timings := newRedfishPathTimingCollector(4)
|
||||
var targets []string
|
||||
seenTargets := make(map[string]struct{})
|
||||
skippedDiagnosticTargets := 0
|
||||
addTarget := func(path string) {
|
||||
path = normalizeRedfishPath(path)
|
||||
if path == "" {
|
||||
return
|
||||
}
|
||||
if !shouldIncludeCriticalPlanBPath(req, path) {
|
||||
skippedDiagnosticTargets++
|
||||
return
|
||||
}
|
||||
if _, ok := seenTargets[path]; ok {
|
||||
return
|
||||
}
|
||||
@@ -3293,6 +2975,13 @@ func (c *RedfishConnector) recoverCriticalRedfishDocsPlanB(ctx context.Context,
|
||||
return 0
|
||||
}
|
||||
if emit != nil {
|
||||
if skippedDiagnosticTargets > 0 {
|
||||
emit(Progress{
|
||||
Status: "running",
|
||||
Progress: 97,
|
||||
Message: fmt.Sprintf("Redfish: расширенная диагностика выключена, пропущено %d тяжелых diagnostic endpoint", skippedDiagnosticTargets),
|
||||
})
|
||||
}
|
||||
totalETA := redfishCriticalCooldown() + estimatePlanBETA(len(targets))
|
||||
emit(Progress{
|
||||
Status: "running",
|
||||
@@ -3398,6 +3087,39 @@ func (c *RedfishConnector) recoverCriticalRedfishDocsPlanB(ctx context.Context,
|
||||
return recovered
|
||||
}
|
||||
|
||||
func shouldIncludeCriticalPlanBPath(req Request, path string) bool {
|
||||
if req.DebugPayloads {
|
||||
return true
|
||||
}
|
||||
return !isExtendedDiagnosticCriticalPlanBPath(path)
|
||||
}
|
||||
|
||||
func isExtendedDiagnosticCriticalPlanBPath(path string) bool {
|
||||
path = normalizeRedfishPath(path)
|
||||
if path == "" {
|
||||
return false
|
||||
}
|
||||
parts := strings.Split(strings.Trim(path, "/"), "/")
|
||||
if len(parts) < 5 || parts[0] != "redfish" || parts[1] != "v1" || parts[2] != "Chassis" {
|
||||
return false
|
||||
}
|
||||
if !strings.HasPrefix(parts[3], "HGX_") {
|
||||
return false
|
||||
}
|
||||
for _, suffix := range []string{
|
||||
"/Accelerators",
|
||||
"/Assembly",
|
||||
"/Drives",
|
||||
"/NetworkAdapters",
|
||||
"/PCIeDevices",
|
||||
} {
|
||||
if strings.HasSuffix(path, suffix) {
|
||||
return true
|
||||
}
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
func (c *RedfishConnector) recoverProfilePlanBDocs(ctx context.Context, client *http.Client, req Request, baseURL string, plan redfishprofile.AcquisitionPlan, rawTree map[string]interface{}, emit ProgressFn) int {
|
||||
if len(plan.PlanBPaths) == 0 || plan.Mode == redfishprofile.ModeFallback || !plan.Tuning.RecoveryPolicy.EnableProfilePlanB {
|
||||
return 0
|
||||
@@ -3917,7 +3639,7 @@ func parseNIC(doc map[string]interface{}) models.NetworkAdapter {
|
||||
}
|
||||
if pcieIf, ok := ctrl["PCIeInterface"].(map[string]interface{}); ok && linkWidth == 0 && maxLinkWidth == 0 && linkSpeed == "" && maxLinkSpeed == "" {
|
||||
linkWidth = asInt(pcieIf["LanesInUse"])
|
||||
maxLinkWidth = asInt(pcieIf["MaxLanes"])
|
||||
maxLinkWidth = firstNonZeroInt(asInt(pcieIf["MaxLanes"]), asInt(pcieIf["Maxlanes"]))
|
||||
linkSpeed = firstNonEmpty(asString(pcieIf["PCIeType"]), asString(pcieIf["CurrentLinkSpeedGTs"]), asString(pcieIf["CurrentLinkSpeed"]))
|
||||
maxLinkSpeed = firstNonEmpty(asString(pcieIf["MaxPCIeType"]), asString(pcieIf["MaxLinkSpeedGTs"]), asString(pcieIf["MaxLinkSpeed"]))
|
||||
}
|
||||
@@ -4030,6 +3752,9 @@ func enrichNICFromPCIe(nic *models.NetworkAdapter, pcieDoc map[string]interface{
|
||||
if strings.TrimSpace(nic.MaxLinkSpeed) == "" {
|
||||
nic.MaxLinkSpeed = firstNonEmpty(asString(pcieDoc["MaxLinkSpeedGTs"]), asString(pcieDoc["MaxLinkSpeed"]))
|
||||
}
|
||||
if nic.LinkWidth == 0 || nic.MaxLinkWidth == 0 || nic.LinkSpeed == "" || nic.MaxLinkSpeed == "" {
|
||||
redfishEnrichFromOEMxFusionPCIeLink(pcieDoc, &nic.LinkWidth, &nic.MaxLinkWidth, &nic.LinkSpeed, &nic.MaxLinkSpeed)
|
||||
}
|
||||
if normalizeRedfishIdentityField(nic.SerialNumber) == "" {
|
||||
nic.SerialNumber = findFirstNormalizedStringByKeys(pcieDoc, "SerialNumber")
|
||||
}
|
||||
@@ -4061,6 +3786,9 @@ func enrichNICFromPCIe(nic *models.NetworkAdapter, pcieDoc map[string]interface{
|
||||
if strings.TrimSpace(nic.MaxLinkSpeed) == "" {
|
||||
nic.MaxLinkSpeed = firstNonEmpty(asString(fn["MaxLinkSpeedGTs"]), asString(fn["MaxLinkSpeed"]))
|
||||
}
|
||||
if nic.LinkWidth == 0 || nic.MaxLinkWidth == 0 || nic.LinkSpeed == "" || nic.MaxLinkSpeed == "" {
|
||||
redfishEnrichFromOEMxFusionPCIeLink(fn, &nic.LinkWidth, &nic.MaxLinkWidth, &nic.LinkSpeed, &nic.MaxLinkSpeed)
|
||||
}
|
||||
if normalizeRedfishIdentityField(nic.SerialNumber) == "" {
|
||||
nic.SerialNumber = findFirstNormalizedStringByKeys(fn, "SerialNumber")
|
||||
}
|
||||
@@ -4627,6 +4355,21 @@ func parseGPUWithSupplementalDocs(doc map[string]interface{}, functionDocs []map
|
||||
gpu.DeviceID = asHexOrInt(doc["DeviceId"])
|
||||
}
|
||||
|
||||
if pcieIf, ok := doc["PCIeInterface"].(map[string]interface{}); ok {
|
||||
if gpu.CurrentLinkWidth == 0 {
|
||||
gpu.CurrentLinkWidth = asInt(pcieIf["LanesInUse"])
|
||||
}
|
||||
if gpu.MaxLinkWidth == 0 {
|
||||
gpu.MaxLinkWidth = firstNonZeroInt(asInt(pcieIf["MaxLanes"]), asInt(pcieIf["Maxlanes"]))
|
||||
}
|
||||
if gpu.CurrentLinkSpeed == "" {
|
||||
gpu.CurrentLinkSpeed = firstNonEmpty(asString(pcieIf["PCIeType"]), asString(pcieIf["CurrentLinkSpeedGTs"]), asString(pcieIf["CurrentLinkSpeed"]))
|
||||
}
|
||||
if gpu.MaxLinkSpeed == "" {
|
||||
gpu.MaxLinkSpeed = firstNonEmpty(asString(pcieIf["MaxPCIeType"]), asString(pcieIf["MaxLinkSpeedGTs"]), asString(pcieIf["MaxLinkSpeed"]))
|
||||
}
|
||||
}
|
||||
|
||||
for _, fn := range functionDocs {
|
||||
if gpu.BDF == "" {
|
||||
gpu.BDF = sanitizeRedfishBDF(asString(fn["FunctionId"]))
|
||||
@@ -4649,6 +4392,9 @@ func parseGPUWithSupplementalDocs(doc map[string]interface{}, functionDocs []map
|
||||
if gpu.CurrentLinkSpeed == "" {
|
||||
gpu.CurrentLinkSpeed = firstNonEmpty(asString(fn["CurrentLinkSpeedGTs"]), asString(fn["CurrentLinkSpeed"]))
|
||||
}
|
||||
if gpu.CurrentLinkWidth == 0 || gpu.MaxLinkWidth == 0 || gpu.CurrentLinkSpeed == "" || gpu.MaxLinkSpeed == "" {
|
||||
redfishEnrichFromOEMxFusionPCIeLink(fn, &gpu.CurrentLinkWidth, &gpu.MaxLinkWidth, &gpu.CurrentLinkSpeed, &gpu.MaxLinkSpeed)
|
||||
}
|
||||
}
|
||||
|
||||
if isMissingOrRawPCIModel(gpu.Model) {
|
||||
@@ -4709,6 +4455,9 @@ func parsePCIeDeviceWithSupplementalDocs(doc map[string]interface{}, functionDoc
|
||||
if dev.MaxLinkSpeed == "" {
|
||||
dev.MaxLinkSpeed = firstNonEmpty(asString(fn["MaxLinkSpeedGTs"]), asString(fn["MaxLinkSpeed"]))
|
||||
}
|
||||
if dev.LinkWidth == 0 || dev.MaxLinkWidth == 0 || dev.LinkSpeed == "" || dev.MaxLinkSpeed == "" {
|
||||
redfishEnrichFromOEMxFusionPCIeLink(fn, &dev.LinkWidth, &dev.MaxLinkWidth, &dev.LinkSpeed, &dev.MaxLinkSpeed)
|
||||
}
|
||||
}
|
||||
if dev.DeviceClass == "" || isGenericPCIeClassLabel(dev.DeviceClass) {
|
||||
dev.DeviceClass = firstNonEmpty(redfishFirstStringAcrossDocs(supplementalDocs, "DeviceType"), dev.DeviceClass)
|
||||
@@ -4958,6 +4707,59 @@ func buildBDFfromOemPublic(doc map[string]interface{}) string {
|
||||
return fmt.Sprintf("%04x:%02x:%02x.%x", segment, bus, dev, fn)
|
||||
}
|
||||
|
||||
// redfishEnrichFromOEMxFusionPCIeLink fills in missing PCIe link width/speed
|
||||
// from the xFusion OEM namespace. xFusion reports link width as a string like
|
||||
// "X8" in Oem.xFusion.LinkWidth / Oem.xFusion.LinkWidthAbility, and link speed
|
||||
// as a string like "Gen4 (16.0GT/s)" in Oem.xFusion.LinkSpeed /
|
||||
// Oem.xFusion.LinkSpeedAbility. These fields appear on PCIeFunction docs.
|
||||
func redfishEnrichFromOEMxFusionPCIeLink(doc map[string]interface{}, linkWidth, maxLinkWidth *int, linkSpeed, maxLinkSpeed *string) {
|
||||
oem, _ := doc["Oem"].(map[string]interface{})
|
||||
if oem == nil {
|
||||
return
|
||||
}
|
||||
xf, _ := oem["xFusion"].(map[string]interface{})
|
||||
if xf == nil {
|
||||
return
|
||||
}
|
||||
if *linkWidth == 0 {
|
||||
*linkWidth = parseXFusionLinkWidth(asString(xf["LinkWidth"]))
|
||||
}
|
||||
if *maxLinkWidth == 0 {
|
||||
*maxLinkWidth = parseXFusionLinkWidth(asString(xf["LinkWidthAbility"]))
|
||||
}
|
||||
if strings.TrimSpace(*linkSpeed) == "" {
|
||||
*linkSpeed = strings.TrimSpace(asString(xf["LinkSpeed"]))
|
||||
}
|
||||
if strings.TrimSpace(*maxLinkSpeed) == "" {
|
||||
*maxLinkSpeed = strings.TrimSpace(asString(xf["LinkSpeedAbility"]))
|
||||
}
|
||||
}
|
||||
|
||||
// parseXFusionLinkWidth converts an xFusion link-width string like "X8" or
|
||||
// "x16" to the integer lane count. Returns 0 for unrecognised values.
|
||||
func parseXFusionLinkWidth(s string) int {
|
||||
s = strings.TrimSpace(s)
|
||||
if s == "" {
|
||||
return 0
|
||||
}
|
||||
s = strings.TrimPrefix(strings.ToUpper(s), "X")
|
||||
v := asInt(s)
|
||||
if v <= 0 {
|
||||
return 0
|
||||
}
|
||||
return v
|
||||
}
|
||||
|
||||
// firstNonZeroInt returns the first argument that is non-zero.
|
||||
func firstNonZeroInt(vals ...int) int {
|
||||
for _, v := range vals {
|
||||
if v != 0 {
|
||||
return v
|
||||
}
|
||||
}
|
||||
return 0
|
||||
}
|
||||
|
||||
func normalizeRedfishIdentityField(v string) string {
|
||||
v = strings.TrimSpace(v)
|
||||
if v == "" {
|
||||
|
||||
@@ -50,11 +50,15 @@ func (c *RedfishConnector) collectRedfishLogEntries(ctx context.Context, client
|
||||
}
|
||||
|
||||
for _, systemPath := range systemPaths {
|
||||
collectFrom(joinPath(systemPath, "/LogServices"), isHardwareLogService)
|
||||
for _, logServicesPath := range c.redfishLinkedCollectionPaths(ctx, client, req, baseURL, systemPath, "LogServices") {
|
||||
collectFrom(logServicesPath, isHardwareLogService)
|
||||
}
|
||||
}
|
||||
// Managers hold the IPMI SEL on AMI/MSI BMCs — include only the "SEL" service.
|
||||
for _, managerPath := range managerPaths {
|
||||
collectFrom(joinPath(managerPath, "/LogServices"), isManagerSELService)
|
||||
for _, logServicesPath := range c.redfishLinkedCollectionPaths(ctx, client, req, baseURL, managerPath, "LogServices") {
|
||||
collectFrom(logServicesPath, isManagerSELService)
|
||||
}
|
||||
}
|
||||
|
||||
if len(out) > 0 {
|
||||
@@ -63,6 +67,42 @@ func (c *RedfishConnector) collectRedfishLogEntries(ctx context.Context, client
|
||||
return out
|
||||
}
|
||||
|
||||
func (c *RedfishConnector) redfishLinkedCollectionPaths(
|
||||
ctx context.Context,
|
||||
client *http.Client,
|
||||
req Request,
|
||||
baseURL, resourcePath, linkKey string,
|
||||
) []string {
|
||||
resourcePath = normalizeRedfishPath(resourcePath)
|
||||
if resourcePath == "" || strings.TrimSpace(linkKey) == "" {
|
||||
return nil
|
||||
}
|
||||
|
||||
seen := make(map[string]struct{}, 2)
|
||||
var out []string
|
||||
add := func(path string) {
|
||||
path = normalizeRedfishPath(path)
|
||||
if path == "" {
|
||||
return
|
||||
}
|
||||
if _, ok := seen[path]; ok {
|
||||
return
|
||||
}
|
||||
seen[path] = struct{}{}
|
||||
out = append(out, path)
|
||||
}
|
||||
|
||||
add(joinPath(resourcePath, "/"+strings.TrimSpace(linkKey)))
|
||||
|
||||
resourceDoc, err := c.getJSON(ctx, client, req, baseURL, resourcePath)
|
||||
if err == nil {
|
||||
if linked := redfishLinkedPath(resourceDoc, linkKey); linked != "" {
|
||||
add(linked)
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
// fetchRedfishLogEntriesWithPaging fetches entries from a LogEntry collection,
|
||||
// following nextLink pages. Stops early when entries older than cutoff are encountered
|
||||
// (assumes BMC returns entries newest-first, which is typical).
|
||||
@@ -182,7 +222,7 @@ func redfishLogServiceEntriesPath(svc map[string]interface{}) string {
|
||||
// Audit, authentication, and session events are excluded.
|
||||
func isHardwareLogEntry(entry map[string]interface{}) bool {
|
||||
entryType := strings.TrimSpace(asString(entry["EntryType"]))
|
||||
if strings.EqualFold(entryType, "Oem") {
|
||||
if strings.EqualFold(entryType, "Oem") && !strings.EqualFold(strings.TrimSpace(asString(entry["OemRecordFormat"])), "Lenovo") {
|
||||
return false
|
||||
}
|
||||
|
||||
@@ -362,6 +402,9 @@ func parseIPMIDumpKV(message string) map[string]string {
|
||||
// AMI/MSI BMCs often set Severity="OK" on all SEL records regardless of content,
|
||||
// so we fall back to inferring severity from SensorType when the explicit field is unhelpful.
|
||||
func redfishLogEntrySeverity(entry map[string]interface{}) models.Severity {
|
||||
if redfishLogEntryLooksLikeWarning(entry) {
|
||||
return models.SeverityWarning
|
||||
}
|
||||
// Newer Redfish uses MessageSeverity; older uses Severity.
|
||||
raw := strings.ToLower(firstNonEmpty(
|
||||
strings.TrimSpace(asString(entry["MessageSeverity"])),
|
||||
@@ -380,6 +423,16 @@ func redfishLogEntrySeverity(entry map[string]interface{}) models.Severity {
|
||||
}
|
||||
}
|
||||
|
||||
func redfishLogEntryLooksLikeWarning(entry map[string]interface{}) bool {
|
||||
joined := strings.ToLower(strings.TrimSpace(strings.Join([]string{
|
||||
asString(entry["Message"]),
|
||||
asString(entry["Name"]),
|
||||
asString(entry["SensorType"]),
|
||||
asString(entry["EntryCode"]),
|
||||
}, " ")))
|
||||
return strings.Contains(joined, "unqualified dimm")
|
||||
}
|
||||
|
||||
// redfishSeverityFromSensorType infers event severity from the IPMI/Redfish SensorType string.
|
||||
func redfishSeverityFromSensorType(sensorType string) models.Severity {
|
||||
switch strings.ToLower(sensorType) {
|
||||
|
||||
125
internal/collector/redfish_logentries_test.go
Normal file
125
internal/collector/redfish_logentries_test.go
Normal file
@@ -0,0 +1,125 @@
|
||||
package collector
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"git.mchus.pro/mchus/logpile/internal/models"
|
||||
)
|
||||
|
||||
func TestCollectRedfishLogEntries_UsesLinkedManagerLogServicesPath(t *testing.T) {
|
||||
mux := http.NewServeMux()
|
||||
register := func(path string, payload interface{}) {
|
||||
mux.HandleFunc(path, func(w http.ResponseWriter, r *http.Request) {
|
||||
w.Header().Set("Content-Type", "application/json")
|
||||
_ = json.NewEncoder(w).Encode(payload)
|
||||
})
|
||||
}
|
||||
|
||||
register("/redfish/v1/Managers/1", map[string]interface{}{
|
||||
"Id": "1",
|
||||
"LogServices": map[string]interface{}{
|
||||
"@odata.id": "/redfish/v1/Systems/1/LogServices",
|
||||
},
|
||||
})
|
||||
register("/redfish/v1/Systems/1/LogServices", map[string]interface{}{
|
||||
"Members": []map[string]string{
|
||||
{"@odata.id": "/redfish/v1/Systems/1/LogServices/SEL"},
|
||||
},
|
||||
})
|
||||
register("/redfish/v1/Systems/1/LogServices/SEL", map[string]interface{}{
|
||||
"Id": "SEL",
|
||||
"Entries": map[string]interface{}{
|
||||
"@odata.id": "/redfish/v1/Systems/1/LogServices/SEL/Entries",
|
||||
},
|
||||
})
|
||||
register("/redfish/v1/Systems/1/LogServices/SEL/Entries", map[string]interface{}{
|
||||
"Members": []map[string]string{
|
||||
{"@odata.id": "/redfish/v1/Systems/1/LogServices/SEL/Entries/1"},
|
||||
},
|
||||
})
|
||||
register("/redfish/v1/Systems/1/LogServices/SEL/Entries/1", map[string]interface{}{
|
||||
"Id": "1",
|
||||
"Created": time.Now().UTC().Format(time.RFC3339),
|
||||
"Message": "System found Unqualified DIMM in slot DIMM A1",
|
||||
"MessageSeverity": "OK",
|
||||
"SensorType": "Memory",
|
||||
"EntryType": "Event",
|
||||
})
|
||||
|
||||
ts := httptest.NewServer(mux)
|
||||
defer ts.Close()
|
||||
|
||||
c := NewRedfishConnector()
|
||||
got := c.collectRedfishLogEntries(context.Background(), ts.Client(), Request{
|
||||
Host: ts.URL,
|
||||
Port: 443,
|
||||
Protocol: "redfish",
|
||||
Username: "admin",
|
||||
AuthType: "password",
|
||||
Password: "secret",
|
||||
TLSMode: "strict",
|
||||
}, ts.URL, nil, []string{"/redfish/v1/Managers/1"})
|
||||
|
||||
if len(got) != 1 {
|
||||
t.Fatalf("expected 1 collected log entry, got %d", len(got))
|
||||
}
|
||||
if got[0]["Message"] != "System found Unqualified DIMM in slot DIMM A1" {
|
||||
t.Fatalf("unexpected collected message: %#v", got[0]["Message"])
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseRedfishLogEntries_UnqualifiedDIMMBecomesWarning(t *testing.T) {
|
||||
rawPayloads := map[string]any{
|
||||
"redfish_log_entries": []any{
|
||||
map[string]any{
|
||||
"Id": "sel-1",
|
||||
"Created": "2026-04-13T12:00:00Z",
|
||||
"Message": "System found Unqualified DIMM in slot DIMM A1",
|
||||
"MessageSeverity": "OK",
|
||||
"SensorType": "Memory",
|
||||
"EntryType": "Event",
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
events := parseRedfishLogEntries(rawPayloads, time.Date(2026, 4, 13, 12, 30, 0, 0, time.UTC))
|
||||
if len(events) != 1 {
|
||||
t.Fatalf("expected 1 event, got %d", len(events))
|
||||
}
|
||||
if events[0].Severity != models.SeverityWarning {
|
||||
t.Fatalf("expected warning severity, got %q", events[0].Severity)
|
||||
}
|
||||
if events[0].Description != "System found Unqualified DIMM in slot DIMM A1" {
|
||||
t.Fatalf("unexpected description: %q", events[0].Description)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseRedfishLogEntries_LenovoOEMEntryIsKept(t *testing.T) {
|
||||
rawPayloads := map[string]any{
|
||||
"redfish_log_entries": []any{
|
||||
map[string]any{
|
||||
"Id": "plat-55",
|
||||
"Created": "2026-04-13T12:00:00Z",
|
||||
"Message": "DIMM A1 is unqualified",
|
||||
"MessageSeverity": "Warning",
|
||||
"SensorType": "Memory",
|
||||
"EntryType": "Oem",
|
||||
"OemRecordFormat": "Lenovo",
|
||||
"EntryCode": "Assert",
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
events := parseRedfishLogEntries(rawPayloads, time.Date(2026, 4, 13, 12, 30, 0, 0, time.UTC))
|
||||
if len(events) != 1 {
|
||||
t.Fatalf("expected 1 Lenovo OEM event, got %d", len(events))
|
||||
}
|
||||
if events[0].Severity != models.SeverityWarning {
|
||||
t.Fatalf("expected warning severity, got %q", events[0].Severity)
|
||||
}
|
||||
}
|
||||
57
internal/collector/redfish_planb_test.go
Normal file
57
internal/collector/redfish_planb_test.go
Normal file
@@ -0,0 +1,57 @@
|
||||
package collector
|
||||
|
||||
import "testing"
|
||||
|
||||
func TestShouldIncludeCriticalPlanBPath(t *testing.T) {
|
||||
tests := []struct {
|
||||
name string
|
||||
req Request
|
||||
path string
|
||||
want bool
|
||||
}{
|
||||
{
|
||||
name: "skip hgx erot pcie without extended diagnostics",
|
||||
req: Request{},
|
||||
path: "/redfish/v1/Chassis/HGX_ERoT_NVSwitch_0/PCIeDevices",
|
||||
want: false,
|
||||
},
|
||||
{
|
||||
name: "skip hgx chassis assembly without extended diagnostics",
|
||||
req: Request{},
|
||||
path: "/redfish/v1/Chassis/HGX_Chassis_0/Assembly",
|
||||
want: false,
|
||||
},
|
||||
{
|
||||
name: "keep standard chassis inventory without extended diagnostics",
|
||||
req: Request{},
|
||||
path: "/redfish/v1/Chassis/1/PCIeDevices",
|
||||
want: true,
|
||||
},
|
||||
{
|
||||
name: "keep nvme storage backplane drives without extended diagnostics",
|
||||
req: Request{},
|
||||
path: "/redfish/v1/Chassis/NVMeSSD.0.Group.0.StorageBackplane/Drives",
|
||||
want: true,
|
||||
},
|
||||
{
|
||||
name: "keep system processors without extended diagnostics",
|
||||
req: Request{},
|
||||
path: "/redfish/v1/Systems/HGX_Baseboard_0/Processors",
|
||||
want: true,
|
||||
},
|
||||
{
|
||||
name: "include hgx erot pcie when extended diagnostics enabled",
|
||||
req: Request{DebugPayloads: true},
|
||||
path: "/redfish/v1/Chassis/HGX_ERoT_NVSwitch_0/PCIeDevices",
|
||||
want: true,
|
||||
},
|
||||
}
|
||||
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
if got := shouldIncludeCriticalPlanBPath(tt.req, tt.path); got != tt.want {
|
||||
t.Fatalf("shouldIncludeCriticalPlanBPath(%q) = %v, want %v", tt.path, got, tt.want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
@@ -265,9 +265,6 @@ func TestRedfishConnectorProbe(t *testing.T) {
|
||||
if got.HostPowerState != "Off" {
|
||||
t.Fatalf("expected power state Off, got %q", got.HostPowerState)
|
||||
}
|
||||
if !got.PowerControlAvailable {
|
||||
t.Fatalf("expected power control available")
|
||||
}
|
||||
}
|
||||
|
||||
func TestRedfishConnectorProbe_FallsBackToPowerSummary(t *testing.T) {
|
||||
@@ -330,225 +327,6 @@ func TestRedfishConnectorProbe_FallsBackToPowerSummary(t *testing.T) {
|
||||
if got.HostPowerState != "On" {
|
||||
t.Fatalf("expected power state On, got %q", got.HostPowerState)
|
||||
}
|
||||
if !got.PowerControlAvailable {
|
||||
t.Fatalf("expected power control available")
|
||||
}
|
||||
}
|
||||
|
||||
func TestEnsureHostPowerForCollection_WaitsForStablePowerOn(t *testing.T) {
|
||||
t.Setenv("LOGPILE_REDFISH_POWERON_STABILIZATION", "1ms")
|
||||
t.Setenv("LOGPILE_REDFISH_BMC_READY_WAITS", "1ms,1ms")
|
||||
|
||||
powerState := "Off"
|
||||
resetCalls := 0
|
||||
|
||||
mux := http.NewServeMux()
|
||||
mux.HandleFunc("/redfish/v1/Systems/1", func(w http.ResponseWriter, r *http.Request) {
|
||||
w.Header().Set("Content-Type", "application/json")
|
||||
_ = json.NewEncoder(w).Encode(map[string]interface{}{
|
||||
"@odata.id": "/redfish/v1/Systems/1",
|
||||
"PowerState": powerState,
|
||||
"MemorySummary": map[string]interface{}{
|
||||
"TotalSystemMemoryGiB": 128,
|
||||
},
|
||||
"Actions": map[string]interface{}{
|
||||
"#ComputerSystem.Reset": map[string]interface{}{
|
||||
"target": "/redfish/v1/Systems/1/Actions/ComputerSystem.Reset",
|
||||
"ResetType@Redfish.AllowableValues": []interface{}{"On"},
|
||||
},
|
||||
},
|
||||
})
|
||||
})
|
||||
mux.HandleFunc("/redfish/v1/Systems/1/Actions/ComputerSystem.Reset", func(w http.ResponseWriter, r *http.Request) {
|
||||
resetCalls++
|
||||
powerState = "On"
|
||||
w.WriteHeader(http.StatusOK)
|
||||
})
|
||||
|
||||
ts := httptest.NewTLSServer(mux)
|
||||
defer ts.Close()
|
||||
|
||||
u, err := url.Parse(ts.URL)
|
||||
if err != nil {
|
||||
t.Fatalf("parse server url: %v", err)
|
||||
}
|
||||
port := 443
|
||||
if u.Port() != "" {
|
||||
fmt.Sscanf(u.Port(), "%d", &port)
|
||||
}
|
||||
|
||||
c := NewRedfishConnector()
|
||||
hostOn, changed := c.ensureHostPowerForCollection(context.Background(), c.httpClientWithTimeout(Request{TLSMode: "insecure"}, 5*time.Second), Request{
|
||||
Host: u.Hostname(),
|
||||
Protocol: "redfish",
|
||||
Port: port,
|
||||
Username: "admin",
|
||||
AuthType: "password",
|
||||
Password: "secret",
|
||||
TLSMode: "insecure",
|
||||
PowerOnIfHostOff: true,
|
||||
}, ts.URL, "/redfish/v1/Systems/1", nil)
|
||||
if !hostOn || !changed {
|
||||
t.Fatalf("expected stable power-on result, got hostOn=%v changed=%v", hostOn, changed)
|
||||
}
|
||||
if resetCalls != 1 {
|
||||
t.Fatalf("expected one reset call, got %d", resetCalls)
|
||||
}
|
||||
}
|
||||
|
||||
func TestEnsureHostPowerForCollection_FailsIfHostDoesNotStayOnAfterStabilization(t *testing.T) {
|
||||
t.Setenv("LOGPILE_REDFISH_POWERON_STABILIZATION", "1ms")
|
||||
t.Setenv("LOGPILE_REDFISH_BMC_READY_WAITS", "1ms,1ms")
|
||||
|
||||
powerState := "Off"
|
||||
|
||||
mux := http.NewServeMux()
|
||||
mux.HandleFunc("/redfish/v1/Systems/1", func(w http.ResponseWriter, r *http.Request) {
|
||||
w.Header().Set("Content-Type", "application/json")
|
||||
current := powerState
|
||||
if powerState == "On" {
|
||||
powerState = "Off"
|
||||
}
|
||||
_ = json.NewEncoder(w).Encode(map[string]interface{}{
|
||||
"@odata.id": "/redfish/v1/Systems/1",
|
||||
"PowerState": current,
|
||||
"Actions": map[string]interface{}{
|
||||
"#ComputerSystem.Reset": map[string]interface{}{
|
||||
"target": "/redfish/v1/Systems/1/Actions/ComputerSystem.Reset",
|
||||
"ResetType@Redfish.AllowableValues": []interface{}{"On"},
|
||||
},
|
||||
},
|
||||
})
|
||||
})
|
||||
mux.HandleFunc("/redfish/v1/Systems/1/Actions/ComputerSystem.Reset", func(w http.ResponseWriter, r *http.Request) {
|
||||
powerState = "On"
|
||||
w.WriteHeader(http.StatusOK)
|
||||
})
|
||||
|
||||
ts := httptest.NewTLSServer(mux)
|
||||
defer ts.Close()
|
||||
|
||||
u, err := url.Parse(ts.URL)
|
||||
if err != nil {
|
||||
t.Fatalf("parse server url: %v", err)
|
||||
}
|
||||
port := 443
|
||||
if u.Port() != "" {
|
||||
fmt.Sscanf(u.Port(), "%d", &port)
|
||||
}
|
||||
|
||||
c := NewRedfishConnector()
|
||||
hostOn, changed := c.ensureHostPowerForCollection(context.Background(), c.httpClientWithTimeout(Request{TLSMode: "insecure"}, 5*time.Second), Request{
|
||||
Host: u.Hostname(),
|
||||
Protocol: "redfish",
|
||||
Port: port,
|
||||
Username: "admin",
|
||||
AuthType: "password",
|
||||
Password: "secret",
|
||||
TLSMode: "insecure",
|
||||
PowerOnIfHostOff: true,
|
||||
}, ts.URL, "/redfish/v1/Systems/1", nil)
|
||||
if hostOn || changed {
|
||||
t.Fatalf("expected unstable power-on result to fail, got hostOn=%v changed=%v", hostOn, changed)
|
||||
}
|
||||
}
|
||||
|
||||
func TestEnsureHostPowerForCollection_UsesPowerSummaryState(t *testing.T) {
|
||||
t.Setenv("LOGPILE_REDFISH_POWERON_STABILIZATION", "1ms")
|
||||
t.Setenv("LOGPILE_REDFISH_BMC_READY_WAITS", "1ms,1ms")
|
||||
|
||||
powerState := "On"
|
||||
|
||||
mux := http.NewServeMux()
|
||||
mux.HandleFunc("/redfish/v1/Systems/1", func(w http.ResponseWriter, r *http.Request) {
|
||||
w.Header().Set("Content-Type", "application/json")
|
||||
_ = json.NewEncoder(w).Encode(map[string]interface{}{
|
||||
"@odata.id": "/redfish/v1/Systems/1",
|
||||
"PowerSummary": map[string]interface{}{
|
||||
"PowerState": powerState,
|
||||
},
|
||||
"MemorySummary": map[string]interface{}{
|
||||
"TotalSystemMemoryGiB": 128,
|
||||
},
|
||||
"Actions": map[string]interface{}{
|
||||
"#ComputerSystem.Reset": map[string]interface{}{
|
||||
"target": "/redfish/v1/Systems/1/Actions/ComputerSystem.Reset",
|
||||
"ResetType@Redfish.AllowableValues": []interface{}{"On"},
|
||||
},
|
||||
},
|
||||
})
|
||||
})
|
||||
|
||||
ts := httptest.NewTLSServer(mux)
|
||||
defer ts.Close()
|
||||
|
||||
u, err := url.Parse(ts.URL)
|
||||
if err != nil {
|
||||
t.Fatalf("parse server url: %v", err)
|
||||
}
|
||||
port := 443
|
||||
if u.Port() != "" {
|
||||
fmt.Sscanf(u.Port(), "%d", &port)
|
||||
}
|
||||
|
||||
c := NewRedfishConnector()
|
||||
hostOn, changed := c.ensureHostPowerForCollection(context.Background(), c.httpClientWithTimeout(Request{TLSMode: "insecure"}, 5*time.Second), Request{
|
||||
Host: u.Hostname(),
|
||||
Protocol: "redfish",
|
||||
Port: port,
|
||||
Username: "admin",
|
||||
AuthType: "password",
|
||||
Password: "secret",
|
||||
TLSMode: "insecure",
|
||||
PowerOnIfHostOff: true,
|
||||
}, ts.URL, "/redfish/v1/Systems/1", nil)
|
||||
if !hostOn || changed {
|
||||
t.Fatalf("expected already-on host from PowerSummary, got hostOn=%v changed=%v", hostOn, changed)
|
||||
}
|
||||
}
|
||||
|
||||
func TestWaitForHostPowerState_UsesPowerSummaryState(t *testing.T) {
|
||||
powerState := "Off"
|
||||
mux := http.NewServeMux()
|
||||
mux.HandleFunc("/redfish/v1/Systems/1", func(w http.ResponseWriter, r *http.Request) {
|
||||
w.Header().Set("Content-Type", "application/json")
|
||||
current := powerState
|
||||
if powerState == "Off" {
|
||||
powerState = "On"
|
||||
}
|
||||
_ = json.NewEncoder(w).Encode(map[string]interface{}{
|
||||
"@odata.id": "/redfish/v1/Systems/1",
|
||||
"PowerSummary": map[string]interface{}{
|
||||
"PowerState": current,
|
||||
},
|
||||
})
|
||||
})
|
||||
|
||||
ts := httptest.NewTLSServer(mux)
|
||||
defer ts.Close()
|
||||
|
||||
u, err := url.Parse(ts.URL)
|
||||
if err != nil {
|
||||
t.Fatalf("parse server url: %v", err)
|
||||
}
|
||||
port := 443
|
||||
if u.Port() != "" {
|
||||
fmt.Sscanf(u.Port(), "%d", &port)
|
||||
}
|
||||
|
||||
c := NewRedfishConnector()
|
||||
ok := c.waitForHostPowerState(context.Background(), c.httpClientWithTimeout(Request{TLSMode: "insecure"}, 5*time.Second), Request{
|
||||
Host: u.Hostname(),
|
||||
Protocol: "redfish",
|
||||
Port: port,
|
||||
Username: "admin",
|
||||
AuthType: "password",
|
||||
Password: "secret",
|
||||
TLSMode: "insecure",
|
||||
}, ts.URL, "/redfish/v1/Systems/1", true, 3*time.Second)
|
||||
if !ok {
|
||||
t.Fatalf("expected waitForHostPowerState to use PowerSummary")
|
||||
}
|
||||
}
|
||||
|
||||
func TestParsePCIeDeviceSlot_FromNestedRedfishSlotLocation(t *testing.T) {
|
||||
@@ -1563,6 +1341,48 @@ func TestParseNIC_PrefersControllerSlotLabelAndPCIeInterface(t *testing.T) {
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseNIC_xFusionMaxlanesAndOEMLinkWidth(t *testing.T) {
|
||||
// xFusion uses "Maxlanes" (lowercase 'l') in PCIeInterface, not "MaxLanes".
|
||||
// xFusion also stores per-function link width as Oem.xFusion.LinkWidth = "X8".
|
||||
nic := parseNIC(map[string]interface{}{
|
||||
"Id": "OCPCard1",
|
||||
"Model": "ConnectX-6 Lx",
|
||||
"Controllers": []interface{}{
|
||||
map[string]interface{}{
|
||||
"PCIeInterface": map[string]interface{}{
|
||||
"LanesInUse": 8,
|
||||
"Maxlanes": 8, // xFusion uses lowercase 'l'
|
||||
"PCIeType": "Gen4",
|
||||
"MaxPCIeType": "Gen4",
|
||||
},
|
||||
},
|
||||
},
|
||||
})
|
||||
if nic.LinkWidth != 8 || nic.MaxLinkWidth != 8 {
|
||||
t.Fatalf("expected link widths 8/8 from xFusion Maxlanes, got current=%d max=%d", nic.LinkWidth, nic.MaxLinkWidth)
|
||||
}
|
||||
|
||||
// enrichNICFromPCIe: OEM xFusion LinkWidth on a PCIeFunction doc.
|
||||
nic2 := models.NetworkAdapter{}
|
||||
fnDoc := map[string]interface{}{
|
||||
"Oem": map[string]interface{}{
|
||||
"xFusion": map[string]interface{}{
|
||||
"LinkWidth": "X8",
|
||||
"LinkWidthAbility": "X8",
|
||||
"LinkSpeed": "Gen4 (16.0GT/s)",
|
||||
"LinkSpeedAbility": "Gen4 (16.0GT/s)",
|
||||
},
|
||||
},
|
||||
}
|
||||
enrichNICFromPCIe(&nic2, map[string]interface{}{}, []map[string]interface{}{fnDoc}, nil)
|
||||
if nic2.LinkWidth != 8 || nic2.MaxLinkWidth != 8 {
|
||||
t.Fatalf("expected link width 8 from xFusion OEM LinkWidth, got current=%d max=%d", nic2.LinkWidth, nic2.MaxLinkWidth)
|
||||
}
|
||||
if nic2.LinkSpeed != "Gen4 (16.0GT/s)" || nic2.MaxLinkSpeed != "Gen4 (16.0GT/s)" {
|
||||
t.Fatalf("expected link speed from xFusion OEM LinkSpeed, got current=%q max=%q", nic2.LinkSpeed, nic2.MaxLinkSpeed)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseNIC_DropsUnrealisticPortCount(t *testing.T) {
|
||||
nic := parseNIC(map[string]interface{}{
|
||||
"Id": "1",
|
||||
@@ -2995,6 +2815,28 @@ func TestReplayCollectGPUs_DedupUsesRedfishPathBeforeHeuristics(t *testing.T) {
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseGPU_xFusionPCIeInterfaceMaxlanes(t *testing.T) {
|
||||
// xFusion GPU PCIeDevices (PCIeCard1..N) carry link width in PCIeInterface
|
||||
// with "Maxlanes" (lowercase 'l') rather than "MaxLanes".
|
||||
doc := map[string]interface{}{
|
||||
"Id": "PCIeCard1",
|
||||
"Model": "RTX PRO 6000",
|
||||
"PCIeInterface": map[string]interface{}{
|
||||
"LanesInUse": 16,
|
||||
"Maxlanes": 16,
|
||||
"PCIeType": "Gen5",
|
||||
"MaxPCIeType": "Gen5",
|
||||
},
|
||||
}
|
||||
gpu := parseGPU(doc, nil, 1)
|
||||
if gpu.CurrentLinkWidth != 16 || gpu.MaxLinkWidth != 16 {
|
||||
t.Fatalf("expected link widths 16/16 from PCIeInterface, got current=%d max=%d", gpu.CurrentLinkWidth, gpu.MaxLinkWidth)
|
||||
}
|
||||
if gpu.CurrentLinkSpeed != "Gen5" || gpu.MaxLinkSpeed != "Gen5" {
|
||||
t.Fatalf("expected link speeds Gen5/Gen5 from PCIeInterface, got current=%q max=%q", gpu.CurrentLinkSpeed, gpu.MaxLinkSpeed)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseGPU_UsesNestedOemSerialNumber(t *testing.T) {
|
||||
doc := map[string]interface{}{
|
||||
"Id": "GPU4",
|
||||
|
||||
@@ -326,6 +326,95 @@ func TestBuildAnalysisDirectives_SupermicroEnablesStorageRecovery(t *testing.T)
|
||||
}
|
||||
}
|
||||
|
||||
func TestMatchProfiles_LenovoXCCSelectsMatchedModeAndExcludesSensors(t *testing.T) {
|
||||
match := MatchProfiles(MatchSignals{
|
||||
SystemManufacturer: "Lenovo",
|
||||
ChassisManufacturer: "Lenovo",
|
||||
OEMNamespaces: []string{"Lenovo"},
|
||||
})
|
||||
if match.Mode != ModeMatched {
|
||||
t.Fatalf("expected matched mode, got %q", match.Mode)
|
||||
}
|
||||
found := false
|
||||
for _, profile := range match.Profiles {
|
||||
if profile.Name() == "lenovo" {
|
||||
found = true
|
||||
break
|
||||
}
|
||||
}
|
||||
if !found {
|
||||
t.Fatal("expected lenovo profile to be selected")
|
||||
}
|
||||
|
||||
// Verify the acquisition plan excludes noisy Lenovo-specific snapshot paths.
|
||||
plan := BuildAcquisitionPlan(MatchSignals{
|
||||
SystemManufacturer: "Lenovo",
|
||||
ChassisManufacturer: "Lenovo",
|
||||
OEMNamespaces: []string{"Lenovo"},
|
||||
})
|
||||
wantExcluded := []string{
|
||||
"/Sensors/",
|
||||
"/Oem/Lenovo/LEDs/",
|
||||
"/Oem/Lenovo/Slots/",
|
||||
"/Oem/Lenovo/Configuration",
|
||||
"/NetworkProtocol/Oem/Lenovo/",
|
||||
"/VirtualMedia/",
|
||||
"/ThermalSubsystem/Fans/",
|
||||
}
|
||||
for _, want := range wantExcluded {
|
||||
found := false
|
||||
for _, ex := range plan.Tuning.SnapshotExcludeContains {
|
||||
if ex == want {
|
||||
found = true
|
||||
break
|
||||
}
|
||||
}
|
||||
if !found {
|
||||
t.Errorf("expected SnapshotExcludeContains to include %q, got %v", want, plan.Tuning.SnapshotExcludeContains)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestResolveAcquisitionPlan_LenovoFiltersNonInventoryChassisBranches(t *testing.T) {
|
||||
signals := MatchSignals{
|
||||
SystemManufacturer: "Lenovo",
|
||||
ChassisManufacturer: "Lenovo",
|
||||
OEMNamespaces: []string{"Lenovo"},
|
||||
ResourceHints: []string{
|
||||
"/redfish/v1/Chassis/1/Power",
|
||||
"/redfish/v1/Chassis/1/Thermal",
|
||||
"/redfish/v1/Chassis/1/NetworkAdapters",
|
||||
"/redfish/v1/Chassis/3",
|
||||
"/redfish/v1/Chassis/IO_Board",
|
||||
},
|
||||
}
|
||||
match := MatchProfiles(signals)
|
||||
plan := BuildAcquisitionPlan(signals)
|
||||
resolved := ResolveAcquisitionPlan(match, plan, DiscoveredResources{
|
||||
ChassisPaths: []string{
|
||||
"/redfish/v1/Chassis/1",
|
||||
"/redfish/v1/Chassis/3",
|
||||
"/redfish/v1/Chassis/IO_Board",
|
||||
},
|
||||
}, signals)
|
||||
|
||||
if !containsString(resolved.CriticalPaths, "/redfish/v1/Chassis/1/Power") {
|
||||
t.Fatal("expected primary Lenovo chassis power path to remain critical")
|
||||
}
|
||||
if containsString(resolved.CriticalPaths, "/redfish/v1/Chassis/3/Power") {
|
||||
t.Fatal("did not expect non-inventory Lenovo backplane chassis power path")
|
||||
}
|
||||
if containsString(resolved.CriticalPaths, "/redfish/v1/Chassis/IO_Board/Assembly") {
|
||||
t.Fatal("did not expect IO board assembly path without inventory hints")
|
||||
}
|
||||
if containsString(resolved.Plan.PlanBPaths, "/redfish/v1/Chassis/3/Assembly") {
|
||||
t.Fatal("did not expect non-inventory Lenovo chassis plan-b target")
|
||||
}
|
||||
if !containsString(resolved.CriticalPaths, "/redfish/v1/Chassis/3") {
|
||||
t.Fatal("expected chassis root to remain discoverable even when suffixes are filtered")
|
||||
}
|
||||
}
|
||||
|
||||
func TestMatchProfiles_OrderingIsDeterministic(t *testing.T) {
|
||||
signals := MatchSignals{
|
||||
SystemManufacturer: "Micro-Star International Co., Ltd.",
|
||||
|
||||
175
internal/collector/redfishprofile/profile_lenovo.go
Normal file
175
internal/collector/redfishprofile/profile_lenovo.go
Normal file
@@ -0,0 +1,175 @@
|
||||
package redfishprofile
|
||||
|
||||
import "strings"
|
||||
|
||||
func lenovoProfile() Profile {
|
||||
return staticProfile{
|
||||
name: "lenovo",
|
||||
priority: 20,
|
||||
safeForFallback: true,
|
||||
matchFn: func(s MatchSignals) int {
|
||||
score := 0
|
||||
if containsFold(s.SystemManufacturer, "lenovo") ||
|
||||
containsFold(s.ChassisManufacturer, "lenovo") {
|
||||
score += 80
|
||||
}
|
||||
for _, ns := range s.OEMNamespaces {
|
||||
if containsFold(ns, "lenovo") {
|
||||
score += 30
|
||||
break
|
||||
}
|
||||
}
|
||||
// Lenovo XClarity Controller (XCC) is the BMC product line.
|
||||
if containsFold(s.ServiceRootProduct, "xclarity") ||
|
||||
containsFold(s.ServiceRootProduct, "xcc") {
|
||||
score += 30
|
||||
}
|
||||
return min(score, 100)
|
||||
},
|
||||
extendAcquisition: func(plan *AcquisitionPlan, _ MatchSignals) {
|
||||
// Lenovo XCC BMC exposes Chassis/1/Sensors with hundreds of individual
|
||||
// sensor member documents (e.g. Chassis/1/Sensors/101L1). These are
|
||||
// not used by any LOGPile parser — thermal/power data is read from
|
||||
// the aggregate Chassis/*/Thermal and Chassis/*/Power endpoints. On
|
||||
// a real server they largely return errors, wasting many minutes.
|
||||
// Lenovo OEM subtrees under Oem/Lenovo/LEDs and Oem/Lenovo/Slots also
|
||||
// enumerate dozens of individual documents not relevant to inventory.
|
||||
ensureSnapshotExcludeContains(plan,
|
||||
"/Sensors/", // individual sensor docs (Chassis/1/Sensors/NNN)
|
||||
"/Oem/Lenovo/LEDs/", // individual LED status entries (~47 per server)
|
||||
"/Oem/Lenovo/Slots/", // individual slot detail entries (~26 per server)
|
||||
"/Oem/Lenovo/Metrics/", // operational metrics, not inventory
|
||||
"/Oem/Lenovo/History", // historical telemetry
|
||||
"/Oem/Lenovo/Configuration", // BMC config service, not inventory
|
||||
"/Oem/Lenovo/DateTimeService", // BMC time service config
|
||||
"/Oem/Lenovo/GroupService", // XCC fleet/group management state
|
||||
"/Oem/Lenovo/Recipients", // alert recipient config
|
||||
"/Oem/Lenovo/RemoteControl", // remote-media/session management
|
||||
"/Oem/Lenovo/RemoteMap", // remote-media mapping config
|
||||
"/Oem/Lenovo/SecureKeyLifecycleService", // key lifecycle/cert config
|
||||
"/Oem/Lenovo/ServerProfile", // profile export/import config
|
||||
"/Oem/Lenovo/ServiceData", // support/service metadata
|
||||
"/Oem/Lenovo/SsoCertificates", // SSO certificate config
|
||||
"/Oem/Lenovo/SystemGuard", // snapshot/history service
|
||||
"/Oem/Lenovo/Watchdogs", // watchdog config
|
||||
"/Oem/Lenovo/ScheduledPower", // power scheduling config
|
||||
"/Oem/Lenovo/BootSettings/BootOrder", // individual boot order lists
|
||||
"/NetworkProtocol/Oem/Lenovo/", // DNS/LDAP/SMTP/SNMP manager config
|
||||
"/PortForwardingMap/", // network port forwarding config
|
||||
"/VirtualMedia/", // virtual media inventory/config, not hardware
|
||||
"/Boot/Certificates", // secure boot certificate stores, not inventory
|
||||
"/ThermalSubsystem/Fans/", // per-fan member docs; replay uses aggregate Thermal only
|
||||
)
|
||||
// Lenovo XCC BMC is typically slow (p95 latency often 3-5s even under
|
||||
// normal load). Set rate thresholds that don't over-throttle on the
|
||||
// first few requests, and give the ETA estimator a realistic baseline.
|
||||
ensureRatePolicy(plan, AcquisitionRatePolicy{
|
||||
TargetP95LatencyMS: 2000,
|
||||
ThrottleP95LatencyMS: 4000,
|
||||
MinSnapshotWorkers: 2,
|
||||
MinPrefetchWorkers: 1,
|
||||
DisablePrefetchOnErrors: true,
|
||||
})
|
||||
ensureETABaseline(plan, AcquisitionETABaseline{
|
||||
DiscoverySeconds: 15,
|
||||
SnapshotSeconds: 120,
|
||||
PrefetchSeconds: 30,
|
||||
CriticalPlanBSeconds: 40,
|
||||
ProfilePlanBSeconds: 20,
|
||||
})
|
||||
addPlanNote(plan, "lenovo xcc acquisition extensions enabled: noisy sensor/oem paths excluded from snapshot")
|
||||
},
|
||||
refineAcquisition: func(resolved *ResolvedAcquisitionPlan, discovered DiscoveredResources, signals MatchSignals) {
|
||||
allowedChassis := lenovoAllowedInventoryChassis(discovered.ChassisPaths, signals.ResourceHints)
|
||||
resolved.SeedPaths = filterLenovoChassisInventoryPaths(resolved.SeedPaths, allowedChassis)
|
||||
resolved.CriticalPaths = filterLenovoChassisInventoryPaths(resolved.CriticalPaths, allowedChassis)
|
||||
resolved.Plan.SeedPaths = filterLenovoChassisInventoryPaths(resolved.Plan.SeedPaths, allowedChassis)
|
||||
resolved.Plan.CriticalPaths = filterLenovoChassisInventoryPaths(resolved.Plan.CriticalPaths, allowedChassis)
|
||||
resolved.Plan.PlanBPaths = filterLenovoChassisInventoryPaths(resolved.Plan.PlanBPaths, allowedChassis)
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
func lenovoAllowedInventoryChassis(chassisPaths, resourceHints []string) map[string]struct{} {
|
||||
allowed := make(map[string]struct{}, len(chassisPaths))
|
||||
for _, chassisPath := range chassisPaths {
|
||||
normalized := normalizePath(chassisPath)
|
||||
if normalized == "" {
|
||||
continue
|
||||
}
|
||||
if normalized == "/redfish/v1/Chassis/1" {
|
||||
allowed[normalized] = struct{}{}
|
||||
continue
|
||||
}
|
||||
for _, hint := range resourceHints {
|
||||
hint = normalizePath(hint)
|
||||
if !strings.HasPrefix(hint, normalized+"/") {
|
||||
continue
|
||||
}
|
||||
if lenovoHintLooksLikeChassisInventory(hint) {
|
||||
allowed[normalized] = struct{}{}
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
return allowed
|
||||
}
|
||||
|
||||
func lenovoHintLooksLikeChassisInventory(path string) bool {
|
||||
for _, suffix := range []string{
|
||||
"/Power",
|
||||
"/PowerSubsystem",
|
||||
"/PowerSubsystem/PowerSupplies",
|
||||
"/Thermal",
|
||||
"/ThresholdSensors",
|
||||
"/DiscreteSensors",
|
||||
"/SensorsList",
|
||||
"/NetworkAdapters",
|
||||
"/PCIeDevices",
|
||||
"/Drives",
|
||||
"/Assembly",
|
||||
} {
|
||||
if strings.HasSuffix(path, suffix) || strings.Contains(path, suffix+"/") {
|
||||
return true
|
||||
}
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
func filterLenovoChassisInventoryPaths(paths []string, allowedChassis map[string]struct{}) []string {
|
||||
if len(paths) == 0 {
|
||||
return nil
|
||||
}
|
||||
out := make([]string, 0, len(paths))
|
||||
for _, path := range paths {
|
||||
normalized := normalizePath(path)
|
||||
chassis := lenovoPathChassisRoot(normalized)
|
||||
if chassis == "" {
|
||||
out = append(out, normalized)
|
||||
continue
|
||||
}
|
||||
if normalized == chassis {
|
||||
out = append(out, normalized)
|
||||
continue
|
||||
}
|
||||
if _, ok := allowedChassis[chassis]; ok {
|
||||
out = append(out, normalized)
|
||||
}
|
||||
}
|
||||
return dedupeSorted(out)
|
||||
}
|
||||
|
||||
func lenovoPathChassisRoot(path string) string {
|
||||
const prefix = "/redfish/v1/Chassis/"
|
||||
if !strings.HasPrefix(path, prefix) {
|
||||
return ""
|
||||
}
|
||||
rest := strings.TrimPrefix(path, prefix)
|
||||
if rest == "" {
|
||||
return ""
|
||||
}
|
||||
if idx := strings.IndexByte(rest, '/'); idx >= 0 {
|
||||
return prefix + rest[:idx]
|
||||
}
|
||||
return prefix + rest
|
||||
}
|
||||
@@ -56,6 +56,7 @@ func BuiltinProfiles() []Profile {
|
||||
supermicroProfile(),
|
||||
dellProfile(),
|
||||
hpeProfile(),
|
||||
lenovoProfile(),
|
||||
inspurGroupOEMPlatformsProfile(),
|
||||
hgxProfile(),
|
||||
xfusionProfile(),
|
||||
@@ -226,6 +227,10 @@ func ensurePrefetchPolicy(plan *AcquisitionPlan, policy AcquisitionPrefetchPolic
|
||||
addPlanPaths(&plan.Tuning.PrefetchPolicy.ExcludeContains, policy.ExcludeContains...)
|
||||
}
|
||||
|
||||
func ensureSnapshotExcludeContains(plan *AcquisitionPlan, patterns ...string) {
|
||||
addPlanPaths(&plan.Tuning.SnapshotExcludeContains, patterns...)
|
||||
}
|
||||
|
||||
func min(a, b int) int {
|
||||
if a < b {
|
||||
return a
|
||||
|
||||
@@ -53,16 +53,17 @@ type AcquisitionScopedPathPolicy struct {
|
||||
}
|
||||
|
||||
type AcquisitionTuning struct {
|
||||
SnapshotMaxDocuments int
|
||||
SnapshotWorkers int
|
||||
PrefetchEnabled *bool
|
||||
PrefetchWorkers int
|
||||
NVMePostProbeEnabled *bool
|
||||
RatePolicy AcquisitionRatePolicy
|
||||
ETABaseline AcquisitionETABaseline
|
||||
PostProbePolicy AcquisitionPostProbePolicy
|
||||
RecoveryPolicy AcquisitionRecoveryPolicy
|
||||
PrefetchPolicy AcquisitionPrefetchPolicy
|
||||
SnapshotMaxDocuments int
|
||||
SnapshotWorkers int
|
||||
SnapshotExcludeContains []string
|
||||
PrefetchEnabled *bool
|
||||
PrefetchWorkers int
|
||||
NVMePostProbeEnabled *bool
|
||||
RatePolicy AcquisitionRatePolicy
|
||||
ETABaseline AcquisitionETABaseline
|
||||
PostProbePolicy AcquisitionPostProbePolicy
|
||||
RecoveryPolicy AcquisitionRecoveryPolicy
|
||||
PrefetchPolicy AcquisitionPrefetchPolicy
|
||||
}
|
||||
|
||||
type AcquisitionRatePolicy struct {
|
||||
|
||||
@@ -15,9 +15,8 @@ type Request struct {
|
||||
Password string
|
||||
Token string
|
||||
TLSMode string
|
||||
PowerOnIfHostOff bool
|
||||
StopHostAfterCollect bool
|
||||
DebugPayloads bool
|
||||
DebugPayloads bool
|
||||
SkipHungCh <-chan struct{}
|
||||
}
|
||||
|
||||
type Progress struct {
|
||||
@@ -65,10 +64,9 @@ type PhaseTelemetry struct {
|
||||
type ProbeResult struct {
|
||||
Reachable bool
|
||||
Protocol string
|
||||
HostPowerState string
|
||||
HostPoweredOn bool
|
||||
PowerControlAvailable bool
|
||||
SystemPath string
|
||||
HostPowerState string
|
||||
HostPoweredOn bool
|
||||
SystemPath string
|
||||
}
|
||||
|
||||
type Connector interface {
|
||||
|
||||
@@ -1961,7 +1961,10 @@ func pcieDedupKey(item ReanimatorPCIe) string {
|
||||
slot := strings.ToLower(strings.TrimSpace(item.Slot))
|
||||
serial := strings.ToLower(strings.TrimSpace(item.SerialNumber))
|
||||
bdf := strings.ToLower(strings.TrimSpace(item.BDF))
|
||||
if slot != "" {
|
||||
// Generic slot names (e.g. "PCIe Device" from HGX BMC) are not unique
|
||||
// hardware positions — multiple distinct devices share the same name.
|
||||
// Fall through to serial/BDF so they are not incorrectly collapsed.
|
||||
if slot != "" && !isGenericPCIeSlotName(slot) {
|
||||
return "slot:" + slot
|
||||
}
|
||||
if serial != "" {
|
||||
@@ -1970,9 +1973,22 @@ func pcieDedupKey(item ReanimatorPCIe) string {
|
||||
if bdf != "" {
|
||||
return "bdf:" + bdf
|
||||
}
|
||||
if slot != "" {
|
||||
return "slot:" + slot
|
||||
}
|
||||
return strings.ToLower(strings.TrimSpace(item.DeviceClass)) + "|" + strings.ToLower(strings.TrimSpace(item.Model))
|
||||
}
|
||||
|
||||
// isGenericPCIeSlotName reports whether slot is a generic device-type label
|
||||
// rather than a unique hardware position identifier.
|
||||
func isGenericPCIeSlotName(slot string) bool {
|
||||
switch slot {
|
||||
case "pcie device", "pcie slot", "pcie":
|
||||
return true
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
func pcieQualityScore(item ReanimatorPCIe) int {
|
||||
score := 0
|
||||
if strings.TrimSpace(item.SerialNumber) != "" {
|
||||
|
||||
@@ -733,6 +733,42 @@ func TestConvertPCIeDevices_SkipsDisplayControllerDuplicates(t *testing.T) {
|
||||
}
|
||||
}
|
||||
|
||||
func TestConvertPCIeDevices_PreservesAllGPUsWithGenericSlot(t *testing.T) {
|
||||
// Supermicro HGX BMC reports all GPU PCIe devices with Name "PCIe Device" —
|
||||
// a generic label that is not a unique hardware position. All 8 GPUs must
|
||||
// be preserved; dedup by generic slot name must not collapse them into one.
|
||||
gpus := make([]models.GPU, 8)
|
||||
serials := []string{
|
||||
"1654925165720", "1654925166160", "1654925165942", "1654925165271",
|
||||
"1654925165719", "1654925165252", "1654925165304", "1654925165587",
|
||||
}
|
||||
for i, sn := range serials {
|
||||
gpus[i] = models.GPU{
|
||||
Slot: "PCIe Device",
|
||||
Model: "B200 180GB HBM3e",
|
||||
Manufacturer: "NVIDIA",
|
||||
SerialNumber: sn,
|
||||
PartNumber: "2901-886-A1",
|
||||
Status: "OK",
|
||||
}
|
||||
}
|
||||
hw := &models.HardwareConfig{GPUs: gpus}
|
||||
result := convertPCIeDevices(hw, "2026-04-13T10:00:00Z")
|
||||
if len(result) != 8 {
|
||||
t.Fatalf("expected 8 GPU entries (one per serial), got %d", len(result))
|
||||
}
|
||||
seen := make(map[string]bool)
|
||||
for _, r := range result {
|
||||
if seen[r.SerialNumber] {
|
||||
t.Fatalf("duplicate serial %q in PCIe result", r.SerialNumber)
|
||||
}
|
||||
seen[r.SerialNumber] = true
|
||||
if r.DeviceClass != "VideoController" {
|
||||
t.Fatalf("expected VideoController device class, got %q", r.DeviceClass)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestConvertPCIeDevices_MapsGPUStatusHistory(t *testing.T) {
|
||||
hw := &models.HardwareConfig{
|
||||
GPUs: []models.GPU{
|
||||
|
||||
873
internal/parser/vendors/lenovo_xcc/parser.go
vendored
Normal file
873
internal/parser/vendors/lenovo_xcc/parser.go
vendored
Normal file
@@ -0,0 +1,873 @@
|
||||
// Package lenovo_xcc provides parser for Lenovo XCC mini-log archives.
|
||||
// Tested with: ThinkSystem SR650 V3 (XCC mini-log zip, exported via XCC UI)
|
||||
//
|
||||
// Archive structure: zip with tmp/ directory containing JSON .log files.
|
||||
//
|
||||
// IMPORTANT: Increment parserVersion when modifying parser logic!
|
||||
package lenovo_xcc
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"regexp"
|
||||
"strconv"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"git.mchus.pro/mchus/logpile/internal/models"
|
||||
"git.mchus.pro/mchus/logpile/internal/parser"
|
||||
)
|
||||
|
||||
const parserVersion = "1.2"
|
||||
|
||||
func init() {
|
||||
parser.Register(&Parser{})
|
||||
}
|
||||
|
||||
// Parser implements VendorParser for Lenovo XCC mini-log archives.
|
||||
type Parser struct{}
|
||||
|
||||
func (p *Parser) Name() string { return "Lenovo XCC Mini-Log Parser" }
|
||||
func (p *Parser) Vendor() string { return "lenovo_xcc" }
|
||||
func (p *Parser) Version() string { return parserVersion }
|
||||
|
||||
// Detect checks if files match the Lenovo XCC mini-log archive format.
|
||||
// Returns confidence score 0-100.
|
||||
func (p *Parser) Detect(files []parser.ExtractedFile) int {
|
||||
confidence := 0
|
||||
for _, f := range files {
|
||||
path := strings.ToLower(f.Path)
|
||||
switch {
|
||||
case strings.HasSuffix(path, "tmp/basic_sys_info.log"):
|
||||
confidence += 60
|
||||
case strings.HasSuffix(path, "tmp/inventory_cpu.log"):
|
||||
confidence += 20
|
||||
case strings.HasSuffix(path, "tmp/xcc_plat_events1.log"):
|
||||
confidence += 20
|
||||
case strings.HasSuffix(path, "tmp/inventory_dimm.log"):
|
||||
confidence += 10
|
||||
case strings.HasSuffix(path, "tmp/inventory_fw.log"):
|
||||
confidence += 10
|
||||
}
|
||||
if confidence >= 100 {
|
||||
return 100
|
||||
}
|
||||
}
|
||||
return confidence
|
||||
}
|
||||
|
||||
// Parse parses the Lenovo XCC mini-log archive and returns an analysis result.
|
||||
func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, error) {
|
||||
result := &models.AnalysisResult{
|
||||
Events: make([]models.Event, 0),
|
||||
FRU: make([]models.FRUInfo, 0),
|
||||
Sensors: make([]models.SensorReading, 0),
|
||||
Hardware: &models.HardwareConfig{
|
||||
Firmware: make([]models.FirmwareInfo, 0),
|
||||
CPUs: make([]models.CPU, 0),
|
||||
Memory: make([]models.MemoryDIMM, 0),
|
||||
Storage: make([]models.Storage, 0),
|
||||
PCIeDevices: make([]models.PCIeDevice, 0),
|
||||
PowerSupply: make([]models.PSU, 0),
|
||||
},
|
||||
}
|
||||
|
||||
if f := findByPath(files, "tmp/basic_sys_info.log"); f != nil {
|
||||
parseBasicSysInfo(f.Content, result)
|
||||
}
|
||||
if f := findByPath(files, "tmp/inventory_fw.log"); f != nil {
|
||||
result.Hardware.Firmware = append(result.Hardware.Firmware, parseFirmware(f.Content)...)
|
||||
}
|
||||
if f := findByPath(files, "tmp/inventory_cpu.log"); f != nil {
|
||||
result.Hardware.CPUs = parseCPUs(f.Content)
|
||||
}
|
||||
if f := findByPath(files, "tmp/inventory_dimm.log"); f != nil {
|
||||
memory, events := parseDIMMs(f.Content)
|
||||
result.Hardware.Memory = memory
|
||||
result.Events = append(result.Events, events...)
|
||||
}
|
||||
if f := findByPath(files, "tmp/inventory_disk.log"); f != nil {
|
||||
result.Hardware.Storage = parseDisks(f.Content)
|
||||
}
|
||||
if f := findByPath(files, "tmp/inventory_card.log"); f != nil {
|
||||
result.Hardware.PCIeDevices = parseCards(f.Content)
|
||||
}
|
||||
if f := findByPath(files, "tmp/inventory_psu.log"); f != nil {
|
||||
result.Hardware.PowerSupply = parsePSUs(f.Content)
|
||||
}
|
||||
if f := findByPath(files, "tmp/inventory_ipmi_fru.log"); f != nil {
|
||||
result.FRU = parseFRU(f.Content)
|
||||
enrichBoardFromFRU(result)
|
||||
}
|
||||
if f := findByPath(files, "tmp/inventory_ipmi_sensor.log"); f != nil {
|
||||
result.Sensors = parseSensors(f.Content)
|
||||
result.Hardware.PowerSupply = enrichPSUsFromSensors(result.Hardware.PowerSupply, result.Sensors)
|
||||
}
|
||||
for _, f := range findEventFiles(files) {
|
||||
result.Events = append(result.Events, parseEvents(f.Content)...)
|
||||
}
|
||||
|
||||
result.Protocol = "ipmi"
|
||||
result.SourceType = models.SourceTypeArchive
|
||||
parser.ApplyManufacturedYearWeekFromFRU(result.FRU, result.Hardware)
|
||||
|
||||
return result, nil
|
||||
}
|
||||
|
||||
// findByPath returns the first file whose lowercased path ends with the given suffix.
|
||||
func findByPath(files []parser.ExtractedFile, suffix string) *parser.ExtractedFile {
|
||||
for i := range files {
|
||||
if strings.HasSuffix(strings.ToLower(files[i].Path), suffix) {
|
||||
return &files[i]
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// findEventFiles returns all xcc_plat_eventsN.log files.
|
||||
func findEventFiles(files []parser.ExtractedFile) []parser.ExtractedFile {
|
||||
var out []parser.ExtractedFile
|
||||
for _, f := range files {
|
||||
path := strings.ToLower(f.Path)
|
||||
if strings.Contains(path, "tmp/xcc_plat_events") && strings.HasSuffix(path, ".log") {
|
||||
out = append(out, f)
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
// --- JSON structures ---
|
||||
|
||||
type xccBasicSysInfoDoc struct {
|
||||
Items []xccBasicSysInfoItem `json:"items"`
|
||||
}
|
||||
|
||||
type xccBasicSysInfoItem struct {
|
||||
MachineName string `json:"machine_name"`
|
||||
MachineTypeModel string `json:"machine_typemodel"`
|
||||
SerialNumber string `json:"serial_number"`
|
||||
UUID string `json:"uuid"`
|
||||
PowerState string `json:"power_state"`
|
||||
ServerState string `json:"server_state"`
|
||||
CurrentTime string `json:"current_time"`
|
||||
}
|
||||
|
||||
// xccFWEntry covers both basic_sys_info firmware (no type_str) and inventory_fw (has type_str).
|
||||
type xccFWEntry struct {
|
||||
Index int `json:"index"`
|
||||
TypeCode int `json:"type"`
|
||||
TypeStr string `json:"type_str"` // only in inventory_fw.log
|
||||
Version string `json:"version"`
|
||||
Build string `json:"build"`
|
||||
ReleaseDate string `json:"release_date"`
|
||||
}
|
||||
|
||||
type xccFirmwareDoc struct {
|
||||
Items []xccFWEntry `json:"items"`
|
||||
}
|
||||
|
||||
type xccCPUDoc struct {
|
||||
Items []xccCPUItem `json:"items"`
|
||||
}
|
||||
|
||||
type xccCPUItem struct {
|
||||
Processors []xccCPU `json:"processors"`
|
||||
}
|
||||
|
||||
type xccCPU struct {
|
||||
Name int `json:"processors_name"`
|
||||
Model string `json:"processors_cpu_model"`
|
||||
Cores json.RawMessage `json:"processors_cores"` // may be int or string
|
||||
Threads json.RawMessage `json:"processors_threads"` // may be int or string
|
||||
ClockSpeed string `json:"processors_clock_speed"`
|
||||
L1DataCache string `json:"processors_l1datacache"`
|
||||
L2Cache string `json:"processors_l2cache"`
|
||||
L3Cache string `json:"processors_l3cache"`
|
||||
Status string `json:"processors_status"`
|
||||
SerialNumber string `json:"processors_serial_number"`
|
||||
}
|
||||
|
||||
type xccDIMMDoc struct {
|
||||
Items []xccDIMMItem `json:"items"`
|
||||
}
|
||||
|
||||
type xccDIMMItem struct {
|
||||
Memory []xccDIMM `json:"memory"`
|
||||
}
|
||||
|
||||
type xccDIMM struct {
|
||||
Index int `json:"memory_index"`
|
||||
Status string `json:"memory_status"`
|
||||
Name string `json:"memory_name"`
|
||||
Type string `json:"memory_type"`
|
||||
Capacity json.RawMessage `json:"memory_capacity"` // int (GB) or string
|
||||
PartNumber string `json:"memory_part_number"`
|
||||
SerialNumber string `json:"memory_serial_number"`
|
||||
Manufacturer string `json:"memory_manufacturer"`
|
||||
MemSpeed json.RawMessage `json:"memory_mem_speed"` // int or string
|
||||
ConfigSpeed json.RawMessage `json:"memory_config_speed"` // int or string
|
||||
}
|
||||
|
||||
type xccDiskDoc struct {
|
||||
Items []xccDiskItem `json:"items"`
|
||||
}
|
||||
|
||||
type xccDiskItem struct {
|
||||
Disks []xccDisk `json:"disks"`
|
||||
}
|
||||
|
||||
type xccDisk struct {
|
||||
ID int `json:"id"`
|
||||
SlotNo int `json:"slotNo"`
|
||||
Type string `json:"type"`
|
||||
Interface string `json:"interface"`
|
||||
Media string `json:"media"`
|
||||
SerialNo string `json:"serialNo"`
|
||||
PartNo string `json:"partNo"`
|
||||
CapacityStr string `json:"capacityStr"` // e.g. "3.20 TB"
|
||||
Manufacture string `json:"manufacture"`
|
||||
ProductName string `json:"productName"`
|
||||
RemainLife int `json:"remainLife"` // 0-100
|
||||
FWVersion string `json:"fwVersion"`
|
||||
Temperature int `json:"temperature"`
|
||||
HealthStatus int `json:"healthStatus"` // int code: 2=Normal
|
||||
State int `json:"state"`
|
||||
StateStr string `json:"statestr"`
|
||||
}
|
||||
|
||||
type xccCardDoc struct {
|
||||
Items []xccCard `json:"items"`
|
||||
}
|
||||
|
||||
type xccCard struct {
|
||||
Key int `json:"key"`
|
||||
SlotNo int `json:"slotNo"`
|
||||
AdapterName string `json:"adapterName"`
|
||||
ConnectorLabel string `json:"connectorLabel"`
|
||||
OOBSupported int `json:"oobSupported"`
|
||||
Location int `json:"location"`
|
||||
Functions []xccCardFunc `json:"functions"`
|
||||
}
|
||||
|
||||
type xccCardFunc struct {
|
||||
FunType int `json:"funType"`
|
||||
BusNo int `json:"generic_busNo"`
|
||||
DevNo int `json:"generic_devNo"`
|
||||
FunNo int `json:"generic_funNo"`
|
||||
VendorID int `json:"generic_vendorId"` // direct int
|
||||
DeviceID int `json:"generic_devId"` // direct int
|
||||
SlotDesignation string `json:"generic_slotDesignation"`
|
||||
}
|
||||
|
||||
type xccPSUDoc struct {
|
||||
Items []xccPSUItem `json:"items"`
|
||||
}
|
||||
|
||||
type xccPSUItem struct {
|
||||
Power []xccPSU `json:"power"`
|
||||
}
|
||||
|
||||
type xccPSU struct {
|
||||
Name int `json:"name"`
|
||||
Status string `json:"status"`
|
||||
RatedPower int `json:"rated_power"`
|
||||
PartNumber string `json:"part_number"`
|
||||
FRUNumber string `json:"fru_number"`
|
||||
SerialNumber string `json:"serial_number"`
|
||||
ManufID string `json:"manuf_id"`
|
||||
}
|
||||
|
||||
type xccFRUDoc struct {
|
||||
Items []xccFRUItem `json:"items"`
|
||||
}
|
||||
|
||||
type xccFRUItem struct {
|
||||
BuiltinFRU []map[string]string `json:"builtin_fru_device"`
|
||||
}
|
||||
|
||||
type xccSensorDoc struct {
|
||||
Items []xccSensor `json:"items"`
|
||||
}
|
||||
|
||||
type xccSensor struct {
|
||||
Name string `json:"Sensor Name"`
|
||||
Value string `json:"Value"`
|
||||
Status string `json:"status"`
|
||||
Unit string `json:"unit"`
|
||||
}
|
||||
|
||||
type xccEventDoc struct {
|
||||
Items []xccEvent `json:"items"`
|
||||
}
|
||||
|
||||
type xccEvent struct {
|
||||
Severity string `json:"severity"` // "I", "W", "E", "C"
|
||||
Source string `json:"source"`
|
||||
Date string `json:"date"` // "2025-12-22T13:24:02.070"
|
||||
Index int `json:"index"`
|
||||
EventID string `json:"eventid"`
|
||||
CmnID string `json:"cmnid"`
|
||||
Message string `json:"message"`
|
||||
}
|
||||
|
||||
// --- Parsers ---
|
||||
|
||||
func parseBasicSysInfo(content []byte, result *models.AnalysisResult) {
|
||||
var doc xccBasicSysInfoDoc
|
||||
if err := json.Unmarshal(content, &doc); err != nil || len(doc.Items) == 0 {
|
||||
return
|
||||
}
|
||||
item := doc.Items[0]
|
||||
|
||||
result.Hardware.BoardInfo = models.BoardInfo{
|
||||
ProductName: cleanXCCValue(item.MachineTypeModel),
|
||||
SerialNumber: cleanXCCValue(item.SerialNumber),
|
||||
UUID: cleanXCCValue(item.UUID),
|
||||
}
|
||||
|
||||
if host := cleanXCCValue(item.MachineName); host != "" {
|
||||
result.TargetHost = host
|
||||
}
|
||||
|
||||
if t, err := parseXCCTime(item.CurrentTime); err == nil {
|
||||
result.CollectedAt = t.UTC()
|
||||
}
|
||||
}
|
||||
|
||||
func parseFirmware(content []byte) []models.FirmwareInfo {
|
||||
var doc xccFirmwareDoc
|
||||
if err := json.Unmarshal(content, &doc); err != nil {
|
||||
return nil
|
||||
}
|
||||
var out []models.FirmwareInfo
|
||||
for _, fw := range doc.Items {
|
||||
if fi := xccFWEntryToModel(fw); fi != nil {
|
||||
out = append(out, *fi)
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
func xccFWEntryToModel(fw xccFWEntry) *models.FirmwareInfo {
|
||||
name := strings.TrimSpace(fw.TypeStr)
|
||||
version := strings.TrimSpace(fw.Version)
|
||||
if name == "" && version == "" {
|
||||
return nil
|
||||
}
|
||||
build := strings.TrimSpace(fw.Build)
|
||||
v := version
|
||||
if build != "" {
|
||||
v = version + " (" + build + ")"
|
||||
}
|
||||
return &models.FirmwareInfo{
|
||||
DeviceName: name,
|
||||
Version: v,
|
||||
BuildTime: strings.TrimSpace(fw.ReleaseDate),
|
||||
}
|
||||
}
|
||||
|
||||
func parseCPUs(content []byte) []models.CPU {
|
||||
var doc xccCPUDoc
|
||||
if err := json.Unmarshal(content, &doc); err != nil || len(doc.Items) == 0 {
|
||||
return nil
|
||||
}
|
||||
var out []models.CPU
|
||||
for _, item := range doc.Items {
|
||||
for _, c := range item.Processors {
|
||||
cpu := models.CPU{
|
||||
Socket: c.Name,
|
||||
Model: strings.TrimSpace(c.Model),
|
||||
Cores: rawJSONToInt(c.Cores),
|
||||
Threads: rawJSONToInt(c.Threads),
|
||||
FrequencyMHz: parseMHz(c.ClockSpeed),
|
||||
L1CacheKB: parseKB(c.L1DataCache),
|
||||
L2CacheKB: parseKB(c.L2Cache),
|
||||
L3CacheKB: parseKB(c.L3Cache),
|
||||
Status: strings.TrimSpace(c.Status),
|
||||
SerialNumber: strings.TrimSpace(c.SerialNumber),
|
||||
}
|
||||
out = append(out, cpu)
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
func parseDIMMs(content []byte) ([]models.MemoryDIMM, []models.Event) {
|
||||
var doc xccDIMMDoc
|
||||
if err := json.Unmarshal(content, &doc); err != nil || len(doc.Items) == 0 {
|
||||
return nil, nil
|
||||
}
|
||||
var out []models.MemoryDIMM
|
||||
var events []models.Event
|
||||
for _, item := range doc.Items {
|
||||
for _, m := range item.Memory {
|
||||
status := strings.TrimSpace(m.Status)
|
||||
present := !strings.EqualFold(status, "not present") &&
|
||||
!strings.EqualFold(status, "absent")
|
||||
// memory_capacity is in GB (int); convert to MB
|
||||
capacityGB := rawJSONToInt(m.Capacity)
|
||||
dimm := models.MemoryDIMM{
|
||||
Slot: strings.TrimSpace(m.Name),
|
||||
Location: strings.TrimSpace(m.Name),
|
||||
Present: present,
|
||||
SizeMB: capacityGB * 1024,
|
||||
Type: strings.TrimSpace(m.Type),
|
||||
MaxSpeedMHz: rawJSONToInt(m.MemSpeed),
|
||||
CurrentSpeedMHz: rawJSONToInt(m.ConfigSpeed),
|
||||
Manufacturer: strings.TrimSpace(m.Manufacturer),
|
||||
SerialNumber: strings.TrimSpace(m.SerialNumber),
|
||||
PartNumber: strings.TrimSpace(strings.TrimRight(m.PartNumber, " ")),
|
||||
Status: status,
|
||||
}
|
||||
out = append(out, dimm)
|
||||
if isUnqualifiedDIMM(status) {
|
||||
events = append(events, models.Event{
|
||||
Source: "Memory",
|
||||
SensorType: "Memory",
|
||||
SensorName: dimm.Slot,
|
||||
EventType: "DIMM Qualification",
|
||||
Severity: models.SeverityWarning,
|
||||
Description: status,
|
||||
})
|
||||
}
|
||||
}
|
||||
}
|
||||
return out, events
|
||||
}
|
||||
|
||||
func parseDisks(content []byte) []models.Storage {
|
||||
var doc xccDiskDoc
|
||||
if err := json.Unmarshal(content, &doc); err != nil || len(doc.Items) == 0 {
|
||||
return nil
|
||||
}
|
||||
var out []models.Storage
|
||||
for _, item := range doc.Items {
|
||||
for _, d := range item.Disks {
|
||||
sizeGB := parseCapacityToGB(d.CapacityStr)
|
||||
stateStr := strings.TrimSpace(d.StateStr)
|
||||
present := !strings.EqualFold(stateStr, "absent") &&
|
||||
!strings.EqualFold(stateStr, "not present")
|
||||
status := mapDiskHealthStatus(d.HealthStatus, stateStr)
|
||||
disk := models.Storage{
|
||||
Slot: fmt.Sprintf("%d", d.SlotNo),
|
||||
Type: strings.TrimSpace(d.Media),
|
||||
Model: cleanXCCValue(d.ProductName),
|
||||
SizeGB: sizeGB,
|
||||
SerialNumber: cleanXCCValue(d.SerialNo),
|
||||
Manufacturer: cleanXCCValue(d.Manufacture),
|
||||
Firmware: cleanXCCValue(d.FWVersion),
|
||||
Interface: strings.TrimSpace(d.Interface),
|
||||
Present: present,
|
||||
Status: status,
|
||||
}
|
||||
if d.Temperature > 0 {
|
||||
disk.Details = map[string]any{"temperature_c": d.Temperature}
|
||||
}
|
||||
if d.RemainLife >= 0 && d.RemainLife <= 100 {
|
||||
v := d.RemainLife
|
||||
disk.RemainingEndurancePct = &v
|
||||
}
|
||||
out = append(out, disk)
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
func parseCards(content []byte) []models.PCIeDevice {
|
||||
var doc xccCardDoc
|
||||
if err := json.Unmarshal(content, &doc); err != nil {
|
||||
return nil
|
||||
}
|
||||
var out []models.PCIeDevice
|
||||
for _, card := range doc.Items {
|
||||
slot := strings.TrimSpace(card.ConnectorLabel)
|
||||
if slot == "" {
|
||||
slot = fmt.Sprintf("%d", card.SlotNo)
|
||||
}
|
||||
dev := models.PCIeDevice{
|
||||
Slot: slot,
|
||||
Description: strings.TrimSpace(card.AdapterName),
|
||||
}
|
||||
if len(card.Functions) > 0 {
|
||||
fn := card.Functions[0]
|
||||
dev.BDF = fmt.Sprintf("%02x:%02x.%x", fn.BusNo, fn.DevNo, fn.FunNo)
|
||||
dev.VendorID = fn.VendorID
|
||||
dev.DeviceID = fn.DeviceID
|
||||
}
|
||||
out = append(out, dev)
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
func parsePSUs(content []byte) []models.PSU {
|
||||
var doc xccPSUDoc
|
||||
if err := json.Unmarshal(content, &doc); err != nil || len(doc.Items) == 0 {
|
||||
return nil
|
||||
}
|
||||
var out []models.PSU
|
||||
for _, item := range doc.Items {
|
||||
for _, p := range item.Power {
|
||||
model := cleanXCCValue(p.FRUNumber)
|
||||
if model == "" {
|
||||
model = cleanXCCValue(p.PartNumber)
|
||||
}
|
||||
psu := models.PSU{
|
||||
Slot: fmt.Sprintf("%d", p.Name),
|
||||
Present: true,
|
||||
Model: model,
|
||||
WattageW: p.RatedPower,
|
||||
SerialNumber: cleanXCCValue(p.SerialNumber),
|
||||
PartNumber: cleanXCCValue(p.PartNumber),
|
||||
Vendor: cleanXCCValue(p.ManufID),
|
||||
Status: strings.TrimSpace(p.Status),
|
||||
}
|
||||
out = append(out, psu)
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
func parseFRU(content []byte) []models.FRUInfo {
|
||||
var doc xccFRUDoc
|
||||
if err := json.Unmarshal(content, &doc); err != nil || len(doc.Items) == 0 {
|
||||
return nil
|
||||
}
|
||||
var out []models.FRUInfo
|
||||
for _, item := range doc.Items {
|
||||
for _, entry := range item.BuiltinFRU {
|
||||
fru := models.FRUInfo{
|
||||
Description: entry["FRU Device Description"],
|
||||
Manufacturer: entry["Board Mfg"],
|
||||
ProductName: entry["Board Product"],
|
||||
SerialNumber: entry["Board Serial"],
|
||||
PartNumber: entry["Board Part Number"],
|
||||
MfgDate: entry["Board Mfg Date"],
|
||||
}
|
||||
if fru.ProductName == "" {
|
||||
fru.ProductName = entry["Product Name"]
|
||||
}
|
||||
if fru.SerialNumber == "" {
|
||||
fru.SerialNumber = entry["Product Serial"]
|
||||
}
|
||||
if fru.PartNumber == "" {
|
||||
fru.PartNumber = entry["Product Part Number"]
|
||||
}
|
||||
if fru.Description == "" && fru.ProductName == "" && fru.SerialNumber == "" {
|
||||
continue
|
||||
}
|
||||
out = append(out, fru)
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
func parseSensors(content []byte) []models.SensorReading {
|
||||
var doc xccSensorDoc
|
||||
if err := json.Unmarshal(content, &doc); err != nil {
|
||||
return nil
|
||||
}
|
||||
var out []models.SensorReading
|
||||
for _, s := range doc.Items {
|
||||
name := strings.TrimSpace(s.Name)
|
||||
if name == "" {
|
||||
continue
|
||||
}
|
||||
unit := strings.TrimSpace(s.Unit)
|
||||
sr := models.SensorReading{
|
||||
Name: name,
|
||||
RawValue: strings.TrimSpace(s.Value),
|
||||
Unit: unit,
|
||||
Status: strings.TrimSpace(s.Status),
|
||||
Type: classifySensorType(name, unit),
|
||||
}
|
||||
if v, err := strconv.ParseFloat(sr.RawValue, 64); err == nil {
|
||||
sr.Value = v
|
||||
}
|
||||
out = append(out, sr)
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
func parseEvents(content []byte) []models.Event {
|
||||
var doc xccEventDoc
|
||||
if err := json.Unmarshal(content, &doc); err != nil {
|
||||
return nil
|
||||
}
|
||||
var out []models.Event
|
||||
for _, e := range doc.Items {
|
||||
ev := models.Event{
|
||||
ID: e.EventID,
|
||||
Source: strings.TrimSpace(e.Source),
|
||||
Description: strings.TrimSpace(e.Message),
|
||||
Severity: xccSeverity(e.Severity, e.Message),
|
||||
}
|
||||
if t, err := parseXCCTime(e.Date); err == nil {
|
||||
ev.Timestamp = t.UTC()
|
||||
}
|
||||
out = append(out, ev)
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
// --- Cross-reference enrichment ---
|
||||
|
||||
// enrichBoardFromFRU sets BoardInfo.Manufacturer from the system board FRU entry
|
||||
// when it is not already populated. Mirrors bee's board parsing from dmidecode type 1.
|
||||
func enrichBoardFromFRU(result *models.AnalysisResult) {
|
||||
if result.Hardware.BoardInfo.Manufacturer != "" {
|
||||
return
|
||||
}
|
||||
for _, fru := range result.FRU {
|
||||
desc := strings.ToLower(fru.Description)
|
||||
if !strings.Contains(desc, "system board") &&
|
||||
!strings.Contains(desc, "planar") &&
|
||||
!strings.Contains(desc, "backplane") {
|
||||
continue
|
||||
}
|
||||
if mfg := cleanXCCValue(fru.Manufacturer); mfg != "" {
|
||||
result.Hardware.BoardInfo.Manufacturer = mfg
|
||||
return
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// psuSensorSlot extracts a 1-based PSU slot number from a sensor name.
|
||||
// Recognises patterns: "PSU1 ...", "PSU 2 ...", "Power Supply 1 ...", "PWS1 ..."
|
||||
var psuSensorSlotPattern = regexp.MustCompile(`(?i)(?:PSU|Power\s+Supply|PWS)\s*(\d+)`)
|
||||
|
||||
// enrichPSUsFromSensors cross-references sensor readings into PSU InputPowerW /
|
||||
// OutputPowerW / InputVoltage. Mirrors bee's enrichPSUsWithTelemetry approach.
|
||||
func enrichPSUsFromSensors(psus []models.PSU, sensors []models.SensorReading) []models.PSU {
|
||||
if len(psus) == 0 || len(sensors) == 0 {
|
||||
return psus
|
||||
}
|
||||
for i := range psus {
|
||||
slot, err := strconv.Atoi(psus[i].Slot)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
for _, s := range sensors {
|
||||
m := psuSensorSlotPattern.FindStringSubmatch(s.Name)
|
||||
if len(m) < 2 {
|
||||
continue
|
||||
}
|
||||
sensorSlot, err := strconv.Atoi(m[1])
|
||||
if err != nil || sensorSlot != slot {
|
||||
continue
|
||||
}
|
||||
nameLower := strings.ToLower(s.Name)
|
||||
switch {
|
||||
case isPSUInputPower(nameLower):
|
||||
psus[i].InputPowerW = int(s.Value)
|
||||
case isPSUOutputPower(nameLower):
|
||||
psus[i].OutputPowerW = int(s.Value)
|
||||
case isPSUInputVoltage(nameLower):
|
||||
psus[i].InputVoltage = s.Value
|
||||
}
|
||||
}
|
||||
}
|
||||
return psus
|
||||
}
|
||||
|
||||
func isPSUInputPower(name string) bool {
|
||||
return strings.Contains(name, "input power") ||
|
||||
strings.Contains(name, "input watts") ||
|
||||
strings.Contains(name, "_pin") ||
|
||||
strings.Contains(name, " pin")
|
||||
}
|
||||
|
||||
func isPSUOutputPower(name string) bool {
|
||||
return strings.Contains(name, "output power") ||
|
||||
strings.Contains(name, "output watts") ||
|
||||
strings.Contains(name, "_pout") ||
|
||||
strings.Contains(name, " pout")
|
||||
}
|
||||
|
||||
func isPSUInputVoltage(name string) bool {
|
||||
return strings.Contains(name, "input voltage") ||
|
||||
strings.Contains(name, "ac voltage") ||
|
||||
strings.Contains(name, "_vin") ||
|
||||
strings.Contains(name, " vin")
|
||||
}
|
||||
|
||||
// mapDiskHealthStatus maps an XCC disk healthStatus integer to a canonical status
|
||||
// string. Mirrors bee's mapRAIDDriveStatus logic.
|
||||
// XCC codes: 1=Warning, 2=Normal, 3=Critical, 4=PredictiveFailure; 0=Unknown.
|
||||
func mapDiskHealthStatus(code int, stateStr string) string {
|
||||
switch code {
|
||||
case 2:
|
||||
return "OK"
|
||||
case 1, 4:
|
||||
return "Warning"
|
||||
case 3:
|
||||
return "Critical"
|
||||
default:
|
||||
if stateStr != "" {
|
||||
return stateStr
|
||||
}
|
||||
return "Unknown"
|
||||
}
|
||||
}
|
||||
|
||||
// classifySensorType returns a sensor category based on bee's classification logic:
|
||||
// fan / temperature / power / voltage / current / other.
|
||||
func classifySensorType(name, unit string) string {
|
||||
u := strings.ToLower(strings.TrimSpace(unit))
|
||||
switch u {
|
||||
case "rpm":
|
||||
return "fan"
|
||||
case "c", "celsius", "°c":
|
||||
return "temperature"
|
||||
case "w", "watts":
|
||||
return "power"
|
||||
case "v", "volts":
|
||||
return "voltage"
|
||||
case "a", "amps":
|
||||
return "current"
|
||||
}
|
||||
n := strings.ToLower(name)
|
||||
switch {
|
||||
case strings.Contains(n, "fan"):
|
||||
return "fan"
|
||||
case strings.Contains(n, "temp"):
|
||||
return "temperature"
|
||||
case strings.Contains(n, "power") || strings.Contains(n, " pwr"):
|
||||
return "power"
|
||||
case strings.Contains(n, "volt") || strings.Contains(n, " vin") || strings.Contains(n, " vout"):
|
||||
return "voltage"
|
||||
case strings.Contains(n, "curr") || strings.Contains(n, " amp"):
|
||||
return "current"
|
||||
default:
|
||||
return "other"
|
||||
}
|
||||
}
|
||||
|
||||
// cleanXCCValue strips XCC placeholder strings, returning "" for non-values.
|
||||
// Mirrors bee's cleanDMIValue for IPMI/XCC context.
|
||||
func cleanXCCValue(v string) string {
|
||||
v = strings.TrimSpace(v)
|
||||
switch strings.ToLower(v) {
|
||||
case "", "n/a", "na", "none", "unknown", "not available",
|
||||
"not applicable", "not present", "not specified", "-":
|
||||
return ""
|
||||
}
|
||||
return v
|
||||
}
|
||||
|
||||
// --- Helpers ---
|
||||
|
||||
func xccSeverity(s, message string) models.Severity {
|
||||
if isUnqualifiedDIMM(message) {
|
||||
return models.SeverityWarning
|
||||
}
|
||||
switch strings.ToUpper(strings.TrimSpace(s)) {
|
||||
case "C":
|
||||
return models.SeverityCritical
|
||||
case "E":
|
||||
return models.SeverityCritical
|
||||
case "W":
|
||||
return models.SeverityWarning
|
||||
default:
|
||||
return models.SeverityInfo
|
||||
}
|
||||
}
|
||||
|
||||
func isUnqualifiedDIMM(value string) bool {
|
||||
return strings.Contains(strings.ToLower(strings.TrimSpace(value)), "unqualified dimm")
|
||||
}
|
||||
|
||||
func parseXCCTime(s string) (time.Time, error) {
|
||||
s = strings.TrimSpace(s)
|
||||
formats := []string{
|
||||
"2006-01-02T15:04:05.000",
|
||||
"2006-01-02T15:04:05",
|
||||
"2006-01-02 15:04:05",
|
||||
}
|
||||
for _, f := range formats {
|
||||
if t, err := time.Parse(f, s); err == nil {
|
||||
return t, nil
|
||||
}
|
||||
}
|
||||
return time.Time{}, fmt.Errorf("unparseable time: %q", s)
|
||||
}
|
||||
|
||||
// parseMHz parses "4100 MHz" → 4100
|
||||
func parseMHz(s string) int {
|
||||
s = strings.TrimSpace(s)
|
||||
parts := strings.Fields(s)
|
||||
if len(parts) == 0 {
|
||||
return 0
|
||||
}
|
||||
v, _ := strconv.Atoi(parts[0])
|
||||
return v
|
||||
}
|
||||
|
||||
// parseKB parses "384 KB" → 384
|
||||
func parseKB(s string) int {
|
||||
s = strings.TrimSpace(s)
|
||||
parts := strings.Fields(s)
|
||||
if len(parts) == 0 {
|
||||
return 0
|
||||
}
|
||||
v, _ := strconv.Atoi(parts[0])
|
||||
return v
|
||||
}
|
||||
|
||||
// parseMB parses "32768 MB" → 32768
|
||||
func parseMB(s string) int {
|
||||
return parseKB(s)
|
||||
}
|
||||
|
||||
// parseMTs parses "4800 MT/s" → 4800 (treated as MHz equivalent)
|
||||
func parseMTs(s string) int {
|
||||
return parseKB(s)
|
||||
}
|
||||
|
||||
// parseCapacityToGB parses "3.20 TB" or "480 GB" → GB integer
|
||||
func parseCapacityToGB(s string) int {
|
||||
s = strings.TrimSpace(s)
|
||||
parts := strings.Fields(s)
|
||||
if len(parts) < 2 {
|
||||
return 0
|
||||
}
|
||||
v, err := strconv.ParseFloat(parts[0], 64)
|
||||
if err != nil {
|
||||
return 0
|
||||
}
|
||||
switch strings.ToUpper(parts[1]) {
|
||||
case "TB":
|
||||
return int(v * 1000)
|
||||
case "GB":
|
||||
return int(v)
|
||||
case "MB":
|
||||
return int(v / 1024)
|
||||
}
|
||||
return int(v)
|
||||
}
|
||||
|
||||
// rawJSONToInt parses a json.RawMessage that may be an int or a quoted string → int
|
||||
func rawJSONToInt(raw json.RawMessage) int {
|
||||
if len(raw) == 0 {
|
||||
return 0
|
||||
}
|
||||
// try direct int
|
||||
var n int
|
||||
if err := json.Unmarshal(raw, &n); err == nil {
|
||||
return n
|
||||
}
|
||||
// try string
|
||||
var s string
|
||||
if err := json.Unmarshal(raw, &s); err == nil {
|
||||
v, _ := strconv.Atoi(strings.TrimSpace(s))
|
||||
return v
|
||||
}
|
||||
return 0
|
||||
}
|
||||
|
||||
// parseHexID parses "0x15b3" → 5555
|
||||
func parseHexID(s string) int {
|
||||
s = strings.TrimSpace(strings.ToLower(s))
|
||||
s = strings.TrimPrefix(s, "0x")
|
||||
v, _ := strconv.ParseInt(s, 16, 32)
|
||||
return int(v)
|
||||
}
|
||||
398
internal/parser/vendors/lenovo_xcc/parser_test.go
vendored
Normal file
398
internal/parser/vendors/lenovo_xcc/parser_test.go
vendored
Normal file
@@ -0,0 +1,398 @@
|
||||
package lenovo_xcc
|
||||
|
||||
import (
|
||||
"testing"
|
||||
|
||||
"git.mchus.pro/mchus/logpile/internal/models"
|
||||
"git.mchus.pro/mchus/logpile/internal/parser"
|
||||
)
|
||||
|
||||
const exampleArchive = "/Users/mchusavitin/Documents/git/logpile/example/7D76CTO1WW_JF0002KT_xcc_mini-log_20260413-122150.zip"
|
||||
|
||||
func TestDetect_LenovoXCCMiniLog(t *testing.T) {
|
||||
files, err := parser.ExtractArchive(exampleArchive)
|
||||
if err != nil {
|
||||
t.Skipf("example archive not available: %v", err)
|
||||
}
|
||||
|
||||
p := &Parser{}
|
||||
score := p.Detect(files)
|
||||
if score < 80 {
|
||||
t.Errorf("expected Detect score >= 80 for XCC mini-log archive, got %d", score)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParse_LenovoXCCMiniLog_BasicSysInfo(t *testing.T) {
|
||||
files, err := parser.ExtractArchive(exampleArchive)
|
||||
if err != nil {
|
||||
t.Skipf("example archive not available: %v", err)
|
||||
}
|
||||
|
||||
p := &Parser{}
|
||||
result, err := p.Parse(files)
|
||||
if err != nil {
|
||||
t.Fatalf("Parse returned error: %v", err)
|
||||
}
|
||||
if result == nil || result.Hardware == nil {
|
||||
t.Fatal("Parse returned nil result or hardware")
|
||||
}
|
||||
|
||||
hw := result.Hardware
|
||||
if hw.BoardInfo.SerialNumber == "" {
|
||||
t.Error("BoardInfo.SerialNumber is empty")
|
||||
}
|
||||
if hw.BoardInfo.ProductName == "" {
|
||||
t.Error("BoardInfo.ProductName is empty")
|
||||
}
|
||||
t.Logf("BoardInfo: serial=%s model=%s uuid=%s", hw.BoardInfo.SerialNumber, hw.BoardInfo.ProductName, hw.BoardInfo.UUID)
|
||||
}
|
||||
|
||||
func TestParse_LenovoXCCMiniLog_CPUs(t *testing.T) {
|
||||
files, err := parser.ExtractArchive(exampleArchive)
|
||||
if err != nil {
|
||||
t.Skipf("example archive not available: %v", err)
|
||||
}
|
||||
|
||||
p := &Parser{}
|
||||
result, _ := p.Parse(files)
|
||||
if result == nil || result.Hardware == nil {
|
||||
t.Fatal("Parse returned nil")
|
||||
}
|
||||
|
||||
if len(result.Hardware.CPUs) == 0 {
|
||||
t.Error("expected at least one CPU, got none")
|
||||
}
|
||||
for i, cpu := range result.Hardware.CPUs {
|
||||
t.Logf("CPU[%d]: socket=%d model=%q cores=%d threads=%d freq=%dMHz", i, cpu.Socket, cpu.Model, cpu.Cores, cpu.Threads, cpu.FrequencyMHz)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParse_LenovoXCCMiniLog_Memory(t *testing.T) {
|
||||
files, err := parser.ExtractArchive(exampleArchive)
|
||||
if err != nil {
|
||||
t.Skipf("example archive not available: %v", err)
|
||||
}
|
||||
|
||||
p := &Parser{}
|
||||
result, _ := p.Parse(files)
|
||||
if result == nil || result.Hardware == nil {
|
||||
t.Fatal("Parse returned nil")
|
||||
}
|
||||
|
||||
if len(result.Hardware.Memory) == 0 {
|
||||
t.Error("expected memory DIMMs, got none")
|
||||
}
|
||||
t.Logf("Memory: %d DIMMs", len(result.Hardware.Memory))
|
||||
for i, m := range result.Hardware.Memory {
|
||||
t.Logf("DIMM[%d]: slot=%s present=%v size=%dMB sn=%s", i, m.Slot, m.Present, m.SizeMB, m.SerialNumber)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParse_LenovoXCCMiniLog_Storage(t *testing.T) {
|
||||
files, err := parser.ExtractArchive(exampleArchive)
|
||||
if err != nil {
|
||||
t.Skipf("example archive not available: %v", err)
|
||||
}
|
||||
|
||||
p := &Parser{}
|
||||
result, _ := p.Parse(files)
|
||||
if result == nil || result.Hardware == nil {
|
||||
t.Fatal("Parse returned nil")
|
||||
}
|
||||
|
||||
t.Logf("Storage: %d disks", len(result.Hardware.Storage))
|
||||
for i, s := range result.Hardware.Storage {
|
||||
t.Logf("Disk[%d]: slot=%s model=%q size=%dGB sn=%s", i, s.Slot, s.Model, s.SizeGB, s.SerialNumber)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParse_LenovoXCCMiniLog_PCIeCards(t *testing.T) {
|
||||
files, err := parser.ExtractArchive(exampleArchive)
|
||||
if err != nil {
|
||||
t.Skipf("example archive not available: %v", err)
|
||||
}
|
||||
|
||||
p := &Parser{}
|
||||
result, _ := p.Parse(files)
|
||||
if result == nil || result.Hardware == nil {
|
||||
t.Fatal("Parse returned nil")
|
||||
}
|
||||
|
||||
t.Logf("PCIe cards: %d", len(result.Hardware.PCIeDevices))
|
||||
for i, c := range result.Hardware.PCIeDevices {
|
||||
t.Logf("Card[%d]: slot=%s desc=%q bdf=%s", i, c.Slot, c.Description, c.BDF)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParse_LenovoXCCMiniLog_PSUs(t *testing.T) {
|
||||
files, err := parser.ExtractArchive(exampleArchive)
|
||||
if err != nil {
|
||||
t.Skipf("example archive not available: %v", err)
|
||||
}
|
||||
|
||||
p := &Parser{}
|
||||
result, _ := p.Parse(files)
|
||||
if result == nil || result.Hardware == nil {
|
||||
t.Fatal("Parse returned nil")
|
||||
}
|
||||
|
||||
if len(result.Hardware.PowerSupply) == 0 {
|
||||
t.Error("expected PSUs, got none")
|
||||
}
|
||||
for i, p := range result.Hardware.PowerSupply {
|
||||
t.Logf("PSU[%d]: slot=%s wattage=%dW status=%s sn=%s", i, p.Slot, p.WattageW, p.Status, p.SerialNumber)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParse_LenovoXCCMiniLog_Sensors(t *testing.T) {
|
||||
files, err := parser.ExtractArchive(exampleArchive)
|
||||
if err != nil {
|
||||
t.Skipf("example archive not available: %v", err)
|
||||
}
|
||||
|
||||
p := &Parser{}
|
||||
result, _ := p.Parse(files)
|
||||
if result == nil {
|
||||
t.Fatal("Parse returned nil")
|
||||
}
|
||||
|
||||
if len(result.Sensors) == 0 {
|
||||
t.Error("expected sensors, got none")
|
||||
}
|
||||
t.Logf("Sensors: %d", len(result.Sensors))
|
||||
}
|
||||
|
||||
func TestParse_LenovoXCCMiniLog_Events(t *testing.T) {
|
||||
files, err := parser.ExtractArchive(exampleArchive)
|
||||
if err != nil {
|
||||
t.Skipf("example archive not available: %v", err)
|
||||
}
|
||||
|
||||
p := &Parser{}
|
||||
result, _ := p.Parse(files)
|
||||
if result == nil {
|
||||
t.Fatal("Parse returned nil")
|
||||
}
|
||||
|
||||
if len(result.Events) == 0 {
|
||||
t.Error("expected events, got none")
|
||||
}
|
||||
t.Logf("Events: %d", len(result.Events))
|
||||
for i, e := range result.Events {
|
||||
if i >= 5 {
|
||||
break
|
||||
}
|
||||
t.Logf("Event[%d]: severity=%s ts=%s desc=%q", i, e.Severity, e.Timestamp.Format("2006-01-02T15:04:05"), e.Description)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParse_LenovoXCCMiniLog_FRU(t *testing.T) {
|
||||
files, err := parser.ExtractArchive(exampleArchive)
|
||||
if err != nil {
|
||||
t.Skipf("example archive not available: %v", err)
|
||||
}
|
||||
|
||||
p := &Parser{}
|
||||
result, _ := p.Parse(files)
|
||||
if result == nil {
|
||||
t.Fatal("Parse returned nil")
|
||||
}
|
||||
|
||||
t.Logf("FRU: %d entries", len(result.FRU))
|
||||
for i, f := range result.FRU {
|
||||
t.Logf("FRU[%d]: desc=%q product=%q serial=%q", i, f.Description, f.ProductName, f.SerialNumber)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParse_LenovoXCCMiniLog_Firmware(t *testing.T) {
|
||||
files, err := parser.ExtractArchive(exampleArchive)
|
||||
if err != nil {
|
||||
t.Skipf("example archive not available: %v", err)
|
||||
}
|
||||
|
||||
p := &Parser{}
|
||||
result, _ := p.Parse(files)
|
||||
if result == nil || result.Hardware == nil {
|
||||
t.Fatal("Parse returned nil")
|
||||
}
|
||||
|
||||
if len(result.Hardware.Firmware) == 0 {
|
||||
t.Error("expected firmware entries, got none")
|
||||
}
|
||||
for i, f := range result.Hardware.Firmware {
|
||||
t.Logf("FW[%d]: name=%q version=%q buildtime=%q", i, f.DeviceName, f.Version, f.BuildTime)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseDIMMs_UnqualifiedDIMMAddsWarningEvent(t *testing.T) {
|
||||
content := []byte(`{
|
||||
"items": [{
|
||||
"memory": [{
|
||||
"memory_name": "DIMM A1",
|
||||
"memory_status": "Unqualified DIMM",
|
||||
"memory_type": "DDR5",
|
||||
"memory_capacity": 32
|
||||
}]
|
||||
}]
|
||||
}`)
|
||||
|
||||
memory, events := parseDIMMs(content)
|
||||
if len(memory) != 1 {
|
||||
t.Fatalf("expected 1 DIMM, got %d", len(memory))
|
||||
}
|
||||
if len(events) != 1 {
|
||||
t.Fatalf("expected 1 warning event, got %d", len(events))
|
||||
}
|
||||
if events[0].Severity != models.SeverityWarning {
|
||||
t.Fatalf("expected warning severity, got %q", events[0].Severity)
|
||||
}
|
||||
if events[0].SensorName != "DIMM A1" {
|
||||
t.Fatalf("unexpected sensor name: %q", events[0].SensorName)
|
||||
}
|
||||
}
|
||||
|
||||
func TestSeverity_UnqualifiedDIMMMessageBecomesWarning(t *testing.T) {
|
||||
if got := xccSeverity("I", "System found Unqualified DIMM in slot DIMM A1"); got != models.SeverityWarning {
|
||||
t.Fatalf("expected warning severity, got %q", got)
|
||||
}
|
||||
}
|
||||
|
||||
func TestParseBasicSysInfo_CleansPlaceholderValuesAndSetsTargetHost(t *testing.T) {
|
||||
result := &models.AnalysisResult{Hardware: &models.HardwareConfig{}}
|
||||
content := []byte(`{
|
||||
"items": [{
|
||||
"machine_name": " sr650v3-node01 ",
|
||||
"machine_typemodel": " 7D76CTO1WW ",
|
||||
"serial_number": " Not Specified ",
|
||||
"uuid": "N/A"
|
||||
}]
|
||||
}`)
|
||||
|
||||
parseBasicSysInfo(content, result)
|
||||
|
||||
if result.TargetHost != "sr650v3-node01" {
|
||||
t.Fatalf("unexpected target host: %q", result.TargetHost)
|
||||
}
|
||||
if result.Hardware.BoardInfo.ProductName != "7D76CTO1WW" {
|
||||
t.Fatalf("unexpected product name: %q", result.Hardware.BoardInfo.ProductName)
|
||||
}
|
||||
if result.Hardware.BoardInfo.SerialNumber != "" {
|
||||
t.Fatalf("expected serial number to be cleaned, got %q", result.Hardware.BoardInfo.SerialNumber)
|
||||
}
|
||||
if result.Hardware.BoardInfo.UUID != "" {
|
||||
t.Fatalf("expected UUID to be cleaned, got %q", result.Hardware.BoardInfo.UUID)
|
||||
}
|
||||
}
|
||||
|
||||
func TestEnrichBoardFromFRU_SystemBoardManufacturerOnly(t *testing.T) {
|
||||
result := &models.AnalysisResult{
|
||||
Hardware: &models.HardwareConfig{},
|
||||
FRU: []models.FRUInfo{
|
||||
{Description: "Power Supply 1", Manufacturer: "Ignore Me"},
|
||||
{Description: "System Board", Manufacturer: " Lenovo "},
|
||||
},
|
||||
}
|
||||
|
||||
enrichBoardFromFRU(result)
|
||||
|
||||
if result.Hardware.BoardInfo.Manufacturer != "Lenovo" {
|
||||
t.Fatalf("unexpected manufacturer: %q", result.Hardware.BoardInfo.Manufacturer)
|
||||
}
|
||||
}
|
||||
|
||||
func TestEnrichPSUsFromSensors_AssignsTelemetryBySlot(t *testing.T) {
|
||||
psus := []models.PSU{
|
||||
{Slot: "1"},
|
||||
{Slot: "2"},
|
||||
}
|
||||
sensors := []models.SensorReading{
|
||||
{Name: "PSU1 Input Power", Value: 430},
|
||||
{Name: "Power Supply 1 Output Power", Value: 390},
|
||||
{Name: "PWS1 AC Voltage", Value: 230.5},
|
||||
{Name: "PSU2 Input Power", Value: 0},
|
||||
{Name: "PSU3 Input Power", Value: 999},
|
||||
{Name: "Fan 1", Value: 12000},
|
||||
}
|
||||
|
||||
got := enrichPSUsFromSensors(psus, sensors)
|
||||
|
||||
if got[0].InputPowerW != 430 {
|
||||
t.Fatalf("unexpected PSU1 input power: %d", got[0].InputPowerW)
|
||||
}
|
||||
if got[0].OutputPowerW != 390 {
|
||||
t.Fatalf("unexpected PSU1 output power: %d", got[0].OutputPowerW)
|
||||
}
|
||||
if got[0].InputVoltage != 230.5 {
|
||||
t.Fatalf("unexpected PSU1 input voltage: %v", got[0].InputVoltage)
|
||||
}
|
||||
if got[1].InputPowerW != 0 || got[1].OutputPowerW != 0 || got[1].InputVoltage != 0 {
|
||||
t.Fatalf("unexpected telemetry assigned to PSU2: %+v", got[1])
|
||||
}
|
||||
}
|
||||
|
||||
func TestMapDiskHealthStatus(t *testing.T) {
|
||||
tests := []struct {
|
||||
name string
|
||||
code int
|
||||
stateStr string
|
||||
want string
|
||||
}{
|
||||
{name: "normal", code: 2, stateStr: "Online", want: "OK"},
|
||||
{name: "warning", code: 1, stateStr: "Online", want: "Warning"},
|
||||
{name: "predictive failure", code: 4, stateStr: "Online", want: "Warning"},
|
||||
{name: "critical", code: 3, stateStr: "Failed", want: "Critical"},
|
||||
{name: "fallback state", code: 0, stateStr: "Rebuilding", want: "Rebuilding"},
|
||||
{name: "unknown", code: 0, stateStr: "", want: "Unknown"},
|
||||
}
|
||||
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
if got := mapDiskHealthStatus(tt.code, tt.stateStr); got != tt.want {
|
||||
t.Fatalf("got %q, want %q", got, tt.want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestClassifySensorType(t *testing.T) {
|
||||
tests := []struct {
|
||||
name string
|
||||
in string
|
||||
unit string
|
||||
want string
|
||||
}{
|
||||
{name: "unit rpm", in: "Fan 1", unit: "RPM", want: "fan"},
|
||||
{name: "unit celsius", in: "CPU Temp", unit: "C", want: "temperature"},
|
||||
{name: "unit watts", in: "PSU1 Input Power", unit: "W", want: "power"},
|
||||
{name: "unit volts", in: "PWS1 AC Voltage", unit: "V", want: "voltage"},
|
||||
{name: "unit amps", in: "PSU1 Current", unit: "A", want: "current"},
|
||||
{name: "name fallback", in: "GPU Temp", unit: "", want: "temperature"},
|
||||
{name: "other", in: "Presence", unit: "", want: "other"},
|
||||
}
|
||||
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
if got := classifySensorType(tt.in, tt.unit); got != tt.want {
|
||||
t.Fatalf("got %q, want %q", got, tt.want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestCleanXCCValue(t *testing.T) {
|
||||
tests := []struct {
|
||||
in string
|
||||
want string
|
||||
}{
|
||||
{in: " Lenovo ", want: "Lenovo"},
|
||||
{in: "N/A", want: ""},
|
||||
{in: " not specified ", want: ""},
|
||||
{in: "-", want: ""},
|
||||
}
|
||||
|
||||
for _, tt := range tests {
|
||||
if got := cleanXCCValue(tt.in); got != tt.want {
|
||||
t.Fatalf("cleanXCCValue(%q) = %q, want %q", tt.in, got, tt.want)
|
||||
}
|
||||
}
|
||||
}
|
||||
1
internal/parser/vendors/vendors.go
vendored
1
internal/parser/vendors/vendors.go
vendored
@@ -14,6 +14,7 @@ import (
|
||||
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/unraid"
|
||||
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/xfusion"
|
||||
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/xigmanas"
|
||||
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/lenovo_xcc"
|
||||
|
||||
// Generic fallback parser (must be last for lowest priority)
|
||||
_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/generic"
|
||||
|
||||
@@ -24,6 +24,7 @@ func newCollectTestServer() (*Server, *httptest.Server) {
|
||||
mux.HandleFunc("POST /api/collect", s.handleCollectStart)
|
||||
mux.HandleFunc("GET /api/collect/{id}", s.handleCollectStatus)
|
||||
mux.HandleFunc("POST /api/collect/{id}/cancel", s.handleCollectCancel)
|
||||
mux.HandleFunc("POST /api/collect/{id}/skip", s.handleCollectSkip)
|
||||
return s, httptest.NewServer(mux)
|
||||
}
|
||||
|
||||
@@ -65,9 +66,6 @@ func TestCollectProbe(t *testing.T) {
|
||||
if payload.HostPowerState != "Off" {
|
||||
t.Fatalf("expected host power state Off, got %q", payload.HostPowerState)
|
||||
}
|
||||
if !payload.PowerControlAvailable {
|
||||
t.Fatalf("expected power control to be available")
|
||||
}
|
||||
}
|
||||
|
||||
func TestCollectLifecycleToTerminal(t *testing.T) {
|
||||
|
||||
@@ -26,12 +26,11 @@ func (c *mockConnector) Probe(ctx context.Context, req collector.Request) (*coll
|
||||
hostPoweredOn = false
|
||||
}
|
||||
return &collector.ProbeResult{
|
||||
Reachable: true,
|
||||
Protocol: c.protocol,
|
||||
HostPowerState: map[bool]string{true: "On", false: "Off"}[hostPoweredOn],
|
||||
HostPoweredOn: hostPoweredOn,
|
||||
PowerControlAvailable: true,
|
||||
SystemPath: "/redfish/v1/Systems/1",
|
||||
Reachable: true,
|
||||
Protocol: c.protocol,
|
||||
HostPowerState: map[bool]string{true: "On", false: "Off"}[hostPoweredOn],
|
||||
HostPoweredOn: hostPoweredOn,
|
||||
SystemPath: "/redfish/v1/Systems/1",
|
||||
}, nil
|
||||
}
|
||||
|
||||
|
||||
@@ -19,18 +19,15 @@ type CollectRequest struct {
|
||||
Password string `json:"password,omitempty"`
|
||||
Token string `json:"token,omitempty"`
|
||||
TLSMode string `json:"tls_mode"`
|
||||
PowerOnIfHostOff bool `json:"power_on_if_host_off,omitempty"`
|
||||
StopHostAfterCollect bool `json:"stop_host_after_collect,omitempty"`
|
||||
DebugPayloads bool `json:"debug_payloads,omitempty"`
|
||||
DebugPayloads bool `json:"debug_payloads,omitempty"`
|
||||
}
|
||||
|
||||
type CollectProbeResponse struct {
|
||||
Reachable bool `json:"reachable"`
|
||||
Protocol string `json:"protocol,omitempty"`
|
||||
HostPowerState string `json:"host_power_state,omitempty"`
|
||||
HostPoweredOn bool `json:"host_powered_on"`
|
||||
PowerControlAvailable bool `json:"power_control_available"`
|
||||
Message string `json:"message,omitempty"`
|
||||
HostPowerState string `json:"host_power_state,omitempty"`
|
||||
HostPoweredOn bool `json:"host_powered_on"`
|
||||
Message string `json:"message,omitempty"`
|
||||
}
|
||||
|
||||
type CollectJobResponse struct {
|
||||
@@ -78,7 +75,8 @@ type Job struct {
|
||||
CreatedAt time.Time
|
||||
UpdatedAt time.Time
|
||||
RequestMeta CollectRequestMeta
|
||||
cancel func()
|
||||
cancel func()
|
||||
skipFn func()
|
||||
}
|
||||
|
||||
type CollectModuleStatus struct {
|
||||
|
||||
@@ -18,6 +18,7 @@ import (
|
||||
"sort"
|
||||
"strconv"
|
||||
"strings"
|
||||
"sync"
|
||||
"sync/atomic"
|
||||
"time"
|
||||
|
||||
@@ -1674,34 +1675,28 @@ func (s *Server) handleCollectProbe(w http.ResponseWriter, r *http.Request) {
|
||||
|
||||
message := "Связь с BMC установлена"
|
||||
if result != nil {
|
||||
switch {
|
||||
case !result.HostPoweredOn && result.PowerControlAvailable:
|
||||
message = "Связь с BMC установлена, host выключен. Можно включить перед сбором."
|
||||
case !result.HostPoweredOn:
|
||||
message = "Связь с BMC установлена, host выключен."
|
||||
default:
|
||||
message = "Связь с BMC установлена, host включен."
|
||||
if result.HostPoweredOn {
|
||||
message = "Связь с BMC установлена, host включён."
|
||||
} else {
|
||||
message = "Связь с BMC установлена, host выключен. Данные инвентаря могут быть неполными."
|
||||
}
|
||||
}
|
||||
|
||||
hostPowerState := ""
|
||||
hostPoweredOn := false
|
||||
powerControlAvailable := false
|
||||
reachable := false
|
||||
if result != nil {
|
||||
reachable = result.Reachable
|
||||
hostPowerState = strings.TrimSpace(result.HostPowerState)
|
||||
hostPoweredOn = result.HostPoweredOn
|
||||
powerControlAvailable = result.PowerControlAvailable
|
||||
}
|
||||
|
||||
jsonResponse(w, CollectProbeResponse{
|
||||
Reachable: reachable,
|
||||
Protocol: req.Protocol,
|
||||
HostPowerState: hostPowerState,
|
||||
HostPoweredOn: hostPoweredOn,
|
||||
PowerControlAvailable: powerControlAvailable,
|
||||
Message: message,
|
||||
Reachable: reachable,
|
||||
Protocol: req.Protocol,
|
||||
HostPowerState: hostPowerState,
|
||||
HostPoweredOn: hostPoweredOn,
|
||||
Message: message,
|
||||
})
|
||||
}
|
||||
|
||||
@@ -1737,6 +1732,22 @@ func (s *Server) handleCollectCancel(w http.ResponseWriter, r *http.Request) {
|
||||
jsonResponse(w, job.toStatusResponse())
|
||||
}
|
||||
|
||||
func (s *Server) handleCollectSkip(w http.ResponseWriter, r *http.Request) {
|
||||
jobID := strings.TrimSpace(r.PathValue("id"))
|
||||
if !isValidCollectJobID(jobID) {
|
||||
jsonError(w, "Invalid collect job id", http.StatusBadRequest)
|
||||
return
|
||||
}
|
||||
|
||||
job, ok := s.jobManager.SkipJob(jobID)
|
||||
if !ok {
|
||||
jsonError(w, "Collect job not found", http.StatusNotFound)
|
||||
return
|
||||
}
|
||||
|
||||
jsonResponse(w, job.toStatusResponse())
|
||||
}
|
||||
|
||||
func (s *Server) startCollectionJob(jobID string, req CollectRequest) {
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
if attached := s.jobManager.AttachJobCancel(jobID, cancel); !attached {
|
||||
@@ -1744,6 +1755,11 @@ func (s *Server) startCollectionJob(jobID string, req CollectRequest) {
|
||||
return
|
||||
}
|
||||
|
||||
skipCh := make(chan struct{})
|
||||
var skipOnce sync.Once
|
||||
skipFn := func() { skipOnce.Do(func() { close(skipCh) }) }
|
||||
s.jobManager.AttachJobSkip(jobID, skipFn)
|
||||
|
||||
go func() {
|
||||
connector, ok := s.getCollector(req.Protocol)
|
||||
if !ok {
|
||||
@@ -1811,7 +1827,9 @@ func (s *Server) startCollectionJob(jobID string, req CollectRequest) {
|
||||
}
|
||||
}
|
||||
|
||||
result, err := connector.Collect(ctx, toCollectorRequest(req), emitProgress)
|
||||
collectorReq := toCollectorRequest(req)
|
||||
collectorReq.SkipHungCh = skipCh
|
||||
result, err := connector.Collect(ctx, collectorReq, emitProgress)
|
||||
if err != nil {
|
||||
if ctx.Err() != nil {
|
||||
return
|
||||
@@ -2035,9 +2053,7 @@ func toCollectorRequest(req CollectRequest) collector.Request {
|
||||
Password: req.Password,
|
||||
Token: req.Token,
|
||||
TLSMode: req.TLSMode,
|
||||
PowerOnIfHostOff: req.PowerOnIfHostOff,
|
||||
StopHostAfterCollect: req.StopHostAfterCollect,
|
||||
DebugPayloads: req.DebugPayloads,
|
||||
DebugPayloads: req.DebugPayloads,
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -175,6 +175,43 @@ func (m *JobManager) UpdateJobDebugInfo(id string, info *CollectDebugInfo) (*Job
|
||||
return cloned, true
|
||||
}
|
||||
|
||||
func (m *JobManager) AttachJobSkip(id string, skipFn func()) bool {
|
||||
m.mu.Lock()
|
||||
defer m.mu.Unlock()
|
||||
|
||||
job, ok := m.jobs[id]
|
||||
if !ok || job == nil || isTerminalCollectStatus(job.Status) {
|
||||
return false
|
||||
}
|
||||
job.skipFn = skipFn
|
||||
return true
|
||||
}
|
||||
|
||||
func (m *JobManager) SkipJob(id string) (*Job, bool) {
|
||||
m.mu.Lock()
|
||||
job, ok := m.jobs[id]
|
||||
if !ok || job == nil {
|
||||
m.mu.Unlock()
|
||||
return nil, false
|
||||
}
|
||||
if isTerminalCollectStatus(job.Status) {
|
||||
cloned := cloneJob(job)
|
||||
m.mu.Unlock()
|
||||
return cloned, true
|
||||
}
|
||||
skipFn := job.skipFn
|
||||
job.skipFn = nil
|
||||
job.UpdatedAt = time.Now().UTC()
|
||||
job.Logs = append(job.Logs, formatCollectLogLine(job.UpdatedAt, "Пропуск зависших запросов по команде пользователя"))
|
||||
cloned := cloneJob(job)
|
||||
m.mu.Unlock()
|
||||
|
||||
if skipFn != nil {
|
||||
skipFn()
|
||||
}
|
||||
return cloned, true
|
||||
}
|
||||
|
||||
func (m *JobManager) AttachJobCancel(id string, cancelFn context.CancelFunc) bool {
|
||||
m.mu.Lock()
|
||||
defer m.mu.Unlock()
|
||||
@@ -229,5 +266,6 @@ func cloneJob(job *Job) *Job {
|
||||
cloned.CurrentPhase = job.CurrentPhase
|
||||
cloned.ETASeconds = job.ETASeconds
|
||||
cloned.cancel = nil
|
||||
cloned.skipFn = nil
|
||||
return &cloned
|
||||
}
|
||||
|
||||
@@ -99,6 +99,7 @@ func (s *Server) setupRoutes() {
|
||||
s.mux.HandleFunc("POST /api/collect/probe", s.handleCollectProbe)
|
||||
s.mux.HandleFunc("GET /api/collect/{id}", s.handleCollectStatus)
|
||||
s.mux.HandleFunc("POST /api/collect/{id}/cancel", s.handleCollectCancel)
|
||||
s.mux.HandleFunc("POST /api/collect/{id}/skip", s.handleCollectSkip)
|
||||
}
|
||||
|
||||
func (s *Server) Run() error {
|
||||
|
||||
@@ -24,6 +24,7 @@ func newFlowTestServer() (*Server, *httptest.Server) {
|
||||
mux.HandleFunc("POST /api/collect", s.handleCollectStart)
|
||||
mux.HandleFunc("GET /api/collect/{id}", s.handleCollectStatus)
|
||||
mux.HandleFunc("POST /api/collect/{id}/cancel", s.handleCollectCancel)
|
||||
mux.HandleFunc("POST /api/collect/{id}/skip", s.handleCollectSkip)
|
||||
return s, httptest.NewServer(mux)
|
||||
}
|
||||
|
||||
|
||||
@@ -128,6 +128,7 @@ echo ""
|
||||
# Show next steps
|
||||
echo -e "${YELLOW}Next steps:${NC}"
|
||||
echo " 1. Create git tag:"
|
||||
echo " # LOGPile release tags use vN.M, for example: v1.12"
|
||||
echo " git tag -a ${VERSION} -m \"Release ${VERSION}\""
|
||||
echo ""
|
||||
echo " 2. Push tag to remote:"
|
||||
|
||||
@@ -211,8 +211,6 @@ main {
|
||||
}
|
||||
|
||||
#api-connect-btn,
|
||||
#api-power-on-collect-btn,
|
||||
#api-collect-off-btn,
|
||||
#convert-folder-btn,
|
||||
#convert-run-btn,
|
||||
#cancel-job-btn,
|
||||
@@ -229,8 +227,6 @@ main {
|
||||
}
|
||||
|
||||
#api-connect-btn:hover,
|
||||
#api-power-on-collect-btn:hover,
|
||||
#api-collect-off-btn:hover,
|
||||
#convert-folder-btn:hover,
|
||||
#convert-run-btn:hover,
|
||||
#cancel-job-btn:hover,
|
||||
@@ -241,8 +237,6 @@ main {
|
||||
#convert-run-btn:disabled,
|
||||
#convert-folder-btn:disabled,
|
||||
#api-connect-btn:disabled,
|
||||
#api-power-on-collect-btn:disabled,
|
||||
#api-collect-off-btn:disabled,
|
||||
#cancel-job-btn:disabled,
|
||||
.upload-area button:disabled {
|
||||
opacity: 0.6;
|
||||
@@ -311,64 +305,19 @@ main {
|
||||
border-top: 1px solid #e2e8f0;
|
||||
}
|
||||
|
||||
.api-confirm-modal-backdrop {
|
||||
position: fixed;
|
||||
inset: 0;
|
||||
background: rgba(0, 0, 0, 0.45);
|
||||
.api-host-off-warning {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
z-index: 1000;
|
||||
}
|
||||
|
||||
.api-confirm-modal {
|
||||
background: #fff;
|
||||
border-radius: 10px;
|
||||
padding: 1.5rem 1.75rem;
|
||||
max-width: 380px;
|
||||
width: 90%;
|
||||
box-shadow: 0 8px 32px rgba(0,0,0,0.18);
|
||||
}
|
||||
|
||||
.api-confirm-modal p {
|
||||
margin-bottom: 1.1rem;
|
||||
font-size: 0.95rem;
|
||||
color: #333;
|
||||
line-height: 1.5;
|
||||
}
|
||||
|
||||
.api-confirm-modal-actions {
|
||||
display: flex;
|
||||
gap: 0.6rem;
|
||||
justify-content: flex-end;
|
||||
}
|
||||
|
||||
.api-confirm-modal-actions button {
|
||||
border: none;
|
||||
gap: 0.4rem;
|
||||
padding: 0.5rem 0.75rem;
|
||||
background: #fef3c7;
|
||||
border: 1px solid #f59e0b;
|
||||
border-radius: 6px;
|
||||
padding: 0.5rem 1rem;
|
||||
font-size: 0.9rem;
|
||||
font-weight: 600;
|
||||
cursor: pointer;
|
||||
font-size: 0.875rem;
|
||||
color: #92400e;
|
||||
font-weight: 500;
|
||||
}
|
||||
|
||||
.api-confirm-modal-actions .btn-cancel {
|
||||
background: #e2e8f0;
|
||||
color: #333;
|
||||
}
|
||||
|
||||
.api-confirm-modal-actions .btn-cancel:hover {
|
||||
background: #cbd5e1;
|
||||
}
|
||||
|
||||
.api-confirm-modal-actions .btn-confirm {
|
||||
background: #dc3545;
|
||||
color: #fff;
|
||||
}
|
||||
|
||||
.api-confirm-modal-actions .btn-confirm:hover {
|
||||
background: #b02a37;
|
||||
}
|
||||
|
||||
.api-connect-status {
|
||||
margin-top: 0.75rem;
|
||||
@@ -445,6 +394,33 @@ main {
|
||||
cursor: default;
|
||||
}
|
||||
|
||||
.job-status-actions {
|
||||
display: flex;
|
||||
gap: 0.5rem;
|
||||
align-items: center;
|
||||
}
|
||||
|
||||
#skip-hung-btn {
|
||||
background: #f59e0b;
|
||||
color: #fff;
|
||||
border: none;
|
||||
border-radius: 6px;
|
||||
padding: 0.5rem 0.9rem;
|
||||
font-size: 0.875rem;
|
||||
font-weight: 600;
|
||||
cursor: pointer;
|
||||
transition: background-color 0.2s ease, opacity 0.2s ease;
|
||||
}
|
||||
|
||||
#skip-hung-btn:hover {
|
||||
background: #d97706;
|
||||
}
|
||||
|
||||
#skip-hung-btn:disabled {
|
||||
opacity: 0.6;
|
||||
cursor: not-allowed;
|
||||
}
|
||||
|
||||
.job-status-meta {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fit, minmax(230px, 1fr));
|
||||
|
||||
@@ -91,9 +91,9 @@ function initApiSource() {
|
||||
}
|
||||
|
||||
const cancelJobButton = document.getElementById('cancel-job-btn');
|
||||
const skipHungButton = document.getElementById('skip-hung-btn');
|
||||
const connectButton = document.getElementById('api-connect-btn');
|
||||
const collectButton = document.getElementById('api-collect-btn');
|
||||
const powerOffCheckbox = document.getElementById('api-power-off');
|
||||
const fieldNames = ['host', 'port', 'username', 'password'];
|
||||
|
||||
apiForm.addEventListener('submit', (event) => {
|
||||
@@ -110,6 +110,11 @@ function initApiSource() {
|
||||
cancelCollectionJob();
|
||||
});
|
||||
}
|
||||
if (skipHungButton) {
|
||||
skipHungButton.addEventListener('click', () => {
|
||||
skipHungCollectionJob();
|
||||
});
|
||||
}
|
||||
if (connectButton) {
|
||||
connectButton.addEventListener('click', () => {
|
||||
startApiProbe();
|
||||
@@ -120,22 +125,6 @@ function initApiSource() {
|
||||
startCollectionWithOptions();
|
||||
});
|
||||
}
|
||||
if (powerOffCheckbox) {
|
||||
powerOffCheckbox.addEventListener('change', () => {
|
||||
if (!powerOffCheckbox.checked) {
|
||||
return;
|
||||
}
|
||||
// If host was already on when probed, warn before enabling shutdown
|
||||
if (apiProbeResult && apiProbeResult.host_powered_on) {
|
||||
showConfirmModal(
|
||||
'Хост был включён до начала сбора. Вы уверены, что хотите выключить его после завершения сбора?',
|
||||
() => { /* confirmed — leave checked */ },
|
||||
() => { powerOffCheckbox.checked = false; }
|
||||
);
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
fieldNames.forEach((fieldName) => {
|
||||
const field = apiForm.elements.namedItem(fieldName);
|
||||
if (!field) {
|
||||
@@ -163,36 +152,6 @@ function initApiSource() {
|
||||
renderCollectionJob();
|
||||
}
|
||||
|
||||
function showConfirmModal(message, onConfirm, onCancel) {
|
||||
const backdrop = document.createElement('div');
|
||||
backdrop.className = 'api-confirm-modal-backdrop';
|
||||
backdrop.innerHTML = `
|
||||
<div class="api-confirm-modal" role="dialog" aria-modal="true">
|
||||
<p>${escapeHtml(message)}</p>
|
||||
<div class="api-confirm-modal-actions">
|
||||
<button class="btn-cancel">Отмена</button>
|
||||
<button class="btn-confirm">Да, выключить</button>
|
||||
</div>
|
||||
</div>
|
||||
`;
|
||||
document.body.appendChild(backdrop);
|
||||
|
||||
const close = () => document.body.removeChild(backdrop);
|
||||
backdrop.querySelector('.btn-cancel').addEventListener('click', () => {
|
||||
close();
|
||||
if (onCancel) onCancel();
|
||||
});
|
||||
backdrop.querySelector('.btn-confirm').addEventListener('click', () => {
|
||||
close();
|
||||
if (onConfirm) onConfirm();
|
||||
});
|
||||
backdrop.addEventListener('click', (e) => {
|
||||
if (e.target === backdrop) {
|
||||
close();
|
||||
if (onCancel) onCancel();
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
function startApiProbe() {
|
||||
const { isValid, payload, errors } = validateCollectForm();
|
||||
@@ -255,11 +214,7 @@ function startCollectionWithOptions() {
|
||||
return;
|
||||
}
|
||||
|
||||
const powerOnCheckbox = document.getElementById('api-power-on');
|
||||
const powerOffCheckbox = document.getElementById('api-power-off');
|
||||
const debugPayloads = document.getElementById('api-debug-payloads');
|
||||
payload.power_on_if_host_off = powerOnCheckbox ? powerOnCheckbox.checked : false;
|
||||
payload.stop_host_after_collect = powerOffCheckbox ? powerOffCheckbox.checked : false;
|
||||
payload.debug_payloads = debugPayloads ? debugPayloads.checked : false;
|
||||
startCollectionJob(payload);
|
||||
}
|
||||
@@ -268,8 +223,6 @@ function renderApiProbeState() {
|
||||
const connectButton = document.getElementById('api-connect-btn');
|
||||
const probeOptions = document.getElementById('api-probe-options');
|
||||
const status = document.getElementById('api-connect-status');
|
||||
const powerOnCheckbox = document.getElementById('api-power-on');
|
||||
const powerOffCheckbox = document.getElementById('api-power-off');
|
||||
if (!connectButton || !probeOptions || !status) {
|
||||
return;
|
||||
}
|
||||
@@ -283,7 +236,6 @@ function renderApiProbeState() {
|
||||
}
|
||||
|
||||
const hostOn = apiProbeResult.host_powered_on;
|
||||
const powerControlAvailable = apiProbeResult.power_control_available;
|
||||
|
||||
if (hostOn) {
|
||||
status.textContent = apiProbeResult.message || 'Связь с BMC есть, host включён.';
|
||||
@@ -295,25 +247,15 @@ function renderApiProbeState() {
|
||||
|
||||
probeOptions.classList.remove('hidden');
|
||||
|
||||
// "Включить" checkbox
|
||||
if (powerOnCheckbox) {
|
||||
const hostOffWarning = document.getElementById('api-host-off-warning');
|
||||
if (hostOffWarning) {
|
||||
if (hostOn) {
|
||||
// Host already on — checkbox is checked and disabled
|
||||
powerOnCheckbox.checked = true;
|
||||
powerOnCheckbox.disabled = true;
|
||||
hostOffWarning.classList.add('hidden');
|
||||
} else {
|
||||
// Host off — default: checked (will power on), enabled
|
||||
powerOnCheckbox.checked = true;
|
||||
powerOnCheckbox.disabled = !powerControlAvailable;
|
||||
hostOffWarning.classList.remove('hidden');
|
||||
}
|
||||
}
|
||||
|
||||
// "Выключить" checkbox — default: unchecked
|
||||
if (powerOffCheckbox) {
|
||||
powerOffCheckbox.checked = false;
|
||||
powerOffCheckbox.disabled = !powerControlAvailable;
|
||||
}
|
||||
|
||||
connectButton.textContent = 'Переподключиться';
|
||||
}
|
||||
|
||||
@@ -535,6 +477,36 @@ function pollCollectionJobStatus() {
|
||||
});
|
||||
}
|
||||
|
||||
function skipHungCollectionJob() {
|
||||
if (!collectionJob || isCollectionJobTerminal(collectionJob.status)) {
|
||||
return;
|
||||
}
|
||||
const btn = document.getElementById('skip-hung-btn');
|
||||
if (btn) {
|
||||
btn.disabled = true;
|
||||
btn.textContent = 'Пропуск...';
|
||||
}
|
||||
fetch(`/api/collect/${encodeURIComponent(collectionJob.id)}/skip`, {
|
||||
method: 'POST'
|
||||
})
|
||||
.then(async (response) => {
|
||||
const body = await response.json().catch(() => ({}));
|
||||
if (!response.ok) {
|
||||
throw new Error(body.error || 'Не удалось пропустить зависшие запросы');
|
||||
}
|
||||
syncServerLogs(body.logs);
|
||||
renderCollectionJob();
|
||||
})
|
||||
.catch((err) => {
|
||||
appendJobLog(`Ошибка пропуска: ${err.message}`);
|
||||
if (btn) {
|
||||
btn.disabled = false;
|
||||
btn.textContent = 'Пропустить зависшие';
|
||||
}
|
||||
renderCollectionJob();
|
||||
});
|
||||
}
|
||||
|
||||
function cancelCollectionJob() {
|
||||
if (!collectionJob || isCollectionJobTerminal(collectionJob.status)) {
|
||||
return;
|
||||
@@ -671,6 +643,19 @@ function renderCollectionJob() {
|
||||
)).join('');
|
||||
|
||||
cancelButton.disabled = isTerminal;
|
||||
|
||||
const skipBtn = document.getElementById('skip-hung-btn');
|
||||
if (skipBtn) {
|
||||
const isCollecting = !isTerminal && collectionJob.status === 'running';
|
||||
if (isCollecting) {
|
||||
skipBtn.classList.remove('hidden');
|
||||
} else {
|
||||
skipBtn.classList.add('hidden');
|
||||
skipBtn.disabled = false;
|
||||
skipBtn.textContent = 'Пропустить зависшие';
|
||||
}
|
||||
}
|
||||
|
||||
setApiFormBlocked(!isTerminal);
|
||||
}
|
||||
|
||||
|
||||
@@ -80,18 +80,12 @@
|
||||
</div>
|
||||
<div id="api-connect-status" class="api-connect-status"></div>
|
||||
<div id="api-probe-options" class="api-probe-options hidden">
|
||||
<label class="api-form-checkbox" for="api-power-on">
|
||||
<input id="api-power-on" name="power_on_if_host_off" type="checkbox">
|
||||
<span>Включить перед сбором</span>
|
||||
</label>
|
||||
<label class="api-form-checkbox" for="api-power-off">
|
||||
<input id="api-power-off" name="stop_host_after_collect" type="checkbox">
|
||||
<span>Выключить после сбора</span>
|
||||
</label>
|
||||
<div class="api-probe-options-separator"></div>
|
||||
<div id="api-host-off-warning" class="api-host-off-warning hidden">
|
||||
⚠ Host выключен — данные инвентаря могут быть неполными
|
||||
</div>
|
||||
<label class="api-form-checkbox" for="api-debug-payloads">
|
||||
<input id="api-debug-payloads" name="debug_payloads" type="checkbox">
|
||||
<span>Сбор расширенных метрик для отладки</span>
|
||||
<span>Сбор расширенных данных для диагностики</span>
|
||||
</label>
|
||||
<div class="api-form-actions">
|
||||
<button id="api-collect-btn" type="submit">Собрать</button>
|
||||
@@ -102,7 +96,10 @@
|
||||
<section id="api-job-status" class="job-status hidden" aria-live="polite">
|
||||
<div class="job-status-header">
|
||||
<h4>Статус задачи сбора</h4>
|
||||
<button id="cancel-job-btn" type="button">Отменить</button>
|
||||
<div class="job-status-actions">
|
||||
<button id="skip-hung-btn" type="button" class="hidden" title="Прервать зависшие запросы и перейти к анализу собранных данных">Пропустить зависшие</button>
|
||||
<button id="cancel-job-btn" type="button">Отменить</button>
|
||||
</div>
|
||||
</div>
|
||||
<div class="job-status-meta">
|
||||
<div><span class="meta-label">jobId:</span> <code id="job-id-value">-</code></div>
|
||||
|
||||
Reference in New Issue
Block a user