feat: Redfish hardware event log collection + MSI ghost GPU filter + inventory improvements

- Collect hardware event logs (last 7 days) from Systems and Managers/SEL LogServices
- Parse AMI raw IPMI dump messages into readable descriptions (Sensor_Type: Event_Type)
- Filter out audit/journal/non-hardware log services; only SEL from Managers
- MSI ghost GPU filter: exclude processor GPU entries with temperature=0 when host is powered on
- Reanimator collected_at uses InventoryData/Status.LastModifiedTime (30-day fallback)
- Invalidate Redfish inventory CRC groups before host power-on
- Log inventory LastModifiedTime age in collection logs
- Drop SecureBoot collection (SecureBootMode, SecureBootDatabases) — not hardware inventory
- Add build version to UI footer via template
- Add MSI Redfish API reference doc to bible-local/docs/

ADL-032–ADL-035

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mikhail Chusavitin
2026-03-18 23:47:22 +03:00
parent 30409eef67
commit 96e65d8f65
15 changed files with 989 additions and 13 deletions


@@ -822,3 +822,99 @@ special acquisition strategy.
- Repo-owned compact fixtures under `internal/collector/redfishprofile/testdata/`, derived from
representative raw-export snapshots, are used to lock profile matching and acquisition tuning
for known MSI and Supermicro-family shapes.
---
## ADL-032 — MSI ghost GPU filter: exclude GPUs with temperature=0 on powered-on host
**Date:** 2026-03-18
**Context:**
MSI/AMI BMC caches GPU inventory from the host via Host Interface (in-band). When GPUs are
removed without a reboot the old entries remain in `Chassis/GPU*` and
`Systems/Self/Processors/GPU*` with `Status.Health: OK, State: Enabled`. The BMC has no
out-of-band mechanism to detect physical absence. A physically present GPU always reports
an ambient temperature (>0°C) even when idle; a stale cached entry returns `Reading: 0`.
**Decision:**
- Add `EnableMSIGhostGPUFilter` directive (enabled by MSI profile's `refineAnalysis`
alongside `EnableProcessorGPUFallback`).
- In `collectGPUsFromProcessors`: for each processor GPU, resolve its chassis path and read
`Chassis/GPU{n}/Sensors/GPU{n}_Temperature`. If `PowerState=On` and `Reading=0` → skip.
- Filter only applies when host is powered on; when host is off all temperatures are 0 and
the signal is ambiguous.
**Consequences:**
- Ghost GPUs from previous hardware configurations no longer appear in the inventory.
- Filter is MSI-profile-owned and does not affect HGX, Supermicro, or generic paths.
- Any new MSI GPU chassis that uses a different temperature sensor path will bypass the filter
(safe default: include rather than wrongly exclude).
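The skip rule above can be sketched as a small predicate. This is a minimal illustration, not the code from `collectGPUsFromProcessors`: the `gpuTempSample` type, its fields, and `isGhostGPU` are hypothetical names, and the sketch assumes the chassis sensor lookup has already happened.

```go
package main

import "fmt"

// gpuTempSample is a hypothetical, simplified view of what the collector knows
// about one processor GPU after looking up its chassis temperature sensor.
type gpuTempSample struct {
	HostPowerState string  // system PowerState, e.g. "On" or "Off"
	SensorFound    bool    // false when the expected sensor path returned nothing
	Reading        float64 // value of the sensor's Reading property
}

// isGhostGPU reports whether the entry should be skipped as a stale cache entry.
// Only "host on, sensor present, reading exactly 0" counts as a ghost; every
// ambiguous case keeps the GPU (include rather than wrongly exclude).
func isGhostGPU(s gpuTempSample) bool {
	if s.HostPowerState != "On" {
		return false // host off: all temperatures are 0, signal is ambiguous
	}
	if !s.SensorFound {
		return false // unknown sensor layout: the filter is bypassed
	}
	return s.Reading == 0
}

func main() {
	fmt.Println(isGhostGPU(gpuTempSample{HostPowerState: "On", SensorFound: true, Reading: 0}))  // true
	fmt.Println(isGhostGPU(gpuTempSample{HostPowerState: "On", SensorFound: true, Reading: 38})) // false
	fmt.Println(isGhostGPU(gpuTempSample{HostPowerState: "Off", SensorFound: true, Reading: 0})) // false
	fmt.Println(isGhostGPU(gpuTempSample{HostPowerState: "On", SensorFound: false}))             // false
}
```

Keeping the decision in one pure function makes the "safe default" explicit: anything other than a confirmed zero reading on a powered-on host leaves the GPU in the inventory.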
---
## ADL-033 — Reanimator export collected_at uses inventory LastModifiedTime with 30-day fallback
**Date:** 2026-03-18
**Context:**
For Redfish sources the BMC Manager `DateTime` is simply the BMC clock reading at query time, not
when the hardware inventory was last known-good. `InventoryData/Status.LastModifiedTime`
(AMI/MSI OEM endpoint) records the actual timestamp of the last successful host-pushed
inventory cycle and is a better proxy for "when was this hardware configuration last confirmed".
**Decision:**
- `inferInventoryLastModifiedTime` reads `LastModifiedTime` from the snapshot and sets
`AnalysisResult.InventoryLastModifiedAt`.
- `reanimatorCollectedAt()` in the exporter selects `InventoryLastModifiedAt` when it is set
and no older than 30 days; otherwise falls back to `CollectedAt`.
- Fallback rationale: inventory older than 30 days is likely from a long-running server with
no recent reboot; using the actual collection date is more useful for the downstream consumer.
- The inventory timestamp is also logged during replay and live collection for diagnostics.
**Consequences:**
- Reanimator export `collected_at` reflects the last confirmed inventory cycle on AMI/MSI BMCs.
- On non-AMI BMCs or when `InventoryData/Status` is absent, behavior is unchanged.
- If inventory is stale (>30 days), collection date is used as before.
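The selection rule can be sketched as follows. This is an illustrative reading of the decision, not the exporter's code: `pickCollectedAt`, its explicit `now` parameter, and the `mustParse` helper are hypothetical, chosen so the example is deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// maxInventoryAge mirrors the 30-day fallback threshold described above.
const maxInventoryAge = 30 * 24 * time.Hour

// pickCollectedAt returns the inventory timestamp when it is set and no older
// than 30 days relative to now; otherwise it falls back to the collection time.
func pickCollectedAt(inventoryAt, collectedAt, now time.Time) time.Time {
	if inventoryAt.IsZero() {
		return collectedAt // non-AMI BMC or InventoryData/Status absent
	}
	if now.Sub(inventoryAt) > maxInventoryAge {
		return collectedAt // stale inventory: collection date is more useful
	}
	return inventoryAt
}

// mustParse is a small helper for the example; it panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	now := mustParse("2026-03-18T12:00:00Z")
	fresh := mustParse("2026-03-10T08:00:00Z")
	stale := mustParse("2026-01-01T08:00:00Z")

	fmt.Println(pickCollectedAt(fresh, now, now).Format(time.RFC3339))       // 2026-03-10T08:00:00Z
	fmt.Println(pickCollectedAt(stale, now, now).Format(time.RFC3339))       // 2026-03-18T12:00:00Z
	fmt.Println(pickCollectedAt(time.Time{}, now, now).Format(time.RFC3339)) // 2026-03-18T12:00:00Z
}
```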
---
## ADL-034 — Redfish inventory invalidated before host power-on
**Date:** 2026-03-18
**Context:**
When a host is powered on by the collector (`power_on_if_host_off=true`), the BMC still holds
inventory from the previous boot. If hardware changed between shutdowns, the new boot will push
fresh inventory — but only if the BMC accepts it (CRC mismatch triggers re-population). Without
explicit invalidation, unchanged CRCs can cause the BMC to skip re-processing even after a
hardware change.
**Decision:**
- Before any power-on attempt, `invalidateRedfishInventory` POSTs to
`{systemPath}/Oem/Ami/Inventory/Crc` with all groups zeroed (`CPU`, `DIMM`, `PCIE`,
`CERTIFICATES`, `SECUREBOOT`).
- Best-effort: a 404/405 response (non-AMI BMC) is logged and silently ignored.
- The invalidation is logged at `INFO` level and surfaced as a collect progress message.
**Consequences:**
- On AMI/MSI BMCs: the next boot will push a full fresh inventory regardless of whether
CRCs appear unchanged, eliminating ghost components from prior hardware configurations.
- On non-AMI BMCs: the POST fails immediately (endpoint does not exist), nothing changes.
- Invalidation runs only when `power_on_if_host_off=true` and host is confirmed off.
---
## ADL-035 — Redfish hardware event log collection from Systems LogServices
**Date:** 2026-03-18
**Context:** Redfish BMCs expose event logs via `LogServices/{svc}/Entries`. On MSI/AMI this includes the IPMI SEL with hardware events (temperature, power, drive failures, etc.). Live collection previously collected only inventory/sensor snapshots; event history was unavailable in Reanimator.
**Decision:**
- After tree-walk, fetch hardware log entries separately via `collectRedfishLogEntries()` (not part of tree-walk to avoid bloat).
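- `Systems/{sys}/LogServices` is queried for hardware logs; from `Managers/{mgr}/LogServices` only the IPMI SEL service (`Id` = `SEL`) is collected, and the remaining Manager log services (BMC audit/journal) are excluded.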
- Log services with Id/Name containing "audit", "journal", "bmc", "security", "manager", "debug" are skipped.
- Entries older than 7 days (client-side filter) are discarded. Pages are followed until an out-of-window entry is found (assumes newest-first ordering, typical for BMCs).
- Entries with `EntryType: "Oem"` or `MessageId` containing user/auth/login keywords are filtered as non-hardware.
- Raw entries stored in `rawPayloads["redfish_log_entries"]` as `[]map[string]interface{}`.
- Parsed to `models.Event` in `parseRedfishLogEntries()` during replay — same path for live and offline.
- Max 200 entries per log service, 500 total to limit BMC load.
**Consequences:**
- Hardware event history (last 7 days) visible in Reanimator `EventLogs` section.
- No impact on existing inventory pipeline or offline archive replay (archives without `redfish_log_entries` key silently skip parsing).
- Adds extra HTTP requests during live collection (sequential, after tree-walk completes).
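The entry-level rules above (7-day window plus the non-hardware filter) can be sketched as a single predicate. This is a hedged illustration only: `keepLogEntry` and `mustParse` are hypothetical names, the keyword list is taken from the bullet above, and the real `isHardwareLogEntry` may apply additional checks.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// keepLogEntry applies the documented rules to one raw LogEntry:
// drop Oem entries, drop user/auth/login messages, drop entries older than cutoff.
// Entries whose Created timestamp cannot be parsed are kept.
func keepLogEntry(entry map[string]interface{}, cutoff time.Time) bool {
	if entryType, _ := entry["EntryType"].(string); entryType == "Oem" {
		return false
	}
	messageID, _ := entry["MessageId"].(string)
	messageID = strings.ToLower(messageID)
	for _, keyword := range []string{"user", "auth", "login"} {
		if strings.Contains(messageID, keyword) {
			return false
		}
	}
	created, _ := entry["Created"].(string)
	ts, err := time.Parse(time.RFC3339, created)
	if err != nil {
		return true // unknown time: do not discard on the window alone
	}
	return !ts.Before(cutoff)
}

// mustParse is a small helper for the example; it panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	cutoff := mustParse("2026-03-11T00:00:00Z") // "now" minus 7 days in this example
	recent := map[string]interface{}{"EntryType": "SEL", "Created": "2026-03-17T10:00:00Z"}
	old := map[string]interface{}{"EntryType": "SEL", "Created": "2026-02-01T10:00:00Z"}
	fmt.Println(keepLogEntry(recent, cutoff)) // true
	fmt.Println(keepLogEntry(old, cutoff))    // false
}
```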


@@ -0,0 +1,343 @@
# MSI BMC Redfish API Reference
Source: MSI Enterprise Platform Solutions — Redfish BMC User Guide v1.0 (AMI/MegaRAC stack).
Spec compliance: DSP0266 1.15.1, DSP8010 2019.2.
> This document is trimmed to sections relevant to LOGPile collection and inventory analysis.
> Auth, LDAP/AD, SMTP, VirtualMedia, Certificates, RADIUS, Composability, and BMC config
> sections are omitted.
---
## Supported HTTP methods
`GET`, `POST`, `PATCH`, `DELETE`. Unsupported methods return `405`.
PATCH requires an `If-Match` / `ETag` precondition header; missing header → `428`, mismatch → `412`.
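A conditional PATCH can be sketched with the Go standard library as below. This is a minimal sketch under assumptions: it assumes the resource's GET response carries an `ETag` header, and `patchWithETag` and `newFakeBMC` are hypothetical helpers. The fake server only imitates the 428/412 behaviour stated above so the example runs without a real BMC.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// patchWithETag reads the resource's current ETag via GET and sends it back
// as If-Match on the PATCH. It returns the PATCH status code.
func patchWithETag(client *http.Client, url string, body []byte) (int, error) {
	getResp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	getResp.Body.Close()
	etag := getResp.Header.Get("ETag")

	req, err := http.NewRequest(http.MethodPatch, url, bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	req.Header.Set("Content-Type", "application/json")
	if etag != "" {
		req.Header.Set("If-Match", etag)
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// newFakeBMC is a stand-in server that mimics only the precondition checks:
// 428 when If-Match is missing, 412 when it does not match.
func newFakeBMC() *httptest.Server {
	const etag = `"abc123"`
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodGet:
			w.Header().Set("ETag", etag)
			w.WriteHeader(http.StatusOK)
		case http.MethodPatch:
			switch r.Header.Get("If-Match") {
			case "":
				w.WriteHeader(http.StatusPreconditionRequired) // 428
			case etag:
				w.WriteHeader(http.StatusNoContent) // 204
			default:
				w.WriteHeader(http.StatusPreconditionFailed) // 412
			}
		default:
			w.WriteHeader(http.StatusMethodNotAllowed) // 405
		}
	}))
}

func main() {
	srv := newFakeBMC()
	defer srv.Close()
	status, err := patchWithETag(srv.Client(), srv.URL+"/redfish/v1/Systems/Self", []byte(`{}`))
	fmt.Println(status, err) // 204 <nil>
}
```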
---
## 1. Core Redfish API endpoints
| Resource | URI | Schema |
|---|---|---|
| Service Root | `/redfish/v1/` | ServiceRoot.v1_7_0 |
| ComputerSystem Collection | `/redfish/v1/Systems` | ComputerSystemCollection |
| ComputerSystem | `/redfish/v1/Systems/{sys}` | ComputerSystem.v1_16_2 |
| Memory Collection | `/redfish/v1/Systems/{sys}/Memory` | MemoryCollection |
| Memory | `/redfish/v1/Systems/{sys}/Memory/{mem}` | Memory.v1_19_0 |
| MemoryMetrics | `/redfish/v1/Systems/{sys}/Memory/{mem}/MemoryMetrics` | MemoryMetrics.v1_7_0 |
| MemoryDomain Collection | `/redfish/v1/Systems/{sys}/MemoryDomain` | MemoryDomainCollection |
| MemoryDomain | `/redfish/v1/Systems/{sys}/MemoryDomain/{dom}` | MemoryDomain.v1_2_3 |
| MemoryChunks Collection | `/redfish/v1/Systems/{sys}/MemoryDomain/{dom}/MemoryChunks` | MemoryChunksCollection |
| MemoryChunks | `/redfish/v1/Systems/{sys}/MemoryDomain/{dom}/MemoryChunks/{chunk}` | MemoryChunks.v1_4_0 |
| Processor Collection | `/redfish/v1/Systems/{sys}/Processors` | ProcessorCollection |
| Processor | `/redfish/v1/Systems/{sys}/Processors/{proc}` | Processor.v1_15_0 |
| SubProcessors Collection | `/redfish/v1/Systems/{sys}/Processors/{proc}/SubProcessors` | ProcessorCollection |
| SubProcessor | `/redfish/v1/Systems/{sys}/Processors/{proc}/SubProcessors/{sub}` | Processor.v1_15_0 |
| ProcessorMetrics | `/redfish/v1/Systems/{sys}/Processors/{proc}/ProcessorMetrics` | ProcessorMetrics.v1_4_0 |
| Bios | `/redfish/v1/Systems/{sys}/Bios` | Bios.v1_2_0 |
| SimpleStorage Collection | `/redfish/v1/Systems/{sys}/SimpleStorage` | SimpleStorageCollection |
| SimpleStorage | `/redfish/v1/Systems/{sys}/SimpleStorage/{ss}` | SimpleStorage.v1_3_0 |
| Storage Collection | `/redfish/v1/Systems/{sys}/Storage` | StorageCollection |
| Storage | `/redfish/v1/Systems/{sys}/Storage/{stor}` | Storage.v1_9_0 |
| StorageController Collection | `/redfish/v1/Systems/{sys}/Storage/{stor}/Controllers` | StorageControllerCollection |
| StorageController | `/redfish/v1/Systems/{sys}/Storage/{stor}/Controllers/{ctrl}` | StorageController.v1_0_0 |
| Drive | `/redfish/v1/Systems/{sys}/Storage/{stor}/Drives/{drv}` | Drive.v1_13_0 |
| Volume Collection | `/redfish/v1/Systems/{sys}/Storage/{stor}/Volumes` | VolumeCollection |
| Volume | `/redfish/v1/Systems/{sys}/Storage/{stor}/Volumes/{vol}` | Volume.v1_5_0 |
| NetworkInterface Collection | `/redfish/v1/Systems/{sys}/NetworkInterfaces` | NetworkInterfaceCollection |
| NetworkInterface | `/redfish/v1/Systems/{sys}/NetworkInterfaces/{nic}` | NetworkInterface.v1_2_0 |
| EthernetInterface (System) | `/redfish/v1/Systems/{sys}/EthernetInterfaces/{eth}` | EthernetInterface.v1_6_2 |
| GraphicsController Collection | `/redfish/v1/Systems/{sys}/GraphicsControllers` | GraphicsControllerCollection |
| GraphicsController | `/redfish/v1/Systems/{sys}/GraphicsControllers/{gpu}` | GraphicsController.v1_0_0 |
| USBController Collection | `/redfish/v1/Systems/{sys}/USBControllers` | USBControllerCollection |
| USBController | `/redfish/v1/Systems/{sys}/USBControllers/{usb}` | USBController.v1_0_0 |
| SecureBoot | `/redfish/v1/Systems/{sys}/SecureBoot` | SecureBoot.v1_1_0 |
| LogService Collection (System) | `/redfish/v1/Systems/{sys}/LogServices` | LogServiceCollection |
| LogService (System) | `/redfish/v1/Systems/{sys}/LogServices/{log}` | LogService.v1_1_3 |
| LogEntry Collection | `/redfish/v1/Systems/{sys}/LogServices/{log}/Entries` | LogEntryCollection |
| LogEntry | `/redfish/v1/Systems/{sys}/LogServices/{log}/Entries/{entry}` | LogEntry.v1_12_0 |
| Chassis Collection | `/redfish/v1/Chassis` | ChassisCollection |
| Chassis | `/redfish/v1/Chassis/{ch}` | Chassis.v1_15_0 |
| Power | `/redfish/v1/Chassis/{ch}/Power` | Power.v1_5_4 |
| PowerSubSystem | `/redfish/v1/Chassis/{ch}/PowerSubSystem` | PowerSubsystem.v1_1_0 |
| PowerSupplies Collection | `/redfish/v1/Chassis/{ch}/PowerSubSystem/PowerSupplies` | PowerSupplyCollection |
| PowerSupply | `/redfish/v1/Chassis/{ch}/PowerSubSystem/PowerSupplies/{psu}` | PowerSupply.v1_3_0 |
| PowerSupplyMetrics | `/redfish/v1/Chassis/{ch}/PowerSubSystem/PowerSupplies/{psu}/Metrics` | PowerSupplyMetrics.v1_0_1 |
| Thermal | `/redfish/v1/Chassis/{ch}/Thermal` | Thermal.v1_5_3 |
| ThermalSubSystem | `/redfish/v1/Chassis/{ch}/ThermalSubSystem` | ThermalSubsystem.v1_0_0 |
| ThermalMetrics | `/redfish/v1/Chassis/{ch}/ThermalSubSystem/ThermalMetrics` | ThermalMetrics.v1_0_1 |
| Fans Collection | `/redfish/v1/Chassis/{ch}/ThermalSubSystem/Fans` | FanCollection |
| Fan | `/redfish/v1/Chassis/{ch}/ThermalSubSystem/Fans/{fan}` | Fan.v1_1_1 |
| Sensor Collection | `/redfish/v1/Chassis/{ch}/Sensors` | SensorCollection |
| Sensor | `/redfish/v1/Chassis/{ch}/Sensors/{sen}` | Sensor.v1_0_2 |
| PCIeDevice Collection | `/redfish/v1/Chassis/{ch}/PCIeDevices` | PCIeDeviceCollection |
| PCIeDevice | `/redfish/v1/Chassis/{ch}/PCIeDevices/{dev}` | PCIeDevice.v1_9_0 |
| PCIeFunction Collection | `/redfish/v1/Chassis/{ch}/PCIeDevices/{dev}/PCIeFunctions` | PCIeFunctionCollection |
| PCIeFunction | `/redfish/v1/Chassis/{ch}/PCIeDevices/{dev}/PCIeFunctions/{fn}` | PCIeFunction.v1_2_3 |
| PCIeSlots | `/redfish/v1/Chassis/{ch}/PCIeSlots` | PCIeSlots.v1_5_0 |
| NetworkAdapter Collection | `/redfish/v1/Chassis/{ch}/NetworkAdapters` | NetworkAdapterCollection |
| NetworkAdapter | `/redfish/v1/Chassis/{ch}/NetworkAdapters/{na}` | NetworkAdapter.v1_8_0 |
| NetworkDeviceFunction Collection | `/redfish/v1/Chassis/{ch}/NetworkAdapters/{na}/NetworkDeviceFunctions` | NetworkDeviceFunctionCollection |
| NetworkDeviceFunction | `/redfish/v1/Chassis/{ch}/NetworkAdapters/{na}/NetworkDeviceFunctions/{fn}` | NetworkDeviceFunction.v1_5_0 |
| Assembly | `/redfish/v1/Chassis/{ch}/Assembly` | Assembly.v1_2_2 |
| Assembly (Drive) | `/redfish/v1/Systems/{sys}/Storage/{stor}/Drives/{drv}/Assembly` | Assembly.v1_2_2 |
| Assembly (Processor) | `/redfish/v1/Systems/{sys}/Processors/{proc}/Assembly` | Assembly.v1_2_2 |
| Assembly (Memory) | `/redfish/v1/Systems/{sys}/Memory/{mem}/Assembly` | Assembly.v1_2_2 |
| Assembly (NetworkAdapter) | `/redfish/v1/Chassis/{ch}/NetworkAdapters/{na}/Assembly` | Assembly.v1_2_2 |
| Assembly (PCIeDevice) | `/redfish/v1/Chassis/{ch}/PCIeDevices/{dev}/Assembly` | Assembly.v1_2_2 |
| MediaController Collection | `/redfish/v1/Chassis/{ch}/MediaControllers` | MediaControllerCollection |
| MediaController | `/redfish/v1/Chassis/{ch}/MediaControllers/{mc}` | MediaController.v1_1_0 |
| LogService Collection (Chassis) | `/redfish/v1/Chassis/{ch}/LogServices` | LogServiceCollection |
| LogService (Chassis) | `/redfish/v1/Chassis/{ch}/LogServices/{log}` | LogService.v1_1_3 |
| Manager Collection | `/redfish/v1/Managers` | ManagerCollection |
| Manager | `/redfish/v1/Managers/{mgr}` | Manager.v1_13_0 |
| EthernetInterface (Manager) | `/redfish/v1/Managers/{mgr}/EthernetInterfaces/{eth}` | EthernetInterface.v1_6_2 |
| LogService Collection (Manager) | `/redfish/v1/Managers/{mgr}/LogServices` | LogServiceCollection |
| LogService (Manager) | `/redfish/v1/Managers/{mgr}/LogServices/{log}` | LogService.v1_1_3 |
| UpdateService | `/redfish/v1/UpdateService` | UpdateService.v1_6_0 |
| TaskService | `/redfish/v1/TaskService` | TaskService.v1_1_4 |
| Task Collection | `/redfish/v1/TaskService/Tasks` | TaskCollection |
| Task | `/redfish/v1/TaskService/Tasks/{task}` | Task.v1_4_2 |
---
## 2. Telemetry API endpoints
| Resource | URI | Schema |
|---|---|---|
| TelemetryService | `/redfish/v1/TelemetryService` | TelemetryService.v1_2_1 |
| MetricDefinition Collection | `/redfish/v1/TelemetryService/MetricDefinitions` | MetricDefinitionCollection |
| MetricDefinition | `/redfish/v1/TelemetryService/MetricDefinitions/{md}` | MetricDefinition.v1_0_3 |
| MetricReportDefinition Collection | `/redfish/v1/TelemetryService/MetricReportDefinitions` | MetricReportDefinitionCollection |
| MetricReportDefinition | `/redfish/v1/TelemetryService/MetricReportDefinitions/{mrd}` | MetricReportDefinition.v1_3_0 |
| MetricReport Collection | `/redfish/v1/TelemetryService/MetricReports` | MetricReportCollection |
| MetricReport | `/redfish/v1/TelemetryService/MetricReports/{mr}` | MetricReport.v1_2_0 |
| Telemetry LogService | `/redfish/v1/TelemetryService/LogService` | LogService.v1_1_3 |
| Telemetry LogEntry Collection | `/redfish/v1/TelemetryService/LogService/Entries` | LogEntryCollection |
---
## 3. Processor / NIC sub-resources (GPU-relevant)
| Resource | URI |
|---|---|
| Processor (NetworkAdapter) | `/redfish/v1/Chassis/{ch}/NetworkAdapters/{na}/Processors/{proc}` |
| AccelerationFunction Collection | `/redfish/v1/Systems/{sys}/Processors/{proc}/AccelerationFunctions` |
| AccelerationFunction | `/redfish/v1/Systems/{sys}/Processors/{proc}/AccelerationFunctions/{fn}` |
| Port Collection (NetworkAdapter) | `/redfish/v1/Chassis/{ch}/NetworkAdapters/{na}/Ports` |
| Port (GraphicsController) | `/redfish/v1/Systems/{sys}/GraphicsControllers/{gpu}/Ports/{port}` |
| OperatingConfig Collection | `/redfish/v1/Systems/{sys}/Processors/{proc}/OperatingConfigs` |
| OperatingConfig | `/redfish/v1/Systems/{sys}/Processors/{proc}/OperatingConfigs/{cfg}` |
---
## 4. Error response format
On error, the service returns an HTTP status code and a JSON body with a single `error` property:
```json
{
"error": {
"code": "Base.1.12.0.ActionParameterMissing",
"message": "...",
"@Message.ExtendedInfo": [
{
"@odata.type": "#Message.v1_0_8.Message",
"MessageId": "Base.1.12.0.ActionParameterMissing",
"Message": "...",
"MessageArgs": [],
"Severity": "Warning",
"Resolution": "..."
}
]
}
}
```
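A client can decode this body with plain `encoding/json`. The sketch below is illustrative: the `redfishErrorBody` type and `parseRedfishError` function are hypothetical names, and only the fields shown in the example above are mapped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// redfishErrorBody maps the documented error envelope.
type redfishErrorBody struct {
	Error struct {
		Code         string `json:"code"`
		Message      string `json:"message"`
		ExtendedInfo []struct {
			MessageID  string `json:"MessageId"`
			Message    string `json:"Message"`
			Severity   string `json:"Severity"`
			Resolution string `json:"Resolution"`
		} `json:"@Message.ExtendedInfo"`
	} `json:"error"`
}

// parseRedfishError returns the top-level error code and the severity of each
// extended-info message, in order.
func parseRedfishError(body []byte) (string, []string, error) {
	var doc redfishErrorBody
	if err := json.Unmarshal(body, &doc); err != nil {
		return "", nil, err
	}
	severities := make([]string, 0, len(doc.Error.ExtendedInfo))
	for _, info := range doc.Error.ExtendedInfo {
		severities = append(severities, info.Severity)
	}
	return doc.Error.Code, severities, nil
}

func main() {
	body := []byte(`{"error":{"code":"Base.1.12.0.ActionParameterMissing","message":"...",
		"@Message.ExtendedInfo":[{"MessageId":"Base.1.12.0.ActionParameterMissing","Severity":"Warning"}]}}`)
	code, severities, err := parseRedfishError(body)
	fmt.Println(code, severities, err) // Base.1.12.0.ActionParameterMissing [Warning] <nil>
}
```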
**Common status codes:**
| Code | Meaning |
|------|---------|
| 200 | OK with body |
| 201 | Created |
| 204 | Success, no body |
| 400 | Bad request / validation error |
| 401 | Unauthorized |
| 403 | Forbidden / firmware update in progress |
| 404 | Resource not found |
| 405 | Method not allowed |
| 412 | ETag precondition failed (PATCH) |
| 415 | Unsupported media type |
| 428 | Missing precondition header (PATCH) |
| 501 | Not implemented |
**Request validation sequence:**
1. Authorization check → 401
2. Entity privilege check → 403
3. URI existence → 404
4. Firmware update lock → 403
5. Method allowed → 405
6. Media type → 415
7. Body format → 400
8. PATCH: ETag header → 428/412
9. Property validation → 400
---
## 5. OEM: Inventory refresh (AMI/MSI-specific)
### 5.1 InventoryCrc — force component re-inventory
`GET/POST/DELETE /redfish/v1/Systems/{sys}/Oem/Ami/Inventory/Crc`
The `GroupCrcList` field lists current CRC checksums per component group. When a group's CRC
changes (host sends new inventory) or is explicitly zeroed out via POST, the BMC discards its
cached inventory and re-reads that group from the host.
**CRC groups:**
| Group | Covers |
|-------|--------|
| `CPU` | Processors, ProcessorMetrics |
| `DIMM` | Memory, MemoryDomains, MemoryChunks, MemoryMetrics |
| `PCIE` | Storage, PCIeDevices, NetworkInterfaces, NetworkAdapters |
| `CERTIFICATES` | Boot Certificates |
| `SECURBOOT` | SecureBoot data |
**POST — invalidate selected groups (force re-inventory):**
```
POST /redfish/v1/Systems/{sys}/Oem/Ami/Inventory/Crc
Content-Type: application/json
{
"GroupCrcList": [
{ "CPU": 0 },
{ "DIMM": 0 },
{ "PCIE": 0 }
]
}
```
Setting a group's value to `0` signals the BMC to invalidate and repopulate that group on next
host inventory push (typically at next boot or host-interface inventory cycle).
**DELETE** — remove all CRC records entirely.
**Note:** Inventory data is populated by the host via the Redfish Host Interface (in-band),
not by the BMC itself. Zeroing a CRC group does not immediately re-read hardware — it marks
the group as stale so the next host-side inventory push will be accepted. A cold reboot is the
most reliable trigger.
### 5.2 InventoryData Status — monitor inventory processing
`GET /redfish/v1/Oem/Ami/InventoryData/Status`
Available only after the host has posted an inventory file. Shows current processing state.
**Status enum:**
| Value | Meaning |
|-------|---------|
| `BootInProgress` | Host is booting |
| `Queued` | Processing task queued |
| `In-Progress` | Processing running in background |
| `Ready` / `Completed` | Processing finished successfully |
| `Failed` | Processing failed |
Response also includes:
- `InventoryData.DeletedModules` — array of groups updated in this population cycle
- `InventoryData.Messages` — warnings/errors encountered during processing
- `ProcessingTime` — milliseconds taken
- `LastModifiedTime` — ISO 8601 timestamp of last successful update
### 5.3 Systems OEM properties — Inventory reference
`GET /redfish/v1/Systems/{sys}` → `Oem.Ami` contains:
| Property | Notes |
|----------|-------|
| `Inventory` | Reference to InventoryCrc URI + current GroupCrc data |
| `RedfishVersion` | BIOS Redfish version (populated via Host Interface) |
| `RtpVersion` | BIOS RTP version (populated via Host Interface) |
| `ManagerBootConfiguration.ManagerBootMode` | PATCH to trigger soft reset: `SoftReset` / `ResetTimeout` / `None` |
---
## 6. OEM: Component state actions
### 6.1 Memory enable/disable
```
POST /redfish/v1/Systems/{sys}/Memory/{mem}/Actions/AmiBios.ChangeState
Content-Type: application/json
{ "State": "Disabled" }
```
Response: 204.
### 6.2 PCIeFunction enable/disable
```
POST /redfish/v1/Chassis/{ch}/PCIeDevices/{dev}/PCIeFunctions/{fn}/Actions/AmiBios.ChangeState
Content-Type: application/json
{ "State": "Disabled" }
```
Response: 204.
---
## 7. OEM: Storage sensor readings
`GET /redfish/v1/Systems/{sys}/Storage/{stor}` → `Oem.Ami.StorageControllerSensors`
Array of sensor objects per storage controller instance. Each entry exposes:
- `Reading` (Number) — current sensor value
- `ReadingType` (String) — type of reading
- `ReadingUnit` (String) — unit
---
## 8. OEM: Power and Thermal OwnerLUN
Both `GET /redfish/v1/Chassis/{ch}/Power` and `GET /redfish/v1/Chassis/{ch}/Thermal` expose
`Oem.Ami.OwnerLUN` (Number, read-only) — the IPMI LUN associated with each
temperature/fan/voltage sensor entry. Useful for correlating Redfish sensor readings with IPMI
SDR records.
---
## 9. UpdateService
`GET /redfish/v1/UpdateService` → `Oem.Ami.BMC.DualImageConfiguration`:
| Property | Description |
|----------|-------------|
| `ActiveImage` | Currently active BMC image slot |
| `BootImage` | Image slot BMC boots from |
| `FirmwareImage1Name` / `FirmwareImage1Version` | First image slot name + version |
| `FirmwareImage2Name` / `FirmwareImage2Version` | Second image slot name + version |
Standard `SimpleUpdate` action available at `/redfish/v1/UpdateService/Actions/UpdateService.SimpleUpdate`.
---
## 10. Inventory refresh summary
| Approach | Trigger | Latency | Scope |
|----------|---------|---------|-------|
| Host reboot | Physical/soft reset | Minutes | All groups |
| `POST InventoryCrc` (groups = 0) | Explicit API call | Next host inventory push | Selected groups |
| Firmware update (`SimpleUpdate`) | Explicit API call | Minutes + reboot | Full platform |
| Sensor/telemetry reads | Always live on GET | Immediate | Sensors only |
**Key constraint:** `InventoryCrc POST` marks groups stale but does not re-read hardware
directly. Actual inventory data flows from the host to BMC via the Redfish Host Interface
in-band channel, typically during POST/boot. For immediate inventory refresh without a full
reboot, a soft reset via `ManagerBootMode: SoftReset` PATCH may be sufficient on some
configurations.


@@ -311,6 +311,8 @@ func (c *RedfishConnector) Collect(ctx context.Context, req Request, emit Progre
	if emit != nil {
		emit(Progress{Status: "running", Progress: 99, Message: "Redfish: анализ raw snapshot..."})
	}
	// Collect hardware event logs separately (not part of tree-walk to avoid bloat).
	rawLogEntries := c.collectRedfishLogEntries(withRedfishTelemetryPhase(ctx, "log_entries"), snapshotClient, req, baseURL, systemPaths, managerPaths)
	rawPayloads := map[string]any{
		"redfish_tree": rawTree,
		"redfish_profiles": map[string]any{
@@ -413,12 +415,21 @@ func (c *RedfishConnector) Collect(ctx context.Context, req Request, emit Progre
	if len(fetchErrMap) > 0 {
		rawPayloads["redfish_fetch_errors"] = redfishFetchErrorMapToList(fetchErrMap)
	}
	if len(rawLogEntries) > 0 {
		rawPayloads["redfish_log_entries"] = rawLogEntries
	}
	// Unified tunnel: live collection and raw import go through the same analyzer over redfish_tree.
	result, err := ReplayRedfishFromRawPayloads(rawPayloads, nil)
	if err != nil {
		return nil, err
	}
	totalElapsed := time.Since(collectStart).Round(time.Second)
	if !result.InventoryLastModifiedAt.IsZero() {
		log.Printf("redfish-collect: inventory last modified at %s (age: %s)",
			result.InventoryLastModifiedAt.Format(time.RFC3339),
			time.Since(result.InventoryLastModifiedAt).Round(time.Minute),
		)
	}
	log.Printf(
		"redfish-postprobe-metrics: nvme_candidates=%d nvme_selected=%d nvme_added=%d candidates=%d selected=%d skipped_explicit=%d added=%d dur=%s",
		postProbeMetrics.NVMECandidates,
@@ -495,6 +506,11 @@ func (c *RedfishConnector) ensureHostPowerForCollection(ctx context.Context, cli
		return false, false
	}

	// Invalidate all inventory CRC groups before powering on so the BMC accepts
	// fresh inventory from the host after boot. Best-effort: failure is logged but
	// does not block power-on.
	c.invalidateRedfishInventory(ctx, client, req, baseURL, systemPath, emit)

	resetTarget := redfishResetActionTarget(systemDoc)
	resetType := redfishPickResetType(systemDoc, "On", "ForceOn")
	if resetTarget == "" || resetType == "" {
@@ -602,6 +618,32 @@ func (c *RedfishConnector) restoreHostPowerAfterCollection(ctx context.Context,
	}
}

// invalidateRedfishInventory POSTs to the AMI/MSI InventoryCrc endpoint to zero out
// all known CRC groups before a host power-on. This causes the BMC to accept fresh
// inventory from the host after boot, preventing stale inventory (ghost GPUs, wrong
// BIOS version, etc.) from persisting across hardware changes.
// Best-effort: any error is logged and the call silently returns.
func (c *RedfishConnector) invalidateRedfishInventory(ctx context.Context, client *http.Client, req Request, baseURL, systemPath string, emit ProgressFn) {
	crcPath := joinPath(systemPath, "/Oem/Ami/Inventory/Crc")
	body := map[string]any{
		"GroupCrcList": []map[string]any{
			{"CPU": 0},
			{"DIMM": 0},
			{"PCIE": 0},
			{"CERTIFICATES": 0},
			{"SECUREBOOT": 0},
		},
	}
	if err := c.postJSON(ctx, client, req, baseURL, crcPath, body); err != nil {
		log.Printf("redfish: inventory invalidation skipped (not AMI/MSI or endpoint unavailable): %v", err)
		return
	}
	log.Printf("redfish: inventory CRC groups invalidated at %s before host power-on", crcPath)
	if emit != nil {
		emit(Progress{Status: "running", Progress: 19, Message: "Redfish: инвентарь BMC инвалидирован перед включением host (все CRC группы сброшены)"})
	}
}

func (c *RedfishConnector) waitForHostPowerState(ctx context.Context, client *http.Client, req Request, baseURL, systemPath string, wantOn bool, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
@@ -2627,6 +2669,7 @@ func shouldCrawlPath(path string) bool {
		"/Bios/Settings",
		"/GetServerAllUSBStatus",
		"/Oem/Public/KVM",
		"/SecureBoot/SecureBootDatabases",
	} {
		if strings.Contains(normalized, part) {
			return false
@@ -5548,7 +5591,7 @@ func storageControllerFromPath(path string) string {
	return ""
}

-func parseFirmware(system, bios, manager, secureBoot, networkProtocol map[string]interface{}) []models.FirmwareInfo {
+func parseFirmware(system, bios, manager, networkProtocol map[string]interface{}) []models.FirmwareInfo {
	var out []models.FirmwareInfo
	appendFW := func(name, version string) {
@@ -5562,7 +5605,6 @@ func parseFirmware(system, bios, manager, secureBoot, networkProtocol map[string
	appendFW("BIOS", asString(system["BiosVersion"]))
	appendFW("BIOS", asString(bios["Version"]))
	appendFW("BMC", asString(manager["FirmwareVersion"]))
-	appendFW("SecureBoot", asString(secureBoot["SecureBootMode"]))
	return out
}


@@ -0,0 +1,392 @@
package collector

import (
	"context"
	"log"
	"net/http"
	"strings"
	"time"

	"git.mchus.pro/mchus/logpile/internal/models"
)

const (
	redfishLogEntriesWindow    = 7 * 24 * time.Hour
	redfishLogEntriesMaxTotal  = 500
	redfishLogEntriesMaxPerSvc = 200
)

// collectRedfishLogEntries fetches hardware event log entries from Systems and Managers LogServices.
// Only hardware-relevant entries from the last 7 days are returned.
// For Systems: all log services except audit/journal/security/debug.
// For Managers: only the IPMI SEL service (Id="SEL") — audit and event logs are excluded.
func (c *RedfishConnector) collectRedfishLogEntries(ctx context.Context, client *http.Client, req Request, baseURL string, systemPaths, managerPaths []string) []map[string]interface{} {
	cutoff := time.Now().UTC().Add(-redfishLogEntriesWindow)
	seen := make(map[string]struct{})
	var out []map[string]interface{}

	collectFrom := func(logServicesPath string, filter func(map[string]interface{}) bool) {
		if len(out) >= redfishLogEntriesMaxTotal {
			return
		}
		services, err := c.getCollectionMembers(ctx, client, req, baseURL, logServicesPath)
		if err != nil || len(services) == 0 {
			return
		}
		for _, svc := range services {
			if len(out) >= redfishLogEntriesMaxTotal {
				break
			}
			if !filter(svc) {
				continue
			}
			entriesPath := redfishLogServiceEntriesPath(svc)
			if entriesPath == "" {
				continue
			}
			entries := c.fetchRedfishLogEntriesWithPaging(ctx, client, req, baseURL, entriesPath, cutoff, seen, redfishLogEntriesMaxPerSvc)
			out = append(out, entries...)
		}
	}

	for _, systemPath := range systemPaths {
		collectFrom(joinPath(systemPath, "/LogServices"), isHardwareLogService)
	}
	// Managers hold the IPMI SEL on AMI/MSI BMCs — include only the "SEL" service.
	for _, managerPath := range managerPaths {
		collectFrom(joinPath(managerPath, "/LogServices"), isManagerSELService)
	}

	if len(out) > 0 {
		log.Printf("redfish: collected %d hardware log entries (Systems+Managers SEL, window=7d)", len(out))
	}
	return out
}


// fetchRedfishLogEntriesWithPaging fetches entries from a LogEntry collection,
// following nextLink pages. Stops early when entries older than cutoff are encountered
// (assumes BMC returns entries newest-first, which is typical).
func (c *RedfishConnector) fetchRedfishLogEntriesWithPaging(ctx context.Context, client *http.Client, req Request, baseURL, entriesPath string, cutoff time.Time, seen map[string]struct{}, limit int) []map[string]interface{} {
	var out []map[string]interface{}
	nextPath := entriesPath
	for nextPath != "" && len(out) < limit {
		collection, err := c.getJSON(ctx, client, req, baseURL, nextPath)
		if err != nil {
			break
		}
		// Handle both linked members (@odata.id only) and inline members (full objects).
		rawMembers, _ := collection["Members"].([]interface{})
		hitOldEntry := false
		for _, rawMember := range rawMembers {
			if len(out) >= limit {
				break
			}
			memberMap, ok := rawMember.(map[string]interface{})
			if !ok {
				continue
			}
			var entry map[string]interface{}
			if _, hasCreated := memberMap["Created"]; hasCreated {
				// Inline entry: use directly.
				entry = memberMap
			} else {
				// Linked entry: fetch by path.
				memberPath := normalizeRedfishPath(asString(memberMap["@odata.id"]))
				if memberPath == "" {
					continue
				}
				entry, err = c.getJSON(ctx, client, req, baseURL, memberPath)
				if err != nil || len(entry) == 0 {
					continue
				}
			}
			// Dedup by entry Id or path.
			entryKey := asString(entry["Id"])
			if entryKey == "" {
				entryKey = asString(entry["@odata.id"])
			}
			if entryKey != "" {
				if _, dup := seen[entryKey]; dup {
					continue
				}
				seen[entryKey] = struct{}{}
			}
			// Time filter.
			created := parseRedfishEntryTime(asString(entry["Created"]))
			if !created.IsZero() && created.Before(cutoff) {
				hitOldEntry = true
				continue
			}
			// Hardware relevance filter.
			if !isHardwareLogEntry(entry) {
				continue
			}
			out = append(out, entry)
		}
		// Stop paging once we've seen entries older than the window.
		if hitOldEntry {
			break
		}
		nextPath = firstNonEmpty(
			normalizeRedfishPath(asString(collection["Members@odata.nextLink"])),
			normalizeRedfishPath(asString(collection["@odata.nextLink"])),
		)
	}
	return out
}
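
The early-stop paging described above can be sketched in isolation. This is a minimal illustration, not the connector itself: the `page` type, `collectRecent`, `demo`, and the sample paths and timestamps are all hypothetical, and an in-memory map stands in for HTTP responses. It assumes, like the code above, that the BMC returns entries newest-first.

```go
package main

import (
	"fmt"
	"time"
)

// page is a hypothetical in-memory stand-in for one LogEntry collection response.
type page struct {
	created []string // Created timestamps of the members, newest first
	next    string   // value of Members@odata.nextLink, "" on the last page
}

// collectRecent walks pages starting at start and keeps timestamps at or after
// cutoff. It stops following next links once a page held an entry older than cutoff.
func collectRecent(pages map[string]page, start string, cutoff time.Time, limit int) []string {
	var out []string
	for path := start; path != "" && len(out) < limit; {
		p, ok := pages[path]
		if !ok {
			break
		}
		hitOld := false
		for _, raw := range p.created {
			if len(out) >= limit {
				break
			}
			t, err := time.Parse(time.RFC3339, raw)
			if err == nil && t.Before(cutoff) {
				hitOld = true
				continue
			}
			out = append(out, raw)
		}
		if hitOld {
			break
		}
		path = p.next
	}
	return out
}

// demo runs collectRecent over three made-up pages with a 7-day window.
func demo() []string {
	pages := map[string]page{
		"/Entries":         {created: []string{"2026-03-18T10:00:00Z", "2026-03-17T09:00:00Z"}, next: "/Entries?$skip=2"},
		"/Entries?$skip=2": {created: []string{"2026-03-12T08:00:00Z", "2026-03-01T00:00:00Z"}, next: "/Entries?$skip=4"},
		"/Entries?$skip=4": {created: []string{"2026-02-01T00:00:00Z"}},
	}
	cutoff := time.Date(2026, 3, 11, 0, 0, 0, 0, time.UTC)
	return collectRecent(pages, "/Entries", cutoff, 100)
}

func main() {
	fmt.Println(demo())
}
```

In this sketch the third page is never requested, because the second page already contained an entry older than the cutoff. On a BMC that returns entries oldest-first, the same heuristic would stop after the first page, which is the trade-off the comment above accepts.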

// isManagerSELService returns true only for the IPMI SEL exposed under Managers.
// On AMI/MSI BMCs the hardware SEL lives at Managers/{mgr}/LogServices/SEL.
// All other Manager log services (AuditLog, EventLog, Journal) are excluded.
func isManagerSELService(svc map[string]interface{}) bool {
	id := strings.ToLower(strings.TrimSpace(asString(svc["Id"])))
	return id == "sel"
}

// isHardwareLogService returns true if the log service looks like a hardware event log
// (SEL, System Event Log) rather than a BMC audit/journal log.
func isHardwareLogService(svc map[string]interface{}) bool {
	id := strings.ToLower(strings.TrimSpace(asString(svc["Id"])))
	name := strings.ToLower(strings.TrimSpace(asString(svc["Name"])))
	for _, skip := range []string{"audit", "journal", "bmc", "security", "manager", "debug"} {
		if strings.Contains(id, skip) || strings.Contains(name, skip) {
			return false
		}
	}
	return true
}
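
To show how the two service filters above differ, here is a small standalone sketch. The helper names `isSEL` and `looksLikeHardwareLog` and the sample service IDs and names are hypothetical; they only mirror the matching rules shown above on plain strings.

```go
package main

import (
	"fmt"
	"strings"
)

// isSEL mirrors the Managers rule: only a service whose Id is "SEL" is accepted.
func isSEL(id string) bool {
	return strings.EqualFold(strings.TrimSpace(id), "sel")
}

// looksLikeHardwareLog mirrors the Systems rule: reject any service whose Id or
// Name contains one of the audit/journal-style keywords.
func looksLikeHardwareLog(id, name string) bool {
	id = strings.ToLower(strings.TrimSpace(id))
	name = strings.ToLower(strings.TrimSpace(name))
	for _, skip := range []string{"audit", "journal", "bmc", "security", "manager", "debug"} {
		if strings.Contains(id, skip) || strings.Contains(name, skip) {
			return false
		}
	}
	return true
}

func main() {
	// Made-up (Id, Name) pairs for illustration only.
	services := [][2]string{
		{"SEL", "System Event Log"},
		{"AuditLog", "Audit Log"},
		{"EventLog", "Event Log"},
		{"Journal", "Manager Journal"},
	}
	for _, s := range services {
		fmt.Printf("%-9s systems=%v managers=%v\n", s[0], looksLikeHardwareLog(s[0], s[1]), isSEL(s[0]))
	}
}
```

The Systems filter is a deny-list, so an `EventLog` service passes it, while the Managers filter is an allow-list that accepts `SEL` only.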

// redfishLogServiceEntriesPath returns the Entries collection path for a LogService document.
// It prefers the explicit Entries link and falls back to "<service path>/Entries".
func redfishLogServiceEntriesPath(svc map[string]interface{}) string {
	if entriesLink, ok := svc["Entries"].(map[string]interface{}); ok {
		if p := normalizeRedfishPath(asString(entriesLink["@odata.id"])); p != "" {
			return p
		}
	}
	if id := normalizeRedfishPath(asString(svc["@odata.id"])); id != "" {
		return joinPath(id, "/Entries")
	}
	return ""
}

// isHardwareLogEntry returns true if the log entry is hardware-related.
// Audit, authentication, and session events are excluded, as are entries
// whose EntryType is "Oem".
func isHardwareLogEntry(entry map[string]interface{}) bool {
	entryType := strings.TrimSpace(asString(entry["EntryType"]))
	if strings.EqualFold(entryType, "Oem") {
		return false
	}
	msgID := strings.ToLower(strings.TrimSpace(asString(entry["MessageId"])))
	for _, skip := range []string{
		"user", "account", "password", "login", "logon", "session",
		"auth", "certificate", "security", "credential", "privilege",
	} {
		if strings.Contains(msgID, skip) {
			return false
		}
	}
	// Also check the human-readable message for obvious audit patterns.
	msg := strings.ToLower(strings.TrimSpace(asString(entry["Message"])))
	for _, skip := range []string{"logged in", "logged out", "log in", "log out", "sign in", "signed in"} {
		if strings.Contains(msg, skip) {
			return false
		}
	}
	return true
}

// parseRedfishEntryTime parses a Redfish LogEntry Created timestamp (ISO 8601 / RFC 3339).
// Returns the zero time when raw is empty or cannot be parsed.
func parseRedfishEntryTime(raw string) time.Time {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return time.Time{}
	}
	// The literal "2006-01-02T15:04:05Z07:00" is identical to time.RFC3339, so it is not listed again.
	for _, layout := range []string{time.RFC3339, time.RFC3339Nano} {
		if t, err := time.Parse(layout, raw); err == nil {
			return t.UTC()
		}
	}
	return time.Time{}
}
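
As a quick check of which timestamp shapes the parser above accepts, the following sketch uses only the standard `time` package. The helper name `parseEntryTime` and the sample timestamps are hypothetical. It relies on documented `time.Parse` behavior: when parsing, a fractional-seconds field is accepted even if the layout does not include one.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseEntryTime is a simplified stand-in: RFC 3339 in, UTC out, zero time on failure.
func parseEntryTime(raw string) time.Time {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return time.Time{}
	}
	if t, err := time.Parse(time.RFC3339, raw); err == nil {
		return t.UTC()
	}
	return time.Time{}
}

func main() {
	samples := []string{
		"2026-03-18T23:47:22+03:00", // offset is normalized to UTC
		"2026-03-18T20:47:22.123Z",  // fractional seconds are accepted
		"2026-03-18T20:47:22",       // no offset: not valid RFC 3339
		"",
	}
	for _, raw := range samples {
		t := parseEntryTime(raw)
		fmt.Printf("%-28q zero=%-5v %s\n", raw, t.IsZero(), t.Format(time.RFC3339Nano))
	}
}
```

A timestamp without a zone offset yields the zero time here. In the paging code above such entries are not treated as old, so they pass the time filter rather than being dropped.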

// parseRedfishLogEntries converts raw log entries stored in RawPayloads into a models.Event slice.
// Called during Redfish replay for both live and offline (archive) collections.
func parseRedfishLogEntries(rawPayloads map[string]any, collectedAt time.Time) []models.Event {
	raw, ok := rawPayloads["redfish_log_entries"]
	if !ok {
		return nil
	}
	var entries []map[string]interface{}
	switch v := raw.(type) {
	case []map[string]interface{}:
		entries = v
	case []interface{}:
		for _, item := range v {
			if m, ok := item.(map[string]interface{}); ok {
				entries = append(entries, m)
			}
		}
	default:
		return nil
	}
	if len(entries) == 0 {
		return nil
	}
	out := make([]models.Event, 0, len(entries))
	for _, entry := range entries {
		ev := redfishLogEntryToEvent(entry, collectedAt)
		if ev == nil {
			continue
		}
		out = append(out, *ev)
	}
	return out
}

// redfishLogEntryToEvent converts a single Redfish LogEntry document to models.Event.
// Returns nil when no description can be derived from the entry.
func redfishLogEntryToEvent(entry map[string]interface{}, collectedAt time.Time) *models.Event {
	// Prefer EventTimestamp (actual hardware event time) over Created (Redfish record creation time).
	ts := parseRedfishEntryTime(asString(entry["EventTimestamp"]))
	if ts.IsZero() {
		ts = parseRedfishEntryTime(asString(entry["Created"]))
	}
	if ts.IsZero() {
		ts = collectedAt
	}
	severity := redfishLogEntrySeverity(entry)
	sensorType := strings.TrimSpace(asString(entry["SensorType"]))
	messageID := strings.TrimSpace(asString(entry["MessageId"]))
	entryType := strings.TrimSpace(asString(entry["EntryType"]))
	entryCode := strings.TrimSpace(asString(entry["EntryCode"]))
	// SensorName: prefer "Name", fall back to SensorType + "SensorNumber".
	sensorName := strings.TrimSpace(asString(entry["Name"]))
	if sensorName == "" {
		num := strings.TrimSpace(asString(entry["SensorNumber"]))
		if num != "" && sensorType != "" {
			sensorName = sensorType + " " + num
		}
	}
	rawMessage := strings.TrimSpace(asString(entry["Message"]))
	// AMI/MSI BMCs dump raw IPMI record fields into Message instead of human-readable text.
	// Detect this and build a readable description from structured fields instead.
	description, rawData := redfishDecodeMessage(rawMessage, sensorType, entryCode, entry)
	if description == "" {
		return nil
	}
	return &models.Event{
		ID:          messageID,
		Timestamp:   ts,
		Source:      "redfish",
		SensorType:  sensorType,
		SensorName:  sensorName,
		EventType:   entryType,
		Severity:    severity,
		Description: description,
		RawData:     rawData,
	}
}

// redfishDecodeMessage returns a human-readable description and optional raw data.
// AMI/MSI BMCs dump raw IPMI record fields into Message as "Key : Value, Key : Value, ..."
// instead of a plain human-readable string. We extract the useful decoded fields from it.
func redfishDecodeMessage(message, sensorType, entryCode string, entry map[string]interface{}) (description, rawData string) {
	if !isRawIPMIDump(message) {
		description = message
		return
	}
	rawData = message
	kv := parseIPMIDumpKV(message)
	// Sensor_Type inside the dump is more specific than the top-level SensorType field.
	if v := kv["Sensor_Type"]; v != "" {
		sensorType = v
	}
	eventType := kv["Event_Type"] // human-readable IPMI event type, e.g. "Legacy OFF State"
	var parts []string
	if sensorType != "" {
		parts = append(parts, sensorType)
	}
	if eventType != "" {
		parts = append(parts, eventType)
	} else if entryCode != "" {
		parts = append(parts, entryCode)
	}
	description = strings.Join(parts, ": ")
	return
}

// isRawIPMIDump returns true if the message is an AMI raw IPMI record dump.
func isRawIPMIDump(message string) bool {
	return strings.Contains(message, "Event_Data_1 :") && strings.Contains(message, "Record_Type :")
}

// parseIPMIDumpKV parses the AMI "Key : Value, Key : Value, " format into a map.
// Pairs with an empty key or value are skipped.
func parseIPMIDumpKV(message string) map[string]string {
	out := make(map[string]string)
	for _, part := range strings.Split(message, ",") {
		part = strings.TrimSpace(part)
		idx := strings.Index(part, " : ")
		if idx < 0 {
			continue
		}
		k := strings.TrimSpace(part[:idx])
		v := strings.TrimSpace(part[idx+3:])
		if k != "" && v != "" {
			out[k] = v
		}
	}
	return out
}
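
The decoding steps above can be exercised end to end with a short sketch. The function names `isRawDump`, `parseKV`, and `describe` are hypothetical, and the sample message is made up to match the "Key : Value, " shape described in the comments; a real BMC may emit more or different fields.

```go
package main

import (
	"fmt"
	"strings"
)

// isRawDump reports whether msg looks like an AMI raw IPMI record dump.
func isRawDump(msg string) bool {
	return strings.Contains(msg, "Event_Data_1 :") && strings.Contains(msg, "Record_Type :")
}

// parseKV splits "Key : Value, Key : Value, " into a map, skipping empty pairs.
func parseKV(msg string) map[string]string {
	out := make(map[string]string)
	for _, part := range strings.Split(msg, ",") {
		k, v, ok := strings.Cut(strings.TrimSpace(part), " : ")
		if !ok {
			continue
		}
		k, v = strings.TrimSpace(k), strings.TrimSpace(v)
		if k != "" && v != "" {
			out[k] = v
		}
	}
	return out
}

// describe builds "Sensor_Type: Event_Type" for raw dumps and passes other
// messages through unchanged. entryCode is used only when Event_Type is missing.
func describe(msg, sensorType, entryCode string) string {
	if !isRawDump(msg) {
		return msg
	}
	kv := parseKV(msg)
	if v := kv["Sensor_Type"]; v != "" {
		sensorType = v
	}
	var parts []string
	if sensorType != "" {
		parts = append(parts, sensorType)
	}
	if ev := kv["Event_Type"]; ev != "" {
		parts = append(parts, ev)
	} else if entryCode != "" {
		parts = append(parts, entryCode)
	}
	return strings.Join(parts, ": ")
}

func main() {
	// Hypothetical sample message, not captured from a real BMC.
	raw := "Record_ID : 12, Record_Type : 0x02, Sensor_Type : Power Unit, Event_Type : Legacy OFF State, Event_Data_1 : 0x01, "
	fmt.Println(describe(raw, "", ""))
	fmt.Println(describe("Fan 3 speed below threshold", "Fan", ""))
}
```

Splitting on "," keeps the parser simple, at the cost of truncating any value that itself contains a comma.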

// redfishLogEntrySeverity maps a Redfish LogEntry to models.Severity.
// AMI/MSI BMCs often set Severity="OK" on all SEL records regardless of content,
// so we fall back to inferring severity from SensorType when the explicit field is unhelpful.
func redfishLogEntrySeverity(entry map[string]interface{}) models.Severity {
	// Newer Redfish uses MessageSeverity; older uses Severity.
	raw := strings.ToLower(firstNonEmpty(
		strings.TrimSpace(asString(entry["MessageSeverity"])),
		strings.TrimSpace(asString(entry["Severity"])),
	))
	switch raw {
	case "critical":
		return models.SeverityCritical
	case "warning":
		return models.SeverityWarning
	case "ok", "informational", "":
		// BMC didn't set a meaningful severity, so infer from SensorType.
		return redfishSeverityFromSensorType(strings.TrimSpace(asString(entry["SensorType"])))
	default:
		return models.SeverityInfo
	}
}

// redfishSeverityFromSensorType infers event severity from the IPMI/Redfish SensorType string.
func redfishSeverityFromSensorType(sensorType string) models.Severity {
	switch strings.ToLower(sensorType) {
	case "critical interrupt", "processor", "memory", "power unit",
		"power supply", "drive slot", "system firmware progress":
		return models.SeverityWarning
	default:
		return models.SeverityInfo
	}
}
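
The severity fallback above is easiest to see on a few sample entries. This sketch is standalone: the `Severity` type and its constants are hypothetical stand-ins for `models.Severity`, entries are simplified to `map[string]string`, and the sample values are invented.

```go
package main

import (
	"fmt"
	"strings"
)

// Severity is a hypothetical stand-in for models.Severity.
type Severity string

const (
	Info     Severity = "info"
	Warning  Severity = "warning"
	Critical Severity = "critical"
)

// fromSensorType promotes events from failure-prone sensor types to Warning.
func fromSensorType(sensorType string) Severity {
	switch strings.ToLower(sensorType) {
	case "critical interrupt", "processor", "memory", "power unit",
		"power supply", "drive slot", "system firmware progress":
		return Warning
	default:
		return Info
	}
}

// severityOf prefers MessageSeverity over Severity and falls back to the
// sensor type when the BMC reports only "OK", "Informational", or nothing.
func severityOf(entry map[string]string) Severity {
	raw := strings.TrimSpace(entry["MessageSeverity"])
	if raw == "" {
		raw = strings.TrimSpace(entry["Severity"])
	}
	switch strings.ToLower(raw) {
	case "critical":
		return Critical
	case "warning":
		return Warning
	case "ok", "informational", "":
		return fromSensorType(strings.TrimSpace(entry["SensorType"]))
	default:
		return Info
	}
}

func main() {
	entries := []map[string]string{
		{"Severity": "OK", "SensorType": "Memory"},
		{"Severity": "OK", "SensorType": "Fan"},
		{"Severity": "Critical", "SensorType": "Fan"},
		{"MessageSeverity": "Warning", "Severity": "OK"},
	}
	for _, e := range entries {
		fmt.Println(e, "=>", severityOf(e))
	}
}
```

The inference never produces `Critical` on its own: a sensor type can raise an "OK" record to `Warning` at most, and an explicit `Critical` or `Warning` from the BMC always wins.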

@@ -53,7 +53,7 @@ func ReplayRedfishFromRawPayloads(rawPayloads map[string]any, emit ProgressFn) (
 	chassisDoc, _ := r.getJSON(primaryChassis)
 	managerDoc, _ := r.getJSON(primaryManager)
 	biosDoc, _ := r.getJSON(joinPath(primarySystem, "/Bios"))
-	secureBootDoc, _ := r.getJSON(joinPath(primarySystem, "/SecureBoot"))
 	systemFRUDoc, _ := r.getJSON(joinPath(primarySystem, "/Oem/Public/FRU"))
 	chassisFRUDoc, _ := r.getJSON(joinPath(primaryChassis, "/Oem/Public/FRU"))
 	fruDoc := systemFRUDoc
@@ -96,16 +96,19 @@ func ReplayRedfishFromRawPayloads(rawPayloads map[string]any, emit ProgressFn) (
 	healthEvents := r.collectHealthSummaryEvents(chassisPaths)
 	driveFetchWarningEvents := buildDriveFetchWarningEvents(rawPayloads)
 	networkProtocolDoc, _ := r.getJSON(joinPath(primaryManager, "/NetworkProtocol"))
-	firmware := parseFirmware(systemDoc, biosDoc, managerDoc, secureBootDoc, networkProtocolDoc)
+	firmware := parseFirmware(systemDoc, biosDoc, managerDoc, networkProtocolDoc)
 	firmware = dedupeFirmwareInfo(append(firmware, r.collectFirmwareInventory()...))
 	boardInfo.BMCMACAddress = r.collectBMCMAC(managerPaths)
 	assemblyFRU := r.collectAssemblyFRU(chassisPaths)
 	collectedAt, sourceTimezone := inferRedfishCollectionTime(managerDoc, rawPayloads)
+	inventoryLastModifiedAt := inferInventoryLastModifiedTime(r.tree)
+	logEntryEvents := parseRedfishLogEntries(rawPayloads, collectedAt)
 	result := &models.AnalysisResult{
 		CollectedAt: collectedAt,
+		InventoryLastModifiedAt: inventoryLastModifiedAt,
 		SourceTimezone: sourceTimezone,
-		Events: append(append(append(make([]models.Event, 0, len(discreteEvents)+len(healthEvents)+len(driveFetchWarningEvents)+1), healthEvents...), discreteEvents...), driveFetchWarningEvents...),
+		Events: append(append(append(append(make([]models.Event, 0, len(discreteEvents)+len(healthEvents)+len(driveFetchWarningEvents)+len(logEntryEvents)+1), healthEvents...), discreteEvents...), driveFetchWarningEvents...), logEntryEvents...),
 		FRU: assemblyFRU,
 		Sensors: dedupeSensorReadings(append(append(thresholdSensors, thermalSensors...), powerSensors...)),
 		RawPayloads: cloneRawPayloads(rawPayloads),
@@ -183,6 +186,35 @@ func inferRedfishCollectionTime(managerDoc map[string]interface{}, rawPayloads m
 	return time.Time{}, offset
 }
+// inferInventoryLastModifiedTime reads InventoryData/Status.InventoryData.LastModifiedTime
+// from the Redfish snapshot. Returns zero time if not present or unparseable.
+func inferInventoryLastModifiedTime(snapshot map[string]interface{}) time.Time {
+	docAny, ok := snapshot["/redfish/v1/Oem/Ami/InventoryData/Status"]
+	if !ok {
+		return time.Time{}
+	}
+	doc, ok := docAny.(map[string]interface{})
+	if !ok {
+		return time.Time{}
+	}
+	invData, ok := doc["InventoryData"].(map[string]interface{})
+	if !ok {
+		return time.Time{}
+	}
+	raw := strings.TrimSpace(asString(invData["LastModifiedTime"]))
+	if raw == "" {
+		return time.Time{}
+	}
+	for _, layout := range []string{time.RFC3339, time.RFC3339Nano} {
+		if ts, err := time.Parse(layout, raw); err == nil {
+			t := ts.UTC()
+			log.Printf("redfish replay: inventory last modified at %s (InventoryData/Status.LastModifiedTime)", t.Format(time.RFC3339))
+			return t
+		}
+	}
+	return time.Time{}
+}
 func appendMissingServerModelWarning(result *models.AnalysisResult, systemDoc map[string]interface{}, systemFRUPath, chassisFRUPath string) {
 	if result == nil || result.Hardware == nil {
 		return

@@ -58,6 +58,44 @@ func (r redfishSnapshotReader) collectGPUs(systemPaths, chassisPaths []string, p
 	return out
 }
+// msiGhostGPUFilter returns true when the GPU chassis for gpuID shows a temperature
+// of 0 on a powered-on host, which is the reliable MSI/AMI signal that the GPU is
+// no longer physically installed (stale BMC inventory cache).
+// It only filters when the system PowerState is "On": when the host is off, all
+// temperature readings are 0 and we cannot distinguish absent from idle.
+func (r redfishSnapshotReader) msiGhostGPUFilter(systemPaths []string, gpuID, chassisPath string) bool {
+	// Require a readable system document that reports PowerState "On".
+	// If no system document can be read, do not filter.
+	poweredOn := false
+	for _, sp := range systemPaths {
+		doc, err := r.getJSON(sp)
+		if err != nil {
+			continue
+		}
+		poweredOn = strings.EqualFold(strings.TrimSpace(asString(doc["PowerState"])), "on")
+		break
+	}
+	if !poweredOn {
+		return false
+	}
+	// Read the temperature sensor for this GPU chassis.
+	sensorPath := joinPath(chassisPath, "/Sensors/"+gpuID+"_Temperature")
+	sensorDoc, err := r.getJSON(sensorPath)
+	if err != nil || len(sensorDoc) == 0 {
+		return false
+	}
+	reading, ok := sensorDoc["Reading"]
+	if !ok {
+		return false
+	}
+	switch v := reading.(type) {
+	case float64:
+		return v == 0
+	case int:
+		return v == 0
+	case int64:
+		return v == 0
+	}
+	return false
+}
 // collectGPUsFromProcessors finds GPUs that some BMCs (e.g. MSI) expose as
 // Processor entries with ProcessorType=GPU rather than as PCIe devices.
 // It supplements the existing gpus slice (already found via PCIe path),
@@ -68,6 +106,7 @@ func (r redfishSnapshotReader) collectGPUsFromProcessors(systemPaths, chassisPat
 		return append([]models.GPU{}, existing...)
 	}
 	chassisByID := make(map[string]map[string]interface{})
+	chassisPathByID := make(map[string]string)
 	for _, cp := range chassisPaths {
 		doc, err := r.getJSON(cp)
 		if err != nil || len(doc) == 0 {
@@ -76,6 +115,7 @@ func (r redfishSnapshotReader) collectGPUsFromProcessors(systemPaths, chassisPat
 		id := strings.TrimSpace(asString(doc["Id"]))
 		if id != "" {
 			chassisByID[strings.ToUpper(id)] = doc
+			chassisPathByID[strings.ToUpper(id)] = cp
 		}
 	}
@@ -108,6 +148,13 @@ func (r redfishSnapshotReader) collectGPUsFromProcessors(systemPaths, chassisPat
 			serial = resolveProcessorGPUChassisSerial(chassisByID, gpuID, plan)
 		}
+		if plan.Directives.EnableMSIGhostGPUFilter {
+			chassisPath := resolveProcessorGPUChassisPath(chassisPathByID, gpuID, plan)
+			if chassisPath != "" && r.msiGhostGPUFilter(systemPaths, gpuID, chassisPath) {
+				continue
+			}
+		}
 		uuid := strings.TrimSpace(asString(doc["UUID"]))
 		uuidKey := strings.ToUpper(uuid)
 		serialKey := strings.ToUpper(serial)
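
The ghost-GPU decision in the hunks above reduces to two checks on JSON documents. The sketch below shows only that decision on raw JSON strings; the function `isGhostGPU` and the sample payloads are hypothetical, and the snapshot reader, chassis lookup, and sensor path construction are deliberately left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// isGhostGPU sketches the decision only: the host must report PowerState "On"
// and the GPU temperature sensor must report a Reading of exactly 0.
func isGhostGPU(systemJSON, sensorJSON string) bool {
	var system, sensor map[string]interface{}
	if json.Unmarshal([]byte(systemJSON), &system) != nil {
		return false
	}
	if json.Unmarshal([]byte(sensorJSON), &sensor) != nil {
		return false
	}
	state, _ := system["PowerState"].(string)
	if !strings.EqualFold(strings.TrimSpace(state), "on") {
		return false
	}
	// encoding/json decodes JSON numbers into float64 for interface{} targets.
	reading, ok := sensor["Reading"].(float64)
	return ok && reading == 0
}

func main() {
	fmt.Println(isGhostGPU(`{"PowerState":"On"}`, `{"Reading":0}`))  // stale entry on a running host
	fmt.Println(isGhostGPU(`{"PowerState":"On"}`, `{"Reading":41}`)) // GPU present
	fmt.Println(isGhostGPU(`{"PowerState":"Off"}`, `{"Reading":0}`)) // host off: cannot tell
	fmt.Println(isGhostGPU(`{"PowerState":"On"}`, `{}`))             // no reading: keep the GPU
}
```

Every uncertain case returns false, so a GPU is dropped only when both signals are present and unambiguous.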

@@ -45,6 +45,15 @@ func resolveProcessorGPUChassisSerial(chassisByID map[string]map[string]interfac
 	return ""
 }
+func resolveProcessorGPUChassisPath(chassisPathByID map[string]string, gpuID string, plan redfishprofile.ResolvedAnalysisPlan) string {
+	for _, candidateID := range processorGPUChassisCandidateIDs(gpuID, plan) {
+		if p, ok := chassisPathByID[strings.ToUpper(candidateID)]; ok {
+			return p
+		}
+	}
+	return ""
+}
 func processorGPUChassisCandidateIDs(gpuID string, plan redfishprofile.ResolvedAnalysisPlan) []string {
 	gpuID = strings.TrimSpace(gpuID)
 	if gpuID == "" {

@@ -52,7 +52,6 @@ func baselineSeedPaths(discovered DiscoveredResources) []string {
 	for _, p := range discovered.SystemPaths {
 		add(p)
 		add(joinPath(p, "/Bios"))
-		add(joinPath(p, "/SecureBoot"))
 		add(joinPath(p, "/Oem/Public"))
 		add(joinPath(p, "/Oem/Public/FRU"))
 		add(joinPath(p, "/Processors"))

@@ -10,7 +10,6 @@ func genericProfile() Profile {
 			ensurePrefetchPolicy(plan, AcquisitionPrefetchPolicy{
 				IncludeSuffixes: []string{
 					"/Bios",
-					"/SecureBoot",
 					"/Processors",
 					"/Memory",
 					"/Storage",
@@ -47,7 +46,6 @@ func genericProfile() Profile {
 			ensureScopedPathPolicy(plan, AcquisitionScopedPathPolicy{
 				SystemCriticalSuffixes: []string{
 					"/Bios",
-					"/SecureBoot",
 					"/Oem/Public",
 					"/Oem/Public/FRU",
 					"/Processors",

@@ -64,8 +64,10 @@ func msiProfile() Profile {
 			if snapshotHasGPUProcessor(snapshot, discovered.SystemPaths) && snapshotHasPathPrefix(snapshot, "/redfish/v1/Chassis/GPU") {
 				plan.Directives.EnableProcessorGPUFallback = true
 				plan.Directives.EnableMSIProcessorGPUChassisLookup = true
+				plan.Directives.EnableMSIGhostGPUFilter = true
 				addAnalysisLookupMode(plan, "msi-index")
 				addAnalysisNote(plan, "msi analysis enables processor-gpu fallback from discovered GPU chassis")
+				addAnalysisNote(plan, "msi ghost-gpu filter enabled: GPUs with temperature=0 on powered-on host are excluded")
 			}
 		},
 	}

@@ -103,6 +103,7 @@ type AnalysisDirectives struct {
 	EnableProcessorGPUChassisAlias bool
 	EnableGenericGraphicsControllerDedup bool
 	EnableMSIProcessorGPUChassisLookup bool
+	EnableMSIGhostGPUFilter bool
 	EnableStorageEnclosureRecovery bool
 	EnableKnownStorageControllerRecovery bool
 }

@@ -33,7 +33,7 @@ func ConvertToReanimator(result *models.AnalysisResult) (*ReanimatorExport, erro
 	// Determine target host (optional field)
 	targetHost := inferTargetHost(result.TargetHost, result.Filename)
-	collectedAt := formatRFC3339(result.CollectedAt)
+	collectedAt := formatRFC3339(reanimatorCollectedAt(result))
 	devices := canonicalDevicesForExport(result.Hardware)
 	export := &ReanimatorExport{
@@ -58,6 +58,17 @@ func ConvertToReanimator(result *models.AnalysisResult) (*ReanimatorExport, erro
 	return export, nil
 }
+// reanimatorCollectedAt returns the best timestamp for Reanimator export collected_at.
+// Prefers InventoryLastModifiedAt when it is set and no older than 30 days; falls back
+// to CollectedAt (and ultimately to now via formatRFC3339).
+func reanimatorCollectedAt(result *models.AnalysisResult) time.Time {
+	inv := result.InventoryLastModifiedAt
+	if !inv.IsZero() && time.Since(inv) <= 30*24*time.Hour {
+		return inv
+	}
+	return result.CollectedAt
+}
 // formatRFC3339 formats time in RFC3339 format, returns current time if zero
 func formatRFC3339(t time.Time) string {
 	if t.IsZero() {
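
The 30-day fallback rule in `reanimatorCollectedAt` can be checked with fixed dates. In this sketch `pickCollectedAt`, `demo`, and `maxInventoryAge` are hypothetical names, and `now` is passed in explicitly (an assumption made so the result is deterministic; the code above uses `time.Since`).

```go
package main

import (
	"fmt"
	"time"
)

// maxInventoryAge is the freshness window described above (30 days).
const maxInventoryAge = 30 * 24 * time.Hour

// pickCollectedAt prefers the inventory timestamp when it is set and fresh,
// and otherwise falls back to the collection timestamp.
func pickCollectedAt(inventoryModified, collectedAt, now time.Time) time.Time {
	if !inventoryModified.IsZero() && now.Sub(inventoryModified) <= maxInventoryAge {
		return inventoryModified
	}
	return collectedAt
}

// demo returns the chosen date for a fresh, a stale, and a missing inventory timestamp.
func demo() [3]string {
	now := time.Date(2026, 3, 18, 12, 0, 0, 0, time.UTC)
	collected := now
	fresh := now.AddDate(0, 0, -10)
	stale := now.AddDate(0, 0, -45)
	day := func(t time.Time) string { return t.Format("2006-01-02") }
	return [3]string{
		day(pickCollectedAt(fresh, collected, now)),
		day(pickCollectedAt(stale, collected, now)),
		day(pickCollectedAt(time.Time{}, collected, now)),
	}
}

func main() {
	fmt.Println(demo())
}
```

Only the 10-day-old inventory timestamp is used; the 45-day-old and missing ones fall back to the collection time. A timestamp in the future also passes this age check, because its age is negative, and the same holds for the `time.Since` comparison above.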

@@ -14,7 +14,8 @@ type AnalysisResult struct {
 	Protocol string `json:"protocol,omitempty"` // redfish | ipmi
 	TargetHost string `json:"target_host,omitempty"` // BMC host for live collect
 	SourceTimezone string `json:"source_timezone,omitempty"` // Source timezone/offset used during collection (e.g. +08:00)
 	CollectedAt time.Time `json:"collected_at,omitempty"` // Collection/upload timestamp
+	InventoryLastModifiedAt time.Time `json:"inventory_last_modified_at,omitempty"` // Redfish inventory last modified (InventoryData/Status)
 	RawPayloads map[string]any `json:"raw_payloads,omitempty"` // Additional source payloads (e.g. Redfish tree)
 	Events []Event `json:"events"`
 	FRU []FRUInfo `json:"fru"`

@@ -46,7 +46,10 @@ func (s *Server) handleIndex(w http.ResponseWriter, r *http.Request) {
 	}
 	w.Header().Set("Content-Type", "text/html; charset=utf-8")
-	tmpl.Execute(w, nil)
+	tmpl.Execute(w, map[string]string{
+		"AppVersion": s.config.AppVersion,
+		"AppCommit":  s.config.AppCommit,
+	})
 }
 func (s *Server) handleChartCurrent(w http.ResponseWriter, r *http.Request) {

@@ -165,7 +165,7 @@
         <div class="footer-buttons">
         </div>
         <div class="footer-info">
-            <p>Автор: <a href="https://mchus.pro" target="_blank">mchus.pro</a> | <a href="https://git.mchus.pro/mchus/logpile" target="_blank">Git Repository</a></p>
+            <p>Автор: <a href="https://mchus.pro" target="_blank">mchus.pro</a> | <a href="https://git.mchus.pro/mchus/logpile" target="_blank">Git Repository</a>{{if .AppVersion}} | v{{.AppVersion}}{{end}}</p>
         </div>
     </footer>