v1.3.0: Add multiple vendor parsers and enhanced hardware detection
New parsers: - NVIDIA Field Diagnostics parser with dmidecode output support - NVIDIA Bug Report parser with comprehensive hardware extraction - Supermicro crashdump (CDump.txt) parser - Generic fallback parser for unrecognized text files Enhanced GPU parsing (nvidia-bug-report): - Model and manufacturer detection (NVIDIA H100 80GB HBM3) - UUID, Video BIOS version, IRQ information - Bus location (BDF), DMA size/mask, device minor - PCIe bus type details New hardware detection (nvidia-bug-report): - System Information: server S/N, UUID, manufacturer, product name - CPU: model, S/N, cores, threads, frequencies from dmidecode - Memory: P/N, S/N, manufacturer, speed for all DIMMs - Power Supplies: manufacturer, model, S/N, wattage, status - Network Adapters: Ethernet/InfiniBand controllers with VPD data - Model, P/N, S/N from lspci Vital Product Data - Port count/type detection (QSFP56, OSFP, etc.) - Support for ConnectX-6/7 adapters Archive handling improvements: - Plain .gz file support (not just tar.gz) - Increased size limit for plain gzip files (50MB) - Better error handling for mixed archive formats Web interface enhancements: - Display parser name and filename badges - Improved file info section with visual indicators Co-Authored-By: Claude (qwen3-coder:480b) <noreply@anthropic.com>
This commit is contained in:
133
internal/parser/vendors/supermicro/README.md
vendored
Normal file
133
internal/parser/vendors/supermicro/README.md
vendored
Normal file
@@ -0,0 +1,133 @@
|
||||
# SMC Crash Dump Parser
|
||||
|
||||
Парсер для архивов Supermicro (SMC) BMC Crash Dump.
|
||||
|
||||
## Поддерживаемые серверы
|
||||
|
||||
- Supermicro SYS-821GE-TNHR
|
||||
- Другие серверы Supermicro с BMC Crashdump функциональностью
|
||||
|
||||
## Формат архива
|
||||
|
||||
Парсер работает с архивами в формате:
|
||||
- `.tgz` / `.tar.gz` (сжатый tar)
|
||||
- `.tar` (несжатый tar)
|
||||
|
||||
## Распознаваемые файлы
|
||||
|
||||
### Основные файлы
|
||||
|
||||
1. **CDump.txt** - JSON файл с данными crashdump
|
||||
- Metadata (BMC, BIOS, ME версии firmware)
|
||||
- CPU информация (CPUID, количество ядер, microcode версия, PPIN)
|
||||
- MCA (Machine Check Architecture) данные - ошибки процессоров
|
||||
|
||||
## Извлекаемые данные
|
||||
|
||||
### Hardware Configuration
|
||||
|
||||
#### CPUs
|
||||
```json
|
||||
{
|
||||
"slot": "CPU0",
|
||||
"model": "CPUID: 0xc06f2",
|
||||
"cores": 56,
|
||||
"manufacturer": "Intel",
|
||||
"firmware": "Microcode: 0x210002b3"
|
||||
}
|
||||
```
|
||||
|
||||
### FRU Information
|
||||
|
||||
- BMC Firmware Version
|
||||
- BIOS Version
|
||||
- ME Firmware Version
|
||||
- CPU PPIN (Protected Processor Inventory Number)
|
||||
|
||||
### Events
|
||||
|
||||
События создаются для:
|
||||
- **Crashdump collection** - когда был собран crashdump
|
||||
- **MCA Errors** - ошибки Machine Check Architecture
|
||||
- Corrected errors (Warning severity)
|
||||
- Uncorrected errors (Critical severity)
|
||||
|
||||
Уровни severity:
|
||||
- `info` - информационные события (crashdump по запросу)
|
||||
- `warning` - предупреждения (corrected MCA errors, reset detected)
|
||||
- `critical` - критические ошибки (uncorrected MCA errors)
|
||||
|
||||
## Пример использования
|
||||
|
||||
```bash
|
||||
# Запуск веб-интерфейса
|
||||
./logpile --file /path/to/CDump_090859_01302026.tgz
|
||||
|
||||
# Веб-интерфейс будет доступен на http://localhost:8082
|
||||
```
|
||||
|
||||
## Автоопределение
|
||||
|
||||
Парсер автоматически определяет архивы SMC Crash Dump по наличию:
|
||||
- `CDump.txt` с маркерами "crash_data", "METADATA", "bmc_fw_ver"
|
||||
|
||||
Confidence score:
|
||||
- `CDump.txt` с маркерами crashdump: +80
|
||||
|
||||
## Версионирование
|
||||
|
||||
**Текущая версия парсера:** 1.0.0
|
||||
|
||||
При модификации логики парсера необходимо увеличивать версию в константе `parserVersion` в файле `parser.go`.
|
||||
|
||||
## Примеры данных
|
||||
|
||||
### Пример CDump.txt (metadata)
|
||||
```json
|
||||
{
|
||||
"crash_data": {
|
||||
"METADATA": {
|
||||
"cpu0": {
|
||||
"cpuid": "0xc06f2",
|
||||
"core_count": "0x38",
|
||||
"ppin": "0xa3ccbe7d45026592",
|
||||
"ucode_patch_ver": "0x210002b3"
|
||||
},
|
||||
"bmc_fw_ver": "01.03.18",
|
||||
"bios_id": "BIOS Date: 08/04/2025 Rev 2.7",
|
||||
"me_fw_ver": "6.1.4.204",
|
||||
"timestamp": "2026-01-30T09:06:52Z",
|
||||
"trigger_type": "On-Demand"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### MCA Error Detection
|
||||
|
||||
Парсер проверяет регистры MCA status на наличие ошибок:
|
||||
- Bit 63 (Valid) - индикатор валидной ошибки
|
||||
- Bit 61 (UC) - uncorrected error
|
||||
- Bit 60 (EN) - error enabled
|
||||
|
||||
## Известные ограничения
|
||||
|
||||
1. Парсер фокусируется на данных из `CDump.txt`
|
||||
2. Детальный анализ MCA errors пока упрощен (только проверка status регистров)
|
||||
3. TOR dump и другие расширенные данные пока не парсятся
|
||||
|
||||
## Разработка
|
||||
|
||||
### Добавление новых полей
|
||||
|
||||
1. Изучите структуру JSON в CDump.txt
|
||||
2. Добавьте поля в структуры `Metadata`, `CPUMetadata`, или `MCAData`
|
||||
3. Обновите функции парсинга
|
||||
4. Увеличьте версию парсера
|
||||
|
||||
### Расширение MCA анализа
|
||||
|
||||
Для более детального анализа MCA ошибок можно:
|
||||
1. Добавить декодирование MCA error codes
|
||||
2. Парсить MISC и ADDR регистры
|
||||
3. Добавить корреляцию ошибок между банками
|
||||
261
internal/parser/vendors/supermicro/crashdump.go
vendored
Normal file
261
internal/parser/vendors/supermicro/crashdump.go
vendored
Normal file
@@ -0,0 +1,261 @@
|
||||
package supermicro
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"strconv"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"git.mchus.pro/mchus/logpile/internal/models"
|
||||
)
|
||||
|
||||
// CrashDumpData represents the structure of CDump.txt
|
||||
type CrashDumpData struct {
|
||||
CrashData struct {
|
||||
METADATA Metadata `json:"METADATA"`
|
||||
PROCESSORS ProcessorsData `json:"PROCESSORS"`
|
||||
} `json:"crash_data"`
|
||||
}
|
||||
|
||||
// ProcessorsData contains processor crash data
|
||||
type ProcessorsData struct {
|
||||
Version string `json:"_version"`
|
||||
CPU0 Processors `json:"cpu0"`
|
||||
CPU1 Processors `json:"cpu1"`
|
||||
}
|
||||
|
||||
// Metadata contains crashdump metadata
|
||||
type Metadata struct {
|
||||
CPU0 CPUMetadata `json:"cpu0"`
|
||||
CPU1 CPUMetadata `json:"cpu1"`
|
||||
BMCFWVer string `json:"bmc_fw_ver"`
|
||||
BIOSId string `json:"bios_id"`
|
||||
MEFWVer string `json:"me_fw_ver"`
|
||||
Timestamp string `json:"timestamp"`
|
||||
TriggerType string `json:"trigger_type"`
|
||||
PlatformName string `json:"platform_name"`
|
||||
CrashdumpVer string `json:"crashdump_ver"`
|
||||
ResetDetected string `json:"_reset_detected"`
|
||||
}
|
||||
|
||||
// CPUMetadata contains CPU metadata
|
||||
type CPUMetadata struct {
|
||||
CPUID string `json:"cpuid"`
|
||||
CoreMask string `json:"core_mask"`
|
||||
CHACount string `json:"cha_count"`
|
||||
CoreCount string `json:"core_count"`
|
||||
PPIN string `json:"ppin"`
|
||||
UcodePatchVer string `json:"ucode_patch_ver"`
|
||||
}
|
||||
|
||||
// Processors contains processor crash data
|
||||
type Processors struct {
|
||||
MCA MCAData `json:"MCA"`
|
||||
}
|
||||
|
||||
// MCAData contains Machine Check Architecture data
|
||||
type MCAData struct {
|
||||
Uncore map[string]interface{} `json:"uncore"`
|
||||
}
|
||||
|
||||
// ParseCrashDump parses CDump.txt file
|
||||
func ParseCrashDump(content []byte, result *models.AnalysisResult) error {
|
||||
var data CrashDumpData
|
||||
if err := json.Unmarshal(content, &data); err != nil {
|
||||
return fmt.Errorf("failed to parse CDump.txt: %w", err)
|
||||
}
|
||||
|
||||
// Initialize Hardware.Firmware slice if nil
|
||||
if result.Hardware.Firmware == nil {
|
||||
result.Hardware.Firmware = make([]models.FirmwareInfo, 0)
|
||||
}
|
||||
|
||||
// Parse metadata
|
||||
parseMetadata(&data.CrashData.METADATA, result)
|
||||
|
||||
// Parse CPU information
|
||||
parseCPUInfo(&data.CrashData.METADATA, result)
|
||||
|
||||
// Parse MCA errors
|
||||
parseMCAErrors(&data.CrashData, result)
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
// parseMetadata extracts metadata information
|
||||
func parseMetadata(metadata *Metadata, result *models.AnalysisResult) {
|
||||
// Store firmware versions in HardwareConfig.Firmware
|
||||
if metadata.BMCFWVer != "" {
|
||||
result.Hardware.Firmware = append(result.Hardware.Firmware, models.FirmwareInfo{
|
||||
DeviceName: "BMC",
|
||||
Version: metadata.BMCFWVer,
|
||||
})
|
||||
}
|
||||
|
||||
if metadata.BIOSId != "" {
|
||||
result.Hardware.Firmware = append(result.Hardware.Firmware, models.FirmwareInfo{
|
||||
DeviceName: "BIOS",
|
||||
Version: metadata.BIOSId,
|
||||
})
|
||||
}
|
||||
|
||||
if metadata.MEFWVer != "" {
|
||||
result.Hardware.Firmware = append(result.Hardware.Firmware, models.FirmwareInfo{
|
||||
DeviceName: "ME",
|
||||
Version: metadata.MEFWVer,
|
||||
})
|
||||
}
|
||||
|
||||
// Create event for crashdump trigger
|
||||
timestamp := time.Now()
|
||||
if metadata.Timestamp != "" {
|
||||
if t, err := time.Parse(time.RFC3339, metadata.Timestamp); err == nil {
|
||||
timestamp = t
|
||||
}
|
||||
}
|
||||
|
||||
triggerType := metadata.TriggerType
|
||||
if triggerType == "" {
|
||||
triggerType = "Unknown"
|
||||
}
|
||||
|
||||
severity := models.SeverityInfo
|
||||
if metadata.ResetDetected != "" && metadata.ResetDetected != "NONE" {
|
||||
severity = models.SeverityWarning
|
||||
}
|
||||
|
||||
result.Events = append(result.Events, models.Event{
|
||||
Timestamp: timestamp,
|
||||
Source: "Crashdump",
|
||||
EventType: "System Crashdump",
|
||||
Description: fmt.Sprintf("Crashdump collected (%s)", triggerType),
|
||||
Severity: severity,
|
||||
RawData: fmt.Sprintf("Version: %s, Reset: %s", metadata.CrashdumpVer, metadata.ResetDetected),
|
||||
})
|
||||
}
|
||||
|
||||
// parseCPUInfo extracts CPU information
|
||||
func parseCPUInfo(metadata *Metadata, result *models.AnalysisResult) {
|
||||
cpus := []struct {
|
||||
socket int
|
||||
data CPUMetadata
|
||||
}{
|
||||
{0, metadata.CPU0},
|
||||
{1, metadata.CPU1},
|
||||
}
|
||||
|
||||
for _, cpu := range cpus {
|
||||
if cpu.data.CPUID == "" {
|
||||
continue
|
||||
}
|
||||
|
||||
// Parse core count
|
||||
coreCount := 0
|
||||
if cpu.data.CoreCount != "" {
|
||||
if count, err := strconv.ParseInt(strings.TrimPrefix(cpu.data.CoreCount, "0x"), 16, 64); err == nil {
|
||||
coreCount = int(count)
|
||||
}
|
||||
}
|
||||
|
||||
cpuModel := models.CPU{
|
||||
Socket: cpu.socket,
|
||||
Model: fmt.Sprintf("Intel CPU (CPUID: %s)", cpu.data.CPUID),
|
||||
Cores: coreCount,
|
||||
}
|
||||
|
||||
// Add PPIN
|
||||
if cpu.data.PPIN != "" && cpu.data.PPIN != "0x0" {
|
||||
cpuModel.PPIN = cpu.data.PPIN
|
||||
}
|
||||
|
||||
result.Hardware.CPUs = append(result.Hardware.CPUs, cpuModel)
|
||||
|
||||
// Add microcode version to firmware list
|
||||
if cpu.data.UcodePatchVer != "" {
|
||||
result.Hardware.Firmware = append(result.Hardware.Firmware, models.FirmwareInfo{
|
||||
DeviceName: fmt.Sprintf("CPU%d Microcode", cpu.socket),
|
||||
Version: cpu.data.UcodePatchVer,
|
||||
})
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// parseMCAErrors extracts Machine Check Architecture errors
|
||||
func parseMCAErrors(crashData *struct {
|
||||
METADATA Metadata `json:"METADATA"`
|
||||
PROCESSORS ProcessorsData `json:"PROCESSORS"`
|
||||
}, result *models.AnalysisResult) {
|
||||
timestamp := time.Now()
|
||||
if crashData.METADATA.Timestamp != "" {
|
||||
if t, err := time.Parse(time.RFC3339, crashData.METADATA.Timestamp); err == nil {
|
||||
timestamp = t
|
||||
}
|
||||
}
|
||||
|
||||
// Parse each CPU's MCA data
|
||||
cpuProcs := []struct {
|
||||
name string
|
||||
data Processors
|
||||
}{
|
||||
{"cpu0", crashData.PROCESSORS.CPU0},
|
||||
{"cpu1", crashData.PROCESSORS.CPU1},
|
||||
}
|
||||
|
||||
for _, cpu := range cpuProcs {
|
||||
if cpu.data.MCA.Uncore == nil {
|
||||
continue
|
||||
}
|
||||
|
||||
// Check each MCA bank for errors
|
||||
for bankName, bankDataRaw := range cpu.data.MCA.Uncore {
|
||||
bankData, ok := bankDataRaw.(map[string]interface{})
|
||||
if !ok {
|
||||
continue
|
||||
}
|
||||
|
||||
// Look for status register
|
||||
statusKey := strings.ToLower(bankName) + "_status"
|
||||
statusRaw, ok := bankData[statusKey]
|
||||
if !ok {
|
||||
continue
|
||||
}
|
||||
|
||||
statusStr, ok := statusRaw.(string)
|
||||
if !ok {
|
||||
continue
|
||||
}
|
||||
|
||||
// Parse status value
|
||||
status, err := strconv.ParseUint(strings.TrimPrefix(statusStr, "0x"), 16, 64)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
|
||||
// Check if MCA error is valid (bit 63 = Valid)
|
||||
if status&(1<<63) != 0 {
|
||||
// MCA error detected
|
||||
severity := models.SeverityWarning
|
||||
if status&(1<<61) != 0 { // UC bit = uncorrected error
|
||||
severity = models.SeverityCritical
|
||||
}
|
||||
|
||||
description := fmt.Sprintf("MCA Error in %s bank %s", cpu.name, bankName)
|
||||
if status&(1<<61) != 0 {
|
||||
description += " (Uncorrected)"
|
||||
} else {
|
||||
description += " (Corrected)"
|
||||
}
|
||||
|
||||
result.Events = append(result.Events, models.Event{
|
||||
Timestamp: timestamp,
|
||||
Source: "MCA",
|
||||
EventType: "Machine Check",
|
||||
Description: description,
|
||||
Severity: severity,
|
||||
RawData: fmt.Sprintf("Status: %s, CPU: %s, Bank: %s", statusStr, cpu.name, bankName),
|
||||
})
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
98
internal/parser/vendors/supermicro/parser.go
vendored
Normal file
98
internal/parser/vendors/supermicro/parser.go
vendored
Normal file
@@ -0,0 +1,98 @@
|
||||
// Package supermicro provides parser for Supermicro BMC crashdump archives
|
||||
// Tested with: Supermicro SYS-821GE-TNHR (Crashdump format)
|
||||
//
|
||||
// IMPORTANT: Increment parserVersion when modifying parser logic!
|
||||
// This helps track which version was used to parse specific logs.
|
||||
package supermicro
|
||||
|
||||
import (
|
||||
"strings"
|
||||
|
||||
"git.mchus.pro/mchus/logpile/internal/models"
|
||||
"git.mchus.pro/mchus/logpile/internal/parser"
|
||||
)
|
||||
|
||||
// parserVersion - version of this parser module
|
||||
// IMPORTANT: Increment this version when making changes to parser logic!
|
||||
const parserVersion = "1.0.0"
|
||||
|
||||
func init() {
|
||||
parser.Register(&Parser{})
|
||||
}
|
||||
|
||||
// Parser implements VendorParser for Supermicro servers
|
||||
type Parser struct{}
|
||||
|
||||
// Name returns human-readable parser name
|
||||
func (p *Parser) Name() string {
|
||||
return "SMC Crash Dump Parser"
|
||||
}
|
||||
|
||||
// Vendor returns vendor identifier
|
||||
func (p *Parser) Vendor() string {
|
||||
return "supermicro"
|
||||
}
|
||||
|
||||
// Version returns parser version
|
||||
// IMPORTANT: Update parserVersion constant when modifying parser logic!
|
||||
func (p *Parser) Version() string {
|
||||
return parserVersion
|
||||
}
|
||||
|
||||
// Detect checks if archive matches Supermicro crashdump format
|
||||
// Returns confidence 0-100
|
||||
func (p *Parser) Detect(files []parser.ExtractedFile) int {
|
||||
confidence := 0
|
||||
|
||||
for _, f := range files {
|
||||
path := strings.ToLower(f.Path)
|
||||
|
||||
// Strong indicator for Supermicro Crashdump format
|
||||
if strings.HasSuffix(path, "cdump.txt") {
|
||||
// Check if it's really Supermicro crashdump format
|
||||
if containsCrashdumpMarkers(f.Content) {
|
||||
confidence += 80
|
||||
}
|
||||
}
|
||||
|
||||
// Cap at 100
|
||||
if confidence >= 100 {
|
||||
return 100
|
||||
}
|
||||
}
|
||||
|
||||
return confidence
|
||||
}
|
||||
|
||||
// containsCrashdumpMarkers checks if content has Supermicro crashdump markers
|
||||
func containsCrashdumpMarkers(content []byte) bool {
|
||||
s := string(content)
|
||||
// Check for typical Supermicro Crashdump structure
|
||||
return strings.Contains(s, "crash_data") &&
|
||||
strings.Contains(s, "METADATA") &&
|
||||
(strings.Contains(s, "bmc_fw_ver") || strings.Contains(s, "crashdump_ver"))
|
||||
}
|
||||
|
||||
// Parse parses Supermicro crashdump archive
|
||||
func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, error) {
|
||||
result := &models.AnalysisResult{
|
||||
Events: make([]models.Event, 0),
|
||||
FRU: make([]models.FRUInfo, 0),
|
||||
Sensors: make([]models.SensorReading, 0),
|
||||
}
|
||||
|
||||
// Initialize hardware config
|
||||
result.Hardware = &models.HardwareConfig{
|
||||
CPUs: make([]models.CPU, 0),
|
||||
}
|
||||
|
||||
// Parse CDump.txt (JSON crashdump)
|
||||
if f := parser.FindFileByName(files, "CDump.txt"); f != nil {
|
||||
if err := ParseCrashDump(f.Content, result); err != nil {
|
||||
// Log error but continue parsing other files
|
||||
_ = err // Ignore error for now
|
||||
}
|
||||
}
|
||||
|
||||
return result, nil
|
||||
}
|
||||
Reference in New Issue
Block a user