v1.3.0: Add multiple vendor parsers and enhanced hardware detection
New parsers:
- NVIDIA Field Diagnostics parser with dmidecode output support
- NVIDIA Bug Report parser with comprehensive hardware extraction
- Supermicro crashdump (CDump.txt) parser
- Generic fallback parser for unrecognized text files

Enhanced GPU parsing (nvidia-bug-report):
- Model and manufacturer detection (NVIDIA H100 80GB HBM3)
- UUID, Video BIOS version, IRQ information
- Bus location (BDF), DMA size/mask, device minor
- PCIe bus type details

New hardware detection (nvidia-bug-report):
- System Information: server S/N, UUID, manufacturer, product name
- CPU: model, S/N, cores, threads, frequencies from dmidecode
- Memory: P/N, S/N, manufacturer, speed for all DIMMs
- Power Supplies: manufacturer, model, S/N, wattage, status
- Network Adapters: Ethernet/InfiniBand controllers with VPD data
  - Model, P/N, S/N from lspci Vital Product Data
  - Port count/type detection (QSFP56, OSFP, etc.)
  - Support for ConnectX-6/7 adapters

Archive handling improvements:
- Plain .gz file support (not just tar.gz)
- Increased size limit for plain gzip files (50MB)
- Better error handling for mixed archive formats

Web interface enhancements:
- Display parser name and filename badges
- Improved file info section with visual indicators

Co-Authored-By: Claude (qwen3-coder:480b) <noreply@anthropic.com>
175
internal/parser/vendors/nvidia/README.md
vendored
Normal file
@@ -0,0 +1,175 @@
# NVIDIA Field Diagnostics Parser

Parser for NVIDIA HGX Field Diagnostics archives.
It is vendor-neutral and not tied to any particular server manufacturer.

## Supported archives

- NVIDIA HGX Field Diag (works with servers from any vendor: Supermicro, Dell, HPE, etc.)
- Archives containing NVIDIA GPU diagnostics results

## Archive format

The parser accepts archives in the following formats:
- `.tar` (uncompressed tar)
- `.tar.gz` (gzip-compressed tar)

## Recognized files

### Main files

1. **output.log** - dmidecode output with system information
   - Server manufacturer (Manufacturer)
   - Server model (Product Name), for example SYS-821GE-TNHR
   - Server serial number (Serial Number), for example A514359X5A07900
   - UUID, SKU Number, Family (present in the file; the parser does not store them yet)

2. **unified_summary.json** - detailed information about the system and its components
   - GPU information (model, manufacturer, VBIOS, PCI addresses)
   - NVSwitch information (VendorID, DeviceID, link speed/width)
   - Server manufacturer and model

3. **summary.json** - diagnostic test results
   - GPU test results (inforom, checkinforom, gpumem, gpustress, pcie, nvlink, nvswitch, power)
   - Error codes and test statuses

4. **summary.csv** - alternative format for the test results
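
As a rough illustration of what reading `output.log` involves, the sketch below scans dmidecode-style text for the `System Information` section and collects its `Field: value` pairs. The sample text and the `systemInfoFields` helper are assumptions made for this example (the manufacturer value in particular is invented); the parser's own implementation lives in `output_log.go`.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// systemInfoFields is a hypothetical helper: it returns the "Field: value"
// pairs found in the "System Information" section of dmidecode-style text.
func systemInfoFields(text string) map[string]string {
	fields := make(map[string]string)
	inSection := false
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case line == "System Information":
			inSection = true
		case strings.HasPrefix(line, "Handle "):
			inSection = false // the next DMI record starts here
		case inSection:
			if key, value, ok := strings.Cut(line, ":"); ok {
				fields[strings.TrimSpace(key)] = strings.TrimSpace(value)
			}
		}
	}
	return fields
}

// sample is made-up dmidecode-style text that reuses the model and serial
// number quoted in this README.
const sample = `Handle 0x0001, DMI type 1, 27 bytes
System Information
	Manufacturer: Supermicro
	Product Name: SYS-821GE-TNHR
	Serial Number: A514359X5A07900

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
	Manufacturer: Supermicro
`

func main() {
	fields := systemInfoFields(sample)
	fmt.Println(fields["Product Name"], fields["Serial Number"])
}
```

Fields that follow the next `Handle` line belong to a different DMI record, so the sketch ignores them.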

### Additional files

- `gpu_fieldiag/*.log` - detailed diagnostic logs for each GPU
- `inventory/*.json` - additional configuration information

## Extracted data

### Hardware Configuration

#### GPUs
```json
{
  "slot": "GPUSXM1",
  "model": "NVIDIA Device 2335",
  "manufacturer": "NVIDIA Corporation",
  "firmware": "96.00.D0.00.03",
  "bdf": "0000:3a:00.0"
}
```

#### NVSwitch (as PCIe devices)
```json
{
  "slot": "NVSWITCHNVSWITCH0",
  "device_class": "NVSwitch",
  "manufacturer": "NVIDIA Corporation",
  "vendor_id": 4318,
  "device_id": 8867,
  "bdf": "0000:05:00.0",
  "link_speed": "16GT/s",
  "link_width": 2
}
```
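
The `vendor_id` and `device_id` values in the NVSwitch example are decimal renderings of PCI IDs that are normally written in hexadecimal: 4318 is `0x10de` (the NVIDIA vendor ID) and 8867 is `0x22a3`. A minimal sketch of that conversion follows; `parseHexID` is a hypothetical helper, not a function from this commit.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseHexID is a hypothetical helper that converts a hexadecimal PCI ID
// such as "10de" or "0x10de" to its integer value.
func parseHexID(s string) (int, error) {
	s = strings.TrimPrefix(strings.ToLower(strings.TrimSpace(s)), "0x")
	v, err := strconv.ParseUint(s, 16, 16) // PCI vendor and device IDs are 16-bit
	if err != nil {
		return 0, fmt.Errorf("invalid PCI ID %q: %w", s, err)
	}
	return int(v), nil
}

func main() {
	for _, id := range []string{"10de", "22a3"} {
		v, err := parseHexID(id)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %d\n", id, v)
	}
}
```

Returning an error for malformed input makes a bad ID visible to the caller instead of silently leaving the field at zero.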

### Events

Events are created for:
- **Warnings and errors** from diagnostic tests
- Example events:
  - `Row remapping failed` - GPU memory error (Warning)
  - Various tests: connectivity, gpumem, gpustress, pcie, nvlink, nvswitch, power

Severity levels:
- `info` - informational events (tests passed)
- `warning` - warnings (for example, Row remapping failed)
- `critical` - critical errors (error codes whose leading three-digit group is 300 or higher)
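
The severity rules can be sketched roughly as follows. The sketch follows the order of checks in `getSeverityFromErrorCode` from `summary.go` in this commit, but returns plain strings instead of `models.Severity` so that it runs on its own. The `severity` name and the sample code `301-000-1-000000000001` are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// severity is a standalone sketch of the classification order:
// notes first, then known informational codes, then the leading code group.
func severity(errorCode, notes string) string {
	switch {
	case notes == "OK":
		return "info"
	case strings.Contains(notes, "Row remapping failed"):
		return "warning"
	case errorCode == "" || errorCode == "0":
		return "info"
	case strings.HasPrefix(errorCode, "001-000-1"),
		strings.HasPrefix(errorCode, "048-000-0"):
		return "info"
	case len(errorCode) >= 3 && errorCode[:3] >= "300":
		// String comparison; valid because the group is zero-padded digits.
		return "critical"
	default:
		return "warning"
	}
}

func main() {
	fmt.Println(severity("005-000-1-000000000363", "Row remapping failed"))
	fmt.Println(severity("301-000-1-000000000001", "hypothetical failure"))
	fmt.Println(severity("001-000-1-000000000000", "OK"))
}
```

Note that the comparison looks only at the first three characters of the code, so `005-000-1-000000000363` is a warning even though it ends in 363.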

## Usage example

```bash
# Start the web interface
./logpile --file /path/to/A514359X5A07900_logs-20260122-074208.tar

# The web interface will be available at http://localhost:8082
```

## Auto-detection

The parser automatically detects NVIDIA Field Diag archives by the presence of:
- `unified_summary.json` with the "HGX Field Diag" marker
- `summary.json` and `summary.csv` with test results
- The `gpu_fieldiag/` directory
- `output.log` containing dmidecode output

Confidence score:
- `unified_summary.json` with the "HGX Field Diag" marker: +40
- `summary.json`: +20
- `summary.csv`: +15
- `gpu_fieldiag/` directory: +15
- `output.log` containing dmidecode output: +10
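
To make the scoring concrete, here is a small sketch that applies the weights for the summary files and the `gpu_fieldiag/` directory to a list of archive entries. It assumes each indicator should count at most once, however many files match it. The `file` type and the `score` function are stand-ins invented for this example, not the parser's own `Detect` method.

```go
package main

import (
	"fmt"
	"strings"
)

// file is a hypothetical stand-in for an extracted archive entry.
type file struct {
	Path    string
	Content string
}

// score adds each indicator's weight at most once.
func score(files []file) int {
	seen := map[string]bool{}
	for _, f := range files {
		p := strings.ToLower(f.Path)
		switch {
		case strings.HasSuffix(p, "unified_summary.json"):
			// Only counts when the diagnostic name marker is present.
			if strings.Contains(f.Content, "HGX Field Diag") {
				seen["unified"] = true
			}
		case strings.HasSuffix(p, "summary.json"):
			seen["json"] = true
		case strings.HasSuffix(p, "summary.csv"):
			seen["csv"] = true
		}
		if strings.Contains(p, "gpu_fieldiag/") {
			seen["dir"] = true
		}
	}
	weights := map[string]int{"unified": 40, "json": 20, "csv": 15, "dir": 15}
	total := 0
	for name := range seen {
		total += weights[name]
	}
	return total
}

func main() {
	files := []file{
		{Path: "logs/unified_summary.json", Content: `{"runInfo":{"diagName":"HGX Field Diag"}}`},
		{Path: "logs/summary.json"},
		{Path: "logs/summary.csv"},
		{Path: "logs/gpu_fieldiag/gpu0.log"},
		{Path: "logs/gpu_fieldiag/gpu1.log"},
	}
	fmt.Println(score(files))
}
```

Tracking indicators in a set keeps the score independent of how many per-GPU logs the archive contains.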

## Versioning

**Current parser version:** 1.1.0

When you change the parser logic, increment the version in the `parserVersion` constant in `parser.go`.

### Version history

- **1.1.0** - Added parsing of output.log (dmidecode) to extract the server model and serial number
- **1.0.0** - Initial version with parsing of unified_summary.json and summary.json/csv

## Sample data

### Sample unified_summary.json
```json
{
  "runInfo": {
    "diagVersion": "24287-XXXX-FLD-42658",
    "diagName": "HGX Field Diag",
    "finalResult": "FAIL",
    "errorCode": 363
  },
  "tests": [{
    "virtualId": "inventory",
    "components": [{
      "componentId": "GPUSXM1",
      "properties": [
        {"id": "Manufacturer", "value": "Any Server Vendor"},
        {"id": "VendorID", "value": "10de"},
        {"id": "DeviceID", "value": "2335"}
      ]
    }]
  }]
}
```
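
A document shaped like this sample can be decoded with `encoding/json` along the following lines. This is a minimal sketch: the `summary` and `property` types and the `inventory` function are names chosen for the example, and only the fields shown in the sample are modeled. Property values are kept as `interface{}` because they may be strings or numbers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type property struct {
	ID    string      `json:"id"`
	Value interface{} `json:"value"` // string or number in the source JSON
}

type summary struct {
	RunInfo struct {
		DiagName    string `json:"diagName"`
		FinalResult string `json:"finalResult"`
		ErrorCode   int    `json:"errorCode"`
	} `json:"runInfo"`
	Tests []struct {
		VirtualID  string `json:"virtualId"`
		Components []struct {
			ComponentID string     `json:"componentId"`
			Properties  []property `json:"properties"`
		} `json:"components"`
	} `json:"tests"`
}

const sample = `{
  "runInfo": {"diagName": "HGX Field Diag", "finalResult": "FAIL", "errorCode": 363},
  "tests": [{
    "virtualId": "inventory",
    "components": [{
      "componentId": "GPUSXM1",
      "properties": [
        {"id": "VendorID", "value": "10de"},
        {"id": "DeviceID", "value": "2335"}
      ]
    }]
  }]
}`

// inventory returns componentId -> property id -> value rendered as text.
func inventory(data []byte) (map[string]map[string]string, error) {
	var s summary
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	out := make(map[string]map[string]string)
	for _, t := range s.Tests {
		if t.VirtualID != "inventory" {
			continue
		}
		for _, c := range t.Components {
			props := make(map[string]string)
			for _, p := range c.Properties {
				props[p.ID] = fmt.Sprint(p.Value)
			}
			out[c.ComponentID] = props
		}
	}
	return out, nil
}

func main() {
	inv, err := inventory([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(inv["GPUSXM1"]["VendorID"], inv["GPUSXM1"]["DeviceID"])
}
```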

### Sample summary.json
```json
[
  {
    "Error Code": "005-000-1-000000000363",
    "Test": "gpumem",
    "Component ID": "SXM5_SN_1653925025497",
    "Notes": "Row remapping failed",
    "Virtual ID": "gpumem"
  }
]
```
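
The keys in `summary.json` contain spaces, which Go struct tags can express directly. The sketch below decodes the sample and keeps the entries whose `Notes` value is not `OK`. The `entry` type and the `failed` function are illustrative names, and treating `OK` as the only passing note is an assumption based on the sample.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entry models one record of summary.json; tags carry the spaced key names.
type entry struct {
	ErrorCode   string `json:"Error Code"`
	Test        string `json:"Test"`
	ComponentID string `json:"Component ID"`
	Notes       string `json:"Notes"`
	VirtualID   string `json:"Virtual ID"`
}

const sample = `[
  {"Error Code": "005-000-1-000000000363", "Test": "gpumem",
   "Component ID": "SXM5_SN_1653925025497", "Notes": "Row remapping failed",
   "Virtual ID": "gpumem"},
  {"Error Code": "001-000-1-000000000000", "Test": "pcie",
   "Component ID": "SXM5_SN_1653925025497", "Notes": "OK",
   "Virtual ID": "pcie"}
]`

// failed returns the entries whose notes are anything other than "OK".
func failed(data []byte) ([]entry, error) {
	var entries []entry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, err
	}
	var out []entry
	for _, e := range entries {
		if e.Notes != "OK" {
			out = append(out, e)
		}
	}
	return out, nil
}

func main() {
	bad, err := failed([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, e := range bad {
		fmt.Printf("%s on %s: %s (%s)\n", e.Test, e.ComponentID, e.Notes, e.ErrorCode)
	}
}
```

The second record in the sample is an invented passing result, added only to show the filter.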

## Known limitations

1. The parser focuses on data from `unified_summary.json` and `summary.json`
2. Detailed logs from `gpu_fieldiag/*.log` are not parsed yet
3. CPU, memory, and disk information is not extracted (the archive does not contain it)

## Development

### Adding new fields

1. Study the JSON structure in the archive
2. Add fields to the `Component` or `Property` structs
3. Update the `parseGPUComponent` or `parseNVSwitchComponent` functions
4. Increment the parser version

### Adding new file types

1. Create a new file with the parser (for example, `gpu_logs.go`)
2. Call the new parser from the `Parse()` function in `parser.go`
3. Update the documentation
68
internal/parser/vendors/nvidia/output_log.go
vendored
Normal file
@@ -0,0 +1,68 @@
package nvidia

import (
	"bufio"
	"bytes"
	"strings"

	"git.mchus.pro/mchus/logpile/internal/models"
)

// ParseOutputLog parses the output.log file, which contains dmidecode output,
// and fills result.Hardware.BoardInfo from the "System Information" section.
func ParseOutputLog(content []byte, result *models.AnalysisResult) error {
	// Guard against callers that have not initialized the hardware config.
	if result.Hardware == nil {
		result.Hardware = &models.HardwareConfig{}
	}

	scanner := bufio.NewScanner(bytes.NewReader(content))

	inSystemInfo := false

	for scanner.Scan() {
		line := scanner.Text()
		trimmed := strings.TrimSpace(line)

		// Detect the "System Information" section
		if strings.Contains(trimmed, "System Information") {
			inSystemInfo = true
			continue
		}

		// Leave the section when the next "Handle" record starts
		if inSystemInfo && strings.HasPrefix(trimmed, "Handle ") {
			inSystemInfo = false
			continue
		}

		// Parse "Field: value" lines in the System Information section
		if inSystemInfo && strings.Contains(line, ":") {
			parts := strings.SplitN(trimmed, ":", 2)
			if len(parts) != 2 {
				continue
			}

			field := strings.TrimSpace(parts[0])
			value := strings.TrimSpace(parts[1])

			if value == "" {
				continue
			}

			switch field {
			case "Manufacturer":
				result.Hardware.BoardInfo.Manufacturer = value
			case "Product Name":
				result.Hardware.BoardInfo.ProductName = value
			case "Serial Number":
				result.Hardware.BoardInfo.SerialNumber = value
			case "Version":
				// Store the version in the part number unless one is already set
				if result.Hardware.BoardInfo.PartNumber == "" {
					result.Hardware.BoardInfo.PartNumber = value
				}
			case "UUID":
				// Not stored yet: there is no field for it.
				// Could be added to FRU or as a custom field.
			case "Family":
				// Not stored yet.
			}
		}
	}

	return scanner.Err()
}
166
internal/parser/vendors/nvidia/parser.go
vendored
Normal file
@@ -0,0 +1,166 @@
// Package nvidia provides a parser for NVIDIA Field Diagnostics archives.
// Tested with: HGX Field Diag (works with various server vendors)
//
// IMPORTANT: Increment parserVersion when modifying parser logic!
// This helps track which version was used to parse specific logs.
package nvidia

import (
	"strings"

	"git.mchus.pro/mchus/logpile/internal/models"
	"git.mchus.pro/mchus/logpile/internal/parser"
)

// parserVersion is the version of this parser module.
// IMPORTANT: Increment this version when making changes to parser logic!
const parserVersion = "1.1.0"

func init() {
	parser.Register(&Parser{})
}

// Parser implements VendorParser for NVIDIA Field Diagnostics
type Parser struct{}

// Name returns the human-readable parser name
func (p *Parser) Name() string {
	return "NVIDIA Field Diagnostics Parser"
}

// Vendor returns the vendor identifier
func (p *Parser) Vendor() string {
	return "nvidia"
}

// Version returns the parser version
// IMPORTANT: Update parserVersion constant when modifying parser logic!
func (p *Parser) Version() string {
	return parserVersion
}

// Detect checks if the archive matches the NVIDIA Field Diagnostics format.
// Returns confidence 0-100. Each indicator is counted at most once, so an
// archive with many gpu_fieldiag/ logs does not inflate the score.
func (p *Parser) Detect(files []parser.ExtractedFile) int {
	var hasUnified, hasSummaryJSON, hasSummaryCSV, hasFieldiagDir, hasOutputLog bool

	for _, f := range files {
		path := strings.ToLower(f.Path)

		// Strong indicator: unified_summary.json that really is Field Diag output
		if strings.HasSuffix(path, "unified_summary.json") && containsNvidiaFieldDiagMarkers(f.Content) {
			hasUnified = true
		}

		if strings.HasSuffix(path, "summary.json") && !strings.Contains(path, "unified_") {
			hasSummaryJSON = true
		}

		if strings.HasSuffix(path, "summary.csv") {
			hasSummaryCSV = true
		}

		if strings.Contains(path, "gpu_fieldiag/") {
			hasFieldiagDir = true
		}

		// output.log counts only if it contains dmidecode output
		if strings.HasSuffix(path, "output.log") {
			content := string(f.Content)
			if strings.Contains(content, "dmidecode") ||
				strings.Contains(content, "System Information") {
				hasOutputLog = true
			}
		}
	}

	// The weights add up to exactly 100, so no extra cap is needed.
	confidence := 0
	if hasUnified {
		confidence += 40
	}
	if hasSummaryJSON {
		confidence += 20
	}
	if hasSummaryCSV {
		confidence += 15
	}
	if hasFieldiagDir {
		confidence += 15
	}
	if hasOutputLog {
		confidence += 10
	}

	return confidence
}

// containsNvidiaFieldDiagMarkers checks if content has NVIDIA Field Diag markers
func containsNvidiaFieldDiagMarkers(content []byte) bool {
	s := string(content)
	// Check for the typical NVIDIA Field Diagnostics structure
	return strings.Contains(s, "runInfo") &&
		strings.Contains(s, "diagVersion") &&
		strings.Contains(s, "HGX Field Diag")
}

// Parse parses an NVIDIA Field Diagnostics archive
func (p *Parser) Parse(files []parser.ExtractedFile) (*models.AnalysisResult, error) {
	result := &models.AnalysisResult{
		Events:  make([]models.Event, 0),
		FRU:     make([]models.FRUInfo, 0),
		Sensors: make([]models.SensorReading, 0),
	}

	// Initialize hardware config
	result.Hardware = &models.HardwareConfig{
		GPUs: make([]models.GPU, 0),
	}

	// Parse output.log first (contains dmidecode system info).
	// Errors are ignored for now so that the other files are still parsed.
	if outputLogFile := findDmidecodeOutputLog(files); outputLogFile != nil {
		_ = ParseOutputLog(outputLogFile.Content, result)
	}

	// Parse unified_summary.json (contains detailed component info).
	// Errors are ignored for now so that the other files are still parsed.
	if f := parser.FindFileByName(files, "unified_summary.json"); f != nil {
		_ = ParseUnifiedSummary(f.Content, result)
	}

	// Parse summary.json (test results summary)
	if f := parser.FindFileByName(files, "summary.json"); f != nil {
		events := ParseSummaryJSON(f.Content)
		result.Events = append(result.Events, events...)
	}

	// Parse summary.csv (alternative format)
	if f := parser.FindFileByName(files, "summary.csv"); f != nil {
		csvEvents := ParseSummaryCSV(f.Content)
		result.Events = append(result.Events, csvEvents...)
	}

	// GPU field diagnostics logs are located but not parsed yet;
	// for now the parser focuses on the summary files.
	gpuFieldiagFiles := parser.FindFileByPattern(files, "gpu_fieldiag/", ".log")
	for _, f := range gpuFieldiagFiles {
		_ = f
	}

	return result, nil
}

// findDmidecodeOutputLog finds the output.log file that contains dmidecode output
func findDmidecodeOutputLog(files []parser.ExtractedFile) *parser.ExtractedFile {
	// Iterate by index so the returned pointer refers to the slice element
	// rather than to a copy held in the loop variable.
	for i := range files {
		f := &files[i]

		// Look for output.log files
		if !strings.HasSuffix(strings.ToLower(f.Path), "output.log") {
			continue
		}

		// Check if it contains dmidecode output
		content := string(f.Content)
		if strings.Contains(content, "dmidecode") &&
			strings.Contains(content, "System Information") {
			return f
		}
	}
	return nil
}
152
internal/parser/vendors/nvidia/summary.go
vendored
Normal file
@@ -0,0 +1,152 @@
package nvidia

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strings"
	"time"

	"git.mchus.pro/mchus/logpile/internal/models"
)

// SummaryEntry represents a single test result entry
type SummaryEntry struct {
	ErrorCode   string `json:"Error Code"`
	Test        string `json:"Test"`
	ComponentID string `json:"Component ID"`
	Notes       string `json:"Notes"`
	VirtualID   string `json:"Virtual ID"`
	IgnoreError string `json:"Ignore Error"`
}

// ParseSummaryJSON parses the summary.json file and returns events
func ParseSummaryJSON(content []byte) []models.Event {
	var entries []SummaryEntry
	if err := json.Unmarshal(content, &entries); err != nil {
		return nil
	}

	events := make([]models.Event, 0)
	timestamp := time.Now() // Use current time as we don't have exact timestamps in summary

	for _, entry := range entries {
		// Skip clean passes: an event is created unless the notes are "OK"
		// and the error code is the all-zero pass code.
		if entry.Notes != "OK" || entry.ErrorCode != "001-000-1-000000000000" {
			event := models.Event{
				Timestamp:   timestamp,
				Source:      "GPU Field Diagnostics",
				EventType:   entry.Test,
				Description: formatSummaryDescription(entry),
				Severity:    getSeverityFromErrorCode(entry.ErrorCode, entry.Notes),
				RawData:     fmt.Sprintf("Test: %s, Component: %s, Error: %s", entry.Test, entry.ComponentID, entry.ErrorCode),
			}
			events = append(events, event)
		}
	}

	return events
}

// ParseSummaryCSV parses the summary.csv file and returns events
func ParseSummaryCSV(content []byte) []models.Event {
	reader := csv.NewReader(strings.NewReader(string(content)))
	records, err := reader.ReadAll()
	if err != nil {
		return nil
	}

	events := make([]models.Event, 0)
	timestamp := time.Now()

	for i, record := range records {
		if i == 0 {
			continue // Skip header row
		}

		// CSV format: ErrorCode,Test,VirtualID,SubTest,Type,ComponentID,Notes,Level,,,IgnoreError
		if len(record) < 7 {
			continue
		}

		errorCode := record[0]
		test := record[1]
		componentID := record[5]
		notes := record[6]

		// Only create events for failures or warnings
		if notes != "OK" || (errorCode != "0" && !strings.HasPrefix(errorCode, "048-000-0") && !strings.HasPrefix(errorCode, "001-000-1")) {
			event := models.Event{
				Timestamp:   timestamp,
				Source:      "GPU Field Diagnostics",
				EventType:   test,
				Description: formatCSVDescription(test, componentID, notes, errorCode),
				Severity:    getSeverityFromErrorCode(errorCode, notes),
				RawData:     fmt.Sprintf("Test: %s, Component: %s, Error: %s", test, componentID, errorCode),
			}
			events = append(events, event)
		}
	}

	return events
}

// formatSummaryDescription creates a human-readable description from a summary entry
func formatSummaryDescription(entry SummaryEntry) string {
	component := entry.ComponentID
	if component == "" {
		component = entry.VirtualID
	}

	if entry.Notes == "OK" {
		return fmt.Sprintf("%s test passed for %s", entry.Test, component)
	}

	return fmt.Sprintf("%s test failed for %s: %s (Error: %s)", entry.Test, component, entry.Notes, entry.ErrorCode)
}

// formatCSVDescription creates a human-readable description from a CSV record
func formatCSVDescription(test, component, notes, errorCode string) string {
	if notes == "OK" {
		return fmt.Sprintf("%s test passed for %s", test, component)
	}

	return fmt.Sprintf("%s test failed for %s: %s (Error: %s)", test, component, notes, errorCode)
}

// getSeverityFromErrorCode determines severity based on error code and notes.
// Error code format: XXX-YYY-Z-ZZZZZZZZZZZZ
func getSeverityFromErrorCode(errorCode, notes string) models.Severity {
	if notes == "OK" {
		return models.SeverityInfo
	}

	// Row remapping failed is a warning
	if strings.Contains(notes, "Row remapping failed") {
		return models.SeverityWarning
	}

	// An empty or zero error code is informational
	if errorCode == "" || errorCode == "0" {
		return models.SeverityInfo
	}

	// Codes with these prefixes are treated as informational
	if strings.HasPrefix(errorCode, "001-000-1") || strings.HasPrefix(errorCode, "048-000-0") {
		return models.SeverityInfo
	}

	// Remaining codes are warnings, unless the leading three-character group
	// is "300" or higher, which is likely an error. This is a string
	// comparison, which is valid for zero-padded digits.
	if len(errorCode) > 2 {
		firstDigits := errorCode[:3]
		if firstDigits >= "300" {
			return models.SeverityCritical
		}
	}

	return models.SeverityWarning
}
281
internal/parser/vendors/nvidia/unified_summary.go
vendored
Normal file
@@ -0,0 +1,281 @@
package nvidia

import (
	"encoding/json"
	"fmt"
	"strings"

	"git.mchus.pro/mchus/logpile/internal/models"
)

// UnifiedSummaryData represents the structure of unified_summary.json
type UnifiedSummaryData struct {
	RunInfo RunInfo `json:"runInfo"`
	Tests   []Test  `json:"tests"`
}

// RunInfo contains information about the diagnostic run
type RunInfo struct {
	TimeInfo struct {
		StartTime     string `json:"startTime"`
		EndTime       string `json:"endTime"`
		TotalDuration string `json:"totalDuration"`
	} `json:"timeInfo"`
	DiagVersion string `json:"diagVersion"`
	BaseVersion string `json:"baseVersion"`
	FinalResult string `json:"finalResult"`
	ErrorCode   int    `json:"errorCode"`
	DiagName    string `json:"diagName"`
	RunLevel    string `json:"runLevel"`
}

// Test represents a diagnostic test
type Test struct {
	VirtualID  string      `json:"virtualId"`
	Action     string      `json:"action"`
	StartTime  string      `json:"startTime"`
	EndTime    string      `json:"endTime"`
	Components []Component `json:"components"`
}

// Component represents a hardware component
type Component struct {
	ComponentID string     `json:"componentId"`
	ErrorCode   string     `json:"errorCode"`
	Notes       string     `json:"notes"`
	Result      string     `json:"result"`
	Properties  []Property `json:"properties"`
}

// Property represents a component property
type Property struct {
	ID    string      `json:"id"`
	Value interface{} `json:"value"` // Can be string or number
}

// GetValueAsString returns the value as a string.
// A missing or null value yields an empty string.
func (p *Property) GetValueAsString() string {
	switch v := p.Value.(type) {
	case nil:
		return ""
	case string:
		return v
	case float64:
		// encoding/json decodes JSON numbers into float64
		return fmt.Sprintf("%.0f", v)
	case int:
		return fmt.Sprintf("%d", v)
	default:
		return fmt.Sprintf("%v", v)
	}
}

// ParseUnifiedSummary parses the unified_summary.json file
func ParseUnifiedSummary(content []byte, result *models.AnalysisResult) error {
	var data UnifiedSummaryData
	if err := json.Unmarshal(content, &data); err != nil {
		return fmt.Errorf("failed to parse unified_summary.json: %w", err)
	}

	// Set default board info only if not already set (from output.log)
	if result.Hardware.BoardInfo.ProductName == "" {
		result.Hardware.BoardInfo.ProductName = "GPU Server (Field Diag)"
	}

	// Parse the inventory test for hardware details
	for _, test := range data.Tests {
		if test.VirtualID == "inventory" || test.Action == "inventory" {
			parseInventoryComponents(test.Components, result)
		}
	}

	return nil
}

// parseInventoryComponents extracts hardware info from the inventory test
func parseInventoryComponents(components []Component, result *models.AnalysisResult) {
	for _, comp := range components {
		// Parse system/board information
		if parseSystemInfo(comp, result) {
			// System info was found and parsed
			continue
		}

		// Parse GPU components
		if strings.HasPrefix(comp.ComponentID, "GPUSXM") {
			gpu := parseGPUComponent(comp)
			if gpu != nil {
				result.Hardware.GPUs = append(result.Hardware.GPUs, *gpu)
			}
		}

		// Parse NVSwitch components
		if strings.HasPrefix(comp.ComponentID, "NVSWITCHNVSWITCH") {
			nvswitch := parseNVSwitchComponent(comp)
			if nvswitch != nil {
				// Add as PCIe device for now
				result.Hardware.PCIeDevices = append(result.Hardware.PCIeDevices, *nvswitch)
			}
		}
	}
}

// parseSystemInfo extracts system/board information from a component.
// Returns true if this component contains system info.
func parseSystemInfo(comp Component, result *models.AnalysisResult) bool {
	compID := strings.ToUpper(comp.ComponentID)

	// Check if this is a system/board component.
	// "BOARD" also covers BASEBOARD and MOTHERBOARD.
	isSystemComponent := strings.Contains(compID, "SYSTEM") ||
		strings.Contains(compID, "BOARD") ||
		comp.ComponentID == "Inventory"

	if !isSystemComponent {
		return false
	}

	// Extract system properties
	for _, prop := range comp.Properties {
		propID := prop.ID
		value := prop.GetValueAsString()

		if value == "" {
			continue
		}

		switch propID {
		case "Manufacturer", "BoardManufacturer", "SystemManufacturer":
			// Only set if not already populated (e.g., from output.log)
			if result.Hardware.BoardInfo.Manufacturer == "" {
				result.Hardware.BoardInfo.Manufacturer = value
			}
		case "ProductName", "Product", "Model", "ModelName", "BoardProduct", "SystemProduct":
			// Don't overwrite real data from output.log with generic data.
			// Only set if empty or still holding the default placeholder value.
			if result.Hardware.BoardInfo.ProductName == "" ||
				result.Hardware.BoardInfo.ProductName == "GPU Server (Field Diag)" {
				result.Hardware.BoardInfo.ProductName = value
			}
		case "SerialNumber", "Serial", "BoardSerial", "SystemSerial":
			// Only set if not already populated (e.g., from output.log)
			if result.Hardware.BoardInfo.SerialNumber == "" {
				result.Hardware.BoardInfo.SerialNumber = value
			}
		case "PartNumber", "BoardPartNumber":
			// Only set if not already populated
			if result.Hardware.BoardInfo.PartNumber == "" {
				result.Hardware.BoardInfo.PartNumber = value
			}
		}
	}

	return true
}

// parseGPUComponent parses GPU component information
func parseGPUComponent(comp Component) *models.GPU {
	gpu := &models.GPU{
		Slot: comp.ComponentID, // e.g., "GPUSXM1"
	}

	var deviceID, vbios, pciID string

	for _, prop := range comp.Properties {
		switch prop.ID {
		case "DeviceID":
			deviceID = prop.GetValueAsString()
		case "Vendor":
			gpu.Manufacturer = prop.GetValueAsString()
		case "DeviceName":
			gpu.Model = prop.GetValueAsString()
		case "VBIOS_version":
			vbios = prop.GetValueAsString()
		case "PCIID":
			pciID = prop.GetValueAsString()
		}
	}

	// Build the model string from the device ID when no real name is available
	if gpu.Model == "" || strings.Contains(gpu.Model, "Device") {
		if deviceID != "" {
			gpu.Model = fmt.Sprintf("NVIDIA Device %s", strings.ToUpper(deviceID))
		}
	}

	// Add firmware info
	if vbios != "" {
		gpu.Firmware = vbios
	}

	// Add PCI info
	if pciID != "" {
		gpu.BDF = pciID
	}

	return gpu
}

// parseNVSwitchComponent parses NVSwitch component information
func parseNVSwitchComponent(comp Component) *models.PCIeDevice {
	device := &models.PCIeDevice{
		Slot: comp.ComponentID, // e.g., "NVSWITCHNVSWITCH0"
	}

	var vendorIDStr, deviceIDStr, vbios string
	var pciWidthStr string
	var vendor string

	for _, prop := range comp.Properties {
		switch prop.ID {
		case "VendorID":
			vendorIDStr = prop.GetValueAsString()
		case "DeviceID":
			deviceIDStr = prop.GetValueAsString()
		case "Vendor":
			vendor = prop.GetValueAsString()
		case "VBIOS_version":
			vbios = prop.GetValueAsString()
		case "InfoROM_version":
			// Not stored yet: there is no suitable field for it
		case "PCIID":
			device.BDF = prop.GetValueAsString()
		case "PCISpeed":
			pciSpeedStr := prop.GetValueAsString()
			device.LinkSpeed = pciSpeedStr
			device.MaxLinkSpeed = pciSpeedStr
		case "PCIWidth":
			pciWidthStr = prop.GetValueAsString()
		}
	}

	// Parse vendor ID (hexadecimal, e.g. "10de")
	if vendorIDStr != "" {
		fmt.Sscanf(vendorIDStr, "%x", &device.VendorID)
	}

	// Parse device ID (hexadecimal)
	if deviceIDStr != "" {
		fmt.Sscanf(deviceIDStr, "%x", &device.DeviceID)
	}

	// Set manufacturer
	if vendor != "" {
		device.Manufacturer = vendor
	}

	// Set device class
	device.DeviceClass = "NVSwitch"

	// Parse link width; accepts both "x2" and a bare number such as "2"
	if pciWidthStr != "" {
		fmt.Sscanf(strings.TrimPrefix(pciWidthStr, "x"), "%d", &device.LinkWidth)
		device.MaxLinkWidth = device.LinkWidth
	}

	// No dedicated firmware field here, so keep the VBIOS version in PartNumber
	if vbios != "" {
		device.PartNumber = vbios
	}

	return device
}