docs: introduce project Bible and consolidate all architecture documentation
- Create docs/bible/ with 10 structured chapters (overview, architecture, API, data models, collectors, parsers, exporters, build, testing, decisions)
- All documentation in English per ADL-007
- Record all existing architectural decisions in docs/bible/10-decisions.md
- Slim README.md to user-facing quick start only
- Replace CLAUDE.md with a single directive to read and follow the Bible
- Remove absorbed files: REANIMATOR_EXPORT.md, docs/INTEGRATION_GUIDE.md, and all vendor parser README.md files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
153
CLAUDE.md
@@ -1,150 +1,11 @@
# LOGPile - Engineering Notes (for Claude/Codex)
# LOGPile — Instructions for Claude

## Project summary
Read and follow the project Bible before making any changes:

LOGPile is a standalone Go app for BMC diagnostics analysis with an embedded web UI.
**[`docs/bible/README.md`](docs/bible/README.md)**

Current product modes:
1. Upload and parse vendor archives / JSON snapshots.
2. Collect live data via Redfish and analyze/export it.
The Bible is the single source of truth for architecture, data models, API contracts,
parsers, exporters, build process, and testing expectations.

## Runtime architecture

- Go + `net/http` (`http.ServeMux`)
- Embedded UI (`web/embed.go`, `//go:embed templates static`)
- In-memory state (`Server.result`, `Server.detectedVendor`)
- Job manager for live collect status/logs

Default port: `8082`.

## Key flows

### Upload flow (`POST /api/upload`)
- Accepts the multipart file field `archive`.
- If the file looks like JSON, it is parsed as a `models.AnalysisResult` snapshot.
- Otherwise it is passed to the archive parser (`parser.NewBMCParser().ParseFromReader(...)`).
- The result is stored in memory and exposed by the API/UI.

### Live flow (`POST /api/collect`)
- Validates the request (`host/protocol/port/username/auth_type/tls_mode`).
- Runs the collector asynchronously with a progress callback.
- On success:
  - source metadata is set (`source_type=api`, protocol/host/date),
  - the result becomes the current in-memory dataset.
- On failure or cancellation, the previous dataset stays unchanged.

## Collectors

Registry: `internal/collector/registry.go`

- `redfish` (real collector):
  - dynamic discovery of Systems/Chassis/Managers,
  - CPU/RAM/Storage/GPU/PSU/NIC/PCIe/Firmware mapping,
  - raw Redfish snapshot (`result.RawPayloads["redfish_tree"]`) for future offline analysis,
  - progress logs include the active collection stage and snapshot progress.
- `ipmi` is currently a mock collector scaffold.

## Inspur/Kaytus parser notes

- Base hardware inventory comes from `asset.json` + `component.log` + `devicefrusdr.log`.
- Additional runtime enrichment is applied from `redis-dump.rdb` (if present):
  - GPU serial/firmware/UUID and selected runtime metrics;
  - NIC firmware/serial/part fields where text logs are incomplete.
- GPU/NIC enrichment from Redis is conservative (it fills missing fields and avoids unsafe remapping).

### External PCI IDs lookup (no hardcoded model mapping)

`internal/parser/vendors/pciids` now loads IDs from a repo file
(`internal/parser/vendors/pciids/pci.ids`, embedded at build time) plus optional external overrides.

Lookup priority (lowest to highest):
1. embedded `internal/parser/vendors/pciids/pci.ids`,
2. `./pci.ids`,
3. `/usr/share/hwdata/pci.ids`,
4. `/usr/share/misc/pci.ids`,
5. `/opt/homebrew/share/pciids/pci.ids`,
6. `LOGPILE_PCI_IDS_PATH` (highest priority; overrides the sources above; supports a path list).

Implication:
- for unknown device IDs (e.g. new NVIDIA GPU IDs), model naming can be updated via `pci.ids`
  without changing parser code.

## Export behavior

Endpoints:
- `/api/export/csv`
- `/api/export/json`
- `/api/export/reanimator`

Filename pattern for all exports:
`YYYY-MM-DD (SERVER MODEL) - SERVER SN.<ext>`

Notes:
- JSON export contains the full `AnalysisResult`, including `raw_payloads`.
- **Reanimator export** (`/api/export/reanimator`):
  - Exports hardware data in Reanimator format for integration with asset tracking systems.
  - Format specification: `example/docs/INTEGRATION_GUIDE.md`
  - Requires `hardware.board.serial_number` to be present.
  - Key features:
    - Infers the CPU manufacturer from the model name (Intel/AMD/ARM/Ampere).
    - Generates PCIe serial numbers if missing: `{board_serial}-PCIE-{slot}`.
    - Adds status fields (defaults to "OK").
    - Uses the RFC3339 timestamp format.
    - Includes GPUs and NetworkAdapters as PCIe devices.
    - Filters out storage devices and PSUs without serial numbers.

## Canonical device repository (AI memory)

The single source of truth for hardware inventory is `hardware.devices`.

Rules:
- UI tabs must read hardware records from `hardware.devices`.
- The Device Inventory tab includes the kinds `pcie`, `storage`, `gpu`, `network`.
- Reanimator export must use the same canonical repository (`hardware.devices`).
- Any mismatch between the UI and Reanimator is a bug.

Canonical dedupe (applied once in the repository builder):
1. usable `serial_number`,
2. fallback `bdf` (PCI address in BDF, bus:device.function, format),
3. if both are absent, keep records distinct (no forced merge).

Implementation guidance:
- Keep the `hardware.devices` schema as close as possible to the Reanimator JSON fields.
- The exporter should mainly group/filter canonical records by section, not rebuild data from multiple sources.
- New hardware attributes must be added to the canonical device schema first, then mapped to Reanimator/UI.

## CLI flags (`cmd/logpile/main.go`)

- `--port`
- `--file` (reserved/preload, not an active workflow)
- `--version`
- `--no-browser`
- `--hold-on-crash` (default `true` on Windows): keeps the console open after a fatal crash for debugging.
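
Here is a minimal sketch of how these flags could be declared with Go's standard `flag` package. The `options` struct, the helper names, the help strings, and the `--port` default of 8082 (taken from the default port above) are assumptions. This is not the contents of `cmd/logpile/main.go`.

```go
package main

import (
	"flag"
	"runtime"
)

// options mirrors the CLI flags listed above (field names are hypothetical).
type options struct {
	port        int
	file        string
	version     bool
	noBrowser   bool
	holdOnCrash bool
}

// defaultHoldOnCrash is true only on Windows, where a crashed console
// window would otherwise close before the error can be read.
func defaultHoldOnCrash(goos string) bool {
	return goos == "windows"
}

// parseFlags registers the flags on a dedicated FlagSet so arbitrary argument
// slices can be parsed; Go's flag package accepts both -port and --port.
func parseFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("logpile", flag.ContinueOnError)
	fs.IntVar(&o.port, "port", 8082, "HTTP listen port")
	fs.StringVar(&o.file, "file", "", "reserved: preload a file (not an active workflow)")
	fs.BoolVar(&o.version, "version", false, "print the version and exit")
	fs.BoolVar(&o.noBrowser, "no-browser", false, "do not open a browser on start")
	fs.BoolVar(&o.holdOnCrash, "hold-on-crash", defaultHoldOnCrash(runtime.GOOS),
		"keep the console open after a fatal crash")
	err := fs.Parse(args)
	return o, err
}
```

Deriving the default from `runtime.GOOS` keeps a single binary source tree while still giving Windows users the console-preserving behavior without extra flags.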

## Build / release

- `make build` -> single local binary (`CGO_ENABLED=0`).
- `make build-all` -> cross-platform binaries.
- Tags/releases are published with `tea`.
- Release notes live in `docs/releases/<tag>.md`.

## Testing expectations

Before merge:

```bash
go test ./...
```

If you touch collectors/handlers, prefer adding or updating tests in:
- `internal/collector/*_test.go`
- `internal/server/*_test.go`

## Practical coding guidance

- Keep API contracts stable with the frontend (`web/static/js/app.js`).
- When adding Redfish mappings, prefer tolerant/fallback parsing:
  - alternate collection paths,
  - `@odata.id` references and embedded members,
  - deduping by serial/BDF/slot+model.
- Avoid breaking snapshot backward compatibility (`AnalysisResult` JSON shape).
Every significant architectural decision must be recorded in
[`docs/bible/10-decisions.md`](docs/bible/10-decisions.md).

188
README.md
@@ -4,102 +4,16 @@ LOGPile — a standalone Go application for analyzing diagn

Supports two scenarios:
1. Upload of archives/snapshots and offline analysis in the web UI.
2. Live collection via the Redfish API, followed by export and re-upload for offline work.

## Features

- Standalone binary with an embedded UI (no external static files).
- Parsing of vendor archives (Supermicro, Inspur/Kaytus, NVIDIA, generic fallback).
- Live collection over Redfish (`/api/collect`) with progress and a step log.
- Extended Redfish snapshot:
  - normalized data (CPU/RAM/Storage/GPU/PSU/NIC/PCIe/Firmware),
  - raw `redfish_tree` for future analysis.
- Re-upload of a JSON snapshot via `/api/upload` for offline work.
- Export to CSV / JSON.

## Additional data sources (Inspur/Kaytus)

For Inspur/Kaytus archives, the parser uses not only `asset.json` and `component.log`
but also the runtime snapshot `onekeylog/runningdata/redis-dump.rdb` (if the file is present).

What this provides:
- GPU enrichment: `serial_number`, `firmware` (VBIOS/FW), and some runtime telemetry;
- NIC enrichment: firmware/serial/part-number (when these fields are empty in the text logs).

## External PCI IDs (no hardcoded models)

The project's PCI IDs source is the official repository
[`pciutils/pciids`](https://github.com/pciutils/pciids), included as a git submodule:
`third_party/pciids`.
The local copy used for the embedded lookup is stored in:
`internal/parser/vendors/pciids/pci.ids`.

Updating the local copy:

```bash
make update-pci-ids
```

The command runs `scripts/update-pci-ids.sh`, which fetches the latest
`pci.ids` from the submodule (`git submodule update --init --remote third_party/pciids`)
and syncs it into `internal/parser/vendors/pciids/pci.ids`.

Automatic update during build:
- `make build` and `make build-all` run `scripts/update-pci-ids.sh --best-effort`;
- if the submodule is already initialized, `pci.ids` is synced before the build;
- if the submodule is not initialized or unavailable, the current copy of the file is used
  and the build does not fail.

To disable the automatic update during build:

```bash
SKIP_PCI_IDS_UPDATE=1 make build
```

If the repository was cloned without the submodule:

```bash
git submodule update --init third_party/pciids
```

The parser uses the following lookup order:
1. `internal/parser/vendors/pciids/pci.ids` embedded in the binary;
2. `./pci.ids`;
3. `/usr/share/hwdata/pci.ids`;
4. `/usr/share/misc/pci.ids`;
5. `/opt/homebrew/share/pciids/pci.ids`;
6. `LOGPILE_PCI_IDS_PATH` (multiple paths can be passed, separated by `:`; it has the highest priority and overrides earlier values).

Example run:

```bash
LOGPILE_PCI_IDS_PATH=/path/to/pci.ids ./bin/logpile
```

## Requirements

- Go 1.22+
2. Live collection via the Redfish API, followed by export.

## Build

```bash
make build
make build      # bin/logpile (current platform)
make build-all  # all platforms into bin/
```

The binary will be in `bin/logpile`.

For cross-compilation:

```bash
make build-all
```

Artifacts:
- `bin/logpile-linux-amd64`
- `bin/logpile-linux-arm64`
- `bin/logpile-darwin-amd64`
- `bin/logpile-darwin-arm64`
- `bin/logpile-windows-amd64.exe`
Requirements: Go 1.22+

## Running

@@ -110,14 +24,14 @@ make build-all
./bin/logpile --version
```

Debugging crashes (to keep the console from closing):
On Windows, `--hold-on-crash` is enabled by default (the console does not close on a crash).

## macOS: removing quarantine

```bash
./bin/logpile --hold-on-crash
xattr -d com.apple.quarantine /path/to/logpile-darwin-arm64
```

> On Windows, `--hold-on-crash` is enabled by default.

## Upload formats

`POST /api/upload` accepts:
@@ -126,15 +40,8 @@ make build-all

## Live Redfish

Starting a live collection:

```http
```bash
POST /api/collect
```

Example body:

```json
{
  "host": "bmc01.example.local",
  "protocol": "redfish",
@@ -146,84 +53,15 @@ POST /api/collect
}
```

Job lifecycle:
`queued -> running -> success|failed|canceled`

Status and progress:
- `GET /api/collect/{id}`
- `POST /api/collect/{id}/cancel`

## Export

- `GET /api/export/csv`: serial numbers
- `GET /api/export/json`: full `AnalysisResult` (including `raw_payloads`)
- `GET /api/export/reanimator`: export for Reanimator
- `GET /api/export/json`: full AnalysisResult
- `GET /api/export/reanimator`: Reanimator format

Export file names:
## Architecture documentation

`YYYY-MM-DD (SERVER MODEL) - SERVER SN.<ext>`

Example:
`2026-02-04 (SYS-421GE-TNHR2) - C8X123456789.json`

## Canonical inventory (`hardware.devices`)

The project uses a single registry of server devices: `hardware.devices`.
It is the source of truth for the UI and the Reanimator export.

Main rules:
- configuration tabs read device data from `hardware.devices`;
- `Device Inventory` is built from the kinds `pcie`, `storage`, `gpu`, `network`;
- the Reanimator export uses the same canonical registry;
- any mismatch between UI and Reanimator data is considered a defect.

Deduplication in the canonical registry:
1. by usable `serial_number` (non-empty and not one of `N/A`, `NA`, `NONE`, `NULL`, `UNKNOWN`, `-`);
2. if the serial is missing, by `bdf`;
3. if both serial and bdf are missing, records are not merged.
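
Here is a minimal Go sketch of these three rules. The `device` type is a hypothetical, trimmed-down record. Keeping the first record per key, rather than merging fields, is an assumption about what the repository builder does.

```go
package main

import "strings"

// device is a hypothetical, trimmed-down canonical record.
type device struct {
	Kind         string
	SerialNumber string
	BDF          string
}

// placeholderSerials are values that do not count as a usable serial.
var placeholderSerials = map[string]bool{
	"N/A": true, "NA": true, "NONE": true, "NULL": true, "UNKNOWN": true, "-": true,
}

// usableSerial reports whether s is non-empty and not a placeholder (case-insensitive).
func usableSerial(s string) bool {
	s = strings.TrimSpace(s)
	return s != "" && !placeholderSerials[strings.ToUpper(s)]
}

// dedupeKey returns the merge key for a record, or ok=false when the record
// has neither a usable serial nor a BDF and must stay distinct.
func dedupeKey(d device) (key string, ok bool) {
	if usableSerial(d.SerialNumber) {
		return "sn:" + strings.TrimSpace(d.SerialNumber), true
	}
	if bdf := strings.TrimSpace(d.BDF); bdf != "" {
		return "bdf:" + bdf, true
	}
	return "", false
}

// dedupe keeps the first record seen for each key and preserves input order.
func dedupe(in []device) []device {
	seen := make(map[string]bool)
	out := make([]device, 0, len(in))
	for _, d := range in {
		if key, ok := dedupeKey(d); ok {
			if seen[key] {
				continue
			}
			seen[key] = true
		}
		out = append(out, d)
	}
	return out
}
```

Prefixing keys with `sn:` and `bdf:` keeps a serial that happens to look like a BDF from colliding with a real BDF key.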

## API

```text
POST /api/upload
POST /api/collect
GET /api/collect/{id}
POST /api/collect/{id}/cancel
GET /api/status
GET /api/parsers
GET /api/events
GET /api/sensors
GET /api/config
GET /api/serials
GET /api/firmware
GET /api/export/csv
GET /api/export/json
GET /api/export/reanimator
DELETE /api/clear
POST /api/shutdown
```

Notes:
- `GET /api/config` returns the canonical inventory in `hardware.devices`.
- `GET /api/serials` and `GET /api/firmware` are built from the same canonical inventory.

`/api/status` and `/api/config` contain source metadata:
- `source_type`: `archive` | `api`
- `protocol`: `redfish` | `ipmi` (may be empty for archives)
- `target_host`
- `collected_at`

## Structure

```text
cmd/logpile/main.go    # entrypoint
internal/collector/    # live collectors (redfish, ipmi mock)
internal/parser/       # archive parsers
internal/server/       # HTTP handlers
internal/exporter/     # CSV/JSON export
internal/models/       # data contracts
web/                   # embedded templates/static
```
→ [`docs/bible/`](docs/bible/README.md)

## License

@@ -1,227 +0,0 @@
# Reanimator Export - Implementation Summary

## Overview

A new export of LOGPile data into the Reanimator format has been implemented for integration with server component tracking (asset tracking) systems.

## Implemented components

### 1. Data models (`internal/exporter/reanimator_models.go`)

Structures defined for the Reanimator format:
- `ReanimatorExport` - root export structure
- `ReanimatorHardware` - container for all hardware components
- `ReanimatorBoard` - motherboard/server
- `ReanimatorCPU` - processors
- `ReanimatorMemory` - memory modules (DIMMs)
- `ReanimatorStorage` - storage drives
- `ReanimatorPCIe` - PCIe devices
- `ReanimatorPSU` - power supplies
- `ReanimatorFirmware` - firmware

### 2. Conversion functions (`internal/exporter/reanimator_converter.go`)

Main function: `ConvertToReanimator(result *models.AnalysisResult) (*ReanimatorExport, error)`

Helper functions:
- `inferCPUManufacturer()` - infers the CPU manufacturer from the model (Intel/AMD/ARM/Ampere)
- `generatePCIeSerialNumber()` - generates serial numbers for PCIe devices
- `inferStorageStatus()` - determines storage drive status
- `convertBoard()`, `convertCPUs()`, `convertMemory()`, etc. - convert individual sections
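
This summary does not show the bodies of these helpers. The Go sketch below is a hypothetical version of the first two. The signatures and keyword lists are assumptions; only the `{board_serial}-PCIE-{slot}` format comes from this document.

```go
package main

import (
	"fmt"
	"strings"
)

// inferCPUManufacturer guesses the vendor from a CPU model string.
// The keyword lists are illustrative assumptions. Ampere is checked before
// the generic ARM keywords so "Ampere Altra" is not reported as plain ARM.
func inferCPUManufacturer(model string) string {
	m := strings.ToLower(model)
	switch {
	case strings.Contains(m, "intel") || strings.Contains(m, "xeon"):
		return "Intel"
	case strings.Contains(m, "amd") || strings.Contains(m, "epyc"):
		return "AMD"
	case strings.Contains(m, "ampere") || strings.Contains(m, "altra"):
		return "Ampere"
	case strings.Contains(m, "arm") || strings.Contains(m, "neoverse"):
		return "ARM"
	default:
		return ""
	}
}

// generatePCIeSerialNumber returns the device serial when present,
// otherwise the synthetic "{board_serial}-PCIE-{slot}" form.
func generatePCIeSerialNumber(serial, boardSerial, slot string) string {
	if strings.TrimSpace(serial) != "" {
		return serial
	}
	return fmt.Sprintf("%s-PCIE-%s", boardSerial, slot)
}
```

Because the synthetic serial is derived from the board serial and slot, re-exporting the same server produces stable PCIe serials across runs.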

**Key conversion features:**
- Automatic detection of the CPU manufacturer from the model
- Generation of serial numbers for PCIe devices: `{board_serial}-PCIE-{slot}`
- GPUs and NetworkAdapters merged into the `pcie_devices` section
- Filtering of components without serial numbers (storage, PSU)
- Normalization of statuses to the allowed values (`OK`, `Warning`, `Critical`, `Unknown`; `Empty` only for memory)
- RFC3339 format for `collected_at`
- `target_host` derived from `filename` (`redfish://`, `ipmi://`) if absent from the source
- `target_host` is optional: if it cannot be determined, the field is omitted from the JSON
- Normalization of `board.manufacturer` and `board.product_name`: the string `"NULL"` is treated as a missing value
- Normalization/cleanup of `source_type` and `protocol`: only values allowed by the guide are exported
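
Here is a hypothetical Go sketch of three of these normalizations. The function names and the raw synonyms accepted for each status are assumptions. In a real exporter, omitting an empty `target_host` would typically rely on a `json:"target_host,omitempty"` struct tag.

```go
package main

import "strings"

// normalizeStatus maps a raw status onto the allowed export values:
// OK, Warning, Critical, Unknown, and Empty (memory only).
// The raw synonyms recognized here are illustrative assumptions.
func normalizeStatus(raw string, isMemory bool) string {
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "ok", "normal", "enabled":
		return "OK"
	case "warning", "degraded":
		return "Warning"
	case "critical", "error", "failed":
		return "Critical"
	case "empty", "absent":
		if isMemory {
			return "Empty"
		}
	}
	return "Unknown"
}

// normalizeNullString treats the literal string "NULL" as a missing value.
func normalizeNullString(s string) string {
	if strings.EqualFold(strings.TrimSpace(s), "NULL") {
		return ""
	}
	return s
}

// targetHostFromFilename derives target_host from "redfish://host" or
// "ipmi://host" filenames; ok is false when no host can be derived.
func targetHostFromFilename(filename string) (host string, ok bool) {
	for _, prefix := range []string{"redfish://", "ipmi://"} {
		if h, found := strings.CutPrefix(filename, prefix); found && h != "" {
			return h, true
		}
	}
	return "", false
}
```

`strings.CutPrefix` requires Go 1.20 or later, which the Go 1.22+ requirement covers.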

### 3. HTTP endpoint

**Route:** `GET /api/export/reanimator`

**Handler:** `handleExportReanimator()` in `internal/server/handlers.go`

**Behavior:**
- Checks that hardware data is present
- Converts to the Reanimator format
- Returns indented JSON for readability
- Sets the `Content-Disposition` header for download

### 4. Frontend integration

Added an "Экспорт Reanimator" ("Export Reanimator") button to the web UI:
- Location: the "Конфигурация" ("Configuration") tab
- Uses the existing `exportData('reanimator')` function

### 5. Tests

**Unit tests** (`reanimator_converter_test.go`):
- `TestConvertToReanimator` - main conversion function
- `TestInferCPUManufacturer` - CPU manufacturer inference
- `TestGeneratePCIeSerialNumber` - serial number generation
- `TestInferStorageStatus` - storage status inference
- `TestConvertCPUs`, `TestConvertMemory`, etc. - tests for each component type

**Integration tests** (`reanimator_integration_test.go`):
- `TestFullReanimatorExport` - full export with realistic data
- `TestReanimatorExportWithoutTargetHost` - `target_host` inference test

**Results:** All tests pass ✓

### 6. Documentation

Updated `CLAUDE.md`:
- Added the `/api/export/reanimator` endpoint to the "Export behavior" section
- Described the key export features
- Added a link to the format specification

### 7. Examples

Created an example export: `example/docs/export-example-logpile.json`

## Export format

### Required fields:
- `collected_at` (RFC3339)
- `target_host`
- `hardware.board.serial_number`

### Export structure:

```json
{
  "filename": "redfish://10.10.10.103",
  "source_type": "api",
  "protocol": "redfish",
  "target_host": "10.10.10.103",
  "collected_at": "2026-02-10T15:30:00Z",
  "hardware": {
    "board": {...},
    "firmware": [...],
    "cpus": [...],
    "memory": [...],
    "storage": [...],
    "pcie_devices": [...],
    "power_supplies": [...]
  }
}
```

## Compliance with the Reanimator specification

The format fully conforms to the specification in `example/docs/INTEGRATION_GUIDE.md`:

- ✓ All required fields are present
- ✓ Correct data types
- ✓ RFC3339 time format
- ✓ Serial number generation for PCIe
- ✓ CPU manufacturer detection
- ✓ Component statuses
- ✓ Empty memory slots included (`present=false`)

## Implementation details

### LOGPile → Reanimator model mapping

| LOGPile | Reanimator | Notes |
|---------|------------|-------|
| `BoardInfo` | `board` | Direct mapping |
| `CPU` | `cpus` | + manufacturer (inferred) + status=`Unknown` when no actual status is available |
| `MemoryDIMM` | `memory` | Direct mapping |
| `Storage` | `storage` | + status=`Unknown` (the source does not provide a status) |
| `PCIeDevice` | `pcie_devices` | + model + status=`Unknown` |
| `GPU` | `pcie_devices` | Merged as `device_class=DisplayController` |
| `NetworkAdapter` | `pcie_devices` | Merged as NetworkController |
| `PSU` | `power_supplies` | Direct mapping |
| `FirmwareInfo` | `firmware` | Direct mapping |

### Data filtering

**Excluded from the export:**
- Storage without `serial_number`
- PSUs without `serial_number` or with `present=false`
- NetworkAdapters with `present=false`

**Included in the export:**
- Memory with `present=false` (as `Empty` slots)
- PCIe devices without `serial_number` (a serial is generated)
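
The filtering rules above can be expressed as a single predicate. This Go sketch uses a hypothetical flattened `component` type with section names borrowed from the export structure. The real exporter works on its own typed sections.

```go
package main

import "strings"

// component is a hypothetical flattened record used only for this sketch.
type component struct {
	Section      string // "storage", "power_supplies", "network", "memory", "pcie"
	SerialNumber string
	Present      bool
}

// includeInExport applies the exclusion and inclusion rules listed above.
func includeInExport(c component) bool {
	hasSerial := strings.TrimSpace(c.SerialNumber) != ""
	switch c.Section {
	case "storage":
		return hasSerial
	case "power_supplies":
		return hasSerial && c.Present
	case "network":
		return c.Present
	case "memory":
		return true // empty slots are exported with status Empty
	default:
		return true // e.g. PCIe devices; a missing serial is generated instead
	}
}
```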

## Usage

### Via the web UI:
1. Upload an archive or collect data via the API
2. Open the "Конфигурация" ("Configuration") tab
3. Click "Экспорт Reanimator" ("Export Reanimator")

### Via the API:
```bash
curl http://localhost:8082/api/export/reanimator > reanimator.json
```

### Programmatically:
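```go
// Packages under internal/ can only be imported from within the logpile module.
import (
	"encoding/json"

	"git.mchus.pro/mchus/logpile/internal/exporter"
	"git.mchus.pro/mchus/logpile/internal/models"
)

// exportReanimatorJSON converts an AnalysisResult and returns indented Reanimator JSON.
func exportReanimatorJSON(result *models.AnalysisResult) ([]byte, error) {
	reanimatorData, err := exporter.ConvertToReanimator(result)
	if err != nil {
		return nil, err
	}
	return json.MarshalIndent(reanimatorData, "", "  ")
}
```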

## Testing

Running the tests:
```bash
# All exporter tests
go test ./internal/exporter/...

# Only tests whose names match "Reanimator"
go test ./internal/exporter/... -v -run Reanimator

# With coverage
go test ./internal/exporter/... -cover
```

## Changed files

**New files:**
- `internal/exporter/reanimator_models.go` (4.6 KB)
- `internal/exporter/reanimator_converter.go` (10 KB)
- `internal/exporter/reanimator_converter_test.go` (8.0 KB)
- `internal/exporter/reanimator_integration_test.go` (7.4 KB)
- `internal/exporter/generate_example_test.go` (4.3 KB)
- `example/docs/export-example-logpile.json` (2.3 KB)

**Modified files:**
- `internal/server/handlers.go` - added `handleExportReanimator`
- `internal/server/server.go` - added the route
- `web/templates/index.html` - added the export button
- `CLAUDE.md` - documentation updated

## Compatibility

- ✓ Backward compatibility: existing exports (JSON/CSV) are unaffected
- ✓ Data format: `AnalysisResult` is unchanged
- ✓ API contracts: the new endpoint does not affect existing ones

## Future improvements

1. Support for statuses from real data (Warning/Critical) for Storage
2. Extended component telemetry
3. Validation of the export against the Reanimator JSON schema
4. Support for incremental updates

---

**Status:** ✅ Implementation complete and tested
**Version:** LOGPile v1.2.1+
**Date:** 2026-02-12

File diff suppressed because it is too large
35
docs/bible/01-overview.md
Normal file
@@ -0,0 +1,35 @@
# 01 — Overview

## What is LOGPile?

LOGPile is a standalone Go application for BMC (Baseboard Management Controller)
diagnostics analysis with an embedded web UI.
It runs as a single binary with no external file dependencies.

## Operating modes

| Mode | Entry point | Description |
|------|-------------|-------------|
| **Offline / archive** | `POST /api/upload` | Upload a vendor diagnostic archive or a JSON snapshot; parse and display in UI |
| **Live / Redfish** | `POST /api/collect` | Connect to a live BMC via Redfish API, collect hardware inventory, display and export |

Both modes produce the same in-memory `AnalysisResult` structure and expose it
through the same API and UI.

## Key capabilities

- Single self-contained binary with embedded HTML/JS/CSS (no static file serving required).
- Vendor archive parsing: Inspur/Kaytus, Supermicro, NVIDIA HGX Field Diagnostics,
  NVIDIA Bug Report, XigmaNAS, Generic text fallback.
- Live Redfish collection with async progress tracking.
- Normalized hardware inventory: CPU / RAM / Storage / GPU / PSU / NIC / PCIe / Firmware.
- Raw `redfish_tree` snapshot stored in `RawPayloads` for future offline re-analysis.
- Re-upload of a JSON snapshot for offline work (`/api/upload` accepts `AnalysisResult` JSON).
- Export in CSV, JSON (full `AnalysisResult`), and Reanimator format.
- PCI device model resolution via embedded `pci.ids` (no hardcoded model strings).

## Non-goals (current scope)

- No persistent storage — all state is in-memory per process lifetime.
- IPMI collector is a mock scaffold only; real IPMI support is not implemented.
- No authentication layer on the HTTP server.

109
docs/bible/02-architecture.md
Normal file
@@ -0,0 +1,109 @@
# 02 — Architecture

## Runtime stack

| Layer | Technology |
|-------|------------|
| Language | Go 1.22+ |
| HTTP | `net/http`, `http.ServeMux` |
| UI | Embedded via `//go:embed` in `web/embed.go` (templates + static assets) |
| State | In-memory only — no database |
| Build | `CGO_ENABLED=0`, single static binary |

Default port: **8082**

## Directory structure

```
cmd/logpile/main.go            # Binary entry point, CLI flag parsing
internal/
  collector/                   # Live data collectors
    registry.go                # Collector registration
    redfish/                   # Redfish collector (real)
    ipmi/                      # IPMI collector (mock scaffold)
  parser/                      # Archive parsers
    bmc_parser.go              # Top-level parser dispatcher
    vendors/                   # Vendor-specific parser modules
      vendors.go               # Import-side-effect registrations
      inspur/
      supermicro/
      nvidia/
      nvidia_bug_report/
      xigmanas/
      generic/
      pciids/                  # PCI IDs lookup (embedded pci.ids)
  server/                      # HTTP layer
    server.go                  # Server struct, route registration
    handlers.go                # All HTTP handler functions
  exporter/                    # Export formatters
    reanimator_models.go
    reanimator_converter.go
    csv.go / json.go
  models/                      # Shared data contracts
web/
  embed.go                     # go:embed directive
  templates/                   # HTML templates
  static/                      # JS / CSS
    js/app.js                  # Frontend — API contract consumer
```

## In-memory state

The `Server` struct in `internal/server/server.go` holds:

| Field | Type | Description |
|-------|------|-------------|
| `result` | `*models.AnalysisResult` | Current parsed/collected dataset |
| `detectedVendor` | `string` | Vendor identifier from last parse |
| Job manager | internal | Tracks live collect job status/logs |

State is replaced atomically on successful upload or collect.
On a failed/canceled collect, the previous `result` is preserved unchanged.
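
Here is a minimal sketch of one way to implement that guarantee. It assumes a `sync.RWMutex` guards the field; the document does not say how atomicity is achieved. `AnalysisResult` here is a stand-in type, and `detectedVendor` is left out for brevity.

```go
package main

import "sync"

// AnalysisResult is a stand-in for models.AnalysisResult.
type AnalysisResult struct {
	Filename string
}

// Server keeps the current dataset behind a read/write mutex.
type Server struct {
	mu     sync.RWMutex
	result *AnalysisResult
}

// setResult replaces the dataset in a single critical section, so readers
// see either the old result or the new one, never a partial swap.
func (s *Server) setResult(r *AnalysisResult) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.result = r
}

// currentResult returns the dataset that API handlers read.
func (s *Server) currentResult() *AnalysisResult {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.result
}

// finishCollect stores a collected result only when the job succeeded;
// on "failed" or "canceled" the previous dataset is left untouched.
func (s *Server) finishCollect(r *AnalysisResult, status string) {
	if status != "success" || r == nil {
		return
	}
	s.setResult(r)
}
```

Swapping a whole pointer keeps the critical section tiny. It relies on a published result never being mutated afterwards.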

## Upload flow (`POST /api/upload`)

```
multipart form field: "archive"
  │
  ├─ file looks like JSON?
  │    └─ parse as models.AnalysisResult snapshot → store in Server.result
  │
  └─ otherwise
       └─ parser.NewBMCParser().ParseFromReader(...)
            │
            ├─ try all registered vendor parsers (highest confidence wins)
            └─ result → store in Server.result
```

## Live collect flow (`POST /api/collect`)

```
validate request (host / protocol / port / username / auth_type / tls_mode)
  │
  └─ launch async job
       │
       ├─ progress callback → job log (queryable via GET /api/collect/{id})
       │
       ├─ success:
       │    set source metadata (source_type=api, protocol, host, date)
       │    store result in Server.result
       │
       └─ failure / cancel:
            previous Server.result unchanged
```

Job lifecycle states: `queued → running → success | failed | canceled`
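
The lifecycle can be encoded as a transition table. In this sketch, letting a queued job be canceled before it starts is an assumption. The constant and function names are hypothetical.

```go
package main

// Job states from the lifecycle above.
const (
	StatusQueued   = "queued"
	StatusRunning  = "running"
	StatusSuccess  = "success"
	StatusFailed   = "failed"
	StatusCanceled = "canceled"
)

// allowedTransitions encodes queued → running → success | failed | canceled.
// Terminal states have no entry, so nothing can leave them.
var allowedTransitions = map[string][]string{
	StatusQueued:  {StatusRunning, StatusCanceled},
	StatusRunning: {StatusSuccess, StatusFailed, StatusCanceled},
}

// canTransition reports whether a job may move from one state to another.
func canTransition(from, to string) bool {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

// isTerminal reports whether a status is final.
func isTerminal(status string) bool {
	return status == StatusSuccess || status == StatusFailed || status == StatusCanceled
}
```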

## PCI IDs lookup

Sources are listed in ascending priority. Entries from a later source override entries
from an earlier one, so `LOGPILE_PCI_IDS_PATH` has the highest priority:

1. Embedded `internal/parser/vendors/pciids/pci.ids` (compiled into binary)
2. `./pci.ids`
3. `/usr/share/hwdata/pci.ids`
4. `/usr/share/misc/pci.ids`
5. `/opt/homebrew/share/pciids/pci.ids`
6. `LOGPILE_PCI_IDS_PATH` env var (colon-separated list; overrides all above)

This means unknown GPU/NIC model strings can be updated by refreshing `pci.ids`
without any code change.
154
docs/bible/03-api.md
Normal file
@@ -0,0 +1,154 @@
# 03 — API Reference

## Conventions

- All endpoints are under `/api/`.
- Request bodies: `application/json`, or `multipart/form-data` where noted.
- Responses: `application/json` unless the response is a file download.
- Export filenames follow the pattern `YYYY-MM-DD (SERVER MODEL) - SERVER SN.<ext>`.

---

## Upload & Data Input

### `POST /api/upload`

Upload a vendor diagnostic archive or a JSON snapshot.

**Request:** `multipart/form-data`, field name `archive`.

Accepted inputs:
- `.tar`, `.tar.gz`, `.tgz` — vendor diagnostic archives
- `.txt` — plain text files
- JSON file containing a serialized `AnalysisResult` — re-loaded as-is

**Response:** `200 OK` with a parsed result summary, or `4xx`/`5xx` on error.

---

## Live Collection

### `POST /api/collect`

Start a live Redfish collection job.

**Request body:**
```json
{
  "host": "bmc01.example.local",
  "protocol": "redfish",
  "port": 443,
  "username": "admin",
  "auth_type": "password",
  "password": "secret",
  "tls_mode": "insecure"
}
```

`tls_mode` values: `insecure` | `verify`

**Response:** `202 Accepted` with `{ "id": "<job-id>" }`
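
Here is a sketch of decoding and validating this body on the server side. The struct mirrors the example above. The validation rules (required fields, port range, accepting `ipmi` as a protocol) are assumptions, since this reference only lists which fields are validated.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// collectRequest mirrors the request body above; other auth types are not covered.
type collectRequest struct {
	Host     string `json:"host"`
	Protocol string `json:"protocol"`
	Port     int    `json:"port"`
	Username string `json:"username"`
	AuthType string `json:"auth_type"`
	Password string `json:"password,omitempty"`
	TLSMode  string `json:"tls_mode"`
}

// validate checks the documented fields; the exact rules are assumptions.
func (r collectRequest) validate() error {
	if r.Host == "" {
		return errors.New("host is required")
	}
	if r.Protocol != "redfish" && r.Protocol != "ipmi" {
		return fmt.Errorf("unsupported protocol %q", r.Protocol)
	}
	if r.Port < 1 || r.Port > 65535 {
		return fmt.Errorf("invalid port %d", r.Port)
	}
	if r.Username == "" || r.AuthType == "" {
		return errors.New("username and auth_type are required")
	}
	if r.TLSMode != "insecure" && r.TLSMode != "verify" {
		return fmt.Errorf("tls_mode must be insecure or verify, got %q", r.TLSMode)
	}
	return nil
}

// decodeCollectRequest parses and validates a JSON body.
func decodeCollectRequest(data []byte) (collectRequest, error) {
	var req collectRequest
	if err := json.Unmarshal(data, &req); err != nil {
		return req, err
	}
	return req, req.validate()
}
```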
|
||||
|
||||
### `GET /api/collect/{id}`
|
||||
|
||||
Poll job status and progress log.
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"id": "...",
|
||||
"status": "running",
|
||||
"logs": ["Connecting to BMC...", "Collecting CPUs..."]
|
||||
}
|
||||
```
|
||||
|
||||
Status values: `queued` | `running` | `success` | `failed` | `canceled`
|
||||
|
||||
### `POST /api/collect/{id}/cancel`
|
||||
|
||||
Cancel a running job.

---

## Data Queries

### `GET /api/status`

Returns source metadata for the current dataset.

```json
{
  "source_type": "api",
  "protocol": "redfish",
  "target_host": "bmc01.example.local",
  "collected_at": "2026-02-10T15:30:00Z"
}
```

`source_type`: `archive` | `api`

### `GET /api/config`

Returns the full canonical hardware inventory (`hardware.devices`) plus board info.

### `GET /api/events`

Returns parsed diagnostic events.

### `GET /api/sensors`

Returns sensor readings (temperatures, voltages, fan speeds).

### `GET /api/serials`

Returns serial numbers built from canonical `hardware.devices`.

### `GET /api/firmware`

Returns firmware versions built from canonical `hardware.devices`.

### `GET /api/parsers`

Returns the list of registered vendor parsers and their identifiers.

---

## Export

### `GET /api/export/csv`

Download serial numbers as CSV.

### `GET /api/export/json`

Download the full `AnalysisResult` as JSON (includes `raw_payloads`).

### `GET /api/export/reanimator`

Download hardware data in Reanimator format for asset tracking integration.
See [`07-exporters.md`](07-exporters.md) for the full format spec.

---

## Management

### `DELETE /api/clear`

Clear the current in-memory dataset.

### `POST /api/shutdown`

Gracefully shut down the server process.

---

## Source metadata fields

Fields present in `/api/status` and `/api/config`:

| Field | Values |
|-------|--------|
| `source_type` | `archive` \| `api` |
| `protocol` | `redfish` \| `ipmi` (may be empty for archive uploads) |
| `target_host` | IP or hostname |
| `collected_at` | RFC3339 timestamp |
82
docs/bible/04-data-models.md
Normal file
@@ -0,0 +1,82 @@
# 04 — Data Models

## AnalysisResult

`internal/models/` — the central data contract shared by parsers, collectors, exporters, and the HTTP layer.

**Stability rule:** Never break the JSON shape of `AnalysisResult`.
Backward-compatible additions are allowed; removals or renames are not.

Key top-level fields:

| Field | Type | Description |
|-------|------|-------------|
| `hardware` | `Hardware` | All normalized hardware inventory |
| `events` | `[]Event` | Diagnostic events from parsers |
| `sensors` | `[]Sensor` | Sensor readings |
| `raw_payloads` | `map[string]any` | Raw vendor data (e.g. `redfish_tree`) |
| `source` | `SourceMeta` | Origin metadata (type, protocol, host, date) |

### Hardware sub-structure

```
Hardware
├── board      BoardInfo         — server/motherboard identity
├── devices    []Device          — CANONICAL INVENTORY (see below)
├── cpus       []CPU
├── memory     []MemoryDIMM
├── storage    []Storage
├── gpus       []GPU
├── psus       []PSU
├── nics       []NetworkAdapter
├── pcie       []PCIeDevice
└── firmware   []FirmwareInfo
```

---

## Canonical Device Repository (`hardware.devices`)

`hardware.devices` is the **single source of truth** for hardware inventory.

### Rules — must not be violated

1. All UI tabs displaying hardware components **must read from `hardware.devices`**.
2. The Device Inventory tab shows these device kinds: `pcie`, `storage`, `gpu`, `network`.
3. The Reanimator exporter **must use the same `hardware.devices`** as the UI.
4. Any discrepancy between UI data and Reanimator export data is a **bug**.
5. New hardware attributes must be added to the canonical device schema **first**,
   then mapped to Reanimator/UI — never the other way around.
6. The exporter should group/filter canonical records by section rather than rebuild data
   from multiple sources.

### Deduplication logic (applied once by the repository builder)

| Priority | Key used |
|----------|----------|
| 1 | `serial_number` — usable (not empty, not `N/A`, `NA`, `NONE`, `NULL`, `UNKNOWN`, `-`) |
| 2 | `bdf` — PCI Bus:Device.Function address |
| 3 | No merge — records remain distinct if both serial and bdf are absent |
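
The priority order can be expressed as a key function. In this Go sketch `Device` is a trimmed, hypothetical stand-in for the canonical record; placeholder serials are matched case-insensitively and BDFs are lower-cased before comparison (both assumptions). Merging is simplified to keeping the first record per key, whereas the real repository builder may combine fields from duplicates.

```go
package main

import (
	"fmt"
	"strings"
)

// Device is a trimmed, hypothetical stand-in for a canonical device record.
type Device struct {
	Kind         string
	SerialNumber string
	BDF          string
}

// placeholderSerials lists values that do not identify a real part.
var placeholderSerials = map[string]bool{
	"": true, "N/A": true, "NA": true, "NONE": true, "NULL": true, "UNKNOWN": true, "-": true,
}

// usableSerial reports whether a serial number can serve as a merge key.
func usableSerial(s string) bool {
	return !placeholderSerials[strings.ToUpper(strings.TrimSpace(s))]
}

// dedupKey applies the priority table: serial first, then BDF.
// ok is false when neither is available, so the record is never merged.
func dedupKey(d Device) (key string, ok bool) {
	if usableSerial(d.SerialNumber) {
		return "sn:" + strings.TrimSpace(d.SerialNumber), true
	}
	if bdf := strings.TrimSpace(d.BDF); bdf != "" {
		return "bdf:" + strings.ToLower(bdf), true
	}
	return "", false
}

// dedupe keeps the first record for each key plus every keyless record.
func dedupe(devices []Device) []Device {
	seen := make(map[string]bool)
	out := make([]Device, 0, len(devices))
	for _, d := range devices {
		if key, ok := dedupKey(d); ok {
			if seen[key] {
				continue
			}
			seen[key] = true
		}
		out = append(out, d)
	}
	return out
}

func main() {
	in := []Device{
		{Kind: "gpu", SerialNumber: "1650223000123", BDF: "0000:1b:00.0"},
		{Kind: "pcie", SerialNumber: "1650223000123"},               // same serial: dropped
		{Kind: "network", SerialNumber: "N/A", BDF: "0000:3b:00.0"}, // keyed by BDF
		{Kind: "pcie", BDF: "0000:3B:00.0"},                         // same BDF: dropped
		{Kind: "storage"},                                           // no key: kept
		{Kind: "storage"},                                           // no key: kept
	}
	fmt.Println(len(dedupe(in))) // 4
}
```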

### Device schema alignment

Keep the `hardware.devices` schema as close as possible to the Reanimator JSON field names.
This minimizes translation logic in the exporter and keeps the two schemas from drifting apart.

---

## Source metadata (`SourceMeta`)

Returned by both `/api/status` and `/api/config`:

```json
{
  "source_type": "api",
  "protocol": "redfish",
  "target_host": "10.0.0.1",
  "collected_at": "2026-02-10T15:30:00Z"
}
```

- Valid `source_type` values: `archive`, `api`
- Valid `protocol` values: `redfish`, `ipmi` (empty is allowed for archive uploads)
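
For code that consumes these endpoints, the value sets above translate into a small validation routine. This Go sketch uses a hypothetical `SourceMeta` mirror of the JSON. Rejecting an empty protocol for `api` sources, and tolerating an empty `collected_at`, are assumptions about how strict the real code is; both are called out in the comments.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// SourceMeta is a hypothetical mirror of the JSON above; the real struct in
// internal/models may have more fields.
type SourceMeta struct {
	SourceType  string `json:"source_type"`
	Protocol    string `json:"protocol"`
	TargetHost  string `json:"target_host"`
	CollectedAt string `json:"collected_at"`
}

// Validate checks the documented value sets. An empty protocol is accepted
// only for archive uploads, and an empty collected_at is tolerated; both are
// assumptions.
func (m SourceMeta) Validate() error {
	switch m.SourceType {
	case "archive", "api":
	default:
		return fmt.Errorf("invalid source_type %q", m.SourceType)
	}
	switch m.Protocol {
	case "redfish", "ipmi":
	case "":
		if m.SourceType != "archive" {
			return fmt.Errorf("empty protocol is only allowed for archive uploads")
		}
	default:
		return fmt.Errorf("invalid protocol %q", m.Protocol)
	}
	if m.CollectedAt != "" {
		if _, err := time.Parse(time.RFC3339, m.CollectedAt); err != nil {
			return fmt.Errorf("collected_at is not RFC3339: %w", err)
		}
	}
	return nil
}

func main() {
	raw := `{"source_type":"api","protocol":"redfish","target_host":"10.0.0.1","collected_at":"2026-02-10T15:30:00Z"}`
	var m SourceMeta
	if err := json.Unmarshal([]byte(raw), &m); err != nil {
		fmt.Println("decode:", err)
		return
	}
	fmt.Println(m.Validate()) // <nil>
}
```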
58
docs/bible/05-collectors.md
Normal file
@@ -0,0 +1,58 @@
# 05 — Collectors

Collectors live in `internal/collector/` and are registered in `internal/collector/registry.go`.

---

## Redfish Collector (`redfish`)

**Status:** Production-ready.

### Discovery

Discovery is dynamic and does not assume fixed paths. The collector discovers:
- `Systems` collection → per-system resources
- `Chassis` collection → enclosure/board data
- `Managers` collection → BMC/firmware info

### Collected data

| Category | Notes |
|----------|-------|
| CPU | Model, cores, threads, socket, status |
| Memory | DIMM slot, size, type, speed, serial, manufacturer |
| Storage | Slot, type, model, serial, firmware, interface, status |
| GPU | Detected via PCIe class + NVIDIA vendor ID |
| PSU | Model, serial, wattage, firmware, telemetry (input/output power, voltage) |
| NIC | Model, serial, port count, BDF |
| PCIe | Slot, vendor_id, device_id, BDF, link width/speed |
| Firmware | BIOS, BMC versions |

### Raw snapshot

The full Redfish response tree is stored in `result.RawPayloads["redfish_tree"]`.
This allows later offline re-analysis without re-collecting from a live BMC.

### Parsing guidelines

When adding Redfish mappings, follow these principles:
- Support alternate collection paths (the same resource may appear under different `@odata.id` URLs).
- Follow `@odata.id` references and handle embedded `Members` arrays.
- Deduplicate by serial / BDF / slot+model (in that priority order).
- Prefer tolerant, fallback parsing: missing fields should be skipped silently
  rather than cause the whole collection to fail.
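
A sketch of the `Members` handling in Go. Expanded (embedded) members also carry `@odata.id`, so decoding only that property covers both shapes, and entries without a link are skipped instead of failing, in line with the tolerant-parsing guideline. `memberLinks` is a hypothetical helper, not LOGPile's collector code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// odataRef is a Redfish reference object: {"@odata.id": "/redfish/v1/..."}.
type odataRef struct {
	ID string `json:"@odata.id"`
}

// memberLinks returns the unique @odata.id values from a collection's Members
// array. Embedded resources still carry @odata.id, so both forms decode the
// same way; members without one are skipped rather than treated as errors.
func memberLinks(collection []byte) ([]string, error) {
	var doc struct {
		Members []odataRef `json:"Members"`
	}
	if err := json.Unmarshal(collection, &doc); err != nil {
		return nil, err
	}
	links := make([]string, 0, len(doc.Members))
	seen := make(map[string]bool)
	for _, m := range doc.Members {
		if m.ID == "" || seen[m.ID] {
			continue
		}
		seen[m.ID] = true
		links = append(links, m.ID)
	}
	return links, nil
}

func main() {
	systems := []byte(`{
	  "@odata.id": "/redfish/v1/Systems",
	  "Members": [
	    {"@odata.id": "/redfish/v1/Systems/1"},
	    {"@odata.id": "/redfish/v1/Systems/1"},
	    {"Name": "no link"}
	  ]
	}`)
	links, err := memberLinks(systems)
	fmt.Println(links, err) // [/redfish/v1/Systems/1] <nil>
}
```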

### Progress reporting

The collector emits progress log entries at each stage (connecting, enumerating systems,
collecting CPUs, etc.) so the UI can display meaningful status.

---

## IPMI Collector (`ipmi`)

**Status:** Mock scaffold only — not implemented.

Registered in the collector registry but returns placeholder data.
Real IPMI support is a future work item.
224
docs/bible/06-parsers.md
Normal file
@@ -0,0 +1,224 @@
# 06 — Parsers

## Framework

### Registration

Each vendor parser registers itself from an `init()` function, which runs when its package is imported for side effects only (a blank `_` import).

All of these imports are collected in `internal/parser/vendors/vendors.go`:
```go
import (
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/supermicro"
	// etc.
)
```

### VendorParser interface

```go
type VendorParser interface {
	Name() string                     // human-readable name
	Vendor() string                   // vendor identifier string
	Detect(files []ExtractedFile) int // confidence 0–100
	Parse(files []ExtractedFile) (*models.AnalysisResult, error)
}
```

### Selection logic

All registered parsers run `Detect()` against the uploaded archive's file list.
The parser with the **highest confidence score** is selected.
Several parsers may return a score above 0; only the top scorer is used.
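
Registration and selection together fit in a few lines. In this Go sketch `Register`, `selectParser`, and the toy `markerParser` are hypothetical, `ExtractedFile` is trimmed down, and detection is reduced to a filename check; the scores are borrowed from the vendor sections below. The tie-break by vendor ID is an assumption, since the documentation does not specify one.

```go
package main

import (
	"fmt"
	"path"
)

// ExtractedFile is a trimmed stand-in for the real type.
type ExtractedFile struct {
	Path    string
	Content []byte
}

// detector is the part of VendorParser that selection needs.
type detector interface {
	Vendor() string
	Detect(files []ExtractedFile) int
}

// registry and Register are hypothetical; vendor packages would call
// Register from their init() functions.
var registry []detector

func Register(p detector) { registry = append(registry, p) }

// selectParser returns the highest-scoring parser, or nil if every score is 0.
// Ties go to the lexically smaller vendor ID so the result is deterministic.
func selectParser(files []ExtractedFile) (detector, int) {
	var best detector
	bestScore := 0
	for _, p := range registry {
		score := p.Detect(files)
		if score > bestScore || (score > 0 && score == bestScore && p.Vendor() < best.Vendor()) {
			best, bestScore = p, score
		}
	}
	return best, bestScore
}

// markerParser is a toy detector that scores when a file with the marker
// name is present; "*" matches any file.
type markerParser struct {
	vendor, marker string
	score          int
}

func (m markerParser) Vendor() string { return m.vendor }

func (m markerParser) Detect(files []ExtractedFile) int {
	for _, f := range files {
		if m.marker == "*" || path.Base(f.Path) == m.marker {
			return m.score
		}
	}
	return 0
}

func init() {
	// Scores borrowed from the vendor sections below; detection is simplified.
	Register(markerParser{"supermicro", "CDump.txt", 80})
	Register(markerParser{"nvidia", "summary.json", 20})
	Register(markerParser{"generic", "*", 15})
}

func main() {
	files := []ExtractedFile{{Path: "dump/CDump.txt"}, {Path: "dump/summary.json"}}
	if p, score := selectParser(files); p != nil {
		fmt.Println(p.Vendor(), score) // supermicro 80
	}
}
```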

### Adding a new vendor parser

1. `mkdir -p internal/parser/vendors/VENDORNAME`
2. Copy `internal/parser/vendors/template/parser.go.template` as a starting point.
3. Implement `Detect()` and `Parse()`.
4. Add a blank import to `internal/parser/vendors/vendors.go`.

`Detect()` tips:
- Look for unique filenames or directory names.
- Check file content for vendor-specific markers.
- Return 70 or higher only when confident; return 0 when the input is clearly not a match.

### Parser versioning

Each parser file contains a `parserVersion` constant.
Increment it whenever the parsing logic changes, so that a given result can be traced
to the parser version that produced it.

---

## Vendor parsers

### Inspur / Kaytus (`inspur`)

**Status:** Ready. Tested on KR4268X2 (onekeylog format).

**Archive format:** `.tar.gz` onekeylog

**Primary source files:**

| File | Content |
|------|---------|
| `asset.json` | Base hardware inventory |
| `component.log` | Component list |
| `devicefrusdr.log` | FRU and SDR data |
| `onekeylog/runningdata/redis-dump.rdb` | Runtime enrichment (optional) |

**Redis RDB enrichment** (applied conservatively; it only fills fields that are still missing):
- GPU: `serial_number`, `firmware` (VBIOS/FW), runtime telemetry
- NIC: firmware, serial, part number (when text logs leave these fields empty)

**Module structure:**
```
inspur/
  parser.go   — main parser + registration
  sdr.go      — sensor/SDR parsing
  fru.go      — FRU serial parsing
  asset.go    — asset.json parsing
  syslog.go   — syslog parsing
```

---

### Supermicro (`supermicro`)

**Status:** Ready (v1.0.0). Tested on SYS-821GE-TNHR crash dumps.

**Archive format:** `.tgz` / `.tar.gz` / `.tar`

**Primary source file:** `CDump.txt` — JSON crashdump file

**Confidence:** +80 when `CDump.txt` contains the `crash_data`, `METADATA`, and `bmc_fw_ver` markers.

**Extracted data:**
- CPUs: CPUID, core count, manufacturer (Intel), microcode version (stored in the firmware field)
- FRU: BMC firmware version, BIOS version, ME firmware version, CPU PPIN
- Events: crashdump collection event + MCA errors

**MCA error detection:**
- Status bits checked: bit 63 (VAL, valid), bit 61 (UC, uncorrected), bit 60 (EN, enabled)
- Corrected MCA errors → `Warning` severity
- Uncorrected MCA errors → `Critical` severity
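
The bit checks above, sketched in Go under the assumption that status values arrive as hex strings. `decodeMCA` is a hypothetical helper and the register values in `main` are made-up examples. EN is decoded but does not influence severity here, because this section does not say how the real parser uses it.

```go
package main

import (
	"fmt"
	"strconv"
)

// Bit positions in an IA32_MCi_STATUS register.
const (
	bitValid = 63 // VAL: the register holds a logged error
	bitUC    = 61 // UC: the error was not corrected
	bitEN    = 60 // EN: error reporting was enabled
)

type mcaStatus struct {
	Valid, Uncorrected, Enabled bool
}

// decodeMCA parses a status value written as hex, e.g. "0xbe00000000800400".
func decodeMCA(raw string) (mcaStatus, error) {
	v, err := strconv.ParseUint(raw, 0, 64)
	if err != nil {
		return mcaStatus{}, err
	}
	bit := func(n uint) bool { return v&(1<<n) != 0 }
	return mcaStatus{Valid: bit(bitValid), Uncorrected: bit(bitUC), Enabled: bit(bitEN)}, nil
}

// severity follows the mapping above; "" means no valid error was logged.
// EN is decoded for reference but does not change the result here.
func (s mcaStatus) severity() string {
	switch {
	case !s.Valid:
		return ""
	case s.Uncorrected:
		return "Critical"
	default:
		return "Warning"
	}
}

func main() {
	for _, raw := range []string{"0xbe00000000800400", "0x9c00000000000000", "0x0"} {
		st, err := decodeMCA(raw)
		if err != nil {
			fmt.Println(raw, err)
			continue
		}
		fmt.Printf("%s valid=%v uc=%v en=%v severity=%q\n",
			raw, st.Valid, st.Uncorrected, st.Enabled, st.severity())
	}
}
```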

**Known limitations:**
- TOR dump and extended MCA register data are not parsed yet.
- No CPU model name (the crashdump format provides only the CPUID hex code).

---

### NVIDIA HGX Field Diagnostics (`nvidia`)

**Status:** Ready (v1.1.0). Works with any server vendor.

**Archive format:** `.tar` / `.tar.gz`

**Confidence scoring:**

| File | Score |
|------|-------|
| `unified_summary.json` with "HGX Field Diag" marker | +40 |
| `summary.json` | +20 |
| `summary.csv` | +15 |
| `gpu_fieldiag/` directory | +15 |

**Source files:**

| File | Content |
|------|---------|
| `output.log` | dmidecode — server manufacturer, model, serial number |
| `unified_summary.json` | GPU details, NVSwitch devices, PCI addresses |
| `summary.json` | Diagnostic test results and error codes |
| `summary.csv` | Alternative test results format |

**Extracted data:**
- GPUs: slot, model, manufacturer, firmware (VBIOS), BDF
- NVSwitch devices: slot, device_class, vendor_id, device_id, BDF, link speed/width
- Events: diagnostic test failures (connectivity, gpumem, gpustress, pcie, nvlink, nvswitch, power)

**Severity mapping:**
- `info` — tests passed
- `warning` — e.g. "Row remapping failed"
- `critical` — error codes 300+

**Known limitations:**
- Detailed logs in `gpu_fieldiag/*.log` are not parsed.
- No CPU, memory, or storage extraction (this data is not present in field diag archives).

---

### NVIDIA Bug Report (`nvidia_bug_report`)

**Status:** Ready (v1.0.0).

**File format:** `nvidia-bug-report-*.log.gz` (gzip-compressed text)

**Confidence:** 85 when the filename matches this pattern (high priority).

**Source sections parsed:**

| dmidecode section | Extracts |
|-------------------|----------|
| System Information | server serial, UUID, manufacturer, product name |
| Processor Information | CPU model, serial, core/thread count, frequency |
| Memory Device | DIMM slot, size, type, manufacturer, serial, part number, speed |
| System Power Supply | PSU location, manufacturer, model, serial, wattage, firmware, status |

| Other source | Extracts |
|--------------|----------|
| `lspci -vvv` (Ethernet/Network/IB) | NIC model (from VPD), BDF, slot, P/N, S/N, port count, port type |
| `/proc/driver/nvidia/gpus/*/information` | GPU model, BDF, UUID, VBIOS version, IRQ |
| NVRM version line | NVIDIA driver version |

**Known limitations:**
- Driver error/warning log lines are not extracted yet.
- GPU temperature and utilization metrics are not extracted; they require parsing additional sections.

---

### XigmaNAS (`xigmanas`)

**Status:** Ready.

**Archive format:** Plain log files (FreeBSD-based NAS system)

**Detection:** Files named `xigmanas`, `system`, or `dmesg`; content containing "XigmaNAS" or "FreeBSD"; presence of SMART data.

**Extracted data:**
- System: firmware version, uptime, CPU model, memory configuration, hardware platform
- Storage: disk models, serial numbers, capacity, health, SMART temperatures
- Populates: `Hardware.Firmware`, `Hardware.CPUs`, `Hardware.Memory`, `Hardware.Storage`, `Sensors`

---

### Generic text fallback (`generic`)

**Status:** Ready (v1.0.0).

**Confidence:** 15 (the lowest; it is selected only when no other parser scores higher)

**Purpose:** Fallback for any text file or single `.gz` file that does not match a specific vendor.

**Behavior:**
- If the filename matches `nvidia-bug-report-*.log.gz`: extracts the driver version and GPU list.
- Otherwise: checks that the file is text (not binary) and records a basic "Text File" event.

---

## Supported vendor matrix

| Vendor | ID | Status | Tested on |
|--------|----|--------|-----------|
| Inspur / Kaytus | `inspur` | Ready | KR4268X2 onekeylog |
| Supermicro | `supermicro` | Ready | SYS-821GE-TNHR crashdump |
| NVIDIA HGX Field Diag | `nvidia` | Ready | Various HGX servers |
| NVIDIA Bug Report | `nvidia_bug_report` | Ready | H100 systems |
| XigmaNAS | `xigmanas` | Ready | FreeBSD NAS logs |
| Generic fallback | `generic` | Ready | Any text file |
| Dell iDRAC | `dell` | Planned | — |
| HPE iLO | `hpe` | Planned | — |
| Lenovo XCC | `lenovo` | Planned | — |
335
docs/bible/07-exporters.md
Normal file
@@ -0,0 +1,335 @@
# 07 — Exporters & Reanimator Integration

## Export endpoints summary

| Endpoint | Format | Filename pattern |
|----------|--------|------------------|
| `GET /api/export/csv` | CSV — serial numbers | `YYYY-MM-DD (MODEL) - SN.csv` |
| `GET /api/export/json` | Full `AnalysisResult` JSON (incl. `raw_payloads`) | `YYYY-MM-DD (MODEL) - SN.json` |
| `GET /api/export/reanimator` | Reanimator hardware JSON | `YYYY-MM-DD (MODEL) - SN.json` |

---

## Reanimator Export

### Purpose

Exports hardware inventory data in the format expected by the Reanimator asset tracking
system, enabling a one-click push from LOGPile to an external asset management platform.

### Implementation files

| File | Role |
|------|------|
| `internal/exporter/reanimator_models.go` | Go structs for Reanimator JSON |
| `internal/exporter/reanimator_converter.go` | `ConvertToReanimator()` and helpers |
| `internal/server/handlers.go` | `handleExportReanimator()` HTTP handler |

### Conversion rules

- Source: the canonical `hardware.devices` repository (see [`04-data-models.md`](04-data-models.md))
- CPU manufacturer is inferred from the model string (Intel / AMD / ARM / Ampere)
- PCIe serial number is generated when absent: `{board_serial}-PCIE-{slot}`
- Status values are normalized to `OK`, `Warning`, `Critical`, `Unknown` (`Empty` only for memory slots)
- Timestamps use RFC3339 format
- If `target_host` is not in the source, it is derived from the `filename` field (`redfish://…`, `ipmi://…`); it is omitted if it cannot be determined
- `board.manufacturer` and `board.product_name` values of `"NULL"` are treated as absent
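
The `target_host` and `"NULL"` rules, sketched in Go with the standard `net/url` parser. Both function names are hypothetical. Dropping a `:port` suffix from the derived host and matching `NULL` case-insensitively are assumptions, not documented behavior.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// targetHostFromFilename derives target_host from a filename such as
// "redfish://10.10.10.103". It returns "" (field omitted) when the scheme is
// not a collector protocol or no host can be found.
func targetHostFromFilename(filename string) string {
	u, err := url.Parse(strings.TrimSpace(filename))
	if err != nil {
		return ""
	}
	switch strings.ToLower(u.Scheme) {
	case "redfish", "ipmi":
		return u.Hostname() // strips an optional ":port"
	default:
		return ""
	}
}

// cleanBoardField treats the literal "NULL" as an absent value.
func cleanBoardField(v string) string {
	v = strings.TrimSpace(v)
	if strings.EqualFold(v, "NULL") {
		return ""
	}
	return v
}

func main() {
	fmt.Println(targetHostFromFilename("redfish://10.10.10.103")) // 10.10.10.103
	fmt.Printf("%q\n", targetHostFromFilename("dump.tar.gz"))     // ""
	fmt.Printf("%q\n", cleanBoardField("NULL"))                   // ""
}
```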

### LOGPile → Reanimator field mapping

| LOGPile type | Reanimator section | Notes |
|---|---|---|
| `BoardInfo` | `board` | Direct mapping |
| `CPU` | `cpus` | + manufacturer (inferred) |
| `MemoryDIMM` | `memory` | Direct; empty slots included (`present=false`) |
| `Storage` | `storage` | Excluded if no `serial_number` |
| `PCIeDevice` | `pcie_devices` | Serial generated if missing |
| `GPU` | `pcie_devices` | `device_class=DisplayController` |
| `NetworkAdapter` | `pcie_devices` | `device_class=NetworkController` |
| `PSU` | `power_supplies` | Excluded if no serial or `present=false` |
| `FirmwareInfo` | `firmware` | Direct mapping |

### Inclusion / exclusion rules

**Included:**
- Memory slots with `present=false` (as Empty slots)
- PCIe devices without a serial number (the serial is generated)

**Excluded:**
- Storage without `serial_number`
- PSUs without `serial_number` or with `present=false`
- Network adapters with `present=false`

---

## Reanimator Integration Guide

This section documents the Reanimator receiver-side JSON format, that is, what the Reanimator
system expects when it ingests a LOGPile export.

> **Important:** The Reanimator endpoint uses a strict JSON decoder (`DisallowUnknownFields`).
> Any unknown field, including nested ones, causes `400 Bad Request`.
> Use only the `snake_case` keys listed here.
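
The effect of the strict decoder can be reproduced with Go's `json.Decoder.DisallowUnknownFields`. The structs below are a hypothetical, heavily trimmed subset of the receiver-side schema, used only to show that a camelCase key at the top level, or an unknown key nested inside `hardware.board`, is rejected.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// A hypothetical, heavily trimmed subset of the receiver-side schema.
type ingestBoard struct {
	SerialNumber string `json:"serial_number"`
	Manufacturer string `json:"manufacturer,omitempty"`
}

type ingestHardware struct {
	Board ingestBoard `json:"board"`
}

type ingestPayload struct {
	TargetHost  string         `json:"target_host,omitempty"`
	CollectedAt string         `json:"collected_at"`
	Hardware    ingestHardware `json:"hardware"`
}

// decodeStrict rejects keys that are not declared in the structs, at any
// nesting level, in the same way a DisallowUnknownFields decoder does.
func decodeStrict(body []byte) (ingestPayload, error) {
	var p ingestPayload
	dec := json.NewDecoder(bytes.NewReader(body))
	dec.DisallowUnknownFields()
	err := dec.Decode(&p)
	return p, err
}

func main() {
	bodies := []string{
		`{"collected_at":"2026-02-10T15:30:00Z","hardware":{"board":{"serial_number":"21D634101"}}}`,
		`{"targetHost":"10.10.10.103","collected_at":"2026-02-10T15:30:00Z","hardware":{"board":{"serial_number":"21D634101"}}}`,
		`{"collected_at":"2026-02-10T15:30:00Z","hardware":{"board":{"serial_number":"21D634101","serial":"x"}}}`,
	}
	for _, b := range bodies {
		_, err := decodeStrict([]byte(b))
		fmt.Println(err) // <nil>, then json: unknown field "targetHost", then "serial"
	}
}
```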

### Top-level structure

```json
{
  "filename": "redfish://10.10.10.103",
  "source_type": "api",
  "protocol": "redfish",
  "target_host": "10.10.10.103",
  "collected_at": "2026-02-10T15:30:00Z",
  "hardware": {
    "board": {...},
    "firmware": [...],
    "cpus": [...],
    "memory": [...],
    "storage": [...],
    "pcie_devices": [...],
    "power_supplies": [...]
  }
}
```

**Required:** `collected_at`, `hardware.board.serial_number`

**Optional:** `target_host`, `source_type`, `protocol`, `filename`

`source_type` values: `api`, `logfile`, `manual`

`protocol` values: `redfish`, `ipmi`, `snmp`, `ssh`

### Component status fields (all component sections)

Each component may carry:

| Field | Type | Description |
|-------|------|-------------|
| `status` | string | `OK`, `Warning`, `Critical`, `Unknown`, `Empty` |
| `status_checked_at` | RFC3339 | When status was last verified |
| `status_changed_at` | RFC3339 | When status last changed |
| `status_at_collection` | object | `{ "status": "...", "at": "..." }` — snapshot-time status |
| `status_history` | array | `[{ "status": "...", "changed_at": "...", "details": "..." }]` |
| `error_description` | string | Human-readable error for Warning/Critical |

### Board

```json
{
  "board": {
    "manufacturer": "Supermicro",
    "product_name": "X12DPG-QT6",
    "serial_number": "21D634101",
    "part_number": "X12DPG-QT6-REV1.01",
    "uuid": "d7ef2fe5-2fd0-11f0-910a-346f11040868"
  }
}
```

`serial_number` is required. `manufacturer` / `product_name` values of `"NULL"` are treated as absent.

### CPUs

```json
{
  "socket": 0,
  "model": "INTEL(R) XEON(R) GOLD 6530",
  "cores": 32,
  "threads": 64,
  "frequency_mhz": 2100,
  "max_frequency_mhz": 4000,
  "manufacturer": "Intel",
  "status": "OK"
}
```

`socket` (int) and `model` are required. The serial number is generated as `{board_serial}-CPU-{socket}`.

LOT format: `CPU_{VENDOR}_{MODEL_NORMALIZED}` → e.g. `CPU_INTEL_XEON_GOLD_6530`

### Memory

```json
{
  "slot": "CPU0_C0D0",
  "location": "CPU0_C0D0",
  "present": true,
  "size_mb": 32768,
  "type": "DDR5",
  "max_speed_mhz": 4800,
  "current_speed_mhz": 4800,
  "manufacturer": "Hynix",
  "serial_number": "80AD032419E17CEEC1",
  "part_number": "HMCG88AGBRA191N",
  "status": "OK"
}
```

`slot` and `present` are required; `serial_number` is required when `present=true`.
Empty slots (`present=false`, `status="Empty"`) are accepted, but no component is created for them.

LOT format: `DIMM_{TYPE}_{SIZE_GB}GB` → e.g. `DIMM_DDR5_32GB`

### Storage

```json
{
  "slot": "OB01",
  "type": "NVMe",
  "model": "INTEL SSDPF2KX076T1",
  "size_gb": 7680,
  "serial_number": "BTAX41900GF87P6DGN",
  "manufacturer": "Intel",
  "firmware": "9CV10510",
  "interface": "NVMe",
  "present": true,
  "status": "OK"
}
```

`slot`, `model`, `serial_number`, and `present` are required.

LOT format: `{TYPE}_{INTERFACE}_{SIZE_TB}TB` → e.g. `SSD_NVME_07.68TB`

### Power Supplies

```json
{
  "slot": "0",
  "present": true,
  "model": "GW-CRPS3000LW",
  "vendor": "Great Wall",
  "wattage_w": 3000,
  "serial_number": "2P06C102610",
  "part_number": "V0310C9000000000",
  "firmware": "00.03.05",
  "status": "OK",
  "input_power_w": 137,
  "output_power_w": 104,
  "input_voltage": 215.25
}
```

`slot` and `present` are required; `serial_number` is required when `present=true`.
Telemetry fields (`input_power_w`, `output_power_w`, `input_voltage`) are stored only in the observation.

LOT format: `PSU_{WATTAGE}W_{VENDOR_NORMALIZED}` → e.g. `PSU_3000W_GREAT_WALL`

### PCIe Devices

```json
{
  "slot": "PCIeCard1",
  "vendor_id": 32902,
  "device_id": 2912,
  "bdf": "0000:18:00.0",
  "device_class": "MassStorageController",
  "manufacturer": "Intel",
  "model": "RAID Controller RSP3DD080F",
  "link_width": 8,
  "link_speed": "Gen3",
  "max_link_width": 8,
  "max_link_speed": "Gen3",
  "serial_number": "RAID-001-12345",
  "firmware": "50.9.1-4296",
  "status": "OK"
}
```

`slot` is required. If the serial number is absent, it is generated as `{board_serial}-PCIE-{slot}`.

`device_class` values include `NetworkController`, `MassStorageController`, `DisplayController`, and others.

LOT format: `PCIE_{DEVICE_CLASS}_{MODEL_NORMALIZED}` → e.g. `PCIE_NETWORK_CONNECTX5`

### Firmware

```json
[
  { "device_name": "BIOS", "version": "06.08.05" },
  { "device_name": "BMC", "version": "5.17.00" }
]
```

Both fields are required. A version change triggers a `FIRMWARE_CHANGED` timeline event.

---

### Import process (Reanimator side)

1. Validate `collected_at` (RFC3339) and `hardware.board.serial_number`.
2. Find or create the Asset by `board.serial_number` → `vendor_serial`.
3. For each component: skip entries with `present=false`, determine the LOT automatically, find or create the Component,
   create an Observation, and update Installations.
4. Detect removed components (present in the previous snapshot, absent in the current one) and close their Installation.
5. Generate timeline events: `LOG_COLLECTED`, `INSTALLED`, `REMOVED`, `FIRMWARE_CHANGED`.

**Idempotency:** Importing the same snapshot again (same content hash) returns `200 OK`
with `"duplicate": true` and does not create duplicate records.

### Reanimator API endpoint

```http
POST /ingest/hardware
Content-Type: application/json
```

**Success (201):**
```json
{
  "status": "success",
  "bundle_id": "lb_01J...",
  "asset_id": "mach_01J...",
  "collected_at": "2026-02-10T15:30:00Z",
  "duplicate": false,
  "summary": {
    "parts_observed": 15,
    "parts_created": 2,
    "installations_created": 2,
    "timeline_events_created": 9
  }
}
```

**Duplicate (200):**
```json
{ "status": "success", "duplicate": true, "message": "LogBundle with this content hash already exists" }
```

**Error (400):**
```json
{ "status": "error", "error": "validation_failed", "details": { "field": "...", "message": "..." } }
```

Common `400` causes:
- Unknown JSON field (strict decoder)
- Wrong key name (e.g. `targetHost` instead of `target_host`)
- Invalid `collected_at` format (must be RFC3339)
- Empty `hardware.board.serial_number`

### LOT normalization rules

1. Remove the special characters `( ) - ® ™` and replace spaces with `_`.
2. Convert everything to uppercase.
3. Collapse runs of underscores into a single `_`.
4. Strip common prefixes such as `MODEL:` and `PN:`.
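
The four rules, applied in order, as a Go sketch. `normalizeLOTPart` is a hypothetical name, and trimming underscores left at the edges after rule 4 (for example from `PN: X`) is an assumption the rules do not spell out.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule 1: drop ( ) - ® ™ and turn spaces into underscores.
var rule1 = strings.NewReplacer("(", "", ")", "", "-", "", "®", "", "™", "", " ", "_")

// Rule 3: runs of underscores collapse to one.
var underscores = regexp.MustCompile(`_+`)

// Rule 4: prefixes stripped after upper-casing.
var commonPrefixes = []string{"MODEL:", "PN:"}

// normalizeLOTPart applies the four rules in the order listed above.
func normalizeLOTPart(s string) string {
	s = rule1.Replace(strings.TrimSpace(s))  // rule 1
	s = strings.ToUpper(s)                   // rule 2
	s = underscores.ReplaceAllString(s, "_") // rule 3
	for _, p := range commonPrefixes {       // rule 4
		s = strings.TrimPrefix(s, p)
	}
	// Assumption: underscores left at the edges (e.g. from "PN: X") are trimmed.
	return strings.Trim(s, "_")
}

func main() {
	fmt.Println(normalizeLOTPart("PN: GW-CRPS3000LW")) // GWCRPS3000LW
	fmt.Println(normalizeLOTPart("Great Wall"))        // GREAT_WALL
	fmt.Println(normalizeLOTPart("ConnectX-5  Ex"))    // CONNECTX5_EX
}
```

Applied literally, these rules turn `INTEL(R)` into `INTELR`, so the CPU example `CPU_INTEL_XEON_GOLD_6530` presumably relies on extra vendor or trademark handling that this sketch does not attempt.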

### Status values

| Value | Meaning | Action |
|-------|---------|--------|
| `OK` | Normal | — |
| `Warning` | Degraded | Create `COMPONENT_WARNING` event (optional) |
| `Critical` | Failed | Auto-create `failure_event`, create `COMPONENT_FAILED` event |
| `Unknown` | Not determinable | Treat as working |
| `Empty` | Slot unpopulated | No component created (memory/PCIe only) |

### Missing field handling

| Field | Fallback |
|-------|----------|
| CPU serial | Generated: `{board_serial}-CPU-{socket}` |
| PCIe serial | Generated: `{board_serial}-PCIE-{slot}` |
| Other serial | Component skipped if absent |
| manufacturer (PCIe) | Looked up from `vendor_id` (PCI vendor IDs in hex: 8086→Intel, 10de→NVIDIA, 15b3→Mellanox…) |
| status | Treated as `Unknown` |
| firmware | No `FIRMWARE_CHANGED` event |
90
docs/bible/08-build-release.md
Normal file
@@ -0,0 +1,90 @@
# 08 — Build & Release

## CLI flags

Defined in `cmd/logpile/main.go`:

| Flag | Default | Description |
|------|---------|-------------|
| `--port` | `8082` | HTTP server port |
| `--file` | — | Reserved for archive preload (not active) |
| `--version` | — | Print version and exit |
| `--no-browser` | — | Do not open the browser on startup |
| `--hold-on-crash` | `true` on Windows | Keep the console open after a fatal crash, for debugging |

## Build

```bash
# Local binary (current OS/arch)
make build
# Output: bin/logpile

# Cross-platform binaries
make build-all
# Output:
#   bin/logpile-linux-amd64
#   bin/logpile-linux-arm64
#   bin/logpile-darwin-amd64
#   bin/logpile-darwin-arm64
#   bin/logpile-windows-amd64.exe
```

Both `make build` and `make build-all` run `scripts/update-pci-ids.sh --best-effort`
before compilation to sync `pci.ids` from the submodule.

To skip the PCI IDs update:
```bash
SKIP_PCI_IDS_UPDATE=1 make build
```

Builds run with `CGO_ENABLED=0`, which produces a fully static binary with no C runtime dependency.

## PCI IDs submodule

Source: `third_party/pciids` (git submodule → `github.com/pciutils/pciids`).
Local copy embedded at build time: `internal/parser/vendors/pciids/pci.ids`.

```bash
# Manual update
make update-pci-ids

# Init submodule after fresh clone
git submodule update --init third_party/pciids
```

## Release process

```bash
scripts/release.sh
```

What it does:
1. Reads the version from `git describe --tags`
2. Validates a clean working tree (override: `ALLOW_DIRTY=1`)
3. Sets stable `GOPATH` / `GOCACHE` / `GOTOOLCHAIN` environment variables
4. Creates the `releases/{VERSION}/` directory
5. Generates a `RELEASE_NOTES.md` template if one is not present
6. Builds `darwin-arm64` and `windows-amd64` binaries
7. Packages all binaries found in `bin/` as `.tar.gz` / `.zip`
8. Generates `SHA256SUMS.txt`
9. Prints next steps (tag, push, create release on git.mchus.pro)

Release notes live in `docs/releases/<tag>.md`.
Tags and releases are published with `tea`.

## Running

```bash
./bin/logpile
./bin/logpile --port 9090
./bin/logpile --no-browser
./bin/logpile --version
./bin/logpile --hold-on-crash   # keep console open on crash (default on Windows)
```

## macOS Gatekeeper

After downloading a binary, remove the quarantine attribute:
```bash
xattr -d com.apple.quarantine /path/to/logpile-darwin-arm64
```
43
docs/bible/09-testing.md
Normal file
@@ -0,0 +1,43 @@
|
||||
# 09 — Testing

## Required before merge

```bash
go test ./...
```

All tests must pass before any change is merged.

## Where to add tests

| Change area | Test location |
|-------------|---------------|
| Collectors | `internal/collector/*_test.go` |
| HTTP handlers | `internal/server/*_test.go` |
| Exporters | `internal/exporter/*_test.go` |
| Parsers | `internal/parser/vendors/<vendor>/*_test.go` |

## Exporter tests

The Reanimator exporter is covered by the following test files:

| Test file | Coverage |
|-----------|----------|
| `reanimator_converter_test.go` | Unit tests per conversion function |
| `reanimator_integration_test.go` | Full export with a realistic `AnalysisResult` |

To run only the exporter tests:
```bash
go test ./internal/exporter/...
go test ./internal/exporter/... -v -run Reanimator
go test ./internal/exporter/... -cover
```

## Guidelines

- Prefer table-driven tests for parsing logic (multiple input variants).
- Do not rely on network access in unit tests.
- Test both the happy path and edge cases (missing fields, empty collections).
- When adding a new vendor parser, include at minimum:
  - a `Detect()` test with a positive and a negative sample file list;
  - a `Parse()` test with a minimal but representative archive.
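The sketch below shows the table-driven shape these guidelines describe. It uses a toy detector modeled on the SMC crash-dump rule: score 80 when `CDump.txt` carries the `crash_data`, `METADATA`, and `bmc_fw_ver` markers. This sketch assumes all three markers must be present. The `ExtractedFile` struct and `detectSMC` function are simplified stand-ins, not the real types from `internal/parser`. The example is written as a plain program so it runs on its own. In the repository, the same table would live in a `*_test.go` file and loop with `t.Run`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// ExtractedFile is a simplified stand-in for the parser package's type.
type ExtractedFile struct {
	Path    string
	Content []byte
}

// detectSMC is a toy Detect(): 80 if CDump.txt contains all crashdump
// markers (assumed to be required together), otherwise 0.
func detectSMC(files []ExtractedFile) int {
	for _, f := range files {
		if path.Base(f.Path) != "CDump.txt" {
			continue
		}
		s := string(f.Content)
		if strings.Contains(s, "crash_data") && strings.Contains(s, "METADATA") &&
			strings.Contains(s, "bmc_fw_ver") {
			return 80
		}
	}
	return 0
}

// detectCases is the test table: one positive and several negative samples.
var detectCases = []struct {
	name  string
	files []ExtractedFile
	want  int
}{
	{"crashdump archive", []ExtractedFile{{"dump/CDump.txt",
		[]byte(`{"crash_data":{"METADATA":{"bmc_fw_ver":"01.03.18"}}}`)}}, 80},
	{"CDump.txt without markers", []ExtractedFile{{"CDump.txt", []byte("{}")}}, 0},
	{"unrelated archive", []ExtractedFile{{"asset.json", []byte("{}")}}, 0},
	{"empty file list", nil, 0},
}

// runDetectCases returns one message per failing case.
func runDetectCases() []string {
	var failures []string
	for _, tc := range detectCases {
		if got := detectSMC(tc.files); got != tc.want {
			failures = append(failures, fmt.Sprintf("%s: got %d, want %d", tc.name, got, tc.want))
		}
	}
	return failures
}

func main() {
	failures := runDetectCases()
	for _, msg := range failures {
		fmt.Println("FAIL:", msg)
	}
	fmt.Printf("%d/%d cases passed\n", len(detectCases)-len(failures), len(detectCases))
}
```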
101
docs/bible/10-decisions.md
Normal file
@@ -0,0 +1,101 @@

# 10 — Architectural Decision Log (ADL)

> **Rule:** Every significant architectural decision **must be recorded here** before or alongside
> the code change. This applies to humans and AI assistants alike.
>
> Format: date · title · context · decision · consequences

---

## ADL-001 — In-memory only state (no database)

**Date:** project start
**Context:** LOGPile is designed as a standalone diagnostic tool, not a persistent service.
**Decision:** All parsed/collected data lives in `Server.result` (in memory). No database is used and no files are written.
**Consequences:**
- Data is lost on process restart; this is intentional.
- Simple deployment: single binary, no setup required.
- JSON export is the persistence mechanism for users who want to save results.

---

## ADL-002 — Vendor parser auto-registration via init()

**Date:** project start
**Context:** An extensible parser registry is needed without a central factory function.
**Decision:** Each vendor parser registers itself in its package's `init()` function.
`vendors/vendors.go` holds blank imports to trigger registration.
**Consequences:**
- Adding a new parser only requires implementing the interface and adding one blank import.
- No central list to maintain (other than the import file).
- `go test ./...` includes new parsers automatically.

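The following single-file sketch shows the `init()` self-registration pattern this decision describes. In the real layout, the vendor parser and the registry live in different packages. A blank import in `vendors/vendors.go` is what causes the vendor package's `init()` to run. Here both sides share one file so the program runs standalone. `Register`, `Registered`, `exampleParser`, and the trimmed `VendorParser` interface are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// VendorParser is trimmed to the two methods this sketch needs.
type VendorParser interface {
	Vendor() string
	Detect(paths []string) int
}

var (
	mu       sync.Mutex
	registry = map[string]VendorParser{}
)

// Register adds a parser; a duplicate vendor ID is treated as a programming error.
func Register(p VendorParser) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[p.Vendor()]; dup {
		panic("parser already registered: " + p.Vendor())
	}
	registry[p.Vendor()] = p
}

// Registered returns the registered vendor IDs in sorted order.
func Registered() []string {
	mu.Lock()
	defer mu.Unlock()
	ids := make([]string, 0, len(registry))
	for id := range registry {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// exampleParser stands in for a vendor package's parser type.
type exampleParser struct{ id string }

func (p exampleParser) Vendor() string      { return p.id }
func (p exampleParser) Detect([]string) int { return 0 }

// In a vendor package, this init() runs as soon as the package is imported,
// which is all a blank import in vendors/vendors.go needs to trigger.
func init() {
	Register(exampleParser{id: "example"})
}

func main() {
	fmt.Println(Registered())
}
```

Panicking on a duplicate vendor ID surfaces a copy-pasted registration at startup, so one parser cannot silently shadow another. This document does not state whether LOGPile does this.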
---

## ADL-003 — Highest-confidence parser wins

**Date:** project start
**Context:** Multiple parsers may partially match an archive (e.g. generic + specific vendor).
**Decision:** Run every parser's `Detect()` and select the one returning the highest score (0–100).
**Consequences:**
- The generic fallback (score 15) only activates when no vendor parser scores higher.
- Parsers must be conservative with high scores (70+) to avoid false positives.

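The sketch below implements the selection rule: call every detector, keep the highest score, and report no match when every score is 0. The `detector` type and `pickParser` function are hypothetical. Ties go to the earlier entry; this is an assumption, because the ADL does not specify tie-breaking. The toy detectors reuse scores quoted elsewhere in this commit: generic fallback 15 and SMC crash dump 80.

```go
package main

import "fmt"

// detector pairs a parser name with its Detect() logic for one archive.
type detector struct {
	name  string
	score func(files []string) int
}

// pickParser returns the parser with the highest score (0-100).
// Ties keep the earlier entry; a best score of 0 means no parser matched.
func pickParser(ds []detector, files []string) (string, int) {
	bestName, bestScore := "", 0
	for _, d := range ds {
		if s := d.score(files); s > bestScore {
			bestName, bestScore = d.name, s
		}
	}
	return bestName, bestScore
}

// contains reports whether name appears in files.
func contains(files []string, name string) bool {
	for _, f := range files {
		if f == name {
			return true
		}
	}
	return false
}

func main() {
	ds := []detector{
		{"generic", func([]string) int { return 15 }}, // toy always-on fallback
		{"smc", func(f []string) int {
			if contains(f, "CDump.txt") {
				return 80
			}
			return 0
		}},
	}
	fmt.Println(pickParser(ds, []string{"CDump.txt"}))  // smc 80
	fmt.Println(pickParser(ds, []string{"system.log"})) // generic 15
}
```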
---

## ADL-004 — Canonical hardware.devices as single source of truth

**Date:** v1.5.0
**Context:** UI tabs and Reanimator exporter were reading from different sub-fields of
`AnalysisResult`, causing potential drift.
**Decision:** Introduce `hardware.devices` as the canonical inventory repository.
All UI tabs and all exporters must read exclusively from this repository.
**Consequences:**
- Any UI vs Reanimator discrepancy is classified as a bug, not a "known difference".
- Deduplication logic runs once in the repository builder (serial → bdf → distinct).
- New hardware attributes must be added to canonical schema first, then mapped to consumers.

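The sketch below shows one possible reading of the "serial → bdf → distinct" order. A device is keyed by its serial number when it has one, otherwise by its PCI BDF address. A record with neither is always kept as distinct. The `Device` struct and the `dedupeKey`/`dedupe` functions are hypothetical. Case-insensitive matching and "keep the first record" are assumptions; the real repository builder may merge fields instead.

```go
package main

import (
	"fmt"
	"strings"
)

// Device is a hypothetical, trimmed-down canonical inventory record.
type Device struct {
	Section string
	Serial  string
	BDF     string
	Model   string
}

// dedupeKey returns the identity used for deduplication, or "" when the
// record has no stable identity and must be kept as distinct.
func dedupeKey(d Device) string {
	if s := strings.TrimSpace(d.Serial); s != "" {
		return "serial:" + strings.ToUpper(s) // case-insensitive (assumption)
	}
	if b := strings.TrimSpace(d.BDF); b != "" {
		return "bdf:" + strings.ToLower(b)
	}
	return ""
}

// dedupe keeps the first record seen for each key, preserving input order.
func dedupe(in []Device) []Device {
	seen := make(map[string]bool)
	var out []Device
	for _, d := range in {
		if k := dedupeKey(d); k != "" {
			if seen[k] {
				continue
			}
			seen[k] = true
		}
		out = append(out, d)
	}
	return out
}

func main() {
	devs := []Device{
		{Section: "gpu", Serial: "1653925025497", BDF: "0000:3a:00.0"},
		{Section: "gpu", Serial: "1653925025497"},   // same serial: dropped
		{Section: "nic", BDF: "0000:0E:00.0"},
		{Section: "nic", BDF: "0000:0e:00.0"},       // same BDF: dropped
		{Section: "psu", Model: "V0310DT000000000"}, // no identity: kept
		{Section: "psu", Model: "V0310DT000000000"}, // no identity: kept
	}
	fmt.Println(len(dedupe(devs))) // 4
}
```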
---

## ADL-005 — No hardcoded PCI model strings; use pci.ids

**Date:** v1.5.0
**Context:** NVIDIA and other vendors release new GPU models frequently; hardcoded maps
required code changes for each new model ID.
**Decision:** Use the `pciutils/pciids` database (git submodule, embedded at build time).
PCI vendor/device IDs are mapped to human-readable model names by lookup.
**Consequences:**
- New GPU models can be supported by updating `pci.ids` without code changes.
- `make build` auto-syncs `pci.ids` from the submodule before compilation.
- An external override is available via the `LOGPILE_PCI_IDS_PATH` environment variable.

---

## ADL-006 — Reanimator export uses canonical hardware.devices (not raw sub-fields)

**Date:** v1.5.0
**Context:** The early Reanimator exporter read from `Hardware.GPUs`, `Hardware.NICs`, etc.
directly, diverging from the UI data.
**Decision:** The Reanimator exporter must use `hardware.devices`, the same source as the UI.
The exporter groups and filters canonical records by section; it does not rebuild them from sub-fields.
**Consequences:**
- Guarantees UI and export consistency.
- Exporter code is simpler: mainly a filter and map, not a data reconstruction.

---

## ADL-007 — Documentation language is English

**Date:** 2026-02-20
**Context:** Codebase documentation was a mix of Russian and English, reducing clarity for
international contributors and AI assistants.
**Decision:** All documentation in `docs/bible/` and all new inline code documentation
must be written in English. The existing Russian-language user-facing README may remain
in Russian, but architecture docs migrate to English.
**Consequences:**
- The Bible is authoritative in English.
- AI assistants get consistent, unambiguous context.

---

<!-- Add new decisions below this line using the format above -->

38
docs/bible/README.md
Normal file
@@ -0,0 +1,38 @@

# LOGPile Bible

> **Documentation language:** English only. All new documentation must be written in English.
>
> **Architectural decisions:** Every significant architectural decision **must** be recorded in
> [`10-decisions.md`](10-decisions.md) before or alongside the code change.

This directory is the single source of truth for LOGPile's architecture, design, and integration contracts.
It is structured so that both humans and AI assistants can navigate it quickly.

---

## Contents

| # | File | What it covers |
|---|------|----------------|
| 01 | [01-overview.md](01-overview.md) | Product goals, operating modes, high-level concept |
| 02 | [02-architecture.md](02-architecture.md) | Runtime structure, key flows, in-memory state |
| 03 | [03-api.md](03-api.md) | All HTTP endpoints: contracts, request/response shapes |
| 04 | [04-data-models.md](04-data-models.md) | `AnalysisResult`, canonical `hardware.devices` repository |
| 05 | [05-collectors.md](05-collectors.md) | Live data collectors (Redfish, IPMI scaffold) |
| 06 | [06-parsers.md](06-parsers.md) | Archive parser framework + all vendor parsers |
| 07 | [07-exporters.md](07-exporters.md) | CSV / JSON / Reanimator export + full Reanimator integration spec |
| 08 | [08-build-release.md](08-build-release.md) | Build system, CLI flags, release process |
| 09 | [09-testing.md](09-testing.md) | Testing expectations and guidelines |
| 10 | [10-decisions.md](10-decisions.md) | Architectural decision log (ADL) |

---

## Quick orientation for AI assistants

- Entry point: `cmd/logpile/main.go`
- HTTP server: `internal/server/` (handlers in `handlers.go`, routes in `server.go`)
- Data contracts: `internal/models/`; never break the `AnalysisResult` JSON shape
- Frontend contract: `web/static/js/app.js`; keep API responses stable
- Canonical inventory: `hardware.devices` in `AnalysisResult`, the source of truth for UI and exports
- Parser registry: `internal/parser/vendors/` (`init()` auto-registration pattern)
- Collector registry: `internal/collector/registry.go`
96
internal/parser/vendors/README.md
vendored
@@ -1,96 +0,0 @@

# Vendor Parser Modules

Each server vendor has its own BMC diagnostic archive format.
This directory contains parser modules for the different vendors.

## Module layout

```
vendors/
├── vendors.go          # Imports of all modules (add new ones here)
├── README.md           # This documentation
├── template/           # Template for a new module
│   └── parser.go.template
├── inspur/             # Inspur/Kaytus module
│   ├── parser.go       # Main parser + registration
│   ├── sdr.go          # SDR parsing (sensors)
│   ├── fru.go          # FRU parsing (serial numbers)
│   ├── asset.go        # asset.json parsing
│   └── syslog.go       # syslog parsing
├── supermicro/         # Future Supermicro module
├── dell/               # Future Dell iDRAC module
└── hpe/                # Future HPE iLO module
```

## Adding a new module

### 1. Create the module directory

```bash
mkdir -p internal/parser/vendors/VENDORNAME
```

### 2. Copy the template

```bash
cp internal/parser/vendors/template/parser.go.template \
   internal/parser/vendors/VENDORNAME/parser.go
```

### 3. Edit parser.go

- Replace `VENDORNAME` with the vendor identifier (e.g. `supermicro`).
- Replace `VENDOR_DESCRIPTION` with a description (e.g. `Supermicro`).
- Implement the `Detect()` method to recognize the format.
- Implement the `Parse()` method to parse the data.

### 4. Register the module

Add an import to `vendors/vendors.go`:

```go
import (
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/inspur"
	_ "git.mchus.pro/mchus/logpile/internal/parser/vendors/VENDORNAME" // new module
)
```

### 5. Done!

The module registers itself automatically at application startup via `init()`.

## VendorParser interface

```go
type VendorParser interface {
	// Name returns a human-readable parser name
	Name() string

	// Vendor returns the vendor identifier
	Vendor() string

	// Detect checks whether this parser fits the given files.
	// Returns a confidence of 0-100 (0 = no match, 100 = definitely this format)
	Detect(files []ExtractedFile) int

	// Parse parses the extracted files
	Parse(files []ExtractedFile) (*models.AnalysisResult, error)
}
```

## Tips for implementing Detect()

- Look for files/directories that are unique to the vendor.
- Check file contents for characteristic markers.
- Return a high confidence (70+) only on a confident match.
- Several parsers may return >0; the one with the highest confidence is selected.

## Supported vendors

| Vendor | Identifier | Status | Tested on |
|--------|------------|--------|-----------|
| Inspur/Kaytus | `inspur` | ✅ Ready | KR4268X2 (onekeylog) |
| Supermicro | `supermicro` | ⏳ Planned | - |
| Dell iDRAC | `dell` | ⏳ Planned | - |
| HPE iLO | `hpe` | ⏳ Planned | - |
| Lenovo XCC | `lenovo` | ⏳ Planned | - |
72
internal/parser/vendors/generic/README.md
vendored
@@ -1,72 +0,0 @@

# Generic Text File Parser

Fallback parser for text files that no other parser recognizes.

## Purpose

This parser handles any text file that:
- is not a vendor-specific archive;
- contains textual information (not binary data);
- is a single `.gz` file or a plain text file.

## Priority

**Confidence score: 15** (low priority)

This parser is used only if no other parser matched with a higher confidence.

## Supported files

### Automatically recognized types

1. **NVIDIA Bug Report** (`nvidia-bug-report-*.log.gz`)
   - Extracts NVIDIA driver information
   - Finds GPU devices
   - Shows the driver version

2. **Any text file**
   - Checks that the content is text (not binary data)
   - Shows basic information about the file

## Extracted data

### Events

- **Text File**: basic information about the uploaded file
- **Driver Info**: NVIDIA driver information (for nvidia-bug-report)
- **GPU Device**: detected GPU devices (for nvidia-bug-report)

## Usage example

```bash
# Run with an nvidia-bug-report
./logpile --file nvidia-bug-report-*.log.gz

# Run with any text file
./logpile --file system.log.gz
```

## Versioning

**Current parser version:** 1.0.0

## Limitations

1. This parser provides only basic information.
2. It does not perform deep content analysis.
3. For detailed analysis of specific logs, creating a dedicated parser is recommended.

## Extending

To add support for a new file type:

1. Add a check to the `Parse()` function.
2. Create a `parseXXX()` function that extracts the type-specific information.
3. Bump the parser version.

Example:
```go
if strings.Contains(strings.ToLower(file.Path), "custom-log") {
	parseCustomLog(content, result)
}
```
175
internal/parser/vendors/nvidia/README.md
vendored
@@ -1,175 +0,0 @@

# NVIDIA Field Diagnostics Parser

Parser for NVIDIA HGX Field Diagnostics archives.
It is a universal parser that is not tied to a specific server vendor.

## Supported archives

- NVIDIA HGX Field Diag (works with any server: Supermicro, Dell, HPE, etc.)
- Archives with NVIDIA GPU diagnostic results

## Archive format

The parser works with archives in the following formats:
- `.tar` (uncompressed tar)
- `.tar.gz` (gzip-compressed)

## Recognized files

### Primary files

1. **output.log**: dmidecode output with system information
   - Server manufacturer (Manufacturer)
   - Server model (Product Name), e.g. SYS-821GE-TNHR
   - Server serial number (Serial Number), e.g. A514359X5A07900
   - UUID, SKU Number, Family

2. **unified_summary.json**: detailed information about the system and its components
   - GPU information (model, manufacturer, VBIOS, PCI addresses)
   - NVSwitch information (VendorID, DeviceID, link speed/width)
   - Server manufacturer and model

3. **summary.json**: diagnostic test results
   - GPU test results (inforom, checkinforom, gpumem, gpustress, pcie, nvlink, nvswitch, power)
   - Error codes and test statuses

4. **summary.csv**: an alternative format of the test results

### Additional files

- `gpu_fieldiag/*.log`: detailed diagnostic logs for each GPU
- `inventory/*.json`: additional configuration information

## Extracted data

### Hardware Configuration

#### GPUs
```json
{
  "slot": "GPUSXM1",
  "model": "NVIDIA Device 2335",
  "manufacturer": "NVIDIA Corporation",
  "firmware": "96.00.D0.00.03",
  "bdf": "0000:3a:00.0"
}
```

#### NVSwitch (as PCIe devices)
```json
{
  "slot": "NVSWITCHNVSWITCH0",
  "device_class": "NVSwitch",
  "manufacturer": "NVIDIA Corporation",
  "vendor_id": 4318,
  "device_id": 8867,
  "bdf": "0000:05:00.0",
  "link_speed": "16GT/s",
  "link_width": 2
}
```

### Events

Events are created for:
- **Warnings and errors** from diagnostic tests
- Example events:
  - `Row remapping failed`: GPU memory error (Warning)
  - Various tests: connectivity, gpumem, gpustress, pcie, nvlink, nvswitch, power

Severity levels:
- `info`: informational events (tests passed)
- `warning`: warnings (e.g. Row remapping failed)
- `critical`: critical errors (error codes 300+)

## Usage example

```bash
# Launch the web interface
./logpile --file /path/to/A514359X5A07900_logs-20260122-074208.tar

# The web interface will be available at http://localhost:8082
```

## Auto-detection

The parser automatically detects NVIDIA Field Diag archives by the presence of:
- `unified_summary.json` with the "HGX Field Diag" marker
- `summary.json` and `summary.csv` with test results
- the `gpu_fieldiag/` directory

Confidence score:
- `unified_summary.json` with the "HGX Field Diag" marker: +40
- `summary.json`: +20
- `summary.csv`: +15
- `gpu_fieldiag/` directory: +15

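Summing the weights above gives a maximum of 40 + 20 + 15 + 15 = 90. The sketch below implements that additive scoring. It assumes each signal counts once per archive and uses simple path matching: a base-name comparison for the summary files and a `gpu_fieldiag/` path segment for the directory. The `file` struct and `fieldDiagScore` function are hypothetical, and the real detector may inspect paths differently.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// file is a simplified stand-in for an extracted archive entry.
type file struct {
	Path    string
	Content string
}

// fieldDiagScore adds the documented weight for each signal found (once each).
func fieldDiagScore(files []file) int {
	var unified, summaryJSON, summaryCSV, fieldiagDir bool
	for _, f := range files {
		switch path.Base(f.Path) {
		case "unified_summary.json":
			unified = unified || strings.Contains(f.Content, "HGX Field Diag")
		case "summary.json":
			summaryJSON = true
		case "summary.csv":
			summaryCSV = true
		}
		if strings.HasPrefix(f.Path, "gpu_fieldiag/") || strings.Contains(f.Path, "/gpu_fieldiag/") {
			fieldiagDir = true
		}
	}
	score := 0
	if unified {
		score += 40
	}
	if summaryJSON {
		score += 20
	}
	if summaryCSV {
		score += 15
	}
	if fieldiagDir {
		score += 15
	}
	return score
}

func main() {
	archive := []file{
		{"logs/unified_summary.json", `{"runInfo":{"diagName":"HGX Field Diag"}}`},
		{"logs/summary.json", "[]"},
		{"logs/summary.csv", ""},
		{"logs/gpu_fieldiag/gpu0.log", ""},
	}
	fmt.Println(fieldDiagScore(archive)) // 90
}
```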
## Versioning

**Current parser version:** 1.1.0

When changing the parser logic, bump the version in the `parserVersion` constant in `parser.go`.

### Version history

- **1.1.0**: added output.log (dmidecode) parsing to extract the server model and serial number
- **1.0.0**: initial version, parsing unified_summary.json and summary.json/csv

## Data examples

### unified_summary.json example
```json
{
  "runInfo": {
    "diagVersion": "24287-XXXX-FLD-42658",
    "diagName": "HGX Field Diag",
    "finalResult": "FAIL",
    "errorCode": 363
  },
  "tests": [{
    "virtualId": "inventory",
    "components": [{
      "componentId": "GPUSXM1",
      "properties": [
        {"id": "Manufacturer", "value": "Any Server Vendor"},
        {"id": "VendorID", "value": "10de"},
        {"id": "DeviceID", "value": "2335"}
      ]
    }]
  }]
}
```

### summary.json example
```json
[
  {
    "Error Code": "005-000-1-000000000363",
    "Test": "gpumem",
    "Component ID": "SXM5_SN_1653925025497",
    "Notes": "Row remapping failed",
    "Virtual ID": "gpumem"
  }
]
```

## Known limitations

1. The parser focuses on data from `unified_summary.json` and `summary.json`.
2. Detailed logs from `gpu_fieldiag/*.log` are not parsed yet.
3. CPU, memory, and disk information is not extracted (it is absent from the archive).

## Development

### Adding new fields

1. Study the JSON structure in the archive.
2. Add fields to the `Component` or `Property` structs.
3. Update the `parseGPUComponent` or `parseNVSwitchComponent` functions.
4. Bump the parser version.

### Adding new file types

1. Create a new parser file (e.g. `gpu_logs.go`).
2. Add the parsing call to the `Parse()` function in `parser.go`.
3. Update the documentation.
275
internal/parser/vendors/nvidia_bug_report/README.md
vendored
@@ -1,275 +0,0 @@

# NVIDIA Bug Report Parser

Parser for nvidia-bug-report files generated by the `nvidia-bug-report.sh` script.

## Purpose

This parser processes NVIDIA driver diagnostic logs and extracts:
- memory module information (from dmidecode)
- GPU device information
- the NVIDIA driver version

## File format

- File name: `nvidia-bug-report-*.log.gz`
- Format: gzip-compressed text file
- Generated by: the `nvidia-bug-report.sh` script

## Confidence Score

**85**: high priority for nvidia-bug-report files

## Extracted data

### 1. System Information (from dmidecode)

Server information:
- **Serial Number**: server serial number (e.g. 2KD501412)
- **UUID**: unique system identifier (e.g. 2e4054bc-1dd2-11b2-0284-6b0a21737950)
- **Manufacturer**: server manufacturer
- **Product Name**: server model
- **Version**: system version

### 2. CPU Information (from dmidecode)

Extracted for each processor:
- **Model**: processor model (e.g. Intel(R) Xeon(R) Platinum 8480+)
- **Serial Number**: serial number (e.g. 5DB0D6C0DD30ABD8)
- **Core Count**: number of cores (e.g. 56)
- **Thread Count**: number of threads (e.g. 112)
- **Max Speed**: maximum frequency (e.g. 3800 MHz)
- **Current Speed**: current frequency (e.g. 2000 MHz)

Example:
```
Socket 0: Intel(R) Xeon(R) Platinum 8480+
  Serial Number: 5DB0D6C0DD30ABD8
  Cores: 56, Threads: 112
  Frequency: 2000 MHz (Max: 3800 MHz)
```

### 3. Memory Modules (from dmidecode)

Extracted for each memory module:
- **Slot/Location**: e.g. CPU0_C0D0
- **Size**: size in GB (e.g. 64 GB)
- **Type**: memory type (DDR5, DDR4, etc.)
- **Manufacturer**: manufacturer (Hynix, Samsung, Micron, etc.)
- **Part Number**: module P/N (e.g. HMCG94AGBRA179N)
- **Serial Number**: module S/N (e.g. 80AD0224322B3834E6)
- **Speed**: max/current speed (e.g. 5600/4400 MHz)
- **Ranks**: number of ranks

Example:
```
Slot: CPU0_C0D0
Size: 64 GB
Type: DDR5
Manufacturer: Hynix
Part Number: HMCG94AGBRA179N
Serial Number: 80AD0224322B3834E6
Speed: 5600 MT/s (configured: 4400 MT/s)
Ranks: 2
```

### 4. Power Supplies (from dmidecode)

Extracted for each power supply:
- **Location**: position (e.g. PSU0, PSU1)
- **Manufacturer**: manufacturer (e.g. DELTA, Great Wall)
- **Model Part Number**: PSU model (e.g. V0310DT000000000)
- **Serial Number**: serial number (e.g. DGPLV251500LZ)
- **Max Power Capacity**: maximum power (e.g. 2700 W)
- **Revision**: firmware version (e.g. 00.01.04)
- **Status**: status (e.g. Present, OK)

Example:
```
PSU0: V0310DT000000000 (DELTA)
  Serial Number: DGPLV251500LZ
  Power: 2700 W, Revision: 00.01.04
  Status: Present, OK
```

### 5. Network Adapters (from lspci)

Extracted for each network adapter (Ethernet, Network, InfiniBand):
- **Model**: full model name from VPD (e.g. "NVIDIA ConnectX-7 HHHL Adapter card, 400GbE / NDR IB (default mode), Single-port OSFP, PCIe 5.0 x16")
- **Location**: PCI BDF address (e.g. 0000:0e:00.0)
- **Slot**: physical slot (e.g. 108)
- **Part Number**: adapter P/N (e.g. MCX75310AAS-NEAT)
- **Serial Number**: adapter S/N (e.g. MT2430600249)
- **Vendor**: manufacturer (Mellanox, NVIDIA)
- **Vendor ID / Device ID**: PCI identifiers (e.g. 15b3:1021)
- **Port Count**: number of ports (derived from the model: Dual-port = 2, Single-port = 1)
- **Port Type**: port type (QSFP56, OSFP, SFP+)

Example:
```
0000:0e:00.0: NVIDIA ConnectX-7 HHHL Adapter card, 400GbE / NDR IB (default mode), Single-port OSFP
  Slot: 108
  P/N: MCX75310AAS-NEAT
  S/N: MT2430600249
  Ports: 1 x OSFP
```

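Port count and type are derived from the VPD model string. The sketch below shows that derivation, assuming plain substring matching. `Dual-port` gives 2 and `Single-port` gives 1. The first connector name found from the list above becomes the port type. `portInfo` and `connectorTypes` are illustrative names. Unknown strings yield zero values rather than a guess.

```go
package main

import (
	"fmt"
	"strings"
)

// connectorTypes lists the port types named in this README, checked in order.
var connectorTypes = []string{"QSFP56", "OSFP", "SFP+"}

// portInfo derives (count, type) from a VPD model string; zero values mean unknown.
func portInfo(model string) (int, string) {
	count := 0
	switch {
	case strings.Contains(model, "Dual-port"):
		count = 2
	case strings.Contains(model, "Single-port"):
		count = 1
	}
	for _, t := range connectorTypes {
		if strings.Contains(model, t) {
			return count, t
		}
	}
	return count, ""
}

func main() {
	n, t := portInfo("NVIDIA ConnectX-7 HHHL Adapter card, 400GbE / NDR IB (default mode), Single-port OSFP, PCIe 5.0 x16")
	fmt.Printf("%d x %s\n", n, t) // 1 x OSFP
	n, t = portInfo("ConnectX-6 Dx EN adapter card, 100GbE, Dual-port QSFP56")
	fmt.Printf("%d x %s\n", n, t) // 2 x QSFP56
}
```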
### 6. GPU Devices

Extracted for each GPU:
- **Model**: GPU model (e.g. NVIDIA H100 80GB HBM3)
- **BDF (Bus:Device.Function)**: PCI address (e.g. 0000:0f:00.0)
- **UUID**: unique GPU identifier (e.g. GPU-64674e47-e036-c12a-3e8d-55a2a9ac8db3)
- **Video BIOS**: video BIOS version (e.g. 96.00.99.00.01)
- **IRQ**: interrupt (e.g. 17)
- **Bus Type**: bus type (PCIe)
- **DMA Size**: DMA size (e.g. 52 bits)
- **DMA Mask**: DMA mask (e.g. 0xfffffffffffff)
- **Device Minor**: device minor number (e.g. 0)
- **Manufacturer**: NVIDIA

Example:
```
0000:0f:00.0: NVIDIA H100 80GB HBM3
  UUID: GPU-64674e47-e036-c12a-3e8d-55a2a9ac8db3
  Video BIOS: 96.00.99.00.01
  IRQ: 17
```

### 7. Events

- **Memory Configuration**: summary of the memory modules (count, manufacturers, total size)
- **GPU Detection**: detected GPU devices
- **Driver Version**: NVIDIA driver version

## Usage example

```bash
# Run with an nvidia-bug-report file
./logpile --file nvidia-bug-report-2KD501412.log.gz

# The web interface will be available at http://localhost:8082
```

## Sample output

```
✓ Detected vendor: NVIDIA Bug Report Parser
✓ CPUs: 2
✓ Memory: 32 modules
✓ Power Supplies: 8
✓ GPUs: 8
✓ Network Adapters: 12

System Information:
  Serial Number: 2KD501412
  UUID: 2e4054bc-1dd2-11b2-0284-6b0a21737950
  Version: 0

CPU Information:
  Socket 0: Intel(R) Xeon(R) Platinum 8480+
    S/N: 5DB0D6C0DD30ABD8, Cores: 56, Threads: 112
  Socket 1: Intel(R) Xeon(R) Platinum 8480+
    S/N: 5DB017C05685B3ED, Cores: 56, Threads: 112

Power Supplies:
  PSU0: V0310DT000000000 (DELTA)
    S/N: DGPLV251500LZ
    Power: 2700 W, Revision: 00.01.04
    Status: Present, OK
  PSU1: V0310DT000000000 (DELTA)
    S/N: DGPLV251500GY
    Power: 2700 W, Revision: 00.01.04
    Status: Present, OK
  [... 6 more PSUs ...]

Memory Modules:
  CPU0_C0D0: 64 GB, Hynix
    P/N: HMCG94AGBRA179N, S/N: 80AD0224322B3834E6
    Type: DDR5, Speed: 4400/5600 MHz
  [... 31 more modules ...]

Network Adapters: 12 devices
  0000:0e:00.0: NVIDIA ConnectX-7 HHHL Adapter card, 400GbE / NDR IB (default mode), Single-port OSFP
    Slot: 108
    P/N: MCX75310AAS-NEAT
    S/N: MT2430600249
    Ports: 1 x OSFP
  0000:1f:00.0: ConnectX-6 Dx EN adapter card, 100GbE, Dual-port QSFP56
    Slot: 12
    P/N: MCX623106AN-CDAT
    S/N: MT2434J00PCD
    Ports: 2 x QSFP56
  [... 10 more adapters ...]

GPUs: 8 devices
  0000:0f:00.0: NVIDIA H100 80GB HBM3
    UUID: GPU-64674e47-e036-c12a-3e8d-55a2a9ac8db3
    Video BIOS: 96.00.99.00.01
    IRQ: 17
  0000:34:00.0: NVIDIA H100 80GB HBM3
    UUID: GPU-fa796345-c23a-54aa-1b67-709ac2542852
    Video BIOS: 96.00.99.00.01
    IRQ: 16
  [... 6 more GPUs ...]
```

## Versioning

**Current parser version:** 1.0.0

### Version history

- **1.0.0**: initial version, parsing System Info, CPU, Memory, PSU, GPU, Network Adapters, and Driver

## Data structure

The parser uses the following sections of the bug report:
1. **dmidecode output (System Information)**: server information
2. **dmidecode output (Processor Information)**: CPU information
3. **dmidecode output (Memory Device)**: memory information
4. **dmidecode output (System Power Supply)**: power supply information
5. **lspci -vvv output (Ethernet/Network/Infiniband controller)**: network adapter information
6. **lspci VPD (Vital Product Data)**: P/N, S/N, and model of the network adapters
7. **/proc/driver/nvidia/gpus/.../information**: detailed GPU information
8. **NVRM version**: driver version

## Known limitations

1. Errors and warnings from the logs are not extracted yet.
2. Some GPU characteristics (temperature, utilization) are not parsed.
3. GPU performance information and metrics require parsing other sections.

## Extending

Possible additions:

1. **Driver errors**: parse the sections that contain NVIDIA driver errors.
2. **nvidia-smi output**: extract detailed information from nvidia-smi output (temperature, utilization).
3. **GPU performance**: parse GPU performance and memory usage metrics.
4. **PCIe information**: extract PCIe configuration details (link speed, width).

## Example file structure

```
Start of NVIDIA bug report log file
nvidia-bug-report.sh Version: 34275561
Date: Thu Jul 17 18:18:18 EDT 2025

[... system info ...]

Memory Device
	Data Width: 64 bits
	Size: 64 GB
	Form Factor: DIMM
	Locator: CPU0_C0D0
	Type: DDR5
	Speed: 5600 MT/s
	Manufacturer: Hynix
	Serial Number: 80AD0224322B3834E6
	Part Number: HMCG94AGBRA179N

[... more memory modules ...]

*** /proc/driver/nvidia/./gpus/0000:0f:00.0/power
[... GPU info ...]
```
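The memory fields listed under "Memory Modules" come from `Memory Device` blocks like the one above. The sketch below turns one such block into key/value pairs. It splits each indented `Key: Value` line on the first colon and stops at the first blank line. The `parseDMIBlock` name and the stop rule are assumptions. Real dmidecode output also contains multi-line list values, which this sketch ignores.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDMIBlock returns the Key: Value pairs of the first block that starts
// with the given header line (e.g. "Memory Device"), ending at a blank line.
func parseDMIBlock(text, header string) map[string]string {
	fields := map[string]string{}
	inBlock := false
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !inBlock {
			inBlock = line == header
			continue
		}
		if line == "" {
			break // a blank line ends the block
		}
		if key, value, ok := strings.Cut(line, ":"); ok {
			fields[strings.TrimSpace(key)] = strings.TrimSpace(value)
		}
	}
	return fields
}

const sample = `Memory Device
	Size: 64 GB
	Locator: CPU0_C0D0
	Type: DDR5
	Speed: 5600 MT/s
	Manufacturer: Hynix
	Part Number: HMCG94AGBRA179N

Memory Device
	Locator: CPU0_C1D0
`

func main() {
	m := parseDMIBlock(sample, "Memory Device")
	fmt.Println(m["Locator"], m["Size"], m["Type"], m["Part Number"])
	// CPU0_C0D0 64 GB DDR5 HMCG94AGBRA179N
}
```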
133
internal/parser/vendors/supermicro/README.md
vendored
@@ -1,133 +0,0 @@

# SMC Crash Dump Parser
|
||||
|
||||
Парсер для архивов Supermicro (SMC) BMC Crash Dump.
|
||||
|
||||
## Поддерживаемые серверы
|
||||
|
||||
- Supermicro SYS-821GE-TNHR
|
||||
- Другие серверы Supermicro с BMC Crashdump функциональностью
|
||||
|
||||
## Формат архива
|
||||
|
||||
Парсер работает с архивами в формате:
|
||||
- `.tgz` / `.tar.gz` (сжатый tar)
|
||||
- `.tar` (несжатый tar)
|
||||

## Recognized Files

### Primary Files

1. **CDump.txt** - JSON file containing the crashdump data
   - Metadata (BMC, BIOS, and ME firmware versions)
   - CPU information (CPUID, core count, microcode version, PPIN)
   - MCA (Machine Check Architecture) data: processor errors

## Extracted Data

### Hardware Configuration

#### CPUs

```json
{
  "slot": "CPU0",
  "model": "CPUID: 0xc06f2",
  "cores": 56,
  "manufacturer": "Intel",
  "firmware": "Microcode: 0x210002b3"
}
```

### FRU Information

- BMC Firmware Version
- BIOS Version
- ME Firmware Version
- CPU PPIN (Protected Processor Inventory Number)

### Events

Events are generated for:
- **Crashdump collection** - when a crashdump was collected
- **MCA Errors** - Machine Check Architecture errors
  - Corrected errors (Warning severity)
  - Uncorrected errors (Critical severity)

Severity levels:
- `info` - informational events (on-demand crashdump)
- `warning` - warnings (corrected MCA errors, reset detected)
- `critical` - critical errors (uncorrected MCA errors)

## Usage Example

```bash
# Launch the web interface
./logpile --file /path/to/CDump_090859_01302026.tgz

# The web interface will be available at http://localhost:8082
```

## Auto-detection

The parser automatically identifies SMC Crash Dump archives by the presence of:
- `CDump.txt` containing the markers "crash_data", "METADATA", "bmc_fw_ver"

Confidence score:
- `CDump.txt` with crashdump markers: +80
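The detection rule above could look roughly like the following. `smcConfidence` is a hypothetical name, and requiring all three markers (rather than any one of them) before awarding the +80 is an assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cdumpMarkers are the strings the README lists for SMC crashdump detection.
var cdumpMarkers = []string{"crash_data", "METADATA", "bmc_fw_ver"}

// smcConfidence scores a set of archive files (path -> contents).
// Hypothetical sketch: assumes every marker must appear in CDump.txt.
func smcConfidence(files map[string]string) int {
	score := 0
	for name, content := range files {
		if path.Base(name) != "CDump.txt" {
			continue
		}
		matched := true
		for _, m := range cdumpMarkers {
			if !strings.Contains(content, m) {
				matched = false
				break
			}
		}
		if matched {
			score += 80
			break // award the marker bonus only once
		}
	}
	return score
}

func main() {
	files := map[string]string{
		"CDump.txt": `{"crash_data":{"METADATA":{"bmc_fw_ver":"01.03.18"}}}`,
		"other.log": "unrelated",
	}
	fmt.Println(smcConfidence(files)) // 80
}
```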

## Versioning

**Current parser version:** 1.0.0

Whenever the parser logic changes, bump the version in the `parserVersion` constant in `parser.go`.

## Data Samples

### Sample CDump.txt (metadata)

```json
{
  "crash_data": {
    "METADATA": {
      "cpu0": {
        "cpuid": "0xc06f2",
        "core_count": "0x38",
        "ppin": "0xa3ccbe7d45026592",
        "ucode_patch_ver": "0x210002b3"
      },
      "bmc_fw_ver": "01.03.18",
      "bios_id": "BIOS Date: 08/04/2025 Rev 2.7",
      "me_fw_ver": "6.1.4.204",
      "timestamp": "2026-01-30T09:06:52Z",
      "trigger_type": "On-Demand"
    }
  }
}
```
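The sample stores numbers as hex strings; `"core_count": "0x38"` is 56 in decimal, which matches `"cores": 56` in the CPU entry under Extracted Data. The sketch below shows one way to perform that mapping. `cpuMeta`, `cpu`, and `cpusFromCDump` are illustrative stand-ins, not the parser's `CPUMetadata` type, and hard-coding the manufacturer as Intel is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// cpuMeta mirrors the per-CPU object in the sample (illustrative only).
type cpuMeta struct {
	CPUID     string `json:"cpuid"`
	CoreCount string `json:"core_count"`
	PPIN      string `json:"ppin"`
	Ucode     string `json:"ucode_patch_ver"`
}

// cpu matches the shape of the "CPUs" example under Extracted Data.
type cpu struct {
	Slot         string `json:"slot"`
	Model        string `json:"model"`
	Cores        int    `json:"cores"`
	Manufacturer string `json:"manufacturer"`
	Firmware     string `json:"firmware"`
}

// cpusFromCDump extracts CPU entries from CDump.txt content.
func cpusFromCDump(raw []byte) ([]cpu, error) {
	var doc struct {
		CrashData struct {
			// METADATA mixes CPU objects with plain strings, so decode lazily.
			Metadata map[string]json.RawMessage `json:"METADATA"`
		} `json:"crash_data"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	var keys []string
	for k := range doc.CrashData.Metadata {
		if strings.HasPrefix(k, "cpu") {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // lexicographic order keeps the output deterministic
	var out []cpu
	for _, k := range keys {
		var m cpuMeta
		if err := json.Unmarshal(doc.CrashData.Metadata[k], &m); err != nil {
			return nil, fmt.Errorf("%s: %w", k, err)
		}
		// Base 0 makes ParseUint honor the "0x" prefix.
		cores, err := strconv.ParseUint(m.CoreCount, 0, 32)
		if err != nil {
			return nil, fmt.Errorf("%s core_count: %w", k, err)
		}
		out = append(out, cpu{
			Slot:         strings.ToUpper(k),
			Model:        "CPUID: " + m.CPUID,
			Cores:        int(cores),
			Manufacturer: "Intel", // assumption: crashdump CPUs are Intel
			Firmware:     "Microcode: " + m.Ucode,
		})
	}
	return out, nil
}

func main() {
	sample := []byte(`{"crash_data":{"METADATA":{"cpu0":{"cpuid":"0xc06f2","core_count":"0x38","ppin":"0xa3ccbe7d45026592","ucode_patch_ver":"0x210002b3"},"bmc_fw_ver":"01.03.18"}}}`)
	cpus, err := cpusFromCDump(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cpus[0])
}
```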

### MCA Error Detection

The parser checks the MCA status registers for errors:
- Bit 63 (Valid) - the register holds a valid error
- Bit 61 (UC) - uncorrected error
- Bit 60 (EN) - error enabled
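A sketch of the status check described above, assuming the register value arrives as a hex string like the metadata fields do. The severity mapping follows the Events section (uncorrected to critical, corrected to warning). `decodeMCAStatus` and `mcaStatus` are hypothetical names, and the sample status values are made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// Bit masks for the status flags listed above (IA32_MCi_STATUS layout).
const (
	mcaVAL uint64 = 1 << 63 // register holds a valid error
	mcaUC  uint64 = 1 << 61 // uncorrected error
	mcaEN  uint64 = 1 << 60 // error reporting enabled
)

// mcaStatus is a hypothetical decoded view of one bank's status register.
type mcaStatus struct {
	Valid, Uncorrected, Enabled bool
}

// decodeMCAStatus parses a hex string such as "0xbe00000000800400".
func decodeMCAStatus(s string) (mcaStatus, error) {
	v, err := strconv.ParseUint(s, 0, 64)
	if err != nil {
		return mcaStatus{}, err
	}
	return mcaStatus{
		Valid:       v&mcaVAL != 0,
		Uncorrected: v&mcaUC != 0,
		Enabled:     v&mcaEN != 0,
	}, nil
}

// severity maps a decoded status to the event levels from the Events section:
// uncorrected -> critical, corrected -> warning, no valid error -> "".
func (m mcaStatus) severity() string {
	switch {
	case !m.Valid:
		return ""
	case m.Uncorrected:
		return "critical"
	default:
		return "warning"
	}
}

func main() {
	for _, s := range []string{"0xbe00000000800400", "0x9c00000000000000", "0x0"} {
		st, err := decodeMCAStatus(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s valid=%t uc=%t en=%t severity=%q\n", s, st.Valid, st.Uncorrected, st.Enabled, st.severity())
	}
}
```

Checking the Valid bit first matters: banks without a logged error often still carry stale or zero data in the other bits.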

## Known Limitations

1. The parser focuses on data from `CDump.txt`.
2. MCA error analysis is currently simplified (status registers only).
3. TOR dump and other extended data are not parsed yet.

## Development

### Adding New Fields

1. Study the JSON structure in CDump.txt.
2. Add the fields to the `Metadata`, `CPUMetadata`, or `MCAData` structs.
3. Update the parsing functions.
4. Bump the parser version.

### Extending MCA Analysis

For more detailed MCA error analysis, you could:
1. Add decoding of MCA error codes.
2. Parse the MISC and ADDR registers.
3. Add error correlation across banks.
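As a starting point for items 1 and 2, the remaining architectural fields of `IA32_MCi_STATUS` can be extracted as below, following the bit layout in the Intel SDM. The ADDRV and MISCV flags indicate whether a bank's ADDR and MISC registers hold meaningful data. This is a hypothetical extension with made-up names (`mcaFields`, `decodeFields`, `faultAddress`) and sample values; the current parser does none of this.

```go
package main

import "fmt"

// mcaFields collects IA32_MCi_STATUS fields beyond VAL/UC/EN, following the
// Intel SDM bit layout. Hypothetical extension, not current parser behavior.
type mcaFields struct {
	MCACode   uint16 // bits 15:0: architectural MCA error code
	MSCode    uint16 // bits 31:16: model-specific error code
	PCC       bool   // bit 57: processor context corrupt
	AddrValid bool   // bit 58 (ADDRV): MCi_ADDR holds a valid address
	MiscValid bool   // bit 59 (MISCV): MCi_MISC holds valid data
	Overflow  bool   // bit 62 (OVER): an earlier error was lost
}

// bit reports whether bit n of v is set.
func bit(v uint64, n uint) bool { return v&(1<<n) != 0 }

func decodeFields(status uint64) mcaFields {
	return mcaFields{
		MCACode:   uint16(status & 0xffff),
		MSCode:    uint16((status >> 16) & 0xffff),
		PCC:       bit(status, 57),
		AddrValid: bit(status, 58),
		MiscValid: bit(status, 59),
		Overflow:  bit(status, 62),
	}
}

// faultAddress returns the bank's MCi_ADDR value only when ADDRV marks it as
// meaningful; otherwise the register contents should be ignored.
func faultAddress(status, addr uint64) (uint64, bool) {
	if !decodeFields(status).AddrValid {
		return 0, false
	}
	return addr, true
}

func main() {
	f := decodeFields(0xbe00000000800400)
	fmt.Printf("%+v\n", f)
	if a, ok := faultAddress(0xbe00000000800400, 0x1234f000); ok {
		fmt.Printf("fault address: %#x\n", a)
	}
}
```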
46
internal/parser/vendors/xigmanas/README.md
vendored
@@ -1,46 +0,0 @@
# Xigmanas Parser

Parser for Xigmanas (FreeBSD-based NAS) system logs.

## Supported Files

- `xigmanas` - Main system log file with configuration and status information
- `dmesg` - Kernel messages and hardware initialization information
- SMART data from disk monitoring

## Features

This parser extracts the following information from Xigmanas logs:

### System Information
- Firmware version
- System uptime
- CPU model and specifications
- Memory configuration
- Hardware platform information

### Storage Information
- Disk models and serial numbers
- Disk capacity and health status
- SMART temperature readings

### Hardware Configuration
- CPU information
- Memory modules
- Storage devices

## Detection Logic

The parser detects the Xigmanas format by looking for:
- Files with "xigmanas", "system", or "dmesg" in their names
- Content containing the "XigmaNAS" or "FreeBSD" strings
- SMART-related information in the log content
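A rough sketch of these detection hints applied to a single file. `looksLikeXigmanas` is a hypothetical name; requiring both a file-name hint and a content hint, and treating the substring `SMART` as "SMART-related information", are assumptions rather than the parser's documented rules.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// looksLikeXigmanas applies the detection hints listed above to one file.
// Assumption: a name hint AND a content hint must both be present.
func looksLikeXigmanas(name, content string) bool {
	base := strings.ToLower(path.Base(name))
	nameHint := false
	for _, h := range []string{"xigmanas", "system", "dmesg"} {
		if strings.Contains(base, h) {
			nameHint = true
			break
		}
	}
	// "SMART" is a crude stand-in for SMART-related log content.
	contentHint := strings.Contains(content, "XigmaNAS") ||
		strings.Contains(content, "FreeBSD") ||
		strings.Contains(content, "SMART")
	return nameHint && contentHint
}

func main() {
	fmt.Println(looksLikeXigmanas("logs/dmesg", "FreeBSD 13.2-RELEASE")) // true
	fmt.Println(looksLikeXigmanas("notes.txt", "FreeBSD"))               // false
}
```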

## Example Output

The parser populates the following fields in `AnalysisResult`:
- `Hardware.Firmware` - Firmware versions
- `Hardware.CPUs` - CPU information
- `Hardware.Memory` - Memory configuration
- `Hardware.Storage` - Storage devices with SMART data
- `Sensors` - Temperature readings from SMART data