reanimator/bee - bee - MCHUS git PRO

Author	SHA1	Message	Date
Mikhail Chusavitin	85d1acdaa3	Split validate/stress into separate fixed-mode pages - Check (2): validate mode only — no mode switcher, no stress-only cards (nvidia-targeted-stress, nvidia-targeted-power, nvidia-pulse hidden) - Load (3): stress mode only — no mode switcher, all cards shown - satStressMode() hardcoded per page; satModeChanged() removed - Profile card with radio buttons removed from both pages - Replaced with simple Run All button + est. time Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-18 19:12:17 +03:00
Mikhail Chusavitin	a2d7513153	Restructure nav to Load/Burn/Benchmark; fix SAA acpidump dependency - Nav steps 3-5: Load (validate), Burn (burn-in), Benchmark (speed+endurance merged) - /load now renders validate mode; /burn renders burn-in; /benchmark replaces /speed+/endurance - Legacy redirects updated: /validate→/load, /burn-in→/burn, /speed+/endurance→/benchmark - Add acpica_bin/acpidump from SAA 1.5.0 package; required by saa GetDmiInfo (ExitCode 8) - build.sh copies acpica_bin/acpidump to /usr/local/bin/acpica_bin/ alongside saa Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-18 19:07:51 +03:00
Mikhail Chusavitin	271dadda03	Restructure web UI navigation into 7 numbered workflow stages Replace the flat menu (Dashboard, Audit, Validate, Burn, Benchmark, Tasks, Tools) with a numbered progression that guides engineers through a logical acceptance workflow: Dashboard (landing) → 1. Audit → 2. Check → 3. Load → 4. Speed → 5. Endurance → 6. Tools → 7. Settings Key changes: - layout.go: numbered nav labels, new hrefs, Tasks removed from nav and replaced with a persistent sidebar badge (polls /api/tasks every 5 s, highlights amber when tasks are active) - server.go: 301 redirects from /validate→/check, /burn→/load, /benchmark→/speed for backward compatibility - pages.go: dispatch cases for all new routes; old routes kept as fallbacks - page_validate.go: add renderCheck() — non-destructive check page with validate-mode tests only (no stress toggle, no targeted-stress/ targeted-power/pulse cards) - page_burn.go: add renderLoad() wrapper; update scope alert to reference /check instead of /validate - page_benchmark.go: add renderSpeed() (performance focus) and renderEndurance() (stability/overnight focus) wrappers - page_settings.go: new Settings page with blackbox logging toggle, NVIDIA driver reset, and build info - server_test.go: update five tests to use new route names and content expectations Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-18 11:00:02 +03:00
Michael Chus	7d2e904d14	Bring codebase into compliance with bible contracts (A–E) A (hardware-ingest-json v2.8-2.9): remove sensor location fields from schema and collector; tag HardwareMemory.Location as json:"-"; add PlatformConfig to HardwareSnapshot. B (no-hardcoded-vendors): consolidate PCI vendor IDs into collector/pci_vendors.go; replace all vendor-name string checks in isGPUDevice, isNVIDIADevice, isMellanoxDevice, isAMDGPUDevice, matchesGPUVendor (sat_overlay), and validateIsVendorGPU (page_validate) with numeric vendor_id comparisons. C (module-structure): split app/app.go (1413 lines) into app.go + app_format.go, app_network.go, app_services.go, app_packs.go, app_install.go — no logic changes. D (go-code-style): wrap bare return err in interfaceAdminState and interfaceIPv4Addrs (platform/network.go) with fmt.Errorf context including the interface name. E (go-project-bible): add bible-local/architecture/data-model.md and bible-local/architecture/api-surface.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-13 14:32:08 +03:00
Michael Chus	e169a7722c	Fix NVMe SMART status always Unknown; fix GPU count including NVSwitches nvme-cli emits smart-log counters as JSON strings and uses field names avail_spare / percent_used instead of the prose names in the NVMe spec. The nvmeSmartLog struct had int64 fields with wrong JSON tags — Unmarshal returned an error and the whole health block was skipped, leaving every NVMe drive with status=Unknown. Fix: switch all numeric fields to jsonInt64 (already used for lsblk block sizes) which accepts both bare numbers and quoted strings, and correct the avail_spare / percent_used tag names. Also fix validateIsVendorGPU for NVIDIA: previously counted any NVIDIA PCIe device (including NVSwitch bridges) as a GPU, producing wrong estimates (12 instead of 8 on an HGX H100 system). Now requires device_class to be videocontroller or processingaccelerator, matching the existing AMD filter logic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-04 18:06:32 +03:00
Mikhail Chusavitin	679aeb9947	Run NVIDIA DCGM diag tests on all selected GPUs simultaneously targeted_stress, targeted_power, and the Level 2/3 diag were dispatched one GPU at a time from the UI, turning a single dcgmi command into 8 sequential ~350–450 s runs. DCGM supports -i with a comma-separated list of GPU indices and runs the diagnostic on all of them in parallel. Move nvidia, nvidia-targeted-stress, nvidia-targeted-power into nvidiaAllGPUTargets so expandSATTarget passes all selected indices in one API call. Simplify runNvidiaValidateSet to match runNvidiaFabricValidate. Update sat.go constants and page_validate.go estimates to reflect all-GPU simultaneous execution (remove n× multiplier from total time estimates). Stress test on 8-GPU system: ~5.3 h → ~2.5 h. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-20 11:53:25 +03:00
Michael Chus	65bcc9ce81	refactor(webui): split pages into task modules	2026-04-20 06:56:52 +03:00

7 Commits