Commit Graph

7 Commits

Author SHA1 Message Date
Mikhail Chusavitin
17ff3811f8 fix(webui): improve tasks logs and ordering 2026-04-02 13:43:59 +03:00
Mikhail Chusavitin
1f750d3edd fix(webui): prevent orphaned workers on restart, reduce metrics polling, add Kill Workers button
- tasks: mark TaskRunning tasks as TaskFailed on bee-web restart instead of
  re-queueing them — prevents duplicate gpu-burn-worker spawns when bee-web
  crashes mid-test (each restart was launching a new set of 8 workers on top
  of still-alive orphans from the previous crash)
- server: reduce metrics collector interval 1s→5s, grow ring buffer to 360
  samples (30 min); cuts nvidia-smi/ipmitool/sensors subprocess rate by 5×
- platform: add KillTestWorkers() — scans /proc and SIGKILLs bee-gpu-burn,
  stress-ng, stressapptest, memtester without relying on pkill/killall
- webui: add "Kill Workers" button next to Cancel All; calls
  POST /api/tasks/kill-workers which cancels the task queue then kills
  orphaned OS-level processes; shows toast with killed count
- metricsdb: sort GPU indices and fan/temp names after map iteration to fix
  non-deterministic sample reconstruction order (flaky test)
- server: fix chartYAxisNumber to use one decimal place for 1000–9999
  (e.g. "1,7к" instead of "2к") so Y-axis ticks are distinguishable

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 10:13:43 +03:00
c2aecc6ce9 Fix fan chart gaps and task durations 2026-04-01 22:36:11 +03:00
Mikhail Chusavitin
c394845b34 refactor(webui): queue install and bundle tasks - v3.18 2026-04-01 08:46:46 +03:00
Mikhail Chusavitin
b5b34983f1 fix(webui): repair audit actions and CPU burn flow - v3.15 2026-04-01 08:19:11 +03:00
45221d1e9a fix(stress): label loaders and improve john opencl diagnostics 2026-04-01 07:31:52 +03:00
59a1d4b209 release: v3.1 2026-03-28 22:51:36 +03:00