Files
bee/bible-local/README.md
Mikhail Chusavitin 0b8a2ff83f Add validate test matrix and GPU test methodology docs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 10:47:08 +03:00

1.8 KiB

bee — Project Bible

Project-specific architecture, decisions, and runtime contracts. Generic engineering rules live in bible/rules/patterns/.

Files

File Contents
architecture/system-overview.md What bee does, scope, tech stack
architecture/runtime-flows.md Boot sequence, audit flow, service order
docs/customer-gpu-test-methodology.md Customer-facing GPU PCIe Validate / Validate -> Stress test list
docs/hardware-ingest-contract.md Current Reanimator hardware ingest JSON contract
docs/validate-vs-burn.md Validate and Validate -> Stress hardware test policy
decisions/ Architectural decision log, including read-only submodule policy

Validate Test Matrix

Validate

  • CPU check
    • lscpu
    • sensors
    • stress-ng
  • Memory check
    • free
    • timeout <timeout_sec> memtester
    • free
  • NVMe storage check
    • nvme id-ctrl
    • nvme smart-log
    • nvme device-self-test
  • SATA/SAS storage check
    • smartctl -H -A
    • smartctl -t short
  • Basic NVIDIA GPU check
    • nvidia-smi -pm 1
    • nvidia-smi -q
    • dmidecode -t baseboard
    • dmidecode -t system
    • dcgmi diag -r 2
  • Inter-GPU communication check
    • all_reduce_perf
  • GPU bandwidth check
    • dcgmi diag -r nvbandwidth

Validate -> Stress

  • Extended NVIDIA GPU check
    • nvidia-smi -pm 1
    • nvidia-smi -q
    • dmidecode -t baseboard
    • dmidecode -t system
    • dcgmi diag -r 3
  • NVIDIA targeted stress
    • nvidia-smi -pm 1
    • nvidia-smi -q
    • dcgmi diag -r targeted_stress
  • NVIDIA targeted power
    • nvidia-smi -pm 1
    • nvidia-smi -q
    • dcgmi diag -r targeted_power
  • NVIDIA pulse test
    • nvidia-smi -pm 1
    • nvidia-smi -q
    • dcgmi diag -r pulse_test
  • Inter-GPU communication check
    • all_reduce_perf
  • GPU bandwidth check
    • dcgmi diag -r nvbandwidth