Michael Chus 7d2e904d14 Bring codebase into compliance with bible contracts (A–E)
A (hardware-ingest-json v2.8-2.9): remove sensor location fields from schema
and collector; tag HardwareMemory.Location as json:"-"; add PlatformConfig to
HardwareSnapshot.

B (no-hardcoded-vendors): consolidate PCI vendor IDs into collector/pci_vendors.go;
replace all vendor-name string checks in isGPUDevice, isNVIDIADevice, isMellanoxDevice,
isAMDGPUDevice, matchesGPUVendor (sat_overlay), and validateIsVendorGPU (page_validate)
with numeric vendor_id comparisons.

C (module-structure): split app/app.go (1413 lines) into app.go + app_format.go,
app_network.go, app_services.go, app_packs.go, app_install.go — no logic changes.

D (go-code-style): wrap bare return err in interfaceAdminState and
interfaceIPv4Addrs (platform/network.go) with fmt.Errorf context including
the interface name.

E (go-project-bible): add bible-local/architecture/data-model.md and
bible-local/architecture/api-surface.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-13 14:32:08 +03:00

bee — Project Bible

Project-specific architecture, decisions, and runtime contracts. Generic engineering rules live in bible/rules/patterns/.

Files

File                                     Contents
architecture/system-overview.md          What bee does, scope, and tech stack
architecture/runtime-flows.md            Boot sequence, audit flow, and service order
docs/customer-gpu-test-methodology.md    Customer-facing GPU PCIe Validate / Validate -> Stress test list
docs/hardware-ingest-contract.md         Current Reanimator hardware ingest JSON contract
docs/validate-vs-burn.md                 Hardware test policy for Validate and Validate -> Stress
decisions/                               Architectural decision log, including the read-only submodule policy

Validate Test Matrix

Validate

  • CPU check
    • lscpu
    • sensors
    • stress-ng
  • Memory check
    • free
    • timeout <timeout_sec> memtester
    • free
  • NVMe storage check
    • nvme id-ctrl
    • nvme smart-log
    • nvme device-self-test
  • SATA/SAS storage check
    • smartctl -H -A
    • smartctl -t short
  • Basic NVIDIA GPU check
    • nvidia-smi -pm 1
    • nvidia-smi -q
    • dmidecode -t baseboard
    • dmidecode -t system
    • dcgmi diag -r 2
  • Inter-GPU communication check
    • all_reduce_perf
  • GPU bandwidth check
    • dcgmi diag -r nvbandwidth

Validate -> Stress

  • Extended NVIDIA GPU check
    • nvidia-smi -pm 1
    • nvidia-smi -q
    • dmidecode -t baseboard
    • dmidecode -t system
    • dcgmi diag -r 3
  • NVIDIA targeted stress
    • nvidia-smi -pm 1
    • nvidia-smi -q
    • dcgmi diag -r targeted_stress
  • NVIDIA targeted power
    • nvidia-smi -pm 1
    • nvidia-smi -q
    • dcgmi diag -r targeted_power
  • NVIDIA pulse test
    • nvidia-smi -pm 1
    • nvidia-smi -q
    • dcgmi diag -r pulse_test
  • Inter-GPU communication check
    • all_reduce_perf
  • GPU bandwidth check
    • dcgmi diag -r nvbandwidth