Disabled PCIe devices (sysfs enable==0) carry no data traffic; their link state has no operational impact. Switchtec PCIe switch management endpoints on NVIDIA HGX H100 baseboards (and similar fabric controllers) train at reduced speed intentionally and were producing spurious warnings.

Check is vendor-agnostic: reads enable attribute via existing helper, no vendor/device ID hardcoding.

Documented in bible-local/decisions/2026-06-12-pcie-disabled-device-link-warning.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
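
The vendor-agnostic check described in the note above might look roughly like the following. This is a minimal sketch, not the project's actual helper: the function names `pcie_device_enabled` and `should_warn_about_link` are hypothetical, and treating a missing or unreadable `enable` attribute as "enabled" is an assumption made here so that warnings are never hidden by accident. The only fact relied on is that Linux exposes a per-device `enable` attribute under `/sys/bus/pci/devices/<address>/`.

```python
from pathlib import Path

# Standard Linux sysfs location for PCI devices.
SYSFS_PCI_DEVICES = Path("/sys/bus/pci/devices")


def pcie_device_enabled(bdf: str, root: Path = SYSFS_PCI_DEVICES) -> bool:
    """Return False only when the device's sysfs `enable` attribute reads 0.

    Assumption of this sketch: a missing or unreadable attribute counts
    as enabled, so a real link problem is not silently suppressed.
    """
    try:
        raw = (root / bdf / "enable").read_text().strip()
    except OSError:
        return True
    return raw != "0"


def should_warn_about_link(bdf: str, link_degraded: bool,
                           root: Path = SYSFS_PCI_DEVICES) -> bool:
    """Warn about a degraded link only when the device is enabled.

    No vendor or device ID is consulted; the decision depends solely on
    the `enable` attribute.
    """
    return link_degraded and pcie_device_enabled(bdf, root)
```

Passing `root` explicitly keeps the sketch testable against a temporary directory instead of the live sysfs tree.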
bee — Project Bible
Project-specific architecture, decisions, and runtime contracts.
Generic engineering rules live in bible/rules/patterns/.
Files
| File | Contents |
|---|---|
| architecture/system-overview.md | What bee does, scope, tech stack |
| architecture/runtime-flows.md | Boot sequence, audit flow, service order |
| docs/customer-gpu-test-methodology.md | Customer-facing GPU PCIe Validate / Validate -> Stress test list |
| docs/hardware-ingest-contract.md | Current Reanimator hardware ingest JSON contract |
| docs/validate-vs-burn.md | Validate and Validate -> Stress hardware test policy |
| decisions/ | Architectural decision log, including read-only submodule policy |
Validate Test Matrix
Validate
- CPU check: `lscpu`, `sensors`, `stress-ng`
- Memory check: `free`, `timeout <timeout_sec> memtester`, `free`
- NVMe storage check: `nvme id-ctrl`, `nvme smart-log`, `nvme device-self-test`
- SATA/SAS storage check: `smartctl -H -A`, `smartctl -t short`
- Basic NVIDIA GPU check: `nvidia-smi -pm 1`, `nvidia-smi -q`, `dmidecode -t baseboard`, `dmidecode -t system`, `dcgmi diag -r 2`
- Inter-GPU communication check: `all_reduce_perf`
- GPU bandwidth check: `dcgmi diag -r nvbandwidth`
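
For illustration, the Validate list above can be written down as plain data, with a small helper that fills in the `<timeout_sec>` placeholder. This is only a sketch: the names `VALIDATE`, `render`, and `plan` are hypothetical, and the commands are transcribed as they appear in the list, without the device paths and extra arguments (for example the `memtester` size) that a real invocation would need.

```python
import shlex

# Validate profile as read from the list above; order within each check is kept.
VALIDATE = {
    "CPU check": ["lscpu", "sensors", "stress-ng"],
    "Memory check": ["free", "timeout <timeout_sec> memtester", "free"],
    "NVMe storage check": ["nvme id-ctrl", "nvme smart-log", "nvme device-self-test"],
    "SATA/SAS storage check": ["smartctl -H -A", "smartctl -t short"],
    "Basic NVIDIA GPU check": [
        "nvidia-smi -pm 1",
        "nvidia-smi -q",
        "dmidecode -t baseboard",
        "dmidecode -t system",
        "dcgmi diag -r 2",
    ],
    "Inter-GPU communication check": ["all_reduce_perf"],
    "GPU bandwidth check": ["dcgmi diag -r nvbandwidth"],
}


def render(command: str, timeout_sec: int) -> list[str]:
    """Fill the <timeout_sec> placeholder and split the command into argv tokens."""
    return shlex.split(command.replace("<timeout_sec>", str(timeout_sec)))


def plan(check: str, timeout_sec: int) -> list[list[str]]:
    """Return the ordered argv lists for one Validate check."""
    return [render(command, timeout_sec) for command in VALIDATE[check]]
```

For example, `plan("Memory check", 300)` yields the `free` / `timeout 300 memtester` / `free` sequence as three argv lists, which makes the before-and-after `free` snapshots around the memory test explicit.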
Validate -> Stress
- Extended NVIDIA GPU check: `nvidia-smi -pm 1`, `nvidia-smi -q`, `dmidecode -t baseboard`, `dmidecode -t system`, `dcgmi diag -r 3`
- NVIDIA targeted stress: `nvidia-smi -pm 1`, `nvidia-smi -q`, `dcgmi diag -r targeted_stress`
- NVIDIA targeted power: `nvidia-smi -pm 1`, `nvidia-smi -q`, `dcgmi diag -r targeted_power`
- NVIDIA pulse test: `nvidia-smi -pm 1`, `nvidia-smi -q`, `dcgmi diag -r pulse_test`
- Inter-GPU communication check: `all_reduce_perf`
- GPU bandwidth check: `dcgmi diag -r nvbandwidth`
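
One way to see how the two profiles differ on the GPU side is to compare the `dcgmi diag -r` values each one uses. The sketch below is illustrative only: `DCGMI_RUNS`, `dcgmi_commands`, and `stress_only_runs` are hypothetical names, and the values are transcribed from the matrix above rather than taken from the project's code.

```python
# `dcgmi diag -r` values per profile, in the order they appear in the matrix.
DCGMI_RUNS = {
    "Validate": ["2", "nvbandwidth"],
    "Validate -> Stress": [
        "3",
        "targeted_stress",
        "targeted_power",
        "pulse_test",
        "nvbandwidth",
    ],
}


def dcgmi_commands(profile: str) -> list[list[str]]:
    """Build the argv list for every dcgmi diag run in a profile."""
    return [["dcgmi", "diag", "-r", run] for run in DCGMI_RUNS[profile]]


def stress_only_runs() -> list[str]:
    """Return the runs that appear in Validate -> Stress but not in Validate."""
    validate = set(DCGMI_RUNS["Validate"])
    return [run for run in DCGMI_RUNS["Validate -> Stress"] if run not in validate]
```

Under this reading, the bandwidth run is shared by both profiles, while the level 3 diagnostic and the three targeted runs are specific to Validate -> Stress.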