# Validate vs Burn: Hardware Impact Policy

## Validate Tests (non-destructive)

Tests on the **Validate** page are purely diagnostic. They:

- **Do not write to disks**: no data is written to storage devices; SMART counters (power-on hours, load cycle count, reallocated sectors) are not incremented.
- **Do not run sustained high load**: commands complete quickly (seconds to minutes) and do not push hardware to thermal or electrical limits.
- **Do not increment hardware wear counters**: GPU memory ECC counters, NVMe wear leveling counters, and similar endurance metrics are unaffected.
- **Are safe to run repeatedly**: on new, production-bound, or already-deployed hardware without concern for reducing lifespan.

### What Validate tests actually do

| Test | What it runs |
|---|---|
| NVIDIA GPU | `nvidia-smi`, `dcgmi diag` (levels 1–4 read-only diagnostics) |
| Memory | `memtester` on a limited allocation; reads/writes to RAM only |
| Storage | `smartctl -a`, `nvme smart-log` (reads SMART data only) |
| CPU | `stress-ng` for a bounded duration; CPU-only, no I/O |
| AMD GPU | `rocm-smi --showallinfo`, `dmidecode` (read-only queries) |

## Burn Tests (hardware wear)

Tests on the **Burn** page run hardware at maximum or near-maximum load for extended durations. They:

- **Wear storage**: write-intensive patterns can reduce SSD endurance (P/E cycles).
- **Stress GPU memory**: extended ECC stress tests may surface latent defects but also exercise memory cells.
- **Accelerate thermal cycling**: repeated heat/cool cycles degrade solder joints and capacitors over time.
- **May increment wear counters**: GPU power-on hours, NVMe media wear indicator, and similar metrics will advance.

### Rule

> Run **Validate** freely on any server, at any time, before or after deployment.
> Run **Burn** only when explicitly required (e.g., initial acceptance after repair, or per customer SLA).
> Document when and why Burn tests were run.
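
The rule above could be enforced in tooling with a small gate in front of the test launcher. The sketch below is illustrative only and is not taken from any existing implementation: `authorize_run`, `RunRecord`, and `log_line` are hypothetical names, and the record fields are an assumption about what "when and why" might need to capture. Validate runs are always allowed; Burn runs are refused unless a non-empty reason is supplied, and every authorized run yields a timestamped record that can be written to a log.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

# The two test pages described in this policy.
SUITES = ("validate", "burn")


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical audit record: which suite ran where, when, and why."""
    server: str
    suite: str
    reason: str
    requested_by: str
    timestamp: str  # ISO 8601, UTC


def authorize_run(server: str, suite: str, reason: Optional[str] = None,
                  requested_by: str = "unknown") -> RunRecord:
    """Apply the policy: Validate is always allowed, Burn needs a reason."""
    suite = suite.strip().lower()
    if suite not in SUITES:
        raise ValueError(f"unknown suite: {suite!r}")

    reason = (reason or "").strip()
    if suite == "burn" and not reason:
        # Burn wears hardware, so it must be explicitly justified.
        raise PermissionError("Burn requires an explicit, documented reason")

    return RunRecord(
        server=server,
        suite=suite,
        reason=reason,
        requested_by=requested_by,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


def log_line(record: RunRecord) -> str:
    """Serialize a record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    # Validate needs no justification.
    print(log_line(authorize_run("srv-01", "validate")))

    # Burn without a reason is refused.
    try:
        authorize_run("srv-01", "burn")
    except PermissionError as exc:
        print(f"refused: {exc}")

    # Burn with a documented reason is allowed and recorded.
    record = authorize_run("srv-01", "burn",
                           reason="initial acceptance after repair",
                           requested_by="alice")
    print(log_line(record))
```

A free-text reason is used rather than a fixed list because the policy gives acceptance after repair and customer SLA only as examples. In practice each line returned by `log_line` might be appended to a per-server file or sent to whatever ticketing or audit system is already in use.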