# Validate vs Burn: Hardware Impact Policy

## Validate Tests (non-destructive)

Tests on the **Validate** page are purely diagnostic. They:

- **Do not write to disks**: no data is written to storage devices, so the tests themselves do not advance SMART attributes such as load cycle count or reallocated sectors. Power-on hours accrue with uptime whether or not a test runs.
- **Do not run sustained high load**: commands complete in seconds to minutes and do not push hardware to thermal or electrical limits.
- **Do not increment hardware wear counters**: GPU memory ECC error counters, NVMe wear-leveling counters, and similar health and endurance metrics are unaffected.
- **Are safe to run repeatedly**: they can be run on new, production-bound, or already-deployed hardware without reducing its lifespan.

### What Validate tests actually do

| Test | What it runs |
|---|---|
| NVIDIA GPU | `nvidia-smi`, `dcgmi diag` (levels 1–4 read-only diagnostics) |
| Memory | `memtester` on a limited allocation; reads/writes to RAM only |
| Storage | `smartctl -a`, `nvme smart-log` — reads SMART data only |
| CPU | `stress-ng` for a bounded duration; CPU-only, no I/O |
| AMD GPU | `rocm-smi --showallinfo`, `dmidecode` — read-only queries |
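
One way to check the non-destructive claim on a particular drive is to snapshot wear-related counters before and after a test run and compare them. The sketch below assumes the counters have already been parsed into dictionaries (for example from `smartctl -a` or `nvme smart-log` output). The function name `advanced_counters` and the key names are hypothetical and depend on how the output is parsed; they are not part of any tool.

```python
from typing import Dict, List, Sequence

# Illustrative counter names (assumption); real keys depend on the parser used.
WEAR_KEYS = ("data_units_written", "percentage_used", "reallocated_sectors")


def advanced_counters(before: Dict[str, int], after: Dict[str, int],
                      keys: Sequence[str] = WEAR_KEYS) -> List[str]:
    """Return the wear counters whose value grew between two snapshots."""
    return [k for k in keys if after.get(k, 0) > before.get(k, 0)]


# Made-up snapshots for illustration only.
before = {"data_units_written": 1200, "percentage_used": 3, "reallocated_sectors": 0}
after_validate = dict(before)  # no counter moved
after_burn = {**before, "data_units_written": 5200, "percentage_used": 4}

print(advanced_counters(before, after_validate))  # []
print(advanced_counters(before, after_burn))      # ['data_units_written', 'percentage_used']
```

On a running system, background OS writes can also advance written-data counters, so a small delta does not by itself implicate the test.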

## Burn Tests (hardware wear)

Tests on the **Burn** page run hardware at maximum or near-maximum load for extended durations. They:

- **Wear storage**: write-intensive patterns consume SSD program/erase (P/E) cycles and reduce remaining endurance.
- **Stress GPU memory**: extended ECC stress tests can surface latent defects, but they do so by exercising memory cells heavily.
- **Accelerate thermal cycling**: repeated heat/cool cycles degrade solder joints and capacitors over time.
- **Advance wear counters**: the NVMe media wear indicator and similar metrics advance, and power-on hours accumulate over the long run time.
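
To put the storage wear in perspective, the endurance consumed by a single burn run can be roughly estimated as bytes written divided by the drive's rated endurance (TBW). This is a minimal sketch that assumes the vendor publishes a TBW rating and that ignores write amplification; the function name and the example figures are hypothetical, not measurements.

```python
def endurance_used_fraction(bytes_written: float, rated_tbw: float) -> float:
    """Fraction of rated endurance consumed; rated_tbw is in terabytes (10**12 bytes)."""
    if rated_tbw <= 0:
        raise ValueError("rated_tbw must be positive")
    return bytes_written / (rated_tbw * 10**12)


# Hypothetical example: a burn run writes 6 TB to a drive rated for 600 TBW.
fraction = endurance_used_fraction(6 * 10**12, 600)
print(f"{fraction:.2%}")  # 1.00%
```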

### Rule

> Run **Validate** freely on any server, at any time, before or after deployment.
> Run **Burn** only when explicitly required (e.g., initial acceptance after repair, or per customer SLA).
> Document when and why Burn tests were run.
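
The last two lines of the rule could be enforced in tooling by refusing to start a Burn run without a stated reason and by recording when it ran. A minimal sketch, assuming a hypothetical `BurnRecord` structure and `authorize_burn` helper; neither is an existing API, and the server name and reason string are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BurnRecord:
    """Audit entry capturing which server was burned, why, and when (UTC)."""
    server: str
    reason: str
    started_at: datetime


def authorize_burn(server: str, reason: str) -> BurnRecord:
    """Reject a Burn run with no stated reason; otherwise return an audit record."""
    if not reason.strip():
        raise ValueError("Burn tests require an explicit, documented reason")
    return BurnRecord(server, reason.strip(), datetime.now(timezone.utc))


record = authorize_burn("srv-0001", "acceptance after repair")
print(record.reason)  # acceptance after repair
```

Validate runs need no such gate, since the rule allows them on any server at any time.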