# GPU PCIe Test Methodology

## Validate

- CPU check
  - `lscpu`
  - `sensors`
  - `stress-ng`
- Memory check
  - `free`
  - `timeout memtester`
  - `free`
- NVMe storage check
  - `nvme id-ctrl`
  - `nvme smart-log`
  - `nvme device-self-test`
- SATA/SAS storage check
  - `smartctl -H -A`
  - `smartctl -t short`
- Basic NVIDIA GPU check
  - `nvidia-smi -pm 1`
  - `nvidia-smi -q`
  - `dmidecode -t baseboard`
  - `dmidecode -t system`
  - `dcgmi diag -r 2`
- Inter-GPU communication check
  - `all_reduce_perf`
- GPU bandwidth check
  - `dcgmi diag -r nvbandwidth`

## Validate -> Stress

- Extended NVIDIA GPU check
  - `nvidia-smi -pm 1`
  - `nvidia-smi -q`
  - `dmidecode -t baseboard`
  - `dmidecode -t system`
  - `dcgmi diag -r 3`
- NVIDIA targeted stress
  - `nvidia-smi -pm 1`
  - `nvidia-smi -q`
  - `dcgmi diag -r targeted_stress`
- NVIDIA targeted power
  - `nvidia-smi -pm 1`
  - `nvidia-smi -q`
  - `dcgmi diag -r targeted_power`
- NVIDIA pulse test
  - `nvidia-smi -pm 1`
  - `nvidia-smi -q`
  - `dcgmi diag -r pulse_test`
- Inter-GPU communication check
  - `all_reduce_perf`
- GPU bandwidth check
  - `dcgmi diag -r nvbandwidth`
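
One way to script a check from the lists above is to run each command in order and tally the results without stopping at the first failure. The sketch below does this for the "Basic NVIDIA GPU check" from the Validate stage. The helper names `run_step` and `validate_basic_gpu`, the step labels, and the PASS/FAIL/SKIP reporting are assumptions made for illustration and are not part of the methodology itself. The commands are copied as listed.

```shell
#!/bin/sh
# Sketch only: run_step and validate_basic_gpu are hypothetical helper names.
PASSED=0
FAILED=0
SKIPPED=0

# run_step NAME CMD [ARGS...]: run one command, count the result, keep going.
run_step() {
    name=$1
    shift
    if ! command -v "$1" >/dev/null 2>&1; then
        # Tool is not installed: record a skip instead of a failure.
        SKIPPED=$((SKIPPED + 1))
        echo "SKIP $name ($1 not installed)"
    elif "$@" >/dev/null 2>&1; then
        PASSED=$((PASSED + 1))
        echo "PASS $name"
    else
        FAILED=$((FAILED + 1))
        echo "FAIL $name"
    fi
}

# The "Basic NVIDIA GPU check" steps, in the order they are listed.
validate_basic_gpu() {
    run_step "enable persistence mode" nvidia-smi -pm 1
    run_step "query GPU state" nvidia-smi -q
    run_step "baseboard info" dmidecode -t baseboard
    run_step "system info" dmidecode -t system
    run_step "DCGM diag level 2" dcgmi diag -r 2
}
```

Calling `validate_basic_gpu` and then inspecting `$FAILED` gives a simple gate before moving on to the Stress stage. `nvidia-smi -pm 1` and `dmidecode` typically need root, so this sketch assumes it is run with sufficient privileges. It also discards command output; a real run would likely redirect it to a log file instead.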