fix: pcie gen, nccl binary, netconf sudo, boot noise, firmware cleanup

- nvidia collector: read pcie.link.gen.current/max from nvidia-smi instead
  of sysfs to avoid false Gen1 readings when GPU is in ASPM idle state
- build: remove bee-nccl-gpu-stress from rm -f list so shell script from
  overlay is not silently dropped from the ISO
- smoketest: add explicit checks for bee-gpu-burn, bee-john-gpu-stress,
  bee-nccl-gpu-stress, all_reduce_perf
- netconf: re-exec via sudo when not root to fix RTNETLINK/resolv.conf errors
- auto/config: reduce loglevel 7→3 to show clean systemd output on boot
- auto/config: blacklist snd_hda_intel and related audio modules (unused on servers)
- package-lists: remove firmware-intel-sound and firmware-amd-graphics from
  base list; move firmware-amd-graphics to bee-amd variant only
- bible-local: mark memtest ADR resolved, document working solution
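
A rough sketch of the nvidia-smi query behind the first bullet, for reference. It assumes the collector runs something equivalent to the command below; the helper name parse_pcie_gen and the "current=N max=M" output format are illustrative only and are not taken from the collector's actual code.

```shell
#!/bin/sh
# Hypothetical helper: parse one CSV line "current, max" as printed by
#   nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.gen.max --format=csv,noheader
# and print it as "current=N max=M".
parse_pcie_gen() {
    # Split on the comma, then strip whitespace from each field.
    cur=$(printf '%s' "$1" | cut -d, -f1 | tr -d '[:space:]')
    max=$(printf '%s' "$1" | cut -d, -f2 | tr -d '[:space:]')
    printf 'current=%s max=%s\n' "$cur" "$max"
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.gen.max \
        --format=csv,noheader |
    while IFS= read -r line; do
        parse_pcie_gen "$line"   # one output line per GPU
    done
fi
```

On a machine without the NVIDIA driver the query is skipped, so the parsing helper can be exercised on its own with a sample line such as parse_pcie_gen '1, 4'.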

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 21:25:23 +03:00
parent 2baf3be640
commit eb60100297
9 changed files with 148 additions and 25 deletions


@@ -3,6 +3,11 @@
# Type 'a' at any prompt to abort, 'b' to go back.
set -e
# Requires root for ip/dhclient/resolv.conf — re-exec via sudo if needed.
if [ "$(id -u)" -ne 0 ]; then
    exec sudo "$0" "$@"
fi
abort() { echo "Aborted."; exit 0; }
ask() {