bee/iso/builder/build-gpu-burn.sh
Mikhail Chusavitin 1768bb58dd Merge debug/prod into single ISO build, fix NVIDIA module loading
## ISO build consolidation
- Remove separate debug/prod split: overlay-debug/, build-debug.sh,
  mkimg.bee_debug.sh, genapkovl-bee_debug.sh all deleted
- Single overlay: iso/overlay/ (was overlay-debug content)
- Single build script: build.sh (SSH, TUI, NVIDIA, vendor tools, bee-release)
- Single mkimage profile: bee (with dropbear, dialog, strace, gcompat, etc.)

## NVIDIA fixes
- Modules now stored at /usr/local/lib/nvidia/ instead of
  /lib/modules/<kver>/extra/nvidia/ — modloop squashfs mounts over that
  path at boot making overlay content there inaccessible
- bee-nvidia init: load via insmod (absolute path), not modprobe
- bee-nvidia init: create libnvidia-ml.so.1/libcuda.so.1 symlinks in /usr/lib/
- build-nvidia-module.sh: always install linux-lts-dev (not conditional) —
  stale 6.6.x headers caused wrong-kernel modules that never loaded at runtime
- build-nvidia-module.sh: create soname symlinks in cache
- KERNEL_VERSION in VERSIONS updated 6.6 → 6.12
- gcompat added to ISO packages (nvidia-smi is a glibc binary on musl Alpine)
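
The loading and symlink steps above can be sketched roughly as follows. This is an illustration, not the actual bee-nvidia init script: the function names `load_nvidia_modules` and `link_sonames`, the module list and load order, and the versioned library naming (`libcuda.so.<driver-version>`) are all assumptions.

```sh
# Hypothetical sketch of the two bee-nvidia steps; names and module list are assumptions.

# Load kernel modules by absolute path, because modprobe only searches
# /lib/modules/<kver>/, which the modloop squashfs mounts over at boot.
# $1 = module directory, $2 = loader command (defaults to insmod).
load_nvidia_modules() {
    ko_dir="${1:-/usr/local/lib/nvidia}"
    insmod_cmd="${2:-insmod}"
    for mod in nvidia nvidia-modeset nvidia-uvm; do
        ko="$ko_dir/$mod.ko"
        [ -f "$ko" ] || { echo "missing module: $ko" >&2; return 1; }
        "$insmod_cmd" "$ko" || return 1
    done
}

# Create libnvidia-ml.so.1 and libcuda.so.1 symlinks next to the versioned
# libraries (assumed to be named like libcuda.so.590.48.01).
# $1 = library directory.
link_sonames() {
    lib_dir="$1"
    for base in libnvidia-ml libcuda; do
        for candidate in "$lib_dir/$base".so.*.*; do
            # An unmatched glob stays literal, so skip anything that is not a file.
            [ -f "$candidate" ] || continue
            ln -sf "$(basename "$candidate")" "$lib_dir/$base.so.1"
            break
        done
    done
}
```

A call such as `load_nvidia_modules /usr/local/lib/nvidia && link_sonames /usr/lib` would mirror the paths named in the commit. Passing `echo` as the second argument to `load_nvidia_modules` gives a dry run that prints the module paths instead of loading them.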

## Service ordering
- bee-audit: add `after bee-nvidia` so NVIDIA enrichment always succeeds
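
For illustration, the ordering change would look something like this in the bee-audit OpenRC service script. The rest of that script is not shown in this commit, so only the `depend` function is sketched and the comments are assumptions about intent.

```sh
# Sketch of the depend() stanza in /etc/init.d/bee-audit (an openrc-run script).
depend() {
    # Order bee-audit after bee-nvidia when both are scheduled to start,
    # so the GPU driver is loaded before the audit collects NVIDIA data.
    after bee-nvidia
}
```

In OpenRC, `after` only affects ordering. Unlike `need`, it does not pull bee-nvidia in or make bee-audit fail when bee-nvidia is absent.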

## New tooling
- iso/builder/smoketest.sh: SSH smoke test for post-boot ISO validation
- iso/builder/build-gpu-burn.sh: builds gpu_burn vendor binary (CUDA 12.8+)
- vendor/gpu_burn included automatically if placed in iso/vendor/
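
A minimal sketch of staging the build output for the ISO, assuming the build script was run with some output directory. The helper name `stage_gpu_burn` and both directory arguments are hypothetical. It copies `compare.ptx` as well, since gpu_burn expects that file in the same directory at runtime.

```sh
# Hypothetical helper: copy gpu_burn and compare.ptx from a dist dir into the ISO vendor dir.
# $1 = directory passed to build-gpu-burn.sh, $2 = vendor directory (e.g. iso/vendor).
stage_gpu_burn() {
    dist_dir="$1"
    vendor_dir="$2"
    # Refuse to stage a partial build: both files must exist and be non-empty.
    for f in gpu_burn compare.ptx; do
        [ -s "$dist_dir/$f" ] || { echo "missing or empty: $dist_dir/$f" >&2; return 1; }
    done
    mkdir -p "$vendor_dir" || return 1
    cp "$dist_dir/gpu_burn" "$dist_dir/compare.ptx" "$vendor_dir/" || return 1
    chmod 755 "$vendor_dir/gpu_burn"
}
```

Typical use would be `sh iso/builder/build-gpu-burn.sh dist && stage_gpu_burn dist iso/vendor`, run before the main ISO build.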

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 20:14:18 +03:00


#!/bin/sh
# build-gpu-burn.sh — build gpu_burn stress tool and output static-ish binary to DIST_DIR
#
# gpu_burn requires nvcc (CUDA toolkit). This script downloads the CUDA toolkit
# runfile, installs only the toolkit component (no driver), and builds gpu_burn.
# The runfile, toolkit, and source checkout are cached under /var/tmp for reuse.
#
# Output: $DIST_DIR/gpu_burn (ready to copy into ISO vendor/)
#
# Usage: sh build-gpu-burn.sh <dist-dir>
set -e
DIST_DIR="$1"
[ -n "$DIST_DIR" ] || { echo "usage: $0 <dist-dir>" >&2; exit 1; }
mkdir -p "$DIST_DIR"
OUTPUT="$DIST_DIR/gpu_burn"
if [ -f "$OUTPUT" ] && [ -s "$OUTPUT" ]; then
    echo "=== gpu_burn cached: $OUTPUT ==="
    exit 0
fi
# CUDA toolkit version for building — only nvcc + headers needed, not the full runtime.
# Must be <= max CUDA version supported by the NVIDIA driver in VERSIONS.
# Driver 590.48.01 supports up to CUDA 13.1; use 12.8.1 (stable, widely tested).
CUDA_VERSION="12.8.1"
CUDA_BUILD="570.124.06"
CUDA_RUN="/var/tmp/cuda-${CUDA_VERSION}.run"
CUDA_DIR="/var/tmp/cuda-toolkit-${CUDA_VERSION}"
echo "=== building gpu_burn (CUDA ${CUDA_VERSION}) ==="
# Install build dependencies
apk add --quiet gcc g++ make git wget libxml2
# Download CUDA toolkit runfile if not cached
if [ ! -s "$CUDA_RUN" ]; then
    echo "=== downloading CUDA ${CUDA_VERSION} toolkit ==="
    wget -q --show-progress -O "$CUDA_RUN" \
        "https://developer.download.nvidia.com/compute/cuda/${CUDA_VERSION}/local_installers/cuda_${CUDA_VERSION}_${CUDA_BUILD}_linux.run"
fi
# Extract the toolkit component only (skip driver, samples, docs to save time/space)
if [ ! -f "$CUDA_DIR/bin/nvcc" ]; then
    echo "=== extracting CUDA toolkit ==="
    rm -rf "$CUDA_DIR"
    # The pipe to tail hides the installer's exit status; the nvcc check below catches failures.
    sh "$CUDA_RUN" \
        --silent \
        --toolkit \
        --toolkitpath="$CUDA_DIR" \
        --no-opengl-libs \
        --no-drm \
        --override 2>&1 | tail -5
fi
NVCC="$CUDA_DIR/bin/nvcc"
[ -f "$NVCC" ] || { echo "ERROR: nvcc not found after extraction: $NVCC"; exit 1; }
echo "nvcc: $("$NVCC" --version | head -1)"
# Clone gpu_burn source
GPU_BURN_DIR="/var/tmp/gpu-burn-src"
if [ ! -d "$GPU_BURN_DIR/.git" ]; then
    echo "=== cloning gpu-burn ==="
    git clone --depth=1 https://github.com/wilicc/gpu-burn.git "$GPU_BURN_DIR"
else
    echo "=== gpu-burn source already cloned ==="
fi
# Build
echo "=== building gpu_burn ==="
cd "$GPU_BURN_DIR"
make clean 2>/dev/null || true
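# The upstream gpu-burn Makefile appears to read CUDAPATH (no underscore);
# CUDA_PATH is also set in case a fork or later revision uses that name.
CUDAPATH="$CUDA_DIR" CUDA_PATH="$CUDA_DIR" make 2>&1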
[ -f "$GPU_BURN_DIR/gpu_burn" ] || { echo "ERROR: gpu_burn binary not produced"; exit 1; }
cp "$GPU_BURN_DIR/gpu_burn" "$OUTPUT"
cp "$GPU_BURN_DIR/compare.ptx" "$(dirname "$OUTPUT")/compare.ptx" 2>/dev/null || true
echo "=== gpu_burn build complete ==="
ls -lh "$OUTPUT"
echo "NOTE: compare.ptx must be present in same dir as gpu_burn at runtime"