Tracks origin/main after rebase: adds per-column header filters for
severity in the viewer (feat(viewer): replace severity dropdown).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Picks up new contracts: hardware-ingest-json, submodule-integration,
go-database cursor safety, and several contract deduplication passes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A (hardware-ingest-json v2.8-2.9): remove sensor location fields from schema
and collector; tag HardwareMemory.Location as json:"-"; add PlatformConfig to
HardwareSnapshot.
B (no-hardcoded-vendors): consolidate PCI vendor IDs into collector/pci_vendors.go;
replace all vendor-name string checks in isGPUDevice, isNVIDIADevice, isMellanoxDevice,
isAMDGPUDevice, matchesGPUVendor (sat_overlay), and validateIsVendorGPU (page_validate)
with numeric vendor_id comparisons.
C (module-structure): split app/app.go (1413 lines) into app.go + app_format.go,
app_network.go, app_services.go, app_packs.go, app_install.go — no logic changes.
D (go-code-style): wrap bare return err in interfaceAdminState and
interfaceIPv4Addrs (platform/network.go) with fmt.Errorf context including
the interface name.
E (go-project-bible): add bible-local/architecture/data-model.md and
bible-local/architecture/api-surface.md.
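Item B can be illustrated with a minimal sketch. The pciDevice type and the
function signatures below are hypothetical (the real collector types are not
shown here); only the vendor ID values are the PCI-SIG assignments.

```go
package main

import "fmt"

// PCI vendor IDs as assigned by PCI-SIG.
const (
	vendorNVIDIA   uint16 = 0x10de
	vendorMellanox uint16 = 0x15b3
)

// pciDevice is a hypothetical, trimmed-down device record used only
// for this illustration.
type pciDevice struct {
	VendorID uint16
	Name     string // free-form name; deliberately ignored by the checks
}

// isNVIDIADevice compares the numeric vendor ID instead of matching
// a vendor name substring.
func isNVIDIADevice(d pciDevice) bool { return d.VendorID == vendorNVIDIA }

// isMellanoxDevice does the same for Mellanox.
func isMellanoxDevice(d pciDevice) bool { return d.VendorID == vendorMellanox }

func main() {
	d := pciDevice{VendorID: 0x15b3, Name: "ConnectX-7"}
	fmt.Println(isMellanoxDevice(d), isNVIDIADevice(d))
}
```

The numeric comparison keeps working when lspci or firmware reports an
unexpected or rebranded vendor string.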
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Disabled PCIe devices (sysfs enable==0) carry no data traffic; their
link state has no operational impact. Switchtec PCIe switch management
endpoints on NVIDIA HGX H100 baseboards (and similar fabric controllers)
train at reduced speed intentionally and were producing spurious warnings.
The check is vendor-agnostic: it reads the enable attribute via the existing
helper, with no vendor/device ID hardcoding.
Documented in bible-local/decisions/2026-06-12-pcie-disabled-device-link-warning.md.
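A rough sketch of the check, assuming the standard sysfs layout
/sys/bus/pci/devices/<BDF>/enable. All function names here are hypothetical;
the existing helper mentioned above is not reproduced.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// enableAttrDisabled reports whether the content of a sysfs "enable"
// attribute marks the device as disabled ("0").
func enableAttrDisabled(content string) bool {
	return strings.TrimSpace(content) == "0"
}

// pcieDeviceDisabled reads <sysfsRoot>/bus/pci/devices/<bdf>/enable.
// sysfsRoot is a parameter so the function can be pointed at a fixture.
func pcieDeviceDisabled(sysfsRoot, bdf string) (bool, error) {
	path := filepath.Join(sysfsRoot, "bus", "pci", "devices", bdf, "enable")
	raw, err := os.ReadFile(path)
	if err != nil {
		return false, fmt.Errorf("read enable attribute for %s: %w", bdf, err)
	}
	return enableAttrDisabled(string(raw)), nil
}

// shouldWarnLinkDowngrade suppresses the link warning for disabled
// devices, regardless of vendor or device ID.
func shouldWarnLinkDowngrade(disabled, downgraded bool) bool {
	return downgraded && !disabled
}

func main() {
	disabled, err := pcieDeviceDisabled("/sys", "0000:00:00.0")
	if err != nil {
		fmt.Println("skipping:", err)
		return
	}
	fmt.Println("warn:", shouldWarnLinkDowngrade(disabled, true))
}
```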
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nvme-cli emits smart-log counters as JSON strings and uses field names
avail_spare / percent_used instead of the prose names in the NVMe spec.
The nvmeSmartLog struct had int64 fields with wrong JSON tags — Unmarshal
returned an error and the whole health block was skipped, leaving every
NVMe drive with status=Unknown.
Fix: switch all numeric fields to jsonInt64 (already used for lsblk
block sizes) which accepts both bare numbers and quoted strings, and
correct the avail_spare / percent_used tag names.
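A minimal sketch of such a dual-format type, for illustration only: the real
jsonInt64 implementation and the full nvmeSmartLog field list are not shown
in this message, so the Go field names and the parseSmartLog helper below are
assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// jsonInt64 accepts both a bare JSON number (3) and a quoted one ("3").
type jsonInt64 int64

// UnmarshalJSON implements json.Unmarshaler.
func (v *jsonInt64) UnmarshalJSON(b []byte) error {
	s := strings.TrimSpace(string(b))
	if s == "null" {
		return nil // leave the zero value in place
	}
	// Strip one pair of surrounding quotes, if present.
	if len(s) >= 2 && s[0] == '"' && s[len(s)-1] == '"' {
		s = strings.TrimSpace(s[1 : len(s)-1])
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return fmt.Errorf("jsonInt64: parse %q: %w", s, err)
	}
	*v = jsonInt64(n)
	return nil
}

// nvmeSmartLog shows only the two fields named in this commit.
type nvmeSmartLog struct {
	AvailSpare  jsonInt64 `json:"avail_spare"`
	PercentUsed jsonInt64 `json:"percent_used"`
}

// parseSmartLog is a hypothetical wrapper around json.Unmarshal.
func parseSmartLog(data []byte) (nvmeSmartLog, error) {
	var l nvmeSmartLog
	if err := json.Unmarshal(data, &l); err != nil {
		return l, fmt.Errorf("decode smart-log: %w", err)
	}
	return l, nil
}

func main() {
	l, err := parseSmartLog([]byte(`{"avail_spare": "100", "percent_used": 3}`))
	fmt.Println(l.AvailSpare, l.PercentUsed, err)
}
```

Because the type tolerates both encodings, one malformed field type no longer
causes the whole health block to be dropped.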
Also fix validateIsVendorGPU for NVIDIA: previously counted any NVIDIA
PCIe device (including NVSwitch bridges) as a GPU, producing wrong
estimates (12 instead of 8 on an HGX H100 system). Now requires
device_class to be videocontroller or processingaccelerator, matching
the existing AMD filter logic.
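The class filter might look roughly like the sketch below. The pciDevice
type, countVendorGPUs, and the "Bridge" class value used for the NVSwitch
entries are assumptions made for the example; only the two accepted class
names come from this commit.

```go
package main

import (
	"fmt"
	"strings"
)

const vendorNVIDIA = 0x10de // PCI-SIG vendor ID for NVIDIA

// pciDevice is a hypothetical record carrying only what the filter needs.
type pciDevice struct {
	VendorID    int
	DeviceClass string
}

// isGPUClass accepts only the two classes named in the commit,
// compared case-insensitively.
func isGPUClass(class string) bool {
	switch strings.ToLower(class) {
	case "videocontroller", "processingaccelerator":
		return true
	}
	return false
}

// countVendorGPUs counts devices of the given vendor whose class marks
// them as GPUs; bridges and other functions are ignored.
func countVendorGPUs(devs []pciDevice, vendorID int) int {
	n := 0
	for _, d := range devs {
		if d.VendorID == vendorID && isGPUClass(d.DeviceClass) {
			n++
		}
	}
	return n
}

func main() {
	devs := []pciDevice{
		{VendorID: vendorNVIDIA, DeviceClass: "ProcessingAccelerator"},
		{VendorID: vendorNVIDIA, DeviceClass: "Bridge"}, // e.g. an NVSwitch
	}
	fmt.Println(countVendorGPUs(devs, vendorNVIDIA))
}
```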
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nvtop pulled nvidia-tesla-470-* via Recommends into the nogpu build.
Move it from bee.list.chroot into bee-nvidia and bee-amd lists so it
only appears in GPU variants.
Also remove the stray git-bible/ directory (was not gitignored) and
move grub-bitmap-error docs into bible-local/docs/.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Writing to /sys/class/scsi_host/hostX/scan on SAS controllers (e.g.
Adaptec smartpqi/PM8222-SHBA) triggers sas_user_scan which blocks
indefinitely, causing the audit to hang forever. Skip hosts that appear
under /sys/class/sas_host/ — SAS topology is discovered by the driver.
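The host selection can be sketched as follows. hostsToRescan and listDir are
hypothetical names; the example only prints the scan paths it would touch
and never writes to sysfs.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hostsToRescan returns the entries of scsiHosts that do not also
// appear in sasHosts, preserving order.
func hostsToRescan(scsiHosts, sasHosts []string) []string {
	sas := make(map[string]bool, len(sasHosts))
	for _, h := range sasHosts {
		sas[h] = true
	}
	var out []string
	for _, h := range scsiHosts {
		if !sas[h] {
			out = append(out, h)
		}
	}
	return out
}

// listDir returns the entry names of dir; a missing directory is
// treated as empty (no SAS controllers present).
func listDir(dir string) []string {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names
}

func main() {
	root := "/sys/class"
	scsi := listDir(filepath.Join(root, "scsi_host"))
	sas := listDir(filepath.Join(root, "sas_host"))
	for _, h := range hostsToRescan(scsi, sas) {
		// A real rescan would write "- - -" to this file.
		fmt.Println(filepath.Join(root, "scsi_host", h, "scan"))
	}
}
```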
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- storage: add jsonInt64 dual-format unmarshaler to handle lsblk output
change in util-linux 2.38 (LOG-SEC/PHY-SEC now emitted as JSON
integers, not quoted strings); fixes SATA disks invisible on Debian 12
- pcie: detect NVLink bridge mezzanine CX-7 cards (Mellanox x2, no host
net ifaces, DeviceName contains "NVLINK" in lspci -v) and mark them
with device_class="NVLinkBridge"; escalate PCIe link speed downgrade to
Critical for these cards (Gen3 on a fixed internal connector = hardware
fault, not a transient warning)
- pcie: cross-reference nvidia-smi topo to capture NVLink bond counts and
active status for all NVLink bridge cards
- packages: add infiniband-diags to ISO; provides ibstat required by
nvidia-fabricmanager-start.sh to enumerate IB devices before FM launch
(absence causes CUDA_ERROR_SYSTEM_NOT_READY)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Network: green if at least one interface has IPv4 (drop PARTIAL state).
Bee Services: treat inactive as OK — oneshot services (bee-sshsetup,
bee-preflight, bee-network, bee-audit, etc.) complete successfully and
exit to inactive; only failed is a real problem.
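The two rules above reduce to very small predicates. This is a sketch under
the assumption that the status code sees the systemd ActiveState string and
a per-interface list of IPv4 addresses; both function names are hypothetical.

```go
package main

import "fmt"

// serviceHealthy treats every state except "failed" as OK, so oneshot
// units that finished and went back to "inactive" stay green.
func serviceHealthy(activeState string) bool {
	return activeState != "failed"
}

// networkHealthy is true as soon as any interface has an IPv4 address;
// there is no intermediate PARTIAL state.
func networkHealthy(ipv4ByIface map[string][]string) bool {
	for _, addrs := range ipv4ByIface {
		if len(addrs) > 0 {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(serviceHealthy("inactive"), serviceHealthy("failed"))
	fmt.Println(networkHealthy(map[string][]string{"eth0": {"192.0.2.10"}}))
}
```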
nvidia-fabricmanager: add ExecCondition=bee-check-nvswitch drop-in so
the service is silently skipped (inactive, not failed) on systems
without NVSwitch hardware (e.g. H200 NVL with direct NVLink, no
NVSwitch chips). bee-check-nvswitch detects NVSwitch via lspci
(vendor 10de, class 0680).
bee-nvidia.service: add ConditionPathExists=/usr/local/bin/bee-nvidia-load
so the unit is a no-op if somehow present in a non-nvidia build.
bee-boot-status: read /etc/bee-gpu-vendor and exclude bee-nvidia from
CRITICAL/ALL on non-nvidia builds, preventing boot hang if the unit
is unexpectedly present.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
EASY_BEE_NVIDIA_LEGACY_V<date> is 33 characters; the ISO 9660 volume ID is
limited to 32 characters. Compute the maximum token length dynamically from the
prefix length and trim ISO_VERSION_LABEL_TOKEN with cut before
assembling BEE_ISO_VOLUME. All four variants now fit within the limit.
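The build script does this in shell with cut; the arithmetic is shown here
as a Go sketch purely for illustration. buildVolumeID is a hypothetical
name and the token value in main is made up.

```go
package main

import "fmt"

// maxVolIDLen is the ISO 9660 volume identifier limit.
const maxVolIDLen = 32

// buildVolumeID trims token so that prefix+token fits in maxVolIDLen.
// Inputs are assumed to be ASCII, so byte slicing is safe.
func buildVolumeID(prefix, token string) string {
	if len(prefix) >= maxVolIDLen {
		return prefix[:maxVolIDLen] // no room left for the token
	}
	maxToken := maxVolIDLen - len(prefix)
	if len(token) > maxToken {
		token = token[:maxToken]
	}
	return prefix + token
}

func main() {
	id := buildVolumeID("EASY_BEE_NVIDIA_LEGACY_V", "202606121")
	fmt.Println(id, len(id))
}
```

Deriving the token budget from the prefix length means a longer variant
name shortens the version token instead of overflowing the field.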
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add nvidia-aer-correctable and pcie-aer-correctable patterns to catch
"bus correctable error" events seen in SEL (Critical Interrupt / offset 7).
Both patterns carry severity "warning" — correctable errors are
hardware-recovered and should not flag a card as failed.
Fix kmsg_watcher routing: GPU-category events were keyed as pcie:<BDF>
but the UI queries for pcie:gpu: prefix. Split the switch so "gpu" →
pcie:gpu:<BDF> and "pcie" → pcie:<BDF>. This applies to both
flushWindow (SAT-window path) and flushImmediate (always-on path).
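The split can be pictured as a single key-building function shared by both
flush paths. componentKey is a hypothetical name; only the two key formats
are taken from this commit.

```go
package main

import "fmt"

// componentKey maps an event category and a PCI address (BDF) to the
// status key the UI queries. ok is false for categories that are not
// routed through the PCIe namespace.
func componentKey(category, bdf string) (key string, ok bool) {
	switch category {
	case "gpu":
		return "pcie:gpu:" + bdf, true
	case "pcie":
		return "pcie:" + bdf, true
	}
	return "", false
}

func main() {
	for _, c := range []string{"gpu", "pcie", "storage"} {
		key, ok := componentKey(c, "0000:17:00.0")
		fmt.Println(c, key, ok)
	}
}
```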
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
\\$1 in a double-quoted string expands as literal backslash + $1 (the
script's first positional arg). With set -u and no CLI args (IP entered
via read), this fails. \$1 correctly escapes the dollar sign, producing
a literal $1 for awk on the remote host.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
HTMX was never loaded on the page, so hx-get on the component label
spans was dead code — the dialog opened empty. Replace with a plain
openComponentDetail() fetch call. Also fix dialog positioning broken
by the CSS reset (*{margin:0} overrode the UA margin:auto that centers
<dialog>). Replace card hx-trigger polling with a setInterval.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All intermediate build artifacts (binaries, live-build work dirs, overlay
stages, NVIDIA/NCCL/cuBLAS/john caches) now live under dist/cache/.
Final ISOs go to dist/release/ instead of scattered dist/easy-bee-v*/ and iso/out/.
dist/ is already gitignored, so the now-redundant iso/out/ entry is removed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Auto-detects build mode: remote VM if BUILDER_HOST is set in .env,
local Docker otherwise. Cache hardcoded to dist/container-cache (gitignored).
All flags forwarded to build-in-container.sh.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- kmsg watcher now records kernel errors (GPU Xid, MCE, EDAC, storage I/O) at all times,
not only during SAT tasks; flushImmediate writes directly to ComponentStatusDB
- New health_poller: polls ipmitool sdr every 60s for PSU health (watchdog:psu source)
- Hardware Summary card auto-refreshes every 30s via htmx without page reload
- Component rows (CPU/Memory/Storage/GPU/PSU) are now clickable; clicking one opens a modal
with per-component status, source, timestamp and last 20 history entries
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Installs a local-premount initramfs hook that intercepts bee.wipe=all before
squashfs is mounted. Shows a numbered disk selection TUI (pure POSIX sh), wipes
selected disks (nvme format / blkdiscard / dd fallback), syncs, and reboots.
Works even when squashfs fails to mount.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a "WIPE ALL DISKS" entry to both GRUB and isolinux menus (bee.wipe=all).
Includes bee-wipe-disks for manual use from a running live system.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Squashfs versioning:
- ISO now contains filesystem-v<VERSION>.squashfs instead of the generic
filesystem.squashfs, making it immediately visible which build is
running (visible in /run/live/medium/live/ at boot time).
- Full build path: rename filesystem.squashfs → filesystem-v*.squashfs
after lb build, before lb binary_checksums/binary_iso.
- Fast path: find and unpack whatever filesystem*.squashfs exists, repack
as the new versioned name, remove the old file, update the ISO.
- needs_full_build: accept any filesystem*.squashfs so version changes
alone don't force a full rebuild.
Media selection hardening:
- Add live-media=/dev/disk/by-label/<LABEL> to the kernel boot line in
addition to the existing live-media-label=<LABEL>. live-boot will now
open exactly the labeled device rather than scanning all block devices,
preventing accidental use of squashfs files from local disks or
stale virtual media attached via IPMI.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mksquashfs 4.5.1 (bookworm) writes a non-SQUASHFS_INVALID_BLK value for
xattr_id_table_start in the superblock even when -no-xattrs is passed, if
the source chroot contains POSIX ACL xattrs set by dpkg at install time.
Linux 6.1 squashfs driver then fails with "unable to read xattr id index
table" and refuses to mount the filesystem.
Strip all xattrs from the chroot via Python3 (already present) immediately
before mksquashfs runs. With an xattr-free source tree the resulting
squashfs is guaranteed to have SQUASHFS_INVALID_BLK in the xattr field.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
--chroot-squashfs-compression-options does not exist in live-build
bookworm (1:20230502). The correct mechanism is the MKSQUASHFS_OPTIONS
environment variable read by binary_rootfs.
Export MKSQUASHFS_OPTIONS="-no-xattrs" before lb build so live-build's
binary_rootfs picks it up, and add -no-xattrs explicitly to every
direct mksquashfs call in build.sh (fast-path repack and the dormant
split-layers function). Remove the invalid lb config option.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
--chroot-squashfs-options is not a valid lb_config option; the correct
name is --chroot-squashfs-compression-options. Without this fix lb config
aborts immediately, so the -no-xattrs flag (which prevents the
"unable to read xattr id index table" boot failure) was never applied.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Kernel squashfs driver fails with "unable to read xattr id index table"
when the squashfs contains POSIX ACL xattrs (system.posix_acl_*) written
by mksquashfs as unrecognised entries. This caused every built ISO to
drop to an initramfs shell at boot.
Add -no-xattrs to mksquashfs options so xattrs are stripped at build
time. xattrs are not needed in a live read-only rootfs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
split_live_squashfs_layers moved /usr out of filesystem.squashfs into a
separate 10-usr.squashfs, leaving a rootfs skeleton that live-boot
(1:20230131+deb12u1) cannot mount: the initramfs panics with
"Can not mount /dev/loop0 ... filesystem.squashfs".
live-boot in bookworm expects a single self-contained filesystem.squashfs.
Revert to the standard single-squashfs layout and remove the dead
multi-squashfs guard in needs_full_build().
The split_live_squashfs_layers function is kept for future reference.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
trap RETURN is a bash extension not supported by /bin/sh on Debian.
With set -e active the unsupported trap call exited the build immediately
after lb build, before bootloader sync and ISO copy steps ran.
Remove both trap RETURN lines — explicit rm -rf at the end of the
function is sufficient for cleanup on the happy path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New in chart:
- event_logs and platform_config sections in viewer
- Storage columns: logical_block_size_bytes, physical_block_size_bytes,
metadata_bytes_per_block
- Compact status/severity icons, severity filtering for event logs
- Fixed JS MIME type and base stylesheet
bee audit schema already has all required fields; no schema changes needed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GRUB's PNG reader (grub2 bookworm) fails to load bee-logo.png despite the
file being a valid RGB 8-bit non-interlaced PNG with minimal chunks. Root
cause is a known fragility in GRUB's png.c; exact trigger is unknown.
Switch to uncompressed 24-bit TGA which bypasses the PNG parser entirely.
tga.mod is already present in the ISO (x86_64-efi/tga.mod).
- Convert bee-logo.png → bee-logo.tga (480018 bytes, BGR top-left)
- config.cfg: insmod png → insmod tga
- theme.txt: bee-logo.png → bee-logo.tga
- Document all prior failed attempts in git-bible/grub-bitmap-error.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
IPMI hang fix (Lenovo XCC SR650 V3):
- Add pluggable ipmi_profile system with per-vendor timeouts and fruEarlyExit flag
- Lenovo profile: 90s FRU timeout, streaming early-exit stops after PSU blocks found
- collectFRUEarlyExit streams ipmitool fru print and kills process once PSU blocks
are followed by a non-PSU header (~6s instead of ~108s on 54-device FRU list)
- collectBMCFirmware and collectPSUs accept manufacturer and apply profile timeouts
VROC license detection:
- Detect VMD/VROC controller in PCIe list, run mdadm --detail-platform
- Parse "License:" line; store as snap.VROCLicense in HardwareSnapshot
Blackbox service fix:
- bee-blackbox.service was missing from systemctl enable list in ISO build hook
- Service never started on boot; state file never written; UI button stayed "Enable"
Drop qrencode:
- Remove from package list, standardTools API check, and runtime-flows doc
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>