diff --git a/bible-local/docs/grub-bitmap-error-history.md b/bible-local/docs/grub-bitmap-error-history.md
new file mode 100644
index 0000000..e29328e
--- /dev/null
+++ b/bible-local/docs/grub-bitmap-error-history.md
@@ -0,0 +1,312 @@
+# GRUB Bitmap Error History
+
+## Symptom
+
+On some servers GRUB prints:
+
+```text
+error: null src bitmap in grub_video_bitmap_create_scaled.
+Press any key to continue...
+```
+
+The important new observation as of `v10.7` is:
+
+- the error still appears even when the logo image block is removed from
+  `iso/builder/config/bootloaders/grub-efi/live-theme/theme.txt`
+- therefore the current error can no longer be explained only by
+  `bee-logo.png` / `bee-logo.tga`
+
+That does not prove the theme system is healthy. It proves only that the
+currently remaining failure is deeper than "bad logo file".
+
+## Current State
+
+Current source files:
+
+- [iso/builder/config/bootloaders/grub-efi/live-theme/theme.txt](/Users/mchusavitin/Documents/git/bee/iso/builder/config/bootloaders/grub-efi/live-theme/theme.txt:1)
+  has no `image` block anymore
+- [iso/builder/config/bootloaders/grub-efi/config.cfg](/Users/mchusavitin/Documents/git/bee/iso/builder/config/bootloaders/grub-efi/config.cfg:1)
+  still does `insmod tga` and then `source /boot/grub/theme.cfg`
+
+Implication:
+
+- if the error still fires, the trigger is likely elsewhere in GRUB theme
+  rendering or in the assets/config GRUB resolves while sourcing `theme.cfg`
+- the old "PNG parser fragility" story is no longer a sufficient explanation
+  for the current failure mode
+
+Current artifact facts:
+
+- the provided `easy-bee-nvidia-v10.7-amd64.logs` build logs reference
+  `linux-image-6.1.0-45`
+- the provided `easy-bee-nvidia-v10.7-amd64.iso` contains
+  `live/initrd.img-6.1.0-45-amd64` and `live/vmlinuz-6.1.0-45-amd64`
+- a later `BOOT FAILED!` screenshot showed `live/initrd.img-6.1.0-44-amd64`
+  and `live/vmlinuz-6.1.0-44-amd64`
+
+Implication:
+
+- the `BOOT FAILED!` screenshot is not from the same artifact as the provided
+  `v10.7` ISO/log set
+- until the exact ISO filename and checksum are tied to that failure, the
+  GRUB bitmap issue and the live-boot failure must be treated as separate
+  problems
+
+## Chronology
+
+### 1. Initial bee GRUB theme introduction
+
+Relevant commit:
+
+- `d52ec67` `Stability hardening, build script fixes, GRUB bee logo`
+
+What changed:
+
+- bee-branded GRUB theme introduced
+- image block with explicit `width` / `height`
+
+Observed result:
+
+- bitmap error appeared
+
+### 2. Remove explicit scaling dimensions
+
+Relevant commit:
+
+- `aa284ae` `fix(iso): avoid grub logo scaling error`
+
+What changed:
+
+- removed `width = 400`
+- removed `height = 400`
+
+Reason stated by the change:
+
+- try to avoid the scaling path
+
+Observed result:
+
+- error persisted
+
+Conclusion:
+
+- explicit width/height were not the sole trigger
+
+### 3. Rework PNG handling and menu rendering
+
+Relevant commit:
+
+- `6112094` `fix(grub): fix bitmap error and menu rendering`
+
+Commit message says the change was intended to:
+
+- convert `bee-logo.png` to RGBA and strip metadata
+- move `terminal_output gfxterm` before `insmod png` / theme load
+- remove ASCII banner from GRUB menu area
+- fix theme typography/layout fields
+
+Observed result:
+
+- error persisted
+
+Notes:
+
+- this was still operating under the assumption that the issue was the PNG
+  payload or the order of gfxterm/theme init
+
+### 4. Convert logo PNG back to RGB
+
+Relevant commit:
+
+- `333c44f` `Fix GRUB splash: convert bee-logo.png from RGBA to RGB`
+
+Intended reason:
+
+- GRUB might dislike RGBA PNG and want RGB PNG
+
+Observed result:
+
+- error still persisted according to later project notes
+
+### 5. Add post-build canonical GRUB/isolinux sync
+
+Relevant commit:
+
+- `0cdfbc5` `fix(iso): restore boot UX and boot logs`
+
+What this introduced:
+
+- post-`lb build` rewriting of `binary/boot/grub/grub.cfg`
+- post-`lb build` rewriting of `binary/isolinux/live.cfg`
+- forced rebuild of `binary_checksums`, `binary_iso`, `binary_zsync`
+
+Why it was added:
+
+- restore canonical EASY-BEE boot UX after live-build wrote its own bootloader
+  outputs
+- restore expected boot menu and logs
+
+Important note:
+
+- this commit did not directly solve the bitmap issue
+- it added a second layer of bootloader mutation after live-build
+
+### 6. Switch from PNG to TGA
+
+Relevant commit:
+
+- `626763e` `Fix GRUB bitmap error: switch from PNG to TGA for splash logo`
+
+Commit message says:
+
+- GRUB PNG reader was considered fragile
+- switch to uncompressed 24-bit TGA
+- `config.cfg`: `insmod png` -> `insmod tga`
+- `theme.txt`: `bee-logo.png` -> `bee-logo.tga`
+
+Observed result:
+
+- this did not eliminate the problem in the current lineage
+- today the system still errors even after the entire image block was removed
+
+Conclusion:
+
+- switching PNG -> TGA was not a durable root-cause fix
+
+### 7. Patch EFI image after build
+
+Relevant commit:
+
+- `4f20c92` `Make UEFI boot safe and remove GRUB logo`
+
+What this introduced:
+
+- `sync_efi_grub_theme_assets`
+- direct `mtools` patching of `efi.img`
+- copying `config.cfg`, `theme.cfg`, and `live-theme/*` into the EFI FAT image
+- removal of the logo image block from `theme.txt`
+
+Why it was added:
+
+- make UEFI path "safe"
+- keep EFI GRUB image aligned with canonical bootloader assets
+
+Observed result:
+
+- later this became the direct cause of `Disk full` during build once
+  `bee-logo.tga` was large enough
+- and even with the logo removed from `theme.txt`, the bitmap error still
+  remained
+
+Conclusion:
+
+- EFI post-build patching increased build complexity
+- removing the logo alone did not remove the runtime GRUB error
+
+### 8. Remove ASCII logo banners
+
+Relevant commit:
+
+- `14505ef` `Remove easy bee ASCII logo banners`
+
+What changed:
+
+- web loading page ASCII cleanup only
+
+Relevance here:
+
+- none for GRUB bitmap error
+- included here only to avoid confusion with other "logo removal" work
+
+### 9. Remove EFI post-build patching
+
+Relevant commit:
+
+- `5dc022d` `Drop post-build EFI bootloader patching`
+
+Why it was done:
+
+- stop mutating `efi.img` post-build
+- remove dependence on `mtools` for EFI patching
+- remove the `Disk full` failure mode
+
+Impact:
+
+- this did not target the GRUB bitmap error directly
+- it targeted build-system complexity and EFI image overflow
+
+### 10. Restore only GRUB/isolinux post-build sync
+
+Relevant commit:
+
+- `42774d4` `Restore post-build GRUB and isolinux sync`
+
+Why it was needed:
+
+- removing all post-build sync caused final ISO validation to fail with
+  missing canonical EASY-BEE boot entries
+- memtest was still fine, but final GRUB menu was no longer canonical
+
+What it restored:
+
+- only `binary/boot/grub/grub.cfg`
+- only `binary/isolinux/live.cfg`
+
+What it did not restore:
+
+- no EFI FAT image patching
+- no `mtools` path
+
+## What Is Proven False
+
+The current evidence rules out several simplistic explanations:
+
+- "the error is only caused by explicit image scaling"
+- "the error is only caused by PNG vs TGA"
+- "the error is only caused by the logo file itself"
+
+Why:
+
+- scaling dimensions were removed and error persisted
+- PNG was replaced with TGA and error still survived in the lineage
+- the image block itself is now absent, and the error still occurs
+
+## Working Hypotheses Left
+
+The remaining plausible layers are:
+
+- GRUB theme engine still tries to render some bitmap-related element even
+  without the logo image block
+- GRUB is resolving stale theme assets from the built EFI/ISO path rather than
+  what we think the source tree says
+- `theme.cfg` / `theme.txt` / GRUB module loading order still triggers a bitmap
+  code path elsewhere
+- live-build may still package a stale `theme.txt` or stale `live-theme`
+  directory into the final image
+- the GRUB environment on the failing hardware may behave differently from the
+  assumptions in our source tree
+
+## Decision Boundary
+
+Before making another change, the next step should be evidence gathering from
+the real built artifact, not another speculative edit.
+
+That means checking on the actual built ISO or EFI image:
+
+- exact `boot/grub/theme.cfg`
+- exact `boot/grub/live-theme/theme.txt`
+- exact contents of `boot/grub/live-theme/`
+- whether the final image still contains a stale logo reference
+- whether the EFI path and non-EFI path differ
+
+## Relevant Commits
+
+- `d52ec67` `Stability hardening, build script fixes, GRUB bee logo`
+- `aa284ae` `fix(iso): avoid grub logo scaling error`
+- `6112094` `fix(grub): fix bitmap error and menu rendering`
+- `333c44f` `Fix GRUB splash: convert bee-logo.png from RGBA to RGB`
+- `0cdfbc5` `fix(iso): restore boot UX and boot logs`
+- `626763e` `Fix GRUB bitmap error: switch from PNG to TGA for splash logo`
+- `4f20c92` `Make UEFI boot safe and remove GRUB logo`
+- `5dc022d` `Drop post-build EFI bootloader patching`
+- `42774d4` `Restore post-build GRUB and isolinux sync`
diff --git a/git-bible/grub-bitmap-error.md b/bible-local/docs/grub-bitmap-error.md
similarity index 100%
rename from git-bible/grub-bitmap-error.md
rename to bible-local/docs/grub-bitmap-error.md
diff --git a/iso/builder/config/package-lists/bee-amd.list.chroot b/iso/builder/config/package-lists/bee-amd.list.chroot
index 8157879..597fccd 100644
--- a/iso/builder/config/package-lists/bee-amd.list.chroot
+++ b/iso/builder/config/package-lists/bee-amd.list.chroot
@@ -1,5 +1,6 @@
 # AMD GPU firmware
 firmware-amd-graphics
+nvtop
 
 # AMD ROCm — GPU monitoring, bandwidth test, and compute stress (RVS GST)
 rocm-smi-lib=%%ROCM_SMI_VERSION%%
diff --git a/iso/builder/config/package-lists/bee-nvidia.list.chroot b/iso/builder/config/package-lists/bee-nvidia.list.chroot
index 13ae433..8629fb5 100644
--- a/iso/builder/config/package-lists/bee-nvidia.list.chroot
+++ b/iso/builder/config/package-lists/bee-nvidia.list.chroot
@@ -5,6 +5,7 @@
 # DCGM 4 is packaged per CUDA major. The image ships NVIDIA driver 590 with
 # CUDA 13 userspace, so install the CUDA 13 build plus proprietary components
 # explicitly.
+nvtop
 nvidia-fabricmanager=%%NVIDIA_FABRICMANAGER_VERSION%%
 datacenter-gpu-manager-4-cuda13=1:%%DCGM_VERSION%%
 datacenter-gpu-manager-4-proprietary=1:%%DCGM_VERSION%%
diff --git a/iso/builder/config/package-lists/bee.list.chroot b/iso/builder/config/package-lists/bee.list.chroot
index 3c043a2..a1aad59 100644
--- a/iso/builder/config/package-lists/bee.list.chroot
+++ b/iso/builder/config/package-lists/bee.list.chroot
@@ -47,7 +47,6 @@ less
 vim-tiny
 mc
 htop
-nvtop
 sudo
 zstd
 mstflint