docs: add agent bootstrap and contract read router

This commit is contained in:
Mikhail Chusavitin
2026-04-02 13:48:36 +03:00
parent 688b87e98d
commit 1d89a4918e
22 changed files with 883 additions and 1284 deletions

View File

@@ -0,0 +1,80 @@
# Unattended Boot Services Pattern Notes
This file keeps examples and rationale. The normative rules live in `contract.md`.
## Dependency Skeleton
```sh
depend() {
need localmount
after some-service
use logger
}
```
Avoid `need net` for best-effort services.
## Network-Independent SSH
```sh
#!/sbin/openrc-run
description="SSH server"
depend() {
need localmount
after bee-sshsetup
use logger
}
start() {
check_config || return 1
ebegin "Starting dropbear"
/usr/sbin/dropbear ${DROPBEAR_OPTS}
eend $?
}
```
Place this in `etc/init.d/dropbear` in the overlay to override package defaults that require network.
## Persistent DHCP
Wrong:
```sh
udhcpc -i "$iface" -t 3 -T 5 -n -q
```
Correct:
```sh
udhcpc -i "$iface" -b -t 0 -T 3 -q
```
## Typical Start Order
```text
localmount
-> sshsetup
-> dropbear
-> network
-> nvidia
-> audit
```
Use `after` for ordering without turning soft dependencies into hard boot blockers.
## Error Handling Skeleton
```sh
start() {
ebegin "Running audit"
/usr/local/bin/audit --output /var/log/audit.json >> /var/log/audit.log 2>&1
local rc=$?
if [ $rc -eq 0 ]; then
einfo "Audit complete"
else
ewarn "Audit finished with errors — check /var/log/audit.log"
fi
eend 0
}
```

View File

@@ -7,127 +7,16 @@ Version: 1.0
Rules for OpenRC services that run in unattended environments: LiveCDs, kiosks, embedded systems.
No user is present. No TTY prompts. Every failure path must have a silent fallback.
---
## Core Invariants
- **Never block boot.** A service failure must not prevent other services from starting.
- **Never prompt.** No `read`, no `pause`, no interactive input of any kind.
- **Always exit 0.** Use `eend 0` at the end of `start()` regardless of the operation result.
- **Log everything.** Write results to `/var/log/` so SSH inspection is possible after boot.
- **Fail silently, degrade gracefully.** Missing tool → skip that collector. No network → skip network-dependent steps.
---
## Service Dependencies
Use the minimum necessary dependencies:
```sh
depend() {
need localmount # almost always needed
after some-service # ordering without hard dependency
use logger # optional soft dependency
# DO NOT add: need net, need networking, need network-online
}
```
**Never use `need net` or `need networking`** unless the service is genuinely useless without
network and you want it to fail loudly when no cable is connected.
For services that work with or without network, use `after` instead.
---
## Network-Independent SSH
Dropbear (and any SSH server) must start without network being available.
Common mistake: installing dropbear-openrc which adds `need net` in its default init.
Override with a custom init:
```sh
#!/sbin/openrc-run
description="SSH server"
depend() {
need localmount
after bee-sshsetup # key/user setup, not networking
use logger
# NO need net
}
start() {
check_config || return 1
ebegin "Starting dropbear"
/usr/sbin/dropbear ${DROPBEAR_OPTS}
eend $?
}
```
Place this file in the overlay at `etc/init.d/dropbear` — it overrides the package-installed version.
---
## Persistent DHCP
Do not use blocking DHCP (`-n` flag exits if no offer). Use background mode so the client
retries automatically when a cable is connected after boot:
```sh
# Wrong — exits immediately if no DHCP offer
udhcpc -i "$iface" -t 3 -T 5 -n -q
# Correct — background daemon, retries indefinitely
udhcpc -i "$iface" -b -t 0 -T 3 -q
```
The network service itself should complete immediately (exit 0) — udhcpc daemons run in background.
---
## Service Start Order (typical LiveCD)
```
localmount
└── sshsetup (user creation, key injection — before dropbear)
└── dropbear (SSH — independent of network)
└── network (DHCP on all interfaces — does not block anything)
└── nvidia (or other hardware init — after network in case firmware needs it)
└── audit (main workload — after all hardware ready)
```
Services at the same level can start in parallel. Use `after` not `need` for ordering without hard dependency.
---
## Error Handling in start()
```sh
start() {
ebegin "Running audit"
/usr/local/bin/audit --output /var/log/audit.json >> /var/log/audit.log 2>&1
local rc=$?
if [ $rc -eq 0 ]; then
einfo "Audit complete"
else
ewarn "Audit finished with errors — check /var/log/audit.log"
fi
eend 0 # always 0 — never fail the runlevel
}
```
- Capture exit code into a local variable.
- Log the result with `einfo` or `ewarn`.
- Always `eend 0` — a failed audit is not a reason to block the boot runlevel.
- The exception: services whose failure makes SSH impossible (e.g. key setup) may `return 1`.
---
See `README.md` for sample init scripts and ordering sketches.
## Rules
- Every `start()` ends with `eend 0` unless failure makes the entire environment unusable.
- Network is always best-effort. Test for it, don't depend on it.
- Proprietary drivers (NVIDIA, etc.): load failure → log warning → continue without enrichment.
- External tools (ipmitool, smartctl, etc.): not-found → skip that data source → do not abort.
- Timeout all external commands: `timeout 30 smartctl ...` prevents infinite hangs.
- Write all output to `/var/log/` — TTY output is secondary.
- Never block boot. A service failure must not stop the rest of the runlevel.
- Never prompt. Do not use `read`, pause logic, or any interactive fallback.
- Every `start()` must end with `eend 0` unless failure makes the environment fundamentally unusable, such as breaking SSH setup.
- Write service diagnostics to `/var/log/`. TTY output is secondary.
- Missing tools, absent network, or driver load failures must degrade gracefully: log and continue.
- Use the minimum dependency set. Prefer `after` and `use`; do not add `need net`, `need networking`, or `need network-online` unless the service is truly useless without network and failure should be loud.
- SSH services must start without requiring network availability.
- DHCP must be non-blocking and persistent. Run the client in background retry mode rather than failing the boot sequence when no lease is immediately available.
- External commands must be timeout-bounded so a bad device or tool cannot hang boot indefinitely.