Runtime Flows And Invariants
History-First Mutation Path
All state-changing mutations for assets/components must go through the history domain layer.
Sources covered:
- user registry edits
- user asset-page component actions (Add / Edit / Remove)
- hardware JSON ingest
- manual CSV ingest
History apply transaction invariants:
- lock current projection row (parts or machines)
- load latest snapshot (or bootstrap from projection)
- apply validated patch/domain command
- semantic dedupe (after_hash == before_hash => no-op, no event/snapshot/timeline)
- write history event + snapshot
- update projections
- write timeline projection rows
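The semantic dedupe step above can be sketched as follows. This is a minimal illustration, not the project's implementation: the helper names (snapshotHash, isNoOp), the map-based snapshot shape, and the choice of SHA-256 over JSON are all assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// snapshotHash (hypothetical helper) returns a hex SHA-256 of the JSON
// encoding of a snapshot. encoding/json sorts map keys, so two maps with
// the same content produce the same hash.
func snapshotHash(snapshot map[string]any) (string, error) {
	raw, err := json.Marshal(snapshot)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// isNoOp reports whether after_hash == before_hash, in which case no
// event, snapshot, or timeline row should be written.
func isNoOp(before, after map[string]any) (bool, error) {
	beforeHash, err := snapshotHash(before)
	if err != nil {
		return false, err
	}
	afterHash, err := snapshotHash(after)
	if err != nil {
		return false, err
	}
	return afterHash == beforeHash, nil
}

func main() {
	before := map[string]any{"status": "OK", "slot": "DIMM_A1"}
	same := map[string]any{"slot": "DIMM_A1", "status": "OK"}
	changed := map[string]any{"status": "FAILED", "slot": "DIMM_A1"}

	noop, _ := isNoOp(before, same)
	fmt.Println(noop) // true
	noop, _ = isNoOp(before, changed)
	fmt.Println(noop) // false
}
```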
Direct projection writes are allowed only inside history recompute/rebuild flows.
observations remains raw ingest trace only:
- runtime current-state reads (UI/API/history projection metadata) must use history + projections (parts, machines, installations, timeline_events, machine_firmware_states)
- observations may be used for ingest forensics, audits, and backfill/migration only
Event Time Source Priority
Use component event time over ingest time whenever possible.
eventFallbackTime(actual, ingestedAt, collectedAt):
- actual
- ingestedAt
- collectedAt
collectedFallbackTime(collectedAt, ingestedAt):
- collectedAt
- ingestedAt
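A minimal sketch of the two priority orders listed above. The signatures are assumptions: the document only gives the function names and argument order, so the pointer for actual and the use of the zero time as "missing" are illustrative choices. The utc helper exists only to build example values.

```go
package main

import (
	"fmt"
	"time"
)

// utc is a small example helper, not part of the documented API.
func utc(year, month, day, hour, minute int) time.Time {
	return time.Date(year, time.Month(month), day, hour, minute, 0, 0, time.UTC)
}

// eventFallbackTime returns the first usable time in priority order:
// actual, then ingestedAt, then collectedAt.
func eventFallbackTime(actual *time.Time, ingestedAt, collectedAt time.Time) time.Time {
	if actual != nil && !actual.IsZero() {
		return *actual
	}
	if !ingestedAt.IsZero() {
		return ingestedAt
	}
	return collectedAt
}

// collectedFallbackTime prefers collectedAt and falls back to ingestedAt.
func collectedFallbackTime(collectedAt, ingestedAt time.Time) time.Time {
	if !collectedAt.IsZero() {
		return collectedAt
	}
	return ingestedAt
}

func main() {
	ingested := utc(2024, 5, 2, 10, 0)
	collected := utc(2024, 5, 2, 9, 30)
	failedAt := utc(2024, 5, 1, 23, 15)

	fmt.Println(eventFallbackTime(&failedAt, ingested, collected).Equal(failedAt)) // true
	fmt.Println(eventFallbackTime(nil, ingested, collected).Equal(ingested))       // true
	fmt.Println(collectedFallbackTime(collected, ingested).Equal(collected))       // true
}
```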
Status Event Time Parsing Order
parseComponentStatusEventTime resolves time in this order:
- status_changed_at
- Latest matching status_history item for current status
- Latest parseable status_history item
- status_checked_at
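The resolution order above could look roughly like this. The struct shapes, the RFC 3339 timestamp format, and the helper names parseTime and latestHistoryTime are assumptions made for the sketch; only the order of the four sources comes from the list.

```go
package main

import (
	"fmt"
	"time"
)

type statusHistoryItem struct {
	Status    string
	ChangedAt string
}

type componentStatus struct {
	Status          string
	StatusChangedAt string
	StatusCheckedAt string
	StatusHistory   []statusHistoryItem
}

// parseTime assumes RFC 3339 timestamps in the payload.
func parseTime(s string) (time.Time, bool) {
	t, err := time.Parse(time.RFC3339, s)
	return t, err == nil
}

// latestHistoryTime returns the latest parseable changed_at. When status
// is non-empty, only history items with that status are considered.
func latestHistoryTime(items []statusHistoryItem, status string) (time.Time, bool) {
	var best time.Time
	found := false
	for _, it := range items {
		if status != "" && it.Status != status {
			continue
		}
		if t, ok := parseTime(it.ChangedAt); ok && (!found || t.After(best)) {
			best, found = t, true
		}
	}
	return best, found
}

func parseComponentStatusEventTime(c componentStatus) (time.Time, bool) {
	if t, ok := parseTime(c.StatusChangedAt); ok {
		return t, true
	}
	if t, ok := latestHistoryTime(c.StatusHistory, c.Status); ok {
		return t, true
	}
	if t, ok := latestHistoryTime(c.StatusHistory, ""); ok {
		return t, true
	}
	return parseTime(c.StatusCheckedAt)
}

func main() {
	c := componentStatus{
		Status:          "FAILED",
		StatusCheckedAt: "2024-05-04T00:00:00Z",
		StatusHistory: []statusHistoryItem{
			{Status: "OK", ChangedAt: "2024-05-03T12:00:00Z"},
			{Status: "FAILED", ChangedAt: "2024-05-01T08:00:00Z"},
		},
	}
	if t, ok := parseComponentStatusEventTime(c); ok {
		fmt.Println(t.Format(time.RFC3339)) // 2024-05-01T08:00:00Z
	}
}
```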
Failure Event Rules
For critical components:
- Timeline event type: COMPONENT_FAILED
- failure_events.failure_time uses resolved failure time (not raw ingest time)
- failure_events.external_id includes the same failure timestamp
- failure_events is a projection and may be rebuilt from component history (COMPONENT_STATUS_SET)
Manual UI failure registration invariants:
- POST /failures is history-backed for component status (COMPONENT_STATUS_SET -> FAILED)
- manual failure registration writes status history event and failure_events projection in one DB transaction
- manual failures use a dedicated projection source (manual_ui) so history recompute cleanup for ingest/history sources does not delete them
- failure_date from UI is normalized to start-of-day UTC timestamp for failure_events.failure_time in v1
- failure detail open duration is computed with date precision (UTC day difference from failure date to repair date / current date), not timestamp precision
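The last two rules can be illustrated with a short sketch. The function names and the YYYY-MM-DD input format for failure_date are assumptions. The point is that both ends are truncated to the UTC day before subtracting, so a repair five minutes after midnight still counts as a full day later.

```go
package main

import (
	"fmt"
	"time"
)

// utc is a small example helper.
func utc(year, month, day, hour, minute int) time.Time {
	return time.Date(year, time.Month(month), day, hour, minute, 0, 0, time.UTC)
}

// normalizeFailureDate parses a YYYY-MM-DD value (assumed UI format).
// time.Parse returns UTC midnight when the layout carries no zone.
func normalizeFailureDate(failureDate string) (time.Time, error) {
	return time.Parse("2006-01-02", failureDate)
}

func startOfDayUTC(t time.Time) time.Time {
	u := t.UTC()
	return time.Date(u.Year(), u.Month(), u.Day(), 0, 0, 0, 0, time.UTC)
}

// openDurationDays is the UTC day difference from the failure date to
// the repair date, or to now when the failure is not repaired yet.
func openDurationDays(failureTime time.Time, repairedAt *time.Time, now time.Time) int {
	end := now
	if repairedAt != nil {
		end = *repairedAt
	}
	return int(startOfDayUTC(end).Sub(startOfDayUTC(failureTime)) / (24 * time.Hour))
}

func main() {
	failure, err := normalizeFailureDate("2024-03-10")
	if err != nil {
		panic(err)
	}
	repaired := utc(2024, 3, 12, 0, 5)
	fmt.Println(openDurationDays(failure, &repaired, utc(2024, 3, 20, 0, 0))) // 2
	fmt.Println(openDurationDays(failure, nil, utc(2024, 3, 10, 23, 59)))     // 0
}
```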
Active failure / repair-by-replacement semantics (Failures UI):
- active failure = failure event without later replacement install in the same slot on the same server
- repair by replacement is derived from installations history: later install of a different component (part_id != failed part_id) into same machine_id + slot_name
- if slot at failure time cannot be resolved from installation history (e.g. manual failure date normalized to start of day), UI may fall back to current installation slot for display-only slot rendering in Active Failures
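A hedged sketch of the active failure / repair-by-replacement derivation described above. The struct fields mirror the column names mentioned in the text (part_id, machine_id, slot_name); the types and function names are hypothetical, and the display-only slot fallback is not modelled here.

```go
package main

import (
	"fmt"
	"time"
)

// utc is a small example helper.
func utc(year, month, day, hour, minute int) time.Time {
	return time.Date(year, time.Month(month), day, hour, minute, 0, 0, time.UTC)
}

type installation struct {
	PartID      int64
	MachineID   int64
	SlotName    string
	InstalledAt time.Time
}

type failureEvent struct {
	PartID      int64
	MachineID   int64
	SlotName    string
	FailureTime time.Time
}

// replacementFor returns the earliest later install of a different part
// into the same machine_id + slot_name, if any.
func replacementFor(f failureEvent, installs []installation) (installation, bool) {
	var best installation
	found := false
	for _, in := range installs {
		if in.MachineID != f.MachineID || in.SlotName != f.SlotName {
			continue
		}
		if in.PartID == f.PartID || !in.InstalledAt.After(f.FailureTime) {
			continue
		}
		if !found || in.InstalledAt.Before(best.InstalledAt) {
			best, found = in, true
		}
	}
	return best, found
}

// isActiveFailure is true when no replacement install followed the failure.
func isActiveFailure(f failureEvent, installs []installation) bool {
	_, repaired := replacementFor(f, installs)
	return !repaired
}

func main() {
	f := failureEvent{PartID: 1, MachineID: 10, SlotName: "DIMM_A1", FailureTime: utc(2024, 3, 10, 0, 0)}
	installs := []installation{
		{PartID: 1, MachineID: 10, SlotName: "DIMM_A1", InstalledAt: utc(2024, 1, 5, 9, 0)},
		{PartID: 2, MachineID: 10, SlotName: "DIMM_A1", InstalledAt: utc(2024, 3, 12, 14, 0)},
	}
	fmt.Println(isActiveFailure(f, installs))     // false
	fmt.Println(isActiveFailure(f, installs[:1])) // true
}
```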
First Seen Rules
parts.first_seen_at must be the earliest known ingest-derived component time.
Candidate sources:
- Parseable status_history[].changed_at
- status_changed_at
- status_checked_at
- eventFallbackTime(nil, ingestedAt, collectedAt)
Persistence rule:
- Keep the minimum value over time.
- If incoming is earlier than stored value, overwrite with incoming value.
- In history mode this is persisted as component metadata correction (COMPONENT_FIRST_SEEN_CORRECTED), then projected into parts.first_seen_at.
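The persistence rule amounts to keeping a running minimum. The sketch below uses hypothetical helper names (earliestCandidate, mergeFirstSeen). The boolean returned by mergeFirstSeen stands in for "a COMPONENT_FIRST_SEEN_CORRECTED event is needed", which is an interpretation of the rule rather than documented behaviour.

```go
package main

import (
	"fmt"
	"time"
)

// utc is a small example helper.
func utc(year, month, day, hour, minute int) time.Time {
	return time.Date(year, time.Month(month), day, hour, minute, 0, 0, time.UTC)
}

// earliestCandidate returns the minimum of the non-zero candidate times.
func earliestCandidate(candidates ...time.Time) (time.Time, bool) {
	var best time.Time
	found := false
	for _, c := range candidates {
		if c.IsZero() {
			continue
		}
		if !found || c.Before(best) {
			best, found = c, true
		}
	}
	return best, found
}

// mergeFirstSeen returns the value to keep and whether the stored value
// has to be overwritten (incoming is earlier, or nothing was stored).
func mergeFirstSeen(stored *time.Time, incoming time.Time) (time.Time, bool) {
	if stored == nil || incoming.Before(*stored) {
		return incoming, true
	}
	return *stored, false
}

func main() {
	stored := utc(2024, 2, 1, 0, 0)
	incoming, _ := earliestCandidate(utc(2024, 1, 15, 8, 0), utc(2024, 1, 20, 8, 0))

	value, corrected := mergeFirstSeen(&stored, incoming)
	fmt.Println(value.Equal(utc(2024, 1, 15, 8, 0)), corrected) // true true
}
```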
Duplicate Component Serial Rules (CSV + JSON Ingest)
If serial numbers are not unique within the same p/n (model) inside one ingest payload:
- First occurrence keeps original vendor_serial.
- Each next duplicate occurrence is assigned a service serial placeholder:
  - Format: NO_SN-XXXXXXXX (8-digit zero-padded global counter).
- If vendor_serial is empty, a service serial placeholder is assigned as well.
- Counter is global for the whole application and stored in id_sequences under entity_type = 'no_sn_placeholder'.
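As an illustration of the rules above, the following sketch assigns placeholders within one payload. The component struct and assignServiceSerials are hypothetical, and the counter is an in-memory closure here, whereas the text says the real counter lives in id_sequences.

```go
package main

import "fmt"

type component struct {
	Model        string
	VendorSerial string
}

// assignServiceSerials keeps the first occurrence of each model+serial
// pair and gives every later duplicate, and every empty serial, a
// NO_SN-XXXXXXXX placeholder taken from next().
func assignServiceSerials(items []component, next func() int64) []component {
	seen := map[[2]string]bool{}
	out := make([]component, len(items))
	for i, c := range items {
		key := [2]string{c.Model, c.VendorSerial}
		if c.VendorSerial == "" || seen[key] {
			c.VendorSerial = fmt.Sprintf("NO_SN-%08d", next())
		} else {
			seen[key] = true
		}
		out[i] = c
	}
	return out
}

func main() {
	var counter int64 // stand-in for the global id_sequences counter
	next := func() int64 { counter++; return counter }

	items := []component{
		{Model: "DIMM-16G", VendorSerial: "A1"},
		{Model: "DIMM-16G", VendorSerial: "A1"},
		{Model: "SSD-1T", VendorSerial: "A1"},
		{Model: "DIMM-16G", VendorSerial: ""},
	}
	for _, c := range assignServiceSerials(items, next) {
		fmt.Println(c.Model, c.VendorSerial)
	}
	// DIMM-16G A1
	// DIMM-16G NO_SN-00000001
	// SSD-1T A1
	// DIMM-16G NO_SN-00000002
}
```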
Component Health Computation In UI
Component health is derived only from the latest status event among:
- COMPONENT_FAILED
- COMPONENT_WARNING
- COMPONENT_UNKNOWN
- COMPONENT_OK
Non-status timeline events (INSTALLED, REMOVED, FIRMWARE_CHANGED, FIRMWARE_INSTALLED, etc.) must not change health status.
Status event mapping notes:
- COMPONENT_STATUS_SET with runtime.health_status = UNKNOWN must emit COMPONENT_UNKNOWN timeline event.
- Unknown/unrecognized status timeline events must never default to Healthy in UI state derivation.
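A minimal sketch of the health derivation rule. It assumes the timeline is passed oldest-first as a list of event type strings. The label strings other than Healthy, and the choice of Unknown when no status event exists, are assumptions made for the example.

```go
package main

import "fmt"

// statusHealth maps the four status event types to a UI label.
// Labels other than "Healthy" are illustrative.
var statusHealth = map[string]string{
	"COMPONENT_FAILED":  "Failed",
	"COMPONENT_WARNING": "Warning",
	"COMPONENT_UNKNOWN": "Unknown",
	"COMPONENT_OK":      "Healthy",
}

// deriveHealth walks the timeline from newest to oldest and returns the
// label of the latest status event. Non-status events are skipped, and
// the default is never "Healthy".
func deriveHealth(eventTypes []string) string {
	for i := len(eventTypes) - 1; i >= 0; i-- {
		if label, ok := statusHealth[eventTypes[i]]; ok {
			return label
		}
	}
	return "Unknown"
}

func main() {
	timeline := []string{"COMPONENT_OK", "COMPONENT_FAILED", "INSTALLED", "FIRMWARE_CHANGED"}
	fmt.Println(deriveHealth(timeline)) // Failed
	fmt.Println(deriveHealth(nil))      // Unknown
}
```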
Firmware Timeline Rules
For component firmware observations:
- First observed version -> FIRMWARE_INSTALLED (asset + component timeline pair)
- Later version change -> FIRMWARE_CHANGED (asset + component timeline pair)
- In history mode, firmware observations are persisted as:
  - component: COMPONENT_FIRMWARE_SET
  - asset device firmware: ASSET_FIRMWARE_DEVICE_SET
Storage details:
- FIRMWARE_INSTALLED stores transition string in timeline_events.firmware_version: - -> <installed_version>
- FIRMWARE_CHANGED stores installed/new firmware value
Detection details:
- Previous observation lookup: ORDER BY observed_at DESC, id DESC LIMIT 1 OFFSET 1
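The firmware rules above reduce to a small decision function. This sketch assumes the previous observed version is passed as a pointer (nil when there is no earlier observation) and reads the stored transition string as "- -> <installed_version>". Both the function name and that reading of the storage format are assumptions.

```go
package main

import "fmt"

// firmwareTimelineEvent decides which timeline event a firmware
// observation produces and what goes into timeline_events.firmware_version.
// previous is nil when no earlier observation exists.
func firmwareTimelineEvent(previous *string, current string) (eventType, stored string, emit bool) {
	switch {
	case previous == nil:
		// First observed version: store the transition string.
		return "FIRMWARE_INSTALLED", "- -> " + current, true
	case *previous != current:
		// Later version change: store the new value.
		return "FIRMWARE_CHANGED", current, true
	default:
		// Same version observed again: nothing to emit.
		return "", "", false
	}
}

func main() {
	eventType, stored, _ := firmwareTimelineEvent(nil, "1.0.2")
	fmt.Println(eventType, stored) // FIRMWARE_INSTALLED - -> 1.0.2

	prev := "1.0.2"
	eventType, stored, _ = firmwareTimelineEvent(&prev, "1.1.0")
	fmt.Println(eventType, stored) // FIRMWARE_CHANGED 1.1.0
}
```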
Install / Remove Flow (Cross-Entity)
Install/remove operations are applied as cross-entity history commands in one transaction.
Invariants:
- component and asset history events are both written
- both events share one correlation_id
- installations is updated as a projection
- asset/component timeline rows are emitted as paired projection events
- installation slot/location (installations.slot_name) is projection state synchronized from component installation snapshot field installation.slot_name
Asset Page Current Components Actions
All asset page component actions are history-backed and transactional.
- Add:
  - either creates a new component and attaches it to current asset, or attaches an existing component
  - duplicate preflight checks vendor_serial and vendor_serial + model
  - if existing component is installed on another asset and user confirms force attach, runtime executes a move (remove old asset + install new asset) in one transaction with shared correlation_id
- Edit:
  - applies bulk history patches to selected components currently attached to the asset
  - multi-select edits must reject unique fields (vendor_serial, slot)
- Remove:
  - remove is de-assert only (no detach-only mode)
  - runtime always performs:
    - COMPONENT_REMOVED (detach from asset)
    - COMPONENT_STATUS_SET with user-selected status mapped from working | not_working | unknown -> OK | FAILED | UNKNOWN
Log Collected Flow
- Hardware log collection is represented in history as ASSET_LOG_COLLECTED.
- UI/API timeline renders it as LOG_COLLECTED.
Delete / Rollback / Hard Restore Flows
Mutating history operations are asynchronous and DB-backed (history_recompute_jobs).
- Delete event:
  - soft-delete logical history event (is_deleted)
  - mark linked timeline rows deleted
  - enqueue recompute job
- Rollback (compensating):
  - async job creates a new rollback history event (*_ROLLBACK_APPLIED)
  - original history remains intact
- Hard restore (admin):
  - async job physically deletes future history rows after target snapshot/version
  - operation is recorded in history_admin_audit
- Batch cancel by source (admin):
  - preview lists matching history events by source_type and optional date/source-ref filters
  - if date_from/date_to are omitted, date filters do not constrain the result set
  - date-only filters are interpreted as inclusive day bounds (00:00:00 to 23:59:59)
  - execute soft-deletes matched events and enqueues generic recompute jobs for all affected assets/components
  - affected set expands from asset events to linked component entities via correlation_id
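The date-only filter rule for batch cancel can be sketched like this. The function name dayBounds, the YYYY-MM-DD input format, and the use of UTC are assumptions. A nil bound means the filter does not constrain the result set.

```go
package main

import (
	"fmt"
	"time"
)

const dateLayout = "2006-01-02"

// dayBounds turns optional date-only filters into inclusive bounds:
// date_from becomes 00:00:00 and date_to becomes 23:59:59 of that day.
// An empty filter yields a nil bound.
func dayBounds(dateFrom, dateTo string) (*time.Time, *time.Time, error) {
	var from, to *time.Time
	if dateFrom != "" {
		t, err := time.Parse(dateLayout, dateFrom)
		if err != nil {
			return nil, nil, fmt.Errorf("date_from: %w", err)
		}
		from = &t
	}
	if dateTo != "" {
		t, err := time.Parse(dateLayout, dateTo)
		if err != nil {
			return nil, nil, fmt.Errorf("date_to: %w", err)
		}
		end := t.Add(24*time.Hour - time.Second)
		to = &end
	}
	return from, to, nil
}

func main() {
	from, to, err := dayBounds("2024-01-05", "2024-01-05")
	if err != nil {
		panic(err)
	}
	fmt.Println(from.Format(time.RFC3339)) // 2024-01-05T00:00:00Z
	fmt.Println(to.Format(time.RFC3339))   // 2024-01-05T23:59:59Z
}
```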
Data Repair And Cleanup (Admin)
- Repair All is an operational best-effort flow:
  - may restore missing timeline_events.slot_name from observations (repair/backfill only)
  - may enqueue recompute jobs for affected entities
  - retries transient DB connection errors and continues on per-entity failures (best effort)
- Cleanup Orphaned Projections removes registry/projection/raw rows for entities that have no active history left after cancellations.
- observations usage in repair flows is allowed only as one-time recovery input; runtime state must still come from history + projections.
Recompute Scope And Propagation
- Component recompute rebuilds component projections (parts, timeline_events, installations, failure_events).
- Asset recompute rebuilds asset projections (machines, machine_firmware_states, timeline_events) and then triggers recompute for linked components to restore cross-entity projection consistency.
- Asset delete/hard-restore propagates to correlated component history events via correlation_id.
MySQL Tx Cursor Safety (Critical)
For MySQL/MariaDB transactional code, do not execute additional SQL on the same tx while iterating an open result set (for rows.Next()).
Observed failure signature when violating this rule:
- driver logs: [mysql] ... invalid connection, unexpected EOF
- app errors: driver: bad connection
- DB warnings: Aborted connection ... Got an error reading communication packets
Required pattern:
- Read all rows into in-memory structs.
- Close the cursor.
- Perform follow-up QueryRowContext/ExecContext writes and projection rebuild logic.
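The required pattern, sketched with database/sql. The queries are placeholders: the parts table appears in this document, but the id and updated_at columns and the function names are assumptions. What matters is that loadPartIDs drains and closes the cursor before any further statement runs on the same tx.

```go
package main

import (
	"context"
	"database/sql"
)

type partRow struct {
	ID int64
}

// loadPartIDs is phase 1: read every row into memory. The deferred
// Close releases the cursor before the caller issues more SQL.
func loadPartIDs(ctx context.Context, tx *sql.Tx) ([]partRow, error) {
	rows, err := tx.QueryContext(ctx, "SELECT id FROM parts")
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var parts []partRow
	for rows.Next() {
		var p partRow
		if err := rows.Scan(&p.ID); err != nil {
			return nil, err
		}
		parts = append(parts, p)
	}
	if err := rows.Err(); err != nil {
		return nil, err
	}
	return parts, nil
}

// rebuildParts is phase 2: follow-up writes run only after the result
// set from phase 1 has been closed.
func rebuildParts(ctx context.Context, tx *sql.Tx) error {
	parts, err := loadPartIDs(ctx, tx)
	if err != nil {
		return err
	}
	for _, p := range parts {
		// Hypothetical write; real code would rebuild projections here.
		if _, err := tx.ExecContext(ctx, "UPDATE parts SET updated_at = NOW() WHERE id = ?", p.ID); err != nil {
			return err
		}
	}
	return nil
}

// main is empty: wiring a driver and a live database is out of scope.
func main() {}
```

The anti-pattern this replaces is calling tx.ExecContext inside the for rows.Next() loop, which issues a second statement on a connection that is still streaming the first result set.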
This rule is mandatory for recompute/rebuild flows and any projection repair routines.
Timeline API Grouping
- History timeline endpoints group events by day by default and return timeline cards:
  - single (one event)
  - dedup (same visual action + context in one day)
  - bulk (mass operation correlated by correlation_id)
- Asset timeline adds a fallback synthetic bulk for movement noise reduction:
  - if component_installed/component_removed events do not have correlation_id,
  - they may be collapsed into a synthetic bulk card by visual_action + source_type + 1h time bucket
  - this is a UI/read-model aggregation rule only (raw timeline rows remain unchanged)
- Default timezone for grouping is UTC; callers may override with tz.
- Timeline card drilldown resolves to concrete events and history event details through history API endpoints and must use the same tz as card grouping when matching card day buckets.
- Timeline cards display source labels from history source_type (ingest_json, ingest_csv, user, system).
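A sketch of the two bucketing rules above: the per-day card bucket with an optional tz override, and the 1h key for synthetic bulk cards. Function names and key formats are hypothetical, and tz is assumed to be an IANA zone name. The time/tzdata import embeds the zone database so the example does not depend on the host system.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embedded zone database for LoadLocation
)

// utc is a small example helper.
func utc(year, month, day, hour, minute int) time.Time {
	return time.Date(year, time.Month(month), day, hour, minute, 0, 0, time.UTC)
}

// dayBucket returns the YYYY-MM-DD card day for an event.
// An empty tz means the default, UTC.
func dayBucket(t time.Time, tz string) (string, error) {
	if tz == "" {
		tz = "UTC"
	}
	loc, err := time.LoadLocation(tz)
	if err != nil {
		return "", err
	}
	return t.In(loc).Format("2006-01-02"), nil
}

// syntheticBulkKey groups uncorrelated install/remove events by
// visual_action + source_type + 1h time bucket.
func syntheticBulkKey(visualAction, sourceType string, t time.Time) string {
	return fmt.Sprintf("%s|%s|%d", visualAction, sourceType, t.UTC().Truncate(time.Hour).Unix())
}

func main() {
	event := utc(2024, 3, 10, 20, 30)

	day, _ := dayBucket(event, "")
	fmt.Println(day) // 2024-03-10
	day, _ = dayBucket(event, "Asia/Tokyo")
	fmt.Println(day) // 2024-03-11

	a := syntheticBulkKey("component_installed", "ingest_json", utc(2024, 3, 10, 20, 5))
	b := syntheticBulkKey("component_installed", "ingest_json", utc(2024, 3, 10, 20, 55))
	fmt.Println(a == b) // true
}
```

Because the same instant can fall on different days in different zones, drilldown has to pass the tz that was used for grouping, as the rule above requires.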
Timeline Color Semantics
- REMOVED -> yellow
- COMPONENT_FAILED -> red
- COMPONENT_UNKNOWN -> gray
- COMPONENT_WARNING and related warning semantics follow timelineEventClass
Regression Guardrails
Do not reintroduce these regressions:
- Using ingest timestamp when payload provides better event/failure timestamp
- Letting INSTALLED mark failed components as healthy
- Missing Previous Components section on asset page
- Missing installation history on component page
- Missing firmware information on component page timeline
- Writing ingest state transitions directly to projections/timeline while bypassing history apply
- Creating duplicate history events for semantic no-op updates
- Executing nested SQL on the same transaction while rows.Next() cursor is still open (must use two-phase read-then-write)