core/bible-local/architecture/runtime-flows.md
Michael Chus c5d253a9df Add shared bible submodule, rename local bible to bible-local
- Add bible.git as submodule at bible/
- Rename bible/ → bible-local/ (project-specific architecture)
- Update CLAUDE.md to reference both bible/ and bible-local/
- Add AGENTS.md for Codex with same structure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 16:37:10 +03:00


# Runtime Flows And Invariants
## History-First Mutation Path
All state-changing mutations for assets/components must go through the history domain layer.
Sources covered:
- user registry edits
- user asset-page component actions (`Add / Edit / Remove`)
- hardware JSON ingest
- manual CSV ingest
History apply transaction invariants:
1. lock current projection row (`parts` or `machines`)
2. load latest snapshot (or bootstrap from projection)
3. apply validated patch/domain command
4. semantic dedupe (`after_hash == before_hash` => no-op, no event/snapshot/timeline)
5. write history event + snapshot
6. update projections
7. write timeline projection rows
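
Steps 3 and 4 above can be sketched in memory as follows. This is a minimal illustration, not the project's implementation: the `Snapshot` type, `snapshotHash`, and `applyPatch` are hypothetical names, and hashing canonical JSON with SHA-256 is an assumed way to obtain `before_hash` / `after_hash`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// Snapshot is a hypothetical stand-in for a component/asset snapshot.
type Snapshot map[string]string

// snapshotHash returns a stable hash of a snapshot. encoding/json sorts
// map keys, so equal snapshots always produce equal hashes.
func snapshotHash(s Snapshot) (string, error) {
	if s == nil {
		s = Snapshot{} // hash nil and empty snapshots identically
	}
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// applyPatch copies the snapshot, applies the patch, and reports whether the
// result differs. changed == false is the semantic no-op case: the caller
// must not write a history event, snapshot, or timeline rows.
func applyPatch(before Snapshot, patch map[string]string) (Snapshot, bool, error) {
	after := make(Snapshot, len(before)+len(patch))
	for k, v := range before {
		after[k] = v
	}
	for k, v := range patch {
		after[k] = v
	}
	beforeHash, err := snapshotHash(before)
	if err != nil {
		return nil, false, err
	}
	afterHash, err := snapshotHash(after)
	if err != nil {
		return nil, false, err
	}
	return after, afterHash != beforeHash, nil
}
```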
Direct projection writes are allowed only inside history recompute/rebuild flows.
`observations` remains raw ingest trace only:
- runtime current-state reads (UI/API/history projection metadata) must use history + projections (`parts`, `machines`, `installations`, `timeline_events`, `machine_firmware_states`)
- `observations` may be used for ingest forensics, audits, and backfill/migration only
## Event Time Source Priority
Use component event time over ingest time whenever possible.
- `eventFallbackTime(actual, ingestedAt, collectedAt)`:
  1. `actual`
  2. `ingestedAt`
  3. `collectedAt`
- `collectedFallbackTime(collectedAt, ingestedAt)`:
  1. `collectedAt`
  2. `ingestedAt`
## Status Event Time Parsing Order
`parseComponentStatusEventTime` resolves time in this order:
1. `status_changed_at`
2. Latest matching `status_history` item for current status
3. Latest parseable `status_history` item
4. `status_checked_at`
## Failure Event Rules
For critical components:
- Timeline event type: `COMPONENT_FAILED`
- `failure_events.failure_time` uses resolved failure time (not raw ingest time)
- `failure_events.external_id` includes the same failure timestamp
- `failure_events` is a projection and may be rebuilt from component history (`COMPONENT_STATUS_SET`)
Manual UI failure registration invariants:
- `POST /failures` is history-backed for component status (`COMPONENT_STATUS_SET` -> `FAILED`)
- manual failure registration writes status history event and `failure_events` projection in one DB transaction
- manual failures use a dedicated projection source (`manual_ui`) so history recompute cleanup for ingest/history sources does not delete them
- `failure_date` from UI is normalized to start-of-day UTC timestamp for `failure_events.failure_time` in v1
- failure detail `open duration` is computed with date precision (UTC day difference from failure date to repair date / current date), not timestamp precision
Active failure / repair-by-replacement semantics (Failures UI):
- active failure = failure event without later replacement install in the same slot on the same server
- repair by replacement is derived from `installations` history: later install of a different component (`part_id != failed part_id`) into same `machine_id + slot_name`
- if the slot at failure time cannot be resolved from installation history (e.g. manual failure date normalized to start of day), the UI may fall back to the current installation slot for display-only slot rendering in `Active Failures`
## First Seen Rules
`parts.first_seen_at` must be the earliest known ingest-derived component time.
Candidate sources:
1. Parseable `status_history[].changed_at`
2. `status_changed_at`
3. `status_checked_at`
4. `eventFallbackTime(nil, ingestedAt, collectedAt)`
Persistence rule:
- Keep the minimum value over time.
- If incoming is earlier than stored value, overwrite with incoming value.
- In history mode this is persisted as component metadata correction (`COMPONENT_FIRST_SEEN_CORRECTED`), then projected into `parts.first_seen_at`.
## Duplicate Component Serial Rules (CSV + JSON Ingest)
If serial numbers are not unique within the same `p/n` (`model`) inside one ingest payload:
- First occurrence keeps original `vendor_serial`.
- Each next duplicate occurrence is assigned a service serial placeholder:
  - Format: `NO_SN-XXXXXXXX` (8-digit zero-padded global counter).
- If `vendor_serial` is empty, a service serial placeholder is assigned as well.
- Counter is global for the whole application and stored in `id_sequences` under `entity_type = 'no_sn_placeholder'`.
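
The placeholder assignment for one ingest payload might be sketched like this. The `Row` type and `assignPlaceholders` are hypothetical, and the `next` callback is an assumed stand-in for reading the global counter from `id_sequences`.

```go
package main

import "fmt"

// Row is a hypothetical ingest row reduced to the fields used here.
type Row struct {
	Model        string
	VendorSerial string
}

// assignPlaceholders rewrites empty serials and repeated serials (within
// the same model) to NO_SN-XXXXXXXX placeholders, in payload order.
// The first occurrence of each model+serial pair keeps its serial.
func assignPlaceholders(rows []Row, next func() uint64) {
	type key struct{ model, serial string }
	seen := make(map[key]bool, len(rows))
	for i := range rows {
		k := key{rows[i].Model, rows[i].VendorSerial}
		if rows[i].VendorSerial == "" || seen[k] {
			rows[i].VendorSerial = fmt.Sprintf("NO_SN-%08d", next())
			continue
		}
		seen[k] = true
	}
}
```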
## Component Health Computation In UI
Component health is derived only from the latest status event among:
- `COMPONENT_FAILED`
- `COMPONENT_WARNING`
- `COMPONENT_UNKNOWN`
- `COMPONENT_OK`
Non-status timeline events (`INSTALLED`, `REMOVED`, `FIRMWARE_CHANGED`, `FIRMWARE_INSTALLED`, etc.) must not change health status.
Status event mapping notes:
- `COMPONENT_STATUS_SET` with `runtime.health_status = UNKNOWN` must emit `COMPONENT_UNKNOWN` timeline event.
- Unknown/unrecognized status timeline events must never default to `Healthy` in UI state derivation.
## Firmware Timeline Rules
For component firmware observations:
- First observed version -> `FIRMWARE_INSTALLED` (asset + component timeline pair)
- Later version change -> `FIRMWARE_CHANGED` (asset + component timeline pair)
- In history mode, firmware observations are persisted as:
  - component: `COMPONENT_FIRMWARE_SET`
  - asset device firmware: `ASSET_FIRMWARE_DEVICE_SET`
Storage details:
- `FIRMWARE_INSTALLED` stores transition string in `timeline_events.firmware_version`: `- -> <installed_version>`
- `FIRMWARE_CHANGED` stores installed/new firmware value
Detection details:
- Previous observation lookup: `ORDER BY observed_at DESC, id DESC LIMIT 1 OFFSET 1`
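
The event selection and the stored `firmware_version` value can be sketched as a pure function. `firmwareTimelineEvent` is a hypothetical name; representing "no previous observation" as an empty string, and emitting nothing for an empty or unchanged version, are assumptions.

```go
package main

// firmwareTimelineEvent decides which timeline event a firmware observation
// produces and what is stored in timeline_events.firmware_version.
// previous == "" means no earlier observation exists.
func firmwareTimelineEvent(previous, current string) (eventType, storedVersion string, emit bool) {
	switch {
	case current == "" || current == previous:
		return "", "", false // nothing observed or nothing changed
	case previous == "":
		return "FIRMWARE_INSTALLED", "- -> " + current, true
	default:
		return "FIRMWARE_CHANGED", current, true
	}
}
```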
## Install / Remove Flow (Cross-Entity)
Install/remove operations are applied as cross-entity history commands in one transaction.
Invariants:
- component and asset history events are both written
- both events share one `correlation_id`
- `installations` is updated as a projection
- asset/component timeline rows are emitted as paired projection events
- installation slot/location (`installations.slot_name`) is projection state synchronized from component installation snapshot field `installation.slot_name`
## Asset Page Current Components Actions
All asset page component actions are history-backed and transactional.
- `Add`:
  - either creates a new component and attaches it to current asset, or attaches an existing component
  - duplicate preflight checks `vendor_serial` and `vendor_serial + model`
  - if existing component is installed on another asset and user confirms force attach, runtime executes a move (remove old asset + install new asset) in one transaction with shared `correlation_id`
- `Edit`:
  - applies bulk history patches to selected components currently attached to the asset
  - multi-select edits must reject unique fields (`vendor_serial`, `slot`)
- `Remove`:
  - remove is **de-assert only** (no detach-only mode)
  - runtime always performs:
    1. `COMPONENT_REMOVED` (detach from asset)
    2. `COMPONENT_STATUS_SET` with user-selected status mapped from `working | not_working | unknown` -> `OK | FAILED | UNKNOWN`
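
The status mapping used by `Remove` is small enough to show directly. `mapRemovalStatus` is a hypothetical name, and rejecting any other input with an error is an assumption.

```go
package main

import "fmt"

// mapRemovalStatus maps the user-selected removal status to the value
// carried by COMPONENT_STATUS_SET.
func mapRemovalStatus(selected string) (string, error) {
	switch selected {
	case "working":
		return "OK", nil
	case "not_working":
		return "FAILED", nil
	case "unknown":
		return "UNKNOWN", nil
	default:
		return "", fmt.Errorf("unsupported removal status %q", selected)
	}
}
```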
## Log Collected Flow
- Hardware log collection is represented in history as `ASSET_LOG_COLLECTED`.
- UI/API timeline renders it as `LOG_COLLECTED`.
## Delete / Rollback / Hard Restore Flows
Mutating history operations are asynchronous and DB-backed (`history_recompute_jobs`).
- Delete event:
  - soft-delete logical history event (`is_deleted`)
  - mark linked timeline rows deleted
  - enqueue recompute job
- Rollback (compensating):
  - async job creates a new rollback history event (`*_ROLLBACK_APPLIED`)
  - original history remains intact
- Hard restore (admin):
  - async job physically deletes future history rows after target snapshot/version
  - operation is recorded in `history_admin_audit`
- Batch cancel by source (admin):
  - preview lists matching history events by `source_type` and optional date/source-ref filters
  - if `date_from` / `date_to` are omitted, date filters do not constrain the result set
  - date-only filters are interpreted as inclusive day bounds (`00:00:00` to `23:59:59`)
  - execute soft-deletes matched events and enqueues generic recompute jobs for all affected assets/components
  - affected set expands from asset events to linked component entities via `correlation_id`
## Data Repair And Cleanup (Admin)
- `Repair All` is an operational best-effort flow:
  - may restore missing `timeline_events.slot_name` from `observations` (repair/backfill only)
  - may enqueue recompute jobs for affected entities
  - retries transient DB connection errors and continues on per-entity failures (best effort)
- `Cleanup Orphaned Projections` removes registry/projection/raw rows for entities that have no active history left after cancellations.
- `observations` usage in repair flows is allowed only as one-time recovery input; runtime state must still come from history + projections.
## Recompute Scope And Propagation
- Component recompute rebuilds component projections (`parts`, `timeline_events`, `installations`, `failure_events`).
- Asset recompute rebuilds asset projections (`machines`, `machine_firmware_states`, `timeline_events`) and then triggers recompute for linked components to restore cross-entity projection consistency.
- Asset delete/hard-restore propagates to correlated component history events via `correlation_id`.
## MySQL Tx Cursor Safety (Critical)
For MySQL/MariaDB transactional code, do not execute additional SQL on the same `tx` while iterating an open result set (`for rows.Next()`).
Observed failure signature when violating this rule:
- driver logs: `[mysql] ... invalid connection`, `unexpected EOF`
- app errors: `driver: bad connection`
- DB warnings: `Aborted connection ... Got an error reading communication packets`
Required pattern:
1. Read all rows into in-memory structs.
2. Close the cursor.
3. Perform follow-up `QueryRowContext` / `ExecContext` writes and projection rebuild logic.
This rule is mandatory for recompute/rebuild flows and any projection repair routines.
## Timeline API Grouping
- History timeline endpoints group events by day by default and return **timeline cards**:
  - `single` (one event)
  - `dedup` (same visual action + context in one day)
  - `bulk` (mass operation correlated by `correlation_id`)
- Asset timeline adds a fallback synthetic bulk for movement noise reduction:
  - if `component_installed` / `component_removed` events do not have `correlation_id`, they may be collapsed into a synthetic bulk card by `visual_action + source_type + 1h time bucket`
  - this is a UI/read-model aggregation rule only (raw timeline rows remain unchanged)
- Default timezone for grouping is `UTC`; callers may override with `tz`.
- Timeline card drilldown resolves to concrete events and history event details through history API endpoints and must use the same `tz` as card grouping when matching card day buckets.
- Timeline cards display source labels from history `source_type` (`ingest_json`, `ingest_csv`, `user`, `system`).
## Timeline Color Semantics
- `REMOVED` -> yellow
- `COMPONENT_FAILED` -> red
- `COMPONENT_UNKNOWN` -> gray
- `COMPONENT_WARNING` and related warning semantics follow `timelineEventClass`
## Regression Guardrails
Do not reintroduce these regressions:
- Using ingest timestamp when payload provides better event/failure timestamp
- Letting `INSTALLED` mark failed components as healthy
- Missing `Previous Components` section on asset page
- Missing installation history on component page
- Missing firmware information on component page timeline
- Writing ingest state transitions directly to projections/timeline while bypassing history apply
- Creating duplicate history events for semantic no-op updates
- Executing nested SQL on the same transaction while `rows.Next()` cursor is still open (must use two-phase read-then-write)