# Runtime Flows And Invariants

## History-First Mutation Path

All state-changing mutations for assets/components must go through the history domain layer.

Sources covered:
- user registry edits
- user asset-page component actions (`Add / Edit / Remove`)
- hardware JSON ingest
- manual CSV ingest

History apply transaction invariants:
1. lock current projection row (`parts` or `machines`)
2. load latest snapshot (or bootstrap from projection)
3. apply validated patch/domain command
4. semantic dedupe (`after_hash == before_hash` => no-op, no event/snapshot/timeline)
5. write history event + snapshot
6. update projections
7. write timeline projection rows

Direct projection writes are allowed only inside history recompute/rebuild flows.

`observations` remains raw ingest trace only:
- runtime current-state reads (UI/API/history projection metadata) must use history + projections (`parts`, `machines`, `installations`, `timeline_events`, `machine_firmware_states`)
- `observations` may be used for ingest forensics, audits, and backfill/migration only

## Event Time Source Priority

Use component event time over ingest time whenever possible.

- `eventFallbackTime(actual, ingestedAt, collectedAt)`:
  1. `actual`
  2. `ingestedAt`
  3. `collectedAt`
- `collectedFallbackTime(collectedAt, ingestedAt)`:
  1. `collectedAt`
  2. `ingestedAt`

## Status Event Time Parsing Order

`parseComponentStatusEventTime` resolves time in this order:
1. `status_changed_at`
2. Latest matching `status_history` item for current status
3. Latest parseable `status_history` item
4. `status_checked_at`

## Failure Event Rules

For critical components:
- Timeline event type: `COMPONENT_FAILED`
- `failure_events.failure_time` uses resolved failure time (not raw ingest time)
- `failure_events.external_id` includes the same failure timestamp
- `failure_events` is a projection and may be rebuilt from component history (`COMPONENT_STATUS_SET`)

Manual UI failure registration invariants:
- `POST /failures` is history-backed for component status (`COMPONENT_STATUS_SET` -> `FAILED`)
- manual failure registration writes status history event and `failure_events` projection in one DB transaction
- manual failures use a dedicated projection source (`manual_ui`) so history recompute cleanup for ingest/history sources does not delete them
- `failure_date` from UI is normalized to start-of-day UTC timestamp for `failure_events.failure_time` in v1
- failure detail `open duration` is computed with date precision (UTC day difference from failure date to repair date / current date), not timestamp precision

Active failure / repair-by-replacement semantics (Failures UI):
- active failure = failure event without later replacement install in the same slot on the same server
- repair by replacement is derived from `installations` history: later install of a different component (`part_id != failed part_id`) into same `machine_id + slot_name`
- if slot at failure time cannot be resolved from installation history (e.g. manual failure date normalized to start of day), UI may fall back to current installation slot for display-only slot rendering in `Active Failures`

## First Seen Rules

`parts.first_seen_at` must be the earliest known ingest-derived component time.

Candidate sources:
1. Parseable `status_history[].changed_at`
2. `status_changed_at`
3. `status_checked_at`
4. `eventFallbackTime(nil, ingestedAt, collectedAt)`

Persistence rule:
- Keep the minimum value over time.
- If incoming is earlier than stored value, overwrite with incoming value.
- In history mode this is persisted as component metadata correction (`COMPONENT_FIRST_SEEN_CORRECTED`), then projected into `parts.first_seen_at`.

## Duplicate Component Serial Rules (CSV + JSON Ingest)

If serial numbers are not unique within the same `p/n` (`model`) inside one ingest payload:
- First occurrence keeps original `vendor_serial`.
- Each next duplicate occurrence is assigned a service serial placeholder:
  - Format: `NO_SN-XXXXXXXX` (8-digit zero-padded global counter).
- If `vendor_serial` is empty, a service serial placeholder is assigned as well.
- Counter is global for the whole application and stored in `id_sequences` under `entity_type = 'no_sn_placeholder'`.

## Component Health Computation In UI

Component health is derived only from the latest status event among:
- `COMPONENT_FAILED`
- `COMPONENT_WARNING`
- `COMPONENT_UNKNOWN`
- `COMPONENT_OK`

Non-status timeline events (`INSTALLED`, `REMOVED`, `FIRMWARE_CHANGED`, `FIRMWARE_INSTALLED`, etc.) must not change health status.

Status event mapping notes:
- `COMPONENT_STATUS_SET` with `runtime.health_status = UNKNOWN` must emit `COMPONENT_UNKNOWN` timeline event.
- Unknown/unrecognized status timeline events must never default to `Healthy` in UI state derivation.
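
The health derivation rule can be sketched in Go as below. This is a minimal illustration, not the project's actual UI code: the `Health` labels other than `Healthy`, the `TimelineEvent` struct, the `DeriveHealth` and `at` helpers, and the tie-break on equal timestamps are all assumptions made for the example. Event types outside the four status types are skipped, and a timeline with no status event yields `Unknown` instead of `Healthy`.

```go
package main

import (
	"fmt"
	"time"
)

// Health is the UI health state. Only "Healthy" is named in the text above;
// the other labels are assumptions made for this sketch.
type Health string

const (
	Healthy Health = "Healthy"
	Warning Health = "Warning"
	Failed  Health = "Failed"
	Unknown Health = "Unknown"
)

// TimelineEvent is a hypothetical, reduced view of a timeline row.
type TimelineEvent struct {
	Type string
	At   time.Time
}

// statusHealth lists the only event types allowed to change health.
var statusHealth = map[string]Health{
	"COMPONENT_FAILED":  Failed,
	"COMPONENT_WARNING": Warning,
	"COMPONENT_UNKNOWN": Unknown,
	"COMPONENT_OK":      Healthy,
}

// DeriveHealth returns the health implied by the latest status event.
// Events of any other type are skipped, and with no status event at all the
// result is Unknown rather than Healthy.
func DeriveHealth(events []TimelineEvent) Health {
	result := Unknown
	var latest time.Time
	found := false
	for _, e := range events {
		h, ok := statusHealth[e.Type]
		if !ok {
			continue // INSTALLED, REMOVED, FIRMWARE_CHANGED, ... never touch health
		}
		// On equal timestamps the later slice element wins (an assumption).
		if !found || !e.At.Before(latest) {
			result, latest, found = h, e.At, true
		}
	}
	return result
}

// at builds a fixed UTC timestamp for the example below.
func at(hour int) time.Time {
	return time.Date(2024, 1, 1, hour, 0, 0, 0, time.UTC)
}

func main() {
	events := []TimelineEvent{
		{Type: "COMPONENT_FAILED", At: at(1)},
		{Type: "INSTALLED", At: at(2)}, // must not mark the part healthy
	}
	fmt.Println(DeriveHealth(events)) // prints "Failed"
}
```

Keeping the allowed status types in one lookup table means a newly introduced timeline event type cannot change health by accident; it has to be added to the table explicitly.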
## Firmware Timeline Rules

For component firmware observations:
- First observed version -> `FIRMWARE_INSTALLED` (asset + component timeline pair)
- Later version change -> `FIRMWARE_CHANGED` (asset + component timeline pair)
- In history mode, firmware observations are persisted as:
  - component: `COMPONENT_FIRMWARE_SET`
  - asset device firmware: `ASSET_FIRMWARE_DEVICE_SET`

Storage details:
- `FIRMWARE_INSTALLED` stores transition string in `timeline_events.firmware_version`: `- -> `
- `FIRMWARE_CHANGED` stores installed/new firmware value

Detection details:
- Previous observation lookup: `ORDER BY observed_at DESC, id DESC LIMIT 1 OFFSET 1`

## Install / Remove Flow (Cross-Entity)

Install/remove operations are applied as cross-entity history commands in one transaction.

Invariants:
- component and asset history events are both written
- both events share one `correlation_id`
- `installations` is updated as a projection
- asset/component timeline rows are emitted as paired projection events
- installation slot/location (`installations.slot_name`) is projection state synchronized from component installation snapshot field `installation.slot_name`

## Asset Page Current Components Actions

All asset page component actions are history-backed and transactional.

- `Add`:
  - either creates a new component and attaches it to current asset, or attaches an existing component
  - duplicate preflight checks `vendor_serial` and `vendor_serial + model`
  - if existing component is installed on another asset and user confirms force attach, runtime executes a move (remove old asset + install new asset) in one transaction with shared `correlation_id`
- `Edit`:
  - applies bulk history patches to selected components currently attached to the asset
  - multi-select edits must reject unique fields (`vendor_serial`, `slot`)
- `Remove`:
  - remove is **de-assert only** (no detach-only mode)
  - runtime always performs:
    1. `COMPONENT_REMOVED` (detach from asset)
    2. `COMPONENT_STATUS_SET` with user-selected status mapped from `working | not_working | unknown` -> `OK | FAILED | UNKNOWN`

## Log Collected Flow

- Hardware log collection is represented in history as `ASSET_LOG_COLLECTED`.
- UI/API timeline renders it as `LOG_COLLECTED`.

## Delete / Rollback / Hard Restore Flows

Mutating history operations are asynchronous and DB-backed (`history_recompute_jobs`).

- Delete event:
  - soft-delete logical history event (`is_deleted`)
  - mark linked timeline rows deleted
  - enqueue recompute job
- Rollback (compensating):
  - async job creates a new rollback history event (`*_ROLLBACK_APPLIED`)
  - original history remains intact
- Hard restore (admin):
  - async job physically deletes future history rows after target snapshot/version
  - operation is recorded in `history_admin_audit`
- Batch cancel by source (admin):
  - preview lists matching history events by `source_type` and optional date/source-ref filters
  - if `date_from` / `date_to` are omitted, date filters do not constrain the result set
  - date-only filters are interpreted as inclusive day bounds (`00:00:00` to `23:59:59`)
  - execute soft-deletes matched events and enqueues generic recompute jobs for all affected assets/components
  - affected set expands from asset events to linked component entities via `correlation_id`

## Data Repair And Cleanup (Admin)

- `Repair All` is an operational best-effort flow:
  - may restore missing `timeline_events.slot_name` from `observations` (repair/backfill only)
  - may enqueue recompute jobs for affected entities
  - retries transient DB connection errors and continues on per-entity failures (best effort)
- `Cleanup Orphaned Projections` removes registry/projection/raw rows for entities that have no active history left after cancellations.
- `observations` usage in repair flows is allowed only as one-time recovery input; runtime state must still come from history + projections.
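
The best-effort behaviour of `Repair All` (retry transient connection errors, continue past per-entity failures) might look roughly like the Go sketch below. All names here (`repairAll`, `isTransient`, `demoRepair`, the `part-N` IDs, the attempt limit) are hypothetical, and treating only `driver.ErrBadConn` from the standard `database/sql/driver` package as transient is an assumption; a real implementation would probably add a backoff between attempts.

```go
package main

import (
	"database/sql/driver"
	"errors"
	"fmt"
)

// isTransient reports whether err looks like a dropped connection.
// Treating only driver.ErrBadConn as transient is an assumption.
func isTransient(err error) bool {
	return errors.Is(err, driver.ErrBadConn)
}

// repairAll calls repair once per entity ID. Transient errors are retried up
// to maxAttempts times; any other error is recorded and the loop moves on.
func repairAll(ids []string, maxAttempts int, repair func(id string) error) map[string]error {
	failed := make(map[string]error)
	for _, id := range ids {
		var err error
		for attempt := 1; attempt <= maxAttempts; attempt++ {
			if err = repair(id); err == nil || !isTransient(err) {
				break
			}
		}
		if err != nil {
			failed[id] = err // best effort: remember the failure, keep going
		}
	}
	return failed
}

// demoRepair runs repairAll against an in-memory stand-in for the real repair step.
func demoRepair() (failed map[string]error, calls map[string]int) {
	calls = make(map[string]int)
	repair := func(id string) error {
		calls[id]++
		switch {
		case id == "part-1" && calls[id] == 1:
			return fmt.Errorf("restore slot_name: %w", driver.ErrBadConn) // succeeds on retry
		case id == "part-2":
			return errors.New("no observation with slot_name") // permanent failure
		}
		return nil
	}
	failed = repairAll([]string{"part-1", "part-2", "part-3"}, 3, repair)
	return failed, calls
}

func main() {
	failed, calls := demoRepair()
	fmt.Println(len(failed), calls["part-1"], calls["part-2"], calls["part-3"]) // prints "1 2 1 1"
}
```

Returning the per-entity errors instead of aborting lets the caller report which entities still need attention after the run.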
## Recompute Scope And Propagation

- Component recompute rebuilds component projections (`parts`, `timeline_events`, `installations`, `failure_events`).
- Asset recompute rebuilds asset projections (`machines`, `machine_firmware_states`, `timeline_events`) and then triggers recompute for linked components to restore cross-entity projection consistency.
- Asset delete/hard-restore propagates to correlated component history events via `correlation_id`.

## MySQL Tx Cursor Safety (Critical)

For MySQL/MariaDB transactional code, do not execute additional SQL on the same `tx` while iterating an open result set (`for rows.Next()`).

Observed failure signature when violating this rule:
- driver logs: `[mysql] ... invalid connection`, `unexpected EOF`
- app errors: `driver: bad connection`
- DB warnings: `Aborted connection ... Got an error reading communication packets`

Required pattern:
1. Read all rows into in-memory structs.
2. Close the cursor.
3. Perform follow-up `QueryRowContext` / `ExecContext` writes and projection rebuild logic.

This rule is mandatory for recompute/rebuild flows and any projection repair routines.

## Timeline API Grouping

- History timeline endpoints group events by day by default and return **timeline cards**:
  - `single` (one event)
  - `dedup` (same visual action + context in one day)
  - `bulk` (mass operation correlated by `correlation_id`)
- Asset timeline adds a fallback synthetic bulk for movement noise reduction:
  - if `component_installed` / `component_removed` events do not have `correlation_id`,
  - they may be collapsed into a synthetic bulk card by `visual_action + source_type + 1h time bucket`
  - this is a UI/read-model aggregation rule only (raw timeline rows remain unchanged)
- Default timezone for grouping is `UTC`; callers may override with `tz`.
- Timeline card drilldown resolves to concrete events and history event details through history API endpoints and must use the same `tz` as card grouping when matching card day buckets.
- Timeline cards display source labels from history `source_type` (`ingest_json`, `ingest_csv`, `user`, `system`).

## Timeline Color Semantics

- `REMOVED` -> yellow
- `COMPONENT_FAILED` -> red
- `COMPONENT_UNKNOWN` -> gray
- `COMPONENT_WARNING` and related warning semantics follow `timelineEventClass`

## Regression Guardrails

Do not reintroduce these regressions:
- Using ingest timestamp when payload provides better event/failure timestamp
- Letting `INSTALLED` mark failed components as healthy
- Missing `Previous Components` section on asset page
- Missing installation history on component page
- Missing firmware information on component page timeline
- Writing ingest state transitions directly to projections/timeline while bypassing history apply
- Creating duplicate history events for semantic no-op updates
- Executing nested SQL on the same transaction while `rows.Next()` cursor is still open (must use two-phase read-then-write)
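
The two-phase read-then-write guardrail can be illustrated with the Go sketch below. It is written against two small interfaces that `*sql.Rows` and `*sql.Tx` from `database/sql` satisfy, so the ordering can be demonstrated without a database. The `partRow` struct, the `UPDATE` statement, the `readAll` / `writeAll` helpers, and the fake cursor and transaction are all hypothetical and exist only for this example.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// rowCursor is the subset of *sql.Rows used here; *sql.Rows satisfies it.
type rowCursor interface {
	Next() bool
	Scan(dest ...any) error
	Err() error
	Close() error
}

// execer is the write side of a transaction; *sql.Tx satisfies it.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// partRow is a hypothetical in-memory copy of one result row.
type partRow struct {
	ID     int64
	Status string
}

// readAll is phase 1: copy every row into memory and close the cursor.
func readAll(rows rowCursor) ([]partRow, error) {
	defer rows.Close() // runs before the caller issues any further SQL
	var out []partRow
	for rows.Next() {
		var p partRow
		if err := rows.Scan(&p.ID, &p.Status); err != nil {
			return nil, err
		}
		out = append(out, p)
	}
	return out, rows.Err()
}

// writeAll is phase 2: the cursor is closed, so the same tx may run writes.
func writeAll(ctx context.Context, tx execer, parts []partRow) error {
	for _, p := range parts {
		// Hypothetical statement; the real projection SQL is not shown here.
		if _, err := tx.ExecContext(ctx, "UPDATE parts SET status = ? WHERE id = ?", p.Status, p.ID); err != nil {
			return fmt.Errorf("update part %d: %w", p.ID, err)
		}
	}
	return nil
}

// fakeCursor and fakeTx are in-memory stand-ins that log the call order.
type fakeCursor struct {
	data []partRow
	pos  int
	log  *[]string
}

func (c *fakeCursor) Next() bool { c.pos++; return c.pos <= len(c.data) }
func (c *fakeCursor) Scan(dest ...any) error {
	*dest[0].(*int64) = c.data[c.pos-1].ID
	*dest[1].(*string) = c.data[c.pos-1].Status
	return nil
}
func (c *fakeCursor) Err() error   { return nil }
func (c *fakeCursor) Close() error { *c.log = append(*c.log, "close"); return nil }

type fakeTx struct{ log *[]string }

func (t fakeTx) ExecContext(context.Context, string, ...any) (sql.Result, error) {
	*t.log = append(*t.log, "exec")
	return nil, nil
}

// demoOrder runs both phases against the fakes and returns the call log.
func demoOrder() []string {
	var log []string
	cur := &fakeCursor{data: []partRow{{1, "OK"}, {2, "FAILED"}}, log: &log}
	parts, err := readAll(cur)
	if err == nil {
		err = writeAll(context.Background(), fakeTx{&log}, parts)
	}
	if err != nil {
		log = append(log, "error: "+err.Error())
	}
	return log
}

func main() {
	fmt.Println(demoOrder()) // prints "[close exec exec]"
}
```

With a real transaction, `readAll` would receive the result of `tx.QueryContext(...)` and `writeAll` would receive the same `tx`; the call log shows that the cursor is closed before the first write is issued.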