# Runtime Flows And Invariants

## History-First Mutation Path

All state-changing mutations for assets/components must go through the history domain layer.

Sources covered:

- user registry edits
- user asset-page component actions (`Add / Edit / Remove`)
- hardware JSON ingest
- manual CSV ingest

History apply transaction invariants:

1. lock the current projection row (`parts` or `machines`)
2. load the latest snapshot (or bootstrap from the projection)
3. apply the validated patch/domain command
4. semantic dedupe: if `after_hash == before_hash`, the command is a no-op (no event, snapshot, or timeline row)
5. write the history event + snapshot
6. update projections
7. write timeline projection rows

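A minimal Go sketch of steps 3 and 4, assuming entity state can be modeled as a flat string map and that `before_hash`/`after_hash` are hashes of a canonical serialization. `Snapshot`, `Patch`, `applyPatch`, and SHA-256 over JSON are illustrative choices, not the project's implementation; locking, persistence, and projection updates (steps 1, 2, and 5 to 7) are omitted.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// Snapshot is a hypothetical, simplified entity state (field -> value).
type Snapshot map[string]string

// Patch is a hypothetical validated field-level patch (field -> new value).
type Patch map[string]string

// ApplyResult describes what the apply step would persist.
type ApplyResult struct {
	NoOp  bool     // true: write no event, snapshot, or timeline row
	After Snapshot // state to snapshot and project when NoOp is false
	Hash  string   // after_hash
}

// stateHash hashes a canonical JSON encoding of the state.
// encoding/json sorts map keys, so equal maps produce equal hashes.
func stateHash(s Snapshot) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err) // cannot happen for map[string]string
	}
	return fmt.Sprintf("%x", sha256.Sum256(b))
}

// applyPatch models steps 3-4: apply the patch to a copy of the latest
// snapshot, then compare before/after hashes for semantic dedupe.
func applyPatch(before Snapshot, p Patch) ApplyResult {
	after := make(Snapshot, len(before)+len(p))
	for k, v := range before {
		after[k] = v
	}
	for k, v := range p {
		after[k] = v
	}
	beforeHash, afterHash := stateHash(before), stateHash(after)
	if afterHash == beforeHash {
		return ApplyResult{NoOp: true, After: before, Hash: beforeHash}
	}
	return ApplyResult{After: after, Hash: afterHash}
}

func main() {
	before := Snapshot{"status": "OK", "slot": "CPU0"}
	fmt.Println(applyPatch(before, Patch{"status": "OK"}).NoOp)     // true
	fmt.Println(applyPatch(before, Patch{"status": "FAILED"}).NoOp) // false
}
```

Comparing hashes of the resulting state, rather than inspecting the patch, means a patch that re-sets a field to its current value is also treated as a no-op.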
Direct projection writes are allowed only inside history recompute/rebuild flows.

`observations` remains a raw ingest trace only:

- runtime current-state reads (UI/API/history projection metadata) must use history + projections (`parts`, `machines`, `installations`, `timeline_events`, `machine_firmware_states`)
- `observations` may be used only for ingest forensics, audits, and backfill/migration

## Event Time Source Priority

Use component event time over ingest time whenever possible.

- `eventFallbackTime(actual, ingestedAt, collectedAt)`:
  1. `actual`
  2. `ingestedAt`
  3. `collectedAt`
- `collectedFallbackTime(collectedAt, ingestedAt)`:
  1. `collectedAt`
  2. `ingestedAt`

## Status Event Time Parsing Order

`parseComponentStatusEventTime` resolves time in this order:

1. `status_changed_at`
2. Latest `status_history` item matching the current status
3. Latest parseable `status_history` item
4. `status_checked_at`

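The resolution order can be sketched in Go as below. This is a minimal sketch assuming RFC 3339 timestamps and exact status-string matching; the `ComponentStatus` and `StatusHistoryItem` shapes and the `parseTS`/`latestHistory` helpers are hypothetical, not the project's payload types.

```go
package main

import (
	"fmt"
	"time"
)

// StatusHistoryItem and ComponentStatus are hypothetical shapes of the
// component status fields in an ingest payload.
type StatusHistoryItem struct {
	Status    string
	ChangedAt string
}

type ComponentStatus struct {
	Status          string
	StatusChangedAt string
	StatusHistory   []StatusHistoryItem
	StatusCheckedAt string
}

// parseTS assumes RFC 3339 timestamps; the real parser may accept more formats.
func parseTS(s string) (time.Time, bool) {
	ts, err := time.Parse(time.RFC3339, s)
	return ts, err == nil
}

// latestHistory returns the latest parseable history time. When status is
// non-empty, only items with exactly that status are considered.
func latestHistory(items []StatusHistoryItem, status string) (time.Time, bool) {
	var best time.Time
	found := false
	for _, it := range items {
		if status != "" && it.Status != status {
			continue
		}
		if ts, ok := parseTS(it.ChangedAt); ok && (!found || ts.After(best)) {
			best, found = ts, true
		}
	}
	return best, found
}

// parseComponentStatusEventTime follows the documented resolution order.
func parseComponentStatusEventTime(c ComponentStatus) (time.Time, bool) {
	if ts, ok := parseTS(c.StatusChangedAt); ok {
		return ts, true // 1. status_changed_at
	}
	if c.Status != "" {
		if ts, ok := latestHistory(c.StatusHistory, c.Status); ok {
			return ts, true // 2. latest item matching the current status
		}
	}
	if ts, ok := latestHistory(c.StatusHistory, ""); ok {
		return ts, true // 3. latest parseable item
	}
	return parseTS(c.StatusCheckedAt) // 4. status_checked_at
}

func main() {
	c := ComponentStatus{
		Status: "Critical",
		StatusHistory: []StatusHistoryItem{
			{Status: "OK", ChangedAt: "2024-03-01T08:00:00Z"},
			{Status: "Critical", ChangedAt: "2024-03-05T12:00:00Z"},
			{Status: "OK", ChangedAt: "not-a-date"},
		},
		StatusCheckedAt: "2024-03-10T00:00:00Z",
	}
	ts, _ := parseComponentStatusEventTime(c)
	fmt.Println(ts.Format(time.RFC3339)) // 2024-03-05T12:00:00Z
}
```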
## Failure Event Rules

For critical components:

- Timeline event type: `COMPONENT_FAILED`
- `failure_events.failure_time` uses the resolved failure time (not the raw ingest time)
- `failure_events.external_id` includes the same failure timestamp
- `failure_events` is a projection and may be rebuilt from component history (`COMPONENT_STATUS_SET`)

Manual UI failure registration invariants:

- `POST /failures` is history-backed for component status (`COMPONENT_STATUS_SET` -> `FAILED`)
- manual failure registration writes the status history event and the `failure_events` projection in one DB transaction
- manual failures use a dedicated projection source (`manual_ui`), so history recompute cleanup for ingest/history sources does not delete them
- `failure_date` from the UI is normalized to a start-of-day UTC timestamp for `failure_events.failure_time` in v1
- the failure detail `open duration` is computed with date precision (UTC day difference from the failure date to the repair date or the current date), not timestamp precision

Active failure / repair-by-replacement semantics (Failures UI):

- an active failure is a failure event with no later replacement install in the same slot on the same server
- repair by replacement is derived from `installations` history: a later install of a different component (`part_id != failed part_id`) into the same `machine_id + slot_name`
- if the slot at failure time cannot be resolved from installation history (e.g. a manual failure date normalized to start of day), the UI may fall back to the current installation slot for display-only slot rendering in `Active Failures`

## First Seen Rules

`parts.first_seen_at` must be the earliest known ingest-derived component time.

Candidate sources:

1. Parseable `status_history[].changed_at`
2. `status_changed_at`
3. `status_checked_at`
4. `eventFallbackTime(nil, ingestedAt, collectedAt)`

Persistence rule:

- Keep the minimum value over time.
- If the incoming value is earlier than the stored value, overwrite the stored value with it.
- In history mode this is persisted as a component metadata correction (`COMPONENT_FIRST_SEEN_CORRECTED`), then projected into `parts.first_seen_at`.

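A minimal Go sketch of the min-over-candidates and keep-minimum rules, assuming the candidates are combined by taking their minimum and that a missing candidate is `nil`. `earliest`, `nextFirstSeen`, and `day` are hypothetical helpers; multiple `status_history` entries would simply be passed as additional candidates.

```go
package main

import (
	"fmt"
	"time"
)

// earliest returns the minimum of the non-nil candidate times, or nil.
func earliest(candidates ...*time.Time) *time.Time {
	var best *time.Time
	for _, c := range candidates {
		if c != nil && (best == nil || c.Before(*best)) {
			best = c
		}
	}
	return best
}

// nextFirstSeen applies the persistence rule: the stored value is replaced
// only by an earlier incoming value. changed reports whether
// parts.first_seen_at should move (in history mode, via a
// COMPONENT_FIRST_SEEN_CORRECTED event).
func nextFirstSeen(stored, incoming *time.Time) (next *time.Time, changed bool) {
	if incoming == nil {
		return stored, false
	}
	if stored == nil || incoming.Before(*stored) {
		return incoming, true
	}
	return stored, false
}

// day builds a UTC date in February 2024 for the examples.
func day(d int) *time.Time {
	ts := time.Date(2024, 2, d, 0, 0, 0, 0, time.UTC)
	return &ts
}

func main() {
	// Candidates: history changed_at, status_changed_at, status_checked_at, fallback.
	incoming := earliest(day(9), nil, day(4), day(12))
	fs, changed := nextFirstSeen(day(6), incoming)
	fmt.Println(fs.Day(), changed) // 4 true
	fs, changed = nextFirstSeen(fs, day(8))
	fmt.Println(fs.Day(), changed) // 4 false
}
```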
## Duplicate Component Serial Rules (CSV + JSON Ingest)

If serial numbers are not unique within the same `p/n` (`model`) inside one ingest payload:

- The first occurrence keeps its original `vendor_serial`.
- Each subsequent duplicate occurrence is assigned a service serial placeholder:
  - Format: `NO_SN-XXXXXXXX` (8-digit zero-padded global counter).
- If `vendor_serial` is empty, a service serial placeholder is assigned as well.
- The counter is global for the whole application and stored in `id_sequences` under `entity_type = 'no_sn_placeholder'`.

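A minimal Go sketch of placeholder assignment within one payload. The `next` callback is a hypothetical in-memory stand-in for the DB-backed `id_sequences` counter; `Component` and `assignPlaceholders` are illustrative names.

```go
package main

import "fmt"

// Component is a hypothetical, simplified ingest row.
type Component struct {
	Model        string // p/n
	VendorSerial string
}

// assignPlaceholders rewrites empty or duplicate (model, serial) pairs within
// one payload. next stands in for the global id_sequences counter
// (entity_type = 'no_sn_placeholder'); the real counter is DB-backed.
func assignPlaceholders(rows []Component, next func() int64) {
	seen := make(map[[2]string]bool) // (model, vendor_serial) pairs already kept
	for i := range rows {
		key := [2]string{rows[i].Model, rows[i].VendorSerial}
		if rows[i].VendorSerial == "" || seen[key] {
			rows[i].VendorSerial = fmt.Sprintf("NO_SN-%08d", next())
			continue
		}
		seen[key] = true // first occurrence keeps its serial
	}
}

func main() {
	var counter int64
	next := func() int64 {
		counter++
		return counter
	}
	rows := []Component{{"MEM-32G", "S1"}, {"MEM-32G", "S1"}, {"MEM-32G", ""}, {"SSD-1T", "S1"}}
	assignPlaceholders(rows, next)
	for _, r := range rows {
		fmt.Println(r.Model, r.VendorSerial)
	}
	// MEM-32G S1
	// MEM-32G NO_SN-00000001
	// MEM-32G NO_SN-00000002
	// SSD-1T S1
}
```

Keying `seen` by model as well as serial means the same serial under a different `p/n` is not treated as a duplicate.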
## Component Health Computation In UI

Component health is derived only from the latest status event among:

- `COMPONENT_FAILED`
- `COMPONENT_WARNING`
- `COMPONENT_UNKNOWN`
- `COMPONENT_OK`

Non-status timeline events (`INSTALLED`, `REMOVED`, `FIRMWARE_CHANGED`, `FIRMWARE_INSTALLED`, etc.) must not change health status.

Status event mapping notes:

- `COMPONENT_STATUS_SET` with `runtime.health_status = UNKNOWN` must emit a `COMPONENT_UNKNOWN` timeline event.
- Unknown or unrecognized status timeline events must never default to `Healthy` in UI state derivation.

## Firmware Timeline Rules

For component firmware observations:

- First observed version -> `FIRMWARE_INSTALLED` (asset + component timeline pair)
- Later version change -> `FIRMWARE_CHANGED` (asset + component timeline pair)
- In history mode, firmware observations are persisted as:
  - component: `COMPONENT_FIRMWARE_SET`
  - asset device firmware: `ASSET_FIRMWARE_DEVICE_SET`

Storage details:

- `FIRMWARE_INSTALLED` stores a transition string in `timeline_events.firmware_version`: `- -> <installed_version>`
- `FIRMWARE_CHANGED` stores the installed (new) firmware value

Detection details:

- Previous observation lookup: `ORDER BY observed_at DESC, id DESC LIMIT 1 OFFSET 1` (`OFFSET 1` skips the newest row, so the lookup returns the observation before the latest one)

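A minimal Go sketch of the classification step, assuming `previous` is the version returned by the previous-observation lookup (empty when none exists) and that an unchanged version emits nothing, consistent with semantic dedupe. `FirmwareTimelineEntry` and `classifyFirmware` are hypothetical names; the asset + component pairing is not modeled.

```go
package main

import "fmt"

// FirmwareTimelineEntry is a hypothetical timeline row (pairing omitted).
type FirmwareTimelineEntry struct {
	EventType       string
	FirmwareVersion string // value for timeline_events.firmware_version
}

// classifyFirmware decides which firmware timeline event, if any, a new
// observation produces. previous is "" when there is no earlier observation.
func classifyFirmware(previous, current string) (FirmwareTimelineEntry, bool) {
	switch {
	case current == "":
		return FirmwareTimelineEntry{}, false // nothing observed
	case previous == "":
		return FirmwareTimelineEntry{"FIRMWARE_INSTALLED", "- -> " + current}, true
	case previous != current:
		return FirmwareTimelineEntry{"FIRMWARE_CHANGED", current}, true
	default:
		return FirmwareTimelineEntry{}, false // unchanged version: no event
	}
}

func main() {
	if e, ok := classifyFirmware("", "2.1.0"); ok {
		fmt.Println(e.EventType, e.FirmwareVersion) // FIRMWARE_INSTALLED - -> 2.1.0
	}
	if e, ok := classifyFirmware("2.1.0", "2.2.0"); ok {
		fmt.Println(e.EventType, e.FirmwareVersion) // FIRMWARE_CHANGED 2.2.0
	}
}
```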
## Install / Remove Flow (Cross-Entity)

Install/remove operations are applied as cross-entity history commands in one transaction.

Invariants:

- component and asset history events are both written
- both events share one `correlation_id`
- `installations` is updated as a projection
- asset/component timeline rows are emitted as paired projection events
- installation slot/location (`installations.slot_name`) is projection state synchronized from the component installation snapshot field `installation.slot_name`

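A minimal Go sketch of building the paired events for one install command, assuming a random hex `correlation_id`. `HistoryEvent`, `installEvents`, and both event type strings are hypothetical placeholders; the transaction, projection update, and timeline rows are omitted.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// HistoryEvent is a hypothetical, simplified history row.
type HistoryEvent struct {
	EntityType    string // "asset" or "component"
	EntityID      int64
	EventType     string // placeholder names, not the project's event types
	CorrelationID string
	SlotName      string
}

// newCorrelationID returns a random 128-bit hex id (format is an assumption).
func newCorrelationID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// installEvents builds the component + asset event pair for one install.
// Both rows would be written in one transaction, followed by the
// installations projection update (slot from installation.slot_name).
func installEvents(assetID, componentID int64, slot string) [2]HistoryEvent {
	cid := newCorrelationID()
	return [2]HistoryEvent{
		{EntityType: "component", EntityID: componentID, EventType: "COMPONENT_INSTALLED", CorrelationID: cid, SlotName: slot},
		{EntityType: "asset", EntityID: assetID, EventType: "ASSET_COMPONENT_INSTALLED", CorrelationID: cid, SlotName: slot},
	}
}

func main() {
	ev := installEvents(7, 42, "PSU1")
	fmt.Println(ev[0].CorrelationID == ev[1].CorrelationID) // true
}
```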
## Asset Page Current Components Actions

All asset page component actions are history-backed and transactional.

- `Add`:
  - either creates a new component and attaches it to the current asset, or attaches an existing component
  - duplicate preflight checks `vendor_serial` and `vendor_serial + model`
  - if the existing component is installed on another asset and the user confirms force attach, runtime executes a move (remove from the old asset + install on the new asset) in one transaction with a shared `correlation_id`
- `Edit`:
  - applies bulk history patches to the selected components currently attached to the asset
  - multi-select edits must reject unique fields (`vendor_serial`, `slot`)
- `Remove`:
  - remove is **de-assert only** (no detach-only mode)
  - runtime always performs:
    1. `COMPONENT_REMOVED` (detach from asset)
    2. `COMPONENT_STATUS_SET` with the user-selected status mapped from `working | not_working | unknown` -> `OK | FAILED | UNKNOWN`

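The `Remove` mapping and the multi-select `Edit` restriction can be sketched as pure validation helpers. A minimal sketch, assuming commands are represented as strings for illustration; `removeStatus`, `removeCommands`, and `validateBulkEdit` are hypothetical names, and the real flow writes both history events in one transaction.

```go
package main

import (
	"errors"
	"fmt"
)

// removeStatus maps the Remove dialog choice to COMPONENT_STATUS_SET values.
var removeStatus = map[string]string{
	"working":     "OK",
	"not_working": "FAILED",
	"unknown":     "UNKNOWN",
}

// removeCommands returns the ordered history commands for one Remove action.
func removeCommands(choice string) ([]string, error) {
	status, ok := removeStatus[choice]
	if !ok {
		return nil, fmt.Errorf("unsupported remove status %q", choice)
	}
	return []string{"COMPONENT_REMOVED", "COMPONENT_STATUS_SET:" + status}, nil
}

// validateBulkEdit rejects unique fields when more than one component is selected.
func validateBulkEdit(selected int, fields []string) error {
	if selected <= 1 {
		return nil
	}
	for _, f := range fields {
		if f == "vendor_serial" || f == "slot" {
			return errors.New("multi-select edit cannot change unique field " + f)
		}
	}
	return nil
}

func main() {
	cmds, _ := removeCommands("not_working")
	fmt.Println(cmds)                                         // [COMPONENT_REMOVED COMPONENT_STATUS_SET:FAILED]
	fmt.Println(validateBulkEdit(3, []string{"slot"}) != nil) // true
}
```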
## Log Collected Flow

- Hardware log collection is represented in history as `ASSET_LOG_COLLECTED`.
- UI/API timelines render it as `LOG_COLLECTED`.

## Delete / Rollback / Hard Restore Flows

Mutating history operations are asynchronous and DB-backed (`history_recompute_jobs`).

- Delete event:
  - soft-delete the logical history event (`is_deleted`)
  - mark linked timeline rows as deleted
  - enqueue a recompute job
- Rollback (compensating):
  - an async job creates a new rollback history event (`*_ROLLBACK_APPLIED`)
  - the original history remains intact
- Hard restore (admin):
  - an async job physically deletes history rows newer than the target snapshot/version
  - the operation is recorded in `history_admin_audit`
- Batch cancel by source (admin):
  - preview lists matching history events by `source_type` and optional date/source-ref filters
  - if `date_from` / `date_to` are omitted, date filters do not constrain the result set
  - date-only filters are interpreted as inclusive day bounds (`00:00:00` to `23:59:59`)
  - execute soft-deletes the matched events and enqueues generic recompute jobs for all affected assets/components
  - the affected set expands from asset events to linked component entities via `correlation_id`

## Data Repair And Cleanup (Admin)

- `Repair All` is an operational best-effort flow:
  - may restore missing `timeline_events.slot_name` from `observations` (repair/backfill only)
  - may enqueue recompute jobs for affected entities
  - retries transient DB connection errors and continues past per-entity failures (best effort)
- `Cleanup Orphaned Projections` removes registry/projection/raw rows for entities that have no active history left after cancellations.
- In repair flows, `observations` may be used only as one-time recovery input; runtime state must still come from history + projections.

## Recompute Scope And Propagation

- Component recompute rebuilds component projections (`parts`, `timeline_events`, `installations`, `failure_events`).
- Asset recompute rebuilds asset projections (`machines`, `machine_firmware_states`, `timeline_events`) and then triggers recompute for linked components to restore cross-entity projection consistency.
- Asset delete/hard-restore propagates to correlated component history events via `correlation_id`.

## MySQL Tx Cursor Safety (Critical)

For MySQL/MariaDB transactional code, do not execute additional SQL on the same `tx` while iterating an open result set (`for rows.Next()`).

Observed failure signature when this rule is violated:

- driver logs: `[mysql] ... invalid connection`, `unexpected EOF`
- app errors: `driver: bad connection`
- DB warnings: `Aborted connection ... Got an error reading communication packets`

Required pattern:

1. Read all rows into in-memory structs.
2. Close the cursor.
3. Perform follow-up `QueryRowContext` / `ExecContext` writes and projection rebuild logic.

This rule is mandatory for recompute/rebuild flows and any projection repair routines.

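A minimal Go sketch of the required two-phase pattern using `database/sql`. The read phase sits behind a small `rowCursor` interface (satisfied by `*sql.Rows`) so it can be exercised with the in-memory `fakeRows` instead of a MySQL connection. The queries, the `installation_id` column, and all type and function names are illustrative assumptions, not the project's schema.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// slotFix is a hypothetical row collected in the read phase.
type slotFix struct {
	installationID int64
	slotName       string
}

// rowCursor is the subset of *sql.Rows used by the read phase.
type rowCursor interface {
	Next() bool
	Scan(dest ...any) error
	Err() error
	Close() error
}

// readSlotFixes is phase 1: drain the cursor into memory, then close it.
func readSlotFixes(rows rowCursor) ([]slotFix, error) {
	defer rows.Close() // safety net for early returns
	var fixes []slotFix
	for rows.Next() {
		var f slotFix
		if err := rows.Scan(&f.installationID, &f.slotName); err != nil {
			return nil, err
		}
		fixes = append(fixes, f)
	}
	if err := rows.Err(); err != nil {
		return nil, err
	}
	return fixes, rows.Close() // phase boundary: cursor closed before any further SQL
}

// repairTimelineSlots runs phase 2 on the same tx only after phase 1 returned.
func repairTimelineSlots(ctx context.Context, tx *sql.Tx, partID int64) error {
	rows, err := tx.QueryContext(ctx,
		`SELECT id, slot_name FROM installations WHERE part_id = ?`, partID)
	if err != nil {
		return err
	}
	fixes, err := readSlotFixes(rows)
	if err != nil {
		return err
	}
	for _, f := range fixes {
		if _, err := tx.ExecContext(ctx,
			`UPDATE timeline_events SET slot_name = ? WHERE installation_id = ? AND slot_name IS NULL`,
			f.slotName, f.installationID); err != nil {
			return fmt.Errorf("update installation %d: %w", f.installationID, err)
		}
	}
	return nil
}

// fakeRows is an in-memory cursor for running phase 1 without MySQL.
type fakeRows struct {
	data   []slotFix
	pos    int
	closed bool
}

func (r *fakeRows) Next() bool {
	if r.closed || r.pos >= len(r.data) {
		return false
	}
	r.pos++
	return true
}

func (r *fakeRows) Scan(dest ...any) error {
	if len(dest) != 2 {
		return errors.New("expected 2 columns")
	}
	row := r.data[r.pos-1]
	*dest[0].(*int64) = row.installationID
	*dest[1].(*string) = row.slotName
	return nil
}

func (r *fakeRows) Err() error { return nil }

func (r *fakeRows) Close() error {
	r.closed = true
	return nil
}

func main() {
	rows := &fakeRows{data: []slotFix{{1, "CPU0"}, {2, "CPU1"}}}
	fixes, err := readSlotFixes(rows)
	fmt.Println(len(fixes), err, rows.closed) // 2 <nil> true
}
```

Because `(*sql.Rows).Close` is idempotent, the deferred `Close` can stay as a safety net for early returns while the explicit `Close` marks the phase boundary.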
## Timeline API Grouping

- History timeline endpoints group events by day by default and return **timeline cards**:
  - `single` (one event)
  - `dedup` (same visual action + context within one day)
  - `bulk` (mass operation correlated by `correlation_id`)
- The asset timeline adds a fallback synthetic bulk to reduce movement noise:
  - if `component_installed` / `component_removed` events have no `correlation_id`, they may be collapsed into a synthetic bulk card keyed by `visual_action + source_type + 1h time bucket`
  - this is a UI/read-model aggregation rule only (raw timeline rows remain unchanged)
- The default timezone for grouping is `UTC`; callers may override it with `tz`.
- Timeline card drilldown resolves to concrete events and history event details through history API endpoints, and must use the same `tz` as card grouping when matching card day buckets.
- Timeline cards display source labels from the history `source_type` (`ingest_json`, `ingest_csv`, `user`, `system`).

## Timeline Color Semantics

- `REMOVED` -> yellow
- `COMPONENT_FAILED` -> red
- `COMPONENT_UNKNOWN` -> gray
- `COMPONENT_WARNING` and related warning semantics follow `timelineEventClass`

## Regression Guardrails

Do not reintroduce these regressions:

- Using the ingest timestamp when the payload provides a better event/failure timestamp
- Letting `INSTALLED` mark failed components as healthy
- Missing `Previous Components` section on the asset page
- Missing installation history on the component page
- Missing firmware information in the component page timeline
- Writing ingest state transitions directly to projections/timeline, bypassing history apply
- Creating duplicate history events for semantic no-op updates
- Executing nested SQL on the same transaction while a `rows.Next()` cursor is still open (use two-phase read-then-write)