core/bible-local/architecture/runtime-flows.md
Michael Chus c5d253a9df Add shared bible submodule, rename local bible to bible-local
- Add bible.git as submodule at bible/
- Rename bible/ → bible-local/ (project-specific architecture)
- Update CLAUDE.md to reference both bible/ and bible-local/
- Add AGENTS.md for Codex with same structure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 16:37:10 +03:00

Runtime Flows And Invariants

History-First Mutation Path

All state-changing mutations for assets/components must go through the history domain layer.

Sources covered:

  • user registry edits
  • user asset-page component actions (Add / Edit / Remove)
  • hardware JSON ingest
  • manual CSV ingest

History apply transaction invariants:

  1. lock current projection row (parts or machines)
  2. load latest snapshot (or bootstrap from projection)
  3. apply validated patch/domain command
  4. semantic dedupe (after_hash == before_hash => no-op, no event/snapshot/timeline)
  5. write history event + snapshot
  6. update projections
  7. write timeline projection rows
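
Step 4 (semantic dedupe) can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Snapshot` type, `snapshotHash`, and `applyPatch` are hypothetical names, and hashing canonical JSON with SHA-256 is an assumption about how `before_hash` / `after_hash` could be computed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// Snapshot is a hypothetical stand-in for an entity snapshot; the real
// snapshot schema is not described in this document.
type Snapshot map[string]string

// snapshotHash returns a stable hash of a snapshot. encoding/json sorts map
// keys when marshaling, so equal snapshots always produce equal hashes.
func snapshotHash(s Snapshot) (string, error) {
	if s == nil {
		s = Snapshot{} // treat a missing snapshot like an empty one
	}
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// applyPatch returns a patched copy of the snapshot and reports whether the
// patch changed anything. When changed is false, the caller skips the event,
// snapshot, projection, and timeline writes (steps 5-7).
func applyPatch(before Snapshot, patch map[string]string) (Snapshot, bool, error) {
	after := make(Snapshot, len(before)+len(patch))
	for k, v := range before {
		after[k] = v
	}
	for k, v := range patch {
		after[k] = v
	}
	beforeHash, err := snapshotHash(before)
	if err != nil {
		return nil, false, err
	}
	afterHash, err := snapshotHash(after)
	if err != nil {
		return nil, false, err
	}
	return after, afterHash != beforeHash, nil
}
```

Comparing hashes rather than patch contents means a patch that re-sends an already stored value is treated as a no-op, regardless of which source produced it.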

Direct projection writes are allowed only inside history recompute/rebuild flows.

observations remains a raw ingest trace only:

  • runtime current-state reads (UI/API/history projection metadata) must use history + projections (parts, machines, installations, timeline_events, machine_firmware_states)
  • observations may be used for ingest forensics, audits, and backfill/migration only

Event Time Source Priority

Use component event time over ingest time whenever possible.

  • eventFallbackTime(actual, ingestedAt, collectedAt):
    1. actual
    2. ingestedAt
    3. collectedAt
  • collectedFallbackTime(collectedAt, ingestedAt):
    1. collectedAt
    2. ingestedAt

Status Event Time Parsing Order

parseComponentStatusEventTime resolves time in this order:

  1. status_changed_at
  2. Latest matching status_history item for current status
  3. Latest parseable status_history item
  4. status_checked_at

Failure Event Rules

For critical components:

  • Timeline event type: COMPONENT_FAILED
  • failure_events.failure_time uses resolved failure time (not raw ingest time)
  • failure_events.external_id includes the same failure timestamp
  • failure_events is a projection and may be rebuilt from component history (COMPONENT_STATUS_SET)

Manual UI failure registration invariants:

  • POST /failures is history-backed for component status (COMPONENT_STATUS_SET -> FAILED)
  • manual failure registration writes status history event and failure_events projection in one DB transaction
  • manual failures use a dedicated projection source (manual_ui) so history recompute cleanup for ingest/history sources does not delete them
  • failure_date from UI is normalized to start-of-day UTC timestamp for failure_events.failure_time in v1
  • failure detail open duration is computed with date precision (UTC day difference from failure date to repair date / current date), not timestamp precision

Active failure / repair-by-replacement semantics (Failures UI):

  • active failure = failure event without later replacement install in the same slot on the same server
  • repair by replacement is derived from installations history: later install of a different component (part_id != failed part_id) into same machine_id + slot_name
  • if the slot at failure time cannot be resolved from installation history (e.g. manual failure date normalized to start of day), the UI may fall back to the current installation slot for display-only slot rendering in Active Failures

First Seen Rules

parts.first_seen_at must be the earliest known ingest-derived component time.

Candidate sources:

  1. Parseable status_history[].changed_at
  2. status_changed_at
  3. status_checked_at
  4. eventFallbackTime(nil, ingestedAt, collectedAt)

Persistence rule:

  • Keep the minimum value over time.
  • If incoming is earlier than stored value, overwrite with incoming value.
  • In history mode this is persisted as component metadata correction (COMPONENT_FIRST_SEEN_CORRECTED), then projected into parts.first_seen_at.

Duplicate Component Serial Rules (CSV + JSON Ingest)

If serial numbers are not unique within the same p/n (model) inside one ingest payload:

  • First occurrence keeps original vendor_serial.
  • Each subsequent duplicate occurrence is assigned a service serial placeholder:
    • Format: NO_SN-XXXXXXXX (8-digit zero-padded global counter).
  • If vendor_serial is empty, a service serial placeholder is assigned as well.
  • Counter is global for the whole application and stored in id_sequences under entity_type = 'no_sn_placeholder'.
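
The placeholder rules can be sketched as follows. The `component` struct and the `next` callback are hypothetical; `next` stands in for allocating the next value of the global counter stored in id_sequences, which this sketch does not model.

```go
package main

import "fmt"

// component is a hypothetical ingest row.
type component struct {
	Model        string
	VendorSerial string
}

// assignPlaceholders rewrites empty serials and repeated (model, serial)
// pairs in place. The first occurrence of a pair keeps its vendor serial.
func assignPlaceholders(items []component, next func() uint64) {
	seen := make(map[[2]string]bool)
	for i := range items {
		key := [2]string{items[i].Model, items[i].VendorSerial}
		if items[i].VendorSerial == "" || seen[key] {
			// NO_SN-XXXXXXXX: 8-digit zero-padded counter value.
			items[i].VendorSerial = fmt.Sprintf("NO_SN-%08d", next())
			continue
		}
		seen[key] = true
	}
}
```

The same serial under a different model is not a duplicate here, because uniqueness is checked per p/n (model).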

Component Health Computation In UI

Component health is derived only from the latest status event among:

  • COMPONENT_FAILED
  • COMPONENT_WARNING
  • COMPONENT_UNKNOWN
  • COMPONENT_OK

Non-status timeline events (INSTALLED, REMOVED, FIRMWARE_CHANGED, FIRMWARE_INSTALLED, etc.) must not change health status.

Status event mapping notes:

  • COMPONENT_STATUS_SET with runtime.health_status = UNKNOWN must emit COMPONENT_UNKNOWN timeline event.
  • Unknown/unrecognized status timeline events must never default to Healthy in UI state derivation.
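
A sketch of the derivation, assuming events arrive ordered oldest to newest. The UI labels other than Healthy are hypothetical, and falling back to "Unknown" when no status event exists is an assumption consistent with the rule that nothing may default to Healthy.

```go
package main

// statusHealth maps the four status timeline events to a UI health label.
var statusHealth = map[string]string{
	"COMPONENT_FAILED":  "Failed",
	"COMPONENT_WARNING": "Warning",
	"COMPONENT_UNKNOWN": "Unknown",
	"COMPONENT_OK":      "Healthy",
}

// deriveHealth scans the timeline from newest to oldest and returns the
// health of the latest status event. Non-status events such as INSTALLED or
// FIRMWARE_CHANGED are skipped, so they never change the result.
func deriveHealth(eventTypes []string) string {
	for i := len(eventTypes) - 1; i >= 0; i-- {
		if health, ok := statusHealth[eventTypes[i]]; ok {
			return health
		}
	}
	return "Unknown" // never default to Healthy
}
```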

Firmware Timeline Rules

For component firmware observations:

  • First observed version -> FIRMWARE_INSTALLED (asset + component timeline pair)
  • Later version change -> FIRMWARE_CHANGED (asset + component timeline pair)
  • In history mode, firmware observations are persisted as:
    • component: COMPONENT_FIRMWARE_SET
    • asset device firmware: ASSET_FIRMWARE_DEVICE_SET

Storage details:

  • FIRMWARE_INSTALLED stores transition string in timeline_events.firmware_version: - -> <installed_version>
  • FIRMWARE_CHANGED stores installed/new firmware value

Detection details:

  • Previous observation lookup: ORDER BY observed_at DESC, id DESC LIMIT 1 OFFSET 1
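
The event choice and stored value can be sketched as below. The function name is hypothetical, and representing "no previous observation" as an empty string is an assumption of this sketch.

```go
package main

// firmwareEvent decides which timeline event a firmware observation emits
// and what is stored in timeline_events.firmware_version. An empty previous
// version means there was no earlier observation.
func firmwareEvent(previous, observed string) (eventType, stored string, ok bool) {
	switch {
	case observed == "" || observed == previous:
		return "", "", false // nothing new to record
	case previous == "":
		return "FIRMWARE_INSTALLED", "- -> " + observed, true
	default:
		return "FIRMWARE_CHANGED", observed, true
	}
}
```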

Install / Remove Flow (Cross-Entity)

Install/remove operations are applied as cross-entity history commands in one transaction.

Invariants:

  • component and asset history events are both written
  • both events share one correlation_id
  • installations is updated as a projection
  • asset/component timeline rows are emitted as paired projection events
  • installation slot/location (installations.slot_name) is projection state synchronized from component installation snapshot field installation.slot_name

Asset Page Current Components Actions

All asset page component actions are history-backed and transactional.

  • Add:
    • either creates a new component and attaches it to current asset, or attaches an existing component
    • duplicate preflight checks vendor_serial and vendor_serial + model
    • if existing component is installed on another asset and user confirms force attach, runtime executes a move (remove old asset + install new asset) in one transaction with shared correlation_id
  • Edit:
    • applies bulk history patches to selected components currently attached to the asset
    • multi-select edits must reject unique fields (vendor_serial, slot)
  • Remove:
    • remove always detaches the component and records its status (there is no detach-only mode)
    • runtime always performs:
      1. COMPONENT_REMOVED (detach from asset)
      2. COMPONENT_STATUS_SET with user-selected status mapped from working | not_working | unknown -> OK | FAILED | UNKNOWN
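
The Remove sequence and its status mapping can be sketched as follows. The `command` struct and function names are hypothetical; the event types and the mapping come from the list above.

```go
package main

import "fmt"

// command is a hypothetical history command descriptor.
type command struct {
	Type         string
	HealthStatus string // set only for COMPONENT_STATUS_SET
}

// removalStatus maps the user's choice to a component health status.
func removalStatus(choice string) (string, error) {
	switch choice {
	case "working":
		return "OK", nil
	case "not_working":
		return "FAILED", nil
	case "unknown":
		return "UNKNOWN", nil
	}
	return "", fmt.Errorf("unsupported removal status %q", choice)
}

// removalCommands returns the two commands Remove always performs, in order.
func removalCommands(choice string) ([]command, error) {
	status, err := removalStatus(choice)
	if err != nil {
		return nil, err
	}
	return []command{
		{Type: "COMPONENT_REMOVED"},
		{Type: "COMPONENT_STATUS_SET", HealthStatus: status},
	}, nil
}
```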

Log Collected Flow

  • Hardware log collection is represented in history as ASSET_LOG_COLLECTED.
  • UI/API timeline renders it as LOG_COLLECTED.

Delete / Rollback / Hard Restore Flows

Mutating history operations are asynchronous and DB-backed (history_recompute_jobs).

  • Delete event:
    • soft-delete logical history event (is_deleted)
    • mark linked timeline rows deleted
    • enqueue recompute job
  • Rollback (compensating):
    • async job creates a new rollback history event (*_ROLLBACK_APPLIED)
    • original history remains intact
  • Hard restore (admin):
    • async job physically deletes future history rows after target snapshot/version
    • operation is recorded in history_admin_audit
  • Batch cancel by source (admin):
    • preview lists matching history events by source_type and optional date/source-ref filters
    • if date_from / date_to are omitted, date filters do not constrain the result set
    • date-only filters are interpreted as inclusive day bounds (00:00:00 to 23:59:59)
    • execute soft-deletes matched events and enqueues generic recompute jobs for all affected assets/components
    • affected set expands from asset events to linked component entities via correlation_id

Data Repair And Cleanup (Admin)

  • Repair All is an operational best-effort flow:
    • may restore missing timeline_events.slot_name from observations (repair/backfill only)
    • may enqueue recompute jobs for affected entities
    • retries transient DB connection errors and continues on per-entity failures (best effort)
  • Cleanup Orphaned Projections removes registry/projection/raw rows for entities that have no active history left after cancellations.
  • observations usage in repair flows is allowed only as one-time recovery input; runtime state must still come from history + projections.

Recompute Scope And Propagation

  • Component recompute rebuilds component projections (parts, timeline_events, installations, failure_events).
  • Asset recompute rebuilds asset projections (machines, machine_firmware_states, timeline_events) and then triggers recompute for linked components to restore cross-entity projection consistency.
  • Asset delete/hard-restore propagates to correlated component history events via correlation_id.

MySQL Tx Cursor Safety (Critical)

For MySQL/MariaDB transactional code, do not execute additional SQL on the same tx while iterating an open result set (inside a for rows.Next() loop).

Observed failure signature when violating this rule:

  • driver logs: [mysql] ... invalid connection, unexpected EOF
  • app errors: driver: bad connection
  • DB warnings: Aborted connection ... Got an error reading communication packets

Required pattern:

  1. Read all rows into in-memory structs.
  2. Close the cursor.
  3. Perform follow-up QueryRowContext / ExecContext writes and projection rebuild logic.

This rule is mandatory for recompute/rebuild flows and any projection repair routines.
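
The required pattern can be sketched with database/sql as below. The SQL statements, the `touchParts` function, and the column names are purely illustrative and not taken from the codebase. `rowReader` is a small interface that `*sql.Rows` satisfies; `fakeRows` exists only so the read phase can be exercised without a database.

```go
package main

import (
	"context"
	"database/sql"
)

// rowReader is the subset of *sql.Rows used by collectIDs.
type rowReader interface {
	Next() bool
	Scan(dest ...any) error
	Err() error
	Close() error
}

// collectIDs drains the cursor into memory and closes it (steps 1 and 2).
func collectIDs(rows rowReader) ([]int64, error) {
	defer rows.Close()
	var ids []int64
	for rows.Next() {
		var id int64
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	if err := rows.Err(); err != nil {
		return nil, err
	}
	return ids, nil
}

// touchParts reads first, then writes on the same tx (step 3). The queries
// are hypothetical examples.
func touchParts(ctx context.Context, tx *sql.Tx, machineID int64) error {
	rows, err := tx.QueryContext(ctx, "SELECT id FROM parts WHERE machine_id = ?", machineID)
	if err != nil {
		return err
	}
	ids, err := collectIDs(rows) // the cursor is closed when this returns
	if err != nil {
		return err
	}
	for _, id := range ids {
		if _, err := tx.ExecContext(ctx, "UPDATE parts SET updated_at = NOW() WHERE id = ?", id); err != nil {
			return err
		}
	}
	return nil
}

// fakeRows is an in-memory rowReader for trying collectIDs without a DB.
type fakeRows struct {
	ids    []int64
	pos    int
	closed bool
}

func (f *fakeRows) Next() bool {
	if f.pos >= len(f.ids) {
		return false
	}
	f.pos++
	return true
}

func (f *fakeRows) Scan(dest ...any) error {
	*(dest[0].(*int64)) = f.ids[f.pos-1]
	return nil
}

func (f *fakeRows) Err() error   { return nil }
func (f *fakeRows) Close() error { f.closed = true; return nil }
```

Keeping the read phase in its own function makes it hard to interleave a write by accident, because the cursor never escapes `collectIDs`.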

Timeline API Grouping

  • History timeline endpoints group events by day by default and return timeline cards:
    • single (one event)
    • dedup (same visual action + context in one day)
    • bulk (mass operation correlated by correlation_id)
  • Asset timeline adds a fallback synthetic bulk for movement noise reduction:
    • if component_installed / component_removed events do not have correlation_id, they may be collapsed into a synthetic bulk card by visual_action + source_type + 1h time bucket
    • this is a UI/read-model aggregation rule only (raw timeline rows remain unchanged)
  • Default timezone for grouping is UTC; callers may override with tz.
  • Timeline card drilldown resolves to concrete events and history event details through history API endpoints and must use the same tz as card grouping when matching card day buckets.
  • Timeline cards display source labels from history source_type (ingest_json, ingest_csv, user, system).

Timeline Color Semantics

  • REMOVED -> yellow
  • COMPONENT_FAILED -> red
  • COMPONENT_UNKNOWN -> gray
  • COMPONENT_WARNING and related warning semantics follow timelineEventClass

Regression Guardrails

Do not reintroduce these regressions:

  • Using ingest timestamp when payload provides better event/failure timestamp
  • Letting INSTALLED mark failed components as healthy
  • Missing Previous Components section on asset page
  • Missing installation history on component page
  • Missing firmware information on component page timeline
  • Writing ingest state transitions directly to projections/timeline while bypassing history apply
  • Creating duplicate history events for semantic no-op updates
  • Executing nested SQL on the same transaction while rows.Next() cursor is still open (must use two-phase read-then-write)