openapi: 3.1.0

info:
  title: Deployment Dashboard API
  version: 1.0.0
  summary: Push-based CI/CD deployment event store with matrix + DAG views and SSE fan-out.
  description: |
    REST contract for the Deployment Dashboard backend (`Dashboard.Api`).

    Authoritative sources:
      - `docs/SAD.md`            — architecture, domain model, NFRs.
      - `docs/FRONTEND_REQUIREMENTS.md` — read-side consumer (Matrix + Swimlanes).

    Surfaces:
      - **Write** — append-only ingest of deployment **events**; gated by `X-Api-Key`.
      - **Read**  — matrix + per-slot history + raw event listings + discovery; unauthenticated.
      - **Stream** — SSE fan-out of newly-ingested deployment events; unauthenticated.
      - **Fetcher** — opaque per-adapter cursor state for `Dashboard.Fetcher`; gated by `X-Api-Key`.
      - **Control** — administrative operations + orchestration; gated by `X-Control-API-Key` unless an endpoint below states otherwise.
        - `POST /api/control/reset` — initiates an asynchronous, multi-phase system-state reset; returns `202` immediately.
          Progress is signalled on the control stream as `reset-initiated` → `reset-started` → `reset-completed`.
        - `GET  /api/control/stream` — SSE channel; API emits orchestration events (incl. the reset phases); `Last-Event-ID` replay within a 2 h window.
        - `POST /api/control/events` — single inbound endpoint; all components post their events here. Gated by `X-Api-Key`; component identified by `X-Component-Id` header; optional `X-Correlation-Id` header groups the event with a control command.
        - `GET  /api/control/events/stream` — SSE fan-out of component-reported events; `Last-Event-ID` replay within a 2 h window. Unauthenticated.
      - **Ops** — `/healthz`, `/readyz`.

    **Communication model (Kubernetes-style).**
    The API is the single source of truth for state and orchestration.
    Components (fetcher, demo-driver, …) are always the callers — they never receive
    inbound connections from the API:

    ```
    Component ──GET /api/control/stream──► API   subscribe; receive orchestration events
    Component ──POST /api/control/events─► API   report;    post status / operational events
    ```

    Every arrow originates at the component. The SSE stream is a response to a
    component-initiated GET — the API emits into it, but the connection is inbound.

    **Event-store semantics.** The persistence model (SAD §11) is an append-only
    log of deployment **events** — every `POST /api/deployments` appends one row.
    A single logical deployment emits several rows during its lifecycle
    (e.g. `in-progress` → `success`/`failure`), all sharing one `deployment_id`.
    `deployment_id` is therefore an emitter-supplied **correlation key**, NOT a
    row identity and NOT an idempotency key. Retries from a flaky CI step produce
    additional rows; the read surface always reduces by `happened_at` (latest wins
    per `(service, environment)` for the Matrix; full history for the drawer).

    **`happened_at` is emitter-supplied**, not server-assigned (SAD §11). It is
    the UTC wall-clock at which the deployment transitioned to the reported
    `status` on the CI/CD side, NOT the moment the dashboard wrote the row.
    Order in every read surface is by `happened_at` — a delayed POST from a
    retry queue still sorts correctly relative to its peers.

    Errors follow RFC 9457 `application/problem+json`.

  contact:
    name: Deployment Dashboard
    url: https://github.com/kostiantyn-matsebora/deployment-dashboard
  license:
    name: MIT

servers:
  - url: /
    description: App Gateway (single origin, internal-only, no CORS).

tags:
  - name: ingest
    description: Write surface — append-only deployment events. Requires `X-Api-Key`.
  - name: matrix
    description: Denormalised services × environments view for the Matrix UI.
  - name: deployments
    description: Raw deployment-event listings + lookup. Used by Swimlanes + history drawer.
  - name: discovery
    description: Distinct services and environments derived from stored data.
  - name: analytics
    description: |
      DORA-anchored aggregate reads over the `deployment_events` log (issue #299).
      A small set of **focused, ETag-cacheable** `GET`s under `/api/analytics/*`, one
      per analytics concern — never one consolidated payload. Every endpoint takes a
      `window` query param (`7d` | `14d` | `30d`), **clamps it server-side** to
      `HISTORY_RETENTION_DAYS`, and echoes the resolved window
      (`AnalyticsWindow`: `days`, `from`, `to`, `retention_days`, `clamped`).
      Reads are unauthenticated (internal-only, same trust tier as the other reads)
      and carry a weak `ETag`; clients SHOULD send `If-None-Match`.
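
      A minimal sketch of the window resolution described above, in Python. The
      function name `resolve_window`, anchoring `to` at the request time, and
      deriving `from` as `to` minus `days` are assumptions for illustration, not
      the backend's implementation.

      ```python
      from datetime import datetime, timedelta, timezone

      ALLOWED_WINDOWS = {"7d": 7, "14d": 14, "30d": 30}

      def resolve_window(window: str, retention_days: int, now: datetime) -> dict:
          """Clamp the requested window to the retention limit and echo the result."""
          requested = ALLOWED_WINDOWS[window]        # KeyError for unsupported values
          days = min(requested, retention_days)      # server-side clamp
          return {
              "days": days,
              "from": (now - timedelta(days=days)).isoformat(),   # assumed: to - days
              "to": now.isoformat(),                              # assumed: request time
              "retention_days": retention_days,
              "clamped": days < requested,
          }

      # A 30-day request against 14 days of retention resolves to 14 days, clamped.
      window = resolve_window("30d", 14, datetime.now(timezone.utc))
      ```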
  - name: stream
    description: Real-time SSE fan-out of deployment events.
  - name: fetcher
    description: Opaque per-adapter cursor state for `Dashboard.Fetcher`.
  - name: control
    description: |
      Control surface — administrative operations and component orchestration.
      `POST /api/control/reset` triggers an **asynchronous, multi-phase reset**: the API
      drives components through `reset-initiated` → `reset-started` → `reset-completed` on
      the control stream, draining components first, then briefly gating ingest while data
      is cleared. There is no status endpoint — progress is observable on the stream only.
      `POST /api/control/reset` and `GET /api/control/stream` require `X-Control-API-Key`.
      `POST /api/control/events` requires `X-Api-Key` + `X-Component-Id`.
      `GET /api/control/events/stream` is unauthenticated.
  - name: ops
    description: Liveness + readiness probes.

# ─────────────────────────────────────────────────────────────────────────────
# Security
# ─────────────────────────────────────────────────────────────────────────────
security: []  # default: unauthenticated (read surface). Write ops opt-in below.

# ─────────────────────────────────────────────────────────────────────────────
# Paths
# ─────────────────────────────────────────────────────────────────────────────
paths:

  # ───── Write surface ──────────────────────────────────────────────────────
  /api/deployments:
    post:
      tags: [ingest]
      operationId: ingestDeployment
      summary: Append a single deployment event to the log.
      description: |
        Push a deployment-state event from a CI/CD pipeline (or the optional fetcher).

        **Pure append-only.** Every accepted call inserts one row. There is no
        server-side dedup: the server does not compare `(service, deployment_id)`,
        payload hashes, or any other key against existing rows. A retried POST
        produces a duplicate row — handling retries is the caller's concern.

        On success the backend `NOTIFY`s the PostgreSQL `deployment_events` channel,
        which fans out to every connected SSE client across all API instances.

        **Reset window.** During the brief data-clearing phase of a system-state reset,
        ingest is unavailable and returns `503` with `Retry-After`; clients retry after
        the indicated delay. The window closes when `reset-completed` is emitted on the
        control stream.
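
        A possible client-side reading of `Retry-After`, sketched in Python. The
        helper name `retry_delay_seconds` and the one-second fallback are
        assumptions; the sketch accepts both forms HTTP allows for the header
        (delta-seconds and HTTP-date), since this contract does not say which one
        the server sends.

        ```python
        from datetime import datetime, timezone
        from email.utils import parsedate_to_datetime

        def retry_delay_seconds(retry_after: str, now: datetime, default: float = 1.0) -> float:
            """Translate a Retry-After header value into a non-negative wait in seconds."""
            value = retry_after.strip()
            if value.isdigit():                          # delta-seconds form, e.g. "5"
                return float(value)
            try:
                when = parsedate_to_datetime(value)      # HTTP-date form
                return max(0.0, (when - now).total_seconds())
            except (TypeError, ValueError):
                return default                           # unparseable: assumed fallback

        # Example: a delta-seconds header of "5" means wait five seconds.
        delay = retry_delay_seconds("5", datetime.now(timezone.utc))
        ```

        `now` must be timezone-aware so it can be subtracted from the parsed date.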
      security:
        - apiKey: []
      parameters:
        - $ref: '#/components/parameters/XProgressReporter'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/DeploymentEventIngest'
            examples:
              promotionWithParent:
                $ref: '#/components/examples/IngestWithParents'
              terminalSuccess:
                $ref: '#/components/examples/IngestMinimal'
              standaloneInProgress:
                $ref: '#/components/examples/IngestInProgressNoParents'
      responses:
        '201':
          description: Event appended.
          headers:
            Location:
              schema: { type: string, format: uri-reference }
              description: URL of the appended row (`/api/deployments/{id}`).
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DeploymentEvent'
        '401':
          $ref: '#/components/responses/Unauthorized'
        '422':
          description: |
            Payload validation failed — missing required field, bad enum,
            non-integer `run_number`, malformed `happened_at`, oversized array, …
          content:
            application/problem+json:
              schema: { $ref: '#/components/schemas/Problem' }
        '429':
          $ref: '#/components/responses/RateLimited'
        '503':
          $ref: '#/components/responses/ServiceUnavailable'

    # ───── Read surface — raw event listing ─────────────────────────────────
    get:
      tags: [deployments]
      operationId: listDeployments
      summary: List deployment events with cursor pagination.
      description: |
        Returns raw events, newest-first by `happened_at` (emitter-supplied).
        Powers the Swimlanes view, the slot-history drawer (when filtered by
        `service`+`environment`), and any general listing.

        Because the store is append-only, multiple rows may share a
        `deployment_id` (one per status transition). The client groups them
        as needed (e.g. "latest row per `deployment_id`" for swimlane nodes).

        DAG correlation is **client-side** — this endpoint never derives parent
        relationships beyond the explicit `parent_deployments` array on the event.
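
        The "latest row per `deployment_id`" grouping mentioned above could look
        like this on the client, sketched in Python. The function name
        `latest_per_deployment` is hypothetical; only the `deployment_id` and
        `happened_at` fields are taken from this contract.

        ```python
        from datetime import datetime

        def parse_ts(value: str) -> datetime:
            """Parse an RFC 3339 UTC timestamp such as 2026-05-28T10:14:02Z."""
            return datetime.fromisoformat(value.replace("Z", "+00:00"))

        def latest_per_deployment(events: list[dict]) -> dict[str, dict]:
            """Keep the row with the greatest happened_at for each deployment_id."""
            latest: dict[str, dict] = {}
            for event in events:
                key = event["deployment_id"]
                newest = latest.get(key)
                if newest is None or parse_ts(event["happened_at"]) > parse_ts(newest["happened_at"]):
                    latest[key] = event
            return latest
        ```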
      parameters:
        - { name: service,       in: query, schema: { type: string }, description: Filter by service identifier. }
        - { name: environment,   in: query, schema: { type: string }, description: Filter by environment identifier. }
        - { name: status,        in: query, schema: { $ref: '#/components/schemas/Status' }, description: Filter by status. }
        - { name: deployment_id, in: query, schema: { type: string }, description: Filter to all rows of a single logical deployment. }
        - { name: since,         in: query, schema: { type: string, format: date-time }, description: "Only events with `happened_at >= since`." }
        - { name: until,         in: query, schema: { type: string, format: date-time }, description: "Only events with `happened_at <  until`." }
        - { name: cursor,        in: query, schema: { type: string }, description: Opaque pagination cursor from a previous page. }
        - { name: limit,         in: query, schema: { type: integer, minimum: 1, maximum: 500, default: 100 }, description: Max number of events returned per page. }
      responses:
        '200':
          description: A page of deployment events.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DeploymentEventPage'

  /api/deployments/{id}:
    get:
      tags: [deployments]
      operationId: getDeployment
      summary: Fetch a single event row by surrogate id.
      parameters:
        - { name: id, in: path, required: true, schema: { type: string, format: uuid }, description: Surrogate id of the event row. }
      responses:
        '200':
          description: The event row.
          content:
            application/json:
              schema: { $ref: '#/components/schemas/DeploymentEvent' }
        '404':
          $ref: '#/components/responses/NotFound'

  # ───── Read surface — Matrix view ─────────────────────────────────────────
  /api/matrix:
    get:
      tags: [matrix]
      operationId: getMatrix
      summary: Services × environments deployment matrix (denormalised).
      description: |
        Returns one row per service. Each row contains a `slots` map keyed by
        environment name. Each slot carries:

          - `current` — the most recent **effective** event for that
            `(service, environment)`, where *effective* means `status` is one of
            `in-progress` | `success` | `failure`. Chosen by `MAX(happened_at)`
            among effective events. This is the live deployment the slot's
            primary state renders from.
          - `last_successful` — the most recent `success` event in the same slot
            (omitted when `current` is itself the last success).
          - `next` — the most recent **non-effective** event (`status` one of
            `pending` | `queued` | `waiting` | `cancelled` | `rejected`),
            present only when it is more recent (`happened_at`) than `current`.
            Represents the latest deployment beyond the live one; the frontend
            renders it as a small "next" badge, not as the slot's primary state.

        The frontend resolves the **six box states** (SAD §7 / FR §"Box states (Matrix)")
        from `(current.status, last_successful presence)` — no server-side state
        field — and renders `next` (when present) as a secondary badge.
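
        The slot rules above can be sketched as a reduction over the events of one
        `(service, environment)` pair. This is an illustrative Python sketch, not
        the server's query: `resolve_slot` is a hypothetical name, and reporting
        `next` when the slot has no `current` at all is an assumption the text
        above does not settle.

        ```python
        from datetime import datetime

        EFFECTIVE = {"in-progress", "success", "failure"}

        def _ts(event: dict) -> datetime:
            return datetime.fromisoformat(event["happened_at"].replace("Z", "+00:00"))

        def resolve_slot(events: list[dict]) -> dict:
            """Reduce the events of one (service, environment) pair to a slot."""
            effective = [e for e in events if e["status"] in EFFECTIVE]
            pending = [e for e in events if e["status"] not in EFFECTIVE]
            successes = [e for e in effective if e["status"] == "success"]

            slot: dict = {}
            current = max(effective, key=_ts, default=None)     # MAX(happened_at)
            if current is not None:
                slot["current"] = current
            last_ok = max(successes, key=_ts, default=None)
            if last_ok is not None and last_ok is not current:
                slot["last_successful"] = last_ok   # omitted when current is the last success
            nxt = max(pending, key=_ts, default=None)
            if nxt is not None and (current is None or _ts(nxt) > _ts(current)):
                slot["next"] = nxt                  # only when newer than current
            return slot
        ```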
      parameters:
        - { name: service, in: query, schema: { type: string }, description: Filter to a single service (Matrix inline filter). }
      responses:
        '200':
          description: The matrix snapshot.
          headers:
            ETag:
              schema: { type: string }
              description: Weak ETag of the snapshot. Clients SHOULD send `If-None-Match` to short-circuit unchanged reads.
          content:
            application/json:
              schema: { $ref: '#/components/schemas/Matrix' }

  # ───── Discovery ──────────────────────────────────────────────────────────
  /api/services:
    get:
      tags: [discovery]
      operationId: listServices
      summary: Distinct service identifiers derived from stored events.
      responses:
        '200':
          description: Sorted list of distinct service identifiers.
          content:
            application/json:
              schema:
                type: object
                required: [items]
                properties:
                  items:
                    type: array
                    items: { type: string }
                examples:
                  - { items: ["checkout-api", "service-a", "service-b"] }

  /api/environments:
    get:
      tags: [discovery]
      operationId: listEnvironments
      summary: Distinct environment identifiers derived from stored events.
      responses:
        '200':
          description: Sorted list of distinct environment identifiers.
          content:
            application/json:
              schema:
                type: object
                required: [items]
                properties:
                  items:
                    type: array
                    items: { type: string }
                examples:
                  - { items: ["dev", "prod", "qa", "uat"] }

  # ───── Analytics — DORA-anchored aggregate reads (issue #299) ─────────────
  /api/analytics/dora:
    get:
      tags: [analytics]
      operationId: getAnalyticsDora
      summary: DORA Four Keys KPI band over the selected window.
      description: |
        The four DORA metrics for the resolved window, each as an `AnalyticsKpi`:
        `deployment_frequency`, `lead_time`, `change_failure_rate`,
        `time_to_restore`. Backs the Analytics KPI band.

        Each KPI carries `value` + `unit`, a `classification`
        (`elite` | `high` | `medium` | `low`), a `trend_delta` versus the **prior
        half-window**, and a per-day `sparkline[]`.

        **Lead-time caveat (binding).** True DORA lead time (commit → prod) is NOT
        in the event log. `lead_time` is APPROXIMATED from `parent_deployments`
        promotion chains that reach the production (terminal ladder) stage and is
        flagged `approximated: true`; the other three keys are `approximated: false`.
      parameters:
        - $ref: '#/components/parameters/AnalyticsWindowParam'
        - $ref: '#/components/parameters/IfNoneMatch'
      responses:
        '200':
          $ref: '#/components/responses/AnalyticsDora'
        '304':
          $ref: '#/components/responses/NotModified'

  /api/analytics/frequency:
    get:
      tags: [analytics]
      operationId: getAnalyticsFrequency
      summary: Deployment frequency over time — per-day success vs failure.
      description: |
        One bucket per UTC day in the resolved window, each carrying `date`,
        `success`, and `failure` counts. Backs the stacked frequency-over-time bars.
        Counts are over terminal events (`success` | `failure`); non-terminal
        statuses are excluded.
      parameters:
        - $ref: '#/components/parameters/AnalyticsWindowParam'
        - $ref: '#/components/parameters/IfNoneMatch'
      responses:
        '200':
          $ref: '#/components/responses/AnalyticsFrequency'
        '304':
          $ref: '#/components/responses/NotModified'

  /api/analytics/change-failure-rate:
    get:
      tags: [analytics]
      operationId: getAnalyticsChangeFailureRate
      summary: Change-failure-rate trend — per-day rate + elite threshold.
      description: |
        Per-day change-failure rate (`failure / (success + failure)`, `0` when the
        day has no terminal events) plus the constant `elite_threshold` of `0.15`
        for the dashed reference line. Backs the CFR trend chart.
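
        As a worked illustration of the formula above, a small Python sketch. The
        input keys `date`, `success` and `failure` follow the per-day buckets of
        `/api/analytics/frequency`; the output key `rate` and both function names
        are assumptions.

        ```python
        ELITE_THRESHOLD = 0.15   # constant reference line stated above

        def change_failure_rate(success: int, failure: int) -> float:
            """failure / (success + failure); 0 when the day has no terminal events."""
            total = success + failure
            return failure / total if total else 0.0

        def cfr_series(buckets: list[dict]) -> list[dict]:
            """Map per-day success/failure counts to per-day rates."""
            return [
                {"date": b["date"], "rate": change_failure_rate(b["success"], b["failure"])}
                for b in buckets
            ]
        ```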
      parameters:
        - $ref: '#/components/parameters/AnalyticsWindowParam'
        - $ref: '#/components/parameters/IfNoneMatch'
      responses:
        '200':
          $ref: '#/components/responses/AnalyticsChangeFailureRate'
        '304':
          $ref: '#/components/responses/NotModified'

  /api/analytics/duration-histogram:
    get:
      tags: [analytics]
      operationId: getAnalyticsDurationHistogram
      summary: Deployment-duration distribution — bins + p50 + p95.
      description: |
        Histogram of deployment durations (minutes) over the resolved window, plus
        the `p50` and `p95` percentile markers. Backs the duration-distribution
        histogram.

        **Duration definition.** Per logical deployment (`deployment_id`),
        `last(happened_at) − first(happened_at)` across its event rows, in minutes.
        Single-row deployments (no measurable span) are excluded from both the bins
        and the percentiles.
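
        A sketch of the duration definition in Python. The nearest-rank percentile
        is an assumed method (this contract does not name one), and both function
        names are hypothetical.

        ```python
        import math
        from collections import defaultdict
        from datetime import datetime

        def durations_minutes(events: list[dict]) -> list[float]:
            """last(happened_at) - first(happened_at) per deployment_id, in minutes, ascending."""
            stamps: dict[str, list[datetime]] = defaultdict(list)
            for e in events:
                ts = datetime.fromisoformat(e["happened_at"].replace("Z", "+00:00"))
                stamps[e["deployment_id"]].append(ts)
            return sorted(
                (max(ts) - min(ts)).total_seconds() / 60
                for ts in stamps.values()
                if len(ts) > 1                      # single-row deployments are excluded
            )

        def percentile(sorted_values: list[float], p: float) -> float:
            """Nearest-rank percentile over a non-empty ascending list (assumed method)."""
            rank = max(1, math.ceil(p / 100 * len(sorted_values)))
            return sorted_values[rank - 1]
        ```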
      parameters:
        - $ref: '#/components/parameters/AnalyticsWindowParam'
        - $ref: '#/components/parameters/IfNoneMatch'
      responses:
        '200':
          $ref: '#/components/responses/AnalyticsDurationHistogram'
        '304':
          $ref: '#/components/responses/NotModified'

  /api/analytics/promotion-funnel:
    get:
      tags: [analytics]
      operationId: getAnalyticsPromotionFunnel
      summary: Promotion funnel — per-stage counts + conversion.
      description: |
        Ordered promotion funnel over an operator-configured environment ladder
        (`ANALYTICS_FUNNEL_ENVIRONMENTS`, default `dev → staging → qa → preprod →
        prod`). Each stage carries its `count` (distinct logical deployments that
        reached the stage in the window) and `conversion` to the next stage (`null`
        for the terminal production stage). Backs the promotion-funnel (sankey)
        chart.

        The ladder's environments and order are configured by the deployment
        operator; environments outside the configured ladder are not represented as
        funnel stages.
      parameters:
        - $ref: '#/components/parameters/AnalyticsWindowParam'
        - $ref: '#/components/parameters/IfNoneMatch'
      responses:
        '200':
          $ref: '#/components/responses/AnalyticsPromotionFunnel'
        '304':
          $ref: '#/components/responses/NotModified'

  /api/analytics/status-distribution:
    get:
      tags: [analytics]
      operationId: getAnalyticsStatusDistribution
      summary: Status distribution over the 8 statuses.
      description: |
        Event count per `Status` over the resolved window. All eight statuses are
        present (zero-filled) so the donut renders a stable slice set. Backs the
        status-distribution donut.
      parameters:
        - $ref: '#/components/parameters/AnalyticsWindowParam'
        - $ref: '#/components/parameters/IfNoneMatch'
      responses:
        '200':
          $ref: '#/components/responses/AnalyticsStatusDistribution'
        '304':
          $ref: '#/components/responses/NotModified'

  /api/analytics/heatmap:
    get:
      tags: [analytics]
      operationId: getAnalyticsHeatmap
      summary: Deploy heatmap — day-of-week × hour counts.
      description: |
        Deployment counts bucketed by UTC day-of-week (`0` = Sunday … `6` =
        Saturday) and hour (`0`–`23`). Returns a flat list of populated cells;
        absent `(day, hour)` pairs are zero. Backs the day×hour heatmap.
      parameters:
        - $ref: '#/components/parameters/AnalyticsWindowParam'
        - $ref: '#/components/parameters/IfNoneMatch'
      responses:
        '200':
          $ref: '#/components/responses/AnalyticsHeatmap'
        '304':
          $ref: '#/components/responses/NotModified'

  /api/analytics/top-deployers:
    get:
      tags: [analytics]
      operationId: getAnalyticsTopDeployers
      summary: Top deployers — actor + deployment count.
      description: |
        Deployment counts grouped by `actor`, descending by `count`, over the
        resolved window. Events with no `actor` are grouped under the `actor` value
        `unknown`. Backs the top-deployers leaderboard.
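
        The grouping can be sketched in Python as follows. The sketch counts event
        rows, which is an assumption: the text above does not say whether rows or
        logical deployments are counted. `top_deployers` is a hypothetical name.

        ```python
        from collections import Counter

        def top_deployers(events: list[dict], limit: int = 10) -> list[dict]:
            """Count events per actor, highest first; a missing actor becomes 'unknown'."""
            counts = Counter(e.get("actor") or "unknown" for e in events)
            return [{"actor": a, "count": c} for a, c in counts.most_common(limit)]
        ```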
      parameters:
        - $ref: '#/components/parameters/AnalyticsWindowParam'
        - { name: limit, in: query, schema: { type: integer, minimum: 1, maximum: 100, default: 10 }, description: Max number of deployers returned (highest counts first). }
        - $ref: '#/components/parameters/IfNoneMatch'
      responses:
        '200':
          $ref: '#/components/responses/AnalyticsTopDeployers'
        '304':
          $ref: '#/components/responses/NotModified'

  /api/analytics/incidents:
    get:
      tags: [analytics]
      operationId: getAnalyticsIncidents
      summary: Time-to-restore incidents — worst-first.
      description: |
        Recent restoration incidents over the resolved window, **worst-first**
        (longest `duration_minutes` first). An incident is a `failure` event in a
        `(service, environment)` slot followed by a later `success` in the same
        slot; `duration_minutes` is `restored_at − failed_at`. An unresolved
        failure (no subsequent success in the window) has `restored_at: null` and
        `duration_minutes: null` and sorts first. Backs the MTTR / recent-incidents
        list.
      parameters:
        - $ref: '#/components/parameters/AnalyticsWindowParam'
        - { name: limit, in: query, schema: { type: integer, minimum: 1, maximum: 100, default: 10 }, description: Max number of incidents returned (worst first). }
        - $ref: '#/components/parameters/IfNoneMatch'
      responses:
        '200':
          $ref: '#/components/responses/AnalyticsIncidents'
        '304':
          $ref: '#/components/responses/NotModified'

  # ───── Stream — deployment events (browser / SPA) ─────────────────────────
  /api/events/stream:
    get:
      tags: [stream]
      operationId: streamEvents
      summary: Server-Sent Events stream of newly-appended deployment events.
      description: |
        Long-lived `text/event-stream` connection. Each accepted ingest produces
        exactly one named event:

        ```
        id: 01J9F4WZK3W9G2T6X4QH3DKQF5
        event: deployment
        data: { ...DeploymentEvent... }
        ```

        Clients reconnect transparently with `Last-Event-ID` — the server replays
        every event with a strictly greater id. A heartbeat comment (`: ping`) is
        emitted every 15 s to keep intermediaries from idling the connection.

        Backed by PostgreSQL `LISTEN/NOTIFY` — each API instance fans out only to
        its own connected clients (NFR-05 statelessness).
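
        To illustrate the frame format and the resume cursor, a small Python
        parser for already-received stream text. It is a simplified sketch of SSE
        parsing, not a full client; `parse_sse` is a hypothetical helper. The
        returned id is what a client would send as `Last-Event-ID` on reconnect.

        ```python
        from typing import Optional

        def parse_sse(text: str) -> tuple[list[dict], Optional[str]]:
            """Parse complete SSE frames; return (events, last seen id)."""
            events: list[dict] = []
            last_id: Optional[str] = None
            for block in text.replace("\r\n", "\n").split("\n\n"):
                fields: dict[str, str] = {}
                data: list[str] = []
                for line in block.split("\n"):
                    if not line or line.startswith(":"):   # blank, or a comment such as ": ping"
                        continue
                    name, _, value = line.partition(":")
                    value = value[1:] if value.startswith(" ") else value
                    if name == "data":
                        data.append(value)
                    else:
                        fields[name] = value
                if data:                                   # frames without data are not dispatched
                    fields["data"] = "\n".join(data)
                    events.append(fields)
                    last_id = fields.get("id", last_id)
            return events, last_id
        ```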
      parameters:
        - name: Last-Event-ID
          in: header
          schema: { type: string }
          description: Resume cursor from a prior connection.
        - name: service
          in: query
          schema: { type: string }
          description: Optional server-side filter on service identifier.
      responses:
        '200':
          description: SSE stream opened.
          content:
            text/event-stream:
              schema:
                type: string
                examples:
                  - |
                    : ping
                    id: 01J9F4WZK3W9G2T6X4QH3DKQF5
                    event: deployment
                    data: {"id":"01J9F4WZK3W9G2T6X4QH3DKQF5","deployment_id":"gh-9491-1","service":"service-a","environment":"prod","version":"1.4.2","status":"success","happened_at":"2026-05-28T10:14:02Z","actor":"alice","run_number":9491,"parent_deployments":["gh-9482-1"]}

  # ───── Fetcher state ──────────────────────────────────────────────────────
  /api/fetcher/state/{adapter}:
    parameters:
      - name: adapter
        in: path
        required: true
        schema: { type: string, pattern: '^[a-z0-9][a-z0-9-]{0,63}$' }
        description: Adapter identifier (e.g. `github-actions`). Lowercase kebab-case, 1 to 64 characters.

    get:
      tags: [fetcher]
      operationId: getFetcherState
      summary: Read the opaque cursor for a fetcher adapter.
      security:
        - apiKey: []
      responses:
        '200':
          description: Stored cursor.
          content:
            application/json:
              schema: { $ref: '#/components/schemas/FetcherState' }
        '401':
          $ref: '#/components/responses/Unauthorized'
        '404':
          description: No state has been stored for this adapter yet.
          content:
            application/problem+json:
              schema: { $ref: '#/components/schemas/Problem' }

    put:
      tags: [fetcher]
      operationId: putFetcherState
      summary: Upsert the opaque cursor for a fetcher adapter.
      description: |
        The backend treats `cursor` as an opaque string — no parsing, no validation
        of shape. Max 8 KiB. Larger payloads are rejected with `413`. Latest write
        wins; the fetcher state row is the one place in the system that is NOT
        append-only.
      security:
        - apiKey: []
      requestBody:
        required: true
        content:
          application/json:
            schema: { $ref: '#/components/schemas/FetcherStateUpsert' }
      responses:
        '204': { description: Stored. }
        '401': { $ref: '#/components/responses/Unauthorized' }
        '413':
          description: Cursor exceeds the 8 KiB limit.
          content:
            application/problem+json:
              schema: { $ref: '#/components/schemas/Problem' }

  # ───── Control — destructive reset ────────────────────────────────────────
  /api/control/reset:
    post:
      tags: [control]
      operationId: resetState
      summary: Initiate an asynchronous system-state reset.
      description: |
        Initiates an asynchronous system-state reset. Returns `202` immediately;
        progress is signalled on the control stream as
        `reset-initiated` → `reset-started` → `reset-completed`. On completion,
        deployment history and fetcher cursors are cleared.

        The reset runs as a choreography: the API first drains connected components
        (they pause and acknowledge), then briefly gates ingest while it clears data,
        then releases the gate. During the clearing window `POST /api/deployments`
        returns `503`. There is **no status endpoint** — observe the control stream.

        Only one reset may be in flight at a time; a second `POST` while a reset is
        already draining or running returns `409`.

        Intended for test-environment teardown and local-dev resets.
        **Not safe in production.**

        Gated by `X-Control-API-Key` (D8) — a distinct secret from `X-Api-Key`
        so that regular ingest/fetcher credentials cannot trigger destructive operations.
      security:
        - controlApiKey: []
      responses:
        '202':
          description: Reset accepted; the choreography has started. Watch the control stream for progress.
          content:
            application/json:
              schema: { $ref: '#/components/schemas/ResetAccepted' }
        '401':
          $ref: '#/components/responses/Unauthorized'
        '409':
          description: A reset is already in progress (draining or resetting).
          content:
            application/problem+json:
              schema: { $ref: '#/components/schemas/Problem' }

  # ───── Control — orchestration SSE stream (API → components) ──────────────
  /api/control/stream:
    get:
      tags: [control]
      operationId: watchControlStream
      summary: SSE stream of orchestration events emitted by the API to components.
      description: |
        Long-lived `text/event-stream` connection **for internal service components**
        (fetcher, demo-driver, …) — not for browser clients.

        The API emits named events on this stream whenever system state changes.
        Components subscribe, react to each event, and report back via
        `POST /api/control/events`.

        **Communication direction.** Components always initiate the connection (GET).
        The API never calls components — this stream is the only outbound channel,
        and it is still a response to a component-initiated request.

        **`Last-Event-ID` replay.** Control stream events are persisted for **2 hours**
        in the `control_stream_events` table. A component that reconnects with
        `Last-Event-ID` receives all events with a strictly greater `id` still within
        the retention window. This mirrors the pattern of `GET /api/events/stream`.
        Events older than 2 hours are purged and will not be replayed.

        Backed by PostgreSQL `NOTIFY control_events` + `control_stream_events`
        persistence — consistent with the deployment-events fan-out and stateless
        across API instances (NFR-05).

        **Auth note.** Components MUST send the `X-Control-API-Key` header, so they
        need an HTTP client that can set request headers on a streaming GET (for
        example `fetch()` + `ReadableStream`). The browser `EventSource` constructor
        cannot send custom headers and MUST NOT be used for this endpoint.

        **Current event types — reset choreography (`component: "*"`):**
        - `reset-initiated` — reset accepted; components drain (stop work, block their
          own surfaces) and ack via `POST /api/control/events`.
        - `reset-started` — all acks in (or the ack timeout elapsed); data is being
          cleared, ingest briefly returns `503`.
        - `reset-completed` — data cleared and gates released; components recover.

        Every frame carries `correlation_id` (the id of the reset process it belongs to). On `reset-initiated`
        it equals the frame's own `id` (origin); on `reset-started` / `reset-completed`
        it is the initiating `reset-initiated` id. Each frame's `id` (SSE cursor) is
        always its own unique value — distinct from `correlation_id`.

        ```
        event: reset-initiated
        id: 01J9F4WZK3W9G2T6X4QH3DKQF6
        data: {"id":"01J9F4WZK3W9G2T6X4QH3DKQF6","type":"reset-initiated","component":"*","correlation_id":"01J9F4WZK3W9G2T6X4QH3DKQF6","occurred_at":"2026-05-31T10:00:00Z"}

        event: reset-started
        id: 01J9F4X0M5A1B2C3D4E5F6G7H8
        data: {"id":"01J9F4X0M5A1B2C3D4E5F6G7H8","type":"reset-started","component":"*","correlation_id":"01J9F4WZK3W9G2T6X4QH3DKQF6","occurred_at":"2026-05-31T10:00:10Z"}

        event: reset-completed
        id: 01J9F4X1N6B2C3D4E5F6G7H8J9
        data: {"id":"01J9F4X1N6B2C3D4E5F6G7H8J9","type":"reset-completed","component":"*","correlation_id":"01J9F4WZK3W9G2T6X4QH3DKQF6","occurred_at":"2026-05-31T10:00:11Z"}

        : ping
        ```
      security:
        - controlApiKey: []
      parameters:
        - name: Last-Event-ID
          in: header
          schema: { type: string }
          description: |
            Resume cursor. The server replays all persisted control events with
            `id` strictly greater than this value, then attaches to the live
            channel. Events older than 2 hours may no longer be available.
        - name: component
          in: query
          schema: { type: string }
          description: |
            Optional component id filter. When set, only events whose `component`
            field equals this id or `"*"` (applies to all) are delivered.
            Components SHOULD always set this to their own id.
      responses:
        '200':
          description: Control stream opened.
          content:
            text/event-stream:
              schema:
                type: string
                examples:
                  - |
                    : ping

                    event: reset-initiated
                    id: 01J9F4WZK3W9G2T6X4QH3DKQF6
                    data: {"id":"01J9F4WZK3W9G2T6X4QH3DKQF6","type":"reset-initiated","component":"*","correlation_id":"01J9F4WZK3W9G2T6X4QH3DKQF6","occurred_at":"2026-05-31T10:00:00Z"}

                    event: reset-started
                    id: 01J9F4X0M5A1B2C3D4E5F6G7H8
                    data: {"id":"01J9F4X0M5A1B2C3D4E5F6G7H8","type":"reset-started","component":"*","correlation_id":"01J9F4WZK3W9G2T6X4QH3DKQF6","occurred_at":"2026-05-31T10:00:10Z"}

                    event: reset-completed
                    id: 01J9F4X1N6B2C3D4E5F6G7H8J9
                    data: {"id":"01J9F4X1N6B2C3D4E5F6G7H8J9","type":"reset-completed","component":"*","correlation_id":"01J9F4WZK3W9G2T6X4QH3DKQF6","occurred_at":"2026-05-31T10:00:11Z"}
        '401':
          $ref: '#/components/responses/Unauthorized'

  # ───── Control — component event ingest (components → API) ────────────────
  /api/control/events:
    post:
      tags: [control]
      operationId: postComponentEvent
      summary: A component posts an operational event to the API.
      description: |
        Single inbound endpoint for **all** components. Component identity is
        carried in the **`X-Component-Id` header** (required), not the body.
        The server stores the header value as `component_id` on the persisted row.

        Components call this endpoint to report status transitions, heartbeats,
        and operational events. The API persists each call as an append-only
        record and fans it out on `GET /api/control/events/stream`.

        **Retention: 2 hours.** Rows older than 2 hours are purged by the
        background retention job. Do not use this endpoint as a durable audit log.

        **Expected call patterns:**
        - On every meaningful state transition (`idle → running`, `running → error`, …).
        - As a periodic heartbeat while in a stable state (≤ 30 s cadence recommended).
        - Immediately after reacting to an event from `GET /api/control/stream`
          (e.g. after completing a reset, post `state: idle`).
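
        The call patterns above could be assembled client-side roughly as follows.
        This is a sketch under assumptions: `buildComponentEventRequest` and the two
        interfaces are hypothetical names, the body fields mirror the request
        examples of this operation rather than the full `ComponentEvent` schema, and
        `"dev-key"` is a placeholder. The header checks reuse the documented
        `X-Component-Id` pattern and the 128-character `X-Correlation-Id` limit.

        ```typescript
        // Illustrative only: these type and function names are not defined by the API.
        interface ComponentEventBody {
          event_type: string;
          state: string;
          occurred_at: string; // RFC 3339 timestamp
          detail?: string;
          payload?: Record<string, unknown>;
        }

        interface ComponentEventRequest {
          method: "POST";
          path: "/api/control/events";
          headers: Record<string, string>;
          body: string;
        }

        // Pattern and limit copied from the header parameter definitions.
        const COMPONENT_ID_PATTERN = /^[a-z0-9][a-z0-9.-]{0,127}$/;
        const MAX_CORRELATION_ID_LENGTH = 128;

        function buildComponentEventRequest(
          apiKey: string,
          componentId: string,
          event: ComponentEventBody,
          correlationId?: string,
        ): ComponentEventRequest {
          if (!COMPONENT_ID_PATTERN.test(componentId)) {
            throw new Error(`invalid X-Component-Id: ${componentId}`); // the server would answer 422
          }
          const headers: Record<string, string> = {
            "Content-Type": "application/json",
            "X-Api-Key": apiKey,
            "X-Component-Id": componentId,
          };
          if (correlationId !== undefined) {
            // Assumption: an empty value is treated as invalid on the client side.
            if (correlationId.length < 1 || correlationId.length > MAX_CORRELATION_ID_LENGTH) {
              throw new Error("X-Correlation-Id must be 1-128 characters");
            }
            headers["X-Correlation-Id"] = correlationId;
          }
          // Identity travels in headers only; the body stays identity-free.
          return { method: "POST", path: "/api/control/events", headers, body: JSON.stringify(event) };
        }

        // A reset-ack correlated to the reset-initiated event id.
        const ack = buildComponentEventRequest(
          "dev-key",
          "dashboard-fetcher",
          {
            event_type: "reset-ack",
            state: "paused",
            detail: "Drained; poll loop + ingestion stopped",
            occurred_at: "2026-05-31T10:00:05Z",
          },
          "01J9F4WZK3W9G2T6X4QH3DKQF6",
        );
        ```

        Heartbeats and status posts would use the same helper without a correlation
        id, in which case the header is simply omitted.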
      security:
        - apiKey: []
      parameters:
        - name: X-Component-Id
          in: header
          required: true
          schema:
            type: string
            pattern: '^[a-z0-9][a-z0-9.-]{0,127}$'
          description: |
            **Required.** Component identifier. Lowercase kebab/dot.
            Examples: `dashboard-fetcher`, `dashboard-fetcher.github-actions`, `demo-driver`.
            The dotted variant (`.github-actions`) is illustrative of the pattern,
            not a registered component — reset acks MUST use the exact id in the
            server's `ExpectedComponents` (`dashboard-fetcher`, `demo-driver`).
            The server stores this value as `component_id` on the persisted row.
            Missing or pattern-invalid → `422`.
        - name: X-Correlation-Id
          in: header
          required: false
          schema:
            type: string
            maxLength: 128
          description: |
            Opaque correlation token grouping this event with the control command it
            belongs to — the process key. Generic: usable by ANY control command, not
            reset-only. The server stores it verbatim as the nullable `correlation_id`
            column and echoes it on the `GET /api/control/events/stream` frame
            (`ComponentEventRecord.correlation_id`). The format is **not** constrained
            to a UUID so future commands may use any opaque token.

            **Conditionally required (binding):**
            - **REQUIRED on `reset-ack`** — set to the `reset-initiated` event id
              (a UUIDv7). This is the ack-gate key (see Channel 3, `component_acks`):
              the orchestrator counts the ack only when `correlation_id` matches the
              in-flight cycle. A `reset-ack` whose `X-Correlation-Id` is missing or
              does not match the in-flight cycle is still recorded (`204`) but does
              **NOT** count toward the gate (stale/mismatch-safe). An over-length
              value is rejected with `422` as described below. There is no `reset_id`
              body field.
            - **OPTIONAL on non-reset posts** (`status` / `heartbeat` / `error` /
              `rate-limit`). For a post-reset `status`, components SHOULD set it to the
              reset id to correlate recovery to the same process.

            Absent → `correlation_id` is `null`; `204` (no error). Present, length
            1–128 → accepted (`204`). Longer than 128 chars → `422`
            (problem+json, `/X-Correlation-Id` pointer).
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ComponentEvent'
            examples:
              fetcher_running:
                summary: Fetcher reporting normal poll-loop status.
                value:
                  event_type: status
                  state: running
                  detail: "Polling github-actions adapter at 30 s interval"
                  occurred_at: "2026-05-31T10:00:00Z"
                  payload:
                    adapter: github-actions
                    last_poll_at: "2026-05-31T10:00:00Z"
                    events_this_hour: 42
              fetcher_reset_ack:
                summary: Fetcher acking a reset-initiated event (drain complete; paused).
                description: |
                  Body is identity-free as always. The ack is gated by the **required**
                  header `X-Correlation-Id: 01J9F4WZK3W9G2T6X4QH3DKQF6` (the
                  `reset-initiated` event id) — stored as `correlation_id`, echoed on the
                  SSE frame, and matched by the orchestrator. No `reset_id` body field.
                value:
                  event_type: reset-ack
                  state: paused
                  detail: "Drained; poll loop + ingestion stopped"
                  occurred_at: "2026-05-31T10:00:05Z"
              demo_driver_error:
                summary: Demo driver reporting an error.
                value:
                  event_type: error
                  state: error
                  detail: "Write API unreachable: connection refused"
                  occurred_at: "2026-05-31T10:01:00Z"
                  payload:
                    http_status: 0
                    attempt: 3
              heartbeat:
                summary: Periodic heartbeat (no state change).
                value:
                  event_type: heartbeat
                  state: running
                  occurred_at: "2026-05-31T10:00:30Z"
      responses:
        '204': { description: Event recorded. }
        '401':
          $ref: '#/components/responses/Unauthorized'
        '413':
          description: Payload field exceeds 8 KiB.
          content:
            application/problem+json:
              schema: { $ref: '#/components/schemas/Problem' }
        '422':
          description: Validation failure (body), missing/invalid `X-Component-Id` header, or `X-Correlation-Id` longer than 128 characters.
          content:
            application/problem+json:
              schema: { $ref: '#/components/schemas/Problem' }

  # ───── Stream — component events (browser / observability) ────────────────
  /api/control/events/stream:
    get:
      tags: [control]
      operationId: watchComponentEventStream
      summary: Server-Sent Events stream of component-reported events.
      description: |
        Long-lived `text/event-stream` connection carrying component-reported
        events. Each accepted `POST /api/control/events` produces exactly one
        named event:

        ```
        id: 01J9F4WZK3W9G2T6X4QH3DKQF5
        event: component
        data: { ...ComponentEventRecord... }
        ```

        Mirrors `GET /api/events/stream`, differing only in payload:
        it carries component events instead of deployment events. Unauthenticated —
        a browser/observability read surface, same trust tier as the deployment
        stream. There are no filter query params; the only request input is the
        `Last-Event-ID` header.

        **Fresh connect (no `Last-Event-ID`)** attaches live only — no history is
        replayed. Clients reconnect transparently with `Last-Event-ID`: the server
        replays every record with a strictly greater `id` still within the **2-hour**
        retention window, then attaches to the live channel. A heartbeat comment
        (`: ping`) is emitted every 15 s to keep intermediaries from idling the
        connection.

        Backed by PostgreSQL `LISTEN/NOTIFY` — each API instance fans out only to
        its own connected clients (NFR-05 statelessness).
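
        The replay rule can be pictured with the small TypeScript model below. It is
        an illustration of the documented behaviour, not the server's code:
        `selectReplay` and `StoredRecord` are hypothetical names, and comparing ids as
        plain strings is an assumption that holds only if ids sort lexicographically
        in creation order.

        ```typescript
        // Illustrative model of the replay rule; names are hypothetical.
        interface StoredRecord {
          id: string;
          received_at: string; // RFC 3339 timestamp
        }

        const RETENTION_MS = 2 * 60 * 60 * 1000; // 2-hour retention window

        function selectReplay<T extends StoredRecord>(
          records: readonly T[],
          lastEventId: string | undefined,
          now: Date,
        ): T[] {
          // Fresh connect: attach live only, replay nothing.
          if (lastEventId === undefined) return [];
          const cutoff = now.getTime() - RETENTION_MS;
          return records
            // Assumption: ids compare lexicographically in creation order.
            .filter((r) => r.id > lastEventId && Date.parse(r.received_at) >= cutoff)
            .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
        }
        ```

        Because this stream is unauthenticated, a browser `EventSource` can consume
        it directly; on an automatic reconnect the browser sends the last seen `id`
        as `Last-Event-ID`.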
      parameters:
        - name: Last-Event-ID
          in: header
          schema: { type: string }
          description: |
            Resume cursor from a prior connection. The server replays all records
            with `id` strictly greater than this value within the 2-hour retention
            window, then attaches to the live channel. Records older than 2 hours
            may no longer be available.
      responses:
        '200':
          description: SSE stream opened.
          content:
            text/event-stream:
              schema:
                type: string
                examples:
                  - |
                    : ping

                    id: 01J9F4WZK3W9G2T6X4QH3DKQF5
                    event: component
                    data: {"id":"01J9F4WZK3W9G2T6X4QH3DKQF5","component_id":"dashboard-fetcher","correlation_id":null,"event_type":"status","state":"running","detail":"Polling github-actions adapter at 30 s interval","occurred_at":"2026-05-31T10:00:00Z","received_at":"2026-05-31T10:00:00Z","payload":{"adapter":"github-actions","events_this_hour":42}}

                    id: 01J9F4WZK3W9G2T6X4QH3DKQF7
                    event: component
                    data: {"id":"01J9F4WZK3W9G2T6X4QH3DKQF7","component_id":"dashboard-fetcher","correlation_id":"01J9F4WZK3W9G2T6X4QH3DKQF6","event_type":"reset-ack","state":"paused","detail":"Drained; poll loop + ingestion stopped","occurred_at":"2026-05-31T10:00:05Z","received_at":"2026-05-31T10:00:05Z","payload":{}}

  # ───── Ops ────────────────────────────────────────────────────────────────
  /healthz:
    get:
      tags: [ops]
      operationId: liveness
      summary: Liveness probe (process is up).
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                type: object
                required: [status]
                properties:
                  status: { type: string, enum: [ok] }

  /readyz:
    get:
      tags: [ops]
      operationId: readiness
      summary: Readiness probe (dependencies reachable).
      responses:
        '200':
          description: |
            Ready. Checks: DB reachable; `deployment_events` channel LISTEN attached;
            `control_events` channel LISTEN attached.
          content:
            application/json:
              schema: { $ref: '#/components/schemas/Readiness' }
        '503':
          description: Not ready.
          content:
            application/problem+json:
              schema: { $ref: '#/components/schemas/Problem' }

# ─────────────────────────────────────────────────────────────────────────────
# Components
# ─────────────────────────────────────────────────────────────────────────────
components:

  securitySchemes:
    apiKey:
      type: apiKey
      in: header
      name: X-Api-Key
      description: |
        Static, server-side-configured shared secret. Required on ingest, fetcher
        state, and component event reporting.
        Missing or invalid → `401`. The dev fake key is never embedded in the SPA
        bundle (NFR-04).
    controlApiKey:
      type: apiKey
      in: header
      name: X-Control-API-Key
      description: |
        Static, server-side-configured shared secret for control operations (D8).
        Distinct from `X-Api-Key` for least-privilege separation: a caller that holds
        only `X-Api-Key` cannot trigger destructive operations or subscribe to the
        control stream. Components that subscribe to `GET /api/control/stream` need
        this key in addition to `X-Api-Key`.
        Missing or invalid → `401`.

  parameters:
    XProgressReporter:
      name: X-Progress-Reporter
      in: header
      schema:
        type: string
        pattern: '^[a-z0-9][a-z0-9./-]{0,127}$'
      description: |
        Optional attribution header on deployment ingest (`POST /api/deployments`).
        Format: `<emitter>/<adapter>`, e.g. `dashboard-fetcher/github-actions`.
        Stored alongside the deployment event row; never rendered as an authoritative actor.

    AnalyticsWindowParam:
      name: window
      in: query
      schema:
        type: string
        enum: [7d, 14d, 30d]
        default: 7d
      description: |
        Look-back window for the analytics aggregate. One of `7d` | `14d` | `30d`.
        **Clamped server-side** to `HISTORY_RETENTION_DAYS` (default 365, min 90):
        when the requested span exceeds available retention the server narrows it
        and reports the effective span in the response's `AnalyticsWindow`
        (`days` + `clamped: true`). An absent or out-of-enum value resolves to the
        `7d` default.
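
        The resolution rule might be sketched as below. `resolveWindow` is a
        hypothetical name, and using the current instant as `to` without any
        rounding is an assumption; the output fields follow the `AnalyticsWindow`
        schema.

        ```typescript
        // Illustrative only: resolveWindow is not part of the API.
        interface AnalyticsWindow {
          days: number;
          from: string;
          to: string;
          retention_days: number;
          clamped: boolean;
        }

        const WINDOW_DAYS = new Map<string, number>([["7d", 7], ["14d", 14], ["30d", 30]]);
        const DAY_MS = 24 * 60 * 60 * 1000;

        function resolveWindow(
          requested: string | undefined,
          retentionDays: number,
          now: Date,
        ): AnalyticsWindow {
          // Absent or out-of-enum values fall back to the 7d default.
          const requestedDays = WINDOW_DAYS.get(requested ?? "") ?? 7;
          // Narrow the span when it exceeds the available retention.
          const clamped = requestedDays > retentionDays;
          const days = clamped ? retentionDays : requestedDays;
          return {
            days,
            from: new Date(now.getTime() - days * DAY_MS).toISOString(), // to minus days
            to: now.toISOString(), // assumption: server "now", not rounded
            retention_days: retentionDays,
            clamped,
          };
        }
        ```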

    IfNoneMatch:
      name: If-None-Match
      in: header
      schema: { type: string }
      description: |
        Weak ETag from a prior analytics read. When it matches the current
        aggregate the server returns `304 Not Modified` with no body.

  responses:
    Unauthorized:
      description: Missing or invalid API key.
      content:
        application/problem+json:
          schema: { $ref: '#/components/schemas/Problem' }
    NotFound:
      description: Resource not found.
      content:
        application/problem+json:
          schema: { $ref: '#/components/schemas/Problem' }
    RateLimited:
      description: Too many requests.
      headers:
        Retry-After:
          schema: { type: integer }
          description: Seconds until the next request is permitted.
      content:
        application/problem+json:
          schema: { $ref: '#/components/schemas/Problem' }
    ServiceUnavailable:
      description: |
        Temporarily unavailable. On `POST /api/deployments` this signals the brief
        data-clearing window of a system-state reset; clients retry after `Retry-After`.
      headers:
        Retry-After:
          schema: { type: integer }
          description: Seconds until the request may be retried.
      content:
        application/problem+json:
          schema: { $ref: '#/components/schemas/Problem' }

    # ── Analytics shared responses ────────────────────────────────────────────
    NotModified:
      description: |
        The analytics aggregate is unchanged since the `If-None-Match` ETag.
        No body. The weak `ETag` is re-emitted so the client can keep validating.
      headers:
        ETag:
          schema: { type: string }
          description: Weak ETag of the (unchanged) aggregate.

    AnalyticsDora:
      description: DORA Four Keys KPI band for the resolved window.
      headers:
        ETag:
          schema: { type: string }
          description: Weak ETag of the aggregate. Clients SHOULD send `If-None-Match` to short-circuit unchanged reads.
      content:
        application/json:
          schema: { $ref: '#/components/schemas/AnalyticsDora' }

    AnalyticsFrequency:
      description: Per-day success/failure deployment counts for the resolved window.
      headers:
        ETag:
          schema: { type: string }
          description: Weak ETag of the aggregate. Clients SHOULD send `If-None-Match` to short-circuit unchanged reads.
      content:
        application/json:
          schema: { $ref: '#/components/schemas/AnalyticsFrequency' }

    AnalyticsChangeFailureRate:
      description: Per-day change-failure rate + elite threshold for the resolved window.
      headers:
        ETag:
          schema: { type: string }
          description: Weak ETag of the aggregate. Clients SHOULD send `If-None-Match` to short-circuit unchanged reads.
      content:
        application/json:
          schema: { $ref: '#/components/schemas/AnalyticsChangeFailureRate' }

    AnalyticsDurationHistogram:
      description: Deployment-duration histogram (bins + p50 + p95) for the resolved window.
      headers:
        ETag:
          schema: { type: string }
          description: Weak ETag of the aggregate. Clients SHOULD send `If-None-Match` to short-circuit unchanged reads.
      content:
        application/json:
          schema: { $ref: '#/components/schemas/AnalyticsDurationHistogram' }

    AnalyticsPromotionFunnel:
      description: Promotion funnel (per-stage count + conversion) for the resolved window.
      headers:
        ETag:
          schema: { type: string }
          description: Weak ETag of the aggregate. Clients SHOULD send `If-None-Match` to short-circuit unchanged reads.
      content:
        application/json:
          schema: { $ref: '#/components/schemas/AnalyticsPromotionFunnel' }

    AnalyticsStatusDistribution:
      description: Event count per status (all 8, zero-filled) for the resolved window.
      headers:
        ETag:
          schema: { type: string }
          description: Weak ETag of the aggregate. Clients SHOULD send `If-None-Match` to short-circuit unchanged reads.
      content:
        application/json:
          schema: { $ref: '#/components/schemas/AnalyticsStatusDistribution' }

    AnalyticsHeatmap:
      description: Day-of-week × hour deployment counts for the resolved window.
      headers:
        ETag:
          schema: { type: string }
          description: Weak ETag of the aggregate. Clients SHOULD send `If-None-Match` to short-circuit unchanged reads.
      content:
        application/json:
          schema: { $ref: '#/components/schemas/AnalyticsHeatmap' }

    AnalyticsTopDeployers:
      description: Deployment counts grouped by actor (descending) for the resolved window.
      headers:
        ETag:
          schema: { type: string }
          description: Weak ETag of the aggregate. Clients SHOULD send `If-None-Match` to short-circuit unchanged reads.
      content:
        application/json:
          schema: { $ref: '#/components/schemas/AnalyticsTopDeployers' }

    AnalyticsIncidents:
      description: Worst-first restoration incidents for the resolved window.
      headers:
        ETag:
          schema: { type: string }
          description: Weak ETag of the aggregate. Clients SHOULD send `If-None-Match` to short-circuit unchanged reads.
      content:
        application/json:
          schema: { $ref: '#/components/schemas/AnalyticsIncidents' }

  schemas:

    # ── Deployment domain ────────────────────────────────────────────────────

    Status:
      type: string
      enum: [pending, queued, waiting, in-progress, success, failure, cancelled, rejected]
      description: |
        State carried by this individual event row.
        - `pending`     — created but not yet started; no work has begun.
        - `queued`      — accepted and waiting in line to start; not yet running.
        - `waiting`     — blocked awaiting an approval or wait-timer gate; has not started.
        - `in-progress` — actively running.
        - `success`     — completed successfully.
        - `failure`     — ran and ended unsuccessfully.
        - `cancelled`   — stopped before completing; did not finish.
        - `rejected`    — denied at an approval gate; never ran.

    DeploymentEventIngest:
      type: object
      description: |
        Request body for `POST /api/deployments`.

        Every accepted body appends exactly one row. Multiple rows MAY share
        `deployment_id`, `version`, `sha`, etc. — the store does not deduplicate.
      required: [deployment_id, service, environment, status, happened_at]
      additionalProperties: false
      properties:
        deployment_id:
          type: string
          minLength: 1
          maxLength: 200
          description: |
            Emitter-supplied **correlation key** that groups all events of one
            logical deployment (the `in-progress` row, the terminal `success`
            or `failure` row, any retried rows). NOT unique per row, NOT an
            idempotency key. Conventionally prefixed by tool family
            (`gh-…`, `ado-…`, `jenkins-…`) so that values do not collide
            across CI/CD tools. Also serves as the handle that
            `parent_deployments` of downstream events points at.
          examples: [gh-9491-1]
        service:
          type: string
          minLength: 1
          maxLength: 128
          description: Service / component identifier.
          examples: [service-a]
        environment:
          type: string
          minLength: 1
          maxLength: 64
          description: Environment identifier (e.g. `dev`, `qa`, `prod`).
          examples: [prod]
        version:
          type: string
          maxLength: 50
          description: Free-form version string (semver, SHA, build number). FR §"Responsive sizing" caps at 50 chars.
          examples: ["1.4.2"]
        status:
          $ref: '#/components/schemas/Status'
        happened_at:
          type: string
          format: date-time
          description: |
            **Emitter-supplied** UTC wall-clock at which the deployment
            transitioned to `status` on the CI/CD side (SAD §11). RFC 3339;
            MUST end with `Z` or a `±HH:MM` offset. NOT the moment the dashboard
            persisted the row — that is an internal concern and is not exposed
            on the wire. All read surfaces order by this value.
          examples: ["2026-05-28T10:14:02Z"]
        run_url:
          type: string
          format: uri
          maxLength: 2048
          examples: ["https://github.com/acme/repo/actions/runs/9491"]
        run_number:
          type: string
          maxLength: 128
          description: CI/CD run identifier. Format is tool-specific (e.g. a GitHub Actions run id, an ADO build number). Used for display and correlation only; never parsed.
          examples: ["26602875882"]
        actor:
          type: string
          maxLength: 128
          examples: [alice]
        ref:
          type: string
          maxLength: 256
          description: Opaque git ref (branch, tag, `PR-42`, `refs/…`). Not parsed.
          examples: [refs/heads/main]
        sha:
          type: string
          maxLength: 128
          description: Opaque commit SHA. Not parsed for length or hex shape.
          examples: ["3f2c1a9"]
        parent_deployments:
          type: array
          maxItems: 32
          uniqueItems: true
          items:
            type: string
            description: A `deployment_id` correlation key naming an upstream logical deployment.
          description: |
            Explicit upstream-deployment correlation keys for client-side DAG
            rendering (Swimlanes view, "explicit parent" predicate per
            FRONTEND_REQUIREMENTS.md). Each entry is a `deployment_id` value
            (a logical-deployment handle, not a row id). The server stores
            values verbatim and does NOT resolve them against existing rows
            at ingest time — forward references are allowed (the named
            upstream may be appended later, or never). The Swimlanes view
            resolves these strings to event rows at render time.
          examples:
            - [gh-9482-1]

    DeploymentEvent:
      allOf:
        - $ref: '#/components/schemas/DeploymentEventIngest'
        - type: object
          required: [id]
          properties:
            id:
              type: string
              format: uuid
              description: Server-assigned surrogate row identifier (event-row primary key).
            progress_reporter:
              type: string
              maxLength: 128
              description: |
                Attribution captured from the `X-Progress-Reporter` header at ingest
                (e.g. `dashboard-fetcher/github-actions`, `demo-driver/demo`). Returned on
                `GET /api/deployments` items and the `GET /api/events/stream` `deployment`
                frame. Absent when the header was not supplied.
              examples: [dashboard-fetcher/github-actions]

    DeploymentEventPage:
      type: object
      required: [items]
      properties:
        items:
          type: array
          items: { $ref: '#/components/schemas/DeploymentEvent' }
        next_cursor:
          type: [string, 'null']
          description: Opaque cursor for the next page. `null` when the page is final.

    Matrix:
      type: object
      required: [environments, rows, generated_at]
      properties:
        generated_at:
          type: string
          format: date-time
          description: Snapshot timestamp.
        environments:
          type: array
          description: Stable column order — sorted environment identifiers present in the snapshot.
          items: { type: string }
        rows:
          type: array
          items: { $ref: '#/components/schemas/MatrixRow' }

    MatrixRow:
      type: object
      required: [service, slots]
      properties:
        service:
          type: string
        slots:
          type: object
          additionalProperties:
            $ref: '#/components/schemas/MatrixSlot'
          description: Map of `environment` → slot. Missing keys mean "never deployed here".

    MatrixSlot:
      type: object
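      description: |
        The property rules of this schema can be read as a derivation over the
        events of one service and environment. The TypeScript sketch that follows is
        one possible reading, not the server's implementation: `deriveSlot` and the
        helper types are hypothetical, events are ordered by `happened_at`, and
        returning `undefined` when no effective event exists is an assumption.

        ```typescript
        // Illustrative only: names below are not defined by the API.
        type Status =
          | "pending" | "queued" | "waiting" | "in-progress"
          | "success" | "failure" | "cancelled" | "rejected";

        interface SlotEvent {
          status: Status;
          happened_at: string; // RFC 3339 timestamp
        }

        interface DerivedSlot<E extends SlotEvent> {
          current: E;
          last_successful?: E;
          prev_failed?: boolean;
          next?: E;
        }

        const EFFECTIVE: ReadonlySet<Status> = new Set<Status>(["in-progress", "success", "failure"]);
        const at = (e: SlotEvent): number => Date.parse(e.happened_at);

        function deriveSlot<E extends SlotEvent>(events: readonly E[]): DerivedSlot<E> | undefined {
          const newestFirst = [...events].sort((a, b) => at(b) - at(a));
          // current: the most recent effective deployment.
          const current = newestFirst.find((e) => EFFECTIVE.has(e.status));
          if (current === undefined) return undefined; // assumption: slot is absent
          const slot: DerivedSlot<E> = { current };
          // last_successful: omitted when current itself succeeded.
          if (current.status !== "success") {
            const lastOk = newestFirst.find((e) => e.status === "success");
            if (lastOk !== undefined) slot.last_successful = lastOk;
          }
          // prev_failed: only while in progress, and only if the prior terminal row failed.
          if (current.status === "in-progress") {
            const prior = newestFirst.find(
              (e) => (e.status === "success" || e.status === "failure") && at(e) < at(current),
            );
            if (prior?.status === "failure") slot.prev_failed = true;
          }
          // next: the latest non-effective event newer than current.
          const next = newestFirst.find((e) => !EFFECTIVE.has(e.status) && at(e) > at(current));
          if (next !== undefined) slot.next = next;
          return slot;
        }
        ```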
      required: [current]
      properties:
        current:
          $ref: '#/components/schemas/DeploymentEvent'
          description: |
            The most recent **effective** deployment for this slot — `status` is
            one of `in-progress` | `success` | `failure`. This is the live
            deployment; the slot's primary status renders from `current.status`.
        last_successful:
          $ref: '#/components/schemas/DeploymentEvent'
          description: |
            Omitted when `current.status == success` (then `current` IS the last
            successful). Also omitted when no successful deployment has ever
            occurred in this slot.
        prev_failed:
          type: boolean
          description: |
            Present and `true` when `current.status == in-progress` AND the most recent
            terminal deployment (status `success` | `failure`) strictly older than `current`
            in this slot was a **failure**. Omitted (or `false`) when `current` is not
            in-progress, when no prior terminal deployment exists, or when the prior
            terminal deployment succeeded. Distinguishes S3/S6 (prev failed) from S2/S5
            (prev succeeded or no history).
        next:
          $ref: '#/components/schemas/DeploymentEvent'
          description: |
            The latest **non-effective** deployment beyond the live one —
            `status` is one of `pending` | `queued` | `waiting` | `cancelled` |
            `rejected`. Present only when such an event exists AND its
            `happened_at` is more recent than `current.happened_at` (i.e. a newer
            deployment has been observed that has not yet become effective, or
            ended without running). Omitted otherwise. Mirrors a deployment and
            carries the same fields as `current` (`status`, `version`, `run_url`,
            `run_number`, `sha`, `ref`, `actor`, `happened_at`, …). The frontend
            renders this as a secondary "next" badge, never as the slot's primary
            state.

    # ── Analytics (issue #299) ────────────────────────────────────────────────

    AnalyticsWindow:
      type: object
      required: [days, from, to, retention_days, clamped]
      description: |
        The window the server actually resolved for an analytics aggregate. Every
        `/api/analytics/*` response embeds this so the SPA can label the period and
        flag when retention narrowed the request.
      properties:
        days:
          type: integer
          minimum: 1
          description: Effective span in days (after the retention clamp).
          examples: [7]
        from:
          type: string
          format: date-time
          description: Inclusive UTC start of the resolved window (`to − days`).
          examples: ["2026-06-03T00:00:00Z"]
        to:
          type: string
          format: date-time
          description: Exclusive UTC end of the resolved window (server "now").
          examples: ["2026-06-10T00:00:00Z"]
        retention_days:
          type: integer
          description: Effective `HISTORY_RETENTION_DAYS` the request was clamped against.
          examples: [365]
        clamped:
          type: boolean
          description: |
            `true` when the requested `window` exceeded `retention_days` and the
            server narrowed it (then `days == retention_days`); `false` otherwise.

    AnalyticsClassification:
      type: string
      enum: [elite, high, medium, low]
      description: DORA performance band for a KPI value (Elite / High / Medium / Low).

    AnalyticsKpi:
      type: object
      required: [value, unit, classification, trend_delta, sparkline, approximated]
      description: One DORA key — value, unit, performance band, trend, and sparkline.
      properties:
        value:
          type: [number, 'null']
          description: |
            The metric value in `unit`. `null` only when the window holds no
            qualifying events (e.g. no incidents for `time_to_restore`).
          examples: [9.2]
        unit:
          type: string
          description: |
            Unit of `value`. One of: `per_day` (deployment_frequency),
            `hours` (lead_time), `ratio` (change_failure_rate, 0–1),
            `minutes` (time_to_restore).
          examples: [per_day]
        classification:
          $ref: '#/components/schemas/AnalyticsClassification'
        trend_delta:
          type: [number, 'null']
          description: |
            Signed fractional change of `value` versus the **prior half-window**
            (e.g. `0.12` = +12 %, `-0.08` = −8 %). `null` when the prior half-window
            has no comparable value. Direction is raw — the SPA decides whether
            up is good (frequency) or bad (CFR/MTTR/lead-time).
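
            A minimal sketch of the arithmetic, assuming a hypothetical helper
            `trendDelta` and assuming that a prior value of `0` also counts as
            "no comparable value":

            ```typescript
            // Illustrative only: trendDelta is not part of the API.
            function trendDelta(current: number | null, prior: number | null): number | null {
              // Assumption: a zero prior value is treated as not comparable.
              if (current === null || prior === null || prior === 0) return null;
              return (current - prior) / prior; // signed fraction, roughly 0.12 for 10 -> 11.2
            }
            ```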
          examples: [0.12]
        sparkline:
          type: array
          description: Per-day series of the metric across the resolved window (oldest → newest), for the inline sparkline.
          items: { type: number }
          examples:
            - [7, 9, 6, 11, 8, 10, 12]
        approximated:
          type: boolean
          description: |
            `true` only on `lead_time` — it is APPROXIMATED from `parent_deployments`
            promotion chains reaching the production (terminal ladder) stage, not
            measured commit→prod. `false` on the other three keys.

    AnalyticsDora:
      type: object
      required: [window, deployment_frequency, lead_time, change_failure_rate, time_to_restore]
      description: DORA Four Keys KPI band — backs the Analytics KPI band.
      properties:
        window: { $ref: '#/components/schemas/AnalyticsWindow' }
        deployment_frequency:
          $ref: '#/components/schemas/AnalyticsKpi'
          description: "Deployments per day (`unit: per_day`). `approximated: false`."
        lead_time:
          $ref: '#/components/schemas/AnalyticsKpi'
          description: |
            Approximated lead time for changes (`unit: hours`, `approximated: true`).
            Derived from `parent_deployments` promotion chains reaching the production
            (terminal ladder) stage — NOT true commit→prod lead time, which is not in
            the event log.
        change_failure_rate:
          $ref: '#/components/schemas/AnalyticsKpi'
          description: "Failed deployments ÷ total terminal deployments (`unit: ratio`, 0–1). `approximated: false`."
        time_to_restore:
          $ref: '#/components/schemas/AnalyticsKpi'
          description: "Median time to restore after a failure (`unit: minutes`). `approximated: false`."

    AnalyticsFrequencyBucket:
      type: object
      required: [date, success, failure]
      description: One UTC-day bucket of terminal deployment counts.
      properties:
        date:
          type: string
          format: date
          description: UTC calendar day (`YYYY-MM-DD`).
          examples: ["2026-06-09"]
        success:
          type: integer
          minimum: 0
          description: Count of `success` events on this day.
        failure:
          type: integer
          minimum: 0
          description: Count of `failure` events on this day.
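      # Illustrative bucket (counts are made up, not taken from real data):
      # 8 terminal `success` events and 2 terminal `failure` events on one UTC day.
      examples:
        - date: "2026-06-09"
          success: 8
          failure: 2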

    AnalyticsFrequency:
      type: object
      required: [window, buckets]
      description: Per-day success/failure deployment counts — backs the frequency-over-time bars.
      properties:
        window: { $ref: '#/components/schemas/AnalyticsWindow' }
        buckets:
          type: array
          description: One bucket per UTC day in the window, oldest → newest.
          items: { $ref: '#/components/schemas/AnalyticsFrequencyBucket' }

    AnalyticsCfrBucket:
      type: object
      required: [date, rate]
      description: One UTC-day bucket of the change-failure rate.
      properties:
        date:
          type: string
          format: date
          description: UTC calendar day (`YYYY-MM-DD`).
          examples: ["2026-06-09"]
        rate:
          type: number
          minimum: 0
          maximum: 1
          description: "`failure / (success + failure)` for the day; `0` when the day has no terminal events."
          examples: [0.18]
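      # Worked example of the `rate` formula, assuming a day with 8 `success`
      # and 2 `failure` terminal events (made-up counts): 2 / (8 + 2) = 0.2.
      # The second entry sketches a day with no terminal events, which reports `0`.
      examples:
        - date: "2026-06-09"
          rate: 0.2
        - date: "2026-06-10"
          rate: 0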

    AnalyticsChangeFailureRate:
      type: object
      required: [window, elite_threshold, buckets]
      description: Per-day CFR trend + the elite reference line — backs the CFR trend chart.
      properties:
        window: { $ref: '#/components/schemas/AnalyticsWindow' }
        elite_threshold:
          type: number
          description: Constant DORA "elite" CFR reference (`0.15`) for the dashed line.
          examples: [0.15]
        buckets:
          type: array
          description: One bucket per UTC day in the window, oldest → newest.
          items: { $ref: '#/components/schemas/AnalyticsCfrBucket' }

    AnalyticsDurationBin:
      type: object
      required: [label, lower_minutes, upper_minutes, count]
      description: One histogram bin of deployment durations (minutes).
      properties:
        label:
          type: string
          description: Display label for the bin (e.g. `0-10`, `60+`).
          examples: ["0-10"]
        lower_minutes:
          type: integer
          minimum: 0
          description: Inclusive lower bound of the bin, in minutes.
          examples: [0]
        upper_minutes:
          type: [integer, "null"]
          description: Exclusive upper bound in minutes; `null` for the open-ended top bin.
          examples: [10]
        count:
          type: integer
          minimum: 0
          description: Number of deployments whose duration falls in this bin.
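      # Sketch of a closed bin and the open-ended top bin (counts are made up).
      # Bounds are [lower, upper), so a deployment lasting exactly 10 minutes
      # would fall in the next bin up, not in `0-10`.
      examples:
        - label: "0-10"
          lower_minutes: 0
          upper_minutes: 10
          count: 14
        - label: "60+"
          lower_minutes: 60
          upper_minutes: null
          count: 3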

    AnalyticsDurationHistogram:
      type: object
      required: [window, bins, p50_minutes, p95_minutes]
      description: Duration distribution (bins + percentiles) — backs the duration histogram.
      properties:
        window: { $ref: '#/components/schemas/AnalyticsWindow' }
        bins:
          type: array
          description: Contiguous duration bins, ascending.
          items: { $ref: '#/components/schemas/AnalyticsDurationBin' }
        p50_minutes:
          type: [number, "null"]
          description: Median duration in minutes; `null` when no measurable deployment exists.
          examples: [22]
        p95_minutes:
          type: [number, "null"]
          description: 95th-percentile duration in minutes; `null` when no measurable deployment exists.
          examples: [78]

    AnalyticsFunnelStage:
      type: object
      required: [environment, count, conversion]
      description: One stage of the promotion funnel.
      properties:
        environment:
          type: string
          description: "A configured promotion-ladder stage (default ladder: `dev`, `staging`, `qa`, `preprod`, `prod`)."
          examples: [staging]
        count:
          type: integer
          minimum: 0
          description: Distinct logical deployments that reached this stage in the window.
        conversion:
          type: [number, "null"]
          minimum: 0
          maximum: 1
          description: |
            Fraction of this stage's deployments that also reached the next stage
            (`next.count / this.count`). `null` for the terminal (last ladder)
            stage, and `null` when `count` is `0`.
          examples: [0.82]
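      # Worked example of `conversion`, with made-up counts: 50 deployments
      # reached `staging` and the next stage's `count` is 41, so 41 / 50 = 0.82.
      # The terminal stage has no next stage, hence `null`.
      examples:
        - environment: staging
          count: 50
          conversion: 0.82
        - environment: prod
          count: 30
          conversion: null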

    AnalyticsPromotionFunnel:
      type: object
      required: [window, stages]
      description: Per-stage promotion counts and conversion along the configured promotion ladder; backs the funnel chart.
      properties:
        window: { $ref: '#/components/schemas/AnalyticsWindow' }
        stages:
          type: array
          description: The ordered promotion-ladder stages (operator-configured; default `dev` first → `prod` last).
          items: { $ref: '#/components/schemas/AnalyticsFunnelStage' }

    AnalyticsStatusCount:
      type: object
      required: [status, count]
      description: Event count for one status value.
      properties:
        status: { $ref: '#/components/schemas/Status' }
        count:
          type: integer
          minimum: 0

    AnalyticsStatusDistribution:
      type: object
      required: [window, statuses]
      description: Event count per status (all 8, zero-filled) — backs the status donut.
      properties:
        window: { $ref: '#/components/schemas/AnalyticsWindow' }
        statuses:
          type: array
          minItems: 8
          maxItems: 8
          description: All eight `Status` values, in enum order, zero-filled.
          items: { $ref: '#/components/schemas/AnalyticsStatusCount' }

    AnalyticsHeatmapCell:
      type: object
      required: [day_of_week, hour, count]
      description: One populated day-of-week × hour cell.
      properties:
        day_of_week:
          type: integer
          minimum: 0
          maximum: 6
          description: UTC day of week — `0` = Sunday … `6` = Saturday.
        hour:
          type: integer
          minimum: 0
          maximum: 23
          description: UTC hour of day (`0`–`23`).
        count:
          type: integer
          minimum: 0
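      # Illustrative cell (made-up count): 5 deployments on Tuesdays (`2`) in
      # the 14:00-14:59 UTC hour. Pairs missing from `cells` count as `0`.
      examples:
        - day_of_week: 2
          hour: 14
          count: 5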

    AnalyticsHeatmap:
      type: object
      required: [window, cells]
      description: |
        Day-of-week × hour deployment counts — backs the heatmap. Sparse: only
        non-zero `(day_of_week, hour)` cells are returned; absent pairs are `0`.
      properties:
        window: { $ref: '#/components/schemas/AnalyticsWindow' }
        cells:
          type: array
          items: { $ref: '#/components/schemas/AnalyticsHeatmapCell' }

    AnalyticsDeployer:
      type: object
      required: [actor, count]
      description: One actor's deployment count.
      properties:
        actor:
          type: string
          description: "`actor` from the event, or `unknown` when absent."
          examples: [alice]
        count:
          type: integer
          minimum: 0

    AnalyticsTopDeployers:
      type: object
      required: [window, deployers]
      description: Deployment counts grouped by actor (descending) — backs the leaderboard.
      properties:
        window: { $ref: '#/components/schemas/AnalyticsWindow' }
        deployers:
          type: array
          description: Highest counts first, capped at the `limit` query param.
          items: { $ref: '#/components/schemas/AnalyticsDeployer' }

    AnalyticsSeverity:
      type: string
      enum: [low, medium, high, critical]
      description: |
        Severity band of a restoration incident, derived from `duration_minutes`:
        longer outages → higher severity. An unresolved incident
        (`duration_minutes: null`) is `critical`.

    AnalyticsIncident:
      type: object
      required: [service, environment, failed_at, restored_at, duration_minutes, severity]
      description: One restoration incident, i.e. a failure in a slot and the later `success` that restored it (if any within the window).
      properties:
        service:
          type: string
          examples: [checkout]
        environment:
          type: string
          examples: [prod]
        failed_at:
          type: string
          format: date-time
          description: "`happened_at` of the `failure` event that opened the incident."
          examples: ["2026-06-08T14:02:00Z"]
        restored_at:
          type: [string, "null"]
          format: date-time
          description: "`happened_at` of the restoring `success` event; `null` when still unresolved in the window."
          examples: ["2026-06-08T14:48:00Z"]
        duration_minutes:
          type: [number, "null"]
          description: "`restored_at − failed_at` in minutes; `null` when unresolved (sorts first)."
          examples: [46]
        severity:
          $ref: '#/components/schemas/AnalyticsSeverity'
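      # Sketch of an unresolved incident (illustrative values): no restoring
      # `success` was seen in the window, so `restored_at` and
      # `duration_minutes` are `null` and `severity` is `critical`.
      examples:
        - service: checkout
          environment: prod
          failed_at: "2026-06-08T14:02:00Z"
          restored_at: null
          duration_minutes: null
          severity: critical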

    AnalyticsIncidents:
      type: object
      required: [window, incidents]
      description: Worst-first restoration incidents — backs the MTTR / incidents list.
      properties:
        window: { $ref: '#/components/schemas/AnalyticsWindow' }
        incidents:
          type: array
          description: Unresolved incidents (`duration_minutes` is `null`) first, then longest `duration_minutes` first. Capped at the `limit` query param.
          items: { $ref: '#/components/schemas/AnalyticsIncident' }

    # ── Fetcher ──────────────────────────────────────────────────────────────

    FetcherState:
      type: object
      required: [adapter, cursor, updated_at]
      properties:
        adapter:    { type: string }
        cursor:     { type: string, description: Opaque cursor blob. }
        updated_at: { type: string, format: date-time }

    FetcherStateUpsert:
      type: object
      required: [cursor]
      additionalProperties: false
      properties:
        cursor:
          type: string
          maxLength: 8192
          description: Opaque cursor blob; at most 8192 characters (8 KiB when ASCII).

    # ── Control plane ─────────────────────────────────────────────────────────

    ResetState:
      type: string
      enum: [idle, draining, resetting]
      description: |
        Phase of the system-state reset choreography.
        - `idle`      — no reset in flight; `POST /api/control/reset` accepted.
        - `draining`  — `reset-initiated` emitted; awaiting component acks or ack timeout.
        - `resetting` — `reset-started` emitted; data being cleared, ingest briefly `503`.
        `draining` and `resetting` both reject a new reset with `409`.

    ResetAccepted:
      type: object
      required: [correlation_id, state]
      description: Acknowledgement body for an accepted `POST /api/control/reset` (`202`).
      properties:
        correlation_id:
          type: string
          format: uuid
          description: |
            The process id for this reset — the `id` of the emitted `reset-initiated`
            control event (where `correlation_id == id`). Every downstream command frame
            (`reset-started` / `reset-completed`) and component event (`reset-ack`,
            post-reset `status`) carries this same value as `correlation_id`, so the
            whole saga is filterable end-to-end by this one key.
        state:
          $ref: '#/components/schemas/ResetState'
          description: Phase entered by accepting the reset (always `draining`).
        accepted_at:
          type: string
          format: date-time
          description: Server timestamp when the reset was accepted.
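      # Illustrative `202` body. The UUID and timestamp are placeholders, not
      # real server output; `state` is `draining`, the phase entered on accept.
      examples:
        - correlation_id: "0197a1b2-c3d4-7e5f-8a6b-0123456789ab"
          state: draining
          accepted_at: "2026-06-09T12:00:00Z"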

    ComponentState:
      type: string
      enum: [running, idle, paused, error]
      description: |
        Operational state self-reported by a component.
        - `running` — actively doing work (polling, emitting events, …).
        - `idle`    — connected and healthy; no active task.
        - `paused`  — work intentionally suspended; component is connected but not processing.
        - `error`   — unhealthy; `detail` SHOULD carry the reason.

    ControlStreamEvent:
      type: object
      required: [id, type, correlation_id, component, occurred_at]
      description: |
        Data payload of a named event frame on `GET /api/control/stream`.
        Also the row type stored in `control_stream_events` (enables `Last-Event-ID` replay).

        `type` is an open string — components MUST treat unknown values as no-ops
        (forward-compatibility). Known values: `reset-initiated` | `reset-started` |
        `reset-completed`.

        `id` and `correlation_id` are independent fields: `id` is this event's own
        row PK / SSE cursor; `correlation_id` is the process key shared across the
        saga. They coincide in value only on `reset-initiated` (the origin).
      properties:
        id:
          type: string
          format: uuid
          description: |
            Server-assigned UUIDv7 for this control event. Also the `id:` field of
            the SSE frame — used as `Last-Event-ID` by reconnecting components.
            Always unique per event; never equal to another event's `id`.
        type:
          type: string
          description: |
            Event type. Known values (reset choreography):
            - `reset-initiated` — reset accepted; components drain and ack `paused`.
            - `reset-started`   — acks in or timeout elapsed; data being cleared.
            - `reset-completed` — data cleared, gates released; components recover.
            Treat unknown values as no-op.
        correlation_id:
          type: string
          format: uuid
          description: |
            The process id this event belongs to — present on EVERY command frame.
            On `reset-initiated` (the origin) it equals this event's own `id`; on
            `reset-started` / `reset-completed` it is the initiating `reset-initiated`
            event's `id`. The same value is carried as `correlation_id` by the
            components' `reset-ack` and post-reset `status` events, so the whole
            saga shares one filterable key.
        component:
          type: string
          description: |
            Target component id, or `"*"` meaning all components.
            Components SHOULD ignore events where `component` neither matches
            their own id nor equals `"*"`.
        occurred_at:
          type: string
          format: date-time
          description: Server-assigned UTC timestamp when the event was generated.
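      # Sketch of the first two frames of one reset saga. The UUIDs and the
      # 5-second gap are placeholders chosen for readability. On
      # `reset-initiated` the `id` and `correlation_id` match; `reset-started`
      # has its own `id` but carries the same `correlation_id`.
      examples:
        - id: "0197a1b2-c3d4-7e5f-8a6b-0123456789ab"
          type: reset-initiated
          correlation_id: "0197a1b2-c3d4-7e5f-8a6b-0123456789ab"
          component: "*"
          occurred_at: "2026-06-09T12:00:00Z"
        - id: "0197a1b2-d111-7e5f-8a6b-0123456789ac"
          type: reset-started
          correlation_id: "0197a1b2-c3d4-7e5f-8a6b-0123456789ab"
          component: "*"
          occurred_at: "2026-06-09T12:00:05Z"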

    ComponentEvent:
      type: object
      required: [event_type, state, occurred_at]
      additionalProperties: false
      description: |
        Request body for `POST /api/control/events`.

        **Component identity is NOT in the body.** It is carried by the required
        `X-Component-Id` header and stored as `component_id` on the persisted
        row by the server.

        `payload` is an opaque JSON object for component-specific observability data
        (counters, cursors, error details). The API stores it verbatim and never parses it.
      properties:
        event_type:
          type: string
          description: |
            Event category. Known values (not exhaustive — new types are additive):
            - `status`    — a state transition or periodic status report.
            - `heartbeat` — periodic liveness ping; no state change.
            - `error`     — component encountered an error; `state` will be `error`.
            - `reset-ack` — drain-complete ack for a `reset-initiated` event; sent with
              `state: paused` and the **required** `X-Correlation-Id` header set to the
              initiating event id (the ack-gate key). No `reset_id` body field.
        state:
          $ref: '#/components/schemas/ComponentState'
        detail:
          type: string
          maxLength: 512
          description: Human-readable description of the current activity or error. Optional on heartbeats.
        occurred_at:
          type: string
          format: date-time
          description: |
            **Component-supplied** UTC wall-clock at which this event occurred.
            Mirrors the `happened_at` semantics of deployment events — the
            component is the authority on when the state transition happened.
        payload:
          type: object
          additionalProperties: true
          description: |
            Opaque key-value pairs for observability (e.g. `last_poll_at`,
            `events_this_hour`, `adapter`, `http_status`, `control_event_id`).
            Serialised size ≤ 8 KiB — else `413`.
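      # Two illustrative request bodies (values are made up). The first is a
      # `reset-ack`; the component id and the initiating event id travel in the
      # `X-Component-Id` and `X-Correlation-Id` headers, not in the body. The
      # second is a heartbeat with an opaque `payload`; the `github` adapter
      # name is hypothetical.
      examples:
        - event_type: reset-ack
          state: paused
          detail: Drained; polling suspended.
          occurred_at: "2026-06-09T12:00:02Z"
        - event_type: heartbeat
          state: running
          occurred_at: "2026-06-09T12:05:00Z"
          payload:
            adapter: github
            events_this_hour: 12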

    ComponentEventRecord:
      type: object
      required: [id, component_id, event_type, state, occurred_at, received_at]
      description: |
        Component event record emitted on the `component` frame of
        `GET /api/control/events/stream`. Retained for 2 hours.
        `component_id` is the value of the `X-Component-Id` header from the
        originating POST request.
      properties:
        id:
          type: string
          format: uuid
          description: Server-assigned UUIDv7 row identifier (time-ordered; doubles as sort key).
        component_id:
          type: string
          description: Value of the `X-Component-Id` header from the originating POST.
        correlation_id:
          type: [string, "null"]
          maxLength: 128
          description: |
            Value of the `X-Correlation-Id` header from the originating POST, or
            `null` if the header was absent. The process key correlating this event
            with a control command — for reset, the `reset-initiated` event id.
            Distinct from `id` (this row's own PK / SSE cursor). **This is the reset
            ack-gate key:** on a `reset-ack` the header is REQUIRED and the orchestrator
            gates on this value (see `POST /api/control/events`). `null` on posts that
            carry no control command (e.g. periodic `heartbeat`).
        event_type:
          type: string
        state:
          $ref: '#/components/schemas/ComponentState'
        detail:
          type: [string, "null"]
        occurred_at:
          type: string
          format: date-time
          description: Component-supplied timestamp (as submitted).
        received_at:
          type: string
          format: date-time
          description: Server-assigned timestamp when the row was inserted.
        payload:
          type: [object, "null"]

    # ── Ops ───────────────────────────────────────────────────────────────────

    Readiness:
      type: object
      required: [status, checks]
      properties:
        status: { type: string, enum: [ready, degraded] }
        checks:
          type: object
          additionalProperties: { type: string, enum: [ok, fail] }
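      # Illustrative bodies. The check names (`database`, `migrations`) are
      # hypothetical; the schema only fixes the `ok` / `fail` values. The second
      # entry assumes that a failing check is reported as `degraded`.
      examples:
        - status: ready
          checks:
            database: ok
            migrations: ok
        - status: degraded
          checks:
            database: fail
            migrations: ok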

    Problem:
      description: RFC 9457 problem detail.
      type: object
      required: [type, title, status]
      properties:
        type:     { type: string, format: uri, default: about:blank }
        title:    { type: string }
        status:   { type: integer }
        detail:   { type: string }
        instance: { type: string, format: uri }
        errors:
          type: array
          description: Optional per-field validation errors (422).
          items:
            type: object
            required: [pointer, message]
            properties:
              pointer: { type: string, description: JSON Pointer into the request body. }
              message: { type: string }
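      # Sketch of a `422` validation problem. The `detail` and `message` texts
      # are made up; `happened_at` is used as the offending field purely for
      # illustration.
      examples:
        - type: "about:blank"
          title: Unprocessable Content
          status: 422
          detail: The request body failed validation.
          errors:
            - pointer: /happened_at
              message: Must be an RFC 3339 date-time.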

  examples:

    IngestWithParents:
      summary: Promotion with an explicit parent (default Swimlanes use case).
      description: |
        In-progress event for the prod promotion of `service-a 1.4.2`, pointing
        at the dev deployment it inherits from. The same `deployment_id`
        (`gh-9491-1`) will reappear on a follow-up event carrying the terminal
        `success` / `failure` status — both rows persist (append-only).
      value:
        deployment_id: gh-9491-1
        service: service-a
        environment: prod
        version: 1.4.2
        status: in-progress
        happened_at: "2026-05-28T10:14:02Z"
        run_url: https://github.com/acme/repo/actions/runs/9491
        run_number: "9491"
        actor: alice
        ref: refs/heads/main
        sha: 3f2c1a9
        parent_deployments: [gh-9482-1]

    IngestMinimal:
      summary: Terminal success event (no upstream linkage).
      description: |
        Smallest reasonable body. `parent_deployments` is intentionally an empty
        array to show the field shape on the wire — sending `[]` and omitting
        the field entirely are semantically equivalent.
      value:
        deployment_id: gh-9482-1
        service: service-a
        environment: dev
        version: 1.4.2
        status: success
        happened_at: "2026-05-28T09:42:17Z"
        run_url: https://github.com/acme/repo/actions/runs/9482
        run_number: "9482"
        actor: alice
        ref: refs/heads/main
        sha: 3f2c1a9
        parent_deployments: []

    IngestInProgressNoParents:
      summary: Root deployment with no upstream (the dev row from the chain above).
      description: |
        First row of `gh-9482-1`'s lifecycle. A second POST with the same
        `deployment_id` and `status: success` will be appended when the run
        finishes; that second row's `happened_at` will be later than this one's.
        Together the two rows form one logical deployment.
      value:
        deployment_id: gh-9482-1
        service: service-a
        environment: dev
        version: 1.4.2
        status: in-progress
        happened_at: "2026-05-28T09:40:55Z"
        run_url: https://github.com/acme/repo/actions/runs/9482
        run_number: "9482"
        actor: alice
        ref: refs/heads/main
        sha: 3f2c1a9
