SPC Dashboard

The doctrine from 4G - Web Dev Stack/14_BABYAI_LORAS.md, made operational. Every tool call writes one row to audit_log, and the dashboard reads from that single table to answer one question: "is this stable, and how stable is it?"

The substrate

audit_log is owned by src/audit-log.ts and has these columns:

id              INTEGER PRIMARY KEY
tenant_id       TEXT NOT NULL
tool_name       TEXT NOT NULL
actor           TEXT          -- bearer label / agent id
composition_id  TEXT          -- workflow tag, from X-Workflow-Id header
success         INTEGER (0/1)
schema_valid    INTEGER (0/1)
latency_ms      INTEGER
output_hash     TEXT          -- djb2 over stable-stringified result
error           TEXT          -- on success=0
human_verdict   TEXT          -- accept / reject / null (signal-4)
verdict_by      TEXT          -- who stamped the verdict
verdict_at      TEXT
created_at      TEXT

Content-free by construction: payloads aren't retained, only their structural fingerprint. This is the doc-14 inference contract: a CNC machine doesn't remember the part; only the craft improves over many parts.
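The output_hash fingerprint can be sketched as below. This is a minimal illustration under stated assumptions: "stable-stringified" is taken to mean JSON with object keys sorted recursively, and djb2 is the classic `hash * 33 + c` variant (seed 5381) kept to unsigned 32 bits and rendered as hex. The names stableStringify, djb2 and outputHash are hypothetical, not the actual exports of src/audit-log.ts.

```typescript
// Hypothetical sketch: assumes "stable" means JSON with recursively sorted object keys.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    // Primitives serialize as plain JSON; undefined/functions become "null" here.
    return JSON.stringify(value) ?? "null";
  }
  if (Array.isArray(value)) {
    // Array order is meaningful, so only object keys are reordered.
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k])).join(",") + "}";
}

// Classic djb2: hash = hash * 33 + charCode, seeded with 5381.
function djb2(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    // (hash << 5) + hash == hash * 33; ">>> 0" keeps the result an unsigned 32-bit int.
    hash = ((hash << 5) + hash + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Hypothetical helper: fingerprint a tool result without retaining it.
function outputHash(result: unknown): string {
  return djb2(stableStringify(result)).toString(16);
}
```

Sorting keys is what makes {a: 1, b: 2} and {b: 2, a: 1} hash identically, so two runs can be compared without storing either payload. djb2 is not collision-resistant, which seems acceptable for a drift fingerprint but would not be for an integrity check.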

The emitter wraps InvoiceToolExecutor.execute and is configured through an optional audit field:

audit?: {
  backend: ProjectionBackend;
  tenantIdFallback?: string;
  compositionId?: string;  // from X-Workflow-Id
}

Telemetry is best-effort — a sink failure never blocks the tool call.
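A minimal sketch of the best-effort pattern, assuming a hypothetical AuditSink with a single write() method in place of the real ProjectionBackend API (not shown on this page) and a hypothetical ToolResult shape of {ok, value?, error?}. It builds the row in a finally block, so failed and thrown calls are logged too, and it swallows both synchronous throws and rejected promises from the sink. output_hash and schema_valid are left out for brevity.

```typescript
// Hypothetical subset of the audit_log columns written per call.
interface AuditRow {
  tenant_id: string;
  tool_name: string;
  composition_id: string | null; // from X-Workflow-Id, when present
  success: 0 | 1;
  latency_ms: number;
  error: string | null;
}

// Hypothetical sink standing in for the real ProjectionBackend.
interface AuditSink {
  write(row: AuditRow): void | Promise<void>;
}

// Hypothetical result shape mirroring the {ok: true} convention.
interface ToolResult {
  ok: boolean;
  value?: unknown;
  error?: string;
}

async function executeWithAudit(
  toolName: string,
  tenantId: string,
  compositionId: string | null,
  call: () => Promise<ToolResult>,
  sink: AuditSink,
): Promise<ToolResult> {
  const started = Date.now();
  let result: ToolResult | undefined;
  let thrown: unknown = undefined;
  try {
    result = await call();
    return result;
  } catch (e) {
    thrown = e;
    throw e; // the caller still sees the tool's own failure
  } finally {
    let error: string | null = null;
    if (result === undefined) error = String(thrown);
    else if (!result.ok) error = result.error ?? "ok: false";
    const row: AuditRow = {
      tenant_id: tenantId,
      tool_name: toolName,
      composition_id: compositionId,
      success: result?.ok ? 1 : 0,
      latency_ms: Date.now() - started,
      error,
    };
    // Best-effort: neither a throwing nor a rejecting sink reaches the caller.
    try {
      void Promise.resolve(sink.write(row)).catch(() => undefined);
    } catch {
      // ignored on purpose: telemetry must never block the tool call
    }
  }
}
```

Not awaiting the sink write is a deliberate choice in this sketch: a slow sink then cannot add latency to the tool call either.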

The four signals

Per doc-14:

  1. Schema valid — args parsed cleanly (free, purely structural)
  2. Tool returned 2xx with {ok: true} (free, structural; the tool itself signals it)
  3. Workflow completed — composition_id-grouped success rate (free if a workflow is active)
  4. Human verdict — reviewer stamps accept/reject (needs human input)
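Signals 1 and 2 fall straight out of the 0/1 columns. A small sketch, assuming signal 1 maps to schema_valid and signal 2 to success; structuralRates and its return shape are hypothetical.

```typescript
// Hypothetical minimal row: only the columns signals 1 and 2 read.
interface SignalRow {
  tool_name: string;
  schema_valid: 0 | 1;
  success: 0 | 1;
}

interface StructuralRates {
  calls: number;
  schemaValidRate: number; // signal 1
  successRate: number; // signal 2
}

function structuralRates(rows: SignalRow[]): Record<string, StructuralRates> {
  const totals: Record<string, { calls: number; valid: number; ok: number }> = {};
  for (const r of rows) {
    const t = totals[r.tool_name] ?? { calls: 0, valid: 0, ok: 0 };
    t.calls += 1;
    t.valid += r.schema_valid; // 0/1 columns sum directly to counts
    t.ok += r.success;
    totals[r.tool_name] = t;
  }
  const out: Record<string, StructuralRates> = {};
  for (const tool of Object.keys(totals)) {
    const t = totals[tool];
    out[tool] = { calls: t.calls, schemaValidRate: t.valid / t.calls, successRate: t.ok / t.calls };
  }
  return out;
}
```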

Signals 1–3 are the floor; signal 4 is the ceiling. Structural signals govern *how fast* sampling can ramp down; the semantic verdict governs *how low* it can drop. samplingRecommendation() in src/ui.ts respects that ordering: it never drops below 20% sampling without at least 5 reviews and at least 95% acceptance.
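The floor rule can be pictured as a clamp on whatever rate the structural signals alone would allow. This is a hypothetical sketch, not the actual samplingRecommendation(): the 20% floor, the 5-review minimum and the 95% acceptance bar come from the text, while clampSamplingRate, its parameters and the choice to let a trusted tool go below 20% without any further floor are assumptions.

```typescript
// Thresholds quoted on this page; names and structure are hypothetical.
const SEMANTIC_FLOOR = 0.2; // never below 20% sampling...
const MIN_REVIEWS = 5; // ...unless at least 5 human verdicts exist
const MIN_ACCEPT_RATE = 0.95; // ...and at least 95% of them are accepts

interface VerdictStats {
  accepts: number;
  rejects: number;
}

// proposedRate: the sampling rate (0..1) that structural signals 1-3 alone would allow.
function clampSamplingRate(proposedRate: number, verdicts: VerdictStats): number {
  const reviews = verdicts.accepts + verdicts.rejects;
  const acceptRate = reviews === 0 ? 0 : verdicts.accepts / reviews;
  const earnedTrust = reviews >= MIN_REVIEWS && acceptRate >= MIN_ACCEPT_RATE;
  // Assumption: once trust is earned, this sketch applies no extra floor.
  const floor = earnedTrust ? 0 : SEMANTIC_FLOOR;
  return Math.min(1, Math.max(floor, proposedRate));
}
```

With 3 accepts and no rejects, a proposed 5% is held at 20% (too few reviews); with 19 accepts out of 20 reviews (exactly 95%), the 5% passes through.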

The Stats tab

Ops → Activity → Tool stats. Each per-tool row carries:

- ±1σ band: green
- ±2σ band: amber
- ±3σ band: red
- Mean: thin grey line
- 3σ UCL: dashed red
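The bands come from a per-tool mean and standard deviation. A minimal sketch over a tool's latency_ms values, assuming the plain sample standard deviation; textbook individuals charts often estimate σ from the average moving range instead, and this page does not say which estimator src/ui.ts uses. controlLimits is a hypothetical name.

```typescript
// Hypothetical limits for one tool's series (e.g. recent latency_ms values).
interface ControlLimits {
  mean: number;
  sigma: number;
  band1: [number, number]; // ±1σ, drawn green
  band2: [number, number]; // ±2σ, drawn amber
  band3: [number, number]; // ±3σ, drawn red
  ucl: number; // upper control limit, mean + 3σ (dashed red)
}

function controlLimits(values: number[]): ControlLimits {
  if (values.length < 2) throw new Error("need at least two points to estimate sigma");
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  // Sample standard deviation (n - 1 denominator); an assumption, see lead-in.
  const sigma = Math.sqrt(values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1));
  const band = (k: number): [number, number] => [mean - k * sigma, mean + k * sigma];
  return { mean, sigma, band1: band(1), band2: band(2), band3: band(3), ucl: mean + 3 * sigma };
}
```

For latencies of 10, 12 and 14 ms this gives a mean of 12 ms, σ = 2 ms, an amber band of 8 to 16 ms and a UCL of 18 ms.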

Clicking a row opens a drill-down modal with a 720×220 control chart and the last 500 invocations.

Western Electric rules

src/audit-log.ts exports annotateWesternElectric:

Detection is two-sided: a tool that suddenly runs fast is a drift signal too (it could be returning no-op responses).
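This page does not list which rules annotateWesternElectric checks. For orientation, here is a hypothetical two-sided check of the four classic Western Electric rules: (1) one point beyond 3σ; (2) two of three consecutive points beyond 2σ on the same side; (3) four of five consecutive points beyond 1σ on the same side; (4) eight consecutive points on the same side of the mean. The function name and the per-point list of rule numbers are illustrative, not the export's real signature.

```typescript
type Rule = 1 | 2 | 3 | 4;

// Hypothetical: for each index, the rules violated by the window ending at that point.
function westernElectric(values: number[], mean: number, sigma: number): Rule[][] {
  if (!(sigma > 0)) throw new Error("sigma must be positive");
  // Signed distance from the mean in units of sigma.
  const z = values.map((v) => (v - mean) / sigma);
  const hits: Rule[][] = values.map(() => []);

  // Count points in z[i-n+1..i] lying beyond `limit` on the given side.
  const countBeyond = (i: number, n: number, limit: number, side: 1 | -1): number => {
    let c = 0;
    for (let j = i - n + 1; j <= i; j++) if (side * z[j] > limit) c++;
    return c;
  };

  for (let i = 0; i < z.length; i++) {
    // Checking both sides is what makes "suddenly fast" a signal too.
    for (const side of [1, -1] as const) {
      if (side * z[i] > 3) hits[i].push(1); // rule 1
      if (i >= 2 && countBeyond(i, 3, 2, side) >= 2) hits[i].push(2); // rule 2
      if (i >= 4 && countBeyond(i, 5, 1, side) >= 4) hits[i].push(3); // rule 3
      if (i >= 7 && countBeyond(i, 8, 0, side) === 8) hits[i].push(4); // rule 4
    }
  }
  return hits;
}
```

A burst of unusually low latencies produces negative z-scores, which trip the same rules through the `side = -1` pass as a slowdown does through `side = 1`.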

The by-workflow tab

Same dashboard, grouped by composition_id instead of tool_name. It is useful when you care about "did the Monthly close run complete cleanly?" rather than "is the trial_balance tool stable in general?" Rows are joined to Workflow.name so they read as human-readable workflow names.
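The regrouping can be sketched in memory as below. The row shape and the workflowNames lookup are hypothetical stand-ins for audit_log and the Workflow table; calls without a composition_id are skipped, and the success rate is simply the share of rows with success = 1 per workflow, one plausible reading of signal 3 rather than the dashboard's confirmed formula.

```typescript
// Hypothetical minimal row for the by-workflow view.
interface WorkflowRow {
  composition_id: string | null;
  success: 0 | 1;
}

interface WorkflowStats {
  compositionId: string;
  name: string; // Workflow.name, or the raw id when no workflow matches
  calls: number;
  successRate: number;
}

function byWorkflow(rows: WorkflowRow[], workflowNames: Record<string, string>): WorkflowStats[] {
  const totals: Record<string, { calls: number; ok: number }> = {};
  for (const r of rows) {
    if (r.composition_id === null) continue; // untagged calls belong to no workflow
    const t = totals[r.composition_id] ?? { calls: 0, ok: 0 };
    t.calls += 1;
    t.ok += r.success;
    totals[r.composition_id] = t;
  }
  // The "join": look each id up in the name map, falling back to the id itself.
  return Object.keys(totals).map((id) => ({
    compositionId: id,
    name: workflowNames[id] ?? id,
    calls: totals[id].calls,
    successRate: totals[id].ok / totals[id].calls,
  }));
}
```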

Signal-4 capture

Every audit_log row in the Tool calls tab, and in the workflow drawer's recent-activity table, has ✓/✗ buttons. A click POSTs to /audit-log/:id/verdict with {verdict: "accept" | "reject"}, and the row is stamped with the result (human_verdict), the reviewer's bearer label (verdict_by) and an ISO timestamp (verdict_at). The audit_log row is immutable except for this verdict triple. That is intentional: review is a *separate* event from the original call.
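The stamping step can be sketched as a pure function that validates the body and returns a copy of the row with only the verdict triple set. stampVerdict, AuditRecord and the decision to refuse any other verdict string are hypothetical; HTTP routing and persistence for /audit-log/:id/verdict are not shown.

```typescript
type Verdict = "accept" | "reject";

// Hypothetical subset of an audit_log row.
interface AuditRecord {
  id: number;
  tool_name: string;
  success: 0 | 1;
  human_verdict: Verdict | null;
  verdict_by: string | null;
  verdict_at: string | null;
}

// Parse the POST body; anything other than {verdict: "accept" | "reject"} is refused.
function parseVerdict(body: unknown): Verdict {
  const v = (body as { verdict?: unknown } | null)?.verdict;
  if (v === "accept" || v === "reject") return v;
  throw new Error('expected {verdict: "accept" | "reject"}');
}

// Copy the row, touching only the verdict triple; the original call's columns stay as they were.
function stampVerdict(row: AuditRecord, body: unknown, reviewer: string, now: Date): AuditRecord {
  const verdict = parseVerdict(body);
  return { ...row, human_verdict: verdict, verdict_by: reviewer, verdict_at: now.toISOString() };
}
```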

The per-tool and per-workflow accept_count, reject_count and accept_rate are aggregated from these stamps.
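The aggregation can be sketched as a group-by with a pluggable key, so the same code serves the per-tool and per-workflow views. Assumptions: accept_rate is accepts over reviewed rows (rows with human_verdict = null are excluded, not counted as rejects) and it stays null until something has been reviewed; aggregateVerdicts is a hypothetical name.

```typescript
type VerdictValue = "accept" | "reject" | null;

// Hypothetical minimal row carrying a signal-4 stamp.
interface ReviewedRow {
  tool_name: string;
  composition_id: string | null;
  human_verdict: VerdictValue;
}

interface VerdictAggregate {
  accept_count: number;
  reject_count: number;
  accept_rate: number | null; // assumption: null until at least one row is reviewed
}

function aggregateVerdicts(
  rows: ReviewedRow[],
  keyOf: (r: ReviewedRow) => string | null,
): Record<string, VerdictAggregate> {
  const out: Record<string, VerdictAggregate> = {};
  for (const r of rows) {
    const key = keyOf(r);
    if (key === null) continue; // e.g. no composition_id in the per-workflow view
    const agg: VerdictAggregate = out[key] ?? { accept_count: 0, reject_count: 0, accept_rate: null };
    if (r.human_verdict === "accept") agg.accept_count += 1;
    if (r.human_verdict === "reject") agg.reject_count += 1;
    out[key] = agg;
  }
  for (const key of Object.keys(out)) {
    const a = out[key];
    const reviewed = a.accept_count + a.reject_count;
    a.accept_rate = reviewed === 0 ? null : a.accept_count / reviewed;
  }
  return out;
}
```

Usage: `aggregateVerdicts(rows, (r) => r.tool_name)` for the per-tool view, `aggregateVerdicts(rows, (r) => r.composition_id)` for the per-workflow view.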

Why this matters

The doc-14 SPC trust-budget doctrine: AI in production is fine if its action space is closed by a deterministic QC program AND its outputs earn trust quantitatively, on charts the customer's compliance team already knows how to read.

Accelerando's tool surface is the closed action space. This dashboard is where the trust is earned. The same vocabulary the compliance officer uses for sterilization processes and drug-dispensing accuracy now applies to AI tool calls.

The sales hook: *"we automate gradually, you set the trust threshold."*