SPC Dashboard
The doctrine from 4G - Web Dev Stack/14_BABYAI_LORAS.md made operational. Every tool call writes one row to audit_log. The dashboard reads from that one table to answer "is this stable, and how stable is it."
The substrate
audit_log is owned by src/audit-log.ts, columns:
id INTEGER PRIMARY KEY
tenant_id TEXT NOT NULL
tool_name TEXT NOT NULL
actor TEXT -- bearer label / agent id
composition_id TEXT -- workflow tag, from X-Workflow-Id header
success INTEGER (0/1)
schema_valid INTEGER (0/1)
latency_ms INTEGER
output_hash TEXT -- djb2 over stable-stringified result
error TEXT -- on success=0
human_verdict TEXT -- accept / reject / null (signal-4)
verdict_by TEXT -- who stamped the verdict
verdict_at TEXT
created_at TEXT
Content-free by construction — payloads aren't retained, just the structural fingerprint. The doc-14 inference contract: a CNC machine doesn't remember the part, only the craft improves over many parts.
The emitter wraps InvoiceToolExecutor.execute:
audit?: {
backend: ProjectionBackend;
tenantIdFallback?: string;
compositionId?: string; // from X-Workflow-Id
}
Telemetry is best-effort — a sink failure never blocks the tool call.
The four signals
Per doc-14:
- Schema valid — args parsed cleanly (free, purely structural)
- Tool returned 2xx —
{ok: true}(free, structural; tool itself signals) - Workflow completed — composition_id-grouped success rate (free if a workflow is active)
- Human verdict — reviewer stamps accept/reject (needs human input)
Signals 1–3 are floor; signal 4 is ceiling. Structural signals govern *how fast* sampling can ramp down; semantic verdict governs *how low* it can drop. samplingRecommendation() in src/ui.ts respects that ordering — no drop below 20% sampling without ≥5 reviews + ≥95% acceptance.
The Stats tab
Ops → Activity → Tool stats. Per-tool row carries:
- Count + success rate (pill: green ≥99% / amber 95-99% / red below)
- Accept rate (signal-4) — green ≥99% / amber 95-99% / red below; "—" when no reviews yet
- Inline sparkline (newest-60 latency) with σ control bands behind the line:
- ±1σ band: green - ±2σ band: amber - ±3σ band: red - Mean: thin grey line - 3σ UCL: dashed red
- p50 / p95 / Avg / Max
- Anomaly count — total of Western Electric rule fires (1, 2, 3, 4) over the window; tooltip shows per-rule breakdown
- Sampling recommendation pill (100% / 50% / 20% / 5% / exception)
- Drift badge (composite of success-rate / p95-spike / WE-rule fires)
Click a row → drill-down modal with 720×220 control chart + the last 500 invocations.
Western Electric rules
src/audit-log.ts exports annotateWesternElectric:
- Rule 1 — any point beyond ±3σ. Critical; dot renders red.
- Rule 2 — 2 of 3 consecutive points in same-side zone B (2σ-3σ) or beyond. Warning.
- Rule 3 — 4 of 5 consecutive points in same-side zone C (1σ-2σ) or beyond. Warning.
- Rule 4 — 8 consecutive points on the same side of the mean. Warning.
Two-sided detection — a tool suddenly running fast is a drift signal too (could be no-op responses).
The by-workflow tab
Same dashboard, grouped by composition_id instead of tool_name. Useful when you care about "did the Monthly close run complete cleanly" not "is the trial_balance tool stable in general." Joined to Workflow.name for human-readable rows.
Signal-4 capture
✓/✗ buttons on every audit_log row in the Tool calls tab + the workflow drawer's recent-activity table. POST to /audit-log/:id/verdict with {verdict: "accept" | "reject"} — the row stamps with the reviewer's bearer label + ISO timestamp + result. The audit_log row is immutable except for the verdict triple (intentional — review is a *separate* event from the original call).
Per-tool and per-workflow accept_count / reject_count / accept_rate aggregate from these stamps.
Why this matters
The doc-14 SPC trust-budget doctrine: AI in production is fine if its action space is closed by a deterministic QC program AND its outputs earn trust quantitatively, on charts the customer's compliance team already knows how to read.
Accelerando's tool surface is the closed action space. This dashboard is where the trust is earned. The same vocabulary the compliance officer uses for sterilization processes and drug-dispensing accuracy now applies to AI tool calls.
The sales hook: *"we automate gradually, you set the trust threshold."*