BabyAI tool surface

Accelerando is a deterministic Worker. Agents talk to it through a tight tool catalog. The agents are external — the BabyAI MoE (Qwen3-Coder + GLM + Llama) running in the BabyAI Playground, or Anthropic models via the Playground's fallback. Accelerando doesn't ship its own model.

The catalog

124 tools across 25 entities. GET /tools returns the OpenAI shape; GET /tools?format=anthropic returns the input_schema shape. Same schema underneath; only the name of the schema field differs. Each tool maps to a specific intent ("convert quote to invoice", "submit assessment response", "scan for hygiene flags") rather than a generic CRUD verb.

The catalog grew with the modules; the *discipline* didn't change. Tight by intent, server-side math, atomic transactions for parent+children writes, every tool's audit_log row content-free.

Sales arc — quote → invoice → payment

list_customers / read_customer / create_customer / edit_customer / delete_customer / import_customers
list_quotes / read_quote / create_quote_with_lines / list_quote_line_items / update_quote_status / edit_quote / delete_quote
convert_quote_to_invoice(quote_id) — atomic; copies customer + lines + total + currency + tax; sets bidirectional Invoice.quote_id / Quote.invoice_id; refuses if quote already linked
list_invoices / read_invoice / create_invoice_with_lines / edit_invoice_lines / update_invoice_status
list_tickets / create_ticket / update_ticket_status / edit_ticket / delete_ticket / import_tickets
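
The convert_quote_to_invoice contract can be sketched in memory. This is a minimal illustration, not the Worker's implementation: the Quote and Invoice shapes, the Map-backed store, and the caller-supplied invoice id are assumptions; only quote_id, invoice_id, and the "refuse if already linked" rule come from the description above.

```typescript
// Hypothetical record shapes; only quote_id / invoice_id are named in the catalog.
interface Line { description: string; qty: number; unit_price: number }
interface Quote {
  id: string; customer_id: string; currency: string; tax: number;
  total: number; lines: Line[]; invoice_id?: string;
}
interface Invoice {
  id: string; customer_id: string; currency: string; tax: number;
  total: number; lines: Line[]; quote_id: string;
}
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

function convertQuoteToInvoice(
  quotes: Map<string, Quote>,
  invoices: Map<string, Invoice>,
  quoteId: string,
  newInvoiceId: string,
): Result<Invoice> {
  const quote = quotes.get(quoteId);
  if (!quote) return { ok: false, error: `quote ${quoteId} not found` };
  // Refuse if the quote is already linked to an invoice.
  if (quote.invoice_id) {
    return { ok: false, error: `quote ${quoteId} already linked to ${quote.invoice_id}` };
  }
  // Build the invoice first, then apply both writes together, so every
  // refusal above leaves the store untouched.
  const invoice: Invoice = {
    id: newInvoiceId,
    customer_id: quote.customer_id,
    currency: quote.currency,
    tax: quote.tax,
    total: quote.total,
    lines: quote.lines.map((l) => ({ ...l })),
    quote_id: quote.id,
  };
  invoices.set(invoice.id, invoice);
  quotes.set(quote.id, { ...quote, invoice_id: invoice.id });
  return { ok: true, value: invoice };
}
```

A second call with the same quote id returns an error envelope and writes nothing, which is the property that makes the tool safe for an agent to retry.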

Purchasing arc

list_vendors / create_vendor / edit_vendor / read_vendor / delete_vendor
list_bills / create_bill / update_bill_status / read_bill / delete_bill / edit_bill / import_bills
list_recurring_invoices / create_recurring_invoice / edit / delete / run_now
list_recurring_bills / create_recurring_bill / edit / delete / run_now

Books — double-entry, period close, aging

create_journal_entry / import_journal_entries / list / read / edit / reverse_journal_entry
list_accounts / create_account / edit / delete
list_bank_transactions / import / reconcile_bank_transaction / unreconcile / bank_reconciliation
close_period / list_period_closes / reopen_period
ar_aging / ap_aging / total_outstanding / revenue_by_month / trial_balance / profit_and_loss

Workflows — the SPC unit

start_workflow / complete_workflow / list_workflows / read_workflow — composition_id flows through every tool call via the X-Workflow-Id header
list_workflow_templates / create / edit / delete / read_workflow_template / start_workflow_from_template — predefined workflow shapes ("Monthly close", "AR collection run") with expected_tools list for progress tracking
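
One way an expected_tools list could drive progress tracking is sketched below. The progress rule (share of expected tools called at least once in the run) and the function name are assumptions; the page only states that templates carry an expected_tools list.

```typescript
// Hypothetical progress calculation: the share of a template's expected_tools
// that have been called at least once in this workflow run.
function workflowProgress(
  expectedTools: readonly string[],
  calledTools: readonly string[],
): { done: string[]; remaining: string[]; fraction: number } {
  const called = new Set(calledTools);
  // De-duplicate the template list so a repeated entry is not counted twice.
  const expected = Array.from(new Set(expectedTools));
  const done = expected.filter((t) => called.has(t));
  const remaining = expected.filter((t) => !called.has(t));
  return {
    done,
    remaining,
    // An empty template counts as complete.
    fraction: expected.length === 0 ? 1 : done.length / expected.length,
  };
}
```

The called-tool list would come from the audit rows that share the run's composition_id.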

Compliance LMS — knowledge as a state

list_compliance_domains / create_compliance_domain — HIPAA, OSHA, etc. with passing + critical thresholds
list_questions / create_question — scenario-based MCQ with the teaching-moment explanation field
list_learners / create_learner — per-learner streak + points + overall_compliance
list_compliance_scores — per-learner per-domain knowledge score
start_daily_assessment(learner_id) → returns DailyAssessment + 3 picked questions
submit_assessment_response → returns is_correct + explanation + new_score (EMA-updated server-side)
complete_daily_assessment → stamps streak + recomputes overall_compliance
compliance_score_history — per-day score series for the SPC sparkline
department_compliance — per-dept × per-domain rollup
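
The EMA update behind submit_assessment_response can be sketched as follows. The smoothing factor alpha = 0.2 and the 0 to 100 score scale are assumptions for illustration; the page does not state the actual values.

```typescript
// Exponential moving average:
//   newScore = alpha * observation + (1 - alpha) * oldScore
// alpha and the 0-100 scale are assumed, not taken from the Worker.
function emaUpdate(oldScore: number, isCorrect: boolean, alpha = 0.2): number {
  const observation = isCorrect ? 100 : 0;
  return alpha * observation + (1 - alpha) * oldScore;
}
```

With these assumed values, a learner at 50 who answers correctly moves to 60, and a wrong answer from 60 moves them to 48. A larger alpha makes the score react faster to recent answers.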

Quality — QMS + PI-CoE unified

list_defects / read_defect / create_defect / update_defect_status — lane-agnostic (source ∈ human/system/ai)
list_root_causes / create_root_cause — 5-Why / Ishikawa / fault-tree
list_corrective_actions / create_corrective_action / complete_corrective_action — 90-day effectiveness check scheduled by default
verify_corrective_action_effectiveness — effective closes the defect; ineffective re-opens it as 'recurring' (systemic)
list_process_metrics / create_process_metric / record_metric_reading — sustainability score erodes at decay_per_week
metric_history — Nelson Rules (>3σ, 9 same-side, 6 monotonic) annotated on every reading
operator_scorecard — per-actor three-lane composite (AI tool SPC + defects + CAPA effectiveness + LMS compliance)
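
The three Nelson Rules named for metric_history can be sketched as a pure function over a reading series. The baseline mean and sigma are passed in here; how the Worker derives them, and the flag names, are assumptions.

```typescript
type NelsonFlag = "beyond_3_sigma" | "nine_same_side" | "six_monotonic";

// True when every step in xs moves strictly in the given direction
// (1 = rising, -1 = falling).
function isStrictlyMonotonic(xs: readonly number[], direction: 1 | -1): boolean {
  let prev: number | undefined;
  for (const x of xs) {
    if (prev !== undefined && (x - prev) * direction <= 0) return false;
    prev = x;
  }
  return true;
}

// Returns, for each reading, the rules that the run ending at that reading trips.
function annotateReadings(
  values: readonly number[],
  mean: number,
  sigma: number,
): NelsonFlag[][] {
  return values.map((v, i) => {
    const flags: NelsonFlag[] = [];
    // Rule 1: one point more than 3 sigma from the mean.
    if (Math.abs(v - mean) > 3 * sigma) flags.push("beyond_3_sigma");
    // Rule 2: nine points in a row on the same side of the mean.
    if (i >= 8) {
      const run = values.slice(i - 8, i + 1);
      if (run.every((x) => x > mean) || run.every((x) => x < mean)) {
        flags.push("nine_same_side");
      }
    }
    // Rule 3: six points in a row steadily rising or falling.
    if (i >= 5) {
      const run = values.slice(i - 5, i + 1);
      if (isStrictlyMonotonic(run, 1) || isStrictlyMonotonic(run, -1)) {
        flags.push("six_monotonic");
      }
    }
    return flags;
  });
}
```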

Legal — eDiscovery + Hygiene

list_legal_matters / create_legal_matter — litigation / regulatory / investigation / contract / employment / ip
trigger_legal_hold — flips matter.litigation_hold + starts the 24h notification clock
list_legal_holds / release_legal_hold
add_hold_custodian / list_hold_custodians / ack_hold_custodian
scan_for_hygiene_flags(text, persist?) — deterministic regex over the six README patterns; returns matches + risk score; persist=true logs a LegalHygieneFlag per match
list_hygiene_flags / update_hygiene_flag_status
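
A deterministic regex scan in the spirit of scan_for_hygiene_flags might look like the sketch below. The two patterns, their weights, and the additive risk score are placeholders invented for illustration; the six real patterns live in the README and are not reproduced here. The persist flag is not modeled.

```typescript
interface HygienePattern { id: string; regex: RegExp; weight: number }
interface HygieneMatch { pattern_id: string; index: number; text: string }

// Placeholder patterns for illustration only, not the README's six.
const EXAMPLE_PATTERNS: readonly HygienePattern[] = [
  { id: "delete_request", regex: /\bdelete (this|that) (email|message)\b/gi, weight: 3 },
  { id: "off_the_record", regex: /\boff the record\b/gi, weight: 2 },
];

function scanForHygieneFlags(
  text: string,
  patterns: readonly HygienePattern[] = EXAMPLE_PATTERNS,
): { matches: HygieneMatch[]; risk: number } {
  const matches: HygieneMatch[] = [];
  let risk = 0;
  for (const p of patterns) {
    // Fresh global RegExp per scan, so lastIndex never leaks between calls
    // and exec() always advances.
    const flags = p.regex.flags.indexOf("g") === -1 ? p.regex.flags + "g" : p.regex.flags;
    const re = new RegExp(p.regex.source, flags);
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      matches.push({ pattern_id: p.id, index: m.index, text: m[0] });
      risk += p.weight; // assumed scoring: sum of the weights of all matches
      if (m[0].length === 0) re.lastIndex++; // guard against zero-width matches
    }
  }
  // Report matches in document order regardless of pattern order.
  matches.sort((a, b) => a.index - b.index);
  return { matches, risk };
}
```

Because the scan is pure regex over the input text, the same text always yields the same matches and score, which is what lets the tool stay outside the model.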

Attachments + OIE + audit

list_attachments / delete_attachment — R2-backed file uploads
list_findings / delete_finding / delete_findings_older_than — OIE nightly findings
who_changed(entity, id) — git-log slice over the GitHub repo for one record

Tight by design

Each tool maps to a specific intent. Schemas use additionalProperties: false everywhere — there's no quiet "unknown field, I'll ignore it" path that lets the model invent fields the executor doesn't understand.
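
The effect of additionalProperties: false can be shown with a hand-rolled check. This is a sketch only: a real deployment would more likely use a full JSON Schema validator, and the update_invoice_status field names below are hypothetical.

```typescript
interface StrictObjectSchema {
  type: "object";
  properties: Record<string, { type: "string" | "number" | "boolean" }>;
  required: readonly string[];
  additionalProperties: false;
}

type Check = { ok: true } | { ok: false; error: string };

function checkArgs(schema: StrictObjectSchema, args: Record<string, unknown>): Check {
  for (const key of Object.keys(args)) {
    const prop = Object.prototype.hasOwnProperty.call(schema.properties, key)
      ? schema.properties[key]
      : undefined;
    // additionalProperties: false means an unknown key is an error, never ignored.
    if (prop === undefined) return { ok: false, error: `unknown field: ${key}` };
    if (typeof args[key] !== prop.type) {
      return { ok: false, error: `${key} must be a ${prop.type}` };
    }
  }
  for (const key of schema.required) {
    if (args[key] === undefined) return { ok: false, error: `${key} is required` };
  }
  return { ok: true };
}

// Hypothetical schema; the real field list is served by GET /tools.
const UPDATE_INVOICE_STATUS_SCHEMA: StrictObjectSchema = {
  type: "object",
  properties: {
    tenant_id: { type: "string" },
    invoice_id: { type: "string" },
    status: { type: "string" },
  },
  required: ["tenant_id", "invoice_id", "status"],
  additionalProperties: false,
};
```

A model that invents a paid_date argument gets an explicit "unknown field" error back instead of a silent success, so the mistake is visible on the very next turn.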

Every schema marks tenant_id as required, and the auth layer injects it when the body omits it. The model can pass it explicitly (some open-weights models are more reliable that way) or skip it; the result is the same.
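
The injection step might be as small as the sketch below. The function name and the idea that the tenant comes from the authenticated session are assumptions; what happens when an explicit tenant_id disagrees with the session is not described on this page, so it is not modeled.

```typescript
// Hypothetical auth-layer step: fill in tenant_id from the authenticated
// session when the request body omits it. An explicit value is left as-is.
function withTenantId(
  body: Record<string, unknown>,
  sessionTenantId: string,
): Record<string, unknown> {
  const explicit = body["tenant_id"];
  if (typeof explicit === "string" && explicit.length > 0) return body;
  return { ...body, tenant_id: sessionTenantId };
}
```

Running this step ahead of schema validation would let the required check pass whether or not the model supplied the field.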

Server-side computation

Things small models are bad at — the Worker handles them:

Arithmetic (qty × unit_price in create_invoice_with_lines)
Date stamping (update_invoice_status: "paid" stamps paid_date)
EMA score updates (submit_assessment_response recomputes the compliance score in one call)
Sustainability decay (record_metric_reading recomputes per-week decay + on-target bonus)
Pattern matching (scan_for_hygiene_flags runs the six legal-hygiene regex patterns)
Atomic multi-record writes (convert_quote_to_invoice writes invoice + N lines + back-link in one commit)

The agent asks for the state change. The Worker handles the bookkeeping.
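
Two of the items above, line arithmetic and date stamping, can be sketched as pure functions. Integer cents, the field names, and the YYYY-MM-DD date format are assumptions made for the example.

```typescript
// Hypothetical input shape: the agent supplies qty and unit price only.
interface LineInput { description: string; qty: number; unit_price_cents: number }

// The server derives every amount; the agent never sends a total.
function computeInvoiceTotals(lines: readonly LineInput[]) {
  const priced = lines.map((l) => ({
    ...l,
    amount_cents: Math.round(l.qty * l.unit_price_cents),
  }));
  const total_cents = priced.reduce((sum, l) => sum + l.amount_cents, 0);
  return { lines: priced, total_cents };
}

// Moving to "paid" stamps paid_date from the server clock (UTC date);
// other status changes leave paid_date untouched.
function applyStatus(
  invoice: { status: string; paid_date?: string },
  status: string,
  now: Date,
): { status: string; paid_date?: string } {
  return status === "paid"
    ? { ...invoice, status, paid_date: now.toISOString().slice(0, 10) }
    : { ...invoice, status };
}
```

The agent only states what it wants ("three of these at 19.99", "mark it paid"); the totals and dates cannot drift because the model never produces them.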

Dual format, one source

export const INVOICE_TOOLS: readonly ToolDefinition[] = [/* ... */];

export function toAnthropicTools(tools = INVOICE_TOOLS) {
  return tools.map((t) => ({
    name: t.name,
    description: t.description,
    input_schema: t.parameters,  // <-- single rename
  }));
}

OpenAI calls the schema field parameters. Anthropic calls it input_schema. Same JSON Schema underneath. The Worker carries one canonical definition and serves whichever format the caller asks for.
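
A sketch of the format switch, under two assumptions: that the canonical definition is served unchanged for the default format, and that the ToolDefinition shape is as simple as shown. OpenAI's Chat Completions API expects each tool wrapped as { type: "function", function: {...} }, so a caller-side wrapper is included; whether the Worker applies that wrapper itself is not stated here.

```typescript
// Simplified canonical shape; the schema type is loosened for illustration.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

function toAnthropicTools(tools: readonly ToolDefinition[]) {
  return tools.map((t) => ({
    name: t.name,
    description: t.description,
    input_schema: t.parameters, // the single rename
  }));
}

// Hypothetical handler body for GET /tools?format=...
function serializeTools(tools: readonly ToolDefinition[], format: string | null) {
  return format === "anthropic" ? toAnthropicTools(tools) : tools.map((t) => ({ ...t }));
}

// Caller-side wrapper for OpenAI's Chat Completions `tools` parameter.
function wrapForChatCompletions(tools: readonly ToolDefinition[]) {
  return tools.map((t) => ({ type: "function" as const, function: { ...t } }));
}
```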

Execution + the `{ok,value}`/`{ok,error}` envelope

POST /tools/list_invoices
Authorization: Bearer <your-api-token>
Content-Type: application/json
X-Workflow-Id: wf_abc123    <-- optional; ties this call to a workflow run

{ "status": "sent" }

→ 200 { "ok": true, "value": [ { id, customer_name, total, ... }, ... ] }

Or on failure:

→ 400 { "ok": false, "error": "customer_id must be a non-empty string" }

X-Workflow-Id (when present) is propagated to the audit_log row's composition_id — that's how SPC stats can group runs by workflow as doc-14 prescribes.
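
On the caller's side, the envelope maps naturally onto a discriminated union. The sketch below builds the request shown above and unwraps the response; BASE_URL and the helper names are placeholders, and no network call is made.

```typescript
type ToolResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Placeholder host; substitute the real Worker URL.
const BASE_URL = "https://accelerando.example";

function buildToolRequest(
  tool: string,
  args: Record<string, unknown>,
  token: string,
  workflowId?: string,
) {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${token}`,
    "Content-Type": "application/json",
  };
  // Optional: ties this call to a workflow run.
  if (workflowId) headers["X-Workflow-Id"] = workflowId;
  return {
    url: `${BASE_URL}/tools/${tool}`,
    method: "POST" as const,
    headers,
    body: JSON.stringify(args),
  };
}

// Narrow on `ok`: success yields the value, failure surfaces the error string.
function unwrap<T>(result: ToolResult<T>): T {
  if (result.ok) return result.value;
  throw new Error(result.error);
}
```

Because both branches share the ok discriminant, an agent loop can hand the error string straight back to the model as the tool result and let it retry.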

SPC telemetry — every call writes an audit row

Every tool call emits one audit_log row: tool name, success bit, schema-valid bit, latency_ms, content-free output hash (djb2 over key-sorted JSON), actor, error message on failure, composition_id when a workflow is active. Payloads aren't retained — only the structural fingerprint. That's the SPC substrate; see the SPC Dashboard page.
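
The output hash can be sketched as djb2 over a key-sorted serialisation. The 32-bit truncation, hex encoding, and helper names are assumptions, as is hashing the full serialised output; the page only states "djb2 over key-sorted JSON".

```typescript
// Serialise with object keys sorted at every level, so two outputs that
// differ only in key order produce the same string.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) return `[${value.map((v) => stableStringify(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

// djb2: start at 5381, then hash = hash * 33 + charCode, kept to 32 bits.
function djb2(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

function outputHash(output: unknown): string {
  return djb2(stableStringify(output));
}
```

Only the eight-character fingerprint would reach the audit row; the payload itself is discarded.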

The minimal-as-feature take

Tightness is per-tool, not per-catalog. The catalog grew from 15 to 124 tools across eight modules without sacrificing the discipline that makes small models reliable on each individual tool. A bigger surface gives the model more chances to pick a near-but-wrong tool. The alternative, one tool per CRUD action, trades that failure for a worse one: the model picks the right tool and fills in the wrong fields. Specific intents beat generic verbs on small models even at scale.

When a Qwen3-30B can't drive a 30-generic-tool catalog, Accelerando's 124-specific-tool catalog still works, because every one of the 124 has additionalProperties: false and a description that names the *intent*, not the row operation.