BabyAI tool surface
Accelerando is a deterministic Worker. Agents talk to it through a tight tool catalog. The agents are external — the BabyAI MoE (Qwen3-Coder + GLM + Llama) running in the BabyAI Playground, or Anthropic models via the Playground's fallback. Accelerando doesn't ship its own model.
The catalog
124 tools across 25 entities. GET /tools returns the OpenAI shape; GET /tools?format=anthropic returns the input_schema shape. Same schema, one renamed field. Each tool maps to a specific intent ("convert quote to invoice", "submit assessment response", "scan for hygiene flags") rather than a generic CRUD verb.
The catalog grew with the modules; the *discipline* didn't change: tools tight by intent, server-side math, atomic transactions for parent+children writes, and a content-free audit_log row for every tool call.
Sales arc — quote → invoice → payment
- list_customers / read_customer / create_customer / edit_customer / delete_customer / import_customers
- list_quotes / read_quote / create_quote_with_lines / list_quote_line_items / update_quote_status / edit_quote / delete_quote
- convert_quote_to_invoice(quote_id): atomic; copies customer + lines + total + currency + tax; sets bidirectional Invoice.quote_id / Quote.invoice_id; refuses if quote already linked
- list_invoices / read_invoice / create_invoice_with_lines / edit_invoice_lines / update_invoice_status
- list_tickets / create_ticket / update_ticket_status / edit_ticket / delete_ticket / import_tickets
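The convert_quote_to_invoice behavior described above (copy, bidirectional link, refuse if already linked) can be sketched roughly as follows. This is an in-memory illustration only: the Quote and Invoice shapes, the Map-based store, and the caller-supplied invoice id are all assumptions, and a real Worker would do the two writes inside one database transaction.

```typescript
// Hypothetical in-memory shapes; the real tables and field names may differ.
interface QuoteLine { description: string; qty: number; unit_price: number }
interface Quote {
  id: string; customer_id: string; currency: string; tax: number;
  total: number; lines: QuoteLine[]; invoice_id: string | null;
}
interface Invoice {
  id: string; quote_id: string; customer_id: string; currency: string;
  tax: number; total: number; lines: QuoteLine[];
}
type ConvertResult = { ok: true; value: Invoice } | { ok: false; error: string };

function convertQuoteToInvoice(
  quotes: Map<string, Quote>,
  invoices: Map<string, Invoice>,
  quoteId: string,
  newInvoiceId: string,
): ConvertResult {
  const quote = quotes.get(quoteId);
  if (!quote) return { ok: false, error: `quote ${quoteId} not found` };
  // Refuse if the quote is already linked to an invoice.
  if (quote.invoice_id !== null) {
    return { ok: false, error: `quote ${quoteId} is already linked to ${quote.invoice_id}` };
  }
  // Build the whole invoice before writing anything, so a failure above leaves no partial state.
  const invoice: Invoice = {
    id: newInvoiceId,
    quote_id: quote.id,
    customer_id: quote.customer_id,
    currency: quote.currency,
    tax: quote.tax,
    total: quote.total,
    lines: quote.lines.map((line) => ({ ...line })),
  };
  // "Commit": the invoice and the back-link on the quote are written together.
  invoices.set(invoice.id, invoice);
  quotes.set(quote.id, { ...quote, invoice_id: invoice.id });
  return { ok: true, value: invoice };
}
```

A second call with the same quote id returns an error envelope and writes nothing, which is the "refuses if quote already linked" rule.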
Purchasing arc
- list_vendors / create_vendor / edit_vendor / read_vendor / delete_vendor
- list_bills / create_bill / update_bill_status / read_bill / delete_bill / edit_bill / import_bills
- list_recurring_invoices / create_recurring_invoice / edit / delete / run_now
- list_recurring_bills / create_recurring_bill / edit / delete / run_now
Books — double-entry, period close, aging
- create_journal_entry / import_journal_entries / list / read / edit / reverse_journal_entry
- list_accounts / create_account / edit / delete
- list_bank_transactions / import / reconcile_bank_transaction / unreconcile / bank_reconciliation
- close_period / list_period_closes / reopen_period
- ar_aging / ap_aging / total_outstanding / revenue_by_month / trial_balance / profit_and_loss
Workflows — the SPC unit
- start_workflow / complete_workflow / list_workflows / read_workflow: composition_id flows through every tool call via the X-Workflow-Id header
- list_workflow_templates / create / edit / delete / read_workflow_template / start_workflow_from_template: predefined workflow shapes ("Monthly close", "AR collection run") with expected_tools list for progress tracking
Compliance LMS — knowledge as a state
- list_compliance_domains / create_compliance_domain: HIPAA, OSHA, etc. with passing + critical thresholds
- list_questions / create_question: scenario-based MCQ with the teaching-moment explanation field
- list_learners / create_learner: per-learner streak + points + overall_compliance
- list_compliance_scores: per-learner per-domain knowledge score
- start_daily_assessment(learner_id) → returns DailyAssessment + 3 picked questions
- submit_assessment_response → returns is_correct + explanation + new_score (EMA-updated server-side)
- complete_daily_assessment → stamps streak + recomputes overall_compliance
- compliance_score_history: per-day score series for the SPC sparkline
- department_compliance: per-dept × per-domain rollup
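The EMA update that submit_assessment_response performs server-side could look something like the sketch below. The 0..100 score scale and the smoothing factor of 0.2 are assumptions made for illustration; the source states neither, and emaUpdate is a hypothetical name.

```typescript
// Assumption: scores live on a 0..100 scale and alpha defaults to 0.2.
// A correct answer counts as an observation of 100, an incorrect one as 0.
function emaUpdate(previousScore: number, isCorrect: boolean, alpha = 0.2): number {
  const observation = isCorrect ? 100 : 0;
  // Standard exponential moving average: blend the new observation into the old score.
  return alpha * observation + (1 - alpha) * previousScore;
}
```

With alpha set to 0.5, a learner at 50 moves to 75 on a correct answer and to 25 on an incorrect one. Keeping this on the server means the model never has to carry the previous score or do the blend itself.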
Quality — QMS + PI-CoE unified
- list_defects / read_defect / create_defect / update_defect_status: lane-agnostic (source ∈ human/system/ai)
- list_root_causes / create_root_cause: 5-Why / Ishikawa / fault-tree
- list_corrective_actions / create_corrective_action / complete_corrective_action: 90-day effectiveness check scheduled by default
- verify_corrective_action_effectiveness: effective closes the defect; ineffective re-opens it as 'recurring' (systemic)
- list_process_metrics / create_process_metric / record_metric_reading: sustainability score erodes at decay_per_week
- metric_history: Nelson Rules (>3σ, 9 same-side, 6 monotonic) annotated on every reading
- operator_scorecard: per-actor three-lane composite (AI tool SPC + defects + CAPA effectiveness + LMS compliance)
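The three Nelson Rules named for metric_history can be checked per reading along these lines. This is a minimal sketch: it assumes the caller supplies the baseline mean and sigma, and the flag names and the annotateNelson function are hypothetical rather than the Worker's actual output format.

```typescript
type NelsonRule = "beyond_3_sigma" | "nine_same_side" | "six_monotonic";

// Returns, for each reading, the rules that fire on the run ending at that reading.
function annotateNelson(readings: number[], mean: number, sigma: number): NelsonRule[][] {
  return readings.map((value, i) => {
    const flags: NelsonRule[] = [];
    // Rule: a single point more than 3 sigma from the mean.
    if (Math.abs(value - mean) > 3 * sigma) flags.push("beyond_3_sigma");
    // Rule: nine consecutive points on the same side of the mean.
    if (i >= 8) {
      const lastNine = readings.slice(i - 8, i + 1);
      if (lastNine.every((v) => v > mean) || lastNine.every((v) => v < mean)) {
        flags.push("nine_same_side");
      }
    }
    // Rule: six consecutive points strictly increasing or strictly decreasing.
    if (i >= 5) {
      const lastSix = readings.slice(i - 5, i + 1);
      const rising = lastSix.every((v, k) => k === 0 || v > lastSix[k - 1]!);
      const falling = lastSix.every((v, k) => k === 0 || v < lastSix[k - 1]!);
      if (rising || falling) flags.push("six_monotonic");
    }
    return flags;
  });
}
```

Each rule looks only at the window ending at the current reading, so annotations on earlier readings never change when a new reading arrives.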
Legal — eDiscovery + Hygiene
- list_legal_matters / create_legal_matter: litigation / regulatory / investigation / contract / employment / ip
- trigger_legal_hold: flips matter.litigation_hold + starts the 24h notification clock
- list_legal_holds / release_legal_hold
- add_hold_custodian / list_hold_custodians / ack_hold_custodian
- scan_for_hygiene_flags(text, persist?): deterministic regex over the six README patterns; returns matches + risk score; persist=true logs a LegalHygieneFlag per match
- list_hygiene_flags / update_hygiene_flag_status
Attachments + OIE + audit
- list_attachments / delete_attachment: R2-backed file uploads
- list_findings / delete_finding / delete_findings_older_than: OIE nightly findings
- who_changed(entity, id): git-log slice over the GitHub repo for one record
Tight by design
Each tool maps to a specific intent. Schemas use additionalProperties: false everywhere — there's no quiet "unknown field, I'll ignore it" path that lets the model invent fields the executor doesn't understand.
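To make the effect of additionalProperties: false concrete, here is a minimal sketch of a strict argument check. It handles only a small, hypothetical subset of JSON Schema (flat objects with primitive fields), and validateArgs, the schema contents, and the error strings are illustrative assumptions, not the Worker's actual validator.

```typescript
// A deliberately tiny schema subset: flat object, primitive fields, no extras allowed.
interface StrictObjectSchema {
  type: "object";
  properties: Record<string, { type: "string" | "number" | "boolean" }>;
  required: string[];
  additionalProperties: false;
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Returns null when the arguments are valid, or an error message otherwise.
function validateArgs(schema: StrictObjectSchema, args: Record<string, unknown>): string | null {
  for (const key of Object.keys(args)) {
    const prop = hasOwn(schema.properties, key) ? schema.properties[key] : undefined;
    // With additionalProperties: false, an unknown key is an error, never silently ignored.
    if (prop === undefined) return `unknown field: ${key}`;
    if (typeof args[key] !== prop.type) return `${key} must be a ${prop.type}`;
  }
  for (const key of schema.required) {
    if (!hasOwn(args, key)) return `missing required field: ${key}`;
  }
  return null;
}

// Hypothetical schema for a list-style tool.
const LIST_INVOICES_SCHEMA: StrictObjectSchema = {
  type: "object",
  properties: { tenant_id: { type: "string" }, status: { type: "string" } },
  required: ["tenant_id"],
  additionalProperties: false,
};
```

A misspelled field such as statuz comes back as an explicit error the model can correct, instead of a filter that is quietly dropped.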
tenant_id is required by every schema and auto-injected by the auth layer if the body omits it. The model can pass it explicitly (some open-weights models are more reliable that way) or skip it; the result is the same.
Server-side computation
Things small models are bad at — the Worker handles them:
- Arithmetic (qty × unit_price in create_invoice_with_lines)
- Date stamping (update_invoice_status: "paid" stamps paid_date)
- EMA score updates (submit_assessment_response recomputes the compliance score in one call)
- Sustainability decay (record_metric_reading recomputes per-week decay + on-target bonus)
- Pattern matching (scan_for_hygiene_flags runs the six legal-hygiene regex patterns)
- Atomic multi-record writes (convert_quote_to_invoice writes invoice + N lines + back-link in one commit)
The agent asks for the state change. The Worker handles the bookkeeping.
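Two of the items above, line arithmetic and date stamping, can be sketched as small pure functions. The shapes, the use of integer cents, the tax rounding, and the YYYY-MM-DD date format are assumptions made for this illustration; the source does not specify them.

```typescript
// Hypothetical shapes; amounts are integer cents to avoid floating-point drift.
interface LineInput { description: string; qty: number; unit_price_cents: number }
interface InvoiceState { status: string; paid_date: string | null }

// The caller sends only qty and unit price; every derived amount is computed here.
function computeTotals(lines: LineInput[], taxRate: number) {
  const lineTotals = lines.map((line) => line.qty * line.unit_price_cents);
  const subtotal = lineTotals.reduce((sum, t) => sum + t, 0);
  const tax = Math.round(subtotal * taxRate);
  return { lineTotals, subtotal, tax, total: subtotal + tax };
}

// Moving to "paid" stamps paid_date (UTC calendar date); other transitions leave it untouched.
function applyInvoiceStatus(invoice: InvoiceState, newStatus: string, now: Date): InvoiceState {
  const paid_date = newStatus === "paid" ? now.toISOString().slice(0, 10) : invoice.paid_date;
  return { status: newStatus, paid_date };
}
```

The point of the design is that the model supplies intent (two lines, a status change) and never has to produce a number or a date that the server could compute.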
Dual format, one source
export const INVOICE_TOOLS: readonly ToolDefinition[] = [/* ... */];

export function toAnthropicTools(tools = INVOICE_TOOLS) {
  return tools.map((t) => ({
    name: t.name,
    description: t.description,
    input_schema: t.parameters, // <-- single rename
  }));
}
OpenAI calls the schema field parameters. Anthropic calls it input_schema. Same JSON Schema underneath. The Worker carries one canonical definition and serves whichever format the caller asks for.
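As a sketch of how one canonical definition might be served in either format, the block below adds a type for the definition and a small dispatcher keyed on the format query parameter. The ToolDefinition fields are inferred from the mapper (name, description, parameters); the sample tool's description and schema, the AnthropicTool type, and serveTools are hypothetical, and the sketch assumes the canonical form already uses OpenAI's field name.

```typescript
// Assumed shape of the canonical definition, inferred from the fields the mapper reads.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the tool's arguments
}

interface AnthropicTool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

// Hypothetical sample entry; the real catalog content is not shown in the source.
const SAMPLE_TOOLS: readonly ToolDefinition[] = [
  {
    name: "list_invoices",
    description: "List invoices, optionally filtered by status.",
    parameters: {
      type: "object",
      properties: { tenant_id: { type: "string" }, status: { type: "string" } },
      required: ["tenant_id"],
      additionalProperties: false,
    },
  },
];

function toAnthropicTools(tools: readonly ToolDefinition[]): AnthropicTool[] {
  return tools.map((t) => ({
    name: t.name,
    description: t.description,
    input_schema: t.parameters, // same schema object, different field name
  }));
}

// format is the value of the ?format= query parameter, or null when absent.
function serveTools(
  format: string | null,
  tools: readonly ToolDefinition[],
): readonly ToolDefinition[] | AnthropicTool[] {
  return format === "anthropic" ? toAnthropicTools(tools) : tools;
}
```

Because the schema object is passed through rather than copied or rewritten, the two formats cannot drift apart.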
Execution + the {ok,value}/{ok,error} envelope
POST /tools/list_invoices
Authorization: Bearer ldLia1i1sefsZmTRuft0XdxDfF0lTH5UZAQ53Gtz4lg
Content-Type: application/json
X-Workflow-Id: wf_abc123 <-- optional; ties this call to a workflow run
{ "status": "sent" }
→ 200 { "ok": true, "value": [ { id, customer_name, total, ... }, ... ] }
Or on failure:
→ 400 { "ok": false, "error": "customer_id must be a non-empty string" }
X-Workflow-Id (when present) is propagated to the audit_log row's composition_id — that's how SPC stats can group runs by workflow as doc-14 prescribes.
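On the calling side, the envelope and headers shown above might be handled with helpers like these. This is a sketch only: ToolResult, buildToolRequest, and unwrap are hypothetical names, and the base URL and token are placeholders supplied by the caller. The request object can be passed to whatever HTTP client the agent runtime uses.

```typescript
// The two envelope shapes from the examples above.
type ToolResult<T> = { ok: true; value: T } | { ok: false; error: string };

interface ToolRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assembles the pieces of POST /tools/<tool>; baseUrl and token come from the caller.
function buildToolRequest(
  baseUrl: string,
  tool: string,
  args: Record<string, unknown>,
  token: string,
  workflowId?: string,
): ToolRequest {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${token}`,
    "Content-Type": "application/json",
  };
  // X-Workflow-Id is optional; send it only when a workflow run is active.
  if (workflowId !== undefined) headers["X-Workflow-Id"] = workflowId;
  return { url: `${baseUrl}/tools/${tool}`, method: "POST", headers, body: JSON.stringify(args) };
}

// Turns the envelope into a value or a thrown error, so callers handle one path.
function unwrap<T>(result: ToolResult<T>): T {
  if (result.ok) return result.value;
  throw new Error(result.error);
}
```

Typing the envelope as a discriminated union on ok means the compiler will not let a caller read value without first ruling out the failure case.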
SPC telemetry — every call writes an audit row
Every tool call emits one audit_log row: tool name, success bit, schema-valid bit, latency_ms, content-free output hash (djb2 over key-sorted JSON), actor, error message on failure, composition_id when a workflow is active. Payloads aren't retained — only the structural fingerprint. That's the SPC substrate; see the SPC Dashboard page.
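The output hash described above, djb2 over key-sorted JSON, can be sketched as follows. The function names are hypothetical, and two details are assumptions: the hash is kept as an unsigned 32-bit number, and undefined values serialize as null. How the Worker actually encodes the hash in the audit row is not stated in the source.

```typescript
// Serializes with object keys sorted at every level, so key order never changes the hash.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Classic djb2: start at 5381, then hash = hash * 33 + charCode, kept to unsigned 32 bits.
function djb2(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Fingerprint of a tool output: only this number is stored, not the payload.
function outputFingerprint(output: unknown): number {
  return djb2(stableStringify(output));
}
```

Two outputs that differ only in key order produce the same fingerprint, which is what lets the audit log compare runs without retaining what was in them.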
The minimal-as-feature take
Tightness is per-tool, not per-catalog. The catalog grew from 15 to 124 tools across eight modules without sacrificing the discipline that makes small models reliable on each individual tool. A bigger surface means more places where the model could pick a near-but-wrong tool, but the alternative was one generic tool per CRUD action, where the model would pick the wrong fields instead. Specific intents beat generic verbs on small models even at scale.
When a Qwen3-30B can't drive a 30-generic-tool catalog, Accelerando's 124-specific-tool catalog still works, because every one of the 124 has additionalProperties: false and a description that names the *intent*, not the row operation.