
BabyAI tool surface

Accelerando is a deterministic Worker. Agents talk to it through a tight tool catalog. The agents are external — the BabyAI MoE (Qwen3-Coder + GLM + Llama) running in the BabyAI Playground, or Anthropic models via the Playground's fallback. Accelerando doesn't ship its own model.

The catalog

124 tools across 25 entities. GET /tools returns the OpenAI shape; GET /tools?format=anthropic returns the input_schema shape. Same schema, one field renamed. Each tool maps to a specific intent ("convert quote to invoice", "submit assessment response", "scan for hygiene flags") rather than a generic CRUD verb.

The catalog grew with the modules; the *discipline* didn't change: tight by intent, server-side math, atomic transactions for parent+children writes, and a content-free audit_log row for every tool call.

Sales arc — quote → invoice → payment

Purchasing arc

Books — double-entry, period close, aging

Workflows — the SPC unit

Compliance LMS — knowledge as a state

Quality — QMS + PI-CoE unified

Legal — eDiscovery + Hygiene

Attachments + OIE + audit

Tight by design

Each tool maps to a specific intent. Schemas use additionalProperties: false everywhere — there's no quiet "unknown field, I'll ignore it" path that lets the model invent fields the executor doesn't understand.
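
To make the strictness concrete, here is a minimal sketch of what rejecting unknown fields looks like. The Worker's validator isn't shown on this page, so `StrictObjectSchema`, `listInvoicesSchema`, and `checkArgs` are hypothetical names, and the schema's field set is an assumption.

```typescript
// Hypothetical, simplified schema shape; the Worker's real ToolDefinition is not shown here.
interface StrictObjectSchema {
  type: "object";
  properties: Record<string, { type: "string" | "number" | "boolean" }>;
  required: readonly string[];
  additionalProperties: false;
}

// Illustrative schema for a list_invoices-style tool (the field set is an assumption).
const listInvoicesSchema: StrictObjectSchema = {
  type: "object",
  properties: {
    tenant_id: { type: "string" },
    status: { type: "string" },
  },
  required: ["tenant_id"],
  additionalProperties: false,
};

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Returns a list of problems; an empty list means the arguments pass.
function checkArgs(schema: StrictObjectSchema, args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(args)) {
    const spec = hasOwn(schema.properties, key) ? schema.properties[key] : undefined;
    if (!spec) {
      // additionalProperties: false means unknown fields are errors, not ignored.
      problems.push(`unknown field: ${key}`);
    } else if (typeof args[key] !== spec.type) {
      problems.push(`${key} must be a ${spec.type}`);
    }
  }
  for (const key of schema.required) {
    if (!hasOwn(args, key)) problems.push(`missing field: ${key}`);
  }
  return problems;
}
```

A misspelled field such as `stauts` comes back as an explicit problem instead of being dropped, which gives the model something concrete to correct on the next turn.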

tenant_id is required by every schema and auto-injected by the auth layer if the body omits it. The model can pass it explicitly (some open-weights models are more reliable that way) or skip it; the result is the same.
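
A rough sketch of the injection step, assuming it runs before schema validation. `withTenant` is a hypothetical name, and rejecting a mismatched explicit tenant_id is an assumption; this page doesn't describe how a mismatch is handled.

```typescript
// Hypothetical auth-layer helper: fills tenant_id from the authenticated
// context when the request body omits it.
function withTenant(
  body: Record<string, unknown>,
  authTenantId: string,
): Record<string, unknown> {
  if (body["tenant_id"] === undefined) {
    return { ...body, tenant_id: authTenantId };
  }
  // Assumption: an explicit tenant_id that disagrees with the token is rejected.
  if (body["tenant_id"] !== authTenantId) {
    throw new Error("tenant_id does not match the authenticated tenant");
  }
  return body;
}

// Both call styles produce the same arguments for the executor.
const implicitArgs = withTenant({ status: "sent" }, "tenant_demo");
const explicitArgs = withTenant({ status: "sent", tenant_id: "tenant_demo" }, "tenant_demo");
```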

Server-side computation

Things small models are bad at — the Worker handles them:

The agent asks for the state change. The Worker handles the bookkeeping.

Dual format, one source

export const INVOICE_TOOLS: readonly ToolDefinition[] = [/* ... */];

export function toAnthropicTools(tools = INVOICE_TOOLS) {
  return tools.map((t) => ({
    name: t.name,
    description: t.description,
    input_schema: t.parameters,  // <-- single rename
  }));
}

OpenAI calls the schema field parameters. Anthropic calls it input_schema. Same JSON Schema underneath. The Worker carries one canonical definition and serves whichever format the caller asks for.
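
Serving either format from the one definition can be sketched as a small dispatcher on the format query value. `renderCatalog` and `DEMO_TOOLS` are hypothetical, the `ToolDefinition` shape is a minimal assumption, and whether the real OpenAI response adds any wrapper around each entry isn't shown here, so the sketch returns the canonical definitions as-is.

```typescript
// Hypothetical minimal shape; the real ToolDefinition may carry more fields.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema
}

interface AnthropicTool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

// Illustrative one-entry catalog (description and schema are assumptions).
const DEMO_TOOLS: readonly ToolDefinition[] = [
  {
    name: "list_invoices",
    description: "List invoices, optionally filtered by status.",
    parameters: {
      type: "object",
      properties: { status: { type: "string" } },
      additionalProperties: false,
    },
  },
];

// Picks the output shape from the ?format= query value.
function renderCatalog(
  tools: readonly ToolDefinition[],
  format: string | null,
): ReadonlyArray<ToolDefinition | AnthropicTool> {
  if (format === "anthropic") {
    return tools.map((t) => ({
      name: t.name,
      description: t.description,
      input_schema: t.parameters, // same schema object, different key
    }));
  }
  return tools; // default: the canonical definition with `parameters`
}
```

Because both views point at the same schema object, the two formats can't drift apart.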

Execution + the {ok,value}/{ok,error} envelope

POST /tools/list_invoices
Authorization: Bearer <your-api-token>
Content-Type: application/json
X-Workflow-Id: wf_abc123    <-- optional; ties this call to a workflow run

{ "status": "sent" }

→ 200 { "ok": true, "value": [ { id, customer_name, total, ... }, ... ] }

Or on failure:

→ 400 { "ok": false, "error": "customer_id must be a non-empty string" }
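
On the caller's side, the envelope maps naturally onto a discriminated union. The sketch below is one way a client might type and unwrap it; `ToolResult`, `parseToolResult`, and `unwrap` are hypothetical names, not part of the Worker.

```typescript
// The two envelope shapes from the examples above, as a discriminated union.
type ToolResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Hypothetical client-side helper: parses a response body and checks the envelope.
function parseToolResult<T>(bodyText: string): ToolResult<T> {
  const parsed: unknown = JSON.parse(bodyText);
  if (typeof parsed === "object" && parsed !== null) {
    const envelope = parsed as { ok?: unknown; value?: unknown; error?: unknown };
    if (envelope.ok === true && "value" in envelope) {
      // The cast trusts the server; validate `value` separately if that matters.
      return { ok: true, value: envelope.value as T };
    }
    if (envelope.ok === false && typeof envelope.error === "string") {
      return { ok: false, error: envelope.error };
    }
  }
  throw new Error("response is not a tool result envelope");
}

// Returns the value on success, or throws with the server's error message.
function unwrap<T>(result: ToolResult<T>): T {
  if (result.ok === true) {
    return result.value;
  }
  throw new Error(result.error);
}
```

Branching on `ok` rather than on the HTTP status keeps the agent loop uniform: the error string can be handed straight back to the model as the tool result.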

X-Workflow-Id (when present) is propagated to the audit_log row's composition_id — that's how SPC stats can group runs by workflow as doc-14 prescribes.
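
The propagation and the grouping it enables can be sketched as follows. Both helpers are hypothetical, the plain-object header map stands in for whatever request type the Worker uses, and skipping rows without a composition_id is an assumption.

```typescript
// Hypothetical helper: HTTP header names are case-insensitive, so compare in lowercase.
function compositionIdFrom(headers: Record<string, string>): string | null {
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === "x-workflow-id") {
      const id = (headers[key] ?? "").trim();
      return id !== "" ? id : null;
    }
  }
  return null; // header absent: the call is not part of a workflow run
}

// Hypothetical grouping step: count audit rows per composition_id.
function countByComposition(
  rows: ReadonlyArray<{ composition_id: string | null }>,
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const row of rows) {
    if (row.composition_id === null) continue; // assumption: ungrouped rows are skipped
    counts[row.composition_id] = (counts[row.composition_id] ?? 0) + 1;
  }
  return counts;
}
```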

SPC telemetry — every call writes an audit row

Every tool call emits one audit_log row: tool name, success bit, schema-valid bit, latency_ms, content-free output hash (djb2 over key-sorted JSON), actor, error message on failure, composition_id when a workflow is active. Payloads aren't retained — only the structural fingerprint. That's the SPC substrate; see the SPC Dashboard page.
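
The content-free hash can be sketched as two steps: serialize with sorted keys, then run djb2 over the resulting string. The function names are hypothetical, and the 32-bit unsigned width, the additive djb2 variant, and the hex encoding are assumptions; the page only says "djb2 over key-sorted JSON".

```typescript
// Serializes a JSON-compatible value with object keys sorted at every level,
// so two payloads that differ only in key order produce the same string.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((item) => stableStringify(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    const parts = keys.map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}

// Classic additive djb2 (hash * 33 + char), truncated to an unsigned 32-bit integer.
function djb2(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 33 + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Structural fingerprint: the payload itself never needs to be stored.
function outputHash(value: unknown): string {
  return djb2(stableStringify(value)).toString(16);
}
```

Two runs that return the same output produce the same fingerprint regardless of key order, so stability across runs can be tracked without retaining payloads. djb2 is not collision-resistant, so the hash is a telemetry signal, not an integrity check.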

The minimal-as-feature take

Tightness is per-tool, not per-catalog. The catalog grew from 15 to 124 tools across eight modules without sacrificing the discipline that makes small models reliable on each individual tool. A bigger surface means more places the model could pick a near-but-wrong tool, but the alternative was one generic tool per CRUD action, where the model would pick the wrong fields instead. Specific intents beat generic verbs on small models even at scale.

When a Qwen3-30B can't drive a 30-generic-tool catalog, Accelerando's 124-specific-tool catalog still works, because every one of the 124 has additionalProperties: false and a description that names the *intent*, not the row operation.