The three loops
Decision Loop — what the LLM does
Decision Loop — what the LLM does
The Decision Loop is the model’s turn. On each step:
- The runner sends the conversation history + tool schemas to the LLM
- The LLM decides whether to respond with text (done) or request tool calls (continue)
- If tool calls are requested, they flow to the Execution Loop
Execution Loop — what happens when a tool is called
Execution Loop — what happens when a tool is called
The Execution Loop handles every tool call:
- Validate arguments against the Pydantic schema
- Policy gate — allow, deny, or defer for human approval
- Execute the handler (with hooks and middleware)
- Sanitize the output (truncate, strip injection vectors)
- Return the result to the Decision Loop
Assurance Loop — what keeps things safe
Assurance Loop — what keeps things safe
The Assurance Loop runs continuously, enforcing limits on both other loops:
- Step count — stops the agent after N iterations
- Tool call count — prevents excessive tool usage
- Cost budget — stops if estimated cost exceeds the limit
- Wall time — hard timeout on the entire run
- Failure classification — retryable, terminal, or non-fatal
FailSafeConfigThink in contracts
AFK is built on a contract-first design. Every interaction between components is defined by typed data structures:| Boundary | Contract | What flows |
|---|---|---|
| Runner → LLM | LLMRequest / LLMResponse | Messages, tool schemas, model responses |
| Runner → Tool | ToolCall / ToolResult | Validated arguments, execution output |
| Runner → Subagent | AgentInvocationRequest / AgentInvocationResponse | Delegate task and receive result |
| Runner → Memory | Checkpoint records | Conversation state for resume/replay |
| Runner → Telemetry | AgentRunEvent, RunMetrics | Spans, metrics, audit trail |
Decision tree: how complex should my system be?
Not sure what to build? Start at the top and follow the path that matches your use case.[!TIP] Start at Level 1. Only move up when you have clear evidence that your current level isn’t enough. Each level adds complexity that you need to manage and test.
What success looks like
A mature AFK implementation exhibits these properties:- Every tool has a Pydantic model — no untyped arguments
- Every run has cost limits —
max_total_cost_usdis always set - Policy gates protect mutations — dangerous actions require approval
- Evals cover core behaviors — regression tests catch prompt drift
- Observability is on from day one — even if it’s just the console exporter
- Failures are classified — the system knows what to retry and what to abort