Mental Model

The three loops

Decision Loop — what the LLM does

The Decision Loop is the model’s turn. On each step:

The runner sends the conversation history + tool schemas to the LLM
The LLM decides whether to respond with text (done) or request tool calls (continue)
If tool calls are requested, they flow to the Execution Loop

You control this with: agent instructions, model choice, tool availability

# The Decision Loop is shaped by these fields
agent = Agent(
    instructions="...",    # ← What the model knows
    model="gpt-5.5", # ← How the model thinks
    tools=[...],           # ← What the model can do
)

Execution Loop — what happens when a tool is called

The Execution Loop handles every tool call:

Validate arguments against the Pydantic schema
Policy gate — allow, deny, or defer for human approval
Execute the handler (with hooks and middleware)
Sanitize the output (truncate, strip injection vectors)
Return the result to the Decision Loop

You control this with: tool definitions, policy rules, sandbox profiles

# The Execution Loop is shaped by these
@tool(args_model=QueryArgs, name="query_db", description="Run a database query.")
def query_db(args: QueryArgs) -> dict:
    ...

policy = PolicyEngine(rules=[
    PolicyRule(
        condition=lambda e: e.tool_name == "query_db",
        action="request_approval",
    ),
])

Assurance Loop — what keeps things safe

The Assurance Loop runs continuously, enforcing limits on both other loops:

Step count — stops the agent after N iterations
Tool call count — prevents excessive tool usage
Cost budget — stops if estimated cost exceeds the limit
Wall time — hard timeout on the entire run
Failure classification — retryable, terminal, or non-fatal

You control this with: FailSafeConfig

agent = Agent(
    ...,
    fail_safe=FailSafeConfig(
        max_steps=10,
        max_tool_calls=5,
        max_total_cost_usd=0.25,
        max_wall_time_s=30.0,
    ),
)

Think in contracts

AFK is built on a contract-first design. Every interaction between components is defined by typed data structures:

Boundary	Contract	What flows
Runner → LLM	`LLMRequest` / `LLMResponse`	Messages, tool schemas, model responses
Runner → Tool	`ToolCall` / `ToolResult`	Validated arguments, execution output
Runner → Subagent	`AgentInvocationRequest` / `AgentInvocationResponse`	Delegate task and receive result
Runner → Memory	Checkpoint records	Conversation state for resume/replay
Runner → Telemetry	`AgentRunEvent`, `RunMetrics`	Spans, metrics, audit trail

Contracts are Pydantic models. This means every boundary is validated at runtime — malformed data causes clear errors, not silent bugs. When you see a validation error, it’s AFK telling you exactly where the contract was violated.

What success looks like

A mature AFK implementation exhibits these properties:

Every tool has a Pydantic model — no untyped arguments

Every run has cost limits — max_total_cost_usd is always set

Policy gates protect mutations — dangerous actions require approval

Evals cover core behaviors — regression tests catch prompt drift

Observability is on from day one — even if it’s just the console exporter

Failures are classified — the system knows what to retry and what to abort

Start Here

Core Building Blocks

LLM Runtime

Production

Integrations

Mental Model

The three loops

Think in contracts

Decision tree: how complex should my system be?

What success looks like

​The three loops

​Think in contracts

​Decision tree: how complex should my system be?

​What success looks like

The three loops

Think in contracts

Decision tree: how complex should my system be?

What success looks like