Use AFK incrementally. Start with one narrow agent, add tools only when the task needs action, and add production controls before real users depend on the system.

Phase 1: narrow agent

Build one agent that solves one task without tools.
from afk.agents import Agent
from afk.core import Runner

agent = Agent(
    name="ticket-classifier",
    model="gpt-4.1-mini",
    instructions="""
    Classify the support ticket as exactly one of:
    billing, technical, account, or other.
    Return only the category.
    """,
)

result = Runner().run_sync(
    agent,
    user_message="I cannot log into my account.",
)
print(result.final_text)
Move on when the agent is reliable on real examples for the narrow task.
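One way to judge "reliable on real examples" is to score the classifier against a small set of labeled tickets. The stdlib-only sketch below assumes a hypothetical `classify` callable that stands in for the agent call (for example, a wrapper that returns `result.final_text`). The example tickets, the keyword stub, and the 0.9 threshold are illustrative assumptions, not part of AFK.

```python
from typing import Callable

# The label set named in the agent's instructions.
CATEGORIES = {"billing", "technical", "account", "other"}


def score_classifier(
    classify: Callable[[str], str],
    examples: list[tuple[str, str]],
) -> dict[str, float]:
    """Return accuracy and the share of answers that are a valid category."""
    if not examples:
        raise ValueError("need at least one labeled example")
    correct = 0
    valid = 0
    for text, expected in examples:
        # Normalize case and whitespace; the agent should return only the label.
        answer = classify(text).strip().lower()
        if answer in CATEGORIES:
            valid += 1
        if answer == expected:
            correct += 1
    total = len(examples)
    return {"accuracy": correct / total, "valid_rate": valid / total}


# Hypothetical stand-in for the agent; swap in a call to the real runner.
def keyword_stub(text: str) -> str:
    lowered = text.lower()
    if "log in" in lowered or "password" in lowered:
        return "account"
    if "invoice" in lowered or "charge" in lowered:
        return "billing"
    if "error" in lowered or "crash" in lowered:
        return "technical"
    return "other"


examples = [
    ("I cannot log into my account.", "account"),
    ("I was charged twice this month.", "billing"),
    ("The app shows an error on startup.", "technical"),
    ("Do you have an office in Berlin?", "other"),
]

scores = score_classifier(keyword_stub, examples)
print(scores)  # {'accuracy': 1.0, 'valid_rate': 1.0}
ready = scores["accuracy"] >= 0.9 and scores["valid_rate"] == 1.0
```

Tracking `valid_rate` separately catches answers that drift from the "return only the category" instruction even when accuracy looks acceptable.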

Phase 2: tools and safety

Add typed tools and hard limits.
from pydantic import BaseModel

from afk.agents import Agent, FailSafeConfig
from afk.core import Runner, RunnerConfig
from afk.tools import tool


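# Typed arguments: the model must supply a string ticket_id.
class TicketArgs(BaseModel):
    ticket_id: str


@tool(args_model=TicketArgs, name="lookup_ticket", description="Fetch ticket details.")
def lookup_ticket(args: TicketArgs) -> dict:
    # Stub: a real tool would read (never write) from the ticketing system.
    return {"ticket_id": args.ticket_id, "status": "open", "priority": "high"}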


agent = Agent(
    name="ticket-agent",
    model="gpt-4.1-mini",
    instructions="Look up tickets before answering. Never modify data.",
    tools=[lookup_ticket],
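    # Hard per-run ceilings on steps, tool calls, and spend.
    fail_safe=FailSafeConfig(
        max_steps=8,
        max_tool_calls=4,
        max_total_cost_usd=0.10,
    ),
)

runner = Runner(
    config=RunnerConfig(
        # Sanitize tool results and cap their length before the model sees them.
        sanitize_tool_output=True,
        tool_output_max_chars=8_000,
    )
)

result = runner.run_sync(
    agent,
    user_message="What is the status of ticket T-1042?",
)
print(result.final_text)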
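The `FailSafeConfig` and `RunnerConfig` values above act as hard ceilings. The plain-Python sketch below illustrates the bookkeeping such limits imply: counters checked before each step and tool call, plus a character cap on tool output. It is not AFK's implementation; `RunBudget`, `BudgetExceeded`, and `truncate_output` are hypothetical names, and the framework's actual stopping behavior and truncation format may differ.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run would cross one of its hard limits."""


@dataclass
class RunBudget:
    # Limits mirroring the FailSafeConfig values above (illustrative only).
    max_steps: int = 8
    max_tool_calls: int = 4
    max_total_cost_usd: float = 0.10
    # Running totals for the current run.
    steps: int = 0
    tool_calls: int = 0
    cost_usd: float = 0.0

    def record_step(self, cost_usd: float) -> None:
        # Refuse the step if it would exceed the step or spend ceiling.
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded("max_steps reached")
        if self.cost_usd + cost_usd > self.max_total_cost_usd:
            raise BudgetExceeded("max_total_cost_usd reached")
        self.steps += 1
        self.cost_usd += cost_usd

    def record_tool_call(self) -> None:
        # Refuse the call if it would exceed the tool-call ceiling.
        if self.tool_calls + 1 > self.max_tool_calls:
            raise BudgetExceeded("max_tool_calls reached")
        self.tool_calls += 1


def truncate_output(text: str, max_chars: int = 8_000) -> str:
    # Cap tool output so one large result cannot flood the model's context.
    return text if len(text) <= max_chars else text[:max_chars]


budget = RunBudget()
for _ in range(3):
    budget.record_step(cost_usd=0.03)
try:
    budget.record_step(cost_usd=0.03)  # would bring spend to about $0.12
except BudgetExceeded as exc:
    print(exc)  # max_total_cost_usd reached
```

Checking before incrementing means a refused step leaves the totals unchanged, so the record shows exactly where the run stopped.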
Move on when all tools have Pydantic argument models, cost limits are set, and mutating operations are gated or absent.
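The last criterion, gating mutating operations, can be pictured as a check that runs before any tool executes. The sketch below is not AFK's policy-gate API (the Security Model page covers that); `ToolSpec`, `mutating`, `call_tool`, and `approved_by` are hypothetical names that show the idea: read-only tools run freely, while writes need an explicit, named approval.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ToolSpec:
    name: str
    func: Callable[..., Any]
    mutating: bool = False  # True for anything that writes or deletes data


class ApprovalRequired(PermissionError):
    """Raised when a mutating tool is called without an approval."""


def call_tool(spec: ToolSpec, approved_by: Optional[str] = None, **kwargs: Any) -> Any:
    # Read-only tools run directly; mutating tools need a named approver.
    if spec.mutating and not approved_by:
        raise ApprovalRequired(f"{spec.name} requires approval")
    return spec.func(**kwargs)


lookup = ToolSpec(
    "lookup_ticket",
    lambda ticket_id: {"ticket_id": ticket_id, "status": "open"},
)
close = ToolSpec(
    "close_ticket",
    lambda ticket_id: {"ticket_id": ticket_id, "status": "closed"},
    mutating=True,
)

print(call_tool(lookup, ticket_id="T-1042"))
try:
    call_tool(close, ticket_id="T-1042")
except ApprovalRequired as exc:
    print(exc)  # close_ticket requires approval
print(call_tool(close, approved_by="on-call", ticket_id="T-1042"))
```

Recording who approved each write also gives operators the audit trail that the approval and rollback flows in Phase 4 rely on.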

Phase 3: production controls

Before shipping, add the controls that make failures diagnosable and contained:
  • evals for expected behavior;
  • telemetry for latency, usage, errors, and tool calls;
  • persistent memory or queues if runs must survive process restarts;
  • security controls for sandboxing, secret scope, and tool output limits;
  • troubleshooting docs for on-call engineers and operators.
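The telemetry bullet can start as a per-run record of latency, usage, errors, and tool calls. The stdlib sketch below wraps any run function; `RunRecord`, `timed_run`, `fake_run`, and the `usage` dictionary shape are hypothetical, and in practice these fields would be exported as described on the Observability page rather than kept in memory.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class RunRecord:
    agent: str
    latency_s: float = 0.0
    usage: dict[str, int] = field(default_factory=dict)
    tool_calls: list[str] = field(default_factory=list)
    error: Optional[str] = None


def timed_run(agent_name: str, run: Callable[[RunRecord], Any]) -> tuple[Any, RunRecord]:
    """Call `run`, letting it fill usage and tool calls; always record latency and errors."""
    record = RunRecord(agent=agent_name)
    start = time.perf_counter()
    result = None
    try:
        result = run(record)
    except Exception as exc:  # keep the failure in the record instead of losing it
        record.error = f"{type(exc).__name__}: {exc}"
    finally:
        record.latency_s = time.perf_counter() - start
    return result, record


# Hypothetical run function standing in for a real agent run.
def fake_run(record: RunRecord) -> str:
    record.tool_calls.append("lookup_ticket")
    record.usage.update({"input_tokens": 120, "output_tokens": 15})
    return "technical"


result, record = timed_run("ticket-agent", fake_run)
print(record)
```

This sketch swallows exceptions so that a record always exists; a production wrapper might re-raise after exporting the record.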
Useful pages:
  • Evals: test agent behavior and enforce budgets.
  • Observability: export metrics, traces, and run records.
  • Security Model: understand policy gates, sandboxing, and secret isolation.
  • Task Queues: run agents through distributed workers.

Phase 4: release discipline

Once the agent is in production:
  • run evals in CI before prompt, tool, or model changes;
  • compare behavior across releases with golden traces where appropriate;
  • monitor cost per run and failure rate;
  • keep system prompts in version-controlled files;
  • document operator actions for approval, resume, and rollback flows.
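The CI and monitoring bullets reduce to comparing a candidate release against the current baseline. The sketch below summarizes eval runs into pass rate, failure rate, and mean cost per run, then reports any regression beyond a tolerance. The `EvalRun` shape, function names, and tolerances are assumptions for illustration, not an AFK API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalRun:
    passed: bool  # did the eval's expectations hold?
    failed: bool  # did the run error out or hit a hard limit?
    cost_usd: float


def summarize(runs: list[EvalRun]) -> dict[str, float]:
    if not runs:
        raise ValueError("no runs to summarize")
    return {
        "pass_rate": sum(r.passed for r in runs) / len(runs),
        "failure_rate": sum(r.failed for r in runs) / len(runs),
        "cost_per_run": mean(r.cost_usd for r in runs),
    }


def release_gate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_pass_drop: float = 0.02,
    max_cost_increase: float = 0.20,
) -> list[str]:
    """Return the regressions found; an empty list means the release may proceed."""
    problems = []
    if candidate["pass_rate"] < baseline["pass_rate"] - max_pass_drop:
        problems.append("pass_rate dropped")
    if candidate["failure_rate"] > baseline["failure_rate"]:
        problems.append("failure_rate rose")
    if candidate["cost_per_run"] > baseline["cost_per_run"] * (1 + max_cost_increase):
        problems.append(f"cost_per_run rose more than {max_cost_increase:.0%}")
    return problems


baseline = summarize([
    EvalRun(True, False, 0.02),
    EvalRun(True, False, 0.02),
    EvalRun(False, True, 0.04),
    EvalRun(True, False, 0.02),
])
candidate = summarize([EvalRun(True, False, 0.05) for _ in range(4)])
print(release_gate(baseline, candidate))  # ['cost_per_run rose more than 20%']
```

Here the candidate passes every eval but costs twice as much per run, which is exactly the kind of change that a pass-rate check alone would miss.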

Common mistakes

| Mistake | Better approach |
|---|---|
| Starting with a multi-agent system | Start with one narrow agent and split only when roles are genuinely different |
| Writing untyped tools | Use Pydantic argument models for every tool |
| Treating prompts as the only safety layer | Add FailSafeConfig, policy gates, sandboxing, and evals |
| Hiding internals in public docs | Keep builder docs behavior-first and maintainer docs internals-first |
| Shipping without run records | Export telemetry and inspect AgentResult fields |

Next steps

Read Building with AI for production design patterns, then Troubleshooting for common operational failures.