This guide covers the engineering patterns for building production-quality AI features with AFK. It’s organized around three phases: starting narrow, adding capabilities, and scaling safely.

Start narrow, iterate fast

The most common mistake is building for complexity you don’t have yet. Start with the simplest version that solves the problem, then add capabilities based on real evidence.
> [!TIP]
> Test at every step. Don’t add tools before the base prompt works. Don’t add subagents before single-agent tools work. Each layer should be proven before adding the next.

Common patterns

Classification: categorize input into a fixed set of predefined labels. No tools needed.
```python
agent = Agent(
    name="classifier",
    model="gpt-5.2-mini",
    instructions="""
    Classify the support ticket into exactly one category:
    billing, technical, account, feature-request, other.
    Output only the category name, nothing else.
    """,
)
```
Tips:

- Constrain output format explicitly in the prompt
- Test with a diverse set of inputs
- Add evals for each category
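A minimal eval harness for this pattern needs nothing beyond the standard library. The `classify` function below is a stand-in for the real agent call, not AFK API; the shape of the harness is the point:

```python
# Minimal eval harness for the classifier pattern.
# `classify` is a placeholder for the real agent call; swap in your runner.

LABELS = {"billing", "technical", "account", "feature-request", "other"}

def classify(ticket: str) -> str:
    # Stand-in: a real implementation would invoke the agent and
    # return its raw text output.
    keyword_map = {
        "invoice": "billing",
        "crash": "technical",
        "password": "account",
    }
    for keyword, label in keyword_map.items():
        if keyword in ticket.lower():
            return label
    return "other"

def run_evals(cases: list[tuple[str, str]]) -> list[str]:
    """Return a list of failure descriptions; empty means all passed."""
    failures = []
    for ticket, expected in cases:
        got = classify(ticket)
        if got not in LABELS:
            failures.append(f"invalid label {got!r} for {ticket!r}")
        elif got != expected:
            failures.append(f"expected {expected}, got {got} for {ticket!r}")
    return failures

cases = [
    ("My invoice is wrong", "billing"),
    ("The app crashes on startup", "technical"),
    ("I forgot my password", "account"),
    ("Please add dark mode", "other"),
]
print(run_evals(cases))  # [] when every case passes
```

Checking membership in `LABELS` separately from exact-match failures catches the two distinct failure modes: the model inventing a category versus picking the wrong one.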

Anti-patterns

These are the most common mistakes. Avoiding them will save you significant debugging time.
| Anti-pattern | Problem | Fix |
| --- | --- | --- |
| No cost limits | Runaway agent loops spend $100s in minutes | Always set `max_total_cost_usd` |
| Vague instructions | Model produces inconsistent output | Be specific: “Output only the category name” |
| Too many tools | Model gets confused choosing between tools | Keep ≤ 5 tools per agent; split into subagents |
| Mixing orchestration and execution | Runner logic leaks into tool handlers | Tools should be pure functions, with no runner imports |
| Skipping evals | Prompt changes break behavior silently | Run evals in CI on every PR |
| Untyped tool arguments | Missing validation, hard-to-debug errors | Always use Pydantic models |
| Not classifying failures | Retryable errors treated as terminal (or vice versa) | Return clear error types from tools |
| Giant system prompts | Token waste, instruction drift | Split into skills; use templates |
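To make the failure-classification row concrete, here is one stdlib-only sketch: tools raise typed errors and the caller branches on the error *type*, never on message strings. The class names are illustrative, not part of AFK:

```python
# Illustrative error taxonomy for tool failures (names are hypothetical,
# not AFK API). Callers branch on error type, not on string matching.

class ToolError(Exception):
    """Base class for failures raised inside tool handlers."""
    retryable = False

class RateLimited(ToolError):
    retryable = True   # transient: back off and retry

class InvalidArguments(ToolError):
    retryable = False  # terminal: retrying the same call cannot succeed

def call_with_retries(tool, *args, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(*args)
        except ToolError as err:
            if not err.retryable or attempt == max_attempts:
                raise

attempts = []
def flaky_tool(x):
    # Fails twice with a retryable error, then succeeds.
    attempts.append(x)
    if len(attempts) < 3:
        raise RateLimited("try again")
    return x * 2

print(call_with_retries(flaky_tool, 21))  # 42, after two retried failures
```

A terminal error such as `InvalidArguments` propagates on the first attempt, which is exactly the behavior the anti-pattern table warns about losing when every failure looks the same.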

Production readiness checklist

| Area | Requirement | Status |
| --- | --- | --- |
| Safety | `FailSafeConfig` with cost, step, and time limits | |
| Safety | Policy rules for all mutating tools | |
| Observability | Telemetry exporter configured (OTEL recommended) | |
| Observability | Alerts on error rate and latency | |
| Testing | Eval suite with ≥ 5 cases running in CI | |
| Testing | Golden traces captured for regression detection | |
| Memory | Persistent backend for multi-turn conversations | |
| Memory | Thread compaction configured | |
| Security | Secrets in environment variables, not code | |
| Security | Sandbox profiles for code execution tools | |
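The golden-trace row can be approximated with the standard library alone: record the sequence of tool calls from a known-good run, then diff later runs against it. The file name and trace shape here are assumptions for illustration, not an AFK format:

```python
import json
from pathlib import Path

# Golden-trace regression sketch (stdlib only). A "trace" here is just
# the ordered list of tool names an agent invoked; real traces carry
# arguments, timings, and outputs as well.

def save_golden(trace: list[str], path: Path) -> None:
    path.write_text(json.dumps(trace, indent=2))

def diff_against_golden(trace: list[str], path: Path) -> list[str]:
    """Return human-readable mismatches; empty list means no regression."""
    golden = json.loads(path.read_text())
    mismatches = []
    for i, (want, got) in enumerate(zip(golden, trace)):
        if want != got:
            mismatches.append(f"step {i}: expected {want}, got {got}")
    if len(golden) != len(trace):
        mismatches.append(f"length: expected {len(golden)}, got {len(trace)}")
    return mismatches

golden_path = Path("golden_trace.json")
save_golden(["search", "summarize", "reply"], golden_path)
print(diff_against_golden(["search", "summarize", "reply"], golden_path))  # []
print(diff_against_golden(["search", "reply"], golden_path))
```

Running this diff in CI alongside the eval suite catches behavioral drift (a prompt change silently dropping a tool call) that per-case evals can miss.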

Next steps