What this snippet demonstrates

Runaway agent loops are the most common source of unexpected API costs. AFK provides two defense layers: cost budgets that kill runs when spending exceeds a threshold, and telemetry events that let you observe cost in real time. This snippet shows how to configure both.

Setting cost budgets

The simplest defense is a hard cost ceiling on every agent:
from afk.agents import Agent, FailSafeConfig

agent = Agent(
    name="budget-agent",
    model="gpt-4.1-mini",
    instructions="Be helpful and concise.",
    fail_safe=FailSafeConfig(
        max_total_cost_usd=0.50,        # Hard cost ceiling
        max_llm_calls=30,               # Secondary defense: limit API calls
        max_steps=15,                   # Tertiary defense: limit reasoning steps
        max_wall_time_s=120.0,          # Quaternary defense: wall-clock timeout
    ),
)
When the estimated cost exceeds max_total_cost_usd, the runner terminates the run with a degraded state and returns the best partial result.
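
To make the ceiling concrete, here is a minimal sketch of how a running cost estimate could be compared against a budget. It does not show AFK's internal accounting: the BudgetTracker class and the per-token prices are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    """Accumulates estimated spend and reports when a ceiling is crossed."""

    max_total_cost_usd: float
    # Hypothetical prices in USD per 1M tokens; real prices vary by model.
    input_price_per_mtok: float = 0.40
    output_price_per_mtok: float = 1.60
    spent_usd: float = 0.0

    def record_call(self, input_tokens: int, output_tokens: int) -> float:
        """Add one LLM call's estimated cost and return the running total."""
        self.spent_usd += (
            input_tokens * self.input_price_per_mtok
            + output_tokens * self.output_price_per_mtok
        ) / 1_000_000
        return self.spent_usd

    @property
    def exceeded(self) -> bool:
        """True once the running estimate is above the ceiling."""
        return self.spent_usd > self.max_total_cost_usd


# Simulate a loop that keeps calling the model until the ceiling trips.
tracker = BudgetTracker(max_total_cost_usd=0.50)
calls = 0
while not tracker.exceeded:
    tracker.record_call(input_tokens=50_000, output_tokens=10_000)
    calls += 1
print(f"Stopped after {calls} calls at ${tracker.spent_usd:.4f}")
```

The check runs after each call, so the final total can overshoot the ceiling by up to one call's cost. That is one reason to pair the cost limit with the call, step, and wall-time limits.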

Monitoring cost from results

Every AgentResult includes token counts and cost estimates:
from afk.core import Runner

runner = Runner()
result = runner.run_sync(agent, user_message="Analyze this dataset...")

# Access usage statistics
usage = result.usage_aggregate
print(f"Input tokens:  {usage.input_tokens}")
print(f"Output tokens: {usage.output_tokens}")
print(f"Total tokens:  {usage.total_tokens}")
print(f"Estimated cost: ${result.total_cost_usd or 0:.4f}")
print(f"Tool calls:    {len(result.tool_executions)}")
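
If these numbers go to logs or a dashboard, it can help to flatten them into a single record. The sketch below assumes only the attributes used above (usage_aggregate, total_cost_usd, tool_executions). The summarize_usage helper is hypothetical, and a SimpleNamespace stands in for a real result so the example runs on its own.

```python
from types import SimpleNamespace


def summarize_usage(result) -> dict:
    """Build a flat, log-friendly summary from a result-like object."""
    usage = result.usage_aggregate
    cost = result.total_cost_usd or 0.0  # Treat a missing estimate as zero
    total = usage.total_tokens
    return {
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "total_tokens": total,
        "cost_usd": cost,
        # Guard against division by zero for runs that produced no tokens
        "usd_per_1k_tokens": (cost / total * 1000) if total else 0.0,
        "tool_calls": len(result.tool_executions),
    }


# Stand-in object for demonstration; a real result would come from the runner.
fake_result = SimpleNamespace(
    usage_aggregate=SimpleNamespace(
        input_tokens=1200, output_tokens=300, total_tokens=1500
    ),
    total_cost_usd=0.003,
    tool_executions=["search", "fetch"],
)
summary = summarize_usage(fake_result)
print(summary)
```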

Real-time cost monitoring via streaming

For long-running agents, monitor cost during execution:
import asyncio
from afk.agents import Agent, FailSafeConfig
from afk.core import Runner

agent = Agent(
    name="analyst",
    model="gpt-4.1",
    instructions="Provide detailed analysis.",
    fail_safe=FailSafeConfig(
        max_total_cost_usd=1.00,
        max_steps=20,
    ),
)


async def monitor_cost():
    runner = Runner()
    handle = await runner.run_stream(
        agent, user_message="Compare Python async patterns for service code"
    )

    step_count = 0
    async for event in handle:
        match event.type:
            case "text_delta":
                print(event.text_delta, end="", flush=True)
            case "step_started" if event.step is not None:
                step_count = event.step
            case "tool_completed":
                print(f"\n  [STEP] Step {step_count} | Tool: {event.tool_name}")
            case "completed" if event.result is not None:
                usage = event.result.usage_aggregate
                print("\n\n--- Cost Summary ---")
                print(f"State:    {event.result.state}")
                print(f"Tokens:   {usage.total_tokens}")
                print(f"Cost:     ${event.result.total_cost_usd or 0:.4f}")
                print(f"Tools:    {len(event.result.tool_executions)}")

asyncio.run(monitor_cost())

Cost-aware batch processing

When running multiple agents in a batch, track cumulative cost:
from afk.agents import Agent, FailSafeConfig
from afk.core import Runner


async def batch_process(items: list[str], budget_usd: float):
    """Process items with a shared cost budget."""
    runner = Runner()
    cumulative_cost = 0.0
    results = []

    for item in items:
        if cumulative_cost >= budget_usd:
            print(f"[Limit] Budget exhausted at ${cumulative_cost:.4f}")
            break

        # Cap this item at the remaining budget or $0.10, whichever is smaller
        remaining = budget_usd - cumulative_cost
        agent = Agent(
            name="batch-processor",
            model="gpt-4.1-mini",
            instructions="Process the item concisely.",
            fail_safe=FailSafeConfig(
                max_total_cost_usd=min(remaining, 0.10),  # Per-item cap
                max_steps=5,
            ),
        )

        result = await runner.run(agent, user_message=item)
        item_cost = result.total_cost_usd or 0.0
        cumulative_cost += item_cost
        results.append(result)

        print(f"  [OK] {item[:40]}... (${item_cost:.4f})")

    print(f"\nTotal: {len(results)} items, ${cumulative_cost:.4f}")
    return results
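
The loop above lets early items spend up to the fixed cap, which can leave little for the items at the end of the list. One possible alternative, sketched here under the assumption that an even split is acceptable, gives each item a share of whatever budget is left. The per_item_cap helper is hypothetical and not part of AFK.

```python
def per_item_cap(
    remaining_usd: float, items_left: int, hard_cap_usd: float = 0.10
) -> float:
    """Return a per-item ceiling: an even share of the remaining budget,
    never above hard_cap_usd."""
    if items_left <= 0 or remaining_usd <= 0:
        return 0.0
    return min(remaining_usd / items_left, hard_cap_usd)


# Plenty of budget left: the hard cap applies.
print(per_item_cap(1.00, items_left=5))
# Budget nearly spent: each remaining item gets an even share.
print(per_item_cap(0.12, items_left=4))
```

In the batch loop, the returned value could be passed as max_total_cost_usd, with items_left computed as the number of items not yet processed.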

Operating recommendations

  1. Always set max_total_cost_usd — even generous limits prevent runaway costs
  2. Layer defenses — combine cost limits with max_llm_calls, max_steps, and max_wall_time_s
  3. Use telemetry for dashboards — export metrics to monitor cost trends over time
  4. Set per-item budgets in batches — prevent one expensive item from consuming the entire budget
  5. Choose models by task — use smaller models for routine work and reserve larger models for requests that need them
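
As an illustration of the last recommendation, a small routing function can pick the model name before the agent is constructed. This is only a sketch: the model names come from the examples above, while the task categories and the choose_model helper are hypothetical.

```python
# Model names are taken from the examples above; the task categories are hypothetical.
SMALL_MODEL = "gpt-4.1-mini"
LARGE_MODEL = "gpt-4.1"
ROUTINE_TASKS = {"classify", "extract", "summarize"}


def choose_model(task_kind: str) -> str:
    """Route routine task kinds to the smaller model and all others to the larger one."""
    if task_kind.strip().lower() in ROUTINE_TASKS:
        return SMALL_MODEL
    return LARGE_MODEL


for kind in ("classify", "deep-analysis"):
    print(f"{kind}: {choose_model(kind)}")
```

The returned string could then be supplied as the model argument when building an Agent.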