
What this snippet demonstrates

Runaway agent loops are the most common source of unexpected API costs. AFK provides two defense layers: cost budgets that kill runs when spending exceeds a threshold, and telemetry events that let you observe cost in real time. This snippet shows how to configure both.

Setting cost budgets

The simplest defense is a hard cost ceiling on every agent:
from afk.agents import Agent, FailSafeConfig

agent = Agent(
    name="budget-agent",
    model="gpt-5.2-mini",
    instructions="Be helpful and concise.",
    fail_safe=FailSafeConfig(
        max_total_cost_usd=0.50,        # Hard cost ceiling
        max_llm_calls=30,               # Secondary defense: limit API calls
    max_steps=15,                   # Tertiary defense: limit reasoning steps
        max_wall_time_s=120.0,          # Quaternary defense: wall-clock timeout
    ),
)
When the estimated cost exceeds max_total_cost_usd, the runner stops the run, marks its state as degraded, and returns the best partial result.
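The enforcement logic amounts to comparing a running cost estimate against the ceiling before each step. A minimal standalone sketch of that accounting (illustrative only, not AFK's internal implementation; the function and dict keys here are made up for the example):

```python
def run_with_budget(step_costs: list[float], max_total_cost_usd: float) -> dict:
    """Accumulate per-step cost estimates and stop once the ceiling would be hit.

    Illustrative sketch only -- AFK's runner performs this check internally.
    """
    total = 0.0
    completed_steps = 0
    for cost in step_costs:
        if total + cost > max_total_cost_usd:
            # Next step would blow the budget: stop with a degraded state
            return {"state": "degraded", "cost": total, "steps": completed_steps}
        total += cost
        completed_steps += 1
    return {"state": "completed", "cost": total, "steps": completed_steps}
```

With a 0.50 ceiling and steps costing 0.20 each, the third step is refused and the partial result (two completed steps) is returned.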

Monitoring cost from results

Every AgentResult includes token counts and cost estimates:
from afk.core import Runner

runner = Runner()
result = runner.run_sync(agent, user_message="Analyze this dataset...")

# Access usage statistics
usage = result.usage
print(f"Input tokens:  {usage.input_tokens}")
print(f"Output tokens: {usage.output_tokens}")
print(f"Total tokens:  {usage.total_tokens}")
print(f"Estimated cost: ${usage.estimated_cost_usd:.4f}")
print(f"LLM calls:     {usage.llm_call_count}")
print(f"Tool calls:    {len(result.tool_executions)}")

Real-time cost monitoring via streaming

For long-running agents, monitor cost during execution:
import asyncio
from afk.agents import Agent, FailSafeConfig
from afk.core import Runner

agent = Agent(
    name="analyst",
    model="gpt-5.2",
    instructions="Provide detailed analysis.",
    fail_safe=FailSafeConfig(
        max_total_cost_usd=1.00,
        max_steps=20,
    ),
)


async def monitor_cost():
    runner = Runner()
    handle = await runner.run_stream(
        agent, user_message="Provide a comprehensive analysis of Python async patterns"
    )

    step_count = 0
    async for event in handle:
        match event.type:
            case "text_delta":
                print(event.text_delta, end="", flush=True)
            case "step_started" if event.step is not None:
                step_count = event.step
            case "tool_completed":
                print(f"\n  [STEP] Step {step_count} | Tool: {event.tool_name}")
            case "completed" if event.result is not None:
                usage = event.result.usage
                print(f"\n\n--- Cost Summary ---")
                print(f"State:    {event.result.state}")
                print(f"Tokens:   {usage.total_tokens}")
                print(f"Cost:     ${usage.estimated_cost_usd:.4f}")
                print(f"LLM calls: {usage.llm_call_count}")
                print(f"Tools:    {len(event.result.tool_executions)}")

asyncio.run(monitor_cost())
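The same event-loop pattern supports a client-side soft warning: accumulate a running cost estimate as events arrive and flag when it crosses a fraction of the budget. The sketch below runs against a simulated event stream so it is executable without AFK; the event shape, including the cost_delta field, is an assumption for the example, not AFK's event schema:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeEvent:
    type: str
    cost_delta: float = 0.0  # hypothetical incremental-cost field


async def fake_stream():
    # Stand-in for runner.run_stream(); yields a few costed steps.
    for cost in (0.02, 0.03, 0.04):
        yield FakeEvent(type="step_started", cost_delta=cost)
    yield FakeEvent(type="completed")


async def monitor(budget_usd: float, warn_fraction: float = 0.5):
    """Warn once the running cost crosses warn_fraction of the budget."""
    running_cost = 0.0
    warned = False
    async for event in fake_stream():
        if event.type == "step_started":
            running_cost += event.cost_delta
            if not warned and running_cost >= warn_fraction * budget_usd:
                warned = True  # in real code: log, alert, or cancel the run
    return running_cost, warned


cost, warned = asyncio.run(monitor(budget_usd=0.10))
```

The soft warning complements max_total_cost_usd: the hard ceiling still terminates the run, but the warning gives you a chance to react before it does.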

Cost-aware batch processing

When running multiple agents in a batch, track cumulative cost:
async def batch_process(items: list[str], budget_usd: float):
    """Process items with a shared cost budget."""
    runner = Runner()
    cumulative_cost = 0.0
    results = []

    for item in items:
        if cumulative_cost >= budget_usd:
            print(f"[Limit] Budget exhausted at ${cumulative_cost:.4f}")
            break

        # Cap each item at the remaining shared budget, up to a 0.10 per-item ceiling
        remaining = budget_usd - cumulative_cost
        agent = Agent(
            name="batch-processor",
            model="gpt-5.2-mini",
            instructions="Process the item concisely.",
            fail_safe=FailSafeConfig(
                max_total_cost_usd=min(remaining, 0.10),  # Per-item cap
                max_steps=5,
            ),
        )

        result = await runner.run(agent, user_message=item)
        cumulative_cost += result.usage.estimated_cost_usd
        results.append(result)

        print(f"  [OK] {item[:40]}... (${result.usage.estimated_cost_usd:.4f})")

    print(f"\nTotal: {len(results)} items, ${cumulative_cost:.4f}")
    return results
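The per-item cap in the loop above reduces to min(remaining_budget, per_item_cap), floored at zero. Factoring that accounting into a pure helper makes it easy to test in isolation (the helper name is illustrative, not part of AFK's API):

```python
def next_item_cap(budget_usd: float, spent_usd: float,
                  per_item_cap: float = 0.10) -> float:
    """Return the cost ceiling for the next batch item.

    Yields 0.0 once the shared budget is exhausted, so callers can
    stop the batch instead of launching a zero-budget run.
    """
    remaining = budget_usd - spent_usd
    return max(0.0, min(remaining, per_item_cap))
```

In the batch loop this would feed FailSafeConfig directly, e.g. max_total_cost_usd=next_item_cap(budget_usd, cumulative_cost).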

Production recommendations

  1. Always set max_total_cost_usd — even generous limits prevent runaway costs
  2. Layer defenses — combine cost limits with max_llm_calls, max_steps, and max_wall_time_s
  3. Use telemetry for dashboards — export metrics to monitor cost trends over time
  4. Set per-item budgets in batches — prevent one expensive item from consuming the entire budget
  5. Use cheaper models for iteration — develop against gpt-5.2-mini and reserve gpt-5.2 for production