What this snippet demonstrates
Runaway agent loops are the most common source of unexpected API costs. AFK provides two defense layers: cost budgets that kill runs when spending exceeds a threshold, and telemetry events that let you observe cost in real time. This snippet shows how to configure both.
Setting cost budgets
The simplest defense is a hard cost ceiling on every agent:
from afk.agents import Agent, FailSafeConfig

agent = Agent(
    name="budget-agent",
    model="gpt-4.1-mini",
    instructions="Be helpful and concise.",
    fail_safe=FailSafeConfig(
        max_total_cost_usd=0.50,  # Hard cost ceiling
        max_llm_calls=30,         # Secondary defense: limit API calls
        max_steps=15,             # Tertiary defense: limit reasoning steps
        max_wall_time_s=120.0,    # Quaternary defense: wall-clock timeout
    ),
)
When the estimated cost exceeds max_total_cost_usd, the runner terminates the run with a degraded state and returns the best partial result.
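To make the layering concrete, here is a minimal sketch of how several ceilings can be checked together. It is a plain-Python model for illustration, not AFK's implementation: the `Limits` class and `first_tripped_limit` function are hypothetical, and the strict greater-than comparison is an assumption (only the cost limit is described above as tripping when the value "exceeds" the ceiling).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical stand-in mirroring the four FailSafeConfig fields used above."""

    max_total_cost_usd: float
    max_llm_calls: int
    max_steps: int
    max_wall_time_s: float


def first_tripped_limit(limits, *, cost_usd, llm_calls, steps, elapsed_s):
    """Return the name of the first exceeded limit, or None if all are in bounds.

    Assumption: a limit trips when the observed value is strictly greater
    than its ceiling. AFK's actual comparison may differ.
    """
    checks = [
        ("max_total_cost_usd", cost_usd, limits.max_total_cost_usd),
        ("max_llm_calls", llm_calls, limits.max_llm_calls),
        ("max_steps", steps, limits.max_steps),
        ("max_wall_time_s", elapsed_s, limits.max_wall_time_s),
    ]
    for name, observed, ceiling in checks:
        if observed > ceiling:
            return name
    return None


limits = Limits(
    max_total_cost_usd=0.50,
    max_llm_calls=30,
    max_steps=15,
    max_wall_time_s=120.0,
)

# A cheap model stuck in a loop: cost is still low, but the step ceiling trips.
print(first_tripped_limit(limits, cost_usd=0.04, llm_calls=16, steps=16, elapsed_s=41.0))
# prints: max_steps
```

The example shows why the secondary limits matter: with an inexpensive model, a loop can run for many steps before the cost ceiling is reached, so a step or call limit may stop it sooner.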
Monitoring cost from results
Every AgentResult includes token counts and cost estimates:
from afk.core import Runner

# `agent` is the Agent configured in the previous section
runner = Runner()
result = runner.run_sync(agent, user_message="Analyze this dataset...")

# Access usage statistics
usage = result.usage_aggregate
print(f"Input tokens: {usage.input_tokens}")
print(f"Output tokens: {usage.output_tokens}")
print(f"Total tokens: {usage.total_tokens}")
print(f"Estimated cost: ${result.total_cost_usd or 0:.4f}")
print(f"Tool calls: {len(result.tool_executions)}")
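If you log these numbers rather than print them, it can help to collect them into one dictionary. The sketch below is one possible shape: `summarize_usage` is a hypothetical helper, not part of AFK. It reads only the attributes shown above (`usage_aggregate`, `total_cost_usd`, `tool_executions`) and is exercised here against a stand-in object with made-up numbers instead of a real run.

```python
from types import SimpleNamespace


def summarize_usage(result):
    """Collect the usage fields shown above into a flat dict for logging.

    Hypothetical helper: it relies only on the attribute names used in
    this snippet and treats a missing cost estimate (None) as 0.0.
    """
    usage = result.usage_aggregate
    cost = result.total_cost_usd or 0.0
    total = usage.total_tokens
    return {
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "total_tokens": total,
        "cost_usd": cost,
        # Guard against division by zero when no tokens were recorded.
        "usd_per_1k_tokens": (cost / total * 1000) if total else 0.0,
        "tool_calls": len(result.tool_executions),
    }


# Stand-in for an AgentResult, with illustrative numbers only.
fake_result = SimpleNamespace(
    usage_aggregate=SimpleNamespace(
        input_tokens=1200, output_tokens=300, total_tokens=1500
    ),
    total_cost_usd=None,  # cost estimate unavailable
    tool_executions=[],
)

summary = summarize_usage(fake_result)
print(summary["total_tokens"], summary["cost_usd"])
```

With a real run you would pass the `result` returned by `runner.run_sync(...)` in place of `fake_result`.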
Real-time cost monitoring via streaming
For long-running agents, monitor cost during execution:
import asyncio

from afk.agents import Agent, FailSafeConfig
from afk.core import Runner

agent = Agent(
    name="analyst",
    model="gpt-4.1",
    instructions="Provide detailed analysis.",
    fail_safe=FailSafeConfig(
        max_total_cost_usd=1.00,
        max_steps=20,
    ),
)


async def monitor_cost():
    runner = Runner()
    handle = await runner.run_stream(
        agent, user_message="Compare Python async patterns for service code"
    )
    step_count = 0
    async for event in handle:
        match event.type:
            case "text_delta":
                print(event.text_delta, end="", flush=True)
            case "step_started" if event.step is not None:
                step_count = event.step
            case "tool_completed":
                print(f"\n [STEP] Step {step_count} | Tool: {event.tool_name}")
            case "completed" if event.result is not None:
                usage = event.result.usage_aggregate
                print("\n\n--- Cost Summary ---")
                print(f"State: {event.result.state}")
                print(f"Tokens: {usage.total_tokens}")
                print(f"Cost: ${event.result.total_cost_usd or 0:.4f}")
                print(f"Tools: {len(event.result.tool_executions)}")


asyncio.run(monitor_cost())
Cost-aware batch processing
When running multiple agents in a batch, track cumulative cost:
from afk.agents import Agent, FailSafeConfig
from afk.core import Runner


async def batch_process(items: list[str], budget_usd: float):
    """Process items with a shared cost budget."""
    runner = Runner()
    cumulative_cost = 0.0
    results = []
    for item in items:
        # Stop before starting a new item once the shared budget is spent
        if cumulative_cost >= budget_usd:
            print(f"[Limit] Budget exhausted at ${cumulative_cost:.4f}")
            break
        # Cap each item at the remaining budget, and at most $0.10
        remaining = budget_usd - cumulative_cost
        agent = Agent(
            name="batch-processor",
            model="gpt-4.1-mini",
            instructions="Process the item concisely.",
            fail_safe=FailSafeConfig(
                max_total_cost_usd=min(remaining, 0.10),  # Per-item cap
                max_steps=5,
            ),
        )
        result = await runner.run(agent, user_message=item)
        item_cost = result.total_cost_usd or 0.0
        cumulative_cost += item_cost
        results.append(result)
        print(f" [OK] {item[:40]}... (${item_cost:.4f})")
    print(f"\nTotal: {len(results)} items, ${cumulative_cost:.4f}")
    return results
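The budget arithmetic in the loop above can be pulled out into a small pure function, which makes it easy to check without calling a model. This is a sketch only: `next_item_cap` is a hypothetical name, and the per-run costs in the simulation are made up for illustration.

```python
def next_item_cap(budget_usd, spent_usd, per_item_cap_usd):
    """Return the cost ceiling for the next item, or None if the budget is spent.

    Mirrors the loop above: stop when spending has reached the budget,
    otherwise cap the item at min(remaining, per-item cap).
    """
    remaining = budget_usd - spent_usd
    if remaining <= 0:
        return None
    return min(remaining, per_item_cap_usd)


# Simulate a batch where every item would cost $0.10 if left uncapped.
spent = 0.0
caps = []
for simulated_cost in [0.10, 0.10, 0.10, 0.10]:
    cap = next_item_cap(budget_usd=0.25, spent_usd=spent, per_item_cap_usd=0.10)
    if cap is None:
        break  # Shared budget exhausted: skip the remaining items
    caps.append(round(cap, 2))
    spent += min(simulated_cost, cap)  # Assume the cap is honored exactly

print(caps)  # [0.1, 0.1, 0.05]
```

The third item receives only the leftover $0.05. Note that the simulation assumes each run stops exactly at its cap, whereas a real run is stopped based on an estimated cost.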
Operating recommendations
- Always set max_total_cost_usd: even generous limits prevent runaway costs
- Layer defenses: combine cost limits with max_llm_calls, max_steps, and max_wall_time_s
- Use telemetry for dashboards: export metrics to monitor cost trends over time
- Set per-item budgets in batches: prevent one expensive item from consuming the entire budget
- Choose models by task: use smaller models for routine work and reserve larger models for requests that need them
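As a starting point for the dashboard recommendation, the sketch below accumulates per-run cost by model so the totals can be exported to whatever metrics system you use. `CostLedger` is a hypothetical class, not an AFK API. It assumes you call `record` yourself after each run with the model name and `result.total_cost_usd`, and the costs recorded in the example are made up.

```python
from collections import defaultdict


class CostLedger:
    """Hypothetical in-memory tally of run counts and estimated cost per model."""

    def __init__(self):
        self._cost = defaultdict(float)
        self._runs = defaultdict(int)

    def record(self, model, cost_usd):
        # A missing cost estimate (None) is counted as a run with zero cost.
        self._cost[model] += cost_usd or 0.0
        self._runs[model] += 1

    def snapshot(self):
        """Return totals keyed by model name, sorted for stable output."""
        return {
            model: {
                "runs": self._runs[model],
                "cost_usd": round(self._cost[model], 6),
            }
            for model in sorted(self._cost)
        }


ledger = CostLedger()
ledger.record("gpt-4.1-mini", 0.002)
ledger.record("gpt-4.1-mini", None)
ledger.record("gpt-4.1", 0.031)
print(ledger.snapshot())
```

A snapshot like this also makes the effect of model choice visible, since spending is broken out per model.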