The LLM runtime includes built-in policies for handling transient failures, managing costs, and protecting against provider outages. Configure them via the builder profile or individual policy settings.

Policy pipeline

Every LLM request passes through a policy chain covering retry, caching, rate limiting, circuit breaking, and timeouts.

Built-in profiles

Use profile() to apply a curated set of policies:
from afk.llms import LLMBuilder

# Development: no retry, no caching, fast failures
dev = LLMBuilder().provider("openai").model("gpt-4.1-mini").profile("development").build()

# Production: retry, circuit breaker, rate limiting, caching
prod = LLMBuilder().provider("openai").model("gpt-4.1-mini").profile("production").build()

| Profile | Retry | Cache | Rate Limit | Circuit Breaker | Timeout |
| --- | --- | --- | --- | --- | --- |
| development | None | None | None | None | 30s |
| production | 3 attempts, exponential backoff | In-memory, 5min TTL | 60 req/min | 5 failures → 30s open | 60s |
| batch | 5 attempts, longer backoff | None | 20 req/min | 10 failures → 60s open | 120s |
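
To make the "5 failures → 30s open" entry concrete, here is a minimal circuit breaker sketch in plain Python. It is not the library's implementation: the `CircuitBreaker` class, its method names, and the half-open behavior after the cool-down are assumptions made for illustration. The defaults mirror the production profile.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures (illustration only)."""

    def __init__(self, failure_threshold=5, open_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.clock = clock          # injectable so the sketch can be tested without sleeping
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True
        # Once the cool-down has elapsed, let a trial request through (half-open).
        return self.clock() - self.opened_at >= self.open_seconds

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open (or re-open) the circuit and restart the cool-down.
            self.opened_at = self.clock()


# Drive the breaker with a fake clock instead of real time.
now = [0.0]
breaker = CircuitBreaker(clock=lambda: now[0])
for _ in range(5):
    breaker.record_failure()
blocked = breaker.allow_request()                 # circuit is open
now[0] = 30.0
allowed_after_cooldown = breaker.allow_request()  # cool-down elapsed
```

While the circuit is open, requests fail immediately instead of waiting on a provider that is likely still down, which is why this policy pairs naturally with retries and a fallback chain.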

Individual policies

Configure each policy independently with create_llm_client():
Retry transient LLM failures with exponential backoff.
from afk.llms import LLMSettings, RetryPolicy, create_llm_client

client = create_llm_client(
    provider="openai",
    settings=LLMSettings(default_model="gpt-4.1-mini"),
    retry_policy=RetryPolicy(max_retries=3, backoff_base_s=1.0),
)

| Parameter | Default | Description |
| --- | --- | --- |
| max_retries | 3 | Retry attempts after the initial request |
| backoff_base_s | 0.5 | Initial delay in seconds |
| backoff_jitter_s | 0.15 | Jitter added to retry delays |
| require_idempotency_key | True | Require idempotency keys for retried requests |
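
The sketch below shows how these parameters could combine into a retry schedule. The formula is an assumption, since the exact calculation is not documented here: each delay is `backoff_base_s * 2**attempt` plus uniform jitter in `[0, backoff_jitter_s]`. The `backoff_delays` helper is hypothetical and not part of `afk.llms`.

```python
import random


def backoff_delays(max_retries=3, backoff_base_s=0.5, backoff_jitter_s=0.15, rng=None):
    """Return the assumed sleep time (seconds) before each retry attempt.

    Assumption: the delay doubles per attempt and jitter is additive;
    the library may compute this differently.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        base = backoff_base_s * (2 ** attempt)   # 0.5, 1.0, 2.0 with the defaults
        delays.append(base + rng.uniform(0.0, backoff_jitter_s))
    return delays


delays = backoff_delays(rng=random.Random(0))
for attempt, delay in enumerate(delays, start=1):
    print(f"retry {attempt}: wait about {delay:.2f}s")
```

Jitter spreads retries from many clients over a small window, so they do not all hit the provider at the same instant after a shared failure.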

Fallback chains

Configure a chain of models to try when the primary model fails:
from afk.agents import Agent, FailSafeConfig

agent = Agent(
    name="resilient",
    model="gpt-4.1",
    fail_safe=FailSafeConfig(
        fallback_model_chain=["gpt-4.1-mini", "gpt-4.1-nano"],
    ),
)
# If gpt-4.1 fails → try gpt-4.1-mini → try gpt-4.1-nano
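
The fallback order in the comment above can be sketched as a plain loop. This is an illustration of the idea only, not how `FailSafeConfig` is implemented: `call_with_fallback` and `fake_call` are hypothetical names, and the simulated failures stand in for real provider errors.

```python
def call_with_fallback(call_model, primary, fallback_chain):
    """Try the primary model, then each fallback in order.

    Returns (model_used, result). Raises RuntimeError if every model fails.
    """
    failed = []
    for model in [primary, *fallback_chain]:
        try:
            return model, call_model(model)
        except Exception:
            # Record the failure and move on to the next model in the chain.
            failed.append(model)
    raise RuntimeError(f"all models failed: {failed}")


def fake_call(model):
    # Simulated provider: only the last model in the chain responds.
    if model != "gpt-4.1-nano":
        raise TimeoutError(f"{model} unavailable")
    return "ok"


used, result = call_with_fallback(
    fake_call, "gpt-4.1", ["gpt-4.1-mini", "gpt-4.1-nano"]
)
```

Ordering the chain from most to least capable keeps quality as high as possible while still returning a response during a partial outage.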

Tuning cheat sheet

| Goal | Setting |
| --- | --- |
| Reduce costs | Lower temperature, use cheaper model, enable caching |
| Reduce latency | Enable caching, use faster model, set tight timeout |
| Handle outages | Enable retry + circuit breaker, add fallback chain |
| High throughput | Set rate limits high, use batch profile, increase concurrency |
| Consistent output | Set temperature=0.0, enable structured output |

Next steps

Agent Integration

How agents resolve and use LLM clients.

Observability

Monitor LLM call latency, errors, and costs.