The LLM runtime includes built-in policies for handling transient failures, managing costs, and protecting against provider outages. Configure them via the builder profile or individual policy settings.

Policy pipeline

Every LLM request passes through a chain of policies: retry, caching, rate limiting, circuit breaking, and timeout enforcement.
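To make the chaining concrete, here is a minimal sketch (not the afk internals) in which each policy wraps the next handler, so a request flows through every layer on its way to the provider. The `Handler` type, the `with_*` helpers, and `call_provider` are all illustrative names, not afk APIs:

```python
# Illustrative sketch of a policy chain: each policy wraps the next
# handler, so a request flows through every layer in order.
from typing import Callable

Handler = Callable[[str], str]

def with_timeout(next_handler: Handler, seconds: float) -> Handler:
    def handler(prompt: str) -> str:
        # A real policy would enforce the deadline; here we just pass through.
        return next_handler(prompt)
    return handler

def with_retry(next_handler: Handler, max_attempts: int) -> Handler:
    def handler(prompt: str) -> str:
        for attempt in range(max_attempts):
            try:
                return next_handler(prompt)
            except RuntimeError:
                if attempt == max_attempts - 1:
                    raise  # exhausted all attempts
    return handler

def call_provider(prompt: str) -> str:
    # Stand-in for the actual provider call.
    return f"response to {prompt!r}"

# Compose: retry sits outside the timeout, so each attempt
# gets its own timeout budget.
pipeline = with_retry(with_timeout(call_provider, seconds=60.0), max_attempts=3)
print(pipeline("hello"))  # response to 'hello'
```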

Built-in profiles

Use profile() to apply a curated set of policies:
from afk.llms import LLMBuilder

# Development: no retry, no caching, fast failures
dev = LLMBuilder().provider("openai").model("gpt-5.2-mini").profile("development").build()

# Production: retry, circuit breaker, rate limiting, caching
prod = LLMBuilder().provider("openai").model("gpt-5.2-mini").profile("production").build()
| Profile | Retry | Cache | Rate Limit | Circuit Breaker | Timeout |
|---|---|---|---|---|---|
| development | None | None | None | None | 30s |
| production | 3 attempts, exponential backoff | In-memory, 5min TTL | 60 req/min | 5 failures → 30s open | 60s |
| batch | 5 attempts, longer backoff | None | 20 req/min | 10 failures → 60s open | 120s |
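Read one way, a profile is just a named preset bundling the policy values above. The sketch below restates the table as data; the dictionary shape and key names are assumptions for illustration, not afk's internal representation:

```python
# The profile table as data (values taken from the table above;
# the structure itself is a hypothetical illustration).
PROFILES = {
    "development": {"retry": None, "cache": None, "rate_limit": None,
                    "circuit_breaker": None, "timeout_s": 30},
    "production":  {"retry": {"max_attempts": 3, "backoff": "exponential"},
                    "cache": {"kind": "in-memory", "ttl_s": 300},
                    "rate_limit": "60/min",
                    "circuit_breaker": {"failures": 5, "open_s": 30},
                    "timeout_s": 60},
    "batch":       {"retry": {"max_attempts": 5, "backoff": "longer"},
                    "cache": None,
                    "rate_limit": "20/min",
                    "circuit_breaker": {"failures": 10, "open_s": 60},
                    "timeout_s": 120},
}

# profile("production") then amounts to applying each non-None policy.
print(PROFILES["production"]["timeout_s"])  # 60
```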

Individual policies

Configure each policy independently:
Retry transient LLM failures with exponential backoff.
client = (
    LLMBuilder()
    .provider("openai")
    .model("gpt-5.2-mini")
    .retry(max_attempts=3, backoff_base=1.0, backoff_max=30.0)
    .build()
)
| Parameter | Default | Description |
|---|---|---|
| max_attempts | 3 | Total attempts (1 initial + 2 retries) |
| backoff_base | 1.0 | Initial delay in seconds |
| backoff_max | 30.0 | Maximum delay between retries |
| retryable_errors | [429, 500, 502, 503] | HTTP status codes to retry |
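A common exponential-backoff schedule doubles the delay from `backoff_base` on each retry and caps it at `backoff_max`. The sketch below shows that schedule for these parameters; whether afk doubles exactly and whether it adds jitter is not stated in the docs, so treat this as an assumption:

```python
# Hypothetical backoff schedule: delay doubles from backoff_base on each
# retry, capped at backoff_max. Jitter (often added in practice) omitted.
def backoff_delays(max_attempts=3, backoff_base=1.0, backoff_max=30.0):
    # One delay before each retry, i.e. before attempts 2..max_attempts.
    return [min(backoff_base * 2 ** i, backoff_max) for i in range(max_attempts - 1)]

print(backoff_delays())                     # [1.0, 2.0]
print(backoff_delays(max_attempts=6))       # [1.0, 2.0, 4.0, 8.0, 16.0]
print(backoff_delays(6, backoff_base=8.0))  # [8.0, 16.0, 30.0, 30.0, 30.0]
```

With the defaults, the cap never kicks in; it only matters once the doubled delay would exceed `backoff_max`, as in the third call.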

Fallback chains

Configure a chain of models to try when the primary model fails:
from afk.agents import Agent, FailSafeConfig

agent = Agent(
    name="resilient",
    model="gpt-5.2",
    fail_safe=FailSafeConfig(
        fallback_model_chain=["gpt-5.2-mini", "gpt-5.2-nano"],
    ),
)
# If gpt-5.2 fails → try gpt-5.2-mini → try gpt-5.2-nano
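The fallback behavior described above amounts to trying the primary model and then each fallback in order until one succeeds. A minimal sketch, where `call_model` is a stand-in that fails for the primary to show the chain advancing:

```python
# Sketch of fallback-chain behavior: try the primary model, then each
# fallback in order. call_model is a stand-in, not an afk API.
def call_model(model: str, prompt: str) -> str:
    if model == "gpt-5.2":
        raise RuntimeError("provider outage")  # simulate primary failing
    return f"{model}: ok"

def run_with_fallbacks(prompt, primary, fallback_chain):
    last_error = None
    for model in [primary, *fallback_chain]:
        try:
            return call_model(model, prompt)
        except RuntimeError as err:
            last_error = err  # remember failure, move to next model
    raise last_error  # every model in the chain failed

result = run_with_fallbacks("hi", "gpt-5.2", ["gpt-5.2-mini", "gpt-5.2-nano"])
print(result)  # gpt-5.2-mini: ok
```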

Tuning cheat sheet

| Goal | Setting |
|---|---|
| Reduce costs | Lower temperature, use cheaper model, enable caching |
| Reduce latency | Enable caching, use faster model, set tight timeout |
| Handle outages | Enable retry + circuit breaker, add fallback chain |
| High throughput | Set rate limits high, use batch profile, increase concurrency |
| Consistent output | Set temperature=0.0, enable structured output |

Next steps