The LLM runtime includes built-in policies for handling transient failures, managing costs, and protecting against provider outages. Configure them via the builder profile or individual policy settings.
Policy pipeline
Every LLM request passes through the policy chain, which covers retry, circuit breaking, rate limiting, caching, and timeouts.
Built-in profiles
Use profile() to apply a curated set of policies:
| Profile | Retry | Cache | Rate Limit | Circuit Breaker | Timeout |
|---|---|---|---|---|---|
| development | None | None | None | None | 30s |
| production | 3 attempts, exponential backoff | In-memory, 5min TTL | 60 req/min | 5 failures → 30s open | 60s |
| batch | 5 attempts, longer backoff | None | 20 req/min | 10 failures → 60s open | 120s |
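To make the "5 failures → 30s open" entry concrete, here is a minimal, library-independent sketch of a circuit breaker. The CircuitBreaker class, its method names, and the injectable clock are hypothetical and are not the runtime's API. The sketch also assumes that failures are counted consecutively and that the breaker closes fully after the cooldown, with no half-open probing state; the actual implementation may differ on both points.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, then cools down."""

    def __init__(self, failure_threshold=5, open_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.clock = clock  # injectable so behaviour can be checked without sleeping
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow_request(self):
        """Return True if a request may be sent to the provider right now."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.open_seconds:
            # Cooldown elapsed: close the breaker and reset the failure count.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self):
        # Assumption: any success resets the consecutive-failure count.
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

A caller would check allow_request() before each provider call and report the outcome with record_success() or record_failure(). With the production values above, the fifth consecutive failure blocks requests for 30 seconds.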
Individual policies
Configure each policy independently with create_llm_client():
- Retry
- Circuit breaker
- Rate limiting
- Caching
- Timeout
Retry transient LLM failures with exponential backoff.
| Parameter | Default | Description |
|---|---|---|
| max_retries | 3 | Retry attempts after the initial request |
| backoff_base_s | 0.5 | Initial delay in seconds |
| backoff_jitter_s | 0.15 | Jitter added to retry delays |
| require_idempotency_key | True | Require idempotency keys for retried requests |
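The sketch below shows how the parameters in the table could combine into a retry loop. It is an illustration under stated assumptions, not the runtime's implementation: the RetryPolicy dataclass, retry_delay, and call_with_retry are hypothetical names, the delay is assumed to double on each attempt, the jitter is assumed to be a uniform value between 0 and backoff_jitter_s, and require_idempotency_key is not modeled.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    # Field names and defaults mirror the table; the class itself is hypothetical.
    max_retries: int = 3
    backoff_base_s: float = 0.5
    backoff_jitter_s: float = 0.15


def retry_delay(policy, attempt, rng=random.random):
    """Delay before retry number `attempt` (0-based).

    Assumption: the base delay doubles each attempt and jitter is uniform.
    """
    return policy.backoff_base_s * (2 ** attempt) + rng() * policy.backoff_jitter_s


def call_with_retry(fn, policy, sleep=time.sleep,
                    retry_on=(TimeoutError, ConnectionError)):
    """Call fn(), retrying on transient errors up to policy.max_retries times."""
    for attempt in range(policy.max_retries + 1):  # initial request plus retries
        try:
            return fn()
        except retry_on:
            if attempt == policy.max_retries:
                raise  # retries exhausted: surface the last error
            sleep(retry_delay(policy, attempt))
```

With the defaults, a request that keeps failing is attempted four times in total, with waits of roughly 0.5s, 1s, and 2s (plus jitter) between attempts.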
Fallback chains
Configure a chain of models to try when the primary model fails:
Tuning cheat sheet
| Goal | Setting |
|---|---|
| Reduce costs | Lower temperature, use cheaper model, enable caching |
| Reduce latency | Enable caching, use faster model, set tight timeout |
| Handle outages | Enable retry + circuit breaker, add fallback chain |
| High throughput | Set rate limits high, use batch profile, increase concurrency |
| Consistent output | Set temperature=0.0, enable structured output |
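For the "Handle outages" row, the idea behind a fallback chain can be sketched without any library: try each model in order and return the first successful response. The function complete_with_fallback, the call_model callback, and the model names in the usage note are hypothetical placeholders, not the runtime's configuration syntax.

```python
def complete_with_fallback(prompt, models, call_model):
    """Try each model in order; return (model, response) from the first success.

    `call_model(model, prompt)` is a caller-supplied function (hypothetical here)
    that raises an exception when the provider call fails.
    """
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real client would catch only provider errors
            errors.append((model, exc))
    failed = [model for model, _ in errors]
    raise RuntimeError(f"all models failed: {failed}")
```

Calling complete_with_fallback("hi", ["primary-model", "backup-model"], call_model) returns the backup model's response whenever the primary raises, and raises RuntimeError only when every model in the chain fails.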
Next steps
Agent Integration
How agents resolve and use LLM clients.
Observability
Monitor LLM call latency, errors, and costs.