# Policy pipeline
Every LLM request passes through a chain of policies.

## Built-in profiles
Use `profile()` to apply a curated set of policies:
| Profile | Retry | Cache | Rate Limit | Circuit Breaker | Timeout |
|---|---|---|---|---|---|
| `development` | None | None | None | None | 30s |
| `production` | 3 attempts, exponential backoff | In-memory, 5 min TTL | 60 req/min | 5 failures → 30s open | 60s |
| `batch` | 5 attempts, longer backoff | None | 20 req/min | 10 failures → 60s open | 120s |
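One way such profiles could be represented is as named bundles of policy settings. A minimal sketch of a `profile()` helper returning the table above as plain dicts; the function name and dict layout are illustrative assumptions, not a documented API:

```python
# Curated policy sets mirroring the profiles table above.
# The structure of these dicts is an assumption for illustration.
PROFILES = {
    "development": {
        "retry": None, "cache": None, "rate_limit": None,
        "circuit_breaker": None, "timeout": 30,
    },
    "production": {
        "retry": {"max_attempts": 3, "backoff": "exponential"},
        "cache": {"backend": "memory", "ttl": 300},        # 5 min TTL
        "rate_limit": {"requests_per_minute": 60},
        "circuit_breaker": {"failure_threshold": 5, "open_seconds": 30},
        "timeout": 60,
    },
    "batch": {
        "retry": {"max_attempts": 5, "backoff": "exponential"},
        "cache": None,
        "rate_limit": {"requests_per_minute": 20},
        "circuit_breaker": {"failure_threshold": 10, "open_seconds": 60},
        "timeout": 120,
    },
}

def profile(name: str) -> dict:
    """Return a copy of the curated policy set for the given profile name."""
    return dict(PROFILES[name])
```

Returning a copy lets callers override individual settings (for example, a longer timeout) without mutating the shared profile.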
## Individual policies
Configure each policy independently:

- Retry
- Circuit breaker
- Rate limiting
- Caching
- Timeout
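Conceptually, each configured policy wraps the call produced by the previous one, so the policies compose into the chain described at the top. A minimal composition sketch; the names `retry`, `timeout`, and `apply_policies` are illustrative, not a documented API:

```python
def retry(max_attempts):
    """Policy factory: retry the wrapped call on transient errors."""
    def policy(call):
        def wrapped(prompt):
            last_error = None
            for _ in range(max_attempts):
                try:
                    return call(prompt)
                except RuntimeError as err:  # stand-in for a transient error
                    last_error = err
            raise last_error
        return wrapped
    return policy

def timeout(seconds):
    """Policy factory: bound the wrapped call's duration (stubbed here)."""
    def policy(call):
        def wrapped(prompt):
            # A real implementation would cancel the call after `seconds`.
            return call(prompt)
        return wrapped
    return policy

def apply_policies(call, policies):
    """Wrap `call` with each policy in order; the last policy is outermost."""
    for policy in policies:
        call = policy(call)
    return call

# Compose: retry innermost, timeout outermost.
llm_call = apply_policies(lambda p: f"echo: {p}", [retry(3), timeout(60)])
```

Ordering matters in such a chain: with the timeout outermost, it bounds the total time across all retry attempts rather than each attempt individually.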
### Retry

Retry transient LLM failures with exponential backoff.
| Parameter | Default | Description |
|---|---|---|
| `max_attempts` | 3 | Total attempts (1 initial + 2 retries) |
| `backoff_base` | 1.0 | Initial delay in seconds |
| `backoff_max` | 30.0 | Maximum delay between retries, in seconds |
| `retryable_errors` | [429, 500, 502, 503] | HTTP status codes that trigger a retry |
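These parameters map onto a standard exponential-backoff loop: the delay doubles each attempt from `backoff_base`, is capped at `backoff_max`, and only the listed status codes are retried. A self-contained sketch; the function name and the `status` attribute on errors are assumptions for illustration:

```python
import random
import time

def retry_with_backoff(call, max_attempts=3, backoff_base=1.0,
                       backoff_max=30.0,
                       retryable_errors=(429, 500, 502, 503),
                       sleep=time.sleep):
    """Retry `call` on retryable HTTP errors with exponential backoff.

    Sketch only: assumes failures raise an exception carrying a
    `status` attribute with the HTTP status code.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as err:
            status = getattr(err, "status", None)
            if status not in retryable_errors or attempt == max_attempts - 1:
                raise  # non-retryable, or out of attempts
            # Delay doubles each attempt, capped at backoff_max,
            # with jitter to avoid thundering-herd retries.
            delay = min(backoff_base * (2 ** attempt), backoff_max)
            sleep(delay * random.uniform(0.5, 1.0))
```

Passing `sleep` as a parameter keeps the function testable; production code would leave it as `time.sleep`.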
## Fallback chains
Configure a chain of models to try when the primary model fails.

## Tuning cheat sheet
| Goal | Setting |
|---|---|
| Reduce costs | Lower temperature, use cheaper model, enable caching |
| Reduce latency | Enable caching, use faster model, set tight timeout |
| Handle outages | Enable retry + circuit breaker, add fallback chain |
| High throughput | Set rate limits high, use batch profile, increase concurrency |
| Consistent output | Set temperature=0.0, enable structured output |
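The "Handle outages" row combines retry, a circuit breaker, and a fallback chain. The fallback part reduces to trying each model in order until one succeeds; a minimal sketch, where the function name and error type are illustrative assumptions:

```python
def call_with_fallbacks(prompt, models, call_model):
    """Try each model in order; return the first successful response.

    `call_model(model, prompt)` is a stand-in for the underlying
    provider call and is assumed to raise on failure.
    """
    last_error = None
    for model in models:
        try:
            return call_model(model, prompt)
        except RuntimeError as err:  # stand-in for a provider error
            last_error = err
    # Every model in the chain failed; surface the last error.
    raise last_error
```

In practice the fallback chain sits outside the retry policy, so the chain only advances to the next model after the primary's retries are exhausted.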