13: Multi-Model Fallback

What this snippet demonstrates

LLM API calls fail for many reasons: rate limits, outages, timeouts. AFK’s fallback_model_chain lets you define an ordered list of models to try when the primary model fails. This snippet shows how to configure fallback chains for resilience, cost optimization, and provider diversification.

Basic fallback chain

from afk.agents import Agent, FailSafeConfig

agent = Agent(
    name="resilient-agent",
    model="gpt-5.5",                    # Primary model
    instructions="Be helpful and thorough.",
    fail_safe=FailSafeConfig(
        # Fallback chain: try these models in order if the primary fails
        fallback_model_chain=[
            "gpt-5.5",              # First fallback: cheaper, faster
            "gpt-5.5",              # Last resort: fastest, cheapest
        ],

        # When LLM calls fail, retry then degrade
        llm_failure_policy="retry_then_degrade",

        # Cost ceiling still applies across all models
        max_total_cost_usd=1.00,
    ),
)

When the primary model fails (timeout, rate limit, outage):

1. AFK retries the primary model (controlled by the retry policy).
2. If the retries are exhausted, it falls through to the first model in fallback_model_chain.
3. If that model also fails, it tries the next model in the chain.
4. If all models fail, the llm_failure_policy determines the outcome.
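The order of attempts described above can be sketched in plain Python. This is only an illustration of the control flow, not AFK's implementation: call_with_fallback, AllModelsFailed, the call_model callback, and the model names are hypothetical, and the sketch assumes that only the primary model is retried while each fallback gets a single attempt.

```python
from typing import Callable, List, Sequence, Tuple


class AllModelsFailed(Exception):
    """Raised when the primary model and every fallback have failed."""


def call_with_fallback(
    call_model: Callable[[str, str], str],
    primary: str,
    fallback_chain: Sequence[str],
    prompt: str,
    max_retries: int = 2,
) -> Tuple[str, str]:
    """Return (model_used, response), trying the primary first, then each fallback in order."""
    errors: List[str] = []
    for index, model in enumerate([primary, *fallback_chain]):
        # Assumption: only the primary (index 0) is retried; fallbacks get one attempt.
        attempts = 1 + max_retries if index == 0 else 1
        for attempt in range(attempts):
            try:
                return model, call_model(model, prompt)
            except Exception as exc:
                errors.append(f"{model} attempt {attempt + 1}: {exc}")
    # Every model failed; a real system would apply its failure policy here.
    raise AllModelsFailed("; ".join(errors))


def flaky_call(model: str, prompt: str) -> str:
    """Hypothetical provider call: the primary always times out."""
    if model == "primary-model":
        raise TimeoutError("simulated timeout")
    return f"{model} answered: {prompt}"


model_used, text = call_with_fallback(
    flaky_call, "primary-model", ["backup-model"], "hello"
)
print(model_used)  # backup-model
```

In this sketch the primary is attempted three times (one call plus two retries) before the first fallback is tried.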

Cost-optimized fallback

Use expensive models only when needed:

from afk.agents import Agent, FailSafeConfig
from afk.core import Runner

# Start cheap; the fallback chain is used only if the call fails
simple_agent = Agent(
    name="classifier",
    model="gpt-5.5",              # Start with cheapest
    instructions="""
    Classify the support ticket. Output exactly one label:
    billing, technical, account, other.
    """,
    fail_safe=FailSafeConfig(
        fallback_model_chain=["gpt-5.5", "gpt-5.5"],
        max_total_cost_usd=0.05,
    ),
)

# Complex tasks get the big model with fallbacks
analysis_agent = Agent(
    name="analyst",
    model="gpt-5.5",                   # Start with most capable
    instructions="""
    Provide detailed technical analysis with code examples.
    Be thorough and precise.
    """,
    fail_safe=FailSafeConfig(
        fallback_model_chain=["gpt-5.5"],
        llm_failure_policy="retry_then_degrade",
        max_total_cost_usd=2.00,
    ),
)

runner = Runner()

# Simple task -> cheap model handles it
r1 = runner.run_sync(simple_agent, user_message="I can't log in")
print(f"Classification: {r1.final_text} (${r1.total_cost_usd or 0:.4f})")

# Complex task -> powerful model with safety net
r2 = runner.run_sync(analysis_agent, user_message="Analyze Python's asyncio event loop")
print(f"Analysis: {r2.final_text[:100]}... (${r2.total_cost_usd or 0:.4f})")
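Both agents above set max_total_cost_usd, and the basic example notes that the ceiling applies across all models in the chain. One way to picture a shared ceiling is a single budget object that every model call charges against. The sketch below is an assumption-laden illustration, not AFK's accounting logic: CostBudget, BudgetExceeded, the model names, and the choice to reject a charge before it is recorded are all hypothetical.

```python
from typing import Dict


class BudgetExceeded(Exception):
    """Raised when a charge would push total spend past the ceiling."""


class CostBudget:
    """One ceiling shared by every model in a fallback chain (hypothetical helper)."""

    def __init__(self, max_total_cost_usd: float) -> None:
        self.max_total_cost_usd = max_total_cost_usd
        self.spent_by_model: Dict[str, float] = {}

    @property
    def spent_usd(self) -> float:
        return sum(self.spent_by_model.values())

    def charge(self, model: str, cost_usd: float) -> None:
        # Assumption: reject the charge up front instead of recording an overrun.
        if self.spent_usd + cost_usd > self.max_total_cost_usd:
            raise BudgetExceeded(
                f"{model}: ${cost_usd:.4f} would exceed ${self.max_total_cost_usd:.2f}"
            )
        self.spent_by_model[model] = self.spent_by_model.get(model, 0.0) + cost_usd


budget = CostBudget(max_total_cost_usd=0.05)
budget.charge("primary-model", 0.03)      # spend on the primary
try:
    budget.charge("backup-model", 0.03)   # the fallback shares the same ceiling
    fallback_allowed = True
except BudgetExceeded:
    fallback_allowed = False
print(fallback_allowed)  # False
```

The point of the design is that switching models does not reset the budget: spend on a failed or partial primary attempt still counts against what the fallbacks may use.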

Circuit breaker integration

AFK’s built-in circuit breaker works with fallback chains. When a model reaches breaker_failure_threshold consecutive failures, the breaker opens and the system skips straight to the next fallback until breaker_cooldown_s has elapsed:

from afk.agents import Agent, FailSafeConfig

agent = Agent(
    name="breaker-demo",
    model="gpt-5.5",
    instructions="...",
    fail_safe=FailSafeConfig(
        fallback_model_chain=["gpt-5.5", "gpt-5.5"],

        # Circuit breaker settings
        breaker_failure_threshold=5,     # Open after 5 consecutive failures
        breaker_cooldown_s=30.0,         # Wait 30s before retrying the model

        # Failure handling
        llm_failure_policy="retry_then_degrade",
        max_total_cost_usd=1.00,
    ),
)
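To make the two breaker settings concrete, here is a minimal sketch of a per-model circuit breaker. It is not AFK's implementation: the CircuitBreaker class and its method names are hypothetical, and it omits the half-open probing state that production breakers usually include. The clock is injectable so the cooldown can be demonstrated without sleeping.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after N consecutive failures and closes again after a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 5,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Return True if the model may be called, False if it should be skipped."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_s:
            # Cooldown elapsed: close the breaker and let the model be tried again.
            self._opened_at = None
            self._consecutive_failures = 0
            return True
        return False

    def record_success(self) -> None:
        self._consecutive_failures = 0

    def record_failure(self) -> None:
        self._consecutive_failures += 1
        if self._consecutive_failures >= self.failure_threshold:
            self._opened_at = self._clock()


# Fake clock so the example runs instantly.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, cooldown_s=30.0, clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
open_during_cooldown = not breaker.allow_request()
now[0] = 31.0
closed_after_cooldown = breaker.allow_request()
print(open_during_cooldown, closed_after_cooldown)  # True True
```

A fallback loop would call allow_request() before each model and move to the next entry in the chain whenever it returns False.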

Multi-agent with different model tiers

Use different model tiers for different specialists:

from afk.agents import Agent, FailSafeConfig

# Cheap model for simple classification
router = Agent(
    name="router",
    model="gpt-5.5",
    instructions="Route to the correct specialist.",
    fail_safe=FailSafeConfig(fallback_model_chain=["gpt-5.5"]),
    subagents=[
        # Powerful model for complex analysis
        Agent(
            name="analyst",
            model="gpt-5.5",
            instructions="Provide deep technical analysis.",
            fail_safe=FailSafeConfig(
                fallback_model_chain=["gpt-5.5"],
                max_total_cost_usd=1.00,
            ),
        ),
        # Mid-tier model for summarization
        Agent(
            name="summarizer",
            model="gpt-5.5",
            instructions="Summarize findings concisely.",
            fail_safe=FailSafeConfig(
                fallback_model_chain=["gpt-5.5"],
                max_total_cost_usd=0.25,
            ),
        ),
    ],
)

Inspecting which model was used

After a run, check the result metadata and usage aggregate:

# Assumes `runner` and `agent` are defined as in the earlier snippets
result = runner.run_sync(agent, user_message="Analyze this...")

print(f"State: {result.state}")
print(f"Requested model: {result.requested_model}")
print(f"Normalized model: {result.normalized_model}")
print(f"Provider: {result.provider_adapter}")
print(f"Total tokens: {result.usage_aggregate.total_tokens}")
print(f"Total cost: ${result.total_cost_usd or 0:.4f}")

Recommendations

Scenario	Primary Model	Fallback Chain
Classification	`gpt-5.5`	`gpt-5.5`
General chat	`gpt-5.5`	`gpt-5.5`
Complex analysis	`gpt-5.5`	`gpt-5.5` → `gpt-5.5`
Code generation	`gpt-5.5`	`gpt-5.5`
Cost-sensitive batch	`gpt-5.5`	(none)
