This page explains how agents connect to the LLM layer — from model resolution to request construction, streaming, and error handling.

Model resolution

Agents can specify their model in two ways. The simplest is to pass a model name string, which AFK resolves to an LLM client using the default provider:

```python
agent = Agent(name="demo", model="gpt-5.2-mini", ...)
```
The resolution order:
  1. Check agent.model_resolver (custom function)
  2. Check registered adapters for matching provider prefix
  3. Default to OpenAI adapter
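
The resolution order above can be sketched as a small function. This is illustrative only: `resolve_model`, `ADAPTERS`, and the tuple return values are assumptions, not AFK's actual API.

```python
from typing import Callable, Optional

# Hypothetical registry of adapters keyed by provider prefix.
ADAPTERS = {
    "claude": lambda m: ("anthropic", m),
    "gpt": lambda m: ("openai", m),
}

def resolve_model(model: str, model_resolver: Optional[Callable] = None):
    # 1. A custom resolver on the agent wins outright.
    if model_resolver is not None:
        return model_resolver(model)
    # 2. Otherwise match a registered adapter by provider prefix.
    for prefix, factory in ADAPTERS.items():
        if model.startswith(prefix):
            return factory(model)
    # 3. Fall back to the default OpenAI adapter.
    return ("openai", model)
```

A custom `model_resolver` short-circuits the other two steps, which is why it sits first in the order.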

How the runner uses the LLM

On each step of the agent loop, the runner constructs an LLM request from the agent's configuration and the conversation so far, sends it to the resolved client, and processes the response: text is streamed back to the caller, while tool calls and subagent transfers are executed before the next step.

Request construction

The runner builds an LLMRequest from multiple sources:
| Source | Contributes | Priority |
| --- | --- | --- |
| Agent.instructions | System message | Highest |
| Thread history | Previous messages | |
| user_message | User message | |
| Agent.tools | Tool schemas | |
| Agent.subagents | Transfer tool schemas | |
| RunnerConfig | Temperature, max_tokens | Lowest |
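
A minimal sketch of how those sources could be merged into one request, assuming hypothetical field names and helper (`LLMRequest`'s real fields, `build_request`, and `transfer_schema` are not AFK's actual API):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LLMRequest:
    messages: list = field(default_factory=list)
    tools: list = field(default_factory=list)
    temperature: float = 1.0
    max_tokens: Optional[int] = None

def transfer_schema(subagent: str) -> dict:
    # Transfer tools let the LLM hand off the conversation to a subagent.
    return {"type": "function", "name": f"transfer_to_{subagent}"}

def build_request(agent: dict, thread: list, user_message: str, config: dict) -> LLMRequest:
    messages = [{"role": "system", "content": agent["instructions"]}]  # highest priority
    messages += thread                                                 # previous messages
    messages.append({"role": "user", "content": user_message})
    tools = list(agent["tools"]) + [transfer_schema(s) for s in agent["subagents"]]
    return LLMRequest(messages=messages, tools=tools,
                      temperature=config.get("temperature", 1.0),
                      max_tokens=config.get("max_tokens"))
```

The priority column maps to message ordering: instructions become the system message at the front, and the user message lands last.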

Streaming integration

When using run_stream(), the runner passes through streaming events from the LLM:
```python
handle = await runner.run_stream(agent, user_message="Explain DNS")

async for event in handle:
    match event.type:
        case "text_delta":
            # ← Comes from LLM streaming
            print(event.text_delta, end="")
        case "tool_started":
            # ← Comes from the runner
            print(f"\n[TOOL] {event.tool_name}")
```
text_delta events come from two paths:
  • Provider streaming when the adapter supports stream deltas.
  • Runner fallback chunking of final text for non-streaming providers.
Tool and step events are generated by the runner.
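
The fallback path can be pictured as slicing the provider's final text into delta-sized pieces, so consumers see the same stream shape either way. The helper name and chunk size here are assumptions, not AFK's actual values:

```python
def chunk_text(text: str, size: int = 16):
    """Emit text_delta events from a complete response, for providers
    that return the whole text at once instead of streaming it."""
    for i in range(0, len(text), size):
        yield {"type": "text_delta", "text_delta": text[i:i + size]}
```

Because both paths emit the same event shape, consumer code like the loop above does not need to know which provider is behind the agent.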

Error handling

LLM errors are classified and handled automatically:
| Error | Classification | Runner behavior |
| --- | --- | --- |
| Rate limit (429) | Retryable | Retry with backoff |
| Server error (500, 502, 503) | Retryable | Retry with backoff |
| Auth error (401, 403) | Terminal | Fail the run |
| Invalid request (400) | Terminal | Fail the run |
| Timeout | Retryable | Retry (if attempts remain) |
| Circuit breaker open | Terminal | Try fallback model or fail |
| All retries exhausted | Terminal | Try fallback model or fail |
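
The classify-then-retry logic can be sketched as follows; `classify` and `call_with_backoff` are illustrative names, not AFK's actual internals:

```python
import time

RETRYABLE_STATUSES = {429, 500, 502, 503}

def classify(status: int) -> str:
    # Rate limits and server errors are retryable; auth (401/403) and
    # invalid-request (400) errors are terminal.
    return "retryable" if status in RETRYABLE_STATUSES else "terminal"

def call_with_backoff(send, max_attempts: int = 3, base_delay: float = 0.0):
    for attempt in range(max_attempts):
        status, body = send()
        if status < 400:
            return body
        if classify(status) == "terminal":
            raise RuntimeError(f"terminal LLM error {status}")
        time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    # Exhausting retries is terminal too; a caller could try a fallback model here.
    raise RuntimeError("all retries exhausted")
```

Terminal errors fail fast rather than burning retry attempts on a request that cannot succeed.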

Model selection guide

| Task | Recommended model | Why |
| --- | --- | --- |
| Simple Q&A, classification | gpt-5.2-nano | Fast, cheap, good enough |
| General purpose with tools | gpt-5.2-mini | Best balance of cost and capability |
| Complex reasoning, coding | gpt-5.2 or claude-opus-4-5 | Better at multi-step reasoning |
| Cost-sensitive batch | gpt-5.2-nano | Lowest cost per token |
| Maximum quality | gpt-5.2 + temperature=0.0 | Deterministic, highest quality |
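
The guide above collapses to a simple lookup. The task labels and helper name here are illustrative, not part of AFK:

```python
MODEL_BY_TASK = {
    "simple_qa": "gpt-5.2-nano",
    "general_tools": "gpt-5.2-mini",
    "complex_reasoning": "gpt-5.2",
    "cost_sensitive_batch": "gpt-5.2-nano",
    "max_quality": "gpt-5.2",
}

def pick_model(task: str) -> str:
    # Unknown tasks default to the balanced general-purpose model.
    return MODEL_BY_TASK.get(task, "gpt-5.2-mini")
```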

Next steps