The LLM layer normalizes communication with language models across all supported providers. Your agent code uses provider-agnostic contracts (LLMRequest / LLMResponse) while built-in adapters handle the provider-specific details.

The LLMBuilder

Create LLM clients with the builder pattern:
from afk.llms import LLMBuilder

client = (
    LLMBuilder()
    .provider("openai")
    .model("gpt-4.1-mini")
    .build()
)

1. Choose a provider

builder = LLMBuilder().provider("openai")
# Also: "anthropic", "litellm", or a custom adapter

2. Set the model

builder = builder.model("gpt-4.1-mini")

3. Add policies (optional)

builder = builder.profile("production")
# retry, timeout, rate limit, circuit breaker

4. Build

client = builder.build()
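
The four steps follow the usual fluent-builder shape: each setter records one choice and hands back a builder, and nothing is constructed until build() is called. The sketch below illustrates that pattern in isolation. `SketchBuilder` and `SketchClient` are hypothetical stand-ins written for this example, and the immutable design is an assumption; none of it describes how LLMBuilder is implemented.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchClient:
    """Hypothetical stand-in for a built client."""
    provider: str
    model: str
    profile: Optional[str] = None


@dataclass(frozen=True)
class SketchBuilder:
    """Hypothetical immutable fluent builder (not the real LLMBuilder)."""
    _provider: Optional[str] = None
    _model: Optional[str] = None
    _profile: Optional[str] = None

    def provider(self, name: str) -> "SketchBuilder":
        # Each setter returns a new builder, so a partial builder can be reused.
        return replace(self, _provider=name)

    def model(self, name: str) -> "SketchBuilder":
        return replace(self, _model=name)

    def profile(self, name: str) -> "SketchBuilder":
        return replace(self, _profile=name)

    def build(self) -> SketchClient:
        # Validation is deferred to build(), once every choice is known.
        if self._provider is None or self._model is None:
            raise ValueError("provider and model are required")
        return SketchClient(self._provider, self._model, self._profile)
```

With a builder of this kind, a shared base such as `SketchBuilder().provider("openai")` can be extended into several differently configured clients without the variants affecting each other.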

Middleware

The LLM layer supports middleware for intercepting and transforming requests and responses. Use middleware for logging, tracing, caching, and custom request/response handling.

Built-in middleware

from afk.llms import LLMBuilder
from afk.llms.middleware import MiddlewareStack
from afk.llms.middleware.timeout import (
    TimeoutMiddleware,
    EmbedTimeoutMiddleware,
    StreamTimeoutMiddleware,
    TimeoutConfig,
)

config = TimeoutConfig(
    default_timeout_s=30.0,
    chat_timeout_s=60.0,
    embed_timeout_s=15.0,
    stream_timeout_s=45.0,
)

stack = MiddlewareStack(
    chat=[TimeoutMiddleware(config)],
    embed=[EmbedTimeoutMiddleware(config)],
    stream=[StreamTimeoutMiddleware(config)],
)

client = (
    LLMBuilder()
    .provider("openai")
    .model("gpt-4.1-mini")
    .with_middlewares(stack)
    .build()
)
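
To make the per-operation timeouts more concrete, here is a rough, stdlib-only sketch of what a chat timeout middleware could look like, written against the chat middleware signature documented on this page. It assumes `asyncio.wait_for` as the mechanism. `make_timeout_middleware` and `demo_timeout` are hypothetical names, requests are plain strings, and this is not the library's TimeoutMiddleware.

```python
import asyncio
from typing import Any, Awaitable, Callable

CallNext = Callable[[Any], Awaitable[Any]]


def make_timeout_middleware(timeout_s: float):
    """Hypothetical factory: returns a chat-style middleware with a deadline."""

    async def timeout_middleware(call_next: CallNext, req: Any) -> Any:
        # wait_for cancels the inner call and raises asyncio.TimeoutError on expiry.
        return await asyncio.wait_for(call_next(req), timeout=timeout_s)

    return timeout_middleware


async def demo_timeout() -> tuple:
    async def fast(req: Any) -> str:
        return f"ok:{req}"

    async def slow(req: Any) -> str:
        await asyncio.sleep(1.0)  # longer than the 0.05 s deadline below
        return "too late"

    mw = make_timeout_middleware(0.05)
    first = await mw(fast, "ping")
    try:
        await mw(slow, "ping")
        second = "completed"
    except asyncio.TimeoutError:
        second = "timed out"
    return first, second
```

Running `asyncio.run(demo_timeout())` exercises both the fast path and the expired deadline without contacting any provider.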

Custom middleware

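import uuid

from afk.llms import LLMBuilder, LLMRequest, LLMResponse
from afk.llms.middleware import MiddlewareStack

def generate_trace_id() -> str:
    """Illustrative helper; swap in your tracing system's ID source."""
    return uuid.uuid4().hex

async def tracing_middleware(call_next, req: LLMRequest) -> LLMResponse: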
    """Add tracing metadata to requests."""
    req.metadata = req.metadata or {}
    req.metadata["trace_id"] = generate_trace_id()
    req.metadata["span_name"] = "llm.chat"
    return await call_next(req)

client = (
    LLMBuilder()
    .provider("openai")
    .model("gpt-4.1-mini")
    .with_middlewares(MiddlewareStack(
        chat=[tracing_middleware],
        embed=[],
        stream=[],
    ))
    .build()
)
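
Because a chat middleware is an async callable taking `(call_next, req)`, it can also act after the downstream call returns. As an illustrative sketch, the middleware below logs call latency. The `timing_middleware` name and the logger name are assumptions, and the request and response are typed as `Any` so the snippet runs without a provider installed.

```python
import asyncio
import logging
import time
from typing import Any, Awaitable, Callable

logger = logging.getLogger("llm.timing")  # hypothetical logger name


async def timing_middleware(
    call_next: Callable[[Any], Awaitable[Any]], req: Any
) -> Any:
    """Log how long the downstream call took, even when it raises."""
    start = time.perf_counter()
    try:
        return await call_next(req)
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("llm.chat finished in %.1f ms", elapsed_ms)


async def echo(req: Any) -> dict:
    # Stand-in for the rest of the chain; a real call would reach the provider.
    return {"echo": req}
```

It would presumably be registered like the tracing example, for instance `chat=[tracing_middleware, timing_middleware]`. This page does not state the order in which a stack applies its entries, so check that before relying on nesting.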

Middleware protocols

| Protocol | Operation | Signature |
| --- | --- | --- |
| LLMChatMiddleware | Non-streaming chat | async (call_next, req: LLMRequest) -> LLMResponse |
| LLMEmbedMiddleware | Embeddings | async (call_next, req: EmbeddingRequest) -> EmbeddingResponse |
| LLMStreamMiddleware | Streaming chat | (call_next, req: LLMRequest) -> AsyncIterator[LLMStreamEvent] |
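
The streaming protocol differs from the other two in that it returns an async iterator instead of a single awaited value. One way to satisfy that shape is an async generator, sketched below on the assumption that `call_next(req)` itself returns an async iterator of events. `counting_stream_middleware`, `fake_stream`, and `collect` are hypothetical names, and events are plain strings here in place of LLMStreamEvent.

```python
import asyncio
from typing import Any, AsyncIterator, Callable

stats = {"events": 0}  # illustrative counter, not part of any library


async def counting_stream_middleware(
    call_next: Callable[[Any], AsyncIterator[Any]], req: Any
) -> AsyncIterator[Any]:
    """Pass events through unchanged while counting them."""
    async for event in call_next(req):
        stats["events"] += 1
        yield event


async def fake_stream(req: Any) -> AsyncIterator[str]:
    # Stand-in for the downstream provider stream.
    for token in ("Hel", "lo", "!"):
        yield token


async def collect() -> list:
    # Drain the wrapped stream into a list for inspection.
    return [e async for e in counting_stream_middleware(fake_stream, "hi")]
```

Calling the async generator function returns the iterator immediately, which matches the non-async signature shown for streaming.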

Supported providers

OpenAI

GPT-4.1, GPT-4.1-mini, GPT-4.1-nano, o-series

Anthropic

Claude Opus and Sonnet families

LiteLLM

100+ providers via the LiteLLM proxy

All providers expose the same LLMClient interface. Your agent code never touches provider-specific types.
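
The idea of one interface over many providers can be pictured with a structural type. The sketch below is purely illustrative: `SketchRequest`, `SketchResponse`, `ChatClient`, both fake adapters, and `ask` are hypothetical names invented here, not the AFK types or their fields.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass
class SketchRequest:
    prompt: str


@dataclass
class SketchResponse:
    text: str
    provider: str


class ChatClient(Protocol):
    """The only surface the calling code depends on."""

    async def chat(self, req: SketchRequest) -> SketchResponse: ...


class FakeOpenAIAdapter:
    async def chat(self, req: SketchRequest) -> SketchResponse:
        # A real adapter would translate req into the provider's wire format here.
        return SketchResponse(text=f"echo: {req.prompt}", provider="openai")


class FakeAnthropicAdapter:
    async def chat(self, req: SketchRequest) -> SketchResponse:
        return SketchResponse(text=f"echo: {req.prompt}", provider="anthropic")


async def ask(client: ChatClient, prompt: str) -> SketchResponse:
    # Caller code is identical whichever adapter is passed in.
    return await client.chat(SketchRequest(prompt))
```

Swapping `FakeOpenAIAdapter()` for `FakeAnthropicAdapter()` changes nothing in `ask`, which is the property the shared interface is meant to provide.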

Example model choices

Use these as starting points, then verify model availability, pricing, and context limits with your provider account.

| Scenario | Starting point |
| --- | --- |
| General purpose | OpenAI gpt-4.1-mini |
| Complex reasoning | OpenAI gpt-4.1 or Anthropic claude-opus-4-5 |
| Cost-sensitive | OpenAI gpt-4.1-nano |
| Non-OpenAI/Anthropic model | LiteLLM adapter |
| Custom or self-hosted | Custom adapter |

How agents use the LLM layer

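You rarely need to build an LLMClient yourself. Given a model name, an agent resolves its client automatically; pass a pre-built client only when you need full control over its configuration: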
# Option 1: Model name (auto-resolved)
agent = Agent(name="demo", model="gpt-4.1-mini", instructions="Answer directly.")

# Option 2: Pre-built client (full control)
client = LLMBuilder().provider("openai").model("gpt-4.1-mini").profile("production").build()
agent = Agent(name="demo", model=client, instructions="Answer directly.")

Next steps

Contracts

LLMRequest / LLMResponse — what flows across the boundary.

Adapters

Built-in providers and custom adapter registration.