This guide covers deploying AFK agents to production environments, from single-container setups to distributed, multi-worker deployments.
Docker deployment
Basic Dockerfile
FROM python:3.13-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Run your application entrypoint that creates AFK agents/runners
CMD ["python", "-m", "your_app.server"]
Production Dockerfile with multi-stage build
FROM python:3.13-slim AS builder
WORKDIR /app
RUN pip install --upgrade pip
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt
# Production image
FROM python:3.13-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
# Run as non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser
CMD ["python", "-m", "your_app.server"]
docker-compose.yml
version: '3.8'
services:
  agent:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - AFK_MEMORY_BACKEND=redis
      - AFK_REDIS_URL=redis://redis:6379
      - AFK_QUEUE_BACKEND=redis
      - AFK_QUEUE_REDIS_URL=redis://redis:6379
    depends_on:
      - redis
    restart: unless-stopped
    healthcheck:
      # python:3.13-slim does not ship curl, so probe with the interpreter instead
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    restart: unless-stopped
volumes:
  redis_data:
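The short form of depends_on only orders container start-up; it does not wait for Redis to accept connections. If your entrypoint connects to Redis immediately, a small wait loop can avoid a crash-and-restart cycle on first boot. The sketch below is one way to do that with the standard library; `wait_for_port` is a hypothetical helper, not part of AFK, and the host and port are taken from the compose file above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0,
                  interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError while the port is closed or unreachable.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Calling `wait_for_port("redis", 6379)` before creating agents or workers would cover the compose setup above. A successful TCP connection only shows that the port is open, not that Redis is ready to serve every command, so treat it as a coarse gate.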
Environment configuration
Required environment variables
# LLM Provider
OPENAI_API_KEY=sk-...  # Required for OpenAI
# or
ANTHROPIC_API_KEY=sk-ant-...  # For Anthropic
# Memory backend
AFK_MEMORY_BACKEND=postgres  # Options: memory, sqlite, redis, postgres
AFK_SQLITE_PATH=./data/memory.sqlite3
AFK_REDIS_URL=redis://localhost:6379
AFK_PG_DSN=postgresql://user:pass@host/db
AFK_VECTOR_DIM=1536  # Required for Postgres vector search
# Queue backend
AFK_QUEUE_BACKEND=redis
AFK_QUEUE_REDIS_URL=redis://localhost:6379
# Observability
AFK_TELEMETRY=otel  # Options: console, json, otel, none
OTEL_EXPORTER_OTLP_ENDPOINT=http://telemetry:4317
# Server mode
AFK_SERVER_PORT=8000
AFK_SERVER_WORKERS=4
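A missing variable is easier to diagnose at start-up than in the middle of a run. The sketch below checks the variables listed above before any agent is created. It is an illustration, not AFK behavior: `missing_settings` is a hypothetical helper, the mapping from backend to connection variable is inferred from the variable names, treating AFK_VECTOR_DIM as mandatory for postgres is an assumption, and so is the `memory` default.

```python
from typing import Mapping

# Assumed mapping: which variables each memory backend needs (inferred from names).
BACKEND_REQUIREMENTS = {
    "memory": [],
    "sqlite": ["AFK_SQLITE_PATH"],
    "redis": ["AFK_REDIS_URL"],
    "postgres": ["AFK_PG_DSN", "AFK_VECTOR_DIM"],
}


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return a list of unset or invalid settings; empty means the env looks complete."""
    problems = []
    # At least one LLM provider key must be present.
    if not (env.get("OPENAI_API_KEY") or env.get("ANTHROPIC_API_KEY")):
        problems.append("OPENAI_API_KEY or ANTHROPIC_API_KEY")
    backend = env.get("AFK_MEMORY_BACKEND", "memory")  # default is an assumption
    if backend not in BACKEND_REQUIREMENTS:
        problems.append(f"AFK_MEMORY_BACKEND (unknown value {backend!r})")
    else:
        problems += [name for name in BACKEND_REQUIREMENTS[backend] if not env.get(name)]
    # The Redis queue needs its own URL.
    if env.get("AFK_QUEUE_BACKEND") == "redis" and not env.get("AFK_QUEUE_REDIS_URL"):
        problems.append("AFK_QUEUE_REDIS_URL")
    return problems
```

In an entrypoint you might call `missing_settings(os.environ)` and exit with a clear message if the list is non-empty.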
Production configuration file
Create config/production.yaml:
agent:
  default_model: gpt-4.1-mini
  default_fail_safe:
    max_steps: 20
    max_tool_calls: 10
    max_total_cost_usd: 1.00
    max_wall_time_s: 120
llm:
  provider: openai
  profile: production
memory:
  backend: postgres
  postgres_dsn: ${AFK_PG_DSN}
queue:
  backend: redis
  redis_url: ${AFK_QUEUE_REDIS_URL}
  max_concurrency: 10
telemetry:
  exporter: otel
  service_name: afk-agent
  export_interval_ms: 5000
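The file above uses `${AFK_PG_DSN}` and `${AFK_QUEUE_REDIS_URL}` placeholders. This page does not say how they are resolved, so if your own loader reads the file, something has to substitute them before parsing. A minimal sketch of that substitution, assuming you want an unset variable to fail loudly instead of leaving a literal `${...}` in the DSN (`expand_placeholders` is a hypothetical helper):

```python
import re
from typing import Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(text: str, env: Mapping[str, str]) -> str:
    """Replace each ${NAME} with env[NAME]; raise KeyError if any NAME is unset."""
    missing = sorted({name for name in _PLACEHOLDER.findall(text) if name not in env})
    if missing:
        raise KeyError(f"unset variables in config: {', '.join(missing)}")
    return _PLACEHOLDER.sub(lambda match: env[match.group(1)], text)
```

You would run this over the raw file contents with `os.environ` as `env`, then hand the result to your YAML parser.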
Scaling patterns
Horizontal scaling with workers
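import asyncio

from afk.core import Runner
from afk.queues import RUNNER_CHAT_CONTRACT, InMemoryTaskQueue, TaskWorker

async def main() -> None:
    # `agent` is an AFK agent defined elsewhere in your application.
    # The in-memory queue lives inside one process; workers running in separate
    # containers need a shared backend such as the Redis queue configured above.
    queue = InMemoryTaskQueue()
    worker = TaskWorker(
        queue=queue,
        agents={"analyzer": agent},
        runner_factory=lambda: Runner(),
        execution_contracts=[RUNNER_CHAT_CONTRACT],
    )
    await worker.start()

asyncio.run(main())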
Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: afk-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: afk-agent
  template:
    metadata:
      labels:
        app: afk-agent
    spec:
      containers:
        - name: agent
          image: your-registry/afk-agent:latest
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llm-secrets
                  key: api-key
            - name: AFK_MEMORY_BACKEND
              value: "redis"
            - name: AFK_REDIS_URL
              value: "redis://redis:6379"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
Kubernetes HPA for auto-scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: afk-agent-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: afk-agent
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: queue_depth
        target:
          type: AverageValue
          averageValue: "10"
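To see how these two metrics interact, it helps to work through the autoscaler's core rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), computed per metric, with the largest proposal winning. The sketch below applies that rule with the minReplicas, maxReplicas, and targets from the manifest above. It is a simplification: it models only the default 10% tolerance band and ignores stabilization windows and pod readiness handling. The function name and sample values are illustrative. Note also that a Pods metric such as queue_depth is only available to the autoscaler if a custom metrics adapter exposes it.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     min_replicas: int = 2, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count one metric proposes, clamped to the HPA bounds."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: keep the current size.
        proposal = current_replicas
    else:
        proposal = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, proposal))


# Three replicas at 90% average CPU (target 70) and 25 queued tasks per pod (target 10).
cpu_proposal = desired_replicas(3, 90, 70)
queue_proposal = desired_replicas(3, 25, 10)
# With several metrics configured, the largest proposal is used.
final_replicas = max(cpu_proposal, queue_proposal)
```

In this scenario the queue metric drives the decision: CPU alone would add one pod, while the backlog calls for eight.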
Health checks
Implement health endpoints in your server:
from fastapi import FastAPI, HTTPException

from afk.core import Runner
from afk.memory import InMemoryMemoryStore

app = FastAPI()
runner = Runner()
memory_store = InMemoryMemoryStore()

@app.get("/health")
async def health():
    # Liveness: the process is up and able to serve requests.
    return {"status": "healthy"}

@app.get("/ready")
async def ready():
    # Readiness: dependencies are reachable; respond 503 otherwise.
    try:
        await memory_store.health_check()
        return {"status": "ready", "memory": "ok"}
    except Exception as e:
        raise HTTPException(status_code=503, detail=str(e))
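A readiness handler that awaits a dependency without a time limit can hang for as long as the dependency does. Kubernetes probes time out after 1 second unless timeoutSeconds is set, so a slow check would be reported as a failed probe with no detail. One option is to bound each check and report which one failed. The sketch below uses only asyncio; `run_checks` is a hypothetical helper, and the check callables stand in for calls such as `memory_store.health_check`.

```python
import asyncio
from typing import Awaitable, Callable

Check = Callable[[], Awaitable[None]]


async def _run_one(name: str, check: Check, timeout_s: float) -> tuple[str, str]:
    """Run a single check and map its outcome to a short status string."""
    try:
        await asyncio.wait_for(check(), timeout=timeout_s)
        return name, "ok"
    except asyncio.TimeoutError:
        return name, "timeout"
    except Exception as exc:
        return name, f"error: {exc}"


async def run_checks(checks: dict[str, Check],
                     timeout_s: float = 0.5) -> tuple[bool, dict[str, str]]:
    """Run all checks concurrently; return (all_ok, per-check status)."""
    pairs = await asyncio.gather(
        *(_run_one(name, check, timeout_s) for name, check in checks.items())
    )
    results = dict(pairs)
    return all(status == "ok" for status in results.values()), results
```

Inside a /ready handler you could call `run_checks({"memory": memory_store.health_check})` and return a 503 with the status dictionary as the detail when the first element is False. Because the checks run concurrently, the total wait stays close to `timeout_s` regardless of how many dependencies you add.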
Database schema
SQLite (development)
SQLite requires no schema setup; tables are created automatically on first use.
PostgreSQL
-- Run these for production PostgreSQL deployments
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE afk_events (
    id TEXT PRIMARY KEY,
    thread_id TEXT NOT NULL,
    run_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    role TEXT,
    content TEXT,
    metadata JSONB,
    created_at TIMESTAMPTZ NOT NULL
);
-- PostgreSQL does not accept inline INDEX clauses; create the indexes separately
CREATE INDEX idx_thread_id ON afk_events (thread_id);
CREATE INDEX idx_run_id ON afk_events (run_id);
CREATE INDEX idx_created_at ON afk_events (created_at);
CREATE TABLE afk_checkpoints (
    id TEXT PRIMARY KEY,
    run_id TEXT NOT NULL,
    thread_id TEXT NOT NULL,
    step INTEGER NOT NULL,
    state JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    UNIQUE (run_id, step)
);
CREATE TABLE afk_long_term_memory (
    id TEXT PRIMARY KEY,
    user_id TEXT,
    scope TEXT,
    data JSONB,
    text TEXT,
    embedding VECTOR(1536),
    tags TEXT[],
    metadata JSONB,
    created_at TIMESTAMPTZ NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL
);
-- Vector similarity search
CREATE INDEX ON afk_long_term_memory
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
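The value `lists = 100` is a tuning parameter, not a constant. The pgvector README suggests, as a starting point, rows / 1000 for tables up to one million rows and sqrt(rows) beyond that, with roughly sqrt(lists) probes at query time. The helpers below encode that rule of thumb so you can see where 100 falls; the function names are illustrative, and the results are starting points to measure against your own recall and latency, not guarantees.

```python
import math


def suggested_lists(row_count: int) -> int:
    """Starting value for ivfflat `lists`, following the pgvector README guidance."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return math.isqrt(row_count)


def suggested_probes(lists: int) -> int:
    """Starting value for `ivfflat.probes` at query time: about sqrt(lists)."""
    return max(1, math.isqrt(lists))
```

By this rule, `lists = 100` corresponds to a table of about 100,000 rows, with around 10 probes per query. The same README also recommends creating an ivfflat index after the table already holds data, since the lists are derived from the rows present at build time.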
Security checklist
Secrets management
Store API keys in secrets managers (AWS Secrets Manager, HashiCorp Vault, Kubernetes Secrets). Never commit keys to version control.
Network policies
Restrict traffic between services. Agents should only reach LLM providers and necessary databases.
Rate limiting
Configure rate limits on public endpoints to prevent abuse.
Cost limits
Always set max_total_cost_usd in FailSafeConfig for production agents.
Audit logging
Enable telemetry export to your logging infrastructure for compliance.
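For the rate limiting item, a token bucket is one common approach: each client gets a budget that refills at a fixed rate and allows short bursts up to a cap. The sketch below is a generic, in-process illustration and not an AFK feature; `TokenBucket` and its parameters are hypothetical. With several replicas behind a load balancer, each process would keep its own buckets, so a shared store or a gateway-level limit is usually the better fit.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate_per_s`."""

    def __init__(self, rate_per_s: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller should be throttled."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate_per_s)
        self._updated = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A typical use is one bucket per API key or client IP, returning HTTP 429 when `allow()` is False. The injectable `clock` exists so the behavior can be tested without sleeping.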
Monitoring
Key metrics to track:
| Metric | What it indicates | Alert threshold |
| --- | --- | --- |
| agent.run.duration | How long runs take | > 60s p95 |
| agent.run.cost | Token spend per run | > $0.50 per run |
| agent.run.failures | Failed runs | > 5% error rate |
| llm.latency | LLM response time | > 10s p95 |
| llm.errors | LLM API errors | > 1% error rate |
| queue.depth | Pending tasks | > 100 items |
| queue.dead_letters | Failed tasks | > 0 |
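Several of these thresholds are stated as p95 values or rates, so they have to be computed over a window of samples and not per event. In practice your monitoring backend does this. The sketch below only illustrates the arithmetic for four of the rows, using the nearest-rank definition of p95; the function names and input shapes are hypothetical.

```python
def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def breached_alerts(run_durations_s: list[float], runs_total: int, runs_failed: int,
                    queue_depth: int, dead_letters: int) -> list[str]:
    """Return the metric names whose thresholds from the table are exceeded."""
    alerts = []
    if p95(run_durations_s) > 60:
        alerts.append("agent.run.duration")
    if runs_total and runs_failed / runs_total > 0.05:
        alerts.append("agent.run.failures")
    if queue_depth > 100:
        alerts.append("queue.depth")
    if dead_letters > 0:
        alerts.append("queue.dead_letters")
    return alerts
```

Note how p95 tolerates a single slow outlier: with 20 runs, one 90-second run does not trip the duration alert, but two do.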
Next steps
Observability: Set up telemetry and alerting for production monitoring.
Security Model: Security hardening checklist and best practices.
Evals: CI-gated quality checks for agent releases.
Building with AI: Production patterns and anti-patterns.