Start narrow, iterate fast
The most common mistake is building for complexity you don’t have yet. Start with the simplest version that solves the problem, then add capabilities based on real evidence.

> [!TIP]
> Test at every step. Don’t add tools before the base prompt works. Don’t add subagents before single-agent tools work. Each layer should be proven before adding the next.
Common patterns
- Classifier agent
- RAG agent
- Coding agent
- Coordinator pattern
Classifier agent: categorize input into predefined labels. No tools needed.

Tips:
- Constrain output format explicitly in the prompt
- Test with a diverse set of inputs
- Add evals for each category
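
The three tips above can be sketched in a few lines. This is a minimal illustration, not the framework's API: the label set, `build_prompt`, `classify`, `run_evals`, and the `call_model` callable are all hypothetical names, and `fake_model` is a keyword stub standing in for a real model call.

```python
from typing import Callable, Dict, Tuple

LABELS = ("billing", "technical", "other")  # hypothetical label set


def build_prompt(text: str) -> str:
    """Constrain the output format explicitly: one label, nothing else."""
    return (
        "Classify the message into exactly one category.\n"
        f"Categories: {', '.join(LABELS)}\n"
        "Output only the category name.\n\n"
        f"Message: {text}"
    )


def classify(text: str, call_model: Callable[[str], str]) -> str:
    """Call the model and reject any output outside the label set."""
    raw = call_model(build_prompt(text)).strip().lower()
    if raw not in LABELS:
        raise ValueError(f"unexpected label: {raw!r}")
    return raw


# One case per category here; a real suite needs more diverse inputs.
EVAL_CASES = [
    ("I was charged twice this month", "billing"),
    ("The app crashes when I log in", "technical"),
    ("What are your office hours?", "other"),
]


def run_evals(call_model: Callable[[str], str]) -> Dict[str, Tuple[int, int]]:
    """Return (passed, total) per category so a weak category is visible."""
    counts = {label: [0, 0] for label in LABELS}
    for text, expected in EVAL_CASES:
        counts[expected][1] += 1
        try:
            if classify(text, call_model) == expected:
                counts[expected][0] += 1
        except ValueError:
            pass  # malformed output counts as a failure
    return {label: (passed, total) for label, (passed, total) in counts.items()}


def fake_model(prompt: str) -> str:
    """Stub model: looks only at the message, not the category list."""
    message = prompt.rsplit("Message: ", 1)[1].lower()
    if "charged" in message:
        return "Billing\n"  # exercises the strip/lower normalization
    if "crash" in message:
        return "technical"
    return "other"


if __name__ == "__main__":
    print(run_evals(fake_model))
```

Reporting results per category rather than as one aggregate score makes it easier to see when a prompt change helps one label and hurts another.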
Anti-patterns
| Anti-pattern | Problem | Fix |
|---|---|---|
| No cost limits | Runaway agent loops spend $100s in minutes | Always set max_total_cost_usd |
| Vague instructions | Model produces inconsistent output | Be specific: “Output only the category name” |
| Too many tools | Model gets confused choosing between tools | Keep ≤ 5 tools per agent. Split into subagents. |
| Mixing orchestration and execution | Runner logic leaks into tool handlers | Tools should be pure functions. No runner imports. |
| Skipping evals | Prompt changes break behavior silently | Run evals in CI on every PR |
| Untyped tool arguments | Missing validation, hard-to-debug errors | Always use Pydantic models |
| Not classifying failures | Retryable errors treated as terminal (or vice versa) | Return clear error types from tools |
| Giant system prompts | Token waste, instruction drift | Split into skills. Use templates. |
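
Two rows of the table, "Not classifying failures" and "Mixing orchestration and execution", can be illustrated together. The sketch below uses only the standard library; `ErrorKind`, `ToolResult`, `lookup_order`, and `run_with_retries` are hypothetical names chosen for this example, not part of any framework. The tool returns a classified result and contains no retry logic, while the retry decision lives entirely on the orchestration side.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class ErrorKind(Enum):
    RETRYABLE = "retryable"  # transient: timeouts, rate limits
    TERMINAL = "terminal"    # retrying cannot help: bad input, missing record


@dataclass(frozen=True)
class ToolResult:
    ok: bool
    value: Optional[str] = None
    error_kind: Optional[ErrorKind] = None
    message: str = ""


def lookup_order(order_id: str, fetch: Callable[[str], str]) -> ToolResult:
    """Hypothetical tool: no runner imports, no retries, only a classified result."""
    if not order_id.isdigit():
        return ToolResult(False, error_kind=ErrorKind.TERMINAL,
                          message="order_id must be numeric")
    try:
        return ToolResult(True, value=fetch(order_id))
    except TimeoutError as exc:
        return ToolResult(False, error_kind=ErrorKind.RETRYABLE, message=str(exc))
    except KeyError:
        return ToolResult(False, error_kind=ErrorKind.TERMINAL,
                          message=f"order {order_id} not found")


def run_with_retries(call: Callable[[], ToolResult],
                     max_attempts: int = 3,
                     backoff_s: float = 0.0) -> ToolResult:
    """Orchestration side: retry only results marked RETRYABLE."""
    result = call()
    for attempt in range(1, max_attempts):
        if result.ok or result.error_kind is not ErrorKind.RETRYABLE:
            break
        time.sleep(backoff_s * attempt)  # linear backoff between attempts
        result = call()
    return result
```

A caller would wrap the tool invocation, for example `run_with_retries(lambda: lookup_order("42", fetch))`, where `fetch` is whatever backend client the tool depends on. Argument validation is done by hand here to keep the sketch dependency-free; the table's recommendation for real tools is Pydantic models.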
Production readiness checklist
| Area | Requirement | Status |
|---|---|---|
| Safety | FailSafeConfig with cost, step, and time limits | |
| Safety | Policy rules for all mutating tools | |
| Observability | Telemetry exporter configured (OTEL recommended) | |
| Observability | Alerts on error rate and latency | |
| Testing | Eval suite with ≥ 5 cases running in CI | |
| Testing | Golden traces captured for regression detection | |
| Memory | Persistent backend for multi-turn conversations | |
| Memory | Thread compaction configured | |
| Security | Secrets in environment variables, not code | |
| Security | Sandbox profiles for code execution tools | |
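
To make the first Safety row concrete, here is a rough sketch of what enforcing cost, step, and time limits around an agent loop can look like. It is a stand-in written for illustration, not the framework's `FailSafeConfig`: `Limits`, `Budget`, `LimitExceeded`, and `run_loop` are hypothetical, and only the field name `max_total_cost_usd` echoes the anti-patterns table.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Tuple


@dataclass(frozen=True)
class Limits:
    """Hypothetical stand-in for a fail-safe config; field names are assumptions."""
    max_total_cost_usd: float
    max_steps: int
    max_seconds: float


class LimitExceeded(RuntimeError):
    """Raised as soon as any limit is crossed."""


@dataclass
class Budget:
    limits: Limits
    cost_usd: float = 0.0
    steps: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, step_cost_usd: float) -> None:
        """Record one step, then check cost, step, and time limits in that order."""
        self.steps += 1
        self.cost_usd += step_cost_usd
        if self.cost_usd > self.limits.max_total_cost_usd:
            raise LimitExceeded("cost limit exceeded")
        if self.steps > self.limits.max_steps:
            raise LimitExceeded("step limit exceeded")
        if time.monotonic() - self.started > self.limits.max_seconds:
            raise LimitExceeded("time limit exceeded")


def run_loop(step: Callable[[], Tuple[float, bool]], limits: Limits) -> Budget:
    """Call step() until it reports done or a limit trips.

    step() returns (cost of this step in USD, done flag).
    """
    budget = Budget(limits)
    while True:
        cost, done = step()
        budget.charge(cost)
        if done:
            return budget
```

With `Limits(max_total_cost_usd=1.00, max_steps=10, max_seconds=60)`, a step that costs $0.30 and never finishes is stopped on its fourth iteration by the cost limit, well before the step or time limits apply. Note that this sketch checks limits after each step completes, so a single very expensive step can still overshoot the budget.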
Next steps
- Observability: Set up monitoring, alerting, and dashboards.
- Evals: Write behavioral tests for agents.