Adoption path
Phase 1: Narrow vertical slice
Build one focused agent that does one thing well.What to build:Focus on:
- Single agent with specific instructions
- 0–2 tools for the core capability
- Synchronous execution (
run_sync)
- Getting the prompt right — iterate until the agent reliably produces correct output
- Testing with 10–20 real examples before adding any complexity
- Keeping everything in a single file
Phase 2: Add controls
Give the agent capabilities and constrain them with safety controls.What to add:Focus on:
- Tools with Pydantic argument models
FailSafeConfigwith step, cost, and time limits- Policy rules for dangerous operations
- Basic error handling
- Every tool has a Pydantic model — no untyped arguments
- Cost limits are always set (
max_total_cost_usd) - Dangerous operations require approval
Phase 3: Production controls
Add the infrastructure for monitoring, testing, and safe deployment.What to add:Focus on:
- Telemetry (console for dev, OTEL for prod)
- Eval suite with at least 5 test cases
- Memory persistence for multi-turn conversations
- Streaming for user-facing UIs
- Evals run in CI on every pull request
- Alerting on error rate and latency
- Memory compaction for long-running threads
Phase 4: Release discipline
Establish processes for safe agent evolution.What to add:
- Golden trace comparison for regression detection
- Budget-gated CI (releases blocked if evals fail)
- Canary deployments with cost monitoring
- Documentation for on-call and incident response
- Run the full eval suite before every release
- Compare golden traces to catch behavioral drift
- Monitor cost-per-run trends for budget anomalies
- Keep system prompts in version-controlled files (not inline strings)
Phase checklist
Use this to track your progress:| Phase | Milestone | Status |
|---|---|---|
| 1 | Agent produces correct output for 20+ test cases | |
| 2 | Tools have Pydantic models, FailSafe limits are set | |
| 2 | Policy rules gate all mutating operations | |
| 3 | Telemetry is configured and exporting | |
| 3 | Eval suite runs in CI with ≥ 5 cases | |
| 3 | Memory persistence is configured | |
| 4 | Golden traces are captured and compared | |
| 4 | Releases are gated by eval pass rate |