What this snippet demonstrates
Agent runs can be interrupted by timeouts, cancellations, infrastructure failures, or intentional pauses (such as waiting for human approval). When a run is interrupted, the runner persists a checkpoint containing the run’s state at the point of interruption. The resume() method picks up from that checkpoint, restoring the conversation history, tool execution records, and step counter so the agent continues where it left off rather than starting from scratch.
Over time, long-running threads accumulate checkpoint records, event logs, and state entries. The compact_thread() method prunes old records according to retention policies, keeping storage bounded without losing the data needed for active runs.
Resuming an interrupted run
How resume works internally
The runner follows this sequence when resume() is called:
- Checkpoint lookup — The runner queries the memory store for the latest checkpoint matching the given run_id and thread_id. If no checkpoint exists, it raises AgentCheckpointCorruptionError.
- Terminal check — If the checkpoint already contains a terminal result (the run completed before the resume was requested), the runner returns that result immediately without re-executing.
- Snapshot restoration — The runner loads the runtime snapshot from the checkpoint, which includes the conversation message history, step counter, tool execution records, and any pending subagent state.
- Continued execution — The runner calls run_handle() internally with the restored snapshot, continuing the step loop from where it was interrupted.
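The sequence above can be sketched in a few lines of Python. This is an illustration only, not the runner's actual implementation: the Snapshot and Checkpoint classes, the dict-based store, and the continue_loop callback are hypothetical stand-ins, and AgentCheckpointCorruptionError is redefined locally so the example runs without the library.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


class AgentCheckpointCorruptionError(Exception):
    """Local stand-in for the error named in the text."""


@dataclass
class Snapshot:
    # Hypothetical shape: fields mirror the prose, not the real schema.
    messages: list = field(default_factory=list)
    step: int = 0
    tool_records: list = field(default_factory=list)


@dataclass
class Checkpoint:
    snapshot: Snapshot
    terminal_result: Optional[Any] = None  # set when the run already finished


def resume_sketch(store: dict, run_id: str, thread_id: str,
                  continue_loop: Callable[[Snapshot], Any]) -> Any:
    # 1. Checkpoint lookup: latest checkpoint for (run_id, thread_id).
    checkpoints = store.get((run_id, thread_id), [])
    if not checkpoints:
        raise AgentCheckpointCorruptionError(
            f"no checkpoint for run {run_id!r} in thread {thread_id!r}"
        )
    latest = checkpoints[-1]

    # 2. Terminal check: return the stored result without re-executing.
    if latest.terminal_result is not None:
        return latest.terminal_result

    # 3 and 4. Restore the snapshot and hand it to the step loop.
    return continue_loop(latest.snapshot)


# Usage: a run interrupted at step 3 continues from that step.
store = {("run-1", "thread-1"): [Checkpoint(Snapshot(messages=["hi"], step=3))]}
result = resume_sketch(
    store, "run-1", "thread-1",
    continue_loop=lambda snap: f"resumed at step {snap.step}",
)
```

Placing the terminal check before restoration means that retrying resume on a run that has already finished is cheap and does not repeat any tool calls.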
Resume method signature
| Parameter | Type | Description |
|---|---|---|
| agent | BaseAgent | The agent definition used for continued execution. Must match the agent that started the original run. |
| run_id | str | The unique run identifier from the interrupted run. Found on result.run_id. |
| thread_id | str | The thread identifier from the interrupted run. Found on result.thread_id. |
| context | dict or None | Optional context overlay. Merged with the original run context. |
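The table says the context overlay is merged with the original run context but does not spell out the merge rules. The sketch below shows one plausible reading, assuming a shallow merge in which overlay keys take precedence; merge_context is a hypothetical helper, and the library's real behavior (deep versus shallow merge, conflict handling) may differ.

```python
from typing import Optional


def merge_context(original: dict, overlay: Optional[dict]) -> dict:
    # Assumption: shallow merge, overlay keys win on conflict.
    # A None overlay leaves the original context unchanged.
    if overlay is None:
        return dict(original)
    return {**original, **overlay}


# Usage: resuming after a human approval flips one flag and keeps the rest.
original = {"user_id": "u-42", "approved": False}
merged = merge_context(original, {"approved": True})
```

Building a new dict instead of mutating the original keeps the checkpointed context intact if the resumed run is itself interrupted.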
Compacting thread memory
How compaction works
Compaction operates on two dimensions of stored data:
- Event retention — Controlled by RetentionPolicy. Removes event records older than max_age_ms. Events are the raw telemetry log entries (LLM calls, tool executions, state transitions) that accumulate over the lifetime of a thread.
- State retention — Controlled by StateRetentionPolicy. Removes state entries that exceed max_entries, keeping only the most recent ones. State entries include checkpoint snapshots, conversation summaries, and key-value metadata.
Compaction returns a MemoryCompactionResult with counts of removed records so you can log or alert on compaction activity.
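The two retention dimensions can be illustrated with a small, self-contained sketch. The classes below are local stand-ins that borrow the names used in the text; their fields beyond max_age_ms and max_entries, the field names on the result, and the compact_sketch function are assumptions made for the example rather than the library's API.

```python
import time
from dataclasses import dataclass


@dataclass
class RetentionPolicy:
    max_age_ms: int  # events older than this are removed


@dataclass
class StateRetentionPolicy:
    max_entries: int  # only this many most-recent state entries are kept


@dataclass
class MemoryCompactionResult:
    # Hypothetical field names; the real result type may differ.
    events_removed: int
    state_entries_removed: int


def compact_sketch(events, state_entries, event_policy, state_policy, now_ms=None):
    """events: list of (timestamp_ms, payload); state_entries: oldest first."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)

    # Event retention: drop events whose age exceeds max_age_ms.
    cutoff = now_ms - event_policy.max_age_ms
    kept_events = [e for e in events if e[0] >= cutoff]

    # State retention: keep only the newest max_entries entries.
    keep = state_policy.max_entries
    kept_state = state_entries[-keep:] if keep > 0 else []

    result = MemoryCompactionResult(
        events_removed=len(events) - len(kept_events),
        state_entries_removed=len(state_entries) - len(kept_state),
    )
    return kept_events, kept_state, result


# Usage with a fixed clock so the outcome is deterministic.
events = [(1_000, "llm_call"), (9_000, "tool_exec"), (9_500, "state_transition")]
state = ["ckpt-1", "ckpt-2", "ckpt-3", "summary-1"]
kept_events, kept_state, result = compact_sketch(
    events, state,
    RetentionPolicy(max_age_ms=2_000),
    StateRetentionPolicy(max_entries=2),
    now_ms=10_000,
)
```

With the clock fixed at 10,000 ms, one event falls outside the 2,000 ms window and two of the four state entries exceed the limit, so the counts on the result are 1 and 2.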
When to compact
- After long conversations — Threads with hundreds of turns accumulate large checkpoint histories. Compact after the conversation ends or reaches a natural break point.
- On a schedule — Run compaction as a background task (e.g., hourly or daily) for threads that are still active but have grown large.
- Before resume — If you know a thread has extensive history, compacting before resume reduces the data the runner needs to load.
Error handling
What to read next
- Memory — Full memory architecture, checkpoint schema, and retention policies.
- Core Runner — Step loop lifecycle, state machine, and all runner API methods.
- Checkpoint Schema — Exact structure of checkpoint records stored in memory.