Retries and timeouts
Default retry counts and per-attempt timeouts for each activity kind, and how to override them.
Defaults by activity kind
| Kind | Default `max_attempts` | Default per-attempt timeout | Why |
|---|---|---|---|
| `code` | 3 | 30 seconds | Pure computation: transient failures are uncommon, but retrying is safe |
| `llm` | 3 | 300 seconds (5 minutes) | API rate limits and transient model errors are common |
| `cli` | 1 | 60 seconds | External commands may have non-idempotent side effects |
| `mcp_tool` | 1 | 60 seconds | MCP tool calls may have non-idempotent side effects |
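To make the table concrete, here is a minimal sketch that encodes the defaults as data and resolves a step's effective policy. Everything in it is hypothetical and not part of the SDK: `ActivityKind`, `RetryPolicy`, `resolvePolicy`, and `worstCaseExecutionMs`. The sketch assumes that a step override replaces only the fields it sets, so an `llm` step that sets only `timeout_ms` keeps 3 attempts. It also computes a rough upper bound on execution time (attempts × per-attempt timeout). That bound ignores queue wait and any delay between attempts.

```typescript
// Hypothetical model of the defaults table; these names are illustrative, not SDK types.
type ActivityKind = "code" | "llm" | "cli" | "mcp_tool";

interface RetryPolicy {
  maxAttempts: number;
  timeoutMs: number; // per-attempt timeout
}

// Defaults copied from the table.
const DEFAULTS: Record<ActivityKind, RetryPolicy> = {
  code: { maxAttempts: 3, timeoutMs: 30_000 },
  llm: { maxAttempts: 3, timeoutMs: 300_000 },
  cli: { maxAttempts: 1, timeoutMs: 60_000 },
  mcp_tool: { maxAttempts: 1, timeoutMs: 60_000 },
};

// Step-level overrides, mirroring the `retries.max_attempts` and `timeout_ms` fields.
interface StepOverrides {
  retries?: { max_attempts?: number };
  timeout_ms?: number;
}

// Assumption: each field the step sets wins; unset fields fall back to the kind's default.
function resolvePolicy(kind: ActivityKind, overrides: StepOverrides = {}): RetryPolicy {
  const base = DEFAULTS[kind];
  return {
    maxAttempts: overrides.retries?.max_attempts ?? base.maxAttempts,
    timeoutMs: overrides.timeout_ms ?? base.timeoutMs,
  };
}

// Upper bound if every attempt runs until its timeout.
// Ignores schedule-to-start wait and any backoff between attempts.
function worstCaseExecutionMs(policy: RetryPolicy): number {
  return policy.maxAttempts * policy.timeoutMs;
}

console.log(worstCaseExecutionMs(resolvePolicy("llm"))); // 900000 (15 minutes)
console.log(
  worstCaseExecutionMs(
    resolvePolicy("cli", { retries: { max_attempts: 3 }, timeout_ms: 120_000 }),
  ),
); // 360000 (6 minutes)
```

The multiplication shows why raising both values on one step compounds: tripling attempts and doubling the timeout on a `cli` step moves its worst case from 1 minute to 6.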
Why `cli` and `mcp_tool` default to 1 attempt: these steps often write to external systems, and retrying a write unexpectedly is worse than failing fast and letting the operator decide. If your `cli` or `mcp_tool` step is idempotent (running it twice has the same effect as running it once), you can safely increase `max_attempts`.
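One way to enforce this rule in your own tooling is a small pre-flight check. The following is a hypothetical sketch: `StepConfig`, its `idempotent` flag, and `retryWarning` are illustrative names that the step author would supply, not fields or functions the SDK is known to provide. The check flags a side-effecting step that retries without being declared idempotent.

```typescript
// Hypothetical pre-flight check; `idempotent` is an illustrative flag, not a known SDK field.
type ActivityKind = "code" | "llm" | "cli" | "mcp_tool";

interface StepConfig {
  kind: ActivityKind;
  maxAttempts: number;
  idempotent?: boolean; // set by the step author after reviewing the step's side effects
}

// Kinds that default to a single attempt because they may have side effects.
const SIDE_EFFECT_KINDS: ReadonlySet<ActivityKind> = new Set<ActivityKind>(["cli", "mcp_tool"]);

// Returns a warning when a side-effecting step retries without being marked idempotent.
function retryWarning(step: StepConfig): string | undefined {
  if (SIDE_EFFECT_KINDS.has(step.kind) && step.maxAttempts > 1 && !step.idempotent) {
    return `${step.kind} step retries ${step.maxAttempts} times but is not marked idempotent`;
  }
  return undefined;
}

console.log(retryWarning({ kind: "cli", maxAttempts: 3 }));
// → cli step retries 3 times but is not marked idempotent
console.log(retryWarning({ kind: "cli", maxAttempts: 3, idempotent: true })); // → undefined
```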
schedule_to_start_timeout
Every activity has a 30-second `schedule_to_start_timeout`. If no worker picks up the activity within 30 seconds of it being scheduled, the activity fails with a "no worker on this queue" error. The step therefore fails fast instead of waiting indefinitely for a worker that may never come online.
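The rule can be modeled as a pure function of three timestamps. This is an illustrative sketch, not the scheduler's actual code. The names `pickupState` and `PickupState` are hypothetical, and the sketch assumes that pickup at exactly the 30-second mark still counts as "within 30 seconds".

```typescript
// Hypothetical model of the schedule_to_start rule; names are illustrative.
const SCHEDULE_TO_START_TIMEOUT_MS = 30_000;

type PickupState =
  | { status: "started" }
  | { status: "waiting"; remainingMs: number }
  | { status: "failed"; error: string };

// scheduledAtMs: when the activity was scheduled.
// pickedUpAtMs: when a worker picked it up, or undefined if no worker has yet.
// nowMs: the current time.
function pickupState(
  scheduledAtMs: number,
  pickedUpAtMs: number | undefined,
  nowMs: number,
): PickupState {
  const deadline = scheduledAtMs + SCHEDULE_TO_START_TIMEOUT_MS;
  // A worker picked the activity up in time.
  if (pickedUpAtMs !== undefined && pickedUpAtMs <= deadline) {
    return { status: "started" };
  }
  // No worker yet, but the deadline has not passed.
  if (nowMs < deadline) {
    return { status: "waiting", remainingMs: deadline - nowMs };
  }
  // Deadline passed with no worker: fail fast instead of waiting indefinitely.
  return { status: "failed", error: "no worker on this queue" };
}

console.log(pickupState(0, 12_000, 12_000).status); // started
console.log(pickupState(0, undefined, 10_000).status); // waiting
console.log(pickupState(0, undefined, 30_000).status); // failed
```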
If you see this error, run `cori status` to check whether the expected worker is online.
Overriding retries and timeouts
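Override retries and timeouts per step, in the step file. For example, a `cli` step that performs an idempotent upload can opt into retries and a longer timeout:

```typescript
// `step` is imported from the SDK; the import is omitted here.
export default step.cli({
  description: 'Idempotent write to S3',
  // Re-uploading the same file to the same key overwrites it, so a retry is safe.
  command: ({ input }) => `aws s3 cp ${input.file} s3://my-bucket/`,
  parse_output: (stdout) => ({ success: true }),
  retries: {
    max_attempts: 3, // default for cli is 1
  },
  timeout_ms: 120_000, // 2 minutes per attempt (default for cli is 60 seconds)
});
```

An `llm` step that handles large inputs may need a longer per-attempt timeout than the 300-second default:

```typescript
// `step` is imported from the SDK, and `output` is this step's output schema,
// defined elsewhere in the file.
export default step.llm({
  description: 'Translate a large document',
  model: 'gpt-4o',
  prompt: ({ input }) => `Translate to French:\n\n${input.text}`,
  output_schema: output,
  timeout_ms: 600_000, // 10 minutes per attempt for large documents
});
```

Retryable vs. non-retryable errors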
| Error class | Retried? |
|---|---|
| Transient API error (rate limit, 5xx) | Yes, up to `max_attempts` |
| Non-zero exit code from a `cli` step | Yes, up to `max_attempts` (default 1 for `cli`, so no retry unless overridden) |
| MCP tool error | Yes, up to `max_attempts` (default 1 for `mcp_tool`, so no retry unless overridden) |
| `schedule_to_start` timeout (no worker) | No, fails immediately |
| Manifest validation failure | No, fails immediately; fix required |
| TypeScript compilation error | No, fails immediately; fix required |
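The decision in the table reduces to two checks: is the error class retryable, and are attempts left? The following is a hypothetical sketch of that logic. The `ErrorClass` labels and `shouldRetry` are illustrative names, not SDK types, and the sketch leaves out any backoff between attempts.

```typescript
// Hypothetical error categories mirroring the table; not SDK types.
type ErrorClass =
  | "transient_api" // rate limit, 5xx
  | "cli_nonzero_exit"
  | "mcp_tool_error"
  | "schedule_to_start" // no worker picked up the activity
  | "manifest_validation"
  | "typescript_compile";

// Error classes the table marks as retryable.
const RETRYABLE: ReadonlySet<ErrorClass> = new Set<ErrorClass>([
  "transient_api",
  "cli_nonzero_exit",
  "mcp_tool_error",
]);

// attempt is 1-based: the number of the attempt that just failed.
function shouldRetry(error: ErrorClass, attempt: number, maxAttempts: number): boolean {
  return RETRYABLE.has(error) && attempt < maxAttempts;
}

console.log(shouldRetry("cli_nonzero_exit", 1, 1)); // false: cli default of 1 attempt
console.log(shouldRetry("cli_nonzero_exit", 1, 3)); // true: overridden to 3 attempts
console.log(shouldRetry("schedule_to_start", 1, 3)); // false: never retried
```

With the `cli` default of one attempt, a failing command is not rerun even though its error class is retryable. Raising `max_attempts` is what enables the retry.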
TODO: Confirm the exact retry option names (`retries.max_attempts` vs. `retries: { max, backoff }`) against the current SDK. This reference uses `retries.max_attempts`, based on AGENTS.md.