Rate Limits
Understanding and working with API rate limits.
Overview
Rate limits protect our infrastructure and ensure fair usage for all users. Limits are applied per API key.
Default Limits
| Limit Type | Default | Description |
|---|---|---|
| Requests per minute | 60 | Maximum API calls per minute |
| Concurrent predictions | 5 | Maximum running predictions at once |
| Daily spending | No limit | Optional, configurable per key |
| Monthly spending | No limit | Optional, configurable per key |
Rate Limit Headers
Every response includes headers showing your current rate limit status:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per minute |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when the limit resets |
| Retry-After | Seconds to wait (only on 429 errors) |
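As a minimal sketch of how these headers might be used, the helpers below read them from a response and work out how long to pause once the window is exhausted. The names `readRateLimit` and `msUntilAllowed` are illustrative and not part of the API; the code assumes only the header names and meanings listed in the table above.

```javascript
// Read the rate limit headers from a fetch Response
// (or any object exposing headers.get(name)).
function readRateLimit(response) {
  const toInt = (name) => {
    const value = Number.parseInt(response.headers.get(name) ?? '', 10);
    return Number.isNaN(value) ? null : value; // null when missing or malformed
  };
  return {
    limit: toInt('X-RateLimit-Limit'),
    remaining: toInt('X-RateLimit-Remaining'),
    resetAt: toInt('X-RateLimit-Reset'), // Unix timestamp, in seconds
  };
}

// Milliseconds to wait before sending the next request:
// 0 while requests remain, otherwise the time left until the window resets.
function msUntilAllowed(rateLimit, nowMs = Date.now()) {
  if (rateLimit.remaining === null || rateLimit.resetAt === null) return 0;
  if (rateLimit.remaining > 0) return 0;
  return Math.max(0, rateLimit.resetAt * 1000 - nowMs);
}
```

Calling `msUntilAllowed(readRateLimit(response))` after each request and sleeping for the returned duration is one way to slow down before a 429 is ever returned.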
Response Headers Example
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1705312800
Handling Rate Limits
When you exceed the rate limit, you'll receive a 429 Too Many Requests response. Implement exponential backoff to handle this gracefully:
Node.js - Exponential Backoff
async function apiRequest(url, options, maxRetries = 3) {
  let retries = 0;
  while (retries < maxRetries) {
    const response = await fetch(url, options);
    if (response.status === 429) {
      // Retry-After is a string of seconds; Number() yields 0 or NaN if it is missing or malformed
      const retryAfter = Number(response.headers.get('Retry-After'));
      // Exponential backoff: 1s, 2s, 4s, ...
      const backoff = Math.pow(2, retries) * 1000;
      // Never wait less than the server asked for
      const delay = retryAfter > 0 ? Math.max(retryAfter * 1000, backoff) : backoff;
      console.log(`Rate limited. Retrying in ${delay}ms...`);
      await new Promise(resolve => setTimeout(resolve, delay));
      retries++;
      continue;
    }
    return response;
  }
  throw new Error('Max retries exceeded');
}
Best Practices
Do
- Implement exponential backoff for retries
- Monitor rate limit headers in responses
- Use webhooks for long-running predictions
- Cache responses when possible
- Batch requests where supported
Don't
- Retry immediately after a 429 error
- Poll predictions more than once per second
- Create multiple API keys to bypass limits
- Ignore the Retry-After header
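When webhooks are not an option, the polling guidance above could be applied roughly as follows. This is a sketch under stated assumptions: `pollUntilDone`, `getStatus`, and `isDone` are hypothetical names, and the caller supplies the actual request, since the prediction endpoints and status values are not described in this section.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `getStatus` (a caller-supplied async function, hypothetical here)
// until `isDone(status)` returns true. The wait between calls is clamped
// to at least 1 second so polling never exceeds once per second.
async function pollUntilDone(getStatus, isDone, options = {}) {
  const { intervalMs = 2000, timeoutMs = 300000, sleep = defaultSleep } = options;
  const waitMs = Math.max(1000, intervalMs);
  const deadline = Date.now() + timeoutMs;

  while (true) {
    const status = await getStatus();
    if (isDone(status)) return status;
    // Give up rather than start a wait that would overrun the deadline.
    if (Date.now() + waitMs > deadline) {
      throw new Error('Timed out waiting for prediction');
    }
    await sleep(waitMs);
  }
}
```

The `sleep` option exists only so the wait can be replaced in tests; in normal use the default timer is enough.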
Concurrent Prediction Limits
The concurrent limit restricts how many predictions can run simultaneously. A prediction counts against your limit from creation until completion.
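One way to stay under this limit on the client side is to queue work so that no more than a fixed number of predictions are in flight at once. The sketch below is a generic promise-based limiter; `createLimiter` is an illustrative name, and the default of 5 simply mirrors the default concurrent limit listed earlier.

```javascript
// Returns a `run(task)` function that executes async tasks with at most
// `limit` running at the same time; extra tasks wait in a FIFO queue.
function createLimiter(limit = 5) {
  let active = 0;
  const queue = [];

  const next = () => {
    if (active >= limit || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task) // start the task; a synchronous throw becomes a rejection
      .then(resolve, reject) // forward the outcome to the caller
      .finally(() => {
        active--; // free the slot and start the next queued task
        next();
      });
  };

  return function run(task) {
    return new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
  };
}
```

Because a prediction counts against the limit from creation until completion, each task passed to `run` should resolve only once its prediction has finished, not as soon as the create request returns.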
429 Concurrent Limit Response
{
  "error": {
    "type": "rate_limit_error",
    "message": "Concurrent prediction limit exceeded. 5/5 predictions running.",
    "code": "CONCURRENT_LIMIT_EXCEEDED",
    "details": {
      "limit": 5,
      "active": 5
    }
  }
}
Increasing Your Limits
Need higher limits? Enterprise plans include increased rate limits and dedicated support. Contact us to discuss your requirements.