Rate limits & retries
CrossModel applies rate limits per API key to keep the service stable. When you exceed a limit, the endpoint returns HTTP 429 with a `rate_limit_error`.
Current limits
| Limit | Current allowance | Notes |
|---|---|---|
| RPM | 60 requests / minute | At most 60 requests per minute. |
| TPM | 100,000 tokens / minute | At most 100,000 tokens per minute. |
Limits are tracked per API key. If you need a higher allowance, contact CrossModel support.
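A client-side throttle can keep a single API key under both limits before requests ever reach the server. The sketch below is a minimal, hypothetical `SlidingWindowLimiter` that uses a 60-second sliding window. It assumes you can estimate each request's token usage yourself (for example, input tokens plus `max_tokens`). CrossModel's exact windowing and token-counting rules aren't documented here, so treat this as an approximation. It only coordinates requests made from one process.

```javascript
// Hypothetical client-side limiter for one API key: keeps requests and tokens
// within per-minute budgets using a 60-second sliding window.
// Token counts are caller-supplied estimates; CrossModel's own counting may differ.
class SlidingWindowLimiter {
  constructor({ maxRequests = 60, maxTokens = 100000, windowMs = 60000, now = () => Date.now() } = {}) {
    this.maxRequests = maxRequests;
    this.maxTokens = maxTokens;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, handy for tests
    this.events = []; // { time, tokens } for each admitted request, oldest first
  }

  // Forget requests that have left the window.
  prune(t) {
    while (this.events.length > 0 && t - this.events[0].time >= this.windowMs) {
      this.events.shift();
    }
  }

  // Milliseconds to wait before a request of `tokens` fits both budgets (0 = send now).
  waitTime(tokens) {
    if (tokens > this.maxTokens) {
      throw new RangeError("a single request cannot exceed the per-minute token budget");
    }
    const t = this.now();
    this.prune(t);
    let count = this.events.length;
    let used = this.events.reduce((sum, e) => sum + e.tokens, 0);
    let wait = 0;
    // Let the oldest requests expire one by one until the new one fits.
    for (const e of this.events) {
      if (count < this.maxRequests && used + tokens <= this.maxTokens) break;
      wait = e.time + this.windowMs - t;
      count -= 1;
      used -= e.tokens;
    }
    return wait;
  }

  // Record a request that is being sent now.
  record(tokens) {
    this.events.push({ time: this.now(), tokens });
  }

  // Wait until the request fits, then record it.
  async acquire(tokens) {
    for (let w = this.waitTime(tokens); w > 0; w = this.waitTime(tokens)) {
      await new Promise(resolve => setTimeout(resolve, w));
    }
    this.record(tokens);
  }
}
```

Call `await limiter.acquire(estimatedTokens)` right before each request. A 429 can still happen (for example, if other processes share the key or your estimates run low), so keep the client retry handling described below.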
Over-limit responses
When you exceed the request limit:
```json
{
  "error": {
    "message": "Rate limit exceeded: max 60 requests per minute.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```

When you exceed the token limit:
```json
{
  "error": {
    "message": "Rate limit exceeded: max 100000 tokens per minute.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```

Reducing the chance of hitting a limit
- Cap your concurrent requests.
- Queue batch jobs instead of firing everything at once.
- Set a sensible `max_tokens`/`max_completion_tokens`.
- Cache reusable results at the application layer.
- Summarize, chunk, or use retrieval for long inputs.
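Capping concurrency and queueing batch work can both be handled by a small worker pool: jobs wait in a queue and at most `limit` of them run at once. `runWithConcurrency` below is a hypothetical helper, not part of any CrossModel SDK; each job is assumed to be a function that returns a Promise, such as one that calls your own request code.

```javascript
// Hypothetical worker pool: runs async jobs with at most `limit` in flight.
// Each job is a function returning a Promise; results keep the jobs' order.
async function runWithConcurrency(jobs, limit) {
  if (!(limit >= 1)) throw new RangeError("limit must be at least 1");
  const results = new Array(jobs.length);
  let next = 0; // index of the next queued job

  async function worker() {
    while (next < jobs.length) {
      const i = next++; // claimed synchronously, so no two workers take the same job
      results[i] = await jobs[i]();
    }
  }

  // Start up to `limit` workers; each pulls jobs until the queue is empty.
  const workers = Array.from({ length: Math.min(limit, jobs.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

If any job rejects, the returned promise rejects with that error while the remaining workers keep draining the queue; wrap each job in `try`/`catch` if you would rather collect failures alongside results.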
Client retry guidance
Retry only recoverable errors:
| HTTP status | Retry? | Notes |
|---|---|---|
| 400 | No | Malformed request or bad parameters; fix the request before resending. |
| 401 | No | API key missing, invalid, or disabled. |
| 402 | No | Insufficient balance; top up your account. |
| 404 | No | Model doesn't exist or is unavailable; check the model ID. |
| 429 | Yes | Wait, then retry; lower your concurrency or token cap. |
| 500 | Yes | Server error; a brief retry is fine, and if it persists, contact support. |
| 502 | Yes | Temporarily unavailable; back off briefly and retry. |
| 503 | Yes | Model temporarily unavailable; back off briefly and retry. |
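The table reduces to a small retry decision. The sketch below also inspects the error body. The two 429 examples shown earlier share `type` `rate_limit_error` and `code` `rate_limit_exceeded`, so only the message text tells the request limit apart from the token limit. Matching on that text is an assumption and may break if the wording changes. `classifyError` is a hypothetical helper name, and treating unlisted statuses as non-retryable is also an assumption.

```javascript
// Statuses the table marks as retryable; anything else is treated as final (assumption).
const RETRYABLE = new Set([429, 500, 502, 503]);

// Hypothetical helper: decide whether to retry and, for 429s, guess which limit was hit.
function classifyError(status, bodyText) {
  let error = null;
  try {
    error = JSON.parse(bodyText).error ?? null;
  } catch {
    // Body was not JSON (for example, an HTML error page from a proxy).
  }
  let limit = null;
  if (status === 429 && error && error.type === "rate_limit_error") {
    // Both documented 429 bodies use the same code; only the message differs.
    const msg = String(error.message ?? "");
    if (/tokens per minute/i.test(msg)) limit = "tokens";
    else if (/requests per minute/i.test(msg)) limit = "requests";
  }
  return {
    retry: RETRYABLE.has(status),
    type: error?.type ?? null,
    code: error?.code ?? null,
    limit,
  };
}
```

When `limit` is `"tokens"`, lowering `max_tokens` or shrinking inputs is likely to help more than lowering concurrency; when it is `"requests"`, the reverse usually holds.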
Use exponential backoff with a little random jitter:
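```javascript
// Retry recoverable errors with exponential backoff plus random jitter.
// callCrossModel is your own function that sends one request and returns a fetch Response.
async function callWithRetry(callCrossModel) {
  const delays = [500, 1000, 2000, 4000]; // wait before retries 1-4, in ms
  for (let attempt = 0; ; attempt++) {
    const res = await callCrossModel();
    if (res.ok) return res;
    const retryable = [429, 500, 502, 503].includes(res.status);
    if (!retryable || attempt >= delays.length) {
      // Not recoverable, or out of retries: surface the error body.
      throw new Error(`CrossModel request failed (${res.status}): ${await res.text()}`);
    }
    const jitter = Math.floor(Math.random() * 250); // 0-249 ms of random jitter
    await new Promise(resolve => setTimeout(resolve, delays[attempt] + jitter));
  }
}
```

Retrying streaming requests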
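Once a streaming request has started returning content, don't append the output of an automatic retry to what you've already received: the retry is a new request that generates its output from the beginning, so the combined text would repeat or contradict itself. The safer approach is to mark the generation as failed and let the user or your business logic start a fresh, complete request.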
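A minimal sketch of that policy, assuming your client exposes the stream as an async iterable of text chunks (`startStream` and `streamGeneration` are hypothetical names). On error it returns a failed result instead of retrying, and whatever already arrived is kept only for display or logging. The `safeToRestart` flag reflects an assumption not stated above: if no content had arrived yet, nothing can be duplicated, so an automatic restart with backoff is reasonable.

```javascript
// Hypothetical helper: streams one generation and, on error, marks it failed
// instead of retrying and appending. `startStream` is assumed to return an
// async iterable of text chunks for a fresh request.
async function streamGeneration(startStream, onChunk) {
  let text = "";
  let started = false;
  try {
    for await (const chunk of startStream()) {
      started = true;
      text += chunk;
      onChunk(chunk);
    }
    return { status: "complete", text };
  } catch (error) {
    return {
      status: "failed",
      partial: text, // for display or logging only; never prepend it to a retry
      error,
      // Assumption: with no content received yet, a fresh request cannot
      // duplicate earlier output, so the caller may restart it automatically.
      safeToRestart: !started,
    };
  }
}
```

If `safeToRestart` is `false`, show the failure and, when the user retries, render the new stream in place of the partial text rather than after it.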