Rate limits & retries

CrossModel limits both requests per minute and tokens per minute for each API key to keep the service stable. When you exceed either limit, the endpoint returns HTTP 429 with the error type rate_limit_error.

Current limits

Limit	Current allowance	Notes
RPM	60 requests / minute	At most 60 requests per minute.
TPM	100,000 tokens / minute	At most 100,000 tokens per minute.

Limits are tracked per API key. If you need a higher allowance, contact CrossModel support.
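
To stay under both allowances, a client can track what it has sent recently and wait before sending more. The sketch below is one way to do that. MinuteBudget is a hypothetical helper, not part of any CrossModel SDK, and it assumes a rolling 60-second window and a caller-supplied token estimate; the server's actual accounting may differ.

```javascript
// Hypothetical client-side tracker for the RPM and TPM allowances.
// Assumption: limits behave like a rolling window; the real server may count differently.
class MinuteBudget {
  constructor({ maxRequests = 60, maxTokens = 100000, windowMs = 60000 } = {}) {
    this.maxRequests = maxRequests;
    this.maxTokens = maxTokens;
    this.windowMs = windowMs;
    this.events = []; // one { time, tokens } entry per request sent, oldest first
  }

  // Drop entries that have aged out of the window.
  prune(now) {
    while (this.events.length > 0 && now - this.events[0].time >= this.windowMs) {
      this.events.shift();
    }
  }

  // Milliseconds to wait before a request of `tokens` estimated tokens fits (0 = send now).
  waitTime(tokens, now = Date.now()) {
    if (tokens > this.maxTokens) {
      throw new RangeError("Request is larger than the whole per-minute token allowance");
    }
    this.prune(now);
    let count = this.events.length;
    let usedTokens = this.events.reduce((sum, e) => sum + e.tokens, 0);
    if (count < this.maxRequests && usedTokens + tokens <= this.maxTokens) return 0;

    // Walk through the oldest entries until enough budget would be freed.
    for (const e of this.events) {
      count -= 1;
      usedTokens -= e.tokens;
      if (count < this.maxRequests && usedTokens + tokens <= this.maxTokens) {
        return e.time + this.windowMs - now;
      }
    }
    return 0;
  }

  // Call this right after sending a request.
  record(tokens, now = Date.now()) {
    this.events.push({ time: now, tokens });
  }
}
```

Before each call, check waitTime(estimatedTokens), sleep for that long if it is non-zero, then call record(estimatedTokens) once the request is sent.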

Over-limit responses

When you exceed the request count:

{
  "error": {
    "message": "Rate limit exceeded: max 60 requests per minute.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}

When you exceed the token limit:

{
  "error": {
    "message": "Rate limit exceeded: max 100000 tokens per minute.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
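
Both bodies share the same type and code, so only the message tells the two limits apart. The helpers below are a minimal sketch of how a client might recognize these responses. The function names are hypothetical, and matching on the message text is an assumption based only on the two examples above, so treat the "unknown" branch as a normal outcome.

```javascript
// Hypothetical helpers; the field names come from the example bodies above.

// True when the status and parsed body look like a rate-limit response.
function isRateLimitError(status, body) {
  return status === 429 && body?.error?.type === "rate_limit_error";
}

// Best-effort guess at which limit was hit, based on the message wording.
// Assumption: the wording matches the examples; it may change without notice.
function rateLimitKind(body) {
  const message = body?.error?.message ?? "";
  if (/requests per minute/i.test(message)) return "requests";
  if (/tokens per minute/i.test(message)) return "tokens";
  return "unknown";
}
```

A client could use the result to decide whether to lower concurrency ("requests") or shrink inputs and token caps ("tokens").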

Reducing the chance of hitting a limit

Cap your concurrent requests.
Queue batch jobs instead of firing everything at once.
Set a sensible max_tokens / max_completion_tokens.
Cache reusable results at the application layer.
Summarize, chunk, or use retrieval for long inputs.
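
The first two tips can be combined in a small worker pool that drains a queue with a fixed number of requests in flight. This is a minimal sketch; mapWithConcurrency and the worker callback are hypothetical names, and the right limit depends on how long your requests take.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
// Results come back in the same order as `items`.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each runner pulls items off the shared queue until it is empty.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

If any worker call rejects, the returned promise rejects with that error, while calls already in flight are left to finish on their own. Wrap the worker in your retry logic if a batch should survive individual failures.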

Client retry guidance

Retry only recoverable errors:

HTTP status	Retry?	Notes
`400`	No	Malformed request or bad parameters — fix the request and resend.
`401`	No	API key missing, wrong, or disabled.
`402`	No	Insufficient balance — top up.
`404`	No	Model doesn't exist or is unavailable — check the model ID.
`429`	Yes	Wait, then retry; lower your concurrency or token cap.
`502`	Yes	Temporarily unavailable — back off briefly and retry.
`503`	Yes	Model temporarily unavailable — back off briefly and retry.
`500`	Yes	Server error — a brief retry is fine; if it persists, contact support.

Use exponential backoff with a little random jitter:

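// Delays (ms) before retries 1 through 4; each doubles the previous one.
const delays = [500, 1000, 2000, 4000];
// Statuses worth retrying, per the table above.
const retryable = [429, 500, 502, 503];

async function callWithRetry() {
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    const res = await callCrossModel();
    if (res.ok) return res;

    // Give up on non-retryable errors, or once every retry has been used.
    if (!retryable.includes(res.status) || attempt === delays.length) {
      throw new Error(await res.text());
    }

    // Add up to 250 ms of random jitter so clients don't retry in lockstep.
    const jitter = Math.floor(Math.random() * 250);
    await new Promise(resolve => setTimeout(resolve, delays[attempt] + jitter));
  }
}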

Retrying streaming requests

Once a streaming request has started returning content, don't append the result of an automatic retry onto the output you already have. The safer move is to mark the generation as failed and let the user or your business logic start a fresh, complete request.
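
One way to apply this rule is to collect the stream in a wrapper that reports a failure instead of retrying once any content has arrived. The sketch below assumes the stream is exposed as an async iterable of text chunks and that a failure before the first chunk can be retried like a non-streaming call; collectStream and the result shape are hypothetical.

```javascript
// Hypothetical wrapper around a streaming response.
// Assumption: `stream` is an async iterable that yields text chunks.
async function collectStream(stream) {
  let text = "";
  try {
    for await (const chunk of stream) {
      text += chunk;
    }
    return { status: "complete", text };
  } catch (error) {
    // Nothing was emitted yet, so the caller's normal retry logic can handle it.
    if (text.length === 0) throw error;
    // Content already arrived: report the failure instead of retrying and appending.
    return { status: "failed", partialText: text, error };
  }
}
```

On a "failed" result, discard or clearly label partialText and let the user or your business logic issue a fresh, complete request.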