CrossModel

Rate limits & retries

CrossModel limits both request rate and token throughput per API key to keep the service stable. When you exceed either limit, the endpoint returns HTTP 429 with the error type rate_limit_error.

Current limits

Limit   Current allowance          Notes
RPM     60 requests / minute       At most 60 requests per minute.
TPM     100,000 tokens / minute    At most 100,000 tokens per minute.

Limits are tracked per API key. If you need a higher allowance, contact CrossModel support.
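As a rough planning aid, the two limits can be turned into a pacing budget. The sketch below is only arithmetic on the published numbers. The helper name pacing is hypothetical, and which tokens count toward TPM (input, output, or both) is not stated here, so treat the per-request figure as an estimate.

```javascript
// Hypothetical helper: derive a simple pacing budget from the RPM and TPM limits.
function pacing(rpm, tpm) {
  return {
    // Minimum spacing between requests if they are sent at an even rate.
    minSpacingMs: Math.ceil(60000 / rpm),
    // Average tokens per request that would use both limits fully at once.
    avgTokensPerRequest: Math.floor(tpm / rpm),
  };
}

console.log(pacing(60, 100000));
// → { minSpacingMs: 1000, avgTokensPerRequest: 1666 }
```

If your requests average more than about 1,666 tokens, the token limit is likely to bind before the request limit does.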

Over-limit responses

When you exceed the request count:

{
  "error": {
    "message": "Rate limit exceeded: max 60 requests per minute.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}

When you exceed the token limit:

{
  "error": {
    "message": "Rate limit exceeded: max 100000 tokens per minute.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
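Both responses share the same type and code, so the message is the only field that tells them apart. The sketch below shows one way a client might distinguish the two. The function name classifyRateLimit is hypothetical, and matching on message text is an assumption that holds only while the wording stays as shown above.

```javascript
// Hypothetical helper: decide which limit a 429 response refers to.
// Matches on the message wording shown in this document, which may change.
function classifyRateLimit(status, body) {
  const err = body && body.error;
  if (status !== 429 || !err || err.type !== "rate_limit_error") return null;
  const message = String(err.message || "");
  if (/tokens per minute/i.test(message)) return "tokens";
  if (/requests per minute/i.test(message)) return "requests";
  return "unknown"; // still a rate limit error, but the wording was not recognized
}
```

A caller could lower concurrency when the result is "requests" and shrink prompts or max_tokens when it is "tokens".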

Reducing the chance of hitting a limit

  • Cap your concurrent requests.
  • Queue batch jobs instead of firing everything at once.
  • Set a sensible max_tokens / max_completion_tokens.
  • Cache reusable results at the application layer.
  • Summarize, chunk, or use retrieval for long inputs.
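The first two tips can be combined in a small in-process queue that caps how many requests run at once. This is a minimal sketch, not an official client feature. The name createLimiter is hypothetical, and the right limit depends on your own traffic.

```javascript
// Minimal concurrency limiter: runs at most `limit` tasks at a time
// and queues the rest in arrival order.
function createLimiter(limit) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= limit || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task) // also catches synchronous throws from task
      .then(resolve, reject)
      .finally(() => {
        active--;
        next(); // start the next queued task, if any
      });
  };

  // Returns a promise that settles with the task's own result or error.
  return function run(task) {
    return new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
  };
}
```

For example, with const run = createLimiter(4), a batch can be submitted as Promise.all(jobs.map(job => run(() => sendJob(job)))), where jobs and sendJob stand in for your own data and request function.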

Client retry guidance

Retry only recoverable errors:

HTTP status   Retry?   Notes
400           No       Malformed request or bad parameters; fix the request and resend.
401           No       API key missing, wrong, or disabled.
402           No       Insufficient balance; top up.
404           No       Model doesn't exist or is unavailable; check the model ID.
429           Yes      Wait, then retry; lower your concurrency or token cap.
500           Yes      Server error; a brief retry is fine. If it persists, contact support.
502           Yes      Temporarily unavailable; back off briefly and retry.
503           Yes      Model temporarily unavailable; back off briefly and retry.

Use exponential backoff with a little random jitter:

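// Retry recoverable errors with exponential backoff plus random jitter.
// callCrossModel() is a placeholder for your own request function.
async function callWithRetry() {
  const delays = [500, 1000, 2000, 4000]; // backoff schedule in ms
  const retryable = [429, 500, 502, 503];

  for (let attempt = 0; attempt <= delays.length; attempt++) {
    const res = await callCrossModel();
    if (res.ok) return res;

    // Non-recoverable status, or retries exhausted: surface the error body.
    if (!retryable.includes(res.status) || attempt === delays.length) {
      throw new Error(await res.text());
    }

    const jitter = Math.floor(Math.random() * 250); // 0 to 249 ms
    await new Promise(resolve => setTimeout(resolve, delays[attempt] + jitter));
  }
}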

Retrying streaming requests

Once a streaming request has started returning content, don't append the result of an automatic retry onto the output you already have. The safer move is to mark the generation as failed and let the user or your business logic start a fresh, complete request.
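One way to apply this rule is to record whether any content has arrived before the failure. The sketch below assumes a caller-supplied startStream function that returns an async iterable of text chunks. That function, the name consumeStream, and the status strings are all hypothetical and not part of any CrossModel SDK.

```javascript
// Hypothetical wrapper around a streaming call.
// startStream: assumed function returning an async iterable of chunks.
// onChunk: callback that receives each chunk as it arrives.
async function consumeStream(startStream, onChunk) {
  let received = false;
  try {
    for await (const chunk of await startStream()) {
      received = true;
      onChunk(chunk);
    }
    return { status: "complete" };
  } catch (error) {
    // Content already delivered: report failure, never stitch a retry onto it.
    // No content yet: the caller may apply the usual status-based retry rules.
    return { status: received ? "failed_partial" : "failed_before_content", error };
  }
}
```

On "failed_partial", discard or flag the partial output and let the user or your business logic issue a fresh, complete request.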