CrossModel

Billing, balance & usage

CrossModel bills against your account balance based on usage. There are really only three things to track: whether your account has balance, how much token usage a request reported, and whether the usage and cost in the console match your expectations.

How billing works

Model calls are billed by token usage. Different models — and different token types — can have different prices; the model page and console show the rates that apply.

Common token types:

TypeNotes
Input tokensYour prompt, message history, system instructions, tool results, and other input.
Output tokensThe content the model generates.
Cache-related tokensSome models report cache-read or cache-write usage; support depends on the provider.
Reasoning tokensSome reasoning models report reasoning tokens, usually counted within the output or usage totals.

usage in responses

Non-streaming requests typically include a usage field in the response body:

{
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 18,
    "total_tokens": 42
  }
}

Field names differ slightly by protocol:

ProtocolCommon fields
Chat Completionsprompt_tokens, completion_tokens, total_tokens
Responsesinput_tokens, output_tokens, total_tokens
Messagesinput_tokens, output_tokens

These fields are handy for your own analytics. The authoritative bill and cost are the records in the CrossModel console.

usage on streaming requests

For streaming requests, usage usually arrives near the end, in a protocol-specific way:

  • Chat Completions: set stream_options.include_usage to get a usage chunk before the stream ends.
  • Responses: the final response and usage arrive in the response.completed event.
  • Messages: cumulative usage arrives in the events before the message ends.

Insufficient balance

If your account is out of balance, the endpoint returns 402 and the request doesn't produce a generation:

{
  "error": {
    "message": "Insufficient balance. Please recharge your wallet.",
    "type": "billing_error",
    "param": null,
    "code": "insufficient_balance"
  }
}

When you hit this, top up in the console and retry.

Keeping costs down

  1. Set a sensible max_tokens / max_completion_tokens so generations don't run long.
  2. Summarize or truncate long documents and long conversations.
  3. For image input, pick an appropriate resolution and count.
  4. Log the response usage on your server so you can attribute cost by tenant, user, or feature.
  5. Check console usage regularly to catch unexpected traffic early.