---
title: Billing, balance & usage
description: CrossModel balance, usage-based billing, usage fields, and what happens when you run out.
---

# Billing, balance & usage

CrossModel bills usage against your account balance. There are three things to track: whether your account has balance, how many tokens each request reported, and whether the usage and cost shown in the console match your expectations.

## How billing works

Model calls are billed by token usage. Prices can differ by model and by token type; the model page and console show the rates that apply.

Common token types:

| Type | Notes |
|------|------|
| Input tokens | Your prompt, message history, system instructions, tool results, and other input. |
| Output tokens | The content the model generates. |
| Cache-related tokens | Some models report cache-read or cache-write usage; support depends on the provider. |
| Reasoning tokens | Some reasoning models report reasoning tokens separately; these are usually already included in the output-token count or the usage totals. |
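
As a rough illustration of per-token billing, the sketch below estimates the cost of one request from its input and output token counts. The rates in `EXAMPLE_RATES` are made-up placeholders, and the per-million-token pricing unit is an assumption; use the rates shown on the model page or console for real estimates.

```python
from decimal import Decimal

# Hypothetical prices per one million tokens. These numbers are placeholders,
# not CrossModel rates; look up the real ones on the model page or console.
EXAMPLE_RATES = {
    "input": Decimal("1.00"),
    "output": Decimal("4.00"),
}


def estimate_cost(input_tokens: int, output_tokens: int, rates=EXAMPLE_RATES) -> Decimal:
    """Estimate cost as (tokens * price per million) / 1,000,000, summed per token type."""
    per_million = Decimal(1_000_000)
    return (
        Decimal(input_tokens) * rates["input"]
        + Decimal(output_tokens) * rates["output"]
    ) / per_million


print(estimate_cost(24, 18))
```

`Decimal` is used instead of `float` so that small per-request amounts add up without rounding drift. Treat the result as an estimate only, since cache-related and reasoning tokens may be priced differently.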

## usage in responses

Non-streaming requests typically include a `usage` field in the response body:

```json
{
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 18,
    "total_tokens": 42
  }
}
```

Field names differ slightly by protocol:

| Protocol | Common fields |
|------|------|
| Chat Completions | `prompt_tokens`, `completion_tokens`, `total_tokens` |
| Responses | `input_tokens`, `output_tokens`, `total_tokens` |
| Messages | `input_tokens`, `output_tokens` |

These fields are useful for your own analytics, but the records in the CrossModel console remain the authoritative source for billing and cost.
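
If you call more than one protocol, it can help to map the field names in the table onto a single shape before storing them. The following is a minimal sketch; `normalize_usage` is a hypothetical helper, not part of any CrossModel SDK, and it computes `total_tokens` itself when the protocol does not report one.

```python
def normalize_usage(usage: dict) -> dict:
    """Map protocol-specific usage fields onto input/output/total token counts."""
    # Chat Completions uses prompt_tokens/completion_tokens;
    # Responses and Messages use input_tokens/output_tokens.
    input_tokens = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    output_tokens = usage.get("completion_tokens", usage.get("output_tokens", 0))
    # Messages does not list total_tokens in the table, so fall back to the sum.
    total_tokens = usage.get("total_tokens", input_tokens + output_tokens)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
    }


chat_usage = {"prompt_tokens": 24, "completion_tokens": 18, "total_tokens": 42}
messages_usage = {"input_tokens": 24, "output_tokens": 18}
print(normalize_usage(chat_usage))
print(normalize_usage(messages_usage))
```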

## usage on streaming requests

For streaming requests, usage usually arrives near the end, in a protocol-specific way:

- Chat Completions: set `stream_options.include_usage` to get a usage chunk before the stream ends.
- Responses: the final response and usage arrive in the `response.completed` event.
- Messages: cumulative usage arrives in the events before the message ends.
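
For the Chat Completions case, the sketch below pulls the usage chunk out of a stream that was requested with `"stream_options": {"include_usage": true}`. It assumes the stream uses the common server-sent-events framing (`data: {...}` lines ending with `data: [DONE]`); that framing and the helper name `usage_from_chat_stream` are assumptions for illustration, so check them against the actual stream you receive.

```python
import json
from typing import Iterable, Optional


def usage_from_chat_stream(lines: Iterable[str]) -> Optional[dict]:
    """Return the last non-empty `usage` object seen in a Chat Completions stream."""
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        # Content chunks typically carry no usage (missing or null); keep the one that does.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


sample_stream = [
    'data: {"choices": [{"delta": {"content": "Hi"}}], "usage": null}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 24, "completion_tokens": 18, "total_tokens": 42}}',
    "data: [DONE]",
]
print(usage_from_chat_stream(sample_stream))
```

If the stream is cut off before the usage chunk arrives, the function returns `None`; in that case the console records are the place to check what was billed.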

## Insufficient balance

If your account has no remaining balance, the endpoint returns HTTP `402` and the request does not produce a generation:

```json
{
  "error": {
    "message": "Insufficient balance. Please recharge your wallet.",
    "type": "billing_error",
    "param": null,
    "code": "insufficient_balance"
  }
}
```

When you hit this, top up in the console and retry.
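
On the client side, it is usually worth telling this error apart from transient failures, because retrying will not succeed until the balance is topped up. Here is a minimal sketch that checks the status code and the `code` field from the error body above; `is_insufficient_balance` is a hypothetical helper name.

```python
def is_insufficient_balance(status_code: int, body: dict) -> bool:
    """True when a response matches the 402 insufficient-balance error shape."""
    error = body.get("error") or {}
    return status_code == 402 and error.get("code") == "insufficient_balance"


error_body = {
    "error": {
        "message": "Insufficient balance. Please recharge your wallet.",
        "type": "billing_error",
        "param": None,
        "code": "insufficient_balance",
    }
}

if is_insufficient_balance(402, error_body):
    # Do not retry automatically: surface the problem so someone can top up first.
    print("Balance exhausted; top up in the console before retrying.")
```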

## Keeping costs down

1. Set a sensible `max_tokens` / `max_completion_tokens` so generations don't run long.
2. Summarize or truncate long documents and long conversations.
3. For image input, use the lowest resolution and the fewest images that still serve the task.
4. Log the response `usage` on your server so you can attribute cost by tenant, user, or feature.
5. Check console usage regularly to catch unexpected traffic early.
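
As one way to approach the usage logging in step 4, the sketch below keeps running token totals per attribution key (a tenant, user, or feature name). `UsageLedger` is a hypothetical in-memory helper for illustration; a real service would more likely write these records to a database or metrics system.

```python
from collections import defaultdict


class UsageLedger:
    """In-memory token totals per attribution key (tenant, user, or feature)."""

    def __init__(self) -> None:
        self._totals = defaultdict(
            lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0}
        )

    def record(self, key: str, usage: dict) -> None:
        """Add one response's `usage` to the running totals for `key`."""
        entry = self._totals[key]
        entry["requests"] += 1
        # Accept both Chat Completions and Responses/Messages field names.
        entry["input_tokens"] += usage.get("prompt_tokens", usage.get("input_tokens", 0))
        entry["output_tokens"] += usage.get("completion_tokens", usage.get("output_tokens", 0))

    def totals(self, key: str) -> dict:
        """Return a copy of the totals for `key` (zeros if nothing was recorded)."""
        return dict(self._totals[key])


ledger = UsageLedger()
ledger.record("tenant-a", {"prompt_tokens": 24, "completion_tokens": 18, "total_tokens": 42})
ledger.record("tenant-a", {"input_tokens": 10, "output_tokens": 5})
print(ledger.totals("tenant-a"))
```

Comparing these totals with the console figures is a quick way to spot traffic you did not expect.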
