---
title: Production best practices
description: Security, reliability, and observability guidance for running the CrossModel API in production.
---

# Production best practices

This page collects the things most likely to trip you up before going live.

## Key security

- Keep API keys server-side only.
- Inject keys via environment variables or a secrets manager.
- Don't log request headers, since they carry the API key.
- Rotate long-lived keys periodically.
- Use separate keys per environment: development, staging, and production.
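
As a minimal sketch of the first three points, the snippet below reads the key from an environment variable and masks credential headers before anything is logged. The variable name `CROSSMODEL_API_KEY` and the header names in `SENSITIVE_HEADERS` are assumptions for illustration, not names this page defines.

```python
import os

# Hypothetical variable name; use whatever your deployment defines.
API_KEY_ENV_VAR = "CROSSMODEL_API_KEY"

# Assumed credential-bearing headers; extend to match your client.
SENSITIVE_HEADERS = {"authorization", "x-api-key"}


def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{API_KEY_ENV_VAR} is not set")
    return key


def redact_headers(headers):
    """Return a copy of headers with credential values masked for logging."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```

Failing at startup when the key is missing tends to be easier to diagnose than an authentication error on the first live request.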

## Reliability

- Retry `429`, `500`, `502`, and `503` with exponential backoff.
- Set a client timeout so a stalled request fails instead of hanging indefinitely.
- For streaming requests, handle mid-stream error events separately from HTTP-level errors.
- Keep the model ID as a config value so you can switch quickly.
- Cap concurrency on high-volume jobs to avoid tripping RPM or TPM limits.
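
The retry rule above could be sketched roughly as follows. `send` is a hypothetical zero-argument callable that performs one request (with its own timeout) and returns `(status, body)`; the base delay, cap, attempt count, and the use of random jitter are illustrative choices rather than values this page specifies.

```python
import random
import time

# Status codes this page lists as retryable.
RETRYABLE_STATUSES = {429, 500, 502, 503}


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential delay in seconds for a zero-based attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def call_with_retry(send, max_attempts=5, sleep=time.sleep, jitter=random.random):
    """Call `send()` until it returns a non-retryable status or attempts run out.

    Returns the last (status, body) pair either way.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            # Sleep a random fraction of the exponential delay so that
            # many clients do not retry in lockstep.
            sleep(backoff_delay(attempt) * jitter())
    return status, body
```

Passing `sleep` and `jitter` as parameters keeps the function easy to test without real waiting.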

## Cost control

- Set a sensible `max_tokens` / `max_completion_tokens` per request.
- Truncate, summarize, or use retrieval for long text input.
- Set an appropriate `detail` on image requests.
- Monitor console usage and balance.
- Log each response's `usage` so you can reconcile against the bill.
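
One way to approach the last point is a small in-process ledger that totals `usage` per model. The field names below (`prompt_tokens` / `completion_tokens` and `input_tokens` / `output_tokens`) are assumptions based on common OpenAI-style and Anthropic-style response shapes; check them against the responses you actually receive.

```python
from collections import defaultdict


def normalize_usage(usage):
    """Return (input_tokens, output_tokens) from either assumed usage shape."""
    input_tokens = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    output_tokens = usage.get("completion_tokens", usage.get("output_tokens", 0))
    return input_tokens, output_tokens


class UsageLedger:
    """Accumulates per-model token totals for later comparison with the bill."""

    def __init__(self):
        self.totals = defaultdict(lambda: {"requests": 0, "input": 0, "output": 0})

    def record(self, model, usage):
        """Add one response's usage to the running totals for `model`."""
        # A missing usage object still counts as a request with zero tokens.
        input_tokens, output_tokens = normalize_usage(usage or {})
        entry = self.totals[model]
        entry["requests"] += 1
        entry["input"] += input_tokens
        entry["output"] += output_tokens
        return entry
```

In a real service the totals would more likely go to a metrics system or database than stay in memory.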

## Observability

Successful responses include these headers:

```http
x-request-id: req_cm_...
x-crossmodel-model: vendor/model
```

Record these in your service logs:

- `x-request-id`
- user or tenant ID
- model ID
- HTTP status code
- latency
- token usage
- whether the request was streaming
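
A rough sketch of assembling those fields into one JSON log line is shown below. The function name, its parameters, and the output key names are hypothetical; only the two header names come from this page.

```python
import json


def build_log_record(headers, status, started_at, finished_at,
                     tenant_id, model, usage, streaming):
    """Collect the fields listed above into a JSON-serializable dict."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "request_id": lowered.get("x-request-id"),
        "response_model": lowered.get("x-crossmodel-model"),
        "tenant_id": tenant_id,
        "model": model,
        "status": status,
        "latency_ms": round((finished_at - started_at) * 1000),
        "usage": usage,
        "streaming": streaming,
    }


def to_log_line(record):
    """Serialize a record as a single JSON line with stable key order."""
    return json.dumps(record, sort_keys=True)
```

Take `started_at` and `finished_at` from a monotonic clock such as `time.monotonic()` so that wall-clock adjustments do not distort latency.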

## End-user identifiers

On OpenAI-compatible endpoints, pass `safety_identifier` (older code can pass `user`). On Anthropic-compatible endpoints, pass a stable identifier in `metadata.user_id`.

Use a hashed application-side user ID. Don't send raw personal data such as an email address or phone number.
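
As an illustration, the sketch below derives a stable identifier and attaches it in the field each endpoint style expects. It uses a keyed hash (HMAC-SHA256) with a server-side secret, which is one possible choice; this page only asks for a hashed ID. The helper names and the `api_style` values are hypothetical.

```python
import hashlib
import hmac


def hashed_user_id(raw_user_id, secret):
    """Return a stable hex digest for a user ID, keyed with a server-side secret."""
    return hmac.new(secret, raw_user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def with_end_user(payload, identifier, api_style):
    """Return a copy of the request body with the end-user identifier attached."""
    body = dict(payload)
    if api_style == "openai":
        body["safety_identifier"] = identifier
    elif api_style == "anthropic":
        # Preserve any metadata the caller already set.
        body["metadata"] = {**body.get("metadata", {}), "user_id": identifier}
    else:
        raise ValueError(f"unknown api_style: {api_style!r}")
    return body
```

A keyed hash makes it harder to recover the original ID by hashing guessed values, provided the secret stays server-side.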
