CrossModel

Production best practices

This page collects the things most likely to trip you up before going live.

Key security

  • Keep API keys server-side only.
  • Inject keys via environment variables or a secrets manager.
  • Don't log request headers; they carry the API key.
  • Rotate long-lived keys periodically.
  • Use separate keys per environment — development, staging, production.
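
As a minimal sketch of the first three points, the helper below reads the key from an environment variable and masks credential headers before anything reaches a log. The variable name CROSSMODEL_API_KEY and both function names are hypothetical, and the header names Authorization and x-api-key are assumed from the usual OpenAI-style and Anthropic-style conventions. Skipping header logging entirely remains the safer default.

```python
import os

# Assumed credential header names (compared in lowercase); adjust to the ones you send.
SENSITIVE_HEADERS = {"authorization", "x-api-key"}


def load_api_key(env_var: str = "CROSSMODEL_API_KEY") -> str:
    """Read the key from the environment and fail fast if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the headers with credential values masked."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```

Using a different variable name per environment, or the same name with a different value in each deployment, keeps development, staging, and production keys apart.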

Reliability

  • Retry 429, 500, 502, and 503 with exponential backoff.
  • Set a client timeout — don't wait forever.
  • For streaming requests, handle error events that arrive mid-stream separately from HTTP-level errors.
  • Keep the model ID as a config value so you can switch quickly.
  • Cap concurrency on high-volume jobs to avoid tripping RPM or TPM limits.
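
The retry rule above can be sketched as follows. This is an illustration under assumptions, not a prescribed client: send is a hypothetical callable that performs one HTTP request (with its own timeout) and returns the status code and body, and the base delay, cap, attempt count, and random jitter are choices made here rather than values from this page.

```python
import random
import time
from typing import Callable, Tuple

# Statuses this page lists as retryable.
RETRYABLE_STATUSES = {429, 500, 502, 503}


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Upper bound in seconds for the wait before retry `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            return status, body
        # Jitter (an addition here): wait a random time up to the exponential bound.
        sleep(random.uniform(0, backoff_delay(attempt)))
    raise ValueError("max_attempts must be at least 1")
```

The last response is returned even when it is still retryable, so the caller decides how to surface the failure. Passing sleep as a parameter makes the function easy to test without real delays.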

Cost control

  • Set a sensible max_tokens / max_completion_tokens per request.
  • Truncate, summarize, or use retrieval for long text input.
  • Set an appropriate detail level on image requests.
  • Monitor usage and balance in the console.
  • Log each response's usage so you can reconcile against the bill.
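
One way to keep usage for reconciliation is a small in-process ledger, sketched below. The field names are assumptions based on the common response shapes: prompt_tokens and completion_tokens for OpenAI-style bodies, input_tokens and output_tokens for Anthropic-style bodies. UsageLedger and extract_usage are hypothetical names, and a real service would persist these totals rather than hold them in memory.

```python
from collections import defaultdict


def extract_usage(response: dict) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) from a parsed response body."""
    usage = response.get("usage") or {}
    # Assumed shapes: OpenAI-style uses prompt_tokens / completion_tokens,
    # Anthropic-style uses input_tokens / output_tokens.
    tokens_in = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    tokens_out = usage.get("completion_tokens", usage.get("output_tokens", 0))
    return int(tokens_in), int(tokens_out)


class UsageLedger:
    """Accumulates [input, output] token counts per model."""

    def __init__(self) -> None:
        self.totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])

    def record(self, model: str, response: dict) -> None:
        tokens_in, tokens_out = extract_usage(response)
        self.totals[model][0] += tokens_in
        self.totals[model][1] += tokens_out
```

Responses without a usage object count as zero tokens, so error bodies can be passed through the same path.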

Observability

Successful responses return these headers:

x-request-id: req_cm_...
x-crossmodel-model: vendor/model

Record these in your service logs:

  • x-request-id
  • user or tenant ID
  • model ID
  • HTTP status code
  • latency
  • token usage
  • whether the request was streaming
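
The fields above fit naturally into one structured log line per request. The sketch below shows one possible shape; build_log_record and the JSON key names are hypothetical, and only the two header names come from this page.

```python
import json
from typing import Mapping, Optional


def build_log_record(
    *,
    response_headers: Mapping[str, str],
    status: int,
    latency_ms: float,
    tenant_id: str,
    requested_model: str,
    usage: Optional[Mapping[str, int]],
    streaming: bool,
) -> str:
    """Serialize one request's observability fields as a JSON log line."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    headers = {name.lower(): value for name, value in response_headers.items()}
    record = {
        "request_id": headers.get("x-request-id"),
        "tenant_id": tenant_id,
        # Fall back to the requested model if the response header is absent.
        "model": headers.get("x-crossmodel-model", requested_model),
        "status": status,
        "latency_ms": round(latency_ms, 1),
        "usage": dict(usage) if usage else None,
        "streaming": streaming,
    }
    return json.dumps(record, sort_keys=True)
```

When a response lacks the headers, request_id is logged as null instead of raising, so failed requests still produce a record.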

End-user identifiers

On OpenAI-compatible endpoints, pass safety_identifier (older code can pass user). On Anthropic-compatible endpoints, pass a stable identifier in metadata.user_id.

Use a hashed application-side user ID — don't send raw personal data like an email or phone number.
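
A minimal sketch of deriving such an identifier follows. It uses a keyed hash (HMAC-SHA256) rather than a plain hash, which is a choice made here so that identifiers cannot be recomputed by anyone who lacks the server-side secret; the page itself only asks for a hashed ID. The function names and the api_style values are hypothetical, while the field names safety_identifier and metadata.user_id are the ones given above.

```python
import hashlib
import hmac


def stable_user_identifier(app_user_id: str, secret: bytes) -> str:
    """Derive a stable hex identifier from an internal user ID."""
    digest = hmac.new(secret, app_user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def identifier_fields(app_user_id: str, secret: bytes, api_style: str) -> dict:
    """Return the request-body fields that carry the identifier."""
    hashed = stable_user_identifier(app_user_id, secret)
    if api_style == "openai":
        return {"safety_identifier": hashed}
    if api_style == "anthropic":
        return {"metadata": {"user_id": hashed}}
    raise ValueError(f"unknown api_style: {api_style!r}")
```

Hash the internal user ID rather than an email or phone number, and keep the secret fixed so the same user always maps to the same identifier.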