Production best practices

This page collects the things most likely to trip you up before going live.

Key security

Keep API keys server-side only.
Inject keys via environment variables or a secrets manager.
Don't log request headers; they carry the API key.
Rotate long-lived keys periodically.
Use separate keys per environment: development, staging, and production.
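
As a minimal sketch of the points above, the helper below reads the key from an environment variable and fails fast when it is missing. The variable name CROSSMODEL_API_KEY and both function names are assumptions made for illustration; they are not defined by the API.

```python
import os


def load_api_key(env_var: str = "CROSSMODEL_API_KEY") -> str:
    """Read the API key from the environment and fail fast if it is missing.

    The default variable name is hypothetical; use whatever your deployment
    or secrets manager injects.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


def redact(key: str) -> str:
    """Return a log-safe form of a key: at most the last four characters."""
    return "***" + key[-4:] if len(key) > 8 else "***"
```

Setting a different value for the variable in each environment keeps development, staging, and production keys separate without code changes.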

Reliability

Retry 429, 500, 502, and 503 with exponential backoff.
Set a client timeout so a stalled request can't block indefinitely.
On streaming requests, handle mid-stream error events separately from HTTP-level errors.
Keep the model ID as a config value so you can switch quickly.
Cap concurrency on high-volume jobs to avoid tripping requests-per-minute (RPM) or tokens-per-minute (TPM) limits.
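
The retry and concurrency points above can be sketched without tying them to a particular HTTP client. In this sketch, send is a hypothetical callable that performs one request and returns (status, body). The function names, the default delays, and the choice of full jitter are assumptions, not values recommended by the API.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Status codes the list above says to retry.
RETRYABLE_STATUSES = {429, 500, 502, 503}


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            return status, body
        # Exponential backoff with full jitter: wait a random time between
        # 0 and min(max_delay, base_delay * 2^attempt) seconds.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")


def run_capped(jobs: List[Callable[[], object]], max_concurrency: int = 4) -> list:
    """Run jobs with at most max_concurrency in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(lambda job: job(), jobs))
```

Put the client timeout inside send itself, so each attempt is bounded on its own.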

Cost control

Set a sensible max_tokens / max_completion_tokens per request.
For long text input, truncate, summarize, or use retrieval instead of sending the full text.
Set an appropriate detail level on image requests.
Monitor usage and balance in the console.
Log each response's usage so you can reconcile against the bill.
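
One way to make usage reconcilable is to normalize each response's usage object and keep a running total per model. This sketch assumes the field names commonly seen on OpenAI-style responses (prompt_tokens, completion_tokens) and Anthropic-style responses (input_tokens, output_tokens). Check the actual response bodies before relying on it. extract_usage and UsageLedger are hypothetical names.

```python
from collections import defaultdict


def extract_usage(response: dict) -> dict:
    """Normalize a response's usage object to input/output/total token counts.

    Assumed field names: prompt_tokens/completion_tokens or
    input_tokens/output_tokens. Missing fields count as zero.
    """
    usage = response.get("usage") or {}
    input_tokens = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    output_tokens = usage.get("completion_tokens", usage.get("output_tokens", 0))
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }


class UsageLedger:
    """Running per-model token totals, for comparison against the bill."""

    def __init__(self) -> None:
        self.totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0})

    def record(self, model: str, response: dict) -> dict:
        """Add one response's usage to the model's totals and return that usage."""
        usage = extract_usage(response)
        self.totals[model]["input_tokens"] += usage["input_tokens"]
        self.totals[model]["output_tokens"] += usage["output_tokens"]
        return usage
```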

Observability

Successful responses return these headers:

x-request-id: req_cm_...
x-crossmodel-model: vendor/model

Record these in your service logs:

x-request-id
user or tenant ID
model ID
HTTP status code
latency
token usage
whether the request was streaming
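
The fields above fit naturally into one structured log line per request. The sketch below reads only the two response headers shown earlier and never touches request headers. The function names, the logger name, and the record keys are assumptions for illustration.

```python
import json
import logging
from typing import Mapping, Optional

logger = logging.getLogger("llm_requests")  # hypothetical logger name


def build_log_record(
    response_headers: Mapping[str, str],
    *,
    tenant_id: str,
    model: str,
    status: int,
    latency_ms: float,
    usage: Optional[dict],
    streaming: bool,
) -> dict:
    """Collect the fields listed above into one flat, JSON-serializable record."""
    # Header names are case-insensitive, so normalize before looking them up.
    headers = {name.lower(): value for name, value in response_headers.items()}
    return {
        "request_id": headers.get("x-request-id"),
        "x_crossmodel_model": headers.get("x-crossmodel-model"),
        "tenant_id": tenant_id,
        "model": model,
        "status": status,
        "latency_ms": round(latency_ms, 1),
        "usage": usage or {},
        "streaming": streaming,
    }


def log_request(record: dict) -> None:
    """Emit the record as a single JSON line."""
    logger.info(json.dumps(record, sort_keys=True))
```

Keeping x-request-id in every record makes it possible to match a log line to a specific request when investigating a failure.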

End-user identifiers

On OpenAI-compatible endpoints, pass safety_identifier (older code can pass user). On Anthropic-compatible endpoints, pass a stable identifier in metadata.user_id.

Use a hashed application-side user ID. Don't send raw personal data such as an email address or phone number.
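
A minimal sketch of deriving such an identifier follows. It uses a keyed hash (HMAC-SHA256) with a server-side secret rather than a plain hash; that is a design choice of this example, not something the API requires. The function names and the style argument are hypothetical. The field names safety_identifier and metadata.user_id come from the text above.

```python
import hashlib
import hmac


def stable_user_identifier(app_user_id: str, secret: bytes) -> str:
    """Derive a stable, non-reversible identifier from an internal user ID.

    The same (user ID, secret) pair always yields the same 64-character hex
    string, so the identifier stays stable across requests.
    """
    return hmac.new(secret, app_user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def attach_identifier(payload: dict, identifier: str, style: str) -> dict:
    """Return a copy of payload with the identifier in the field for the endpoint style."""
    result = dict(payload)
    if style == "openai":
        result["safety_identifier"] = identifier
    elif style == "anthropic":
        metadata = dict(result.get("metadata") or {})
        metadata["user_id"] = identifier
        result["metadata"] = metadata
    else:
        raise ValueError(f"unknown endpoint style: {style}")
    return result
```

Keying the hash keeps anyone without the secret from rebuilding the mapping by hashing guessed user IDs.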