Production best practices
This page collects the things most likely to trip you up before going live.
Key security
- Keep API keys server-side only.
- Inject keys via environment variables or a secrets manager.
- Don't log request headers, since they carry the API key.
- Rotate long-lived keys periodically.
- Use separate keys per environment — development, staging, production.
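A minimal sketch of the key-handling bullets above, assuming the key lives in an environment variable. The variable name MODEL_API_KEY and both helper names are hypothetical. redact_headers is only a fallback for cases where a header dump is unavoidable; not logging headers at all remains the safer default.

```python
import os

# Hypothetical variable name; use whatever your deployment defines,
# with a different value per environment (development, staging, production).
API_KEY_ENV_VAR = "MODEL_API_KEY"

# Header names that commonly carry credentials (an assumption, extend as needed).
SENSITIVE_HEADERS = {"authorization", "x-api-key", "cookie"}


def load_api_key(env=None):
    """Read the API key from the environment and fail fast if it is missing."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{API_KEY_ENV_VAR} is not set")
    return key


def redact_headers(headers):
    """Return a copy of headers with credential values masked."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```

Failing at startup when the variable is missing tends to surface a misconfigured environment earlier than a rejected request at runtime would.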
Reliability
- Retry 429, 500, 502, and 503 with exponential backoff.
- Set a client timeout so requests don't wait forever.
- Handle mid-stream error events separately for streaming requests.
- Keep the model ID as a config value so you can switch quickly.
- Cap concurrency on high-volume jobs to avoid tripping RPM or TPM limits.
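The retry bullet above could be sketched roughly as follows. This is an illustration under assumptions, not a prescribed client: send is a hypothetical zero-argument callable that performs one request (with its own timeout) and returns a (status_code, body) pair, and the attempt count, delays, and use of jitter are illustrative choices.

```python
import random
import time

# Status codes the list above says to retry.
RETRYABLE_STATUSES = {429, 500, 502, 503}


def call_with_retries(send, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable returning (status_code, body).
    The last response is returned even if it is still a retryable error.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            return status, body
        # Exponential backoff with jitter: the upper bound doubles each
        # attempt (0.5s, 1s, 2s, ...) and is capped at max_delay.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))
```

Randomizing the wait helps keep many clients from retrying in lockstep after a shared 429 or 503. Passing sleep as a parameter makes the function easy to test without real delays.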
Cost control
- Set a sensible max_tokens / max_completion_tokens per request.
- Truncate, summarize, or use retrieval for long text input.
- Set an appropriate detail on image requests.
- Monitor console usage and balance.
- Log each response's usage so you can reconcile against the bill.
Observability
Successful responses return:
x-request-id: req_cm_...
x-crossmodel-model: vendor/model
Record these in your service logs:
- x-request-id
- user or tenant ID
- model ID
- HTTP status code
- latency
- token usage
- whether the request was streaming
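One way to capture the fields listed above is a single structured log entry per request. The sketch below is an assumption about shape, not a required schema: build_log_record and its key names are hypothetical, usage is passed through exactly as the response returned it, and model_id is whatever your service treats as the model ID (the configured value or the x-crossmodel-model header).

```python
import json
import logging

logger = logging.getLogger("model_api")  # hypothetical logger name


def build_log_record(response_headers, status_code, started_at, finished_at,
                     model_id, tenant_id, usage, streaming):
    """Assemble one log entry from a completed request.

    started_at and finished_at are timestamps in seconds,
    for example from time.monotonic().
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    headers = {name.lower(): value for name, value in response_headers.items()}
    return {
        "request_id": headers.get("x-request-id"),
        "tenant_id": tenant_id,
        "model_id": model_id,
        "status_code": status_code,
        "latency_ms": round((finished_at - started_at) * 1000, 1),
        "usage": usage,
        "streaming": bool(streaming),
    }


def log_request(record):
    """Emit the record as one JSON line so it is easy to query later."""
    logger.info(json.dumps(record, sort_keys=True))
```

Only the x-request-id value is read from the headers; the rest of the header set is never written to the log.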
End-user identifiers
On OpenAI-compatible endpoints, pass safety_identifier (older code can pass user). On Anthropic-compatible endpoints, pass a stable identifier in metadata.user_id.
Use a hashed application-side user ID — don't send raw personal data like an email or phone number.
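A minimal sketch of deriving such an identifier, assuming a keyed hash (HMAC-SHA256) over your internal user ID. The choice of HMAC, the function names, and the secret handling are assumptions for illustration; only the field names safety_identifier and metadata.user_id come from the text above.

```python
import hashlib
import hmac


def hashed_user_id(app_user_id, secret):
    """Derive a stable, non-reversible identifier from an internal user ID.

    secret is a bytes value kept server-side. Keying the hash makes it
    harder to recover IDs by hashing guesses.
    """
    message = str(app_user_id).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def openai_identity_fields(identifier):
    """Request fields for OpenAI-compatible endpoints."""
    return {"safety_identifier": identifier}


def anthropic_identity_fields(identifier):
    """Request fields for Anthropic-compatible endpoints."""
    return {"metadata": {"user_id": identifier}}
```

The same user ID and secret always yield the same identifier, so it stays stable across requests. Rotating the secret changes every identifier, so treat that as a deliberate reset.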