Idempotency
- Use an Idempotency-Key header for non-idempotent POSTs such as POST /v1/user/charge (manual charges) or any request with custom side effects.
- Generate a unique key per logical operation (e.g., requestId or a UUID). If you retry, reuse the same key to prevent duplicate charges. Keys are typically retained for around 24 hours.
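As a minimal sketch of the key-reuse rule, the snippet below builds a charge request without sending it. Only the Idempotency-Key header and the POST /v1/user/charge path come from this guide; the base URL https://api.example.com, the amount field, and the helper names are hypothetical placeholders.

```python
import json
import uuid
import urllib.request


def new_idempotency_key() -> str:
    """One key per logical operation; a UUID4 is one reasonable choice."""
    return str(uuid.uuid4())


def build_charge_request(
    base_url: str, payload: dict, idempotency_key: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/user/charge request.

    base_url and the payload fields are placeholders, not a documented schema.
    """
    return urllib.request.Request(
        url=f"{base_url}/v1/user/charge",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key,
        },
        method="POST",
    )


# Generate the key once, outside any retry loop, and reuse it on every attempt.
key = new_idempotency_key()
first_attempt = build_charge_request("https://api.example.com", {"amount": 500}, key)
retry_attempt = build_charge_request("https://api.example.com", {"amount": 500}, key)
```

Creating the key before the retry loop, rather than inside it, is what lets the server recognize a retry as the same operation.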
Retries & backoff
- For 429/5xx from providers, use exponential backoff with jitter.
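One way to sketch this is the "full jitter" variant of exponential backoff, where each delay is drawn uniformly between zero and an exponentially growing ceiling. The base delay, cap, attempt count, and the send callable below are illustrative assumptions, not values prescribed by this guide.

```python
import random
import time
from typing import Callable, Tuple

# Statuses treated as transient: 429 plus the whole 5xx range.
RETRYABLE = {429} | set(range(500, 600))


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        # Sleep only if another attempt will follow.
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return status, body
```

The sleep parameter is injectable so the loop can be exercised in tests without real waiting. Jitter spreads retries from many clients over time instead of letting them arrive in synchronized bursts.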
Streaming resilience
- If a stream is interrupted, you can retry the whole request (idempotently) or show the partial transcript and let the user continue.
- Buffer streamed text on the server if you need to store a complete copy alongside the live stream.
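Server-side buffering can be sketched as a small "tee" generator that forwards each chunk to the live consumer while appending it to a buffer. The function name and the list-based buffer are assumptions for illustration; a real server would wrap whatever stream object its provider client returns.

```python
from typing import Iterable, Iterator, List


def tee_stream(chunks: Iterable[str], buffer: List[str]) -> Iterator[str]:
    """Yield each chunk to the live consumer while also appending it to buffer."""
    for chunk in chunks:
        buffer.append(chunk)
        yield chunk


buffer: List[str] = []
upstream = iter(["Hel", "lo, ", "world"])  # stand-in for a provider stream
forwarded = list(tee_stream(upstream, buffer))  # what the client would receive
transcript = "".join(buffer)  # copy to store once the stream ends
```

Because each chunk is buffered before it is yielded, an interrupted stream still leaves the partial transcript in the buffer, which can be shown to the user or stored.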
HTTP statuses and response shapes
- Chat proxy (insufficient funds/authorization): returns 200 OK with a normal assistant message that already contains an authorization or top-up link; render it as-is.
- Validation errors (missing fields, bad model): return standard 4xx with JSON errors.
- Server/transient failures: return 5xx; retry with backoff.
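A client can turn these rules into a small dispatch step. The sketch below is one possible mapping. The Action names are hypothetical, and 429 is grouped with 5xx as retryable, following the guidance in the "Retries & backoff" section.

```python
from enum import Enum


class Action(Enum):
    RENDER = "render"    # 2xx: show the assistant message as-is
    FIX_REQUEST = "fix"  # other 4xx: surface the JSON errors, do not retry
    RETRY = "retry"      # 429 and 5xx: retry with backoff


def classify(status: int) -> Action:
    """Map an HTTP status code to the client-side handling described above."""
    if 200 <= status < 300:
        return Action.RENDER
    if status == 429 or 500 <= status < 600:
        return Action.RETRY
    if 400 <= status < 500:
        return Action.FIX_REQUEST
    raise ValueError(f"unexpected status: {status}")
```

The paywall case needs no special branch here: it arrives as 200 OK with an ordinary assistant message, so it falls under RENDER.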
Rate limits
- Treat 429 as a signal to slow down; back off and optionally queue non-interactive work.
Timeouts
- Set request timeouts to avoid stuck connections; prefer cancellable streams in clients and servers.
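For streaming clients, one hedged way to combine a timeout with cancellation is to bound the wait for each chunk with asyncio.wait_for, which cancels the pending read when the deadline passes. The function name, the per-chunk timeout, and the (chunks, timed_out) return shape are assumptions for illustration.

```python
import asyncio
from typing import AsyncIterator, List, Tuple


async def read_with_timeout(
    stream: AsyncIterator[str], per_chunk_timeout: float
) -> Tuple[List[str], bool]:
    """Collect chunks; stop early if any single chunk takes too long.

    Returns (chunks_received, timed_out). On timeout, asyncio.wait_for
    cancels the pending read, so the stalled read does not linger.
    """
    received: List[str] = []
    iterator = stream.__aiter__()
    while True:
        try:
            chunk = await asyncio.wait_for(iterator.__anext__(), per_chunk_timeout)
        except StopAsyncIteration:
            return received, False  # stream finished normally
        except asyncio.TimeoutError:
            return received, True  # stalled: keep what arrived so far
        received.append(chunk)
```

A per-chunk deadline suits long responses better than a single overall deadline, since a healthy stream that keeps producing output is never cut off. The chunks received before a stall remain available as a partial transcript.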
Observability
- Log user, requestId, and paywall event IDs with your application logs so support and finance can reconcile quickly.
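A minimal sketch of such a log line, using the standard logging and json modules, is shown below. The logger name "billing" and the paywallEventId field name are hypothetical; user and requestId follow the names used in this guide.

```python
import json
import logging

logger = logging.getLogger("billing")  # hypothetical logger name


def log_paywall_event(user: str, request_id: str, event_id: str, message: str) -> str:
    """Emit one JSON log line carrying the identifiers needed for reconciliation."""
    line = json.dumps(
        {
            "message": message,
            "user": user,
            "requestId": request_id,
            "paywallEventId": event_id,  # field name is an assumption
        },
        sort_keys=True,
    )
    logger.info(line)
    return line
```

Keeping all three identifiers on the same line means a single search by any one of them finds the related request, user, and billing event.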