Pricing is rule‑based and model‑aware so you can balance UX and margins
without changing code.
Concepts
- Per‑model rules — set different prices per model so users can self‑select price/performance.
- Layers you can combine (a data‑shape sketch follows this list):
  - Per request — a flat fee per API call, great for tools/actions and short prompts (coming soon).
  - Per token — meter prompt + completion tokens (finalized after generation/stream end).
  - Hybrid — minimum fee + per‑token to keep margins predictable across varied prompts (coming soon).
- Credits/promos — trial or promo balances to reduce onboarding friction.
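As a mental model, the layers above can be treated as fields on a single pricing rule, with promo credits sitting on the account rather than on the rule. The sketch below only illustrates that shape; the field names are assumptions, not the platform's schema.

```python
# Illustrative only: one possible shape for a pricing rule combining the layers.
# Field names are assumptions, not the platform's actual schema.
from dataclasses import dataclass


@dataclass
class PricingRule:
    per_request_fee: float = 0.0         # per-request layer: flat fee per API call
    prompt_per_million: float = 0.0      # per-token layer: price per 1M prompt tokens
    completion_per_million: float = 0.0  # per-token layer: price per 1M completion tokens
    minimum_fee: float = 0.0             # hybrid layer: floor applied per request
    markup_pct: float = 0.0              # optional markup on the computed cost


@dataclass
class Account:
    credit_balance: float = 0.0          # trial/promo credits drawn down before billing
```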
You control pricing per model in the dashboard. Typical fields include prompt and
completion prices (per million tokens), a per‑request minimum fee, and an
optional markup.
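To make those fields concrete, here is a minimal sketch of how they could combine into a per‑request charge. The function and parameter names, and the choice to treat the markup as a percentage applied after the minimum‑fee floor, are assumptions for illustration rather than the exact billing formula.

```python
# A sketch of turning the dashboard fields into a per-request charge.
# The markup-as-percentage and minimum-as-floor interpretations are assumptions.
def charge_for_request(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_price_per_million: float,      # e.g., USD per 1M prompt tokens
    completion_price_per_million: float,  # e.g., USD per 1M completion tokens
    minimum_fee: float = 0.0,             # per-request minimum (hybrid layer)
    markup_pct: float = 0.0,              # optional markup, e.g., 20.0 for +20%
) -> float:
    token_cost = (
        prompt_tokens * prompt_price_per_million
        + completion_tokens * completion_price_per_million
    ) / 1_000_000
    return max(token_cost, minimum_fee) * (1 + markup_pct / 100)


# Example: 1,200 prompt + 450 completion tokens at hypothetical prices.
print(charge_for_request(1_200, 450, 0.15, 0.60, minimum_fee=0.001, markup_pct=20.0))
```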
Per‑model dynamic pricing
Use model‑specific prices to match cost and value (a lookup sketch follows this list):
- Budget chat (e.g., `openai/gpt-4o-mini`): lower per‑token price, low minimum.
- Premium reasoning (e.g., larger models): higher per‑token price and/or a higher minimum.
- Tool‑heavy agents: small minimum + manual per‑tool fees (see Manual charges below).
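A per‑model setup can be pictured as a lookup keyed by model ID, as in the sketch below. The model entries, prices, and fallback rule are hypothetical; the real values live in the dashboard.

```python
# Hypothetical per-model rules (USD per 1M tokens); real values are set in the dashboard.
PRICING_RULES = {
    "openai/gpt-4o-mini": {          # budget chat: low per-token price, no minimum
        "prompt_per_million": 0.15,
        "completion_per_million": 0.60,
        "minimum_fee": 0.0,
    },
    "example/premium-reasoning": {   # placeholder ID: higher per-token price and minimum
        "prompt_per_million": 5.00,
        "completion_per_million": 15.00,
        "minimum_fee": 0.01,
    },
}

DEFAULT_RULE = {"prompt_per_million": 1.00, "completion_per_million": 2.00, "minimum_fee": 0.0}


def rule_for(model: str) -> dict:
    # Falling back to a default rule for unlisted models is an assumption.
    return PRICING_RULES.get(model, DEFAULT_RULE)


print(rule_for("openai/gpt-4o-mini"))
```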
Metering details
- Token accounting — prompt and completion tokens are computed per request.
- Streaming — token totals are finalized at the end of the stream; charges are written to the ledger then.
- Retries — if you retry a non‑idempotent POST (e.g., a manual charge), use the same `Idempotency-Key` so you don’t double‑charge (see the retry sketch after this list).
- Combination — automatic token charges and your manual charges can coexist in the same ledger period.
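For the retry bullet above, the sketch below shows the pattern of generating an Idempotency-Key once and resending the same key on every attempt. The endpoint URL, payload fields, and auth header are placeholders, not the actual API.

```python
# Retrying a manual charge with one stable Idempotency-Key so it is billed at most once.
# The URL, payload shape, and auth header below are placeholders for illustration.
import time
import uuid

import requests


def post_manual_charge(user_id: str, amount: float, api_key: str) -> requests.Response:
    idempotency_key = str(uuid.uuid4())  # generated once, reused on every retry
    for attempt in range(3):
        try:
            resp = requests.post(
                "https://api.example.com/v1/charges",    # placeholder endpoint
                headers={
                    "Authorization": f"Bearer {api_key}",
                    "Idempotency-Key": idempotency_key,  # same key on each attempt
                },
                json={"user_id": user_id, "amount": amount, "reason": "tool_call"},
                timeout=10,
            )
            resp.raise_for_status()
            return resp
        except requests.RequestException:
            if attempt == 2:
                raise
            time.sleep(2 ** attempt)  # simple backoff before the next attempt
    raise RuntimeError("unreachable")
```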