Pricing is rule‑based and model‑aware so you can balance UX and margins without changing code.

Concepts

  • Per‑model rules — set different prices per model so users can self‑select price/performance.
  • Layers you can combine:
    • Per request — a flat fee per API call, well suited to tools/actions and short prompts (coming soon).
    • Per token — meter prompt and completion tokens; totals are finalized when generation or the stream ends.
    • Hybrid — a minimum fee plus per‑token metering, keeping margins predictable across varied prompts (coming soon).
    • Credits/promos — trial or promo balances to reduce onboarding friction.
You control pricing per model in the dashboard. Typical fields include prompt/completion prices (per million tokens), a per‑request minimum fee, and an optional markup.
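For illustration, here is a minimal sketch of how those per‑model fields might be represented if you mirrored them in code. The type and field names (ModelPricing, promptPricePerMTok, and so on) and the prices shown are assumptions for the example, not the platform's actual schema or rates.

```ts
// Illustrative shape mirroring the dashboard's per-model pricing fields.
// Per-token prices are expressed per million tokens, in USD.
interface ModelPricing {
  promptPricePerMTok: number;      // price per 1M prompt tokens
  completionPricePerMTok: number;  // price per 1M completion tokens
  minimumFeeUsd?: number;          // optional per-request minimum fee
  markupPercent?: number;          // optional markup on top of base price
}

// Example rules: a budget chat model and a premium reasoning model.
const pricingRules: Record<string, ModelPricing> = {
  "openai/gpt-4o-mini": { promptPricePerMTok: 0.3, completionPricePerMTok: 1.2 },
  "example/premium-reasoning": {
    promptPricePerMTok: 5,
    completionPricePerMTok: 15,
    minimumFeeUsd: 0.02,
    markupPercent: 10,
  },
};
```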

Per‑model dynamic pricing

Use model‑specific prices to match cost and value:
  • Budget chat (e.g., openai/gpt-4o-mini): lower per‑token, low minimum.
  • Premium reasoning (e.g., larger models): higher per‑token and/or higher minimum.
  • Tool‑heavy agents: small minimum + manual per‑tool fees (see Manual charges below).
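As a rough sketch of how such rules turn token usage into a charge, the function below applies per‑million‑token prices and a per‑request minimum (the hybrid layer). It uses the same illustrative ModelPricing shape as the earlier sketch; the numbers are made up and this is not the platform's billing code.

```ts
// Illustrative only: turn token usage into a charge under a per-model rule.
interface ModelPricing {
  promptPricePerMTok: number;      // USD per 1M prompt tokens
  completionPricePerMTok: number;  // USD per 1M completion tokens
  minimumFeeUsd?: number;          // optional per-request minimum
}

function estimateChargeUsd(
  rule: ModelPricing,
  promptTokens: number,
  completionTokens: number,
): number {
  const tokenCost =
    (promptTokens / 1_000_000) * rule.promptPricePerMTok +
    (completionTokens / 1_000_000) * rule.completionPricePerMTok;
  // Hybrid layer: never charge less than the per-request minimum.
  return Math.max(tokenCost, rule.minimumFeeUsd ?? 0);
}

// Budget chat: 1,200 prompt + 300 completion tokens at low per-token prices.
estimateChargeUsd({ promptPricePerMTok: 0.3, completionPricePerMTok: 1.2 }, 1_200, 300);
// Premium reasoning: same usage, higher prices plus a $0.02 minimum.
estimateChargeUsd(
  { promptPricePerMTok: 5, completionPricePerMTok: 15, minimumFeeUsd: 0.02 },
  1_200,
  300,
);
```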

Metering details

  • Token accounting — prompt and completion tokens are computed per request.
  • Streaming — token totals are finalized at the end of the stream, and charges are written to the ledger at that point.
  • Retries — if you retry a non‑idempotent POST (e.g., a manual charge), reuse the same Idempotency-Key so you don’t double‑charge; see the sketch after this list.
  • Combination — automatic token charges and your manual charges can coexist in the same ledger period.
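For example, a manual charge retried with the same Idempotency-Key should be applied only once. The sketch below is a rough illustration under assumptions: the endpoint path, payload fields, and auth header are placeholders, not a documented API; check your API reference for the real route and request body.

```ts
// Hedged sketch: retrying a manual charge safely with an Idempotency-Key.
// The URL, payload, and headers are hypothetical placeholders.
import { randomUUID } from "node:crypto";

async function createManualCharge(userId: string, amountUsd: number, apiKey: string) {
  // Generate the key once and reuse it for every retry of this charge,
  // so the server can deduplicate and you never double-charge.
  const idempotencyKey = randomUUID();

  for (let attempt = 0; attempt < 3; attempt++) {
    const res = await fetch("https://api.example.com/v1/charges", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
        "Idempotency-Key": idempotencyKey, // same key on every retry
      },
      body: JSON.stringify({ userId, amountUsd, reason: "tool-call fee" }),
    });
    if (res.ok) return res.json();
    if (res.status < 500) throw new Error(`charge failed: ${res.status}`);
    // Transient server error: loop and retry with the same Idempotency-Key.
  }
  throw new Error("charge failed after retries");
}
```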