- Send a stable, pseudonymous
user
on every billable request. - Use idempotency for non‑idempotent POSTs.
- Let the proxy return renderable assistant messages for auth/top‑ups (no special branching).
Per‑token metering (chat)
When to use- General chat UX where prompts vary widely and you need margin predictability.
- Configure per‑model prices for prompt/completion tokens plus your desired markup.
- Charge = token_cost × markup.
- In the dashboard, set model pricing.
- Use the Chat Completions endpoint normally; metering finalizes at end of stream.
Charge per tool call
When to use- Tools (e.g., web browse, RAG search, PDF parse) where value isn’t well expressed in tokens.
- Combine automatic chat metering with a flat, manual charge per tool call.
- Manual charge:
POST /v1/user/charge
- Optional balance check (display):
GET /v1/user/balance
- Before executing the tool, check user’s balance (optional).
- If balance is enough, perform the tool action. On success, call
POST /user/charge
with an idempotency key (e.g., the tool run id). - If
success: false
, render the assistant message (auth/top‑up link) and stop the tool.
- Charge after your tool succeeds to avoid refunds/adjustments.
- If you must pre‑authorize, use a small charge first, then a second charge on completion.
Freemium → prepaid top‑ups → subscription + overage
Goal- Smooth path from zero‑friction trial to predictable recurring revenue, with fair overages.
- Freemium: grant trial credits by calling
POST /v1/user/balance/deposit
from your backend when a new user onboards (label astrial
in metadata). - Prepaid: in Default mode use Stripe to sell credits; after payment success, call Deposit. In Shared mode, hosted top‑ups handle funding automatically.
- Overage: normal metering and/or manual charges apply once credits are used.
- Native subscriptions (alongside usage) are on the roadmap; for now, trigger periodic deposits from your billing system.
- Show remaining credits and estimated cost per action to improve conversion.
- For subscriptions, align the deposit amount with the monthly entitlement; Allow to top up once the balance hits zero.
Multi‑agent app with per‑model pricing
When to use- Multi‑tool/agent systems where some actions require premium models while others run fine on small models.
- Map each agent to a model id (and price). Expose the model tier in UI so users understand cost/performance.
- Apply guardrails: if balance is low, fall back to a cheaper model or pause premium agents.
- Keep agent → model mapping centralized so analytics can attribute cost/revenue per agent.
- Consider a small per‑request minimum fee for premium agents to stabilize margins.
Guardrails & UX patterns
- Idempotency: use
Idempotency-Key
for all manual charges/deposits. - Messaging: always render the assistant message when a request is blocked for auth/top‑up—no custom UI needed.
- Analytics: label
metadata
withrequestId
,tool
, andagent
for clear reporting.