Question 1

What is cheapestinference?

Accepted Answer

cheapestinference is an inference service where you take out one or more unlimited subscriptions — each covering a daily 8-hour time block — entirely via API. During every block you've reserved, usage of all served models is unlimited, for a fixed monthly fee instead of per-token billing. It is drop-in compatible with the OpenAI and Anthropic SDKs.

Question 2

What does "unlimited-token" mean?

Accepted Answer

It means that during the 8-hour block (or blocks) you've reserved, your usage is completely unlimited — no tokens to count, no budget cap, and no overage charges. The only restriction is two simultaneous requests per key; create additional keys to run requests in parallel. Within your reserved hours you can send as much as you want.

Question 3

Which models does cheapestinference support?

Accepted Answer

Three frontier open-source models: Kimi K2.6 (Moonshot), GLM 5.2 (Zhipu / Z.ai), and MiniMax M3 (MiniMax), whose 1M-token context window handles entire codebases and long agent runs. Inference is unlimited for all three — every reserved block gives uncapped usage across the entire set, with no separate full-catalog tier.

Question 4

How is this different from OpenRouter, Together AI, or Fireworks?

Accepted Answer

Those providers charge per token, so your cost rises with usage and can spike unpredictably. cheapestinference charges a fixed monthly fee, so your cost is the same whether you send one request or millions during your reserved hours. We also offer multi-key subscriptions through the Management API, so platforms can provision and manage unlimited subscriptions on behalf of their own customers.

Question 5

Can I use it as a Claude Code or Cursor alternative for unlimited AI coding?

Accepted Answer

Yes. The Unlimited Kimi K2.6 pool targets exactly this use case. Because the API is OpenAI-compatible, it works in Cline, Roo Code, Continue, and any client that accepts a custom OpenAI base URL.

Question 6

Is the API OpenAI- and Anthropic-compatible?

Accepted Answer

Yes. Point your client's base_url to https://api.cheapestinference.com/v1 for the OpenAI format or /anthropic for the Anthropic format, and use your subscriber key as the bearer token. No other code changes are needed.

Question 7

Can I pay with crypto / USDC?

Accepted Answer

Yes. cheapestinference accepts USDC on Base L2 via any wagmi-compatible wallet (MetaMask, Coinbase Wallet) as well as credit and debit cards through Stripe. There is no auto-renewal — subscriptions last 30 days and you renew manually.

Question 8

Is my data private — do you store or train on my prompts?

Accepted Answer

Privacy is the default, not an upsell. Your prompts, completions, attachments, and tool outputs are processed in memory and discarded the instant the request finishes — nothing is written to disk. We never train, fine-tune, or evaluate any model on your data, and you keep full ownership of everything you send and receive. The only thing we keep is usage metadata (token and request counts, cost per key) for billing — never the content — and even that is aggregated and anonymized after 90 days.

Question 9

Does anyone else see my data — other providers, partners, or ad networks?

Accepted Answer

No. Your prompts and completions are processed only on infrastructure we operate, under our own security and confidentiality controls — and we remain fully responsible for your data at every step. We never sell it, share it, or repurpose it, and it is never used to train any model. Any partner involved in running inference acts solely on our instructions and is bound by the same zero-retention, no-training terms. Inference is effectively zero-retention by design: there is no prompt database and nothing is ever fed back into a model. Enterprises that need a formal Data Processing Agreement or written zero-retention terms can request them at privacy@cheapestinference.com.

Question 10

What compliance and data rights do you support?

Accepted Answer

We are built to be GDPR-friendly — EU users can request data export or deletion at any time, data is handled in secure, audited infrastructure, and per-key usage is fully isolated: even a platform provisioning keys for its own customers can never see those customers' request content, because nothing is stored. Full details, retention windows, and contact info live in our Privacy & Data Handling policy and Terms.

Cheapest Inference for Open-Source AI Models

API Keys & Management

Privacy first.

Cheapest Inference for Open-Source AI Models

How unlimited time-window LLM inference works

Unlimited LLM API plans — flat monthly pricing, no per-token billing

API Keys & Management

Privacy first.