
# Awesome Free LLM APIs

LLM APIs with permanent free tiers for text inference.




## Provider APIs

APIs run by the companies that train or fine-tune the models themselves.

### Cohere 🇨🇦

Free "Trial" API key, no credit card. 1,000 API calls/month. Non-commercial use only.

Base URL: https://api.cohere.com/v2

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| Command A (111B) | 256K | 4K | Text | 20 RPM |
| Command R+ | 128K | 4K | Text | 20 RPM |
| Command R | 128K | 4K | Text | 20 RPM |
| Command R7B | 128K | 4K | Text | 20 RPM |
| Embed 4 | — | — | Embeddings (Text + Image) | 2,000 inputs/min |
| Rerank 3.5 | — | — | Reranking | 10 RPM |

### Google Gemini 🇺🇸

Free tier unavailable in the EU, UK, and Switzerland. Free-tier prompts may be used by Google to improve its products.[^1]

Base URL: https://generativelanguage.googleapis.com/v1beta

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| Gemini 2.5 Flash | 1M | 65K | Text + Image + Audio + Video | 10 RPM, 250 RPD |
| Gemini 2.5 Flash-Lite | 1M | 65K | Text + Image + Audio + Video | 15 RPM, 1,000 RPD |
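Gemini's native REST API uses a `contents`/`parts` schema rather than the OpenAI one. A minimal sketch of building (not sending) a `generateContent` request against the base URL above; `GEMINI_API_KEY` is a placeholder for a real key:

```python
import json

# Sketch: build a Gemini generateContent request (native REST shape,
# not the OpenAI schema). GEMINI_API_KEY is a placeholder.
BASE = "https://generativelanguage.googleapis.com/v1beta"

def gemini_request(model: str, prompt: str, api_key: str):
    url = f"{BASE}/models/{model}:generateContent?key={api_key}"
    body = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
    return url, json.dumps(body)

url, body = gemini_request("gemini-2.5-flash", "Hello", "GEMINI_API_KEY")
```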

### Mistral AI 🇫🇷

Free "Experiment" plan, no credit card. ~1B tokens/month.

Base URL: https://api.mistral.ai/v1

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| Mistral Small 4 | 256K | 256K | Text + Image + Code | ~1 RPS, 500K TPM |
| Mistral Medium 3 | 128K | 128K | Text | ~1 RPS, 500K TPM |
| Mistral Large 3 | 256K | 256K | Text | ~1 RPS, 500K TPM |
| Mistral Nemo (12B) | 128K | 128K | Text | ~1 RPS, 500K TPM |
| Codestral | 256K | 256K | Code | ~1 RPS, 500K TPM |
| Pixtral Large | 128K | 128K | Text + Image | ~1 RPS, 500K TPM |
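Back-of-envelope arithmetic on the two limits above (500K TPM rate limit vs. ~1B tokens/month plan cap) shows the monthly cap, not the TPM limit, is the practical ceiling:

```python
# Which Mistral free-tier limit binds first? (numbers from the text above)
tpm = 500_000                # tokens per minute (rate limit)
monthly_cap = 1_000_000_000  # ~1B tokens/month (plan cap)

# Running flat-out at the TPM limit for 30 days would allow ~21.6B tokens,
# so the ~1B monthly cap is hit long before TPM ever constrains you.
tokens_per_month_at_tpm = tpm * 60 * 24 * 30
minutes_to_exhaust_cap = monthly_cap / tpm  # 2,000 min ≈ 33 hours of full load
```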

### Z AI (Zhipu AI) 🇨🇳

Permanent free models, no credit card required.

Base URL: https://open.bigmodel.cn/api/paas/v4

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| GLM-4.7-Flash | 200K | 128K | Text | 1 concurrent request |
| GLM-4.5-Flash | 128K | ~8K | Text | 1 concurrent request |
| GLM-4.6V-Flash | 128K | ~4K | Text + Image | 1 concurrent request |

## Inference Providers

Third-party platforms that host open-weight models from various sources.

### Cerebras 🇺🇸

Free tier, no credit card. Ultra-fast inference (~2,600 tok/s). 1M tokens/day cap.

Base URL: https://api.cerebras.ai/v1

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| llama3.1-8b | 128K (8K on free) | 8K | Text | 30 RPM, 14,400 RPD, 1M TPD |
| gpt-oss-120b | 128K (8K on free) | 8K | Text | 30 RPM, 14,400 RPD, 1M TPD |
| qwen-3-235b-a22b-instruct-2507 | 131K (8K on free) | 8K | Text | 30 RPM, 14,400 RPD, 1M TPD |
| zai-glm-4.7 | 128K (8K on free) | 8K | Text | 10 RPM, 100 RPD, 1M TPD |

### Cloudflare Workers AI 🇺🇸

10,000 Neurons/day free. 50+ models available on free tier.

Base URL: https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | 131K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/meta/llama-3.1-8b-instruct-fp8-fast | 131K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/meta/llama-3.2-11b-vision-instruct | 131K | Shared w/ context | Text + Vision | 10K neurons/day (shared) |
| @cf/meta/llama-4-scout-17b-16e-instruct | Up to 10M | Shared w/ context | Multimodal | 10K neurons/day (shared) |
| @cf/mistralai/mistral-small-3.1-24b-instruct | 128K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/google/gemma-4-26b-a4b-it | 256K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/qwen/qwq-32b | 32K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | 32K | Shared w/ context | Text | 10K neurons/day (shared) |
| + 42 more models | Varies | Varies | Text, Image, Audio, Embeddings | 10K neurons/day (shared) |
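Unlike the other providers, Workers AI puts both your account ID and the model ID in the URL path. A small sketch of assembling that endpoint; `"abc123"` stands in for a real account ID:

```python
# Sketch: Workers AI invokes a model by appending its ID to the
# account-scoped /ai/run endpoint. "abc123" is a placeholder account ID.
def workers_ai_url(account_id: str, model: str) -> str:
    return (f"https://api.cloudflare.com/client/v4/accounts/"
            f"{account_id}/ai/run/{model}")

url = workers_ai_url("abc123", "@cf/meta/llama-3.1-8b-instruct-fp8-fast")
```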

### GitHub Models 🇺🇸

Free prototyping for all GitHub users. 45+ models. Per-request limits (8K in / 4K out).

Base URL: https://models.inference.ai.azure.com

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| gpt-4.1 | 1M | 32K | Text | 10 RPM, 50 RPD |
| gpt-4.1-mini | 1M | 32K | Text | 15 RPM, 150 RPD |
| gpt-4o | 128K | 16K | Text + Vision | 10 RPM, 50 RPD |
| o3-mini | 200K | 100K | Text (reasoning) | 10 RPM, 50 RPD |
| o4-mini | 200K | 100K | Text (reasoning) | 10 RPM, 50 RPD |
| Llama-4-Scout-17B-16E | 512K | ~4K | Text + Vision | 15 RPM, 150 RPD |
| Llama-4-Maverick-17B-128E | 256K | ~4K | Text + Vision | 10 RPM, 50 RPD |
| Meta-Llama-3.3-70B | 131K | ~4K | Text | 15 RPM, 150 RPD |
| DeepSeek-R1 | 64K | 8K | Text (reasoning) | 15 RPM, 150 RPD |
| Mistral-Small-3.1 | 128K | ~4K | Text + Vision | 15 RPM, 150 RPD |
| + 35 more models | Varies | Varies | Text / Image | Varies by tier |

### Groq 🇺🇸

Free tier, no credit card. Ultra-fast LPU inference.[^2]

Base URL: https://api.groq.com/openai/v1

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| llama-3.3-70b-versatile | 131K | 32K | Text | 30 RPM, 14,400 RPD |
| llama-3.1-8b-instant | 131K | 131K | Text | 30 RPM, 14,400 RPD |
| llama-4-scout-17b-16e-instruct | 131K | 8K | Text + Vision | 30 RPM, 14,400 RPD |
| llama-4-maverick-17b-128e-instruct | 131K | 8K | Text + Vision | 15 RPM, 500 RPD |
| qwen3-32b | 131K | 131K | Text | 30 RPM, 14,400 RPD |
| gpt-oss-120b | 131K | 32K | Text | 30 RPM, 14,400 RPD |
| kimi-k2-instruct | 262K | 262K | Text | 30 RPM, 14,400 RPD |
| deepseek-r1-distill-70b | 131K | 8K | Text | 30 RPM, 14,400 RPD |
| whisper-large-v3 | — | — | Audio → Text | 20 RPM, 2,000 RPD |
| whisper-large-v3-turbo | — | — | Audio → Text | 20 RPM, 2,000 RPD |
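When batch-processing against any of the RPM limits in these tables, it helps to pace requests client-side instead of eating 429s. A minimal pacing sketch (not any provider's official SDK); 30 RPM, i.e. one request every 2 seconds, matches Groq's free tier for most models:

```python
# Sketch: client-side pacing so bursts never exceed a provider's RPM cap.
class Pacer:
    def __init__(self, rpm: int):
        self.interval = 60.0 / rpm  # seconds between allowed requests
        self.next_ok = 0.0          # earliest timestamp the next call may go out

    def wait_time(self, now: float) -> float:
        """Seconds to sleep before the next request is allowed."""
        delay = max(0.0, self.next_ok - now)
        self.next_ok = max(now, self.next_ok) + self.interval
        return delay

p = Pacer(rpm=30)  # call time.sleep(p.wait_time(time.monotonic())) before each request
```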

### Hugging Face 🇺🇸

Free serverless Inference API.

Base URL: https://api-inference.huggingface.co/models

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| Meta-Llama-3.1-8B-Instruct | 128K | ~4K | Text | ~1,000 RPD |
| Mistral-7B-Instruct-v0.3 | 32K | ~4K | Text | ~1,000 RPD |
| Mixtral-8x7B-Instruct-v0.1 | 32K | ~4K | Text | ~1,000 RPD |
| Phi-3.5-mini-instruct | 128K | ~4K | Text | ~1,000 RPD |
| Qwen2.5-7B-Instruct | 131K | ~4K | Text | ~1,000 RPD |

### Kilo Code 🇺🇸

Free models, no credit card required. The `kilo-auto/free` auto-router routes to `minimax/minimax-m2.5:free` (80%) and `stepfun/step-3.5-flash:free` (20%).[^3]

Base URL: https://api.kilo.ai/api/gateway

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| bytedance-seed/dola-seed-2.0-pro:free | — | — | Text | ~200 req/hr |
| x-ai/grok-code-fast-1:optimized:free | — | — | Text (code) | ~200 req/hr |
| nvidia/nemotron-3-super-120b-a12b:free | 262K | 32K | Text | ~200 req/hr |
| arcee-ai/trinity-large-thinking:free | — | — | Text (reasoning) | ~200 req/hr |
| openrouter/free | Varies | Varies | Text | ~200 req/hr |
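The 80/20 split the auto-router is described as doing amounts to weighted random routing. A client-side sketch of the same idea (illustrative only; the weights come from the text above, and the real router's selection logic is not published):

```python
import random

# Sketch: weighted routing like the described kilo-auto/free 80/20 split.
ROUTES = [
    ("minimax/minimax-m2.5:free", 0.8),
    ("stepfun/step-3.5-flash:free", 0.2),
]

def pick_model(rng: random.Random) -> str:
    models, weights = zip(*ROUTES)
    return rng.choices(models, weights=weights, k=1)[0]

rng = random.Random(42)  # seeded for reproducibility
picks = [pick_model(rng) for _ in range(1000)]
```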

### LLM7.io 🇬🇧

Zero-friction API gateway. No registration needed for basic access. 30+ models.

Base URL: https://api.llm7.io/v1

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| deepseek-r1-0528 | — | — | Text (reasoning) | 30 RPM (120 with token) |
| deepseek-v3-0324 | — | — | Text | 30 RPM (120 with token) |
| gemini-2.5-flash-lite | — | — | Text + Vision | 30 RPM (120 with token) |
| gpt-4o-mini | — | — | Text + Vision | 30 RPM (120 with token) |
| mistral-small-3.1-24b | 32K | — | Text | 30 RPM (120 with token) |
| qwen2.5-coder-32b | — | — | Text (code) | 30 RPM (120 with token) |
| + ~24 more models | Varies | Varies | Text | 30 RPM (120 with token) |

### NVIDIA NIM 🇺🇸

Free with NVIDIA Developer Program membership. 100+ models. No daily token cap.

Base URL: https://integrate.api.nvidia.com/v1

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| deepseek-ai/deepseek-r1 | 128K | ~163K | Text (reasoning) | ~40 RPM |
| nvidia/llama-3.1-nemotron-ultra-253b-v1 | 128K | 4K | Text | ~40 RPM |
| nvidia/nemotron-3-super-120b-a12b | 262K | 262K | Text | ~40 RPM |
| nvidia/nemotron-3-nano-30b-a3b | 128K | 32K | Text | ~40 RPM |
| meta/llama-3.1-405b-instruct | 128K | 4K | Text | ~40 RPM |
| qwen/qwen2.5-72b-instruct | 128K | 8K | Text | ~40 RPM |
| google/gemma-4-31b | 128K | 8K | Text | ~40 RPM |
| mistralai/mistral-large-2-instruct | 128K | 4K | Text | ~40 RPM |
| nvidia/nemotron-nano-2-vl | 128K | 8K | Vision + Text + Video | ~40 RPM |
| minimax/minimax-m2.7 | 128K | 8K | Text | ~40 RPM |
| + 90 more models | Varies | Varies | Text, Image, Video, Speech, Embeddings | ~40 RPM |

### Ollama Cloud 🇺🇸

Free tier with qualitative usage limits. 400+ models from the Ollama library. Not OpenAI SDK-compatible; uses the Ollama API.[^4]

Base URL: https://api.ollama.com

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| llama3.1:cloud | 128K | Model-dependent | Text | Session/weekly limits (unpublished) |
| deepseek-r1:cloud | 128K | Model-dependent | Text (reasoning) | Session/weekly limits (unpublished) |
| qwen2.5:cloud | 128K | Model-dependent | Text | Session/weekly limits (unpublished) |
| gemma2:cloud | 8K | Model-dependent | Text | Session/weekly limits (unpublished) |
| mistral:cloud | 32K | Model-dependent | Text | Session/weekly limits (unpublished) |
| + 400 more models | Varies | Varies | Text | Session/weekly limits (unpublished) |
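The "not OpenAI SDK-compatible" caveat shows up concretely in the request shape. A side-by-side sketch of the same chat turn for a generic OpenAI-style endpoint vs. Ollama's native `/api/chat` (field names per the respective public API docs; treat exact details as illustrative):

```python
# Sketch: the same chat turn for an OpenAI-compatible endpoint vs.
# Ollama's native /api/chat. llama3.1:cloud is taken from the table above.
messages = [{"role": "user", "content": "Hello"}]

openai_style = {
    "path": "/v1/chat/completions",
    "body": {"model": "llama3.1", "messages": messages},
}
ollama_style = {
    "path": "/api/chat",
    "body": {"model": "llama3.1:cloud", "messages": messages, "stream": False},
}
```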

### OpenRouter 🇺🇸

35+ free models (marked with the `:free` suffix). OpenAI SDK-compatible.[^5]

Base URL: https://openrouter.ai/api/v1

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| deepseek/deepseek-r1-0528:free | 163K | ~163K | Text (reasoning) | 20 RPM, 200 RPD |
| deepseek/deepseek-chat-v3-0324:free | 163K | 163K | Text | 20 RPM, 200 RPD |
| qwen/qwen3.6-plus:free | 1M | 65K | Text | 20 RPM, 200 RPD |
| qwen/qwen3-coder-480b-a35b:free | 262K | ~32K | Text | 20 RPM, 200 RPD |
| meta-llama/llama-4-scout:free | 10M | 16K | Multimodal | 20 RPM, 200 RPD |
| meta-llama/llama-4-maverick:free | 1M | 16K | Multimodal | 20 RPM, 200 RPD |
| meta-llama/llama-3.3-70b-instruct:free | 65K | ~16K | Text | 20 RPM, 200 RPD |
| google/gemma-4-31b-it:free | 256K | ~8K | Multimodal | 20 RPM, 200 RPD |
| nvidia/nemotron-3-super-120b-a12b:free | 1M | ~32K | Text | 20 RPM, 200 RPD |
| openai/gpt-oss-120b:free | 131K | 131K | Text | 20 RPM, 200 RPD |
| minimax/minimax-m2.5:free | 196K | 8K | Text | 20 RPM, 200 RPD |
| mistralai/devstral-2512:free | 256K | ~32K | Text | 20 RPM, 200 RPD |
| + ~23 more free models | Varies | Varies | Text / Image | 20 RPM, 200 RPD |
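OpenRouter's model-fallback feature (chaining models in priority order, per the footnote) is expressed as a `models` list in the request body. A sketch of such a body, built but not sent, using free models from the table above:

```python
# Sketch: an OpenRouter chat request body chaining free models in
# priority order via the documented `models` fallback list.
body = {
    "model": "deepseek/deepseek-chat-v3-0324:free",
    "models": [
        "deepseek/deepseek-chat-v3-0324:free",
        "meta-llama/llama-3.3-70b-instruct:free",
        "openai/gpt-oss-120b:free",
    ],
    "messages": [{"role": "user", "content": "Hello"}],
}
```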

### SiliconFlow 🇨🇳

Free tier with 14 CNY signup credits. Permanently free models available.

Base URL: https://api.siliconflow.cn/v1

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| Qwen/Qwen3-8B | 131K | 131K | Text | 1,000 RPM, 50K TPM |
| deepseek-ai/DeepSeek-R1-0528-Qwen3-8B | ~33K | 16K | Text (reasoning) | 1,000 RPM, 50K TPM |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 131K | Configurable | Text (reasoning) | 1,000 RPM, 50K TPM |
| THUDM/glm-4-9b-chat | 32K | 32K | Text | 1,000 RPM, 50K TPM |
| THUDM/GLM-4.1V-9B-Thinking | 66K | 66K | Vision + Text | 1,000 RPM, 50K TPM |
| deepseek-ai/DeepSeek-OCR | — | 8K | Vision (OCR) | 1,000 RPM, 50K TPM |
| + embedding/speech models | Varies | Varies | Embeddings, Speech | 1,000 RPM, 50K TPM |

## Contributing

Know a free tier that's missing? Open a PR. Include the provider, endpoint, rate limits (link to their docs), and a few notable models. Trial credits and time-limited promos don't count.

## Glossary

| Abbreviation | Meaning |
| --- | --- |
| RPM | Requests per minute |
| RPD | Requests per day |
| TPM | Tokens per minute |
| TPD | Tokens per day |
| RPS | Requests per second |
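When a provider lists both RPM and RPD, the effective daily budget is whichever cap is hit first. A tiny helper illustrating the arithmetic, using OpenRouter's free-tier numbers (20 RPM, 200 RPD) as an example:

```python
# Sketch: given RPM and RPD caps, how many requests can a steady client
# make per day, and which cap binds?
def daily_budget(rpm: int, rpd: int) -> tuple[int, str]:
    by_rpm = rpm * 60 * 24  # requests/day if only the RPM cap applied
    if by_rpm < rpd:
        return by_rpm, "RPM"
    return rpd, "RPD"

budget, binding = daily_budget(rpm=20, rpd=200)
# 20 RPM would allow 28,800 requests/day, so the 200 RPD cap binds.
```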

## Notes

- All endpoints are OpenAI SDK-compatible unless noted.
- Each link points to the provider's API key page.
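Because most endpoints share the OpenAI chat-completions schema, one request builder covers nearly every provider in this list; only the base URL, key, and model name change. A stdlib-only sketch that constructs (but does not send) such a request, with Groq's base URL and a placeholder key as the example:

```python
import json
import urllib.request

# Sketch: build an OpenAI-style chat-completions request for any
# compatible provider above. "YOUR_KEY" is a placeholder API key.
def chat_request(base_url: str, api_key: str, model: str, prompt: str):
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

req = chat_request("https://api.groq.com/openai/v1", "YOUR_KEY",
                   "llama-3.3-70b-versatile", "Hello")
# send with: urllib.request.urlopen(req)
```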

## Footnotes

[^1]: Free tier not available in the EU, UK, or Switzerland (available regions).

[^2]: Groq rate limits vary by model. Llama 4 Maverick is limited to 500 RPD; most other models get 14,400 RPD (rate limits).

[^3]: The Kilo Code free model list may change over time. `nvidia/nemotron-3-super-120b-a12b:free` is for trial use only; prompts are logged by NVIDIA. The `kilo-auto/free` auto-router routes to `minimax/minimax-m2.5:free` (80%) and `stepfun/step-3.5-flash:free` (20%).

[^4]: Ollama Cloud measures usage by GPU time, not tokens or requests. The free tier is described as "light usage", with session limits resetting every 5 hours and weekly limits every 7 days. Pro (50× more) and Max (250× more) plans are available. Not OpenAI SDK-compatible; uses the Ollama API.

[^5]: Free models default to 200 RPD. A one-time purchase of $10+ in credits unlocks 1,000 RPD for free models. OpenRouter also offers a Free Models Router (`openrouter/free`) and model fallbacks for chaining models in priority order.
