What each model costs at different usage levels. Assumes 30% input / 70% output token split. DeepSeek cache column assumes 80% cache hit rate on input.
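The cost model above can be sketched as a small helper. The prices plugged in below are placeholders for illustration, not quotes from this comparison:

```python
def blended_price_per_m(input_price, output_price, input_share=0.30):
    """Blended $/M tokens under the assumed 30% input / 70% output split."""
    return input_share * input_price + (1 - input_share) * output_price

def monthly_cost(tokens_m, input_price, output_price):
    """Estimated monthly bill for tokens_m million tokens at the blended rate."""
    return tokens_m * blended_price_per_m(input_price, output_price)

# Hypothetical rates of $0.28/M input and $0.44/M output work out to
# ~$0.39/M blended, or ~$3.14 at 8M tokens/mo (before any cache discount).
```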
Recommendation
Gemini 3.1 Flash Lite (primary) + DeepSeek V3.2 (fallback)
Both scored 4.45 on quality — tied for #1. Flash Lite is 4x faster (164 vs 39 tok/s) and runs on Google's free tier. DeepSeek is cheapest if free-tier limits are hit. The current Phipps config already has this — keep it.
$0/mo (free tier) to ~$1.27/mo (if fallback kicks in)
If you engage Phipps heavily
DeepSeek V3.2 primary
If you exceed Google's free tier (1K req/day), swap DeepSeek to primary. Same quality (4.45), and caching makes it near-free on input. At 8M tok/mo it's still only $3.02.
$0.39/M blended → ~$1.27–$3.02/mo
Best quality (if needed)
GPT-5.4 Mini
Highest eval score (4.42) in the prior run. Fast at 255 tok/s. But output pricing is 10x DeepSeek's, and the quality gap vs the top 2 is negligible.
$3.56/M blended → ~$13.50/mo
Skip these
Kimi K2.5, MiMo V2 Pro
Kimi: 90s avg latency, 96.7% reliability, mediocre quality (4.08) at premium price. MiMo: 4.07 quality at $2.50/M — no advantage over cheaper models. Both are out.
Poor value — remove from fallback chain
Best monthly plan
Google AI Pro ($19.99/mo)
Includes $10/mo Google Cloud credits usable for Gemini API. Covers ~26M output tokens of Flash Lite — 6x your moderate usage. Plus: 2TB storage, Jules coding agent, Veo 3.1. If you're going to pay for a plan, this is it — the #1 eval model's provider giving you API credits.
$19.99/mo — $10 of it directly funds Phipps API calls
Other plans
MiniMax Coding $10/mo · Alibaba Pro $50/mo
MiniMax gives M2.7 (scored 4.30) for $10/mo fixed. Alibaba Pro gives multi-model (Qwen 3.5+, Kimi K2.5, GLM-5, M2.5) for $50/mo ($15 intro). Neither beats pay-as-you-go DeepSeek/Gemini at your usage level.
Niche — only if you want those specific models
OpenRouter
No subscription plans. Buy credits ($5–$25K); a 5.5% fee is charged at purchase. Zero per-token markup — you pay provider rates. BYOK: 1M req/mo free, then 5%. Your account: $130.12 all-time, $35.25 this month.
Monthly spend
$35.25 / ~$50 projected · includes 5.5% fee
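The fee math is simple. A sketch assuming the 5.5% fee is added on top of the credit amount at purchase (any minimum-fee edge case is ignored):

```python
def topup_outlay(credits_usd, fee_rate=0.055):
    """Cash paid up front to receive credits_usd of usable OpenRouter credit,
    assuming the fee is added on top of the purchase amount."""
    return credits_usd * (1 + fee_rate)

# Covering this month's $35.25 of usage costs ~$37.19 up front.
```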
Google Gemini
Free tier: 5–15 RPM, 1K req/day, no credit card. Covers light Phipps use.
Google AI Pro ($19.99/mo): Includes $10/mo Google Cloud credits usable for Gemini API calls via AI Studio or Vertex AI. That's ~26M output tokens of Flash Lite — far more than Phipps would use. Also: 2TB storage, Veo 3.1, Jules coding agent, 128K context.
Google AI Ultra ($249.99/mo): $100/mo cloud credits, 1M context, Deep Think, Gemini Agent. Overkill for Phipps.
Batch API: 50% off all models. Tiered billing starts Apr 1, 2026.
Free tier covers Phipps. Pro $20/mo = $10 API credits if needed.
DeepSeek
Pay-as-you-go only. 5M free tokens for new accounts (30-day expiry). Cache read hits cost $0.028/M input (90% off). The Phipps system prompt (~4K tokens) cache-hits after the first message. At moderate use: ~$1.27/mo with cache.
Cache = hidden discount
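The cache discount blends in like this. The $0.28/M full input price is inferred from the stated 90%-off cache rate, and the 80% hit rate follows the assumption in the cost table above:

```python
def effective_input_price(full_price, cache_hit_price, hit_rate=0.80):
    """Per-M input price once cache hits are blended in."""
    return hit_rate * cache_hit_price + (1 - hit_rate) * full_price

# DeepSeek-style numbers: full $0.28/M, cache hit $0.028/M
# -> ~$0.078/M effective, a 72% cut in input spend at an 80% hit rate.
```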
OpenAI
ChatGPT subs (Plus $20, Pro $200) give ZERO API access. API is entirely separate, pay-as-you-go. Usage tiers unlock rate limits. Cache: 50–90% off repeated input (90% for GPT-5 family). Batch API: 50% off, 24h processing. No plan helps with Phipps routing.
MiniMax
The Coding Plan now includes M2.7 (upgraded from M2.1). Starter $10/mo (100 prompts/5hr), Plus $20/mo (300), Max $50/mo (1000). Annual billing saves ~17%. The plan issues an sk-sp- API key usable in coding tools. Pay-as-you-go M2.7: $0.30/$1.20 per M. Auto-caching included.
$10/mo for M2.7 access
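At the listed pay-as-you-go rates, the $10/mo plan's break-even point is easy to compute (30/70 split assumed, caching ignored):

```python
def breakeven_tokens_m(plan_usd, input_price, output_price, input_share=0.30):
    """Millions of tokens/mo at which pay-as-you-go spend equals the plan fee."""
    blended = input_share * input_price + (1 - input_share) * output_price
    return plan_usd / blended

# M2.7 at $0.30/$1.20 per M -> ~$0.93/M blended,
# so the $10 plan only pays off above ~10.8M tokens/mo.
```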
Moonshot / Kimi
Moderato $19/mo is chat-only — does NOT include API credits. API is separate pay-as-you-go. Cache hits: $0.10/M input (75% off, automatic). OpenRouter is cheaper than direct for K2.5 ($0.45 vs $0.60 input).
Alibaba / Qwen
Pro Coding Plan $50/mo ($15 intro). Multi-model: Qwen 3.5+, Kimi K2.5, GLM-5, MiniMax M2.5 — 8 models, one API key. 90K req/mo (effective ~3K–18K prompts). Lite plan ($10/mo) discontinued Mar 20.
Multi-model bundle