Phipps Daily Driver

Model cost × quality × usage projection — March 21, 2026 — eval complete (all 9 models scored)

Candidates: 9 — all scored
Current primary: Gemini 3.1 Flash Lite — since Mar 7
OpenRouter spend: $35.25/month · $24.50/wk · $130 all-time
Cheapest blended: $0.39/M — DeepSeek V3.2 direct
Best eval score: 4.45 — DeepSeek V3.2 and Gemini 3.1 Flash Lite (tied); GPT-5.4 Mini next at 4.42 ($3.56 blended)

1 Model Comparison

| Model | Provider | Input $/M | Output $/M | Blended $/M | Cache $/M | Eval | Speed | Plans | Verdict |
|---|---|---|---|---|---|---|---|---|---|
| DeepSeek V3.2 | Direct | $0.28 | $0.42 | $0.39 | $0.028 (90% off input) | 4.45 | 39 | Pay-as-you-go | Winner |
| MiniMax M2.7 | Direct | $0.30 | $1.20 | $0.98 | — | 4.30 | — | Coding Plan $10/mo (100 prompts/5hr) | Value pick |
| GPT-5.4 Nano | OpenAI | $0.20 | $1.25 | $0.99 | $0.075 (62% off input) | 4.03 | 194 | Pay-as-you-go (subs include no API) | Fastest |
| Gemini 3.1 Flash Lite | Google | $0.25 | $1.50 | $1.19 | — | 4.45 | 164 | Free tier (5–15 RPM, 1K req/day) | Co-winner |
| Kimi K2.5 | Moonshot (OR cheaper) | $0.60 (OR $0.45) | $2.50 (OR $2.20) | $1.76 via OR | $0.10 (75% off input) | 4.08 | 35 | Moderato $19/mo is chat-only, API separate; also in Alibaba Pro $50/mo | Slow + pricey |
| Qwen 3.5 397B | OpenRouter (direct 2.3x cheaper) | $0.39 (direct $0.17) | $2.34 (direct $1.00) | $1.85 (direct ~$0.79) | — | 4.25 | 53 | Alibaba Pro $50/mo — 90K req, multi-model ($15 intro) | Overpriced via OR |
| Gemini 3 Flash | Google | $0.50 | $3.00 | $2.38 | — | 4.28 | 168 | Free tier | Solid |
| MiMo V2 Pro | OpenRouter | $1.00 | $3.00 | $2.50 | — | 4.07 | 67 | Pay-as-you-go | Not worth it |
| GPT-5.4 Mini | OpenAI | $0.75 | $4.50 | $3.56 | $0.075 (90% off input) | 4.42 | 255 | Pay-as-you-go | Quality king |

Blended = (1 × input + 3 × output) / 4, per 1M tokens. Speed = tokens/sec. Prices verified 2026-03-21.
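The blended column follows directly from the footnote's formula; a minimal Python sketch (prices copied from the table) to spot-check it:

```python
# Blended $/M = (1 * input + 3 * output) / 4 -- the table footnote's formula,
# i.e. a fixed 1:3 input:output token mix.
def blended(input_per_m: float, output_per_m: float) -> float:
    return (1 * input_per_m + 3 * output_per_m) / 4

# Spot-check against three rows of the table ($/M list prices).
for model, (inp, out) in {
    "DeepSeek V3.2": (0.28, 0.42),          # table: $0.39
    "Gemini 3.1 Flash Lite": (0.25, 1.50),  # table: $1.19
    "GPT-5.4 Mini": (0.75, 4.50),           # table: $3.56
}.items():
    print(f"{model}: ${blended(inp, out):.2f}/M")
```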

2 Monthly Cost Projections

What each model costs at different usage levels. Assumes a 30% input / 70% output token split; the cache column assumes an 80% hit rate on input tokens.

| Model | Light (2M tok/mo) | Moderate (4M tok/mo) | Heavy (8M tok/mo) | w/ Cache (Moderate, if available) |
|---|---|---|---|---|
| DeepSeek V3.2 | $0.76 | $1.51 | $3.02 | $1.27 |
| MiniMax M2.7 | $1.86 | $3.72 | $7.44 | — |
| GPT-5.4 Nano | $1.87 | $3.74 | $7.48 | $3.62 |
| Gemini 3.1 Flash Lite | $2.25 | $4.50 | $9.00 | — |
| Kimi K2.5 (OR) | $3.35 | $6.70 | $13.40 | — |
| Gemini 3 Flash | $4.50 | $9.00 | $18.00 | — |
| GPT-5.4 Mini | $6.75 | $13.50 | $27.00 | $12.85 |

Token split: 30% input / 70% output. Cache assumes an 80% hit rate on input tokens (system-prompt reuse).
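These projections are straightforward arithmetic; a sketch that reproduces the DeepSeek V3.2 row under the stated 30/70 split and 80% cache-hit assumptions (prices from the comparison table):

```python
def monthly_cost(total_m_tokens, input_per_m, output_per_m,
                 cache_per_m=None, cache_hit=0.80, input_share=0.30):
    """Monthly API cost in dollars for a token volume given in millions."""
    input_m = total_m_tokens * input_share
    output_m = total_m_tokens * (1 - input_share)
    if cache_per_m is None:
        input_cost = input_m * input_per_m
    else:
        # Cached fraction of input bills at the cheap cache-read rate.
        input_cost = input_m * (cache_hit * cache_per_m +
                                (1 - cache_hit) * input_per_m)
    return input_cost + output_m * output_per_m

# DeepSeek V3.2 at moderate usage (4M tok/mo):
print(round(monthly_cost(4, 0.28, 0.42), 2))                     # 1.51
print(round(monthly_cost(4, 0.28, 0.42, cache_per_m=0.028), 2))  # 1.27
```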

3 Decision Matrix

Recommendation
Gemini 3.1 Flash Lite (primary) + DeepSeek V3.2 (fallback)
Both scored 4.45 quality — tied for #1. Flash Lite is 4x faster (164 vs 39 tok/s) and runs on Google's free tier. DeepSeek is cheapest if free tier limits are hit. Current Phipps config already has this — keep it.
$0/mo (free tier) to ~$1.27/mo (if fallback kicks in)
If you engage Phipps heavily
DeepSeek V3.2 primary
If you exceed Google's free tier (1K req/day), swap to DeepSeek primary. Same quality (4.45), cache makes it near-free on input. At 8M tok/mo still only $3.02.
$0.39/M blended → ~$1.27–$3.02/mo
Premium option (if needed)
GPT-5.4 Mini
Scored 4.42 — third overall, 0.03 behind the leaders. Fast at 255 tok/s. But output pricing is 10x DeepSeek's and it doesn't beat the top two on quality.
$3.56/M blended → ~$13.50/mo
Skip these
Kimi K2.5, MiMo V2 Pro
Kimi: 90s avg latency, 96.7% reliability, mediocre quality (4.08) at premium price. MiMo: 4.07 quality at $2.50/M — no advantage over cheaper models. Both are out.
Poor value — remove from fallback chain
Best monthly plan
Google AI Pro ($19.99/mo)
Includes $10/mo Google Cloud credits usable for the Gemini API — roughly 6.7M output tokens of Flash Lite at its $1.50/M list price, more than double your moderate-usage output. Plus: 2TB storage, Jules coding agent, Veo 3.1. If you're going to pay for a plan, this is it — the provider of a tied-#1 eval model giving you API credits.
$19.99/mo — $10 of it directly funds Phipps API calls
Other plans
MiniMax Coding $10/mo · Alibaba Pro $50/mo
MiniMax gives M2.7 (scored 4.30) for $10/mo fixed. Alibaba Pro gives multi-model (Qwen 3.5+, Kimi K2.5, GLM-5, M2.5) for $50/mo ($15 intro). Neither beats pay-as-you-go DeepSeek/Gemini at your usage level.
Niche — only if you want those specific models

4 Monthly Plans & Subscriptions

OpenRouter
No subscription plans. Buy credits ($5–$25K), 5.5% fee charged at purchase. Zero per-token markup — you pay provider rates. BYOK: 1M req/mo free, then 5%. Your account: $130.12 all-time, $35.25 this month.
Monthly spend
$35.25 / ~$50 projected · includes 5.5% fee
Google Gemini
Free tier: 5–15 RPM, 1K req/day, no credit card. Covers light Phipps use.

Google AI Pro ($19.99/mo): Includes $10/mo Google Cloud credits usable for Gemini API calls via AI Studio or Vertex AI. That's roughly 6.7M output tokens of Flash Lite at list price — comfortably more than Phipps would use. Also: 2TB storage, Veo 3.1, Jules coding agent, 128K context.

Google AI Ultra ($249.99/mo): $100/mo cloud credits, 1M context, Deep Think, Gemini Agent. Overkill for Phipps.

Batch API: 50% off all models. Tiered billing starts Apr 1, 2026.
Free tier covers Phipps. Pro $20/mo = $10 API credits if needed.
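How far a monthly credit stretches is just credit ÷ price; a quick sketch using the list prices from the comparison table (the actual billable rate on credits may differ):

```python
def tokens_covered_m(credit_usd: float, price_per_m: float) -> float:
    """Millions of tokens a monthly credit buys at a given $/M list price."""
    return credit_usd / price_per_m

# $10/mo Google Cloud credit against Flash Lite's table prices:
print(tokens_covered_m(10, 1.50))  # output-only tokens (M) at $1.50/M
print(tokens_covered_m(10, 1.19))  # blended tokens (M) at $1.19/M
```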
DeepSeek
Pay-as-you-go only. 5M free tokens for new accounts (30 day expiry). Cache read hits = $0.028/M input (90% off). Phipps system prompt (~4K tokens) cache-hits after first message. At moderate use: ~$1.27/mo with cache.
Cache = hidden discount
OpenAI
ChatGPT subs (Plus $20, Pro $200) give ZERO API access. API is entirely separate, pay-as-you-go. Usage tiers unlock rate limits. Cache: 50–90% off repeated input (90% for GPT-5 family). Batch API: 50% off, 24h processing. No plan helps with Phipps routing.
MiniMax
Coding Plan now includes M2.7 (upgraded from M2.1). Starter $10/mo (100 prompts/5hr), Plus $20/mo (300), Max $50/mo (1000). Annual saves ~17%. Works as sk-sp- API key for coding tools. Pay-as-you-go M2.7: $0.30/$1.20 per M. Auto-caching included.
$10/mo for M2.7 access
Moonshot / Kimi
Moderato $19/mo is chat-only — does NOT include API credits. API is separate pay-as-you-go. Cache hits: $0.10/M input (75% off, automatic). OpenRouter is cheaper than direct for K2.5 ($0.45 vs $0.60 input).
Alibaba / Qwen
Pro Coding Plan $50/mo ($15 intro). Multi-model: Qwen 3.5+, Kimi K2.5, GLM-5, MiniMax M2.5 — 8 models, one API key. 90K req/mo (effective ~3K–18K prompts). Lite plan ($10/mo) discontinued Mar 20.
Multi-model bundle

5 Eval Status

Eval v5 Harness — 9 Models, 20 Tests Each, 3-Run Averaging
All 9 models evaluated (March 21):
DeepSeek V3.2 = 4.45 · Gemini 3.1 Flash Lite = 4.45 · GPT-5.4 Mini = 4.42 · MiniMax M2.7 = 4.30 · Gemini 3 Flash = 4.28 · Qwen 3.5 397B = 4.25 · Kimi K2.5 = 4.08 · MiMo V2 Pro = 4.07 · GPT-5.4 Nano = 4.03

Local reference (Thurin):
Thurin 27B = 4.47 (Opus-distilled dense) · Thurin v1.1 80B = 4.25 (PE SFT MoE)

Key question — answered: can DeepSeek V3.2 or GPT-5.4 Nano deliver 4.25+ quality at under $1/M blended? DeepSeek V3.2 clears it easily (4.45 at $0.39/M); Nano falls short on quality (4.03). That makes DeepSeek the cheap daily driver on merit, with Flash Lite matching its quality for free.
Bottom Line — What To Do With Results
Scenario A: DeepSeek V3.2 scores 4.25+ — this is the actual outcome (4.45)
Switch primary to DeepSeek direct. Monthly cost drops to ~$1.27 at moderate use (from $35+ on OpenRouter). Cache makes it even cheaper at higher volume. OpenRouter becomes pure fallback.

Scenario B: DeepSeek scores <4.25 but Nano scores 4.25+
GPT-5.4 Nano at $0.99 blended. Fast (194 tok/s), cached input. Good daily driver at ~$3.62/mo with cache. Fallback to M2.7.

Scenario C: Neither hits 4.25
MiniMax M2.7 (4.30, $0.98) becomes the pick. Investigate their monthly plan pricing. Or stay on Flash Lite (free) if eval scores are competitive.

In all scenarios: Moving off OpenRouter to direct APIs saves the 5.5% markup and gives access to provider-specific caching. Even at current $35/mo, that's $2/mo in pure markup.
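The markup figure checks out; a one-liner sketch of the fee arithmetic (5.5% purchase fee from the OpenRouter section, current spend from the dashboard):

```python
FEE_RATE = 0.055  # OpenRouter credit-purchase fee

def monthly_markup(spend_usd: float) -> float:
    # Credits worth $spend cost spend * (1 + FEE_RATE) to buy, so the
    # fee on a month's spend is roughly spend * FEE_RATE.
    return spend_usd * FEE_RATE

print(round(monthly_markup(35.25), 2))  # ~$1.94/mo on current spend
```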