Phipps Daily Driver

Model cost × quality × usage projection — March 21, 2026 — eval complete (all 9 models scored)

Candidates: 9 — all scored
Current primary: Gemini 3.1 Flash Lite — since Mar 7
OpenRouter spend: $35.25/month · $24.50/wk · $130 all-time
Cheapest blended: $0.39/M — DeepSeek V3.2 direct
Best eval score: 4.45 — DeepSeek V3.2 and Gemini 3.1 Flash Lite (tied); GPT-5.4 Mini next at 4.42 ($3.56 blended)

1 Model Comparison

| Model | Provider | Input $/M | Output $/M | Blended $/M | Cache $/M | Eval | Speed | Plans | Verdict |
|---|---|---|---|---|---|---|---|---|---|
| DeepSeek V3.2 | Direct | $0.28 | $0.42 | $0.39 | $0.028 (90% off input) | 4.45 | 39 | Pay-as-you-go | Winner |
| MiniMax M2.7 | Direct | $0.30 | $1.20 | $0.98 | — | 4.30 | — | Coding Plan $10/mo (100 prompts/5hr) | Value pick |
| GPT-5.4 Nano | OpenAI | $0.20 | $1.25 | $0.99 | $0.075 (62% off input) | 4.03 | 194 | Pay-as-you-go (subs include no API) | Fastest |
| Gemini 3.1 Flash Lite | Google | $0.25 | $1.50 | $1.19 | — | 4.45 | 164 | Free tier (5–15 RPM, 1K req/day) | Co-winner |
| Kimi K2.5 | Moonshot (OR cheaper) | $0.60 (OR $0.45) | $2.50 (OR $2.20) | $1.76 via OR | $0.10 (75% off input) | 4.08 | 35 | Moderato $19/mo is chat-only, API separate; also in Alibaba Pro $50/mo | Slow + pricey |
| Qwen 3.5 397B | OpenRouter (direct 2.3x cheaper) | $0.39 (direct $0.17) | $2.34 (direct $1.00) | $1.85 (direct ~$0.79) | — | 4.25 | 53 | Alibaba Pro $50/mo — 90K req, multi-model ($15 intro) | Overpriced via OR |
| Gemini 3 Flash | Google | $0.50 | $3.00 | $2.38 | — | 4.28 | 168 | Free tier | Solid |
| MiMo V2 Pro | OpenRouter | $1.00 | $3.00 | $2.50 | — | 4.07 | 67 | Pay-as-you-go | Not worth it |
| GPT-5.4 Mini | OpenAI | $0.75 | $4.50 | $3.56 | $0.075 (90% off input) | 4.42 | 255 | Pay-as-you-go | Quality king |

Blended = (1 × input + 3 × output) / 4, per 1M tokens. Speed = tokens/sec. Prices verified 2026-03-21.
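The blended column follows directly from the footnote's formula; a minimal Python sketch (prices copied from the table) to spot-check it:

```python
# Blended $/M = (1 * input + 3 * output) / 4 -- the table footnote's formula,
# i.e. a fixed 1:3 input:output token mix.
def blended(input_per_m: float, output_per_m: float) -> float:
    return (1 * input_per_m + 3 * output_per_m) / 4

# Spot-check against three rows of the table ($/M list prices).
for model, (inp, out) in {
    "DeepSeek V3.2": (0.28, 0.42),          # table: $0.39
    "Gemini 3.1 Flash Lite": (0.25, 1.50),  # table: $1.19
    "GPT-5.4 Mini": (0.75, 4.50),           # table: $3.56
}.items():
    print(f"{model}: ${blended(inp, out):.2f}/M")
```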

2 Monthly Cost Projections

What each model costs at different usage levels. Assumes a 30% input / 70% output token split; the cache column assumes an 80% hit rate on input tokens.

| Model | Light (2M tok/mo) | Moderate (4M tok/mo) | Heavy (8M tok/mo) | w/ Cache (Moderate, if available) |
|---|---|---|---|---|
| DeepSeek V3.2 | $0.76 | $1.51 | $3.02 | $1.27 |
| MiniMax M2.7 | $1.86 | $3.72 | $7.44 | — |
| GPT-5.4 Nano | $1.87 | $3.74 | $7.48 | $3.62 |
| Gemini 3.1 Flash Lite | $2.25 | $4.50 | $9.00 | — |
| Kimi K2.5 (OR) | $3.35 | $6.70 | $13.40 | — |
| Gemini 3 Flash | $4.50 | $9.00 | $18.00 | — |
| GPT-5.4 Mini | $6.75 | $13.50 | $27.00 | $12.85 |

Token split: 30% input / 70% output. Cache assumes an 80% hit rate on input tokens (system-prompt reuse).
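These projections are straightforward arithmetic; a sketch that reproduces the DeepSeek V3.2 row under the stated 30/70 split and 80% cache-hit assumptions (prices from the comparison table):

```python
def monthly_cost(total_m_tokens, input_per_m, output_per_m,
                 cache_per_m=None, cache_hit=0.80, input_share=0.30):
    """Monthly API cost in dollars for a token volume given in millions."""
    input_m = total_m_tokens * input_share
    output_m = total_m_tokens * (1 - input_share)
    if cache_per_m is None:
        input_cost = input_m * input_per_m
    else:
        # Cached fraction of input bills at the cheap cache-read rate.
        input_cost = input_m * (cache_hit * cache_per_m +
                                (1 - cache_hit) * input_per_m)
    return input_cost + output_m * output_per_m

# DeepSeek V3.2 at moderate usage (4M tok/mo):
print(round(monthly_cost(4, 0.28, 0.42), 2))                     # 1.51
print(round(monthly_cost(4, 0.28, 0.42, cache_per_m=0.028), 2))  # 1.27
```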

3 Decision Matrix

Recommendation
Gemini 3.1 Flash Lite (primary) + DeepSeek V3.2 (fallback)
Both scored 4.45 quality — tied for #1. Flash Lite is 4x faster (164 vs 39 tok/s) and runs on Google's free tier. DeepSeek is cheapest if free tier limits are hit. Current Phipps config already has this — keep it.
$0/mo (free tier) to ~$1.27/mo (if fallback kicks in)
If you engage Phipps heavily
DeepSeek V3.2 primary
If you exceed Google's free tier (1K req/day), swap to DeepSeek primary. Same quality (4.45), cache makes it near-free on input. At 8M tok/mo still only $3.02.
$0.39/M blended → ~$1.27–$3.02/mo
Premium option (if needed)
GPT-5.4 Mini
Scored 4.42 — third overall, 0.03 behind the leaders. Fast at 255 tok/s. But output pricing is 10x DeepSeek's and it doesn't beat the top two on quality.
$3.56/M blended → ~$13.50/mo
Skip these
Kimi K2.5, MiMo V2 Pro
Kimi: 90s avg latency, 96.7% reliability, mediocre quality (4.08) at premium price. MiMo: 4.07 quality at $2.50/M — no advantage over cheaper models. Both are out.
Poor value — remove from fallback chain
Best monthly plan
Google AI Pro ($19.99/mo)
Includes $10/mo Google Cloud credits usable for the Gemini API — roughly 6.7M output tokens of Flash Lite at its $1.50/M list price, more than double your moderate-usage output. Plus: 2TB storage, Jules coding agent, Veo 3.1. If you're going to pay for a plan, this is it — the provider of a tied-#1 eval model giving you API credits.
$19.99/mo — $10 of it directly funds Phipps API calls
Other plans
MiniMax Coding $10/mo · Alibaba Pro $50/mo
MiniMax gives M2.7 (scored 4.30) for $10/mo fixed. Alibaba Pro gives multi-model (Qwen 3.5+, Kimi K2.5, GLM-5, M2.5) for $50/mo ($15 intro). Neither beats pay-as-you-go DeepSeek/Gemini at your usage level.
Niche — only if you want those specific models

4 Monthly Plans & Subscriptions

OpenRouter
No subscription plans. Buy credits ($5–$25K), 5.5% fee charged at purchase. Zero per-token markup — you pay provider rates. BYOK: 1M req/mo free, then 5%. Your account: $130.12 all-time, $35.25 this month.
Monthly spend
$35.25 / ~$50 projected · includes 5.5% fee
Google Gemini
Free tier: 5–15 RPM, 1K req/day, no credit card. Covers light Phipps use.

Google AI Pro ($19.99/mo): Includes $10/mo Google Cloud credits usable for Gemini API calls via AI Studio or Vertex AI. That's roughly 6.7M output tokens of Flash Lite at list price — comfortably more than Phipps would use. Also: 2TB storage, Veo 3.1, Jules coding agent, 128K context.

Google AI Ultra ($249.99/mo): $100/mo cloud credits, 1M context, Deep Think, Gemini Agent. Overkill for Phipps.

Batch API: 50% off all models. Tiered billing starts Apr 1, 2026.
Free tier covers Phipps. Pro $20/mo = $10 API credits if needed.
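How far a monthly credit stretches is just credit ÷ price; a quick sketch using the list prices from the comparison table (the actual billable rate on credits may differ):

```python
def tokens_covered_m(credit_usd: float, price_per_m: float) -> float:
    """Millions of tokens a monthly credit buys at a given $/M list price."""
    return credit_usd / price_per_m

# $10/mo Google Cloud credit against Flash Lite's table prices:
print(tokens_covered_m(10, 1.50))  # output-only tokens (M) at $1.50/M
print(tokens_covered_m(10, 1.19))  # blended tokens (M) at $1.19/M
```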
DeepSeek
Pay-as-you-go only. 5M free tokens for new accounts (30 day expiry). Cache read hits = $0.028/M input (90% off). Phipps system prompt (~4K tokens) cache-hits after first message. At moderate use: ~$1.27/mo with cache.
Cache = hidden discount
OpenAI
ChatGPT subs (Plus $20, Pro $200) give ZERO API access. API is entirely separate, pay-as-you-go. Usage tiers unlock rate limits. Cache: 50–90% off repeated input (90% for GPT-5 family). Batch API: 50% off, 24h processing. No plan helps with Phipps routing.
MiniMax
Coding Plan now includes M2.7 (upgraded from M2.1). Starter $10/mo (100 prompts/5hr), Plus $20/mo (300), Max $50/mo (1000). Annual saves ~17%. Works as sk-sp- API key for coding tools. Pay-as-you-go M2.7: $0.30/$1.20 per M. Auto-caching included.
$10/mo for M2.7 access
Moonshot / Kimi
Moderato $19/mo is chat-only — does NOT include API credits. API is separate pay-as-you-go. Cache hits: $0.10/M input (75% off, automatic). OpenRouter is cheaper than direct for K2.5 ($0.45 vs $0.60 input).
Alibaba / Qwen
Pro Coding Plan $50/mo ($15 intro). Multi-model: Qwen 3.5+, Kimi K2.5, GLM-5, MiniMax M2.5 — 8 models, one API key. 90K req/mo (effective ~3K–18K prompts). Lite plan ($10/mo) discontinued Mar 20.
Multi-model bundle

5 Eval Status

Eval v5 Harness — 9 Models, 20 Tests Each, 3-Run Averaging
All 9 models evaluated (March 21):
DeepSeek V3.2 = 4.45 · Gemini 3.1 Flash Lite = 4.45 · GPT-5.4 Mini = 4.42 · MiniMax M2.7 = 4.30 · Gemini 3 Flash = 4.28 · Qwen 3.5 397B = 4.25 · Kimi K2.5 = 4.08 · MiMo V2 Pro = 4.07 · GPT-5.4 Nano = 4.03

Local reference (Thurin):
Thurin 27B = 4.47 (Opus-distilled dense) · Thurin v1.1 80B = 4.25 (PE SFT MoE)

Key question — answered: can DeepSeek V3.2 or GPT-5.4 Nano deliver 4.25+ quality at under $1/M blended? DeepSeek V3.2 clears it easily (4.45 at $0.39/M); Nano falls short on quality (4.03). That makes DeepSeek the cheap daily driver on merit, with Flash Lite matching its quality for free.
Bottom Line — What To Do With Results
Scenario A: DeepSeek V3.2 scores 4.25+ — this is the actual outcome (4.45)
Switch primary to DeepSeek direct. Monthly cost drops to ~$1.27 at moderate use (from $35+ on OpenRouter). Cache makes it even cheaper at higher volume. OpenRouter becomes pure fallback.

Scenario B: DeepSeek scores <4.25 but Nano scores 4.25+
GPT-5.4 Nano at $0.99 blended. Fast (194 tok/s), cached input. Good daily driver at ~$3.62/mo with cache. Fallback to M2.7.

Scenario C: Neither hits 4.25
MiniMax M2.7 (4.30, $0.98) becomes the pick. Investigate their monthly plan pricing. Or stay on Flash Lite (free) if eval scores are competitive.

In all scenarios: Moving off OpenRouter to direct APIs saves the 5.5% markup and gives access to provider-specific caching. Even at current $35/mo, that's $2/mo in pure markup.
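The markup figure checks out; a one-liner sketch of the fee arithmetic (5.5% purchase fee from the OpenRouter section, current spend from the dashboard):

```python
FEE_RATE = 0.055  # OpenRouter credit-purchase fee

def monthly_markup(spend_usd: float) -> float:
    # Credits worth $spend cost spend * (1 + FEE_RATE) to buy, so the
    # fee on a month's spend is roughly spend * FEE_RATE.
    return spend_usd * FEE_RATE

print(round(monthly_markup(35.25), 2))  # ~$1.94/mo on current spend
```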