42% of current LLM spend · 4 actionable recommendations
Projected from the last 24h × 30 days
Cost, latency, or quality drift vs. baseline
Recommendations · sorted by savings
What to fix this week.
Route short support_bot queries to gpt-4o-mini
110 of 119 support_bot calls in 24h were under 600 input tokens on claude-opus-4-7.
Short prompts don't need a frontier model. Add a length check at the dispatcher: if tokens_prompt < 600, use gpt-4o-mini; otherwise fall back. Quality drop is typically negligible at this length.
Enable prompt caching for chat_summary
84 calls share a 2.4k-token system prompt on claude-opus-4-7.
Anthropic prompt caching cuts repeated system-prompt cost by ~90% after the first hit. Set cache_control: {"type":"ephemeral"} on the system block — no code path change required on Peekr's side.
Fine-tune for support_bot (high-volume on premium model)
119 support_bot calls in 24h, 100% on premium models.
At this volume a fine-tuned smaller model typically reaches ≥95% of frontier quality on a constrained task. Sample 5k spans, fine-tune gpt-4o-mini, A/B against current. Training cost recovers in ~5 days at current spend.
Fine-tune for search_qa (high-volume on premium model)
90 search_qa calls in 24h, 100% on premium models.
At this volume a fine-tuned smaller model typically reaches ≥95% of frontier quality on a constrained task. Sample 5k spans, fine-tune gpt-4o-mini, A/B against current. Training cost recovers in ~5 days at current spend.
Anomalies · last 7 days
When things changed without you noticing.
chat_summary cost +38% vs 7-day baseline
Triggered when chat_summary defaulted back to claude-opus-4-7 on 2026-05-18. Volume held flat — the spike is purely model-mix.
Inspect a representative trace →tool.web_fetch p95 latency doubled
p95 jumped from 480ms to 980ms after the 13:00 deploy. Hit rate on the downstream proxy dropped — likely cache invalidation.
data_extraction hallucination rate up 11pp
Switched from claude-opus-4-7 to claude-sonnet-4-6 on the structured extraction prompt. Quality regressed; estimated $/correct-answer is actually higher.
Top spenders
Which users cost you the most.
| User | Share | Calls | Top feature | Models used | 24h | Projected /mo |
|---|---|---|---|---|---|---|
he u_heavy_19 | 3.6% | 40 | data_extraction | 3 | $0.652 | $19.57 |
he u_heavy_27 | 3.3% | 15 | chat_summary | 1 | $0.592 | $17.77 |
he u_heavy_37 | 2.7% | 25 | moderation | 3 | $0.493 | $14.78 |
he u_heavy_30 | 2.4% | 14 | support_bot | 1 | $0.430 | $12.90 |
he u_heavy_46 | 2.3% | 9 | code_assist | 1 | $0.421 | $12.64 |
he u_heavy_18 | 2.2% | 16 | search_qa | 2 | $0.399 | $11.98 |
Want these recommendations on your real traffic?
Sign in, mint a key, ship spans. Peekr starts surfacing optimizations the moment your first batch lands.