Insights

Save money. Fix regressions. Right-size models.

Real-time analysis of your spans. We highlight where you're overpaying and where your stack just regressed — with concrete actions, not vibes.

Available savings
$228 / mo

42% of current LLM spend · 4 actionable recommendations

Current monthly spend
$541

Projected from the last 24h × 30 days

Active anomalies
3

Cost, latency, or quality drift vs. baseline

Recommendations · sorted by savings

What to fix this week.

Cost breakdown →
Model swapsupport_botlow effort

Route short support_bot queries to gpt-4o-mini

110 of 119 support_bot calls in 24h were under 600 input tokens on claude-opus-4-7.

Now:claude-opus-4-7·110 calls/24h·303 mean input tokensProposed:gpt-4o-mini

Short prompts don't need a frontier model. Add a length check at the dispatcher: if tokens_prompt < 600, use gpt-4o-mini; otherwise fall back. Quality drop is typically negligible at this length.

Save / mo
$74
99% on feature
Prompt cachingchat_summarylow effort

Enable prompt caching for chat_summary

84 calls share a 2.4k-token system prompt on claude-opus-4-7.

Now:claude-opus-4-7·84 calls/24h·2,400 mean input tokens

Anthropic prompt caching cuts repeated system-prompt cost by ~90% after the first hit. Set cache_control: {"type":"ephemeral"} on the system block — no code path change required on Peekr's side.

Save / mo
$73
81% on feature
Fine-tunesupport_bothigh effort

Fine-tune for support_bot (high-volume on premium model)

119 support_bot calls in 24h, 100% on premium models.

At this volume a fine-tuned smaller model typically reaches ≥95% of frontier quality on a constrained task. Sample 5k spans, fine-tune gpt-4o-mini, A/B against current. Training cost recovers in ~5 days at current spend.

Save / mo
$62
75% on feature
Fine-tunesearch_qahigh effort

Fine-tune for search_qa (high-volume on premium model)

90 search_qa calls in 24h, 100% on premium models.

At this volume a fine-tuned smaller model typically reaches ≥95% of frontier quality on a constrained task. Sample 5k spans, fine-tune gpt-4o-mini, A/B against current. Training cost recovers in ~5 days at current spend.

Save / mo
$18
75% on feature

Anomalies · last 7 days

When things changed without you noticing.

costchat_summary2026-05-18 14:00

chat_summary cost +38% vs 7-day baseline

Triggered when chat_summary defaulted back to claude-opus-4-7 on 2026-05-18. Volume held flat — the spike is purely model-mix.

Inspect a representative trace →
38%
latency2026-05-19 13:18

tool.web_fetch p95 latency doubled

p95 jumped from 480ms to 980ms after the 13:00 deploy. Hit rate on the downstream proxy dropped — likely cache invalidation.

104%
qualitydata_extraction2026-05-19 09:42

data_extraction hallucination rate up 11pp

Switched from claude-opus-4-7 to claude-sonnet-4-6 on the structured extraction prompt. Quality regressed; estimated $/correct-answer is actually higher.

11%

Top spenders

Which users cost you the most.

All users →
UserShareCallsTop featureModels used24hProjected /mo
he
u_heavy_19
3.6%
40data_extraction3$0.652$19.57
he
u_heavy_27
3.3%
15chat_summary1$0.592$17.77
he
u_heavy_37
2.7%
25moderation3$0.493$14.78
he
u_heavy_30
2.4%
14support_bot1$0.430$12.90
he
u_heavy_46
2.3%
9code_assist1$0.421$12.64
he
u_heavy_18
2.2%
16search_qa2$0.399$11.98

Want these recommendations on your real traffic?

Sign in, mint a key, ship spans. Peekr starts surfacing optimizations the moment your first batch lands.

Sign in to start