- Trace p50: 3.45s (typical trace)
- Trace p95: 6.19s (tail latency)
- Trace p99: 7.20s (only the slowest 1% of traces take longer)
- Slowest model: claude-opus-4-7 (p95 2.41s per call)
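The three trace percentiles summarize end-to-end trace durations. The page does not say which percentile method it uses. The sketch below assumes the nearest-rank method, one common choice, and the function names are illustrative, not part of any tracing SDK.

```python
import math


def nearest_rank_percentile(values: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # Nearest rank: the smallest sample with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def trace_summary(durations_s: list[float]) -> dict[str, float]:
    """Summarize end-to-end trace durations (seconds) as p50, p95, and p99."""
    return {f"p{p}": nearest_rank_percentile(durations_s, p) for p in (50, 95, 99)}


# Hypothetical sample: 100 traces lasting 0.1s, 0.2s, ..., 10.0s.
durations = [i / 10 for i in range(1, 101)]
print(trace_summary(durations))
```

For that sample the summary is p50 = 5.0s, p95 = 9.5s, p99 = 9.9s. Interpolating methods (NumPy's default `percentile`, for example) can return values between samples, so small differences from a dashboard's figures are expected.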
Where the time goes
Total in-trace time, split between LLM calls and tool calls. Time outside these two buckets is spent in your own code.
- LLM time: 85% (mean 1.47s per call · 481 calls)
- Tool time: 15% (mean 358ms per call · 349 calls)
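The displayed figures are internally consistent, which allows a quick sanity check: 481 × 1.47s ≈ 707s of LLM time and 349 × 358ms ≈ 125s of tool time, so the LLM share is about 707 / 832 ≈ 85%. A minimal sketch of that arithmetic, assuming the split is taken over LLM plus tool time only. Because the means are rounded, this only approximates summing the raw span durations.

```python
def time_split(llm_mean_s: float, llm_calls: int,
               tool_mean_s: float, tool_calls: int) -> dict[str, float]:
    """Approximate the LLM vs tool split from per-call means and call counts."""
    llm_total_s = llm_mean_s * llm_calls
    tool_total_s = tool_mean_s * tool_calls
    # Assumption: the percentages are shares of LLM + tool time only;
    # time spent in your own code is not part of this denominator.
    in_trace_s = llm_total_s + tool_total_s
    return {
        "llm_total_s": llm_total_s,
        "tool_total_s": tool_total_s,
        "llm_share": llm_total_s / in_trace_s,
        "tool_share": tool_total_s / in_trace_s,
    }


split = time_split(llm_mean_s=1.47, llm_calls=481, tool_mean_s=0.358, tool_calls=349)
print(f"LLM {split['llm_share']:.0%}, tools {split['tool_share']:.0%}")
```

This prints `LLM 85%, tools 15%`, matching the bars.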
Diagnostic
LLM calls are 85% of in-trace time. Model choice or prompt length is the right place to optimize.
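The diagnostic reads like a simple rule: find the bucket that dominates and optimize there. A hypothetical version of that rule follows. The advice strings paraphrase this page, but the "largest bucket wins" logic is an assumption, not the dashboard's documented behavior.

```python
# Advice strings paraphrase this page; the selection rule below is hypothetical.
ADVICE = {
    "llm": "Optimize model choice or prompt length.",
    "tools": "Parallelize or cache tool calls.",
    "own_code": "Look at your own code before touching the model.",
}


def diagnose(llm_s: float, tool_s: float, own_code_s: float = 0.0) -> tuple[str, float, str]:
    """Return (dominant bucket, its share of the total, suggested next step)."""
    buckets = {"llm": llm_s, "tools": tool_s, "own_code": own_code_s}
    total_s = sum(buckets.values())
    if total_s <= 0:
        raise ValueError("no time recorded")
    # Hypothetical rule: whichever bucket holds the most time is the bottleneck.
    dominant = max(buckets, key=lambda name: buckets[name])
    return dominant, buckets[dominant] / total_s, ADVICE[dominant]


bucket, share, advice = diagnose(llm_s=707.07, tool_s=124.942)
print(f"{bucket} is {share:.0%} of in-trace time: {advice}")
```

With the totals from this page, the rule picks `llm` at 85%. A real diagnostic would likely also want a threshold, so that a near-even split is not reported as a clear bottleneck.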
Latency by model
Per-call latency distribution for each model. Compare apples to apples before you switch.
| Model | Calls | Mean | p50 | p95 | p99 |
|---|---|---|---|---|---|
| claude-opus-4-7 | 260 | 1.49s | 1.51s | 2.41s | 2.48s |
| gpt-4o | 90 | 1.49s | 1.55s | 2.37s | 2.49s |
| claude-sonnet-4-6 | 71 | 1.46s | 1.50s | 2.36s | 2.39s |
| gpt-4-mini | 60 | 1.39s | 1.44s | 2.35s | 2.49s |
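Per-model rows like these come from grouping individual LLM calls by model. The sketch assumes each call is available as a (model, latency in seconds) pair and again uses nearest-rank percentiles. Sorting by p95 mirrors the "Slowest model" card, which reports claude-opus-4-7 at p95 2.41s, though that ranking rule is an assumption. The model names in the sample are placeholders.

```python
import math
from collections import defaultdict
from statistics import fmean


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile; assumes values is non-empty and 0 < p <= 100."""
    ordered = sorted(values)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]


def latency_by_model(calls: list[tuple[str, float]]) -> list[dict]:
    """Group (model, latency_s) pairs by model and summarize each distribution."""
    by_model: dict[str, list[float]] = defaultdict(list)
    for model, latency_s in calls:
        by_model[model].append(latency_s)
    rows = [
        {
            "model": model,
            "calls": len(latencies),
            "mean": fmean(latencies),
            "p50": percentile(latencies, 50),
            "p95": percentile(latencies, 95),
            "p99": percentile(latencies, 99),
        }
        for model, latencies in by_model.items()
    ]
    # Slowest first by p95 (assumed to be how a "slowest model" would be picked).
    rows.sort(key=lambda row: row["p95"], reverse=True)
    return rows


# Hypothetical per-call latencies for two placeholder models.
calls = [("model-a", 1.0), ("model-a", 3.0), ("model-b", 2.0), ("model-b", 2.0)]
for row in latency_by_model(calls):
    print(row)
```

In the table above, mean per-call latency only spans 1.39s (gpt-4-mini) to 1.49s (claude-opus-4-7, gpt-4o). Comparing the whole distribution, not one headline number, is what keeps a model switch honest.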
Slowest traces
Each row splits the trace's duration into LLM time and tool time; the remainder is your own code. Open a trace's waterfall for the full breakdown.
| Span | Duration | Trace ID | App | Model | LLM | Tools |
|---|---|---|---|---|---|---|
| agent.run | 7.38s | trace_00002r | support_bot | claude-opus-4-7 | 6.42s (87%) | 856ms (12%) |
| agent.run | 7.35s | trace_00003u | search_qa | gpt-4o | 5.95s (81%) | 1.32s (18%) |
| workflow.execute | 7.20s | trace_00001o | chat_summary | claude-opus-4-7 | 7.10s (99%) | 0ms (0%) |
| agent.run | 6.90s | trace_00003t | search_qa | gpt-4o | 5.58s (81%) | 1.21s (17%) |
| agent.plan | 6.74s | trace_000007 | code_assist | claude-opus-4-7 | 5.65s (84%) | 1.00s (15%) |
| workflow.execute | 6.68s | trace_000022 | chat_summary | claude-opus-4-7 | 5.39s (81%) | 1.17s (18%) |
| workflow.execute | 6.57s | trace_00005v | chat_summary | claude-opus-4-7 | 5.82s (89%) | 639ms (10%) |
| workflow.execute | 6.56s | trace_00003e | data_extraction | claude-sonnet-4-6 | 5.92s (90%) | 533ms (8%) |
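Each slowest-trace row attributes wall-clock time to LLM calls, tool calls, and a remainder. For trace_00002r that remainder is 7.38s - 6.42s - 0.856s = 0.104s, about 1% of the trace, which this page attributes to your own code. A sketch of that breakdown, using a hypothetical `TraceBreakdown` record rather than any real SDK type:

```python
from dataclasses import dataclass


@dataclass
class TraceBreakdown:
    """Hypothetical per-trace record: wall-clock total plus LLM and tool time (seconds)."""
    trace_id: str
    total_s: float
    llm_s: float
    tool_s: float

    @property
    def own_code_s(self) -> float:
        # Everything not spent in LLM or tool calls is attributed to your own code.
        # Clamp at zero: overlapping (parallel) spans can sum to more than wall-clock time.
        return max(self.total_s - self.llm_s - self.tool_s, 0.0)

    def shares(self) -> dict[str, float]:
        return {
            "llm": self.llm_s / self.total_s,
            "tools": self.tool_s / self.total_s,
            "own_code": self.own_code_s / self.total_s,
        }


def slowest(traces: list[TraceBreakdown], n: int = 8) -> list[TraceBreakdown]:
    """Return the n longest traces, longest first."""
    return sorted(traces, key=lambda t: t.total_s, reverse=True)[:n]


# Three of the slowest traces listed above.
traces = [
    TraceBreakdown("trace_00003t", 6.90, 5.58, 1.21),
    TraceBreakdown("trace_00002r", 7.38, 6.42, 0.856),
    TraceBreakdown("trace_00001o", 7.20, 7.10, 0.0),
]
for t in slowest(traces, n=2):
    print(t.trace_id, {k: f"{v:.0%}" for k, v in t.shares().items()})
```

The clamp matters once tool calls run in parallel: percentages computed against wall-clock time can then add up to more than 100%.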
Most developers assume the LLM is the bottleneck and swap models. Often it isn't.
The waterfall tells you exactly where time went. If tools dominate, parallelize or cache. If your own code dominates, look there before touching the model.