Latency

Where time goes.

Most teams blame the LLM and start swapping models. The trace usually says otherwise. Here's where this project's time actually went in the last 24 hours.

Trace p50: 3.45s (typical trace)
Trace p95: 6.19s (tail latency)
Trace p99: 7.20s (worst 1%)
Slowest model: claude-opus-4-7 (p95 2.41s)

Where the time goes

Total in-trace milliseconds, split LLM vs tool. Anything outside these bars is your own code.

LLM time: 85% (mean 1.47s per call · 481 calls)
Tool time: 15% (mean 358ms per call · 349 calls)
Diagnostic

LLM calls account for 85% of in-trace time, so model choice and prompt length are the right places to optimize.
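The split can be checked by hand from the per-call means and call counts shown above. This is a minimal sketch that assumes each share is total time (mean × calls) divided by the combined LLM and tool time; the dashboard's exact method is not stated, and the variable names are illustrative.

```python
# Figures copied from the panel above: mean seconds per call and number of calls.
llm_mean_s, llm_calls = 1.47, 481
tool_mean_s, tool_calls = 0.358, 349

# Assumption: total time per category is mean duration times call count.
llm_total_s = llm_mean_s * llm_calls      # roughly 707 s
tool_total_s = tool_mean_s * tool_calls   # roughly 125 s
in_trace_total_s = llm_total_s + tool_total_s

llm_share = llm_total_s / in_trace_total_s
tool_share = tool_total_s / in_trace_total_s

print(f"LLM {llm_share:.0%}, tools {tool_share:.0%}")  # prints "LLM 85%, tools 15%"
```

Under that assumption the arithmetic reproduces the 85% / 15% split reported in the bars.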

Latency by model

Per-call distribution. Compare apples-to-apples before you switch.

Model              Calls  Mean   p50    p95    p99
claude-opus-4-7    260    1.49s  1.51s  2.41s  2.48s
gpt-4o             90     1.49s  1.55s  2.37s  2.49s
claude-sonnet-4-6  71     1.46s  1.50s  2.36s  2.39s
gpt-4-mini         60     1.39s  1.44s  2.35s  2.49s
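A per-model table like this can be rebuilt from raw call durations. The sketch below is one possible way to do it, not the dashboard's implementation: the `latency_by_model` helper, the sample data, and the choice of the "inclusive" interpolation method are all assumptions, so percentiles may differ slightly from the values shown here.

```python
from collections import defaultdict
from statistics import mean, quantiles

def latency_by_model(calls):
    """calls: iterable of (model_name, duration_seconds) pairs (hypothetical shape)."""
    by_model = defaultdict(list)
    for model, seconds in calls:
        by_model[model].append(seconds)

    rows = {}
    for model, durations in by_model.items():
        if len(durations) < 2:
            continue  # too few samples for a meaningful percentile
        # n=100 yields 99 cut points; index k-1 is the k-th percentile.
        cuts = quantiles(durations, n=100, method="inclusive")
        rows[model] = {
            "calls": len(durations),
            "mean": mean(durations),
            "p50": cuts[49],
            "p95": cuts[94],
            "p99": cuts[98],
        }
    return rows

# Made-up sample data, for illustration only.
sample_calls = [
    ("model-a", 1.0), ("model-a", 2.0), ("model-a", 3.0),
    ("model-b", 1.5), ("model-b", 2.5),
]
rows = latency_by_model(sample_calls)
```

For an apples-to-apples comparison, feed the function calls from the same workload and time window for every model; a model that only handles long prompts will look slower regardless of its raw speed.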

Slowest traces

Click any row to open its waterfall — the bar shows whether time went to the model or tools.

  • agent.run
    trace_00002r · support_bot · claude-opus-4-7
    7.38s
    LLM 6.42s (87%) · Tools 856ms (12%)
  • agent.run
    trace_00003u · search_qa · gpt-4o
    7.35s
    LLM 5.95s (81%) · Tools 1.32s (18%)
  • workflow.execute
    trace_00001o · chat_summary · claude-opus-4-7
    7.20s
    LLM 7.10s (99%) · Tools 0ms (0%)
  • agent.run
    trace_00003t · search_qa · gpt-4o
    6.90s
    LLM 5.58s (81%) · Tools 1.21s (17%)
  • agent.plan
    trace_000007 · code_assist · claude-opus-4-7
    6.74s
    LLM 5.65s (84%) · Tools 1.00s (15%)
  • workflow.execute
    trace_000022 · chat_summary · claude-opus-4-7
    6.68s
    LLM 5.39s (81%) · Tools 1.17s (18%)
  • workflow.execute
    trace_00005v · chat_summary · claude-opus-4-7
    6.57s
    LLM 5.82s (89%) · Tools 639ms (10%)
  • workflow.execute
    trace_00003e · data_extraction · claude-sonnet-4-6
    6.56s
    LLM 5.92s (90%) · Tools 533ms (8%)
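Since anything outside the LLM and tool bars is your own code, the remainder for each trace can be estimated by subtraction. This is a rough sketch that assumes LLM and tool spans do not overlap in time; with concurrent spans the sum can exceed the trace duration, which is why the result is clamped at zero. The `own_code_seconds` helper is illustrative, not part of any library.

```python
# (trace id, total seconds, LLM seconds, tool seconds), copied from the first three rows above.
slowest = [
    ("trace_00002r", 7.38, 6.42, 0.856),
    ("trace_00003u", 7.35, 5.95, 1.32),
    ("trace_00001o", 7.20, 7.10, 0.0),
]

def own_code_seconds(total_s, llm_s, tool_s):
    # Time not covered by LLM or tool spans; clamped at zero to absorb
    # rounding in the displayed figures and any overlapping spans.
    return max(0.0, total_s - llm_s - tool_s)

for trace_id, total_s, llm_s, tool_s in slowest:
    rest_s = own_code_seconds(total_s, llm_s, tool_s)
    print(f"{trace_id}: {rest_s * 1000:.0f}ms outside LLM and tool spans")
```

For these three traces the remainder is on the order of 100ms each, a small fraction of the 7s totals.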

Most developers assume the LLM is the bottleneck and swap models without checking. Confirm it in the trace first.

The waterfall tells you exactly where time went. If tools dominate, parallelize or cache. If your own code dominates, look there before touching the model.
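When tools do dominate, the two remedies named above might look like the following sketch. It uses only the Python standard library; `call_tool` and `cached_lookup` are hypothetical stand-ins for real tool calls, and the delays are invented for illustration. Parallelizing only helps when the calls are independent, and caching only when the tool is deterministic for a given input.

```python
import asyncio
import time
from functools import lru_cache

async def call_tool(name: str, delay_s: float) -> str:
    # Hypothetical stand-in for an I/O-bound tool call (HTTP request, DB query).
    await asyncio.sleep(delay_s)
    return f"{name} done"

async def run_sequential() -> list[str]:
    # Each call waits for the previous one: wall time is the sum of the delays.
    return [await call_tool("search", 0.1), await call_tool("lookup", 0.1)]

async def run_parallel() -> list[str]:
    # Independent calls run concurrently: wall time is about the slowest call.
    results = await asyncio.gather(call_tool("search", 0.1), call_tool("lookup", 0.1))
    return list(results)

@lru_cache(maxsize=256)
def cached_lookup(key: str) -> str:
    # Hypothetical deterministic tool; repeated keys skip the slow path.
    time.sleep(0.05)
    return key.upper()

def timed(coro_fn) -> float:
    start = time.perf_counter()
    asyncio.run(coro_fn())
    return time.perf_counter() - start

sequential_s = timed(run_sequential)
parallel_s = timed(run_parallel)
print(f"sequential {sequential_s:.2f}s, parallel {parallel_s:.2f}s")
```

After either change, rerun the workload and compare waterfalls to confirm the tool share actually dropped.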