Quality

Hallucination & faithfulness

LLM-as-judge scores from peekr.eval.Hallucination, with claim-level RAGAS verdicts where detailed mode is enabled.

Status: healthy
Evaluations: 481 LLM spans scored
Mean score: 0.812 (median 0.910)
Warnings: 73 (15.2% of evals)
Critical: 41 (8.5% of evals scored below 0.5)

Score distribution

Lower scores = more hallucinated content. Bars colored by threshold zone.

[Histogram of evaluation counts. x-axis: Hallucination score (1.0 = fully grounded).]
Threshold zones: ≥ 0.7 healthy, 0.5 – 0.7 warning, < 0.5 critical.
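
The threshold zones in the legend can be expressed as a small classifier. This is a minimal sketch, assuming the boundaries are inclusive on the lower edge (0.7 counts as healthy, 0.5 counts as warning); `zone` and `zone_shares` are hypothetical helper names, not part of peekr.

```python
from collections import Counter

HEALTHY_FROM = 0.7  # assumed from the legend: >= 0.7 is healthy
WARNING_FROM = 0.5  # assumed from the legend: 0.5 up to 0.7 is warning

def zone(score: float) -> str:
    """Map a hallucination score in [0, 1] to its threshold zone."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= HEALTHY_FROM:
        return "healthy"
    if score >= WARNING_FROM:
        return "warning"
    return "critical"

def zone_shares(scores):
    """Return {zone: (count, percent of all scores)} for a list of scores."""
    counts = Counter(zone(s) for s in scores)
    total = len(scores)
    return {
        name: (counts[name], round(100 * counts[name] / total, 1) if total else 0.0)
        for name in ("healthy", "warning", "critical")
    }
```

Applied to the counts on this page, 73 warnings and 41 critical scores out of 481 evaluations work out to 15.2% and 8.5%.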

Mean score over time

Hourly mean for the last 24h. Red dots flag hours with at least one critical score.

[Line chart. x-axis: hours ago, starting at -23h. y-axis: mean score from 0.0 to 1.0, with ticks at 0.5 and 0.7.]
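
One way to derive the hourly series is sketched below. It assumes each evaluation is a `(timestamp, score)` pair and that "critical" means a score below 0.5, as in the legend; `hourly_means` is a hypothetical helper, and the dashboard's actual aggregation may differ.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

CRITICAL_BELOW = 0.5  # assumed from the dashboard legend

def hourly_means(evals):
    """Group (timestamp, score) pairs by hour.

    Returns a list of (hour_start, mean_score, has_critical) tuples,
    sorted by hour. has_critical is True when any score in that hour
    is below CRITICAL_BELOW (the red-dot condition).
    """
    buckets = defaultdict(list)
    for ts, score in evals:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(score)
    return [
        (hour_start, mean(scores), any(s < CRITICAL_BELOW for s in scores))
        for hour_start, scores in sorted(buckets.items())
    ]
```

Note that an hour can have a healthy mean and still be flagged, because a single critical score is enough to set the flag.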

Quality by model

Routing guidance for hard queries: prefer the models with the highest mean score and the lowest critical rate.

Model              Evals  Mean score  Critical  Status
claude-opus-4-7      260       0.800      9.6%  healthy
gpt-4o                90       0.816     10.0%  healthy
gpt-4-mini            60       0.819      6.7%  healthy
claude-sonnet-4-6     71       0.843      4.2%  healthy
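
The per-model figures can be ranked for routing and cross-checked against the headline mean. The sketch below uses the values from this page; the ordering rule (highest mean score first, lowest critical rate as tie-breaker) is an assumption, since the page does not state how routing candidates are ranked, and `ModelQuality`, `rank_for_routing`, and `overall_mean` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelQuality:
    model: str
    evals: int
    mean_score: float
    critical_pct: float

# Values copied from the "Quality by model" table.
ROWS = [
    ModelQuality("claude-opus-4-7", 260, 0.800, 9.6),
    ModelQuality("gpt-4o", 90, 0.816, 10.0),
    ModelQuality("gpt-4-mini", 60, 0.819, 6.7),
    ModelQuality("claude-sonnet-4-6", 71, 0.843, 4.2),
]

def rank_for_routing(rows):
    """Sort best-first: highest mean score, then lowest critical rate (assumed rule)."""
    return sorted(rows, key=lambda r: (-r.mean_score, r.critical_pct))

def overall_mean(rows):
    """Evaluation-weighted mean score across all models."""
    total = sum(r.evals for r in rows)
    return sum(r.evals * r.mean_score for r in rows) / total
```

With these rows the evaluation counts sum to 481 and the weighted mean rounds to 0.812, matching the headline figures. The spread between models is small (0.800 to 0.843) and the sample sizes are uneven, so the ranking is a weak signal rather than a firm ordering.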

Claim verdicts

Stacked across all detailed evaluations

Supported: 187 (34.7% of claims)
Contradicted: 201 (37.3% of claims)
Unsupported: 151 (28.0% of claims)

Over 539 factual claims across detailed evaluations.
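
The verdict shares follow directly from the counts. A minimal sketch of the arithmetic, using the counts from this page (`verdict_shares` is a hypothetical helper):

```python
def verdict_shares(counts):
    """Return each verdict's share of all claims, in percent, to one decimal."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {verdict: round(100 * n / total, 1) for verdict, n in counts.items()}

# Counts copied from the "Claim verdicts" panel.
VERDICT_COUNTS = {"supported": 187, "contradicted": 201, "unsupported": 151}
```

These shares cover only evaluations run in detailed mode, so they need not track the overall mean score.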

Worst offenders

Lowest-scoring LLM spans in the last 24 hours. Each links to its full trace.

  • 0.01 · claude-opus-4-7 · tenant: soylent

    Refund approved under policy P-204(b), $189.99 returned to card ending 4421 within 3 days…

    View trace →
  • 0.04 · claude-opus-4-7 · tenant: soylent

    Q3 revenue hit $4.2B, a 38% YoY jump, with operating margin expanding to 27%…

    View trace →
  • 0.04 · gpt-4o · tenant: acme

    Q3 revenue hit $4.2B, a 38% YoY jump, with operating margin expanding to 27%…

    View trace →
  • 0.06 · gpt-4-mini · tenant: acme

    Q3 revenue hit $4.2B, a 38% YoY jump, with operating margin expanding to 27%…

    2 contradicted, 1 unsupported of 4 claims
    View trace →
  • 0.06 · gpt-4-mini · tenant: globex

    Q3 revenue hit $4.2B, a 38% YoY jump, with operating margin expanding to 27%…

    View trace →
  • 0.08 · gpt-4o · tenant: globex

    Q3 revenue hit $4.2B, a 38% YoY jump, with operating margin expanding to 27%…

    0 contradicted, 1 unsupported of 2 claims
    View trace →
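
A list like this can be produced by filtering to the last 24 hours and taking the lowest scores. The sketch below assumes each span is a dict with `score` and `scored_at` keys; those field names and the `worst_offenders` helper are hypothetical and not taken from peekr.

```python
import heapq
from datetime import datetime, timedelta

def worst_offenders(spans, now, limit=6, window=timedelta(hours=24)):
    """Return up to `limit` lowest-scoring spans scored within `window` of `now`.

    Results are ordered from lowest score upward. Spans timestamped
    after `now` are ignored.
    """
    recent = (
        s for s in spans
        if timedelta(0) <= now - s["scored_at"] <= window
    )
    return heapq.nsmallest(limit, recent, key=lambda s: s["score"])
```

`heapq.nsmallest` avoids sorting the full set of spans when only a handful of results are needed.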