
implement TrustScoreMetric for evaluating RAG source trustworthiness #2623

Open

Danish2op wants to merge 4 commits into confident-ai:main from
Danish2op:feature/trust-score-metric-14902349632280497711

Conversation

@Danish2op

Summary

Implements TrustScoreMetric as proposed in #2586.

What this adds

A new deterministic, rule-based metric that evaluates how trustworthy an
LLM's RAG output is based on the source tier of its retrieval context.

Scoring logic

| Tier | Example sources | Score |
|------|-----------------|-------|
| T1 | SEC filings | 1.0 |
| T2 | Bloomberg | 0.8 |
| T3 | news sites | 0.6 |
| T4 | blog posts | 0.4 |
| T5 | forums/AI | 0.2 |
| Unmatched | — | 0.5 |

Final score = average across all retrieval context chunks.
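The scoring logic above can be sketched in a few lines of pure Python. This is illustrative only: the function name, the substring-matching rule, and the tie-breaking choice (best tier wins when several sources match) are assumptions, not taken from the PR's actual implementation.

```python
# Tier-to-score table from the PR description.
TIER_SCORES = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2}
UNMATCHED_SCORE = 0.5  # fallback when no configured source matches a chunk


def trust_score(retrieval_context, source_tiers):
    """Average per-chunk tier scores over the retrieval context.

    `source_tiers` maps source substrings to tiers 1-5, as in the PR's
    constructor argument. Matching here is case-insensitive substring
    matching (an assumption); if several sources match one chunk, the
    best (lowest-numbered) tier is used. Behavior for an empty context
    is a guess: it falls back to the unmatched score.
    """
    if not retrieval_context:
        return UNMATCHED_SCORE
    scores = []
    for chunk in retrieval_context:
        chunk_lower = chunk.lower()
        matched = [
            tier for src, tier in source_tiers.items()
            if src.lower() in chunk_lower
        ]
        scores.append(TIER_SCORES[min(matched)] if matched else UNMATCHED_SCORE)
    return sum(scores) / len(scores)
```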

Usage

```python
from deepeval.metrics import TrustScoreMetric
from deepeval.test_case import LLMTestCase

metric = TrustScoreMetric(
    threshold=0.7,
    source_tiers={"SEC filings": 1, "news": 3, "forums": 4}
)
test_case = LLMTestCase(
    input="What was Q3 revenue?",
    actual_output="Revenue was $4.2B",
    retrieval_context=["SEC 10-Q filing: Revenue $4.2B"]
)
metric.measure(test_case)
print(metric.score)    # 1.0
print(metric.reason)   # "Matched source 'SEC filings' mapped to Tier 1..."
print(metric.success)  # True
```
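When the retrieval context mixes tiers, the final score is the arithmetic mean of the per-chunk scores. A small standalone illustration of that averaging, independent of deepeval (the chunk-to-tier assignments below are made up for the example):

```python
# Per-chunk tier scores from the table above; None marks an unmatched chunk,
# which falls back to 0.5 per the scoring logic.
tier_score = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2}

chunk_tiers = [1, 3, None]  # e.g. an SEC filing, a news site, an unmatched source
scores = [tier_score[t] if t is not None else 0.5 for t in chunk_tiers]
final = sum(scores) / len(scores)
print(round(final, 2))  # 0.7 -> passes a threshold of 0.7
```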

Testing

```bash
poetry run pytest tests/test_metrics/test_trust_score_metric.py
```

Closes #2586

google-labs-jules Bot and others added 4 commits April 22, 2026 11:57
…iness

- Create `TrustScoreMetric` that evaluates trustworthiness based on predefined `source_tiers`.
- Accept a dict mapping source substrings to tiers (1-5) and compute score logic.
- Add `TrustScoreMetric` into `deepeval/metrics/__init__.py`.
- Create extensive tests covering high, low, mixed, unmatched, and empty retrieval context scenarios.
- Provide a minimal example script `examples/getting_started/test_trust_score.py`.

Co-authored-by: Danish2op <135794381+Danish2op@users.noreply.github.com>
- Added an explanatory markdown file that details what was solved, how the TrustScoreMetric was implemented, and how to verify it.

Co-authored-by: Danish2op <135794381+Danish2op@users.noreply.github.com>
@vercel

vercel Bot commented Apr 22, 2026

@Danish2op is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.



Development

Successfully merging this pull request may close these issues.

Trust scoring as an evaluation metric — source tier and provenance
