
implement TrustScoreMetric for evaluating RAG source trustworthiness #2623

Open

Danish2op wants to merge 4 commits into confident-ai:main from
Danish2op:feature/trust-score-metric-14902349632280497711

Conversation

@Danish2op

Summary

Implements TrustScoreMetric as proposed in #2586.

What this adds

A new deterministic, rule-based metric that evaluates how trustworthy an
LLM's RAG output is based on the source tier of its retrieval context.

Scoring logic

| Tier | Example sources | Score |
|------|-----------------|-------|
| T1 | SEC filings | 1.0 |
| T2 | Bloomberg | 0.8 |
| T3 | news sites | 0.6 |
| T4 | blog posts | 0.4 |
| T5 | forums/AI | 0.2 |
| Unmatched | — | 0.5 |

Final score = average across all retrieval context chunks.
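The scoring logic above can be sketched in a few lines of pure Python. This is illustrative only: the function name, the substring-matching rule, and the tie-breaking choice (best tier wins when several sources match) are assumptions, not taken from the PR's actual implementation.

```python
# Tier-to-score table from the PR description.
TIER_SCORES = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2}
UNMATCHED_SCORE = 0.5  # fallback when no configured source matches a chunk


def trust_score(retrieval_context, source_tiers):
    """Average per-chunk tier scores over the retrieval context.

    `source_tiers` maps source substrings to tiers 1-5, as in the PR's
    constructor argument. Matching here is case-insensitive substring
    matching (an assumption); if several sources match one chunk, the
    best (lowest-numbered) tier is used. Behavior for an empty context
    is a guess: it falls back to the unmatched score.
    """
    if not retrieval_context:
        return UNMATCHED_SCORE
    scores = []
    for chunk in retrieval_context:
        chunk_lower = chunk.lower()
        matched = [
            tier for src, tier in source_tiers.items()
            if src.lower() in chunk_lower
        ]
        scores.append(TIER_SCORES[min(matched)] if matched else UNMATCHED_SCORE)
    return sum(scores) / len(scores)
```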

Usage

```python
from deepeval.metrics import TrustScoreMetric
from deepeval.test_case import LLMTestCase

metric = TrustScoreMetric(
    threshold=0.7,
    source_tiers={"SEC filings": 1, "news": 3, "forums": 4}
)
test_case = LLMTestCase(
    input="What was Q3 revenue?",
    actual_output="Revenue was $4.2B",
    retrieval_context=["SEC 10-Q filing: Revenue $4.2B"]
)
metric.measure(test_case)
print(metric.score)    # 1.0
print(metric.reason)   # "Matched source 'SEC filings' mapped to Tier 1..."
print(metric.success)  # True
```
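When the retrieval context mixes tiers, the final score is the arithmetic mean of the per-chunk scores. A small standalone illustration of that averaging, independent of deepeval (the chunk-to-tier assignments below are made up for the example):

```python
# Per-chunk tier scores from the table above; None marks an unmatched chunk,
# which falls back to 0.5 per the scoring logic.
tier_score = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2}

chunk_tiers = [1, 3, None]  # e.g. an SEC filing, a news site, an unmatched source
scores = [tier_score[t] if t is not None else 0.5 for t in chunk_tiers]
final = sum(scores) / len(scores)
print(round(final, 2))  # 0.7 -> passes a threshold of 0.7
```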

Testing

```bash
poetry run pytest tests/test_metrics/test_trust_score_metric.py
```

Closes #2586

google-labs-jules Bot and others added 4 commits April 22, 2026 11:57
…iness

- Create `TrustScoreMetric` that evaluates trustworthiness based on predefined `source_tiers`.
- Accept a dict mapping source substrings to tiers (1-5) and compute score logic.
- Add `TrustScoreMetric` into `deepeval/metrics/__init__.py`.
- Create extensive tests covering high, low, mixed, unmatched, and empty retrieval context scenarios.
- Provide a minimal example script `examples/getting_started/test_trust_score.py`.

Co-authored-by: Danish2op <135794381+Danish2op@users.noreply.github.com>
- Added an explanatory markdown file that details what was solved, how the TrustScoreMetric was implemented, and how to verify it.

Co-authored-by: Danish2op <135794381+Danish2op@users.noreply.github.com>
@vercel

vercel Bot commented Apr 22, 2026

@Danish2op is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.



Development

Successfully merging this pull request may close these issues.

Trust scoring as an evaluation metric — source tier and provenance
