confident-ai / deepeval Public

Pull requests: confident-ai/deepeval

51 Open 1,359 Closed

Pull requests list

Add MseeP.ai badge

#2637 opened Apr 28, 2026 by mseep-ai

fix: return 1.0 when no knowledge retention verdicts exist

#2636 opened Apr 28, 2026 by NgDMau

feat: Integrate OpenRouterModel into Metric utilities

#2632 opened Apr 26, 2026 by Djalal-H

Feature/trust score metric 17823921494002512507

#2625 opened Apr 22, 2026 by Danish2op

implement TrustScoreMetric for evaluating RAG source trustworthiness

#2623 opened Apr 22, 2026 by Danish2op

fix: guard against None input in trimAndLoadJson

#2620 opened Apr 19, 2026 by bongho

fix: prevent duplicate test-case rows on pytest retry

#2619 opened Apr 17, 2026 by amitkojha05

3 tasks done

fix trace crashes from concurrent access

#2616 opened Apr 16, 2026 by gauravyad86

POLLUX LLM-Judge metric

#2610 opened Apr 10, 2026 by ulyanaisaeva

Allow default model to be set via env

#2602 opened Apr 5, 2026 by A-Vamshi Collaborator

feat(test_case): make trace_dict public for post-hoc agentic evaluation

#2600 opened Apr 4, 2026 by tiffanychum Contributor

3 tasks

fix: multi-root traces silently drop root spans from evaluation and export

#2599 opened Apr 4, 2026 by aerosta Contributor

fix: batched upload permanently truncates in-memory test run

#2597 opened Apr 4, 2026 by aerosta Contributor

Add AG2 integration for multi-agent tracing

#2596 opened Apr 3, 2026 by faridun-ag2

8 tasks done

Fix/predictable temp file and race condition in gpu utils vulnerability

#2593 opened Apr 2, 2026 by AseemPrasad

fixing secure_exec sandbox escape via getattr vulnerability

#2592 opened Apr 2, 2026 by AseemPrasad

examples: add RAIL Score responsible AI evaluation example

#2591 opened Apr 2, 2026 by SumitVermakgp

add agent evaluation example with @observe and component-level tracing

#2585 opened Mar 31, 2026 by Ajay6601 Contributor

[NOT MERGABLE] OpenAI embedder changes

#2582 opened Mar 30, 2026 by A-Vamshi Collaborator • Draft

feat: add penalize_ambiguous_claims to AnswerRelevancyMetric

#2573 opened Mar 25, 2026 by Krishnachaitanyakc

3 tasks

fix(ragas): update capture_metric_type call for new telemetry signature

#2568 opened Mar 22, 2026 by sachinML

Add GoodMem integration for memory-powered retrieval

#2566 opened Mar 19, 2026 by bassammalik

4 of 5 tasks

fix: include tool and trace state in evaluation cache keys

#2561 opened Mar 19, 2026 by aerosta Contributor

fix: preserve metric snapshots when async metric tasks fail in indicator

#2560 opened Mar 18, 2026 by aerosta Contributor

feat: add native Groq model integration for high-speed evaluations

#2556 opened Mar 17, 2026 by Jayachander123

4 tasks done
