add agent evaluation example with @observe and component-level tracing #2585

Open
Ajay6601 wants to merge 2 commits into confident-ai:main from Ajay6601:feat/add-agent-eval-example

@Ajay6601
Contributor

Adds examples/agent_evaluation/ with a complete, runnable example showing DeepEval v3.0 agent evaluation capabilities:

test_agent_eval.py
Three ways to evaluate an agent:

  1. evaluate() function (quickest)
  2. pytest integration (deepeval test run)
  3. Component-level eval with @observe and update_current_span

README.md
Quick start guide with metric descriptions

Uses a mock agent with a retriever to keep the example self-contained (no external API calls needed beyond the evaluation LLM).
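
For readers who have not opened the diff, a mock agent with a retriever could look roughly like the sketch below. This is an illustration only: the document store, `retrieve`, and `mock_agent` are hypothetical names, not code taken from this PR, and the "agent" simply returns the retrieved text instead of calling an LLM.

```python
# Hypothetical mock agent; names and documents are illustrative, not from the PR.
DOCS = {
    "refunds": "Refunds are issued within 30 days of purchase.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
}


def retrieve(query: str) -> list[str]:
    """Return documents whose topic keyword appears in the query."""
    q = query.lower()
    return [text for topic, text in DOCS.items() if topic in q]


def mock_agent(query: str) -> str:
    """Answer from retrieved context only; no network or LLM call is made."""
    context = retrieve(query)
    if not context:
        return "I could not find anything relevant."
    return " ".join(context)


answer = mock_agent("How do refunds work?")
```

Because nothing here touches the network, the only component that would need an API key is whichever LLM judges the outputs during evaluation.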

… with TaskCompletion, AnswerRelevancy, custom GEval
@vercel

vercel Bot commented Mar 31, 2026

@Ajay6601 is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

@penguine-ip
Contributor

Hey @Ajay6601, the observed callback is no longer supported; component-level evals now use the evals iterator: https://deepeval.com/docs/evaluation-component-level-llm-evals#run-component-level-evals

@Ajay6601 force-pushed the feat/add-agent-eval-example branch from a67d2a6 to 0d9c390 on April 15, 2026 03:47
- Use evals_iterator loop instead of observed_callback
- Nest @observe components to form proper trace hierarchy
- Move TaskCompletionMetric to evals_iterator (trace-level)
- Keep AnswerRelevancyMetric on @observe (span-level)
- Add update_current_span for runtime test case creation
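
The nesting idea in the notes above (decorated components calling each other so that their spans form a parent/child tree, with test-case data attached at runtime) can be illustrated with a toy tracer. This is a stdlib-only sketch, not DeepEval's implementation: `toy_observe`, `set_test_case`, and `Span` are hypothetical stand-ins for `@observe`, `update_current_span`, and the library's span objects.

```python
import contextvars
import functools

# Toy stand-in for a tracing decorator; this is NOT DeepEval's implementation.
_current = contextvars.ContextVar("current_span", default=None)


class Span:
    def __init__(self, name, parent):
        self.name = name
        self.parent = parent
        self.children = []
        self.test_case = None  # filled in at runtime by set_test_case()


def toy_observe(func):
    """Record each call as a span nested under whichever span is active."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        parent = _current.get()
        span = Span(func.__name__, parent)
        if parent is not None:
            parent.children.append(span)
        token = _current.set(span)       # this span becomes the active one
        try:
            result = func(*args, **kwargs)
        finally:
            _current.reset(token)        # restore the previous active span
        wrapper.last_span = span
        return result
    return wrapper


def set_test_case(**fields):
    """Attach runtime data to the active span (the role update_current_span plays)."""
    _current.get().test_case = fields


@toy_observe
def retriever(query):
    return ["doc about " + query]


@toy_observe
def generator(query, docs):
    answer = f"Answer to {query!r} using {len(docs)} doc(s)"
    set_test_case(input=query, actual_output=answer)
    return answer


@toy_observe
def agent(query):
    return generator(query, retriever(query))


agent("refunds")
root = agent.last_span
print(root.name, [c.name for c in root.children])  # agent ['retriever', 'generator']
```

Calling `retriever` and `generator` from inside `agent` is what makes them children of the `agent` span; invoking them side by side at module level would instead produce three unrelated root spans.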
@Ajay6601 force-pushed the feat/add-agent-eval-example branch from 0d9c390 to 6567fe3 on April 15, 2026 03:49
Contributor Author

@Ajay6601 left a comment

Thanks for the review! Updated to use evals_iterator with @observe + update_current_span per the current docs. The example now demonstrates trace-level metrics via evals_iterator(metrics=[ ]) and span-level metrics via @observe(metrics=[ ]), with proper nested spans forming the trace hierarchy.
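
One way to picture why an iterator can replace a callback for trace-level metrics: a generator can hand a golden to the loop body, wait for the body to run the agent, and then score whatever was traced before moving on. The sketch below models that control flow with the standard library only. It is an assumption-laden toy, not DeepEval's code: `toy_evals_iterator`, `non_empty_output`, and `run_agent` are hypothetical names, and the metric is a trivial check standing in for an LLM-judged one.

```python
# Toy model of an "evals iterator"; NOT DeepEval's code. All names are hypothetical.
def toy_evals_iterator(goldens, traces, metrics, results):
    """Yield each golden, then score the traces the loop body produced for it."""
    for golden in goldens:
        before = len(traces)
        yield golden                       # the caller's loop body runs here
        for trace in traces[before:]:      # traces recorded during that body
            for metric in metrics:
                results.append((golden, metric.__name__, metric(trace)))


def non_empty_output(trace):
    """Trivial trace-level check standing in for an LLM-judged metric."""
    return bool(trace["output"].strip())


traces, results = [], []


def run_agent(query):
    """Pretend agent that records one trace per call."""
    output = f"echo: {query}"
    traces.append({"input": query, "output": output})
    return output


for golden in toy_evals_iterator(["hi", "bye"], traces, [non_empty_output], results):
    run_agent(golden)

print(results)
# [('hi', 'non_empty_output', True), ('bye', 'non_empty_output', True)]
```

The scoring for each golden happens when the `for` loop asks for the next item, so the loop body stays a plain call to the agent with no callback argument.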

2 participants