
fix: prevent duplicate test-case rows on pytest retry#2619

Open
amitkojha05 wants to merge 3 commits into confident-ai:main from amitkojha05:fix/flaky-test-duplicate-reporting

Conversation

@amitkojha05

Problem

When a test is marked with pytest.mark.flaky or pytest-rerunfailures, assert_test() is called again from scratch on every retry attempt. Each call flows through:

assert_test() → execute_test_cases() → update_test_run() → add_test_case()

add_test_case() blindly appended on every call, with no duplicate check.
A test that passed on the 3rd attempt would appear on the Confident AI dashboard as 2 failures + 1 pass instead of just 1 pass.

The same unconditional append also double-counted evaluation_cost — a test costing 0.01 on attempt 1 and 0.02 on attempt 2 accumulated 0.03 on the run total instead of the correct 0.02.
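The accumulation is easy to reproduce with a minimal stand-in for the run object (Case and Run here are illustrative sketches, not deepeval's actual LLMApiTestCase/TestRun classes):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Case:  # illustrative stand-in for LLMApiTestCase
    name: str
    evaluation_cost: Optional[float]

@dataclass
class Run:  # illustrative stand-in for TestRun, modelling the PRE-FIX behavior
    test_cases: List[Case] = field(default_factory=list)
    evaluation_cost: Optional[float] = None

    def add_test_case(self, case: Case) -> None:
        self.test_cases.append(case)  # unconditional append: retries pile up
        if case.evaluation_cost is not None:
            self.evaluation_cost = (self.evaluation_cost or 0.0) + case.evaluation_cost

run = Run()
run.add_test_case(Case("test_foo", 0.01))  # attempt 1 (failed)
run.add_test_case(Case("test_foo", 0.02))  # attempt 2 (passed)
assert len(run.test_cases) == 2               # two dashboard rows for one test
assert round(run.evaluation_cost, 2) == 0.03  # cost double-counted
```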

Root cause

TestRun.add_test_case() in deepeval/test_run/test_run.py — no deduplication by test-case name before appending to self.test_cases or self.conversational_test_cases.

Fix

One method changed, one file touched: deepeval/test_run/test_run.py.

add_test_case() now scans the target list for an existing entry with the same name. On a match it replaces that slot (latest attempt wins) and backs the replaced attempt's evaluation_cost out of the run total before applying the new one. New test cases still append as before via Python's for/else.

replaced_cost: Union[float, None] = None
for i, existing in enumerate(target_list):
    if existing.name == api_test_case.name:
        replaced_cost = existing.evaluation_cost
        target_list[i] = api_test_case  # latest attempt wins
        break
else:
    # no match found: genuinely new test case, append as before
    target_list.append(api_test_case)

if replaced_cost is not None and self.evaluation_cost is not None:
    self.evaluation_cost -= replaced_cost  # back out the replaced attempt's cost

Before / after

run.add_test_case(LLMApiTestCase(name="test_foo", ..., evaluationCost=0.01, success=False))  # attempt 1
run.add_test_case(LLMApiTestCase(name="test_foo", ..., evaluationCost=0.02, success=True))   # retry

# BEFORE
len(run.test_cases)  # 2    ← both attempts appear as separate dashboard rows
run.evaluation_cost  # 0.03 ← cost double-counted

# AFTER
len(run.test_cases)  # 1    ← only the final result reported
run.evaluation_cost  # 0.02 ← correct

Tests

Added tests/test_core/test_run/test_test_run_retry_overwrite.py — 10 tests, all passing on Python 3.10 and 3.12:

  • LLM: same name twice → one row, latest success wins
  • LLM: two different names → two rows (existing behaviour unchanged)
  • LLM: cost not double-counted on replace
  • LLM: retry with None cost backs out previous cost correctly
  • LLM: first attempt None cost, retry sets cost correctly
  • LLM: three retries → one row, only last cost counted
  • LLM: mixed retried + unique test → correct total cost
  • Conversational: same name twice → one row
  • Conversational: cost not double-counted
  • Conversational: two different names → two rows

Caveat

Deduplication keys on name. If two genuinely distinct test cases share the same display name in one run, the second would silently overwrite the first. Pytest node IDs are unique by default so this shouldn't occur in practice — flagging it in case maintainers prefer a stricter key (e.g. nodeid) in future.
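If maintainers did want the stricter key, one shape it could take is keying on a unique pytest nodeid instead of the display name (nodeid here is a hypothetical attribute; today's test-case classes carry only name):

```python
from typing import Dict, List, Optional

class Case:
    # Hypothetical: assumes each case carried its pytest nodeid
    # (e.g. "tests/a.py::test_foo"), which is unique per run.
    def __init__(self, nodeid: str, name: str, evaluation_cost: Optional[float]):
        self.nodeid = nodeid
        self.name = name
        self.evaluation_cost = evaluation_cost

def dedupe_by_nodeid(cases: List[Case]) -> List[Case]:
    # Later attempts with the same nodeid overwrite earlier ones, while
    # distinct tests that merely share a display name are both preserved.
    by_id: Dict[str, Case] = {}
    for case in cases:
        by_id[case.nodeid] = case
    return list(by_id.values())

cases = [
    Case("tests/a.py::test_foo", "test_foo", 0.01),
    Case("tests/b.py::test_foo", "test_foo", 0.02),  # same name, different test
]
assert len(dedupe_by_nodeid(cases)) == 2  # name-keyed dedup would collapse these
```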



vercel Bot commented Apr 17, 2026

@amitkojha05 is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

@amitkojha05
Author

@penguine-ip Please review this PR



Development

Successfully merging this pull request may close these issues.

Test cases reported multiple times with pytest.mark.flaky
