
fix: prevent duplicate test-case rows on pytest retry#2619

Open
amitkojha05 wants to merge 3 commits into confident-ai:main from amitkojha05:fix/flaky-test-duplicate-reporting

Conversation

@amitkojha05

Problem

When a test is marked with pytest.mark.flaky or pytest-rerunfailures, assert_test() is called again from scratch on every retry attempt. Each call flows through:

assert_test() → execute_test_cases() → update_test_run() → add_test_case()

add_test_case() blindly appended on every call, with no duplicate check.
A test that passed on the 3rd attempt would appear on the Confident AI dashboard as 2 failures + 1 pass instead of just 1 pass.

The same unconditional append also double-counted evaluation_cost — a test costing 0.01 on attempt 1 and 0.02 on attempt 2 accumulated 0.03 on the run total instead of the correct 0.02.
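The accumulation is easy to reproduce with a minimal stand-in for the run object (Case and Run here are illustrative sketches, not deepeval's actual LLMApiTestCase/TestRun classes):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Case:  # illustrative stand-in for LLMApiTestCase
    name: str
    evaluation_cost: Optional[float]

@dataclass
class Run:  # illustrative stand-in for TestRun, modelling the PRE-FIX behavior
    test_cases: List[Case] = field(default_factory=list)
    evaluation_cost: Optional[float] = None

    def add_test_case(self, case: Case) -> None:
        self.test_cases.append(case)  # unconditional append: retries pile up
        if case.evaluation_cost is not None:
            self.evaluation_cost = (self.evaluation_cost or 0.0) + case.evaluation_cost

run = Run()
run.add_test_case(Case("test_foo", 0.01))  # attempt 1 (failed)
run.add_test_case(Case("test_foo", 0.02))  # attempt 2 (passed)
assert len(run.test_cases) == 2               # two dashboard rows for one test
assert round(run.evaluation_cost, 2) == 0.03  # cost double-counted
```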

Root cause

TestRun.add_test_case() in deepeval/test_run/test_run.py — no deduplication by test-case name before appending to self.test_cases or self.conversational_test_cases.

Fix

One method changed, one file touched: deepeval/test_run/test_run.py.

add_test_case() now scans the target list for an existing entry with the same name. On a match it replaces that slot (latest attempt wins) and backs the replaced attempt's evaluation_cost out of the run total before applying the new one. New test cases still append as before via Python's for/else.

replaced_cost: Union[float, None] = None
for i, existing in enumerate(target_list):
    if existing.name == api_test_case.name:
        replaced_cost = existing.evaluation_cost
        target_list[i] = api_test_case  # latest attempt wins
        break
else:
    # no match found: genuinely new test case, append as before
    target_list.append(api_test_case)

if replaced_cost is not None and self.evaluation_cost is not None:
    self.evaluation_cost -= replaced_cost  # back out the replaced attempt's cost

Before / after

run.add_test_case(LLMApiTestCase(name="test_foo", ..., evaluationCost=0.01, success=False))  # attempt 1
run.add_test_case(LLMApiTestCase(name="test_foo", ..., evaluationCost=0.02, success=True))   # retry

# BEFORE
len(run.test_cases)  # 2    ← both attempts appear as separate dashboard rows
run.evaluation_cost  # 0.03 ← cost double-counted

# AFTER
len(run.test_cases)  # 1    ← only the final result reported
run.evaluation_cost  # 0.02 ← correct

Tests

Added tests/test_core/test_run/test_test_run_retry_overwrite.py — 10 tests, all passing on Python 3.10 and 3.12:

  • LLM: same name twice → one row, latest success wins
  • LLM: two different names → two rows (existing behaviour unchanged)
  • LLM: cost not double-counted on replace
  • LLM: retry with None cost backs out previous cost correctly
  • LLM: first attempt None cost, retry sets cost correctly
  • LLM: three retries → one row, only last cost counted
  • LLM: mixed retried + unique test → correct total cost
  • Conversational: same name twice → one row
  • Conversational: cost not double-counted
  • Conversational: two different names → two rows

Caveat

Deduplication keys on name. If two genuinely distinct test cases share the same display name in one run, the second would silently overwrite the first. Pytest node IDs are unique by default so this shouldn't occur in practice — flagging it in case maintainers prefer a stricter key (e.g. nodeid) in future.
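If maintainers did want the stricter key, one shape it could take is keying on a unique pytest nodeid instead of the display name (nodeid here is a hypothetical attribute; today's test-case classes carry only name):

```python
from typing import Dict, List, Optional

class Case:
    # Hypothetical: assumes each case carried its pytest nodeid
    # (e.g. "tests/a.py::test_foo"), which is unique per run.
    def __init__(self, nodeid: str, name: str, evaluation_cost: Optional[float]):
        self.nodeid = nodeid
        self.name = name
        self.evaluation_cost = evaluation_cost

def dedupe_by_nodeid(cases: List[Case]) -> List[Case]:
    # Later attempts with the same nodeid overwrite earlier ones, while
    # distinct tests that merely share a display name are both preserved.
    by_id: Dict[str, Case] = {}
    for case in cases:
        by_id[case.nodeid] = case
    return list(by_id.values())

cases = [
    Case("tests/a.py::test_foo", "test_foo", 0.01),
    Case("tests/b.py::test_foo", "test_foo", 0.02),  # same name, different test
]
assert len(dedupe_by_nodeid(cases)) == 2  # name-keyed dedup would collapse these
```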



vercel Bot commented Apr 17, 2026

@amitkojha05 is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

@amitkojha05
Author

@penguine-ip Please review this PR



Development

Successfully merging this pull request may close these issues.

Test cases reported multiple times with pytest.mark.flaky
