Hi, thanks for this great library.
I'm running into an issue when marking tests with pytest.mark.flaky. When a test only passes on a later attempt, e.g. the second try, the first attempt is logged on the confident-ai.com dashboard as a failure and the second attempt is logged as a pass.
As a result, the dashboard shows many more test cases than there are tests in the suite, because every rerun is counted as its own test case.
Is there a better way to handle retry logic other than using mark.flaky? Should this be handled internally within the test, e.g. catching AssertionError up to N times, and only raising once N attempts have failed?
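To make the second option concrete, here is a rough sketch of what I mean by retrying inside the test. `assert_with_retries` is a hypothetical helper of my own, not part of deepeval or pytest, and I don't know whether each failed `assert_test` call inside it would still be reported to the dashboard, so this may not fix the duplicate entries on its own.

```python
from typing import Callable


def assert_with_retries(check: Callable[[], None], max_attempts: int = 5) -> int:
    """Run `check` until it stops raising AssertionError.

    Hypothetical helper (not a deepeval or pytest API). Returns the number of
    attempts used, and re-raises the last AssertionError once `max_attempts`
    attempts have all failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            check()
            return attempt
        except AssertionError:
            # Only surface the failure after the final attempt.
            if attempt == max_attempts:
                raise
    raise RuntimeError("unreachable")


# Intended use inside a test (assumption: the generation is redone on each
# attempt, so the test case is built inside the callable):
#
#     def check():
#         test_case = create_basic_test_case(query)
#         assert_test(test_case, [metric])
#
#     assert_with_retries(check, max_attempts=5)
```

With this shape pytest only ever sees one pass or one failure per parametrized query, which is the behaviour I'd like the dashboard to reflect.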
Minimal example
This won't work "out-of-the-box", but should show how I'm constructing the test cases.
import pytest
from deepeval import assert_test
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCaseParams


@pytest.mark.parametrize(
    "query",
    # big list of queries
    [...]
)
@pytest.mark.flaky(reruns=5)
def test_with_flaky(query: str):
    # Assume that this takes the query, makes a generation with the agent
    # and then returns a `deepeval.LLMTestCase` object
    test_case = create_basic_test_case(query)
    metric = GEval(
        name="flaky test",
        criteria=(
            "..."
        ),
        evaluation_params=[LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT],
    )
    # Maybe fails the first time, but works the second
    assert_test(test_case, [metric])
And then the commands to run the tests from the CLI:
poetry run deepeval login --api-key $CONFIDENT_API_KEY
poetry run deepeval test run test_file.py -n 2