docs/content/docs/(concepts)/evaluation-datasets.mdx (+34 −2)
@@ -356,9 +356,9 @@ By default the `save_as()` method only saves the `Golden`s within your `EvaluationDataset`
### Load Dataset
- `deepeval` offers support for loading datasets stored in JSON files, CSV files, and hugging face datasets into an `EvaluationDataset` as either test cases or goldens.
+ `deepeval` offers support for loading datasets stored in JSON, JSONL, CSV, and Hugging Face datasets into an `EvaluationDataset` as either test cases or goldens.

Loading datasets as goldens is especially helpful if you're looking to generate LLM `actual_output`s at evaluation time. You might find yourself in this situation if you are generating data for testing or using historical data from production.
:::
</Tab>
<Tab value="From JSONL">
You can load existing `Golden`s or `ConversationalGolden`s from a `.jsonl` file by supplying a `file_path`. Each line should contain one JSON object that maps to either a `Golden` or a `ConversationalGolden`.
```python
from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset()

# Add goldens from a JSONL file
dataset.add_goldens_from_jsonl_file(
    file_path="example.jsonl",
)  # file_path is the absolute path to your .jsonl file
```

For example, a `ConversationalGolden` row in `example.jsonl` could look like:

```json
{"scenario": "A user asks for help evaluating an LLM app.", "expected_outcome": "The user understands how to create an evaluation dataset.", "context": ["DeepEval supports evaluation datasets."]}
```
:::note
An `EvaluationDataset` can contain either single-turn or multi-turn goldens, but not both. If a JSONL file mixes `Golden` and `ConversationalGolden` rows, `deepeval` will raise an error.
:::
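By contrast, a single-turn `Golden` row uses fields like `input` and `expected_output`. A minimal sketch of such a line, with illustrative values:

```json
{"input": "What is an evaluation dataset?", "expected_output": "A collection of goldens used to evaluate an LLM app.", "context": ["DeepEval supports evaluation datasets."]}
```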
| Flag | Description |
| --- | --- |
| `--verbose`, `-v` | Show verbose pytest output and turn on deepeval verbose mode. |
| `--exit-on-first-failure`, `-x` | Stop after the first failed test. |
| `--show-warnings`, `-w` | Show pytest warnings instead of disabling them. |
| `--identifier`, `-id` | Attach an identifier to the test run. |
| `--num-processes`, `-n` | Run tests with multiple pytest-xdist processes. |
| `--repeat`, `-r` | Rerun each test case the specified number of times. |
| `--use-cache`, `-c` | Use cached evaluation results when `--repeat` is not set. |
| `--ignore-errors`, `-i` | Continue when deepeval evaluation errors occur. |
| `--skip-on-missing-params`, `-s` | Skip test cases with missing metric parameters. |
| `--display`, `-d` | Control final result display. Defaults to showing all results. |
| `--mark`, `-m` | Run tests matching a pytest marker expression. |
You can pass additional pytest flags after the `deepeval` options. For example:
```bash
deepeval test run tests/evals \
  --mark "not slow" \
  --exit-on-first-failure \
  -- --tb=short
```
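As another hedged sketch, combining the repeat and error-handling flags from the table above (the test path is illustrative):

```bash
# rerun each test case three times and keep going past evaluation errors
deepeval test run tests/evals --repeat 3 --ignore-errors
```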
## Confident AI Commands
Use these commands to connect `deepeval` to **Confident AI** (`deepeval` Cloud) so your local evaluations can be uploaded, organized, and viewed as rich test run reports on the cloud. If you don’t have an account yet, [sign up here](https://app.confident-ai.com).
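A minimal sketch of the typical flow, assuming you already have a Confident AI account and an API key at hand:

```bash
# authenticate this machine with Confident AI (you'll be prompted for your API key)
deepeval login

# run your evals as usual; results are uploaded as a test run report
deepeval test run tests/evals
```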
@@ -74,16 +186,18 @@ For example, **$3 / MTok = 0.000003**.
:::
To set the model and token cost for Anthropic, you would run:
For provider-specific flags, run `deepeval set-<provider> --help`.
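For instance, a hedged sketch of inspecting one provider's flags (the provider name here is just an example):

```bash
deepeval set-azure-openai --help
```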
:::
@@ -215,4 +310,4 @@ For provider-specific flags, run `deepeval set-<provider> --help`.
- **Provider still active after unsetting?** Unsetting turns off target provider `USE_*` flags; if a provider remains enabled and properly configured it will become the active provider. If no provider is enabled, but OpenAI credentials are present, OpenAI may be used as a fallback. To force a provider, run the corresponding `set-<provider>` command.
- **Dotenv edits not picked up?** deepeval loads dotenv files from the current working directory by default, or `ENV_DIR_PATH` if set. Ensure your Python process runs in that context.
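For example, a minimal sketch of overriding the dotenv lookup directory (the path is illustrative):

```bash
# point deepeval at a specific dotenv directory instead of the CWD
export ENV_DIR_PATH=/path/to/env/dir

# run from anywhere in the same shell; deepeval loads dotenv files from that directory
deepeval test run tests/evals
```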
If you’re still stuck, the dedicated [Troubleshooting](/docs/troubleshooting) page covers deeper debugging (TLS errors, logging, timeouts, dotenv loading, and config caching).
For deeper configuration details, see the method-specific pages for [generating from docs](/docs/synthesizer-generate-from-docs), [contexts](/docs/synthesizer-generate-from-contexts), [scratch](/docs/synthesizer-generate-from-scratch), and [goldens](/docs/synthesizer-generate-from-goldens).