Commit a9b6d38

docs cleanup
1 parent de6eadf commit a9b6d38

4 files changed

Lines changed: 154 additions & 139 deletions

docs/content/docs/(concepts)/evaluation-datasets.mdx

Lines changed: 34 additions & 2 deletions
@@ -356,9 +356,9 @@ By default the `save_as()` method only saves the `Golden`s within your `Evaluati
 
 ### Load Dataset
 
-`deepeval` offers support for loading datasets stored in JSON files, CSV files, and hugging face datasets into an `EvaluationDataset` as either test cases or goldens.
+`deepeval` offers support for loading datasets stored in JSON, JSONL, CSV, and Hugging Face datasets into an `EvaluationDataset` as either test cases or goldens.
 
-<Tabs items={["Confident AI", "From JSON", "From CSV"]}>
+<Tabs items={["Confident AI", "From JSON", "From JSONL", "From CSV"]}>
 <Tab value="Confident AI">
 
 You can load entire datasets on Confident AI's cloud in one line of code.
@@ -415,6 +415,38 @@ dataset.add_test_cases_from_json_file(
 Loading datasets as goldens is especially helpful if you're looking to generate LLM `actual_output`s at evaluation time. You might find yourself in this situation if you are generating data for testing or using historical data from production.
 :::
 
+</Tab>
+<Tab value="From JSONL">
+
+You can load existing `Golden`s or `ConversationalGolden`s from a `.jsonl` file by supplying a `file_path`. Each line should contain one JSON object that maps to either a `Golden` or a `ConversationalGolden`.
+
+```python
+from deepeval.dataset import EvaluationDataset
+
+dataset = EvaluationDataset()
+
+# Add goldens from a JSONL file
+dataset.add_goldens_from_jsonl_file(
+    file_path="example.jsonl",
+) # file_path is the absolute path to your .jsonl file
+```
+
+For single-turn goldens, each line can look like:
+
+```json
+{"input": "What is DeepEval?", "expected_output": "An LLM evaluation framework.", "context": ["DeepEval helps evaluate LLM apps."]}
+```
+
+For multi-turn goldens, each line can look like:
+
+```json
+{"scenario": "A user asks for help evaluating an LLM app.", "expected_outcome": "The user understands how to create an evaluation dataset.", "context": ["DeepEval supports evaluation datasets."]}
+```
+
+:::note
+An `EvaluationDataset` can contain either single-turn or multi-turn goldens, but not both. If a JSONL file mixes `Golden` and `ConversationalGolden` rows, `deepeval` will raise an error.
+:::
+
 </Tab>
 <Tab value="From CSV">
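
The JSONL tab added above only covers loading; a common next step is to turn the loaded goldens into test cases by generating `actual_output`s with your own application. A minimal sketch of that flow, where the `generate_answer` function is a hypothetical stand-in for your LLM app and the metric choice is purely illustrative:

```python
from deepeval import evaluate
from deepeval.dataset import EvaluationDataset
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def generate_answer(prompt: str) -> str:
    # Hypothetical placeholder for your LLM application.
    return "DeepEval is an open-source LLM evaluation framework."


dataset = EvaluationDataset()
dataset.add_goldens_from_jsonl_file(file_path="example.jsonl")

# Convert each loaded golden into a test case at evaluation time.
for golden in dataset.goldens:
    dataset.add_test_case(
        LLMTestCase(
            input=golden.input,
            actual_output=generate_answer(golden.input),
            expected_output=golden.expected_output,
        )
    )

evaluate(test_cases=dataset.test_cases, metrics=[AnswerRelevancyMetric()])
```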

docs/content/docs/command-line-interface.mdx

Lines changed: 119 additions & 24 deletions
@@ -9,6 +9,8 @@ sidebar_label: CLI Settings
 `deepeval` provides a CLI for managing common tasks directly from the terminal. You can use it for:
 
 - Logging in/out and viewing test runs
+- Running evaluations from test files
+- Generating synthetic goldens from docs, contexts, scratch, or existing goldens
 - Enabling/disabling debug
 - Selecting an LLM/embeddings provider (OpenAI, Azure OpenAI, Gemini, Grok, DeepSeek, LiteLLM, local/Ollama)
 - Setting/unsetting provider-specific options (model, endpoint, deployment, etc.)
@@ -41,6 +43,116 @@ deepeval also uses a legacy JSON keystore at `.deepeval/.deepeval` for **non-sec
 To disable dotenv autoloading (useful in pytest/CI to avoid loading local `.env*` files on import), set `DEEPEVAL_DISABLE_DOTENV=1`.
 :::
 
+## Core Commands
+
+### `generate`
+
+Use `deepeval generate` to generate synthetic goldens from the terminal with the Golden Synthesizer. The command requires two selectors:
+
+- `--method`: where goldens come from: `docs`, `contexts`, `scratch`, or `goldens`
+- `--variation`: what to generate: `single-turn` or `multi-turn`
+
+Generate single-turn goldens from documents:
+
+```bash
+deepeval generate \
+  --method docs \
+  --variation single-turn \
+  --documents example.txt \
+  --documents another.pdf \
+  --output-dir ./synthetic_data
+```
+
+Generate multi-turn goldens from scratch:
+
+```bash
+deepeval generate \
+  --method scratch \
+  --variation multi-turn \
+  --num-goldens 25 \
+  --scenario-context "Users asking support questions" \
+  --conversational-task "Help users solve product issues" \
+  --participant-roles "User and assistant"
+```
+
+Common options:
+
+| Option | Description |
+| --- | --- |
+| `--method docs\|contexts\|scratch\|goldens` | Select the generation method. |
+| `--variation single-turn\|multi-turn` | Select whether to generate `Golden`s or `ConversationalGolden`s. |
+| `--output-dir` | Directory where generated goldens are saved. Defaults to `./synthetic_data`. |
+| `--file-type json\|csv\|jsonl` | Output file type. Defaults to `json`. |
+| `--file-name` | Optional output filename without extension. |
+| `--model` | Model to use for generation. |
+| `--async-mode / --sync-mode` | Enable or disable concurrent generation. |
+| `--max-concurrent` | Maximum number of concurrent generation tasks. |
+| `--include-expected / --no-include-expected` | Generate or skip expected outputs/outcomes. |
+| `--cost-tracking` | Print generation cost when supported by the model. |
+
+Method-specific options:
+
+| Method | Required Options | Useful Optional Options |
+| --- | --- | --- |
+| `docs` | `--documents` | `--max-goldens-per-context`, `--max-contexts-per-document`, `--min-contexts-per-document`, `--chunk-size`, `--chunk-overlap`, `--context-quality-threshold`, `--context-similarity-threshold`, `--max-retries` |
+| `contexts` | `--contexts-file` | `--max-goldens-per-context` |
+| `scratch` | `--num-goldens` plus styling options | Single-turn: `--scenario`, `--task`, `--input-format`, `--expected-output-format`. Multi-turn: `--scenario-context`, `--conversational-task`, `--participant-roles`, `--scenario-format`, `--expected-outcome-format` |
+| `goldens` | `--goldens-file` | `--max-goldens-per-golden` |
+
+For a deeper walkthrough, see the [Golden Synthesizer](/docs/golden-synthesizer#generate-goldens-from-the-cli) docs.
+
+### `test`
+
+Use `deepeval test run` to run evaluation test files through `pytest` with the `deepeval` pytest plugin enabled.
+
+```bash
+deepeval test --help
+deepeval test run --help
+```
+
+Run a single test file:
+
+```bash
+deepeval test run test_chatbot.py
+```
+
+Run a test directory:
+
+```bash
+deepeval test run tests/evals
+```
+
+Run a specific test:
+
+```bash
+deepeval test run test_chatbot.py::test_answer_relevancy
+```
+
+Useful options:
+
+| Option | Description |
+| --- | --- |
+| `--verbose`, `-v` | Show verbose pytest output and turn on deepeval verbose mode. |
+| `--exit-on-first-failure`, `-x` | Stop after the first failed test. |
+| `--show-warnings`, `-w` | Show pytest warnings instead of disabling them. |
+| `--identifier`, `-id` | Attach an identifier to the test run. |
+| `--num-processes`, `-n` | Run tests with multiple pytest-xdist processes. |
+| `--repeat`, `-r` | Rerun each test case the specified number of times. |
+| `--use-cache`, `-c` | Use cached evaluation results when `--repeat` is not set. |
+| `--ignore-errors`, `-i` | Continue when deepeval evaluation errors occur. |
+| `--skip-on-missing-params`, `-s` | Skip test cases with missing metric parameters. |
+| `--display`, `-d` | Control final result display. Defaults to showing all results. |
+| `--mark`, `-m` | Run tests matching a pytest marker expression. |
+
+You can pass additional pytest flags after the `deepeval` options. For example:
+
+```bash
+deepeval test run tests/evals \
+  --mark "not slow" \
+  --exit-on-first-failure \
+  -- --tb=short
+```
+
 ## Confident AI Commands
 
 Use these commands to connect `deepeval` to **Confident AI** (`deepeval` Cloud) so your local evaluations can be uploaded, organized, and viewed as rich test run reports on the cloud. If you don’t have an account yet, [sign up here](https://app.confident-ai.com).
@@ -74,16 +186,18 @@ For example, **$3 / MTok = 0.000003**.
 :::
 
 To set the model and token cost for Anthropic you would run:
+
 ```bash
 deepeval set-anthropic -m claude-3-7-sonnet-latest -i 0.000003 -o 0.000015 --save=dotenv
 Saved environment variables to .env.local (ensure it's git-ignored).
 🙌 Congratulations! You're now using Anthropic `claude-3-7-sonnet-latest` for all evals that require an LLM.
 ```
 
 To view your settings for Anthropic you would run:
+
 ```bash
 deepeval settings -l anthropic
-Settings
+Settings
 ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
 ┃ Name ┃ Value ┃ Description ┃
 ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
@@ -95,26 +209,6 @@ deepeval settings -l anthropic
 └─────────────────────────────────┴──────────────────────────┴───────────────────────────────────────────────────────────────┘
 ```
 
-## Core Commands
-
-### `login` & `logout`
-
-- `deepeval login [--save=dotenv[:path]]` (interactive prompt)
-- `deepeval logout [--save=dotenv[:path]]`: Clears keys from the JSON keystore and removes them from the chosen dotenv file.
-
-### `view`
-
-- `deepeval view`: Opens the latest test run in your browser. If needed, uploads artifacts first.
-
-### `test`
-
-The CLI includes a test sub-app for running E2E examples and fixtures. Usage varies, so consult the built-in help:
-
-```bash
-deepeval test --help
-deepeval test <command> --help
-```
-
 ## Debug Controls
 
 Use these to turn on structured logs, gRPC wire tracing, and Confident tracing (all optional).
@@ -140,9 +234,10 @@ To see all available debug flags, run `deepeval set-debug --help`.
 
 :::tip
 To filter (substring match) settings by name, displaying each setting's current value and description, run:
+
 ```bash
 deepeval settings -l log-level
-Settings
+Settings
 ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
 ┃ Name ┃ Value ┃ Description ┃
 ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
@@ -151,6 +246,7 @@ deepeval settings -l log-level
 │ LOG_LEVEL │ 40 │ Global logging level (e.g. DEBUG/INFO/WARNING/ERROR/CRITICAL or numeric). │
 └─────────────────────────────────┴───────┴──────────────────────────────────────────────────────────────────────────────┘
 ```
+
 :::
 
 To restore defaults and clean persisted values:
@@ -204,7 +300,6 @@ If you want to see what environment variables `deepeval` manages under the hood,
 | Local (HTTP) | `set-local-embeddings` | `unset-local-embeddings` |
 | Ollama | `set-ollama-embeddings` | `unset-ollama-embeddings` |
 
-
 :::tip
 For provider-specific flags, run `deepeval set-<provider> --help`.
 :::
@@ -215,4 +310,4 @@ For provider-specific flags, run `deepeval set-<provider> --help`.
 - **Provider still active after unsetting?** Unsetting turns off target provider `USE_*` flags; if a provider remains enabled and properly configured it will become the active provider. If no provider is enabled, but OpenAI credentials are present, OpenAI may be used as a fallback. To force a provider, run the corresponding `set-<provider>` command.
 - **Dotenv edits not picked up?** deepeval loads dotenv files from the current working directory by default, or `ENV_DIR_PATH` if set. Ensure your Python process runs in that context.
 
-If you’re still stuck, the dedicated [Troubleshooting](/docs/troubleshooting) page covers deeper debugging (TLS errors, logging, timeouts, dotenv loading, and config caching).
+If you’re still stuck, the dedicated [Troubleshooting](/docs/troubleshooting) page covers deeper debugging (TLS errors, logging, timeouts, dotenv loading, and config caching).
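
The new `test` section above assumes test files written against `deepeval`'s pytest integration. As a rough sketch, a file like the `test_chatbot.py` referenced in those examples might look as follows (the input and output strings are made up, and the metric threshold is only illustrative):

```python
# test_chatbot.py -- runnable via `deepeval test run test_chatbot.py`
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        # In practice this would be the live output of your LLM application.
        actual_output="We offer a 30-day full refund at no extra cost.",
    )
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```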

docs/content/docs/golden-synthesizer/index.mdx

Lines changed: 0 additions & 112 deletions
@@ -180,118 +180,6 @@ Here's an example of what the resulting DataFrame might look like for a single-t
 
 And that's it! You now have access to a list of synthetic goldens generated using information from your knowledge base.
 
-## Generate Goldens From The CLI
-
-You can also generate synthetic goldens directly from the command line with `deepeval generate`. The two required selectors are:
-
-- `--method`: choose where the goldens come from: `docs`, `contexts`, `scratch`, or `goldens`.
-- `--variation`: choose the golden type: `single-turn` or `multi-turn`.
-
-The shortest document-based example looks like this:
-
-```bash
-deepeval generate \
-  --method docs \
-  --variation single-turn \
-  --documents example.txt \
-  --output-dir ./synthetic_data
-```
-
-### CLI Examples
-
-Generate goldens from documents:
-
-```bash
-deepeval generate \
-  --method docs \
-  --variation single-turn \
-  --documents example.txt \
-  --documents another.pdf \
-  --max-goldens-per-context 2
-```
-
-Generate goldens from prepared contexts:
-
-```bash
-deepeval generate \
-  --method contexts \
-  --variation single-turn \
-  --contexts-file contexts.json \
-  --max-goldens-per-context 2
-```
-
-The `contexts.json` file should contain a list of context lists:
-
-```json
-[
-  ["context chunk 1", "context chunk 2"],
-  ["another context chunk"]
-]
-```
-
-Generate goldens from scratch:
-
-```bash
-deepeval generate \
-  --method scratch \
-  --variation single-turn \
-  --num-goldens 25 \
-  --scenario "Non-technical users querying a database" \
-  --task "Answer text-to-SQL questions" \
-  --input-format "Questions in English" \
-  --expected-output-format "SQL query"
-```
-
-For multi-turn scratch generation, use the conversational styling options:
-
-```bash
-deepeval generate \
-  --method scratch \
-  --variation multi-turn \
-  --num-goldens 25 \
-  --scenario-context "Non-technical users querying a database" \
-  --conversational-task "Help users query data" \
-  --participant-roles "User and assistant"
-```
-
-Generate more goldens from an existing golden file:
-
-```bash
-deepeval generate \
-  --method goldens \
-  --variation single-turn \
-  --goldens-file existing_goldens.json \
-  --max-goldens-per-golden 2
-```
-
-### CLI Options Reference
-
-Common options:
-
-| Option | Description |
-| --- | --- |
-| `--method docs\|contexts\|scratch\|goldens` | Selects the generation method. |
-| `--variation single-turn\|multi-turn` | Selects whether to generate `Golden`s or `ConversationalGolden`s. |
-| `--output-dir` | Directory for the generated file. Defaults to `./synthetic_data`. |
-| `--file-type json\|csv\|jsonl` | Output file type. Defaults to `json`. |
-| `--file-name` | Optional output filename without extension. |
-| `--model` | Model to use for generation. |
-| `--async-mode / --sync-mode` | Enables or disables concurrent generation. |
-| `--max-concurrent` | Maximum number of concurrent generation tasks. |
-| `--include-expected / --no-include-expected` | Generates or skips expected outputs/outcomes. |
-| `--cost-tracking` | Prints generation cost when supported by the model. |
-
-Method-specific options:
-
-| Method | Required Options | Useful Optional Options |
-| --- | --- | --- |
-| `docs` | `--documents` | `--max-goldens-per-context`, `--max-contexts-per-document`, `--min-contexts-per-document`, `--chunk-size`, `--chunk-overlap`, `--context-quality-threshold`, `--context-similarity-threshold`, `--max-retries` |
-| `contexts` | `--contexts-file` | `--max-goldens-per-context` |
-| `scratch` | `--num-goldens` plus styling options | Single-turn: `--scenario`, `--task`, `--input-format`, `--expected-output-format`. Multi-turn: `--scenario-context`, `--conversational-task`, `--participant-roles`, `--scenario-format`, `--expected-outcome-format` |
-| `goldens` | `--goldens-file` | `--max-goldens-per-golden` |
-
-For deeper configuration details, see the method-specific pages for [generating from docs](/docs/synthesizer-generate-from-docs), [contexts](/docs/synthesizer-generate-from-contexts), [scratch](/docs/synthesizer-generate-from-scratch), and [goldens](/docs/synthesizer-generate-from-goldens).
-
 ## Save Your Synthetic Dataset
 
 <Tabs items={["Confident AI", "Locally"]}>

docs/lib/generated/contributors.json

Lines changed: 1 addition & 1 deletion
@@ -2396,7 +2396,7 @@
 "name": "Jeffrey Ip",
 "avatarUrl": "https://avatars.githubusercontent.com/u/143328635?v=4",
 "url": "https://github.com/penguine-ip",
-"commits": 26
+"commits": 27
 },
 {
 "login": "kritinv",
