Commit a9b6d38

docs cleanup
1 parent de6eadf commit a9b6d38

4 files changed

Lines changed: 154 additions & 139 deletions

docs/content/docs/(concepts)/evaluation-datasets.mdx

Lines changed: 34 additions & 2 deletions
@@ -356,9 +356,9 @@ By default the `save_as()` method only saves the `Golden`s within your `Evaluati
 
 ### Load Dataset
 
-`deepeval` offers support for loading datasets stored in JSON files, CSV files, and hugging face datasets into an `EvaluationDataset` as either test cases or goldens.
+`deepeval` offers support for loading datasets stored in JSON, JSONL, CSV, and Hugging Face datasets into an `EvaluationDataset` as either test cases or goldens.
 
-<Tabs items={["Confident AI", "From JSON", "From CSV"]}>
+<Tabs items={["Confident AI", "From JSON", "From JSONL", "From CSV"]}>
 <Tab value="Confident AI">
 
 You can load entire datasets on Confident AI's cloud in one line of code.
@@ -415,6 +415,38 @@ dataset.add_test_cases_from_json_file(
 Loading datasets as goldens is especially helpful if you're looking to generate LLM `actual_output`s at evaluation time. You might find yourself in this situation if you are generating data for testing or using historical data from production.
 :::
 
+</Tab>
+<Tab value="From JSONL">
+
+You can load existing `Golden`s or `ConversationalGolden`s from a `.jsonl` file by supplying a `file_path`. Each line should contain one JSON object that maps to either a `Golden` or a `ConversationalGolden`.
+
+```python
+from deepeval.dataset import EvaluationDataset
+
+dataset = EvaluationDataset()
+
+# Add goldens from a JSONL file
+dataset.add_goldens_from_jsonl_file(
+    file_path="example.jsonl",
+) # file_path is the absolute path to your .jsonl file
+```
+
+For single-turn goldens, each line can look like:
+
+```json
+{"input": "What is DeepEval?", "expected_output": "An LLM evaluation framework.", "context": ["DeepEval helps evaluate LLM apps."]}
+```
+
+For multi-turn goldens, each line can look like:
+
+```json
+{"scenario": "A user asks for help evaluating an LLM app.", "expected_outcome": "The user understands how to create an evaluation dataset.", "context": ["DeepEval supports evaluation datasets."]}
+```
+
+:::note
+An `EvaluationDataset` can contain either single-turn or multi-turn goldens, but not both. If a JSONL file mixes `Golden` and `ConversationalGolden` rows, `deepeval` will raise an error.
+:::
+
 </Tab>
 <Tab value="From CSV">
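
The JSONL tab added above only covers loading; a common next step is to turn the loaded goldens into test cases by generating `actual_output`s with your own application. A minimal sketch of that flow, where the `generate_answer` function is a hypothetical stand-in for your LLM app and the metric choice is purely illustrative:

```python
from deepeval import evaluate
from deepeval.dataset import EvaluationDataset
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def generate_answer(prompt: str) -> str:
    # Hypothetical placeholder for your LLM application.
    return "DeepEval is an open-source LLM evaluation framework."


dataset = EvaluationDataset()
dataset.add_goldens_from_jsonl_file(file_path="example.jsonl")

# Convert each loaded golden into a test case at evaluation time.
for golden in dataset.goldens:
    dataset.add_test_case(
        LLMTestCase(
            input=golden.input,
            actual_output=generate_answer(golden.input),
            expected_output=golden.expected_output,
        )
    )

evaluate(test_cases=dataset.test_cases, metrics=[AnswerRelevancyMetric()])
```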

docs/content/docs/command-line-interface.mdx

Lines changed: 119 additions & 24 deletions
@@ -9,6 +9,8 @@ sidebar_label: CLI Settings
 `deepeval` provides a CLI for managing common tasks directly from the terminal. You can use it for:
 
 - Logging in/out and viewing test runs
+- Running evaluations from test files
+- Generating synthetic goldens from docs, contexts, scratch, or existing goldens
 - Enabling/disabling debug
 - Selecting an LLM/embeddings provider (OpenAI, Azure OpenAI, Gemini, Grok, DeepSeek, LiteLLM, local/Ollama)
 - Setting/unsetting provider-specific options (model, endpoint, deployment, etc.)
@@ -41,6 +43,116 @@ deepeval also uses a legacy JSON keystore at `.deepeval/.deepeval` for **non-sec
 To disable dotenv autoloading (useful in pytest/CI to avoid loading local `.env*` files on import), set `DEEPEVAL_DISABLE_DOTENV=1`.
 :::
 
+## Core Commands
+
+### `generate`
+
+Use `deepeval generate` to generate synthetic goldens from the terminal with the Golden Synthesizer. The command requires two selectors:
+
+- `--method`: where goldens come from: `docs`, `contexts`, `scratch`, or `goldens`
+- `--variation`: what to generate: `single-turn` or `multi-turn`
+
+Generate single-turn goldens from documents:
+
+```bash
+deepeval generate \
+  --method docs \
+  --variation single-turn \
+  --documents example.txt \
+  --documents another.pdf \
+  --output-dir ./synthetic_data
+```
+
+Generate multi-turn goldens from scratch:
+
+```bash
+deepeval generate \
+  --method scratch \
+  --variation multi-turn \
+  --num-goldens 25 \
+  --scenario-context "Users asking support questions" \
+  --conversational-task "Help users solve product issues" \
+  --participant-roles "User and assistant"
+```
+
+Common options:
+
+| Option | Description |
+| --- | --- |
+| `--method docs\|contexts\|scratch\|goldens` | Select the generation method. |
+| `--variation single-turn\|multi-turn` | Select whether to generate `Golden`s or `ConversationalGolden`s. |
+| `--output-dir` | Directory where generated goldens are saved. Defaults to `./synthetic_data`. |
+| `--file-type json\|csv\|jsonl` | Output file type. Defaults to `json`. |
+| `--file-name` | Optional output filename without extension. |
+| `--model` | Model to use for generation. |
+| `--async-mode / --sync-mode` | Enable or disable concurrent generation. |
+| `--max-concurrent` | Maximum number of concurrent generation tasks. |
+| `--include-expected / --no-include-expected` | Generate or skip expected outputs/outcomes. |
+| `--cost-tracking` | Print generation cost when supported by the model. |
+
+Method-specific options:
+
+| Method | Required Options | Useful Optional Options |
+| --- | --- | --- |
+| `docs` | `--documents` | `--max-goldens-per-context`, `--max-contexts-per-document`, `--min-contexts-per-document`, `--chunk-size`, `--chunk-overlap`, `--context-quality-threshold`, `--context-similarity-threshold`, `--max-retries` |
+| `contexts` | `--contexts-file` | `--max-goldens-per-context` |
+| `scratch` | `--num-goldens` plus styling options | Single-turn: `--scenario`, `--task`, `--input-format`, `--expected-output-format`. Multi-turn: `--scenario-context`, `--conversational-task`, `--participant-roles`, `--scenario-format`, `--expected-outcome-format` |
+| `goldens` | `--goldens-file` | `--max-goldens-per-golden` |
+
+For a deeper walkthrough, see the [Golden Synthesizer](/docs/golden-synthesizer#generate-goldens-from-the-cli) docs.
+
+### `test`
+
+Use `deepeval test run` to run evaluation test files through `pytest` with the `deepeval` pytest plugin enabled.
+
+```bash
+deepeval test --help
+deepeval test run --help
+```
+
+Run a single test file:
+
+```bash
+deepeval test run test_chatbot.py
+```
+
+Run a test directory:
+
+```bash
+deepeval test run tests/evals
+```
+
+Run a specific test:
+
+```bash
+deepeval test run test_chatbot.py::test_answer_relevancy
+```
+
+Useful options:
+
+| Option | Description |
+| --- | --- |
+| `--verbose`, `-v` | Show verbose pytest output and turn on deepeval verbose mode. |
+| `--exit-on-first-failure`, `-x` | Stop after the first failed test. |
+| `--show-warnings`, `-w` | Show pytest warnings instead of disabling them. |
+| `--identifier`, `-id` | Attach an identifier to the test run. |
+| `--num-processes`, `-n` | Run tests with multiple pytest-xdist processes. |
+| `--repeat`, `-r` | Rerun each test case the specified number of times. |
+| `--use-cache`, `-c` | Use cached evaluation results when `--repeat` is not set. |
+| `--ignore-errors`, `-i` | Continue when deepeval evaluation errors occur. |
+| `--skip-on-missing-params`, `-s` | Skip test cases with missing metric parameters. |
+| `--display`, `-d` | Control final result display. Defaults to showing all results. |
+| `--mark`, `-m` | Run tests matching a pytest marker expression. |
+
+You can pass additional pytest flags after the `deepeval` options. For example:
+
+```bash
+deepeval test run tests/evals \
+  --mark "not slow" \
+  --exit-on-first-failure \
+  -- --tb=short
+```
+
 ## Confident AI Commands
 
 Use these commands to connect `deepeval` to **Confident AI** (`deepeval` Cloud) so your local evaluations can be uploaded, organized, and viewed as rich test run reports on the cloud. If you don’t have an account yet, [sign up here](https://app.confident-ai.com).
@@ -74,16 +186,18 @@ For example, **$3 / MTok = 0.000003**.
 :::
 
 To set the model and token cost for Anthropic you would run:
+
 ```bash
 deepeval set-anthropic -m claude-3-7-sonnet-latest -i 0.000003 -o 0.000015 --save=dotenv
 Saved environment variables to .env.local (ensure it's git-ignored).
 🙌 Congratulations! You're now using Anthropic `claude-3-7-sonnet-latest` for all evals that require an LLM.
 ```
 
 To view your settings for Anthropic you would run:
+
 ```bash
 deepeval settings -l anthropic
-Settings
+Settings
 ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
 ┃ Name ┃ Value ┃ Description ┃
 ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
@@ -95,26 +209,6 @@ deepeval settings -l anthropic
 └─────────────────────────────────┴──────────────────────────┴───────────────────────────────────────────────────────────────┘
 ```
 
-## Core Commands
-
-### `login` & `logout`
-
-- `deepeval login [--save=dotenv[:path]]` (interactive prompt)
-- `deepeval logout [--save=dotenv[:path]]`: Clears keys from the JSON keystore and removes them from the chosen dotenv file.
-
-### `view`
-
-- `deepeval view`: Opens the latest test run in your browser. If needed, uploads artifacts first.
-
-### `test`
-
-The CLI includes a test sub-app for running E2E examples and fixtures. Usage varies, so consult the built-in help:
-
-```bash
-deepeval test --help
-deepeval test <command> --help
-```
-
 ## Debug Controls
 
 Use these to turn on structured logs, gRPC wire tracing, and Confident tracing (all optional).
@@ -140,9 +234,10 @@ To see all available debug flags, run `deepeval set-debug --help`.
 
 :::tip
 To filter (substring match) settings by name, displaying each setting's current value and description, run:
+
 ```bash
 deepeval settings -l log-level
-Settings
+Settings
 ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
 ┃ Name ┃ Value ┃ Description ┃
 ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
@@ -151,6 +246,7 @@ deepeval settings -l log-level
 │ LOG_LEVEL │ 40 │ Global logging level (e.g. DEBUG/INFO/WARNING/ERROR/CRITICAL or numeric). │
 └─────────────────────────────────┴───────┴──────────────────────────────────────────────────────────────────────────────┘
 ```
+
 :::
 
 To restore defaults and clean persisted values:
@@ -204,7 +300,6 @@ If you want to see what environment variables `deepeval` manages under the hood,
 | Local (HTTP) | `set-local-embeddings` | `unset-local-embeddings` |
 | Ollama | `set-ollama-embeddings` | `unset-ollama-embeddings` |
 
-
 :::tip
 For provider-specific flags, run `deepeval set-<provider> --help`.
 :::
@@ -215,4 +310,4 @@ For provider-specific flags, run `deepeval set-<provider> --help`.
 - **Provider still active after unsetting?** Unsetting turns off target provider `USE_*` flags; if a provider remains enabled and properly configured it will become the active provider. If no provider is enabled, but OpenAI credentials are present, OpenAI may be used as a fallback. To force a provider, run the corresponding `set-<provider>` command.
 - **Dotenv edits not picked up?** deepeval loads dotenv files from the current working directory by default, or `ENV_DIR_PATH` if set. Ensure your Python process runs in that context.
 
-If you’re still stuck, the dedicated [Troubleshooting](/docs/troubleshooting) page covers deeper debugging (TLS errors, logging, timeouts, dotenv loading, and config caching).
+If you’re still stuck, the dedicated [Troubleshooting](/docs/troubleshooting) page covers deeper debugging (TLS errors, logging, timeouts, dotenv loading, and config caching).
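
The new `test` section above assumes test files written against `deepeval`'s pytest integration. As a rough sketch, a file like the `test_chatbot.py` referenced in those examples might look as follows (the input and output strings are made up, and the metric threshold is only illustrative):

```python
# test_chatbot.py -- runnable via `deepeval test run test_chatbot.py`
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        # In practice this would be the live output of your LLM application.
        actual_output="We offer a 30-day full refund at no extra cost.",
    )
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```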

docs/content/docs/golden-synthesizer/index.mdx

Lines changed: 0 additions & 112 deletions
@@ -180,118 +180,6 @@ Here's an example of what the resulting DataFrame might look like for a single-t
 
 And that's it! You now have access to a list of synthetic goldens generated using information from your knowledge base.
 
-## Generate Goldens From The CLI
-
-You can also generate synthetic goldens directly from the command line with `deepeval generate`. The two required selectors are:
-
-- `--method`: choose where the goldens come from: `docs`, `contexts`, `scratch`, or `goldens`.
-- `--variation`: choose the golden type: `single-turn` or `multi-turn`.
-
-The shortest document-based example looks like this:
-
-```bash
-deepeval generate \
-  --method docs \
-  --variation single-turn \
-  --documents example.txt \
-  --output-dir ./synthetic_data
-```
-
-### CLI Examples
-
-Generate goldens from documents:
-
-```bash
-deepeval generate \
-  --method docs \
-  --variation single-turn \
-  --documents example.txt \
-  --documents another.pdf \
-  --max-goldens-per-context 2
-```
-
-Generate goldens from prepared contexts:
-
-```bash
-deepeval generate \
-  --method contexts \
-  --variation single-turn \
-  --contexts-file contexts.json \
-  --max-goldens-per-context 2
-```
-
-The `contexts.json` file should contain a list of context lists:
-
-```json
-[
-  ["context chunk 1", "context chunk 2"],
-  ["another context chunk"]
-]
-```
-
-Generate goldens from scratch:
-
-```bash
-deepeval generate \
-  --method scratch \
-  --variation single-turn \
-  --num-goldens 25 \
-  --scenario "Non-technical users querying a database" \
-  --task "Answer text-to-SQL questions" \
-  --input-format "Questions in English" \
-  --expected-output-format "SQL query"
-```
-
-For multi-turn scratch generation, use the conversational styling options:
-
-```bash
-deepeval generate \
-  --method scratch \
-  --variation multi-turn \
-  --num-goldens 25 \
-  --scenario-context "Non-technical users querying a database" \
-  --conversational-task "Help users query data" \
-  --participant-roles "User and assistant"
-```
-
-Generate more goldens from an existing golden file:
-
-```bash
-deepeval generate \
-  --method goldens \
-  --variation single-turn \
-  --goldens-file existing_goldens.json \
-  --max-goldens-per-golden 2
-```
-
-### CLI Options Reference
-
-Common options:
-
-| Option | Description |
-| --- | --- |
-| `--method docs\|contexts\|scratch\|goldens` | Selects the generation method. |
-| `--variation single-turn\|multi-turn` | Selects whether to generate `Golden`s or `ConversationalGolden`s. |
-| `--output-dir` | Directory for the generated file. Defaults to `./synthetic_data`. |
-| `--file-type json\|csv\|jsonl` | Output file type. Defaults to `json`. |
-| `--file-name` | Optional output filename without extension. |
-| `--model` | Model to use for generation. |
-| `--async-mode / --sync-mode` | Enables or disables concurrent generation. |
-| `--max-concurrent` | Maximum number of concurrent generation tasks. |
-| `--include-expected / --no-include-expected` | Generates or skips expected outputs/outcomes. |
-| `--cost-tracking` | Prints generation cost when supported by the model. |
-
-Method-specific options:
-
-| Method | Required Options | Useful Optional Options |
-| --- | --- | --- |
-| `docs` | `--documents` | `--max-goldens-per-context`, `--max-contexts-per-document`, `--min-contexts-per-document`, `--chunk-size`, `--chunk-overlap`, `--context-quality-threshold`, `--context-similarity-threshold`, `--max-retries` |
-| `contexts` | `--contexts-file` | `--max-goldens-per-context` |
-| `scratch` | `--num-goldens` plus styling options | Single-turn: `--scenario`, `--task`, `--input-format`, `--expected-output-format`. Multi-turn: `--scenario-context`, `--conversational-task`, `--participant-roles`, `--scenario-format`, `--expected-outcome-format` |
-| `goldens` | `--goldens-file` | `--max-goldens-per-golden` |
-
-For deeper configuration details, see the method-specific pages for [generating from docs](/docs/synthesizer-generate-from-docs), [contexts](/docs/synthesizer-generate-from-contexts), [scratch](/docs/synthesizer-generate-from-scratch), and [goldens](/docs/synthesizer-generate-from-goldens).
-
 ## Save Your Synthetic Dataset
 
 <Tabs items={["Confident AI", "Locally"]}>

docs/lib/generated/contributors.json

Lines changed: 1 addition & 1 deletion
@@ -2396,7 +2396,7 @@
 "name": "Jeffrey Ip",
 "avatarUrl": "https://avatars.githubusercontent.com/u/143328635?v=4",
 "url": "https://github.com/penguine-ip",
-"commits": 26
+"commits": 27
 },
 {
 "login": "kritinv",
