Add explain mode to HumanCliSolver#1652
Conversation
There was a problem hiding this comment.
Pull request overview
Adds an optional explain=True mode to HumanCliSolver to improve auditability/debugging by printing structured context and outcome details around the human CLI interaction, while keeping default behavior unchanged.
Changes:
- Add
explainflag and structured explain-output helpers toHumanCliSolver. - Adjust CLI prompting behavior in explain mode (print context separately; prompt only for input).
- Add unit tests covering default-mode parity, explain-mode output, and postprocessor event recording.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
evals/solvers/human_cli_solver.py |
Adds explain-mode printing, refactors prompt construction, and returns prompt metadata in SolverResult. |
evals/solvers/human_cli_solver_test.py |
Adds regression + explain-mode tests and validates sampling/postprocessor event behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def __init__( | ||
| self, | ||
| input_prompt: str = "assistant (you): ", | ||
| explain: bool = False, | ||
| postprocessors: list[str] = [], | ||
| registry: Any = None, |
There was a problem hiding this comment.
postprocessors uses a mutable list default ([]). This can lead to accidental state sharing if the list is ever mutated (now or in future refactors). Prefer postprocessors: list[str] | None = None and then pass postprocessors or [] to super().__init__.
| input_prompt: str = "assistant (you): ", | ||
| explain: bool = False, | ||
| postprocessors: list[str] = [], | ||
| registry: Any = None, | ||
| ): | ||
| """ | ||
| Args: | ||
| input_prompt: Prompt to be printed before the user input. | ||
| If None, no prompt is printed. | ||
| explain: If True, print structured context and output details for |
There was a problem hiding this comment.
The docstring says input_prompt can be None ("If None, no prompt is printed"), but the implementation treats it as a str and, in explain=True, passes it directly to input(). If input_prompt=None, input(None) will raise a TypeError. Either implement the None behavior (handle it in _get_prompt and cli_prompt, and type as str | None) or remove the None claim from the docstring.
| def __call__(self, task_state: TaskState, **kwargs) -> SolverResult: | ||
| res = self._solve(deepcopy(task_state), **kwargs) | ||
| raw_output = res.output | ||
|
|
||
| if hasattr(self, "postprocessors"): | ||
| for postprocessor in self.postprocessors: | ||
| prev_output = res.output | ||
| res = postprocessor(res) | ||
| record_event( | ||
| "postprocessor", | ||
| { | ||
| "name": postprocessor.__class__.__name__, | ||
| "input": prev_output, | ||
| "output": res.output, | ||
| }, | ||
| ) |
There was a problem hiding this comment.
HumanCliSolver.__call__ duplicates the base Solver.__call__ postprocessor loop and record_event behavior. This increases maintenance risk if the base class changes (e.g., additional bookkeeping around postprocessors). Consider reusing Solver.__call__ by storing raw_answer (and optionally prompt) in SolverResult.metadata from _solve, then calling super().__call__ and printing explain output using metadata, rather than reimplementing the postprocessor application here.
Closes #1628.
Overview
This adds an optional
explain=Truemode toHumanCliSolverfor human-in-the-loop debugging and auditability.When enabled, the solver now prints structured sections around the interaction instead of only relying on the flattened CLI prompt:
Default behavior is unchanged when
explain=False.What changed
explainflag toHumanCliSolverValidation
I ran multiple focused checks before pushing:
python -m pytest evals/solvers/human_cli_solver_test.py -qpython -m pytest evals/solvers/solver_test.py evals/solvers/human_cli_solver_test.py -qgit diff --check