Commit 16203d6
[NVBug 6007314] Deprecate MT-Bench support, remove openai pin, and add NeMo Evaluator reference (#1116)
### What does this PR do?
Type of change: Deprecation, Bug fix, Documentation
Removes MT-Bench (FastChat) evaluation support from `examples/llm_eval`
and `examples/llm_ptq`. Also removes the stale `openai>=0.28.1` pin from
`requirements.txt` that caused dependency conflicts with TRT-LLM (see
[NVBug 6007314](https://nvbugspro.nvidia.com/bug/6007314)). Adds a NeMo
Evaluator section to the llm_eval README as the recommended evaluation
workflow for quantized checkpoints.
**Changes:**
- Delete `examples/llm_eval/run_fastchat.sh` and
`examples/llm_eval/gen_model_answer.py`
- Remove `mtbench` task from `examples/llm_ptq/scripts/parser.sh` and
`huggingface_example.sh`
- Remove `openai` dependency from `examples/llm_eval/requirements.txt`
- Add NeMo Evaluator section to `examples/llm_eval/README.md` as the
recommended way to evaluate quantized checkpoints from llm_ptq via
TensorRT-LLM, vLLM, or SGLang
- Update README docs in both `llm_eval` and `llm_ptq`
- Add deprecation note to CHANGELOG.rst for 0.43
### Usage
N/A — this is a removal and documentation update.
### Testing
N/A — removed code paths; no new functionality introduced.
### Before your PR is "*Ready for review*"
Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).
Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(...,
weights_only=False)`, `pickle`, etc.).
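The `pickle` warning above can be made concrete with a stdlib-only sketch: unpickling untrusted data can execute arbitrary code via `__reduce__`, and the standard mitigation (described in the `pickle` module docs) is an `Unpickler` subclass whose `find_class` enforces an allowlist. All class and function names below are illustrative, not from this repo.

```python
import io
import pickle


class Evil:
    """Illustrative malicious payload: unpickling it would call print()."""

    def __reduce__(self):
        return (print, ("arbitrary code executed during unpickling",))


class AllowlistUnpickler(pickle.Unpickler):
    """Resolve only explicitly allowlisted globals; block everything else."""

    ALLOWED = {("builtins", "dict"), ("builtins", "list"), ("builtins", "str")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    return AllowlistUnpickler(io.BytesIO(data)).load()


# Plain data containers round-trip fine...
assert safe_loads(pickle.dumps({"scores": [1, 2, 3]})) == {"scores": [1, 2, 3]}

# ...but a payload smuggling a callable is rejected instead of executed.
try:
    safe_loads(pickle.dumps(Evil()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
assert blocked
```

The same principle motivates preferring `torch.load(..., weights_only=True)`, which restricts unpickling to tensors and primitive types.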
- Is this change backward compatible?: ❌ — MT-Bench evaluation via
`--tasks mtbench` is no longer supported.
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`?: N/A
- Did you write any new necessary tests?: N/A
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
✅
### Additional Information
Related: [NVBug 6007314](https://nvbugspro.nvidia.com/bug/6007314) —
openai dependency conflict caused by FastChat's `llm_judge` extra
pinning `openai<1`.
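A minimal sketch of the broken dependency graph (the `openai>=0.28.1` line is from this PR; the surrounding comments paraphrase the bug, and exact bounds are illustrative):

```text
# examples/llm_eval/requirements.txt (before this PR)
openai>=0.28.1        # stale pin, originally kept for FastChat's MT-Bench judge

# FastChat's llm_judge extra additionally pins openai<1, while the
# TRT-LLM serving path expects the openai>=1 client API, so pip cannot
# resolve a single environment that satisfies both constraints.
```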
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Deprecations**
* Removed MT-Bench (FastChat) evaluation support. NeMo Evaluator is now
the recommended approach for evaluating quantized model checkpoints
across multiple benchmarks.
* **Documentation**
* Updated evaluation guides to reflect NeMo Evaluator as the primary
evaluation method, with support for TensorRT-LLM, vLLM, and SGLang
serving backends.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
1 parent ae965a9 · commit 16203d6
9 files changed
Lines changed: 7 additions & 725 deletions
File tree
- examples
  - llm_eval
  - llm_ptq
    - scripts