Commit ae965a9
add: HF PTQ support and modelopt_recipes mount in launcher (#1089)
- Add common/hf_ptq/hf_ptq.sh script for running hf_ptq.py directly
- Add Qwen3-8B hf_ptq_local.yaml example config
- Mount modelopt_recipes alongside modelopt in both Slurm and Docker
executors so modelopt.recipe imports work with the overlay
- Update default container to tensorrt-llm/release:1.3.0rc2
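
The mount change above can be pictured with a minimal sketch. It assumes each package is overlaid by bind-mounting its source directory over the installed copy in the container. The helper name `overlay_mounts` and both paths are hypothetical and are not taken from the launcher code:

```python
from pathlib import PurePosixPath


def overlay_mounts(source_root, site_packages,
                   packages=("modelopt", "modelopt_recipes")):
    """Return 'host:container' bind-mount specs, one per overlaid package.

    Hypothetical helper: the real launcher may build its mounts differently.
    """
    src = PurePosixPath(source_root)
    dst = PurePosixPath(site_packages)
    # Mount every package, so imports that reach into modelopt_recipes
    # resolve against the same source checkout as modelopt itself.
    return [f"{src / name}:{dst / name}" for name in packages]


if __name__ == "__main__":
    # Both paths are illustrative assumptions.
    for spec in overlay_mounts("/home/user/Model-Optimizer",
                               "/opt/venv/lib/site-packages"):
        print(spec)
```

Mounting only `modelopt` would leave the container's installed `modelopt_recipes` (if any) out of step with the overlaid source, which is the mismatch the commit message describes.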
## Summary by CodeRabbit
* **New Features**
  * Added a Hugging Face PTQ workflow to run post-training quantization for models.
  * Added a local single-GPU pipeline for Qwen3-8B with fp8 quantization and export support.
  * Added a small command-line launcher to invoke the PTQ workflow with configurable model, quantization, calibration, and export options.
* **Chores**
  * Updated the default runtime container image to a newer release.
  * Included model-optimization recipe files in container mounts.
---------
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent 174f3a2 commit ae965a9
4 files changed
Lines changed: 82 additions & 1 deletion
File tree
- tools/launcher
- common/hf_ptq
- examples/Qwen/Qwen3-8B
[Diff contents not captured. Per-file line changes, in page order: file 1, lines 1-39 added; file 2, lines 238-245, 249, 303-307, and 313 added; file 3, lines 1-27 added; file 4, line 56 replaced.]