Commit 168cd82
Add qwen3 moe experts only test (#1274)
## Summary
- Add unit test for Qwen3 MoE HF export with `NVFP4_EXPERTS_ONLY_CFG`
quantization config
- Verifies that `hf_quant_config.json` correctly reports `quant_algo:
NVFP4` and that non-expert modules (`self_attn`, `lm_head`) appear in
`exclude_modules` while routed expert layers (`mlp.experts.*`) do not (see the sketch below)
- Reference:
https://huggingface.co/nvidia/Qwen3.5-397B-A17B-NVFP4/blob/main/hf_quant_config.json
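As a rough illustration of the assertions described above (not the actual test code; the helper name is hypothetical, and the top-level `quantization`/`exclude_modules` layout is assumed from the referenced `hf_quant_config.json`):

```python
import json
from pathlib import Path


def check_experts_only_quant_config(export_dir: Path) -> None:
    """Sketch of the hf_quant_config.json checks for an NVFP4_EXPERTS_ONLY_CFG export."""
    with open(export_dir / "hf_quant_config.json") as f:
        hf_quant_config = json.load(f)

    # Layout assumed from the reference checkpoint's hf_quant_config.json.
    quant_cfg = hf_quant_config["quantization"]

    # The experts-only export should still report NVFP4 as the overall algorithm.
    assert quant_cfg["quant_algo"] == "NVFP4"

    exclude = quant_cfg["exclude_modules"]
    # Non-expert modules are excluded from quantization ...
    assert any("self_attn" in pattern for pattern in exclude)
    assert any("lm_head" in pattern for pattern in exclude)
    # ... while routed expert layers (mlp.experts.*) must not be excluded.
    assert not any("mlp.experts" in pattern for pattern in exclude)
```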
Type of change: New tests
### Known issue
On `transformers>=5.0`, fused MoE experts (`_QuantFusedExperts`) are not
recognized by `get_quant_config`, causing `quant_algo=None` in the
exported config. This test currently **fails** on transformers 5.x; the
underlying export gap is intended to be fixed in a follow-up change.
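Until that follow-up lands, one possible way to gate the test (a sketch only; the marker name is illustrative and the actual test may handle this differently) is a version-conditional `xfail`:

```python
# Sketch: mark the test as an expected failure on transformers >= 5.0 until the
# fused-experts export gap is fixed. The marker name is illustrative.
import pytest
import transformers
from packaging.version import Version

fused_experts_export_gap = pytest.mark.xfail(
    Version(transformers.__version__) >= Version("5.0"),
    reason="_QuantFusedExperts is not recognized by get_quant_config; quant_algo exports as None",
    strict=False,
)
```

The marker would then decorate the new test function so CI stays green on transformers 5.x without hiding the regression.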
## Testing
- **transformers 4.57.6**: PASSED
- **transformers 5.5.4**: FAILED (`quant_algo` is `None` due to the fused-expert
export gap described above)
### Before your PR is "*Ready for review*"
Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).
Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(...,
weights_only=False)`, `pickle`, etc.).
- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: ✅
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
N/A
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Tests**
  * Added GPU test coverage for exporting Qwen3 Mixture-of-Experts models with NVFP4 quantization.
  * Verifies that the exported checkpoint records the NVFP4 quantization algorithm and that module exclusion patterns correctly exclude attention and LM head components while not excluding routed expert paths.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
1 file changed: 54 additions, 0 deletions