Commit 1d6ec89
[OMNIML-3495] Add TEGroupedMLP export support for NemotronH models (#967)
### What does this PR do?
Type of change: New feature
Add export support for `TEGroupedMLP` (fused grouped GEMM experts) in
the MCore-to-HuggingFace checkpoint exporter. Previously, the exporter
only supported `SequentialMLP` (which has `local_experts` as a
`ModuleList`). `TEGroupedMLP` stores per-expert weights as `weight0`,
`weight1`, ..., `weight{N-1}` in a single `TEGroupedLinear` module
instead. This caused an `AttributeError: 'QuantTEGroupedMLP' object has
no attribute 'local_experts'` when exporting NemotronH models.
Changes:
- Add `GroupedMLPSlicing` class in `mcore_custom.py` — the export
counterpart of `GroupedMLPMerging`
- Add `_grouped_mlp_slicing` method in `GPTModelExporter` that iterates
`TEGroupedLinear`'s per-expert weights and exports them as individual
HF-format weights with proper quantization scale handling
- Add `"experts.linear_fc1"` and `"experts.linear_fc2"` rules using
`GroupedMLPSlicing` to `nemotron_h_causal_lm_export`
- Route `TEGroupedMLP` (detected by absence of `local_experts`
attribute) to the new `"experts.linear_fc1"` rule in
`_get_transformer_layer_state_dict`
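The layout difference driving these changes can be sketched with plain-Python stand-ins (class and key names below are illustrative, not the actual Megatron/ModelOpt APIs):

```python
# Hypothetical stand-ins for the two expert layouts; the real classes live
# in Megatron-Core / Transformer Engine and hold torch tensors.

class SequentialMLPStub:
    """SequentialMLP keeps one submodule per expert in `local_experts`."""

    def __init__(self, num_experts):
        self.local_experts = [object() for _ in range(num_experts)]


class TEGroupedLinearStub:
    """TEGroupedLinear stores per-expert weights as weight0..weight{N-1}
    on a single module instead of a ModuleList."""

    def __init__(self, num_experts):
        self.num_experts = num_experts
        for i in range(num_experts):
            setattr(self, f"weight{i}", f"expert_{i}_tensor")


def slice_grouped_weights(module):
    # Export counterpart of merging: iterate the fused per-expert weights
    # and emit them under individual per-expert keys (HF-style naming here
    # is illustrative).
    return {
        f"experts.{i}.weight": getattr(module, f"weight{i}")
        for i in range(module.num_experts)
    }


grouped = TEGroupedLinearStub(num_experts=4)
per_expert = slice_grouped_weights(grouped)
```

The real `_grouped_mlp_slicing` additionally runs each sliced weight through quantization-scale extraction before writing it out.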
### Usage
No API change. NemotronH models using `TEGroupedMLP` can now be
exported:
```python
import modelopt.torch.export as mtex

mtex.export_mcore_gpt_to_hf(
    model=megatron_model,
    export_dir="/path/to/hf_export",
    pretrained_model_name_or_path="/path/to/hf_model",
)
```
### Testing
Inside Model-Bridge
```shell
torchrun --nproc_per_node 4 examples/quantization/export.py \
    --hf-model-id /models/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/ \
    --megatron-load-path /models/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4-MLM \
    --export-dir /models/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4-MLM_hf \
    --pp 4 \
    --dtype bfloat16 \
    --trust-remote-code
```
### Before your PR is "*Ready for review*"
Make sure you read and follow [Contributor
guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)
and your commits are signed (`git commit -s -S`).
Make sure you read and follow the [Security Best
Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors)
(e.g. avoiding hardcoded `trust_remote_code=True`, using
`torch.load(..., weights_only=True)`, avoiding `pickle`, etc.).
- Is this change backward compatible?: ✅ The existing `SequentialMLP`
(`local_experts`) path is guarded by `hasattr(layer.mlp.experts,
"local_experts")` and remains unchanged. The new `TEGroupedMLP` path
only activates when `local_experts` is absent and `"experts.linear_fc1"`
is defined in the architecture's rules.
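  The routing described here can be sketched as follows (function and rule names are simplified stand-ins for the exporter's internal dispatch):

  ```python
  def select_expert_export_rule(experts_module, rules):
      """Pick the export path for an MoE experts module (illustrative)."""
      # SequentialMLP exposes `local_experts`; TEGroupedMLP does not.
      if hasattr(experts_module, "local_experts"):
          return "sequential_mlp"  # existing path, unchanged
      if "experts.linear_fc1" in rules:
          return "experts.linear_fc1"  # new TEGroupedMLP path
      raise AttributeError("no export rule for this expert layout")
  ```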
- If you copied code from any other source, did you follow IP policy in
[CONTRIBUTING.md](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md#-copying-code-from-other-sources)?:
N/A
- Did you write any new necessary tests?: ❌ Tested manually with
Nemotron-3-Nano-30B-A3B. Unit test coverage should be added for
`_grouped_mlp_slicing`.
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
❌ New feature for a specific model architecture.
### Additional Information
- The import counterpart (`GroupedMLPMerging` / `_grouped_mlp_merging`)
was added by @jennifchen in PR #830. This PR completes the round-trip by
adding the export side.
- `_grouped_mlp_slicing` temporarily assigns `module.weight =
module.weight0` so that `_get_quantized_state` can extract
qformat/scales from the module's quantizers, then removes it afterward.
This follows the same pattern used by `_QuantTEGroupedLinear._setup()`
in the quantization plugin.
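The temporary-alias pattern described above can be sketched like this (`extract_qstate` is a hypothetical stand-in for `_get_quantized_state`):

```python
class GroupedLinearStub:
    """Stand-in module whose first expert weight lives in `weight0`."""

    def __init__(self):
        self.weight0 = "fused_expert0_weight"


def extract_qstate(module):
    # Placeholder for _get_quantized_state, which reads `module.weight`
    # to derive qformat and scales from the module's quantizers.
    return module.weight


def grouped_mlp_slicing(module):
    # Temporarily alias weight0 as `weight` so the shared extraction
    # helper works unmodified, then remove the alias even on error.
    module.weight = module.weight0
    try:
        state = extract_qstate(module)
    finally:
        del module.weight
    return state
```

Wrapping the alias in `try`/`finally` keeps the module's attribute set clean even if extraction raises.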
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Export now supports grouped-expert MLP slicing to split fused expert
weights into per-expert tensors for downstream formats.
* Per-expert export logic enhanced with clear fallbacks between packed
and per-expert layouts, including a grouped-MLP export path.
* Nemotron H causal LM import/export mappings updated to better align
with grouped local-expert exports.
* Added fused-normalization export support and safer handling when
loading remote model code.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: James Shen <yueshen@nvidia.com>

1 parent: 2bb404e · commit 1d6ec89
3 files changed: 127 additions & 14 deletions