
Commit d541324

Disable QKV NVFP4 quantization for Qwen3 MOE (#735)
## What does this PR do?

**Type of change:** Recipe improvement

**Overview:** Disable QKV NVFP4 quantization for Qwen3 MOE models, following the Qwen3 Next recipe, to recover accuracy.

## Testing

Model accuracy benchmarking

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
1 parent b655321 commit d541324

2 files changed: 2 additions & 1 deletion


examples/llm_ptq/example_utils.py (1 addition & 1 deletion)

@@ -180,7 +180,7 @@ def build_quant_cfg(
        quant_cfg["quant_cfg"]["*image*"] = {"enable": False}
        quant_cfg["quant_cfg"]["*vision*"] = {"enable": False}

-    if model_type == "qwen3next" and qformat == "nvfp4":
+    if model_type in ["qwen3moe", "qwen3next"] and qformat == "nvfp4":
        # Disable the attention projection layers to retain accuracy
        quant_cfg["quant_cfg"]["model*.*attn*in_proj*"] = {"enable": False}
        quant_cfg["quant_cfg"]["model*.*attn*q_proj*"] = {"enable": False}
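The wildcard keys set to `{"enable": False}` above act as name-pattern filters over the model's modules. A minimal sketch of how such patterns could be resolved against module names using `fnmatch`-style matching (this is an illustrative assumption, not ModelOpt's actual matching code):

```python
from fnmatch import fnmatch

# Hypothetical quant_cfg fragment mirroring the patterns in the diff above.
quant_cfg = {
    "*image*": {"enable": False},
    "*vision*": {"enable": False},
    "model*.*attn*in_proj*": {"enable": False},
    "model*.*attn*q_proj*": {"enable": False},
}

def is_quant_enabled(module_name: str, cfg: dict) -> bool:
    """Return False if any disabling pattern matches the module name."""
    for pattern, opts in cfg.items():
        if fnmatch(module_name, pattern) and not opts.get("enable", True):
            return False
    return True

# Attention q_proj layers are excluded from quantization; MLP layers are not.
print(is_quant_enabled("model.layers.0.self_attn.q_proj", quant_cfg))  # False
print(is_quant_enabled("model.layers.0.mlp.gate_proj", quant_cfg))     # True
```

With this change, the same attention-projection exclusions previously applied only to `qwen3next` now also apply to `qwen3moe` under the `nvfp4` format.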

modelopt/torch/export/model_utils.py (1 addition & 0 deletions)

@@ -29,6 +29,7 @@
    "MPT": "mpt",
    "Bloom": "bloom",
    "ChatGLM": "chatglm",
+   "Qwen3Moe": "qwen3moe",
    "Qwen3Next": "qwen3next",
    "QWen": "qwen",
    "RecurrentGemma": "recurrentgemma",

0 commit comments
