
Commit 3e6b05e

[DeepSeek] Fix weight_dequant kwargs in fixup_moe_expert_amax
weight_dequant(x, s, block_size=128, dtype=...): the third positional argument is block_size, not dtype. Passing torch.bfloat16 positionally therefore sets block_size to the dtype object, which would either fail inside the Triton kernel or compute amax over corrupt blocks for any uncalibrated expert. The bug never fired in our validation run because every expert was activated during calibration (top-k over 256 experts × 1024 samples), so the _missing(wq.amax) branch was dead. Spotted by bot review on PR #1380.

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
1 parent f5d57cb commit 3e6b05e
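
For context, the mis-binding the subject line describes is plain positional-argument order. A minimal sketch, assuming the weight_dequant signature quoted in the commit message; the stub below only mimics the signature, not the real Triton-backed kernel in the DeepSeek example code:

import torch

# Stand-in with the signature quoted above (an assumption about the real helper);
# it just reports how the arguments were bound.
def weight_dequant(x, s, block_size=128, dtype=torch.bfloat16):
    print(f"block_size={block_size!r}, dtype={dtype!r}")

x = torch.randn(4, 4)
s = torch.ones(1, 1)

weight_dequant(x, s, torch.bfloat16)        # old call: binds block_size=torch.bfloat16
weight_dequant(x, s, dtype=torch.bfloat16)  # fixed call: block_size stays 128

Passing the dtype as a keyword, as the fix does, leaves block_size at its default of 128.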

1 file changed: 3 additions & 1 deletion

examples/deepseek/ptq.py
@@ -290,7 +290,9 @@ def _missing(amax):
             continue
         # DeepSeek stores experts as FP8 with a per-block .scale; dequantize
         # to bf16 first so we measure the real weight distribution, not bytes.
-        deq = weight_dequant(w, w.scale, torch.bfloat16) if w.element_size() == 1 else w
+        deq = (
+            weight_dequant(w, w.scale, dtype=torch.bfloat16) if w.element_size() == 1 else w
+        )
         axis = getattr(wq, "_axis", None)
         if axis is None:
             reduce_axis = None
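
For readers without the full file open, here is a rough sketch of where deq goes next, assuming the usual per-tensor vs. per-axis amax convention; the helper name and the exact lines below the hunk are assumptions, not the actual ptq.py code:

import torch

def _fill_missing_amax(wq, deq):
    # Hypothetical continuation of the hunk above (assumption, not the real file):
    # measure amax from the dequantized bf16 weight for an uncalibrated expert.
    axis = getattr(wq, "_axis", None)
    if axis is None:
        # per-tensor quantizer: one scalar amax over the whole weight
        wq.amax = deq.abs().amax()
    else:
        # per-axis quantizer: reduce over every dim except the quantization axis
        reduce_axis = tuple(i for i in range(deq.dim()) if i != axis)
        wq.amax = deq.abs().amax(dim=reduce_axis, keepdim=True)

Either way, the amax is only meaningful if deq was dequantized correctly, which is why binding torch.bfloat16 to block_size mattered on this path.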
