Commit 9f96df3 (parent: 2a0c852)

remove latent moe fp8

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

1 file changed: 0 additions & 23 deletions

modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml
@@ -91,29 +91,6 @@ quantize:
         num_bits: e4m3
         axis:
 
-    # latent MOE down/up projections) -> FP8 per-tensor.
-    # NOTE: only 3 layers quantized latent MOE to FP8, layers 1, 3, 5
-    - quantizer_name: '*mixer.fc1_latent_proj*weight_quantizer'
-      enable: true
-      cfg:
-        num_bits: e4m3
-        axis:
-    - quantizer_name: '*mixer.fc1_latent_proj*input_quantizer'
-      enable: true
-      cfg:
-        num_bits: e4m3
-        axis:
-    - quantizer_name: '*mixer.fc2_latent_proj*weight_quantizer'
-      enable: true
-      cfg:
-        num_bits: e4m3
-        axis:
-    - quantizer_name: '*mixer.fc2_latent_proj*input_quantizer'
-      enable: true
-      cfg:
-        num_bits: e4m3
-        axis:
-
     # KV cache -> FP8.
     - quantizer_name: '*[kv]_bmm_quantizer'
       enable: true
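The quantizer_name entries deleted by this commit are wildcard patterns matched against fully qualified quantizer names in the model. A minimal sketch of that kind of matching, using Python's fnmatch; the helper function and the example module names (e.g. "model.layers.3...") are illustrative assumptions, not ModelOpt's actual API:

```python
from fnmatch import fnmatchcase

# Patterns removed by this commit (latent MoE projections -> FP8 per-tensor).
removed_patterns = [
    "*mixer.fc1_latent_proj*weight_quantizer",
    "*mixer.fc1_latent_proj*input_quantizer",
    "*mixer.fc2_latent_proj*weight_quantizer",
    "*mixer.fc2_latent_proj*input_quantizer",
]

def matches_any(name: str, patterns: list[str]) -> bool:
    """True if a quantizer name matches any wildcard pattern (case-sensitive)."""
    return any(fnmatchcase(name, p) for p in patterns)

# Hypothetical quantizer name; the "model.layers.3" prefix is made up.
name = "model.layers.3.mixer.fc1_latent_proj.weight_quantizer"
print(matches_any(name, removed_patterns))          # True: covered before this commit
print(matches_any(name, ["*[kv]_bmm_quantizer"]))   # False: KV-cache pattern kept by the diff
```

Note that the retained KV-cache pattern '*[kv]_bmm_quantizer' uses a character class, so it matches names ending in either k_bmm_quantizer or v_bmm_quantizer.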
