From e88dc21044f04da37d5d503f728632f830f22395 Mon Sep 17 00:00:00 2001
From: Tony Liu <tonyliu0512@gmail.com>
Date: Tue, 28 Apr 2026 07:14:14 +0800
Subject: [PATCH] [Hardware] DeepGEMM MoE: extend device gates to SM 12.x
 consumer Blackwell
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two parallel device-capability gates currently exclude SM 12.x
(consumer Blackwell — RTX 50-series and GB10 / DGX Spark) from the
DeepGEMM-backed MXFP4 MoE path:

1. `CudaPlatformBase.support_deep_gemm()` only accepts SM 90 (Hopper)
   and the SM 10.x family (datacenter Blackwell), so
   `is_deep_gemm_supported()` returns False on SM 120/121.

2. `DeepGemmFP4Experts._supports_current_device()` further requires
   `is_device_capability_family(100)`, so even with the platform gate
   relaxed it still rejects SM 12.x.

Hardware reality: SM 120 / SM 121 also provide FP4 / FP8 tensor-core
matmuls, like datacenter Blackwell, but through a different instruction
set (SM 10.x uses `tcgen05.*`, SM 12.x uses `mma.*`). At the Python-level
dispatch they share the DeepGEMM MoE oracle, so for kernels that
DeepGEMM (or a fork such as jasl/DeepGEMM with native SM 120 ports)
compiles for SM 12.x, the wrappers should accept the device.

This PR widens both gates to also accept `is_device_capability_family(120)`,
matching the intent of the `support_deep_gemm` docstring ("Hopper and
Blackwell GPUs are supported"). Whether a kernel takes a `tcgen05.*` path
is still decided by DeepGEMM's own dispatch, which has SM 12.x paths in
recent forks.
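
For reviewers, the effect of the two gates can be sketched as a small
standalone model. This is an illustration, not vLLM code: the helper
names are hypothetical, the family check is assumed to compare the
capability integer divided by 10, and the non-capability conditions
inside `is_deep_gemm_supported()` (such as whether DeepGEMM is
installed) are deliberately left out.

```python
# Illustrative model only. All helpers here are hypothetical stand-ins
# and assume a family check of the form `capability // 10`.


def is_family(cap: int, family: int) -> bool:
    """True if `cap` (e.g. 121) is in the same family as `family` (e.g. 120)."""
    return cap // 10 == family // 10


def platform_gate(cap: int, widened: bool) -> bool:
    """Models the capability part of support_deep_gemm()."""
    supported = cap == 90 or is_family(cap, 100)
    return supported or (widened and is_family(cap, 120))


def fp4_experts_gate(cap: int, widened: bool) -> bool:
    """Models the capability part of _supports_current_device()."""
    family_ok = is_family(cap, 100) or (widened and is_family(cap, 120))
    return platform_gate(cap, widened) and family_ok


if __name__ == "__main__":
    # Columns: capability, FP4 gate before this PR, FP4 gate after this PR.
    for cap in (90, 100, 103, 120, 121):
        before = fp4_experts_gate(cap, widened=False)
        after = fp4_experts_gate(cap, widened=True)
        print(cap, before, after)
```

In this model only SM 120 and SM 121 change state. SM 90 still passes
the platform gate but not the FP4 experts gate, which stays limited to
the Blackwell families.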

Verified locally on dual NVIDIA GB10 / SM 121 (DGX Spark): with this
change both `is_deep_gemm_supported()` and
`DeepGemmFP4Experts._supports_current_device()` return True. (Booting a
model still requires DeepGEMM itself to provide SM 12.x kernels for the
specific operations the deployment uses, which is independent of these
vLLM-side gates.)

Companion to #41028 (Triton MXFP4 SM 12.x device-range fix) and #40923
(Marlin SM 12.x cubin).

Signed-off-by: Tony Liu <tonyliu0512@gmail.com>
---
 .../layers/fused_moe/experts/deep_gemm_moe.py              | 6 +++---
 vllm/platforms/cuda.py                                     | 7 +++++--
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/vllm/model_executor/layers/fused_moe/experts/deep_gemm_moe.py b/vllm/model_executor/layers/fused_moe/experts/deep_gemm_moe.py
index 3b354dd3ef13..4662bd690af3 100644
--- a/vllm/model_executor/layers/fused_moe/experts/deep_gemm_moe.py
+++ b/vllm/model_executor/layers/fused_moe/experts/deep_gemm_moe.py
@@ -355,9 +355,9 @@ def activation_format() -> mk.FusedMoEActivationFormat:
     def _supports_current_device() -> bool:
         from vllm.platforms import current_platform
 
-        return (
-            is_deep_gemm_supported()
-            and current_platform.is_device_capability_family(100)
+        return is_deep_gemm_supported() and (
+            current_platform.is_device_capability_family(100)
+            or current_platform.is_device_capability_family(120)
         )
 
     @staticmethod
diff --git a/vllm/platforms/cuda.py b/vllm/platforms/cuda.py
index 57814d29bef9..bfd84171c8c2 100644
--- a/vllm/platforms/cuda.py
+++ b/vllm/platforms/cuda.py
@@ -555,8 +555,11 @@ def support_static_graph_mode(cls) -> bool:
 
     @classmethod
     def support_deep_gemm(cls) -> bool:
-        """Currently, only Hopper and Blackwell GPUs are supported."""
-        return cls.is_device_capability(90) or cls.is_device_capability_family(100)
+        """Currently, Hopper, datacenter Blackwell (SM 10.x) and consumer
+        Blackwell (SM 12.x: RTX 50-series, GB10/DGX Spark) are supported."""
+        return (cls.is_device_capability(90)
+                or cls.is_device_capability_family(100)
+                or cls.is_device_capability_family(120))
 
     @classmethod
     def is_integrated_gpu(cls, device_id: int = 0) -> bool: