feat(npu): Add 4 Ascend NPU custom operators for sparse video inference by liaopingbo · Pull Request #1157 · ModelTC/LightX2V

liaopingbo · 2026-06-16T10:00:10Z

Add 4 Ascend NPU custom operators for sparse video inference:

sparse attention
npu_layer_norm
npu_rms_norm
npu_rope

gemini-code-assist

Code Review

This pull request adds support for the Ascend NPU platform by implementing several key operators, including a blockwise Rainfusion attention module, NPU-specific LayerNorm and RMSNorm layers, and an NPU Rotary Position Embedding (RoPE) implementation. The feedback highlights several critical issues in the Rainfusion_blockwise module, such as a potential runtime error from calling .view() on a transposed tensor, an inaccurate calculation of protect_len when sequence lengths are not multiples of the pool size, and the presence of unused code (do_tensor_pooling). Additionally, using torch.cat instead of torch.concat is recommended for consistency.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

gemini-code-assist

Code Review

This pull request introduces support for Ascend NPU operations, including the implementation of Rainfusion blockwise attention, NPU layer normalization, NPU RMS normalization, and NPU rotary position embedding (RoPE). The review feedback highlights critical issues where calling .view() on non-contiguous tensors (resulting from transpose or slicing operations) can cause runtime errors, suggesting the use of .reshape() instead. Additionally, it is recommended to replace torch.concat with torch.cat across the codebase for consistency and compatibility with older PyTorch versions.


liaopingbo · 2026-06-16T10:22:48Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces Ascend NPU platform-specific operations, including Rainfusion blockwise attention, layer and RMS normalization, and rotary position embeddings. The review feedback highlights several safety and robustness improvements: replacing .view() with .reshape() on sliced tensors to avoid non-contiguous memory crashes, ensuring tensors are contiguous before calling torch.view_as_complex, and replacing runtime validation assert statements with explicit ValueError exceptions to prevent them from being optimized away.


gemini-code-assist · 2026-06-16T10:26:21Z

+        if num_full_blocks > 0:
+            full_blocks = input_tensor[:, : num_full_blocks * pool_size, :, :]
+            full_blocks_reshaped = full_blocks.view(batch, num_full_blocks, pool_size, headnum, dim)
+            pooled_tensors.append(full_blocks_reshaped.mean(dim=2))
+        if tail_size > 0:
+            tail_block = input_tensor[:, num_full_blocks * pool_size :, :, :]
+            tail_reshaped = tail_block.view(batch, 1, tail_size, headnum, dim)
+            pooled_tensors.append(tail_reshaped.mean(dim=2))


Using .view() on sliced tensors (like full_blocks and tail_block) can cause runtime crashes if the tensor is not contiguous in memory. Slicing along the sequence dimension often results in non-contiguous memory layouts. It is safer to use .reshape() instead of .view(), as .reshape() automatically handles non-contiguous tensors by copying them if necessary.

Suggested change

-        if num_full_blocks > 0:
-            full_blocks = input_tensor[:, : num_full_blocks * pool_size, :, :]
-            full_blocks_reshaped = full_blocks.view(batch, num_full_blocks, pool_size, headnum, dim)
-            pooled_tensors.append(full_blocks_reshaped.mean(dim=2))
-        if tail_size > 0:
-            tail_block = input_tensor[:, num_full_blocks * pool_size :, :, :]
-            tail_reshaped = tail_block.view(batch, 1, tail_size, headnum, dim)
-            pooled_tensors.append(tail_reshaped.mean(dim=2))
+        if num_full_blocks > 0:
+            full_blocks = input_tensor[:, : num_full_blocks * pool_size, :, :]
+            full_blocks_reshaped = full_blocks.reshape(batch, num_full_blocks, pool_size, headnum, dim)
+            pooled_tensors.append(full_blocks_reshaped.mean(dim=2))
+        if tail_size > 0:
+            tail_block = input_tensor[:, num_full_blocks * pool_size :, :, :]
+            tail_reshaped = tail_block.reshape(batch, 1, tail_size, headnum, dim)
+            pooled_tensors.append(tail_reshaped.mean(dim=2))
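The difference between .view() and .reshape() discussed in this comment can be reproduced in isolation. The following is a minimal sketch, not the PR's implementation: block_mean_pool, the tensor shapes, and the transposed input are assumptions made for illustration, and the sketch assumes a non-empty sequence. Whether .view() actually fails depends on the tensor's strides; here the failure is triggered by merging two axes of a transposed tensor, while .reshape() falls back to a copy.

```python
import torch


def block_mean_pool(x: torch.Tensor, pool_size: int) -> torch.Tensor:
    """Mean-pool x of shape (batch, seq, heads, dim) over seq in blocks of pool_size.

    A trailing partial block is averaged over its own (shorter) length.
    """
    batch, seq_len, heads, dim = x.shape
    num_full_blocks, tail_size = divmod(seq_len, pool_size)
    pooled = []
    if num_full_blocks > 0:
        full = x[:, : num_full_blocks * pool_size]
        # reshape() returns a view when the strides allow it and copies otherwise.
        pooled.append(full.reshape(batch, num_full_blocks, pool_size, heads, dim).mean(dim=2))
    if tail_size > 0:
        tail = x[:, num_full_blocks * pool_size :]
        pooled.append(tail.reshape(batch, 1, tail_size, heads, dim).mean(dim=2))
    return torch.cat(pooled, dim=1)


# Hypothetical input: stored as (batch, heads, seq, dim), then transposed to
# (batch, seq, heads, dim), which makes it non-contiguous.
x = torch.arange(2 * 3 * 10 * 4, dtype=torch.float32).reshape(2, 3, 10, 4).transpose(1, 2)
pooled = block_mean_pool(x, pool_size=4)  # 2 full blocks plus 1 tail block of length 2

# Merging the seq and heads axes cannot be expressed as a view of this layout,
# so view() raises while reshape() copies.
try:
    x.view(2, 10 * 3, 4)
    view_raised = False
except RuntimeError:
    view_raised = True
merged = x.reshape(2, 10 * 3, 4)
```

Using .reshape() costs nothing when a view is possible, so it is a reasonable default wherever the input layout is not under the function's control.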

gemini-code-assist · 2026-06-16T10:26:21Z

+        xq_fp32 = torch.view_as_complex(xq[:seq_len].to(torch.float32).reshape(seq_len, n, -1, 2))
+        xk_fp32 = torch.view_as_complex(xk[:seq_len].to(torch.float32).reshape(seq_len, n, -1, 2))


torch.view_as_complex requires the last dimension of its input to have size 2 and stride 1, and the strides of all other dimensions to be even. If xq or xk is already in torch.float32 and is non-contiguous, .to(torch.float32) will be a no-op, and the subsequent .reshape() may return a non-contiguous view that violates these requirements, causing a runtime error. Calling .contiguous() before torch.view_as_complex guarantees a valid layout.

Suggested change

-        xq_fp32 = torch.view_as_complex(xq[:seq_len].to(torch.float32).reshape(seq_len, n, -1, 2))
-        xk_fp32 = torch.view_as_complex(xk[:seq_len].to(torch.float32).reshape(seq_len, n, -1, 2))
+        xq_fp32 = torch.view_as_complex(xq[:seq_len].to(torch.float32).reshape(seq_len, n, -1, 2).contiguous())
+        xk_fp32 = torch.view_as_complex(xk[:seq_len].to(torch.float32).reshape(seq_len, n, -1, 2).contiguous())
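To illustrate the layout issue raised here, the sketch below applies a rotary embedding through complex multiplication and calls .contiguous() before torch.view_as_complex. It is an illustration under stated assumptions, not the PR's npu_rope: apply_rope, the (seq, heads, head_dim) layout, the base of 10000, and the deliberately non-contiguous input are all hypothetical.

```python
import torch


def apply_rope(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    """Rotate x of shape (seq, heads, head_dim) by freqs_cis of shape (seq, 1, head_dim // 2)."""
    seq_len, n, _ = x.shape
    pairs = x.to(torch.float32).reshape(seq_len, n, -1, 2)
    # contiguous() is a no-op for an already contiguous tensor and otherwise
    # copies into a layout that view_as_complex accepts.
    x_complex = torch.view_as_complex(pairs.contiguous())
    rotated = torch.view_as_real(x_complex * freqs_cis).flatten(2)
    return rotated.to(x.dtype)


seq_len, n, head_dim = 5, 2, 8
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
freqs_cis = torch.polar(torch.ones_like(angles), angles)[:, None, :]  # complex64

# Hypothetical non-contiguous input: the head_dim axis has stride n, not 1.
torch.manual_seed(0)
x = torch.randn(seq_len, head_dim, n).transpose(1, 2)

# Without contiguous(), reshape() returns a view whose last stride is not 1,
# which view_as_complex rejects.
try:
    torch.view_as_complex(x.reshape(seq_len, n, -1, 2))
    raised = False
except RuntimeError:
    raised = True

out = apply_rope(x, freqs_cis)
```

Because position 0 has a rotation angle of zero, out[0] should equal x[0], which makes a convenient sanity check for any variant of this function.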

gemini-code-assist · 2026-06-16T10:26:21Z

+            x = attention_forward(q, k, v, opt_mode="manual", op_type="ascend_laser_attention", layout="BNSD")
+        else:
+            batch, qSeqlen, numHeads, headDim = q.shape
+            assert batch == 1, "Rainfusion_blockwise currently only supports batch size 1."


Using assert statements for runtime input validation is discouraged because they can be optimized away when Python is run with the -O (optimize) flag. This would bypass the batch size check and could lead to unexpected behavior or crashes later in the execution. Please replace this with an explicit if check and raise a ValueError.

Suggested change

-            assert batch == 1, "Rainfusion_blockwise currently only supports batch size 1."
+            if batch != 1:
+                raise ValueError("Rainfusion_blockwise currently only supports batch size 1.")
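The behaviour this comment describes can be checked with the standard library alone. In the sketch below, check_batch_size and assert_survives are hypothetical helper names; the second one starts a child interpreter to show that an assert statement is dropped under -O, whereas an explicit raise is unaffected. It assumes the PYTHONOPTIMIZE environment variable is not set.

```python
import subprocess
import sys


def check_batch_size(batch: int) -> None:
    """Explicit validation that is not removed by the -O flag."""
    if batch != 1:
        raise ValueError("Rainfusion_blockwise currently only supports batch size 1.")


def assert_survives(flags: list) -> bool:
    """Return True if `assert False` still fails under the given interpreter flags."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", "assert False"],
        capture_output=True,
    )
    return result.returncode != 0


asserts_kept_default = assert_survives([])        # the assert raises AssertionError
asserts_kept_optimized = assert_survives(["-O"])  # the assert is compiled away
```

Raising ValueError also gives callers a specific exception type to catch, which AssertionError does not conventionally provide.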

helloyongyang · 2026-06-16T10:43:08Z

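Hello, and thank you for your contribution.

What is your Rainfusion_blockwise used for? Its format is not aligned with LightX2V's attention format (for example: https://github.com/ModelTC/LightX2V/blob/main/lightx2v/common/ops/attn/sla_attn.py), and I do not see it used anywhere in this PR.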

liaopingbo · 2026-06-16T10:53:21Z

Hello! Rainfusion_blockwise uses the RainFusion sparse attention algorithm, which can improve performance in NPU scenarios. I will adjust the format shortly to align with sla_attn.py. Thanks!

liaopingbo and others added 5 commits June 16, 2026 17:53

feat(npu): Add 4 Ascend NPU custom operators for sparse video inference

d4d0d37

feat(npu): Add 4 Ascend NPU custom operators for sparse video inference

3fb61d9

feat(npu): Add 4 Ascend NPU custom operators for sparse video inference

969b41a

feat(npu): Add 4 Ascend NPU custom operators for sparse video inference

525d729

feat(npu): Add 4 Ascend NPU custom operators for sparse video inference

06dec8d

liaopingbo marked this pull request as ready for review June 16, 2026 10:03

gemini-code-assist Bot reviewed Jun 16, 2026


Update lightx2v_platform/ops/attn/ascend_npu/rainfusion_blockwise.py

bdd48bb

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

gemini-code-assist Bot reviewed Jun 16, 2026


feat(npu): Add 4 Ascend NPU custom operators for sparse video inference

b6d2a63

liaopingbo marked this pull request as draft June 16, 2026 10:21

gemini-code-assist Bot reviewed Jun 16, 2026


helloyongyang marked this pull request as ready for review June 16, 2026 10:35

