feat(npu): Add 4 Ascend NPU custom operators for sparse video inference #1157
liaopingbo wants to merge 7 commits into
Code Review
This pull request adds support for the Ascend NPU platform by implementing several key operators, including a blockwise Rainfusion attention module, NPU-specific LayerNorm and RMSNorm layers, and an NPU Rotary Position Embedding (RoPE) implementation. The feedback highlights several critical issues in the Rainfusion_blockwise module, such as a potential runtime error from calling .view() on a transposed tensor, an inaccurate calculation of protect_len when sequence lengths are not multiples of the pool size, and the presence of unused code (do_tensor_pooling). Additionally, using torch.cat instead of torch.concat is recommended for consistency.
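A minimal sketch of the block-count arithmetic behind the protect_len remark. It assumes protect_len is meant to be expressed in pooled blocks; its actual definition is not shown in this thread. When the sequence length is not a multiple of the pool size, floor division drops the trailing partial block, while ceil division matches the num_full_blocks plus tail_size split used by the pooling code later in this thread. All helper names are hypothetical.

```python
def num_pooled_blocks(seq_len: int, pool_size: int) -> int:
    """Blocks produced when a trailing partial block is pooled separately.

    Mirrors the num_full_blocks / tail_size split in the pooling code.
    """
    num_full_blocks, tail_size = divmod(seq_len, pool_size)
    return num_full_blocks + (1 if tail_size > 0 else 0)


def tokens_to_blocks_floor(num_tokens: int, pool_size: int) -> int:
    # Drops a trailing partial block: undercounts when num_tokens % pool_size != 0.
    return num_tokens // pool_size


def tokens_to_blocks_ceil(num_tokens: int, pool_size: int) -> int:
    # Counts a trailing partial block as one block, matching num_pooled_blocks.
    return (num_tokens + pool_size - 1) // pool_size


# Example: 130 tokens with pool_size 64 give two full blocks plus a 2-token tail.
print(num_pooled_blocks(130, 64), tokens_to_blocks_floor(130, 64), tokens_to_blocks_ceil(130, 64))
```

For 130 tokens and a pool size of 64 the pooled tensor has 3 blocks, so a floor-based count of 2 would disagree with the pooled layout.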
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Code Review
This pull request introduces support for Ascend NPU operations, including the implementation of Rainfusion blockwise attention, NPU layer normalization, NPU RMS normalization, and NPU rotary position embedding (RoPE). The review feedback highlights critical issues where calling .view() on non-contiguous tensors (resulting from transpose or slicing operations) can cause runtime errors, suggesting the use of .reshape() instead. Additionally, it is recommended to replace torch.concat with torch.cat across the codebase for consistency and compatibility with older PyTorch versions.
/gemini review
Code Review
This pull request introduces Ascend NPU platform-specific operations, including Rainfusion blockwise attention, layer and RMS normalization, and rotary position embeddings. The review feedback highlights several safety and robustness improvements: replacing .view() with .reshape() on sliced tensors to avoid non-contiguous memory crashes, ensuring tensors are contiguous before calling torch.view_as_complex, and replacing runtime validation assert statements with explicit ValueError exceptions to prevent them from being optimized away.
```python
if num_full_blocks > 0:
    full_blocks = input_tensor[:, : num_full_blocks * pool_size, :, :]
    full_blocks_reshaped = full_blocks.view(batch, num_full_blocks, pool_size, headnum, dim)
    pooled_tensors.append(full_blocks_reshaped.mean(dim=2))
if tail_size > 0:
    tail_block = input_tensor[:, num_full_blocks * pool_size :, :, :]
    tail_reshaped = tail_block.view(batch, 1, tail_size, headnum, dim)
    pooled_tensors.append(tail_reshaped.mean(dim=2))
```
Using .view() on sliced tensors (like full_blocks and tail_block) can cause runtime crashes if the tensor is not contiguous in memory. Slicing along the sequence dimension often results in non-contiguous memory layouts. It is safer to use .reshape() instead of .view(), as .reshape() automatically handles non-contiguous tensors by copying them if necessary.
Suggested change:
```diff
 if num_full_blocks > 0:
     full_blocks = input_tensor[:, : num_full_blocks * pool_size, :, :]
-    full_blocks_reshaped = full_blocks.view(batch, num_full_blocks, pool_size, headnum, dim)
+    full_blocks_reshaped = full_blocks.reshape(batch, num_full_blocks, pool_size, headnum, dim)
     pooled_tensors.append(full_blocks_reshaped.mean(dim=2))
 if tail_size > 0:
     tail_block = input_tensor[:, num_full_blocks * pool_size :, :, :]
-    tail_reshaped = tail_block.view(batch, 1, tail_size, headnum, dim)
+    tail_reshaped = tail_block.reshape(batch, 1, tail_size, headnum, dim)
     pooled_tensors.append(tail_reshaped.mean(dim=2))
```
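A self-contained sketch of the reshape-based block mean pooling from the suggestion above, wrapped in a hypothetical helper (block_mean_pool) and fed a deliberately non-contiguous input from a transpose. Because these reshapes only split dimension 1, .reshape() returns a view without copying whenever .view() would have succeeded, and falls back to a copy otherwise, so the change costs nothing in the common case. The [batch, seq, heads, dim] layout follows the variable names in the quoted code; the function itself is an illustration, not the PR's implementation.

```python
import torch


def block_mean_pool(x: torch.Tensor, pool_size: int) -> torch.Tensor:
    """Mean-pool x of shape [batch, seq, heads, dim] over seq in blocks.

    A trailing partial block is averaged on its own, so the output has
    ceil(seq / pool_size) blocks along dim 1.
    """
    batch, seq_len, headnum, dim = x.shape
    num_full_blocks, tail_size = divmod(seq_len, pool_size)
    pooled = []
    if num_full_blocks > 0:
        full = x[:, : num_full_blocks * pool_size]
        # reshape returns a view when the strides allow it and copies otherwise
        pooled.append(full.reshape(batch, num_full_blocks, pool_size, headnum, dim).mean(dim=2))
    if tail_size > 0:
        tail = x[:, num_full_blocks * pool_size :]
        pooled.append(tail.reshape(batch, 1, tail_size, headnum, dim).mean(dim=2))
    return torch.cat(pooled, dim=1)


# Non-contiguous input: [2, 4, 10, 8] transposed to [2, 10, 4, 8].
x = torch.randn(2, 4, 10, 8).transpose(1, 2)
out = block_mean_pool(x, pool_size=4)
print(out.shape)  # torch.Size([2, 3, 4, 8])
```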
```python
xq_fp32 = torch.view_as_complex(xq[:seq_len].to(torch.float32).reshape(seq_len, n, -1, 2))
xk_fp32 = torch.view_as_complex(xk[:seq_len].to(torch.float32).reshape(seq_len, n, -1, 2))
```
torch.view_as_complex requires the input tensor to be contiguous in memory. If xq or xk is already in torch.float32 and is non-contiguous, .to(torch.float32) will be a no-op, and the subsequent .reshape() may return a non-contiguous tensor, causing a runtime crash. Calling .contiguous() before torch.view_as_complex ensures safety.
Suggested change:
```diff
-xq_fp32 = torch.view_as_complex(xq[:seq_len].to(torch.float32).reshape(seq_len, n, -1, 2))
-xk_fp32 = torch.view_as_complex(xk[:seq_len].to(torch.float32).reshape(seq_len, n, -1, 2))
+xq_fp32 = torch.view_as_complex(xq[:seq_len].to(torch.float32).reshape(seq_len, n, -1, 2).contiguous())
+xk_fp32 = torch.view_as_complex(xk[:seq_len].to(torch.float32).reshape(seq_len, n, -1, 2).contiguous())
```
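A minimal sketch of complex-number RoPE showing where .contiguous() sits relative to torch.view_as_complex, which requires a last dimension of size 2 with stride 1. The function names, the [seq, heads, head_dim] layout, and the frequency schedule (theta = 10000) are assumptions for illustration, not the PR's NPU RoPE operator. The usage example feeds a float32 tensor that is non-contiguous after a transpose, the exact case the review describes: .to(torch.float32) is a no-op there, and .reshape() can return a strided view.

```python
import torch


def rope_freqs(seq_len: int, head_dim: int, theta: float = 10000.0) -> torch.Tensor:
    """Complex rotations e^{i * pos * inv_freq}, shape [seq_len, head_dim // 2]."""
    inv_freq = 1.0 / theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    return torch.polar(torch.ones_like(angles), angles)


def apply_rope_complex(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    """Rotate x [seq, heads, head_dim] by per-position complex frequencies.

    Adjacent pairs along head_dim are treated as (real, imag) components.
    """
    seq_len, n, _ = x.shape
    # .contiguous() guarantees the last-dim stride of 1 that view_as_complex needs,
    # even when x is already float32 and strided (so .to() and .reshape() may not copy).
    x_c = torch.view_as_complex(x.to(torch.float32).reshape(seq_len, n, -1, 2).contiguous())
    x_rot = x_c * freqs_cis[:, None, :]  # broadcast the rotations over heads
    return torch.view_as_real(x_rot).flatten(2).to(x.dtype)


# float32 input that is non-contiguous after a transpose: shape [8, 4, 16].
x = torch.randn(8, 16, 4).transpose(1, 2)
out = apply_rope_complex(x, rope_freqs(seq_len=8, head_dim=16))
print(out.shape)  # torch.Size([8, 4, 16])
```

Position 0 has rotation angle 0, so its rows come back unchanged, and every position keeps its per-head vector norm because each pair is rotated by a unit complex number.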
```python
    x = attention_forward(q, k, v, opt_mode="manual", op_type="ascend_laser_attention", layout="BNSD")
else:
    batch, qSeqlen, numHeads, headDim = q.shape
    assert batch == 1, "Rainfusion_blockwise currently only supports batch size 1."
```
Using assert statements for runtime input validation is discouraged because they can be optimized away when Python is run with the -O (optimize) flag. This would bypass the batch size check and could lead to unexpected behavior or crashes later in the execution. Please replace this with an explicit if check and raise a ValueError.
Suggested change:
```diff
-assert batch == 1, "Rainfusion_blockwise currently only supports batch size 1."
+if batch != 1:
+    raise ValueError("Rainfusion_blockwise currently only supports batch size 1.")
```
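A small sketch of the explicit-check pattern the review asks for, as a hypothetical helper (check_rainfusion_inputs) that unpacks q.shape in the same order as the quoted code. Unlike an assert, the `if ...: raise ValueError(...)` branch still runs when Python is started with -O, so an unsupported batch size fails fast instead of slipping through.

```python
import torch


def check_rainfusion_inputs(q: torch.Tensor) -> tuple:
    """Validate q of shape [batch, seq, heads, head_dim] (hypothetical helper)."""
    if q.dim() != 4:
        raise ValueError(f"Expected a 4-D query tensor, got shape {tuple(q.shape)}.")
    batch, q_seqlen, num_heads, head_dim = q.shape
    # An explicit check, unlike assert, is not stripped under `python -O`.
    if batch != 1:
        raise ValueError(f"Rainfusion_blockwise currently only supports batch size 1, got {batch}.")
    return batch, q_seqlen, num_heads, head_dim


print(check_rainfusion_inputs(torch.zeros(1, 16, 2, 8)))  # (1, 16, 2, 8)
```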
Hello, thank you for your contribution. What is your Rainfusion_blockwise used for? Its format is not aligned with lightx2v's attention format (for example: https://github.com/ModelTC/LightX2V/blob/main/lightx2v/common/ops/attn/sla_attn.py)
Hello!
Rainfusion_blockwise uses the RainFusion sparse attention algorithm, which improves performance on NPU. I will adjust its format shortly to align it with sla_attn.py.
Thanks!
From: Yang Yong (雍洋)
Date: 2026-06-16 18:43
To: ModelTC/LightX2V
CC: liaopingbo; Author
Subject: Re: [ModelTC/LightX2V] feat(npu): Add 4 Ascend NPU custom operators for sparse video inference (PR #1157)
helloyongyang left a comment (ModelTC/LightX2V#1157)
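Hello, thank you for your contribution.
What is your Rainfusion_blockwise used for? Its format is not aligned with lightx2v's attention format (for example: https://github.com/ModelTC/LightX2V/blob/main/lightx2v/common/ops/attn/sla_attn.py), and I don't see it used anywhere in the PR.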
Add 4 Ascend NPU custom operators for sparse video inference: