Add the getter and setter of skip_fp8_weight_update_tensor#3015

Open
xrennvidia wants to merge 5 commits into
NVIDIA:mainfrom
xrennvidia:xren/fix_skip_fp8_weight_update

Conversation

Collaborator

@xrennvidia xrennvidia commented May 20, 2026

Description

The getter and setter of skip_fp8_weight_update_tensor were deleted in @pggPL's PR #2759, but the MCore local CUDA Graph implementation still needs them (like here), so this PR restores them.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Xiaowei Ren <xren@nvidia.com>
@xrennvidia xrennvidia requested a review from ksivaman as a code owner May 20, 2026 09:48
Contributor

greptile-apps Bot commented May 20, 2026

Greptile Summary

This PR restores the get_skip_fp8_weight_update_tensor and set_skip_fp8_weight_update_tensor classmethods on FP8GlobalStateManager that were removed in PR #2759, which broke external consumers such as Megatron-LM's CUDA graph implementation.

  • quantization.py: Adds the two classmethods; the setter lazily initializes the backing CUDA float32 tensor on first call, making it also safe against the previously possible None.fill_() crash path.
  • graph.py: Replaces two sites of direct quantization_state.skip_fp8_weight_update_tensor attribute access with the new setter, improving encapsulation and gaining the null-safety benefit for free.
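The lazy-initialization behavior described above can be sketched roughly as follows. This is a torch-free illustration, not the code from the PR: `StandInTensor`, `GlobalState`, and `StateManager` are hypothetical stand-ins for `torch.Tensor`, the shared quantization state object, and `FP8GlobalStateManager`. Per the summary, the real setter allocates a float32 tensor on the CUDA device; here a plain Python object plays that role.

```python
from typing import Optional


class StandInTensor:
    """Hypothetical stand-in for a one-element tensor (float32 on CUDA in the PR)."""

    def __init__(self) -> None:
        self.value = 0.0

    def fill_(self, value: float) -> "StandInTensor":
        # Mimics an in-place fill: mutate the existing object and return it.
        self.value = float(value)
        return self


class GlobalState:
    """Hypothetical stand-in for the shared quantization state."""

    def __init__(self) -> None:
        # Starts out unallocated, which is the case the setter has to handle.
        self.skip_fp8_weight_update_tensor: Optional[StandInTensor] = None


class StateManager:
    """Hypothetical stand-in for FP8GlobalStateManager."""

    quantization_state = GlobalState()

    @classmethod
    def set_skip_fp8_weight_update_tensor(cls, skip: bool) -> None:
        state = cls.quantization_state
        # Allocate the backing tensor on first use instead of assuming it exists.
        if state.skip_fp8_weight_update_tensor is None:
            state.skip_fp8_weight_update_tensor = StandInTensor()
        # Update in place so existing references keep seeing the current flag.
        state.skip_fp8_weight_update_tensor.fill_(skip)

    @classmethod
    def get_skip_fp8_weight_update_tensor(cls) -> Optional[StandInTensor]:
        # May still be None if the setter has never been called.
        return cls.quantization_state.skip_fp8_weight_update_tensor
```

Updating the flag in place rather than rebinding the attribute is presumably what lets a captured CUDA graph keep reading the same buffer, though that reasoning is an inference and is not stated in the PR.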

Confidence Score: 5/5

Safe to merge — the change is a narrow, focused restoration of a deleted public interface with no behavioral regressions.

The two new classmethods are simple wrappers around already-tested state; the setter's lazy initialization is an improvement over the previous bare .fill_() call, which would have raised an AttributeError if the tensor was None. Internal module files continue to access the attribute directly, which is consistent with how they worked before and is not broken by this change.
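To make the failure mode concrete, here is a minimal sketch of the difference between a bare `.fill_()` call and a guarded one. The helper names `fill_unguarded` and `fill_guarded` and the `StandInTensor` class are hypothetical and exist only for illustration; they are not part of Transformer Engine.

```python
from typing import Optional


class StandInTensor:
    """Hypothetical stand-in for the flag tensor; only fill_ is modeled."""

    def __init__(self) -> None:
        self.value = 0.0

    def fill_(self, value: float) -> "StandInTensor":
        self.value = float(value)
        return self


def fill_unguarded(tensor: Optional[StandInTensor], skip: bool) -> None:
    # Bare call: fails with AttributeError when tensor is None.
    tensor.fill_(skip)


def fill_guarded(tensor: Optional[StandInTensor], skip: bool) -> StandInTensor:
    # Guarded call: allocate first if needed, then fill in place.
    if tensor is None:
        tensor = StandInTensor()
    tensor.fill_(skip)
    return tensor
```

Calling `fill_unguarded(None, True)` fails because `None` has no `fill_` attribute, whereas `fill_guarded(None, True)` returns a freshly allocated object holding the flag.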

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| transformer_engine/pytorch/quantization.py | Adds set_skip_fp8_weight_update_tensor and get_skip_fp8_weight_update_tensor classmethods to FP8GlobalStateManager, restoring the public interface removed in PR #2759. |
| transformer_engine/pytorch/graph.py | Replaces two direct quantization_state.skip_fp8_weight_update_tensor accesses with the new set_skip_fp8_weight_update_tensor setter, also gaining null-safety for the uninitialized-tensor edge case. |

Sequence Diagram

sequenceDiagram
    participant MCore as MCore (cuda_graphs.py)
    participant FGSM as FP8GlobalStateManager
    participant State as FP8GlobalState

    Note over MCore,State: Setter — called during CUDA graph capture setup
    MCore->>FGSM: set_skip_fp8_weight_update_tensor(True/False)
    FGSM->>State: "create tensor if None (device=cuda, dtype=float32)"
    FGSM->>State: fill_(skip)

    Note over MCore,State: Getter — called to read current flag
    MCore->>FGSM: get_skip_fp8_weight_update_tensor()
    FGSM->>State: skip_fp8_weight_update_tensor
    State-->>MCore: Optional[torch.Tensor]

Reviews (4): Last reviewed commit: "Merge branch 'main' into xren/fix_skip_f..."

Comment thread transformer_engine/pytorch/quantization.py Outdated
return type fix

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Xiaowei Ren <103958965+xrennvidia@users.noreply.github.com>
@ptrendx ptrendx requested a review from pggPL May 21, 2026 00:39
Member

@ptrendx ptrendx left a comment

I believe there could be a reason why Pawel removed those functions from this object and we may need to change MCore instead in order to have this be compatible with torch.compile. Setting 'request changes' status for now until @pggPL reviews it.

@github-actions github-actions Bot added the community-contribution PRs from external contributor outside the core maintainers, representing community-driven work. label May 22, 2026
Collaborator

@pggPL pggPL left a comment

LGTM

I didn't know that this is used in MCore. I've run the torch.compile test with this code and it also passes.

Member

ptrendx commented May 27, 2026

/te-ci pytorch
