increasing precision tolerance #3060
Greptile Summary
This PR widens the
Confidence Score: 5/5
Safe to merge: the change is limited to a test tolerance value and has no effect on library or runtime code. The only modification is a slight relaxation of a numerical tolerance in a test helper, accompanied by a well-reasoned comment. The new value stays below the existing tolerance for the analogous UnfusedAttention path, the narrowly scoped condition (TransformerLayer + head_dim > 128 + fused/flash backend + fp16) is correct, and bfloat16 is intentionally left untouched. No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[get_tols called] --> B{module == TransformerLayer?}
B -- No --> C[DotProductAttention tolerances\ntorch.half: 1e-3, 1e-3]
B -- Yes --> D{head_dim_qk <= 128?}
D -- Yes --> E[torch.half: 5e-3, 5e-3\ntorch.bfloat16: 3.5e-2, 3.5e-2]
D -- No --> F{backend == UnfusedAttention?}
F -- Yes --> G[torch.half: 1.6e-2, 1.6e-2\ntorch.bfloat16: 1.2e-1, 1e-1]
F -- No --> H["torch.half: 1.5e-2, 1.5e-2 ← CHANGED\ntorch.bfloat16: 8e-2, 7e-2 (unchanged)"]
style H fill:#fffbe6,stroke:#f0ad4e
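The branching in the flowchart can be sketched in plain Python. This is a hypothetical illustration, not the test helper from the PR: the name `get_tols_sketch`, the string dtype keys, the `"FusedAttention"` label, and the reading of each pair as `(atol, rtol)` are all assumptions, and only the values shown in the flowchart are included. The last lines compare the ~0.0107 absolute difference reported in the PR description against the old and new fp16 values, ignoring the relative term for simplicity.

```python
# Hypothetical sketch of the tolerance selection shown in the flowchart.
# Dtypes are plain strings so the example runs without PyTorch installed.

def get_tols_sketch(module: str, backend: str, head_dim_qk: int) -> dict:
    """Return {dtype_name: (atol, rtol)} following the flowchart branches.

    The (atol, rtol) ordering is an assumption; the flowchart only lists
    two numbers per dtype.
    """
    if module != "TransformerLayer":
        # DotProductAttention branch (only torch.half is shown in the chart).
        return {"half": (1e-3, 1e-3)}
    if head_dim_qk <= 128:
        return {"half": (5e-3, 5e-3), "bfloat16": (3.5e-2, 3.5e-2)}
    if backend == "UnfusedAttention":
        return {"half": (1.6e-2, 1.6e-2), "bfloat16": (1.2e-1, 1e-1)}
    # Fused/flash backends with head_dim_qk > 128: the branch this PR changes.
    return {"half": (1.5e-2, 1.5e-2), "bfloat16": (8e-2, 7e-2)}


# Absolute-difference check only (the relative term is left out on purpose).
observed_abs_diff = 0.0107  # value reported in the PR description
old_atol = 1e-2             # previous fp16 tolerance for this branch
new_atol, _ = get_tols_sketch("TransformerLayer", "FusedAttention", 256)["half"]

print(observed_abs_diff <= old_atol)  # False: the spurious failure
print(observed_abs_diff <= new_atol)  # True: passes with the relaxed value
```

Under these assumptions the relaxed fp16 value (1.5e-2) still sits just below the 1.6e-2 used on the UnfusedAttention branch, which matches the reviewer's remark above.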
Reviews (2): Last reviewed commit: "increasing precision tolerance"
/te-ci pytorch L0
Signed-off-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
3695fe4 to 2c76593
Description
`test_kv_cache` compares full-sequence attention against incremental KV-cache decoding. In the `TransformerLayer` configuration where `head_dim > 128` in `fp16`, these two execution paths use different kernels and masking strategies (e.g., `causal` vs. `padding_causal_bottom_right`, and full-matrix vs. single-query-row kernels). As a result, their outputs diverge slightly due to accumulated `fp16` rounding differences.
On Ampere, this divergence can reach the current tolerance threshold in rare cases, producing a spurious failure. In one observed instance, a single element out of 4096 showed an absolute difference of ~0.0107, which narrowly exceeds the existing `1e-2` tolerance.
This change slightly relaxes the `fp16` tolerance for the affected configuration to make the test robust across architectures. No runtime or library code is modified.
Fixes: N/A (spurious `test_kv_cache` failure on sm80 / fp16 / head_dim=256)
Type of change
Checklist
Notes
`bfloat16` tolerance in this branch is unchanged, as it was not failing.
Diff summary