Commit 76cdf28
fix: Use correct byte representation for decimal hashing (#1998)
## Which issue does this PR close?
- Closes #1981.
## What changes are included in this PR?
The
[spec](https://iceberg.apache.org/spec/#appendix-b-32-bit-hash-requirements)
states that:
>"Decimal values are hashed using the minimum number of bytes required
to hold the unscaled value as a two's complement big-endian".
Prior to this fix, we would incorrectly include redundant leading `0xFF`
bytes (the sign extension of negative values) in the hashed input. Now,
we hash only the bytes starting with the one that is needed to preserve
the sign, and everything that follows it.
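To illustrate the rule quoted from the spec, here is a minimal sketch of trimming an unscaled value down to its shortest two's complement big-endian form. It is not the code from this PR: the function name `min_twos_complement_be` is hypothetical, the unscaled value is assumed to fit in an `i128`, and the hashing step itself is left out.

```rust
/// Returns the shortest big-endian two's complement encoding of `unscaled`.
/// Hypothetical helper for illustration only; not the project's actual API.
fn min_twos_complement_be(unscaled: i128) -> Vec<u8> {
    // `to_be_bytes` always yields 16 bytes, sign-extended on the left.
    let bytes = unscaled.to_be_bytes();
    // Sign-extension filler: 0x00 for non-negative values, 0xFF for negative ones.
    let filler: u8 = if unscaled < 0 { 0xFF } else { 0x00 };

    let mut start = 0;
    // A leading filler byte is redundant only when the next byte's top bit
    // already carries the same sign. Otherwise it must be kept, because
    // dropping it would flip the sign of the decoded value.
    while start + 1 < bytes.len()
        && bytes[start] == filler
        && (bytes[start + 1] & 0x80) == (filler & 0x80)
    {
        start += 1;
    }
    bytes[start..].to_vec()
}
```

Under these assumptions, `-1` encodes as the single byte `0xFF`, `-129` as `0xFF 0x7F` (the leading `0xFF` is kept because it carries the sign), and `128` as `0x00 0x80` (the leading `0x00` is kept for the same reason).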
## Are these changes tested?
Added unit tests for the original scenario mentioned in the issue, as well
as some additional cases.
1 parent 700e62e commit 76cdf28
1 file changed
Lines changed: 39 additions & 4 deletions