Use Boolean and List for ORCRecordReader boolean and list columns #18391
rsrkpatwari1234 wants to merge 8 commits into apache:master
Codecov Report
❌ Patch coverage is

Coverage diff (master vs. #18391):

| Metric     | master | #18391 | +/-    |
|------------|--------|--------|--------|
| Coverage   | 63.48% | 63.50% | +0.02% |
| Complexity | 1701   | 1701   |        |
| Files      | 3254   | 3254   |        |
| Lines      | 199114 | 199119 | +5     |
| Branches   | 30833  | 30834  | +1     |
| Hits       | 126399 | 126443 | +44    |
| Misses     | 62643  | 62587  | -56    |
| Partials   | 10072  | 10089  | +17    |
- values[j] = extractValue(field, listColumnVector.child, childType, offset + j);
+ values.add(extractValue(field, listColumnVector.child, childType, offset + j));
  }
  return values;
Returning List here changes the RecordExtractor/GenericRow multi-value contract from Object[] to List. This PR only patches a few ingestion call sites, but many existing consumers still cast MV values to Object[] or test for instanceof Object[], so ORC rows can still break outside the narrow segment-build path. Please keep the extractor output as Object[] and add any user-facing List adaptation at a higher layer instead of changing the reader contract.
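The suggestion above could look roughly like the following. This is a minimal sketch, not Pinot code: `MultiValueAdapterSketch`, `extractMultiValue`, and `asUserFacingList` are hypothetical names used only for illustration. It assumes the extractor keeps returning `Object[]` and that a separate, higher-layer helper wraps the array for callers that want a `List`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these names are Pinot APIs.
final class MultiValueAdapterSketch {
  private MultiValueAdapterSketch() {
  }

  // Stand-in for the extractor path: the multi-value output stays Object[].
  static Object[] extractMultiValue(Object[] rawElements) {
    Object[] values = new Object[rawElements.length];
    for (int j = 0; j < rawElements.length; j++) {
      values[j] = rawElements[j];
    }
    return values;
  }

  // Higher-layer adaptation: a read-only List view over the same array,
  // so existing consumers that cast to Object[] are unaffected.
  static List<Object> asUserFacingList(Object[] values) {
    return Collections.unmodifiableList(Arrays.asList(values));
  }

  public static void main(String[] args) {
    Object[] mv = extractMultiValue(new Object[] {1, null, 3});
    List<Object> view = asUserFacingList(mv);
    System.out.println("array length = " + mv.length + ", list size = " + view.size());
  }
}
```

With this shape, code that tests `instanceof Object[]` keeps working, and only the callers that opt in see a `List`.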
Fixed #18222
Summary
ORC rows are built by ORCRecordExtractor (the reader only reads the file and calls the extractor). This change makes common types match what users expect: boolean columns are extracted as Boolean and list columns as List.

Segment building and tests already assumed multi-value fields as Object[] in many places. Those paths now accept List as well (via RecordReaderUtils.toObjectArray and small updates in stats collection and special-value handling), so ORC ingestion keeps working.

Test Plan
Unit tests are updated so the exact Java types for boolean and list extraction are checked, including empty lists and null list elements.
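
The normalization described in the Summary, where downstream paths accept either Object[] or List, can be sketched as below. This is an illustrative stand-in, not the actual RecordReaderUtils.toObjectArray: the class name `MultiValueNormalizerSketch`, the method name `normalizeMultiValue`, and its signature are assumptions. The cases it handles mirror the ones the test plan mentions (empty lists and null list elements).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: not the real Pinot helper, only the general idea.
final class MultiValueNormalizerSketch {
  private MultiValueNormalizerSketch() {
  }

  // Accepts either multi-value shape and returns an Object[].
  static Object[] normalizeMultiValue(Object value) {
    if (value == null) {
      return null;
    }
    if (value instanceof Object[]) {
      // Already in the array shape: pass through unchanged.
      return (Object[]) value;
    }
    if (value instanceof List) {
      // List.toArray() copies the elements and keeps null entries.
      return ((List<?>) value).toArray();
    }
    throw new IllegalArgumentException("Unsupported multi-value type: " + value.getClass());
  }

  public static void main(String[] args) {
    Object[] fromList = normalizeMultiValue(Arrays.asList("a", null));
    Object[] fromEmpty = normalizeMultiValue(Collections.emptyList());
    System.out.println(fromList.length + " element(s), " + fromEmpty.length + " element(s)");
  }
}
```

A unit test in the spirit of the test plan would assert the exact Java type of each extracted value (for example, `value instanceof Boolean` for boolean columns) in addition to its contents.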