Core: Add partition tuple to ManifestInfo #16213
  }

  @Override
  @SuppressWarnings("unchecked")
Minor: just add it to case 9?
@anoopj yes
Types.StructType PARTITION_SUMMARY_TYPE =
    Types.StructType.of(
        Types.NestedField.required(
            509,
            "contains_null",
            Types.BooleanType.get(),
            "True if any file has a null partition value"),
        Types.NestedField.optional(
            518,
            "contains_nan",
            Types.BooleanType.get(),
            "True if any file has a nan partition value"),
        Types.NestedField.optional(
            510, "lower_bound", Types.BinaryType.get(), "Partition lower bound for all files"),
        Types.NestedField.optional(
            511, "upper_bound", Types.BinaryType.get(), "Partition upper bound for all files"));
Types.NestedField PARTITION_SUMMARIES =
Even if we keep partition tuples for data file entries, I don't think we necessarily need to keep partition field summaries for manifests as well; we can still use stats at this level in the tree. In fact, I largely think we shouldn't, since it adds more complexity to the structure.
Ultimately, at the manifest level we want to preserve pruning, but I think we can do that with stats, as long as we ensure that stats on source columns are collected and that the rules for aggregating them are correct. That was independent of the decision to keep the tuple or not. Let me know if I'm missing something @nastra @anoopj @rdblue
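For illustration only, here is a minimal sketch of what aggregating per-file column stats up to the manifest level for pruning could look like. It is not the Iceberg API: `StatsAggregation`, `ColumnStats`, `merge`, `aggregate`, and `mightContain` are hypothetical names, and the bounds are plain `long` values rather than the binary-serialized bounds Iceberg stores. NaN handling is left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of Iceberg.
class StatsAggregation {

  /** Hypothetical stats for one source column, for a single file or a group of files. */
  static final class ColumnStats {
    final long lower;
    final long upper;
    final boolean containsNull;

    ColumnStats(long lower, long upper, boolean containsNull) {
      this.lower = lower;
      this.upper = upper;
      this.containsNull = containsNull;
    }

    /** Aggregation rule: take the widest bounds and OR the null flags. */
    ColumnStats merge(ColumnStats other) {
      return new ColumnStats(
          Math.min(lower, other.lower),
          Math.max(upper, other.upper),
          containsNull || other.containsNull);
    }

    /** Returns false only when no covered file can hold a row with column == value. */
    boolean mightContain(long value) {
      return value >= lower && value <= upper;
    }
  }

  /** Folds the stats of every file in a manifest into one manifest-level summary. */
  static ColumnStats aggregate(List<ColumnStats> files) {
    if (files.isEmpty()) {
      throw new IllegalArgumentException("Cannot aggregate stats for an empty manifest");
    }
    ColumnStats result = files.get(0);
    for (int i = 1; i < files.size(); i++) {
      result = result.merge(files.get(i));
    }
    return result;
  }

  public static void main(String[] args) {
    ColumnStats manifest =
        aggregate(Arrays.asList(new ColumnStats(10, 20, false), new ColumnStats(15, 40, true)));
    // A predicate outside the aggregated bounds lets a reader skip the whole manifest.
    System.out.println("might contain 50: " + manifest.mightContain(50));
    System.out.println("might contain 30: " + manifest.mightContain(30));
  }
}
```

Under these assumptions, the aggregated bounds are conservative: they can only widen as files are merged, so a manifest is skipped only when no file in it could match.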
Adds a `partitions` field to `ManifestInfo` and its implementation `ManifestInfoStruct`. This mirrors the existing `partitions` field on `ManifestFile`