Core: Cache PartitionData template in PartitionsTable to avoid rebuilding Avro schema per partition by Wenjun7J · Pull Request #16208 · apache/iceberg

Wenjun7J · 2026-05-04T14:44:36Z

What changed

This change avoids rebuilding the same PartitionData Avro schema for every partition row when scanning the partitions metadata table.

Instead of creating a fresh PartitionData(partitionType) for each partition value, PartitionsTable now creates one PartitionData template per scan and reuses it through copyFor(key).

A regression test is also added to verify that partition rows produced within the same scan reuse the same underlying Avro schema instance.
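The template-and-copy idea described above can be sketched in isolation. This is a minimal illustration, not the code from this PR: `SketchSchema`, `SketchRow`, and `scan` are hypothetical stand-ins and do not use Iceberg or Avro classes. It only shows how a `copyFor`-style method can share one schema instance across rows, and how an identity check can confirm the sharing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only: none of these classes are Iceberg or Avro types.
final class TemplateSketch {

  // Plays the role of an expensive-to-build schema object.
  static final class SketchSchema {
    final List<String> fieldNames;

    SketchSchema(List<String> fieldNames) {
      this.fieldNames = List.copyOf(fieldNames);
    }
  }

  // Plays the role of a partition row that carries its schema.
  static final class SketchRow {
    private final SketchSchema schema;
    private final Object[] values;

    // Expensive path: derives a new schema from the field names.
    SketchRow(List<String> fieldNames) {
      this(new SketchSchema(fieldNames), new Object[fieldNames.size()]);
    }

    private SketchRow(SketchSchema schema, Object[] values) {
      this.schema = schema;
      this.values = values;
    }

    // Cheap path: copies the key values and shares this row's schema instance.
    SketchRow copyFor(Object[] key) {
      return new SketchRow(schema, Arrays.copyOf(key, values.length));
    }

    SketchSchema schema() {
      return schema;
    }

    Object get(int pos) {
      return values[pos];
    }
  }

  // One template per scan; every produced row is a copy of it.
  static List<SketchRow> scan(List<String> fieldNames, List<Object[]> keys) {
    SketchRow template = new SketchRow(fieldNames);
    List<SketchRow> rows = new ArrayList<>();
    for (Object[] key : keys) {
      rows.add(template.copyFor(key));
    }
    return rows;
  }

  public static void main(String[] args) {
    List<Object[]> keys = List.of(new Object[] {2026, 5, 4}, new Object[] {2026, 5, 5});
    List<SketchRow> rows = scan(List.of("year", "month", "day"), keys);
    // Identity comparison: both rows point at the same schema object, so this prints true.
    System.out.println(rows.get(0).schema() == rows.get(1).schema());
  }
}
```

The identity comparison (`==`) rather than `equals` is the point of the check: two equal schemas built separately would still cost two conversions.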

Why

PartitionsTable currently constructs partition rows like this:

create PartitionData(partitionType)
convert partition type to Avro schema
copy the partition key into the new object

When a table has many partition values, this repeats the same schema conversion once per partition value, which creates heavy allocation pressure in:

PartitionData.partitionDataSchema
AvroSchemaUtil.convert
TypeToSchema$WithTypeToName.struct

This is especially visible for wide partition specs and large metadata table scans.
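To make the cost difference concrete, here is a small counting sketch. It is a hypothetical model, not Iceberg code: `convert` stands in for the schema conversion, and the count of 20,000 values with 4 fields mirrors the reproduction setup in this description. It only counts how often the conversion runs under each strategy.

```java
import java.util.List;

// Hypothetical cost model; convert() stands in for an expensive schema conversion.
final class ConversionCountSketch {
  static int conversions = 0;

  static String convert(List<String> partitionFields) {
    conversions++;
    return "record<" + String.join(",", partitionFields) + ">";
  }

  // Per-row construction: the same partition type is converted for every value.
  static int perRow(List<String> fields, int partitionValues) {
    conversions = 0;
    for (int i = 0; i < partitionValues; i++) {
      String schema = convert(fields);
      use(schema);
    }
    return conversions;
  }

  // Cached construction: one conversion per scan, reused for every value.
  static int cached(List<String> fields, int partitionValues) {
    conversions = 0;
    String schema = convert(fields);
    for (int i = 0; i < partitionValues; i++) {
      use(schema);
    }
    return conversions;
  }

  // Placeholder for building a row from the schema.
  static void use(String schema) {
    if (schema.isEmpty()) {
      throw new IllegalStateException("empty schema");
    }
  }

  public static void main(String[] args) {
    List<String> fields = List.of("p1", "p2", "p3", "p4");
    System.out.println(perRow(fields, 20_000)); // prints 20000
    System.out.println(cached(fields, 20_000)); // prints 1
  }
}
```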

External reproduction

I used a standalone reproduction app that scans the Iceberg partitions metadata table for a table with:

20,000 partition values
4 partition columns
repeated full partitionsTable scans

    try (CloseableIterable<FileScanTask> tasks = partitionsTable.newScan().planFiles()) {
        for (FileScanTask task : tasks) {
            try (CloseableIterable<StructLike> rows = task.asDataTask().rows()) {
                for (StructLike row : rows) {
                    StructProjection partitionData = row.get(0, StructProjection.class);
                    if (partitionData == null) {
                        throw new IllegalStateException("Partition row returned null partition data");
                    }
                    partitionRows++;
                }
            }
        }
    }

Before fix (origin/main)

Average wall clock time: 12.71s
Average max RSS: 5,938,604 KB (~5.66 GiB)

After fix

Average wall clock time: 5.24s
Average max RSS: 1,483,155 KB (~1.41 GiB)

Improvement

Wall clock time reduced by 58.8%
Max RSS reduced by 75.0%
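For readers who want to check the arithmetic, the percentages and GiB figures follow from the averages reported above. The helper names below are illustrative only; the inputs are the numbers from this description.

```java
import java.util.Locale;

// Recomputes the reported improvement figures from the before/after averages.
final class ImprovementCheck {

  // Relative reduction, in percent, going from `before` to `after`.
  static double reductionPercent(double before, double after) {
    return (before - after) / before * 100.0;
  }

  // Converts kilobytes as reported for RSS (1 KB = 1024 bytes) to GiB.
  static double kbToGib(long kb) {
    return kb / (1024.0 * 1024.0);
  }

  public static void main(String[] args) {
    // Wall clock: 12.71s -> 5.24s
    System.out.printf(Locale.ROOT, "%.1f%%%n", reductionPercent(12.71, 5.24)); // 58.8%
    // Max RSS: 5,938,604 KB -> 1,483,155 KB
    System.out.printf(Locale.ROOT, "%.1f%%%n", reductionPercent(5_938_604, 1_483_155)); // 75.0%
    System.out.printf(Locale.ROOT, "%.2f GiB%n", kbToGib(5_938_604L)); // 5.66 GiB
    System.out.printf(Locale.ROOT, "%.2f GiB%n", kbToGib(1_483_155L)); // 1.41 GiB
  }
}
```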

Signed-off-by: SevenJ <wenjun7j@gmail.com>

Wenjun7J · 2026-05-05T12:50:37Z

@RussellSpitzer @pvary could you please take a look?

avoid partition schema alloc frequently

f7b717a

Signed-off-by: SevenJ <wenjun7j@gmail.com>

github-actions Bot added the core label May 4, 2026

Test: fix spotless formatting in metadata table scan test

1d73386

Wenjun7J wants to merge 2 commits into apache:main from Wenjun7J:partitions-table-schema-cache
