
Spark: Support aggregate pushdown for identity partition column GROUP BY #16176

Open
hemanthboyina wants to merge 4 commits into apache:main from hemanthboyina:groupby_aggregate

Conversation

@hemanthboyina
Contributor

This PR enables aggregate pushdown for queries with GROUP BY on identity partition columns. Currently, Iceberg supports pushing down aggregates (COUNT, MIN, MAX) for queries without GROUP BY, computing results from file metadata instead of reading data files. However, when a query includes GROUP BY, the pushdown is disabled even when the GROUP BY columns are identity partition fields.
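The metadata-only evaluation described above can be illustrated with a minimal sketch in plain Java. The types here (`FileStats`, the `minId`/`maxId` fields) are hypothetical stand-ins for per-file column statistics read from Iceberg manifests, not Iceberg's actual classes: COUNT, MIN, and MAX are folded per identity-partition tuple without opening any data file.

```java
import java.util.*;

public class GroupedAggregateSketch {
  // Hypothetical stand-in for per-file statistics read from manifest metadata.
  record FileStats(List<?> partition, long recordCount, long minId, long maxId) {}

  // COUNT/MIN/MAX per identity-partition tuple, computed from metadata alone:
  // no data file is opened, mirroring the pushdown described above.
  static Map<List<?>, long[]> aggregateByPartition(List<FileStats> files) {
    Map<List<?>, long[]> acc = new HashMap<>();
    for (FileStats f : files) {
      acc.merge(
          f.partition(),
          new long[] {f.recordCount(), f.minId(), f.maxId()},
          (a, b) -> new long[] {a[0] + b[0], Math.min(a[1], b[1]), Math.max(a[2], b[2])});
    }
    return acc;
  }

  public static void main(String[] args) {
    List<FileStats> files =
        List.of(
            new FileStats(List.of("eng"), 10, 1, 10),
            new FileStats(List.of("eng"), 5, 3, 20),
            new FileStats(List.of("hr"), 7, 2, 9));
    long[] eng = aggregateByPartition(files).get(List.of("eng"));
    System.out.println(eng[0] + " " + eng[1] + " " + eng[2]); // 15 1 20
  }
}
```

This works only because identity partitioning guarantees every row in a file shares the file's partition tuple, so file-level stats are exact per group.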

@github-actions github-actions Bot added the spark label Apr 30, 2026
@singhpk234 singhpk234 requested a review from huaxingao May 2, 2026 02:24
Comment on lines +216 to +217
Map<List<Object>, AggregateEvaluator> evaluatorsByPartition =
groupFilesByPartition(spec, groupByPositions, boundAggregates);
Contributor


I am not confident this is correct. Also, we are only checking the current partition spec; a table can comprise files written under many different partition specs that evolved across snapshots.

Contributor Author


Thanks for the review, @singhpk234. You raised a valid point: the current implementation only considers the current partition spec and bails out for files from different specs. I will look into handling spec evolution properly and update the PR.
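The evolution concern can be sketched in plain Java (all types here are simplified stand-ins, not Iceberg's actual classes): every data file records the ID of the partition spec it was written with, so the safety check must run against each file's own spec, not just the table's current one.

```java
import java.util.*;

public class SpecEvolutionSketch {
  // Hypothetical stand-in: each data file records the ID of the partition
  // spec it was written with.
  record DataFile(int specId) {}

  // Pushdown of a GROUP BY on groupByColumns is only safe when those columns
  // are identity partition fields in the spec of every file in the scan,
  // across all evolved specs.
  static boolean safeToPushDown(
      List<DataFile> files,
      Map<Integer, Set<String>> identityFieldsBySpec,
      Set<String> groupByColumns) {
    for (DataFile f : files) {
      Set<String> identity = identityFieldsBySpec.getOrDefault(f.specId(), Set.of());
      if (!identity.containsAll(groupByColumns)) {
        return false; // a file from an older spec lacks this partition column
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<Integer, Set<String>> specs =
        Map.of(
            0, Set.of(), // original spec: unpartitioned
            1, Set.of("dept")); // evolved spec: identity partition on dept
    List<DataFile> mixed = List.of(new DataFile(0), new DataFile(1));
    System.out.println(safeToPushDown(mixed, specs, Set.of("dept"))); // false
    System.out.println(
        safeToPushDown(List.of(new DataFile(1)), specs, Set.of("dept"))); // true
  }
}
```

A file written before the partition field existed has no partition value for it, so grouping that file's stats under the current spec would silently produce wrong results.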

Contributor Author


Handled partition spec evolution; can you please review?

Contributor

@anuragmantri left a comment


Thanks for the useful PR, @hemanthboyina. Overall, it looks good to me. I made some suggestions.

return -1;
}

private boolean allGroupByAreIdentityPartitionFields(Aggregation aggregation) {
Contributor


allGroupByAreIdentityPartitionFields() and resolveGroupByFields() look very similar, except:

  • allGroupByAreIdentityPartitionFields additionally checks instanceof NamedReference
  • resolveGroupByFields additionally collects field IDs and fields into output lists

Can we merge these two? Or maybe let canPushDownAggregation() allow group by and then have the checks in this merged method? What do you think?

Contributor Author


Done. Merged allGroupByAreIdentityPartitionFields into resolveGroupByFields, removed the separate method, and simplified canPushDownAggregation.
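A merged check-and-resolve along these lines could look like the following sketch. The types are simplified, non-Iceberg stand-ins (the real resolveGroupByFields works against Iceberg's schema and partition spec): validation and collection happen in one pass, with an empty Optional signalling "do not push down".

```java
import java.util.*;

public class ResolveGroupBySketch {
  // Hypothetical stand-in for a partition field and its transform kind.
  record PartitionField(String sourceName, boolean identity) {}

  // Merged check-and-resolve: returns the field IDs of the GROUP BY columns,
  // or empty if any column is not an identity partition field, so callers get
  // the validation and the collection in a single pass.
  static Optional<List<Integer>> resolveGroupByFields(
      List<String> groupByColumns,
      Map<String, PartitionField> specFields,
      Map<String, Integer> fieldIdsByName) {
    List<Integer> ids = new ArrayList<>();
    for (String col : groupByColumns) {
      PartitionField field = specFields.get(col);
      if (field == null || !field.identity()) {
        return Optional.empty(); // not an identity partition column: no pushdown
      }
      ids.add(fieldIdsByName.get(col));
    }
    return Optional.of(ids);
  }

  public static void main(String[] args) {
    Map<String, PartitionField> spec = Map.of("dept", new PartitionField("dept", true));
    Map<String, Integer> ids = Map.of("dept", 3, "id", 1);
    System.out.println(resolveGroupByFields(List.of("dept"), spec, ids)); // Optional[[3]]
    System.out.println(resolveGroupByFields(List.of("id"), spec, ids)); // Optional.empty
  }
}
```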

return true;
}

private static class ArrayStructLike implements StructLike {
Contributor


Can we use AggregateEvaluator.ArrayStructLike instead? May have to make it package-private.

Contributor Author


AggregateEvaluator.ArrayStructLike is private static in the api module. Since SparkScanBuilder is in the spark module, even package-private wouldn't help; we'd need to make it public. I kept the change as-is to avoid expanding the API surface. Happy to follow up separately if preferred.
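The essential property of such a class is value-based equals/hashCode over the backing array, so partition tuples from different files land in the same map bucket. A minimal sketch in the spirit of the private AggregateEvaluator.ArrayStructLike (names here are hypothetical, and the real class implements Iceberg's StructLike interface):

```java
import java.util.Arrays;

public class ArrayStructLikeSketch {
  // Minimal array-backed struct usable as a HashMap key. Value equality makes
  // equal partition tuples from different files collide into the same bucket,
  // which the per-partition evaluator map relies on.
  static final class ArrayStruct {
    private final Object[] values;

    ArrayStruct(Object[] values) {
      this.values = values;
    }

    Object get(int pos) {
      return values[pos];
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof ArrayStruct other && Arrays.equals(values, other.values);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(values);
    }
  }

  public static void main(String[] args) {
    ArrayStruct a = new ArrayStruct(new Object[] {"eng", 2026});
    ArrayStruct b = new ArrayStruct(new Object[] {"eng", 2026});
    System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
  }
}
```

A plain Object[] would not work as a map key because arrays use identity equals/hashCode, which is why a wrapper like this is needed at all.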

@@ -568,11 +568,9 @@ public void testAggregationPushdownOnBucketedColumn() {
sql(
"CREATE TABLE %s (id BIGINT, struct_with_int STRUCT<c1:INT>) USING iceberg PARTITIONED BY (bucket(8, id))",
tableName);

Contributor


Nit: Unrelated whitespace change.

Contributor Author


Done.

@@ -909,4 +907,183 @@ public void testAggregatePushDownForIncrementalScan() {
assertEquals(
"min/max/count push down", expected2, rowsToJava(unboundedPushdownDs.collectAsList()));
}

@TestTemplate
public void testGroupByIdentityPartitionColumnCountPushDown() {
Contributor


Can we also verify that the EXPLAIN string shows the pushdown, like the other tests do?

Contributor Author


Added.
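The suggested verification boils down to a string predicate on the physical plan. A sketch of the check, detached from Spark (the marker "LocalTableScan" follows the pattern of a fully pushed-down metadata aggregate in the existing tests, but the exact explain text here is an assumption, not real Spark output):

```java
public class ExplainCheckSketch {
  // When an aggregate is fully answered from metadata, the physical plan shows
  // a local scan of precomputed rows instead of a scan over data files.
  static boolean aggregateWasPushedDown(String explainText) {
    return explainText.contains("LocalTableScan");
  }

  public static void main(String[] args) {
    String pushed = "== Physical Plan ==\nLocalTableScan [dept, count(id)]";
    String notPushed = "== Physical Plan ==\nBatchScan ...";
    System.out.println(aggregateWasPushedDown(pushed)); // true
    System.out.println(aggregateWasPushedDown(notPushed)); // false
  }
}
```

In a real test, the explain text would come from running EXPLAIN on the GROUP BY query and asserting on the collected plan string.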

}

@TestTemplate
public void testGroupByIdentityPartitionColumnWithMinMax() {
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: can we also have EXPLAIN string verification?

3 participants