DiversifyingChildren benchmark by aruggero · Pull Request #16082 · apache/lucene

aruggero · 2026-05-18T14:11:32Z

Summary

This PR adds a new JMH microbenchmark for DiversifyingChildrenFloatKnnVectorQuery (the join-based parent-child KNN query), which was previously lacking dedicated performance coverage in lucene/benchmark-jmh.

Motivation

DiversifyingChildrenFloatKnnVectorQuery operates over a nested document structure (child vectors + parent block). A dedicated benchmark makes it possible to measure and track query latency across realistic corpus shapes and query configurations.
It also provides a baseline for evaluating future performance improvements in nested KNN search.

What the benchmark does

DiversifyingChildrenKnnQueryBenchmark builds an index of parent–child document blocks, where each parent owns a configurable number of child documents, each carrying a random float vector. A pool of 256 pre-generated unit query vectors is rotated during measurement to avoid caching effects.
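
The query-pool rotation can be illustrated with the standard library alone. This is a minimal sketch, not the code from the PR: the class name QueryVectorPool, the Gaussian sampling, the fixed seed, and the round-robin cursor are all assumptions. The description above states only that 256 pre-generated unit vectors are rotated.

```java
import java.util.Random;

/** Hypothetical sketch: a fixed pool of unit-length query vectors served round-robin. */
final class QueryVectorPool {
  private final float[][] pool;
  private int cursor; // index of the vector handed out by the next call

  QueryVectorPool(int size, int dim, long seed) {
    Random random = new Random(seed); // fixed seed keeps runs comparable (assumption)
    pool = new float[size][];
    for (int i = 0; i < size; i++) {
      pool[i] = randomUnitVector(random, dim);
    }
  }

  /** Draws Gaussian components and scales the vector to L2 norm 1. */
  static float[] randomUnitVector(Random random, int dim) {
    float[] v = new float[dim];
    double sumOfSquares = 0;
    for (int i = 0; i < dim; i++) {
      v[i] = (float) random.nextGaussian();
      sumOfSquares += (double) v[i] * v[i];
    }
    float scale = (float) (1.0 / Math.sqrt(sumOfSquares));
    for (int i = 0; i < dim; i++) {
      v[i] *= scale;
    }
    return v;
  }

  /** Returns the next vector, wrapping around after the last one. */
  float[] next() {
    float[] v = pool[cursor];
    cursor = (cursor + 1) % pool.length;
    return v;
  }

  /** L2 norm, accumulated in double precision. */
  static double norm(float[] v) {
    double sum = 0;
    for (float x : v) {
      sum += (double) x * x;
    }
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    QueryVectorPool queries = new QueryVectorPool(256, 128, 42L);
    float[] first = queries.next();
    System.out.println("dim=" + first.length + " norm=" + norm(first));
  }
}
```

Generating the vectors up front keeps random-number generation out of the measured path, so each benchmark invocation only pays for an array lookup before running the query.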

The following parameters are benchmarked:

Parameter	Values	Description
numParents	5000	Total number of parent groups
childrenPerParent	4, 50	Children per parent; controls filter selectivity
k	10, 100	Number of top results requested
dim	128, 768	Vector dimension
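
As a rough guide to what the table implies, the sketch below counts the parameter combinations and the number of child vectors each corpus shape produces. The class and method names are hypothetical, and the byte figure covers only the raw float32 payload, not the HNSW graph or any other index structure.

```java
/** Hypothetical sketch: sizes implied by the parameter table above. */
final class ParameterGrid {
  static final int NUM_PARENTS = 5000;
  static final int[] CHILDREN_PER_PARENT = {4, 50};
  static final int[] K = {10, 100};
  static final int[] DIM = {128, 768};

  /** Number of distinct parameter combinations JMH would run. */
  static int combinations() {
    return CHILDREN_PER_PARENT.length * K.length * DIM.length;
  }

  /** Total child documents (one vector each) for a given fan-out. */
  static long childVectors(int childrenPerParent) {
    return (long) NUM_PARENTS * childrenPerParent;
  }

  /** Raw float32 payload in bytes, ignoring all index overhead. */
  static long rawVectorBytes(int childrenPerParent, int dim) {
    return childVectors(childrenPerParent) * dim * Float.BYTES;
  }

  public static void main(String[] args) {
    System.out.println("combinations=" + combinations());
    for (int children : CHILDREN_PER_PARENT) {
      for (int dim : DIM) {
        System.out.println(
            "childrenPerParent=" + children
                + " dim=" + dim
                + " vectors=" + childVectors(children)
                + " rawBytes=" + rawVectorBytes(children, dim));
      }
    }
  }
}
```

The grid yields 8 combinations, with 20,000 child vectors at childrenPerParent = 4 and 250,000 at childrenPerParent = 50.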

The benchmark uses SampleTime mode (5 warm-up iterations, 5 measurement iterations).

Setting

BenchmarkMode(Mode.SampleTime): rather than averaging all measurements into a single number, JMH samples the latency of individual operations and builds a histogram from those samples. This reports p50, p90, p99, and p99.9 automatically.
For a search benchmark, this matters: HNSW graph traversal has variable-length paths (some queries terminate early, some explore more nodes), so the mean alone is misleading. Percentiles tell you whether improvements are consistent or only in the best case.
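
To make the point about means and percentiles concrete, here is a small stdlib-only sketch on synthetic latencies. The sample values are invented for illustration, and the nearest-rank percentile definition is an assumption; JMH's own percentile computation may differ in detail.

```java
import java.util.Arrays;

/** Hypothetical sketch: mean versus nearest-rank percentiles on skewed latencies. */
final class LatencySummary {
  static double mean(double[] samples) {
    double sum = 0;
    for (double s : samples) {
      sum += s;
    }
    return sum / samples.length;
  }

  /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
  static double percentile(double[] samples, double p) {
    double[] sorted = samples.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p * sorted.length / 100.0);
    return sorted[Math.max(rank, 1) - 1];
  }

  /** Synthetic data: 95 fast queries at 1.0 ms and 5 slow ones at 21.0 ms. */
  static double[] skewedSamples() {
    double[] samples = new double[100];
    Arrays.fill(samples, 0, 95, 1.0);
    Arrays.fill(samples, 95, 100, 21.0);
    return samples;
  }

  public static void main(String[] args) {
    double[] samples = skewedSamples();
    System.out.println("mean=" + mean(samples));
    System.out.println("p50=" + percentile(samples, 50));
    System.out.println("p90=" + percentile(samples, 90));
    System.out.println("p99=" + percentile(samples, 99));
  }
}
```

On this data the mean is 2.0 ms, although 95% of the queries finish in 1.0 ms and the slowest 5% take 21.0 ms. The mean describes neither group, while p50 and p99 describe both.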

Warmup(iterations = 5, time = 2): the JVM's JIT compiler needs to observe a method being called thousands of times before it applies the most aggressive optimisations.
HNSW traversal involves polymorphic call sites, priority queue operations, and BitSet accesses — complex enough that JIT convergence takes longer than simpler benchmarks.
5 iterations × 2s = 10s gives the JIT enough invocations to fully optimise the hot paths before measurement begins.

Measurement(iterations = 5, time = 5): more time per iteration means more samples collected per iteration (since SampleTime samples individual calls). 5 iterations × 5s per fork × 1 fork = 25s of samples per combination, which gives JMH enough data points to compute reliable percentile estimates for fast combinations; tail percentiles for the heaviest combinations (high dim, many children) remain approximate because longer per-query latency yields fewer samples.
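
The iteration arithmetic above can be written out as a short calculation. This is only a sketch with hypothetical names: it counts timed warm-up and measurement iterations, and it does not include index construction or any other setup work, which is not quantified in this description.

```java
/** Hypothetical sketch: timed-iteration budget implied by the Warmup and Measurement settings. */
final class IterationBudget {
  static final int WARMUP_ITERATIONS = 5;
  static final int WARMUP_SECONDS = 2;
  static final int MEASUREMENT_ITERATIONS = 5;
  static final int MEASUREMENT_SECONDS = 5;
  static final int FORKS = 1;
  // 2 childrenPerParent values x 2 k values x 2 dim values, with a single numParents value.
  static final int COMBINATIONS = 2 * 2 * 2;

  static int warmupSecondsPerCombination() {
    return WARMUP_ITERATIONS * WARMUP_SECONDS * FORKS;
  }

  static int measurementSecondsPerCombination() {
    return MEASUREMENT_ITERATIONS * MEASUREMENT_SECONDS * FORKS;
  }

  /** Timed seconds across the whole grid, excluding setup such as index building. */
  static int totalTimedSeconds() {
    return COMBINATIONS * (warmupSecondsPerCombination() + measurementSecondsPerCombination());
  }

  public static void main(String[] args) {
    System.out.println("warm-up per combination: " + warmupSecondsPerCombination() + "s");
    System.out.println("measurement per combination: " + measurementSecondsPerCombination() + "s");
    System.out.println("timed total, excluding setup: " + totalTimedSeconds() + "s");
  }
}
```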

How to run

./gradlew -p lucene/benchmark-jmh assemble
java -jar lucene/benchmark-jmh/build/benchmarks/lucene-benchmark-jmh-*.jar DiversifyingChildrenKnnQueryBenchmark

alessandrobenedetti · 2026-05-22T10:30:26Z

Just realised that, according to CHANGES.txt, this is only coming to Lucene 11.0, so I won't do any cherry-picking!

aruggero added 5 commits May 15, 2026 11:59

Added benchmark for DiversifyingChildrenKnnQuery

d86b038

Gralew tidy

c7f573a

Changed parameters to reduce execution time around 30min

51bc1f9

Updated parameters

597c09e

Merge branch 'upStream-main' into diversifyingChildrenBenchmark

13db0f4

github-actions Bot added the module:build-infra label May 18, 2026

Added changes.txt row

6427ac4

github-actions Bot added this to the 11.0.0 milestone May 19, 2026

benwtrent approved these changes May 19, 2026

View reviewed changes

alessandrobenedetti approved these changes May 22, 2026

View reviewed changes

alessandrobenedetti merged commit 15f20e4 into apache:main May 22, 2026
13 checks passed

alessandrobenedetti merged 6 commits into apache:main from SeaseLtd:diversifyingChildrenBenchmark

3 participants
