DiversifyingChildren benchmark #16082

Merged
alessandrobenedetti merged 6 commits into apache:main from SeaseLtd:diversifyingChildrenBenchmark
May 22, 2026

@aruggero
Contributor

Summary

This PR adds a new JMH microbenchmark for DiversifyingChildrenFloatKnnVectorQuery (the join-based parent-child KNN query), which was previously lacking dedicated performance coverage in lucene/benchmark-jmh.

Motivation

DiversifyingChildrenFloatKnnVectorQuery operates over a nested document structure (child vectors + parent block). A dedicated benchmark enables measurement and tracking of query latency across realistic corpus shapes and query configurations.
It also provides a baseline for evaluating future performance improvements in nested KNN search.
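
To make the nested structure concrete, here is a minimal brute-force sketch of what "diversifying" means: among all child vectors, keep only the best-scoring child per parent, then return the top k of those. It is not the Lucene implementation, which traverses an HNSW graph; the class `DiversifiedKnnSketch`, its methods, and the use of dot product as the similarity are assumptions made for illustration. The doc ID layout follows the block-join convention in which each block's children are indexed first and the parent closes the block.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Lucene or of this PR.
final class DiversifiedKnnSketch {

  /** One child hit: child doc ID, the doc ID of its parent, and the similarity score. */
  static final class Hit {
    final int childDoc;
    final int parentDoc;
    final float score;

    Hit(int childDoc, int parentDoc, float score) {
      this.childDoc = childDoc;
      this.parentDoc = parentDoc;
      this.score = score;
    }
  }

  /** Doc ID of the parent that closes the given block (children first, parent last). */
  static int parentDocId(int block, int childrenPerParent) {
    return (block + 1) * (childrenPerParent + 1) - 1;
  }

  /** Dot product, assumed here as the similarity function. */
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Exhaustive search that returns at most one child per parent, best score first. */
  static List<Hit> search(float[][] childVectors, int childrenPerParent, float[] query, int k) {
    Map<Integer, Hit> bestPerParent = new HashMap<>();
    for (int i = 0; i < childVectors.length; i++) {
      int block = i / childrenPerParent;
      int childDoc = block * (childrenPerParent + 1) + i % childrenPerParent;
      int parentDoc = parentDocId(block, childrenPerParent);
      Hit hit = new Hit(childDoc, parentDoc, dot(childVectors[i], query));
      // Keep only the highest-scoring child seen so far for this parent.
      bestPerParent.merge(parentDoc, hit, (a, b) -> a.score >= b.score ? a : b);
    }
    List<Hit> hits = new ArrayList<>(bestPerParent.values());
    hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
    return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
  }
}
```

With two parents of two children each, a plain top-2 search could return both children of the same parent; the diversified search returns one child from each.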

What the benchmark does

The DiversifyingChildrenKnnQueryBenchmark benchmark builds an index of parent–child document blocks, where each parent owns a configurable number of child documents, each carrying a random float vector. A pool of 256 pre-generated unit query vectors is rotated during measurement to avoid caching effects.
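
The query-vector rotation can be sketched in plain Java as follows. This is an illustrative approximation rather than the code in the PR: the class name `QueryVectorPool`, the Gaussian sampling, and the fixed seed are assumptions. The pool is generated once, each vector is normalised to unit length, and `next()` cycles through the pool so that consecutive invocations do not reuse the same query.

```java
import java.util.Random;

// Hypothetical helper; the PR's benchmark may structure this differently.
final class QueryVectorPool {
  private final float[][] pool;
  private int cursor;

  QueryVectorPool(int size, int dim, long seed) {
    Random random = new Random(seed);
    pool = new float[size][dim];
    for (float[] v : pool) {
      double normSq = 0;
      for (int i = 0; i < dim; i++) {
        v[i] = (float) random.nextGaussian();
        normSq += v[i] * v[i];
      }
      // Scale to unit length so every query vector has norm 1.
      float inv = (float) (1.0 / Math.sqrt(normSq));
      for (int i = 0; i < dim; i++) {
        v[i] *= inv;
      }
    }
  }

  /** Returns the next query vector, wrapping around at the end of the pool. */
  float[] next() {
    float[] v = pool[cursor];
    cursor = (cursor + 1) % pool.length;
    return v;
  }
}
```

The cursor is not thread-safe; in a multi-threaded benchmark it would need to live in per-thread state.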

The following parameters are benchmarked:

| Parameter | Values | Description |
|---|---|---|
| numParents | 5000 | Total number of parent groups |
| childrenPerParent | 4, 50 | Children per parent; controls filter selectivity |
| k | 10, 100 | Number of top results requested |
| dim | 128, 768 | Vector dimension |

The benchmark uses SampleTime mode (5 warm-up iterations, 5 measurement iterations).

Settings

BenchmarkMode(Mode.SampleTime): rather than reducing all measurements to a single average, JMH samples the latency of individual operations and builds a distribution from those samples. The report then includes percentiles such as p50, p90, p99, and p99.9 automatically.
For a search benchmark, this matters: HNSW graph traversal has variable-length paths (some queries terminate early, some explore more nodes), so the mean alone is misleading. Percentiles tell you whether improvements are consistent or only in the best case.
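
A small worked example shows why the mean hides the tail. The sketch below uses made-up latencies (95 fast queries at 1 ms and 5 slow ones at 21 ms) and a simple nearest-rank percentile; the class `LatencySummary` is hypothetical, and JMH's own percentile computation may differ in detail.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of JMH.
final class LatencySummary {

  /** Arithmetic mean of the samples. */
  static double mean(double[] samples) {
    double sum = 0;
    for (double s : samples) {
      sum += s;
    }
    return sum / samples.length;
  }

  /** Nearest-rank percentile for p in (0, 100]. */
  static double percentile(double[] samples, double p) {
    double[] sorted = samples.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p * sorted.length / 100.0);
    return sorted[Math.max(rank, 1) - 1];
  }

  /** Made-up data: 95 queries at 1 ms and 5 queries at 21 ms. */
  static double[] exampleSamples() {
    double[] samples = new double[100];
    Arrays.fill(samples, 0, 95, 1.0);
    Arrays.fill(samples, 95, 100, 21.0);
    return samples;
  }
}
```

For this data the mean is 2 ms, the p50 is 1 ms, and the p99 is 21 ms: the mean matches neither the typical query nor the slow one.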

Warmup(iterations = 5, time = 2): the JVM's JIT compiler needs to observe a method being called thousands of times before it applies the most aggressive optimisations.
HNSW traversal involves polymorphic call sites, priority queue operations, and BitSet accesses, so JIT convergence takes longer than in simpler benchmarks.
5 iterations × 2s = 10s should give the JIT enough invocations to optimise the hot paths before measurement begins.

Measurement(iterations = 5, time = 5): more time per iteration means more samples collected per iteration, since SampleTime draws its samples from individual calls. 5 iterations × 5s × 1 fork = 25s of samples per parameter combination, which gives enough data points for reliable percentile estimates on fast combinations; tail percentiles for the heaviest combinations (high dim, many children) remain approximate because longer per-query latency yields fewer samples.
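
The time budget implied by these settings can be worked out as below. This is a back-of-the-envelope sketch, not part of the PR: the class `BenchmarkBudget` is hypothetical, and the totals are nominal, excluding index construction, JVM fork start-up, and any other setup cost.

```java
// Hypothetical helper for illustration; values come from the parameter table and settings above.
final class BenchmarkBudget {

  /** Number of parameter combinations: the product of the number of values per parameter. */
  static int combinations(int[]... paramValues) {
    int total = 1;
    for (int[] values : paramValues) {
      total *= values.length;
    }
    return total;
  }

  /** Nominal seconds spent in one phase (warm-up or measurement) for one combination. */
  static int phaseSeconds(int iterations, int secondsPerIteration, int forks) {
    return iterations * secondsPerIteration * forks;
  }

  /** Nominal seconds for the whole run: (warm-up + measurement) times the grid size. */
  static int totalSeconds() {
    int grid = combinations(
        new int[] {5000},      // numParents
        new int[] {4, 50},     // childrenPerParent
        new int[] {10, 100},   // k
        new int[] {128, 768}); // dim
    int warmup = phaseSeconds(5, 2, 1);      // 10s
    int measurement = phaseSeconds(5, 5, 1); // 25s
    return grid * (warmup + measurement);
  }
}
```

With 8 combinations at 35s each, a full run takes roughly 280s of nominal warm-up and measurement time, plus setup.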

How to run

./gradlew -p lucene/benchmark-jmh assemble
java -jar lucene/benchmark-jmh/build/benchmarks/lucene-benchmark-jmh-*.jar DiversifyingChildrenKnnQueryBenchmark

@github-actions github-actions Bot added this to the 11.0.0 milestone May 19, 2026
@alessandrobenedetti alessandrobenedetti merged commit 15f20e4 into apache:main May 22, 2026
13 checks passed
@alessandrobenedetti
Contributor

just realised that according to the CHANGES.txt this is only coming to Lucene 11.0, so I won't do any cherry picking!
