Introduce intoCacheAndCount method to BulkScorer to allow cache specializations by iverase · Pull Request #16083 · apache/lucene

iverase · 2026-05-18T14:41:32Z

Currently, scorers are always cached using a dense representation, either RoaringBitSets or FixedBitSets. This feels very inefficient for scorers that can be represented in a sparse way, like dense ranges.

This PR proposes to allow scorer specialisations by moving the current code that materializes the scorer into the BulkScorer base class, under the method #intoCacheAndCount(int maxDoc). Subclasses can override this method; for example, RangeBulkScorer can represent itself in a sparse way, saving a good amount of heap.
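To illustrate the kind of sparse representation described above, here is a minimal sketch of a contiguous doc ID range held as two ints. `RangeDocSet` and its methods are hypothetical names chosen for illustration; they are not Lucene classes and are not taken from this PR's diff.

```java
import java.util.function.IntConsumer;

/** Hypothetical illustration only: a contiguous range of doc IDs stored as two ints. */
final class RangeDocSet {
  final int minDoc;          // first matching doc, inclusive
  final int maxDocExclusive; // one past the last matching doc

  RangeDocSet(int minDoc, int maxDocExclusive) {
    if (minDoc < 0 || maxDocExclusive < minDoc) {
      throw new IllegalArgumentException("invalid range");
    }
    this.minDoc = minDoc;
    this.maxDocExclusive = maxDocExclusive;
  }

  /** Number of matching docs, available without iterating. */
  int count() {
    return maxDocExclusive - minDoc;
  }

  /** Constant-time membership test. */
  boolean contains(int doc) {
    return doc >= minDoc && doc < maxDocExclusive;
  }

  /** Visits every matching doc in increasing order. */
  void forEach(IntConsumer consumer) {
    for (int doc = minDoc; doc < maxDocExclusive; doc++) {
      consumer.accept(doc);
    }
  }
}
```

In this sketch the cached state is the same size no matter how many documents the range covers, whereas a one-bit-per-document set grows with `maxDoc`.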

closes #16071


romseygeek · 2026-05-18T15:21:03Z

-    if (scorer.cost() * 100 >= maxDoc) {
-      // FixedBitSet is faster for dense sets and will enable the random-access
-      // optimization in ConjunctionDISI
-      return cacheIntoBitSet(scorer, maxDoc);


I wonder if this comment is still true now that we have intoBitSet() and docIDRunEnd()? I don't really like the idea of pushing things from LRUQueryCache onto BulkScorer - I feel like the way to fix this is to make RoaringBitSet more performant.

And how do you fix it if the range matches 2% of the index? Having to build a full FixedBitSet for a dense range feels wrong.
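The 2% case can be worked through with some back-of-the-envelope arithmetic. The sketch below reuses the `cost * 100 >= maxDoc` check from the removed lines quoted above; `CacheSizing`, its method names, the 100M-document segment, and the two-int range encoding are all assumptions made for illustration, not code from the PR.

```java
/** Hypothetical helper for rough heap estimates; not part of Lucene. */
final class CacheSizing {
  /** The density check from the removed lines: true when cost is at least 1% of maxDoc. */
  static boolean isDense(long cost, int maxDoc) {
    return cost * 100 >= maxDoc;
  }

  /** Bytes in the long[] backing a one-bit-per-doc set (array header excluded). */
  static long fixedBitsBytes(int maxDoc) {
    return ((maxDoc + 63L) >>> 6) * Long.BYTES;
  }

  /** Bytes needed to store a contiguous range as two ints (assumed encoding). */
  static long rangeBytes() {
    return 2L * Integer.BYTES;
  }

  public static void main(String[] args) {
    int maxDoc = 100_000_000;    // assumed segment size
    long cost = maxDoc / 50;     // a range matching 2% of the docs
    System.out.println(isDense(cost, maxDoc));  // takes the dense path
    System.out.println(fixedBitsBytes(maxDoc)); // one bit per doc in the segment
    System.out.println(rangeBytes());           // two ints
  }
}
```

Under these assumptions a 2% range crosses the 1% threshold, so the bit set is sized by `maxDoc` (about 12.5 MB of bits here) even though the matches could be described by two integers.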

I mean that maybe we should always be using RoaringBitSet?

I see, you mean using RoaringBitSet more aggressively.

Yeah, exactly. I need to do some code archaeology, but I think ConjunctionDISI used to do instanceof checks on its input filters and now just uses intoBitSet() and docIDRunEnd(). So if we can get the performance of RoaringBitSet up for those methods, then we don't need to use FixedBitSet here at all, which would save us a whole bunch of memory.

So this would be a step in that direction: #16084
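As a rough illustration of why a run-end query helps a consumer, the sketch below walks a sorted list of doc IDs one run of consecutive IDs at a time rather than one document at a time. `SortedDocs`, `runEnd`, and `countRuns` are hypothetical stand-ins written for this example; they mimic the idea behind docIDRunEnd() but are not Lucene code.

```java
/** Hypothetical stand-in for an iterator over strictly increasing doc IDs; not a Lucene class. */
final class SortedDocs {
  private final int[] docs; // strictly increasing doc IDs

  SortedDocs(int[] docs) {
    this.docs = docs;
  }

  /** Returns one past the last doc ID of the consecutive run that starts at index i. */
  int runEnd(int i) {
    int end = docs[i] + 1;
    while (i + 1 < docs.length && docs[i + 1] == end) {
      i++;
      end++;
    }
    return end;
  }

  /** Counts runs of consecutive doc IDs, advancing a whole run per step. */
  int countRuns() {
    int runs = 0;
    int i = 0;
    while (i < docs.length) {
      int end = runEnd(i);
      i += end - docs[i]; // skip every doc in this run at once
      runs++;
    }
    return runs;
  }
}
```

A consumer that knows where a run ends can handle the whole run in one step, which is the property a cached set would need to expose efficiently for the approach discussed here to pay off.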

I like the idea of moving away from those fixed bit sets in caching, as they are a source of humongous allocations, which moving to RoaringBitSet would avoid. How can we test the performance here?

I think we should be able to adjust luceneutil to use a query cache which should give us an idea of performance changes?

Introduce intoCacheAndCount method to BulkScorer to allow cache speci…

49494ee

…alizations

iverase added this to the 10.5.0 milestone May 18, 2026

romseygeek reviewed May 18, 2026


github-actions Bot added the module:core/search label May 18, 2026

iverase wants to merge 1 commit into apache:main from iverase:RangeDocIdSet
