Fix undercounting of RAM used by vectors buffered in in-memory segments by iprithv · Pull Request #15982 · apache/lucene

iprithv · 2026-04-24T21:01:19Z

Description

Vector RAM accounting in ramBytesUsed() had three bugs causing IndexWriter to undercount memory usage for buffered vectors, leading to delayed flush decisions and higher than expected memory consumption.

Bugs Fixed

Fixes #15901

1. BufferingKnnVectorsWriter hardcoded Float.BYTES for all encodings

Byte vectors (VectorEncoding.BYTE) were reported as 4x their actual size because ramBytesUsed() always multiplied by Float.BYTES (4) instead of Byte.BYTES (1). This is technically an overcount for byte vectors, so it errs in the opposite direction: it masks the undercounting elsewhere and produces incorrect flush thresholds.
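To make the size difference concrete, here is a minimal, self-contained sketch of the per-field payload estimate. `Encoding` and `vectorDataBytes` are hypothetical stand-ins rather than Lucene classes; the enum only mirrors the 1-byte and 4-byte element sizes of the BYTE and FLOAT32 encodings, and object headers and references are ignored.

```java
// Illustrative only: Encoding and vectorDataBytes are hypothetical names,
// not Lucene API. Per-object overhead is deliberately ignored.
final class VectorBytesSketch {
  enum Encoding {
    BYTE(Byte.BYTES),
    FLOAT32(Float.BYTES);

    final int byteSize;

    Encoding(int byteSize) {
      this.byteSize = byteSize;
    }
  }

  /** Payload bytes for numVectors vectors of the given dimension. */
  static long vectorDataBytes(int numVectors, int dim, Encoding encoding) {
    return numVectors * (long) dim * encoding.byteSize;
  }

  public static void main(String[] args) {
    int numVectors = 1_000; // assumed example values
    int dim = 768;
    // Old behaviour for a BYTE field: always multiplied by Float.BYTES.
    long reportedBefore = numVectors * (long) dim * Float.BYTES;
    long actual = vectorDataBytes(numVectors, dim, Encoding.BYTE);
    System.out.println("reported before: " + reportedBefore + " bytes");
    System.out.println("actual payload:  " + actual + " bytes");
  }
}
```

With these assumed numbers the old formula reports four times the payload a byte field actually buffers.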

2. Quantized writers never counted rawVectorDelegate RAM

Lucene104ScalarQuantizedVectorsWriter, Lucene99ScalarQuantizedVectorsWriter, and Lucene102BinaryQuantizedVectorsWriter all wrap a rawVectorDelegate (Lucene99FlatVectorsWriter). For FLOAT32 fields, the delegate's field-level data was counted indirectly through FieldWriter.flatFieldVectorsWriter.ramBytesUsed(). But for BYTE fields, which bypass the quantized FieldWriter entirely, the delegate was never queried, making byte vector RAM completely invisible (48 bytes reported for hundreds of KB of actual data).

Refactored all three writers to call rawVectorDelegate.ramBytesUsed() at the writer level for all flat vector data, and quantizationOverheadBytesUsed() for quantization-specific state (magnitudes, dimensionSums) to avoid double-counting.
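The shape of the refactored writer-level accounting can be sketched with a simplified model. Everything here is an assumption made for illustration: the class names, the `SHELL_BYTES` constant, and the idea that the only quantization state is one `float[dim]`. It is not the Lucene code, only the "delegate owns flat data, fields add overhead" pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, not Lucene code: all class names and the constant
// SHELL_BYTES are hypothetical and exist only to show the accounting shape.
final class DelegateAccountingSketch {
  static final long SHELL_BYTES = 48; // assumed fixed per-object overhead

  /** Stand-in for the raw flat-vector delegate: owns all buffered payload. */
  static final class FlatDelegate {
    private final List<long[]> fields = new ArrayList<>(); // {count, dim, elementBytes}

    void addField(long count, long dim, long elementBytes) {
      fields.add(new long[] {count, dim, elementBytes});
    }

    long ramBytesUsed() {
      long total = SHELL_BYTES;
      for (long[] f : fields) {
        total += f[0] * f[1] * f[2];
      }
      return total;
    }
  }

  /** Stand-in for a quantized per-field writer (only created for float fields). */
  static final class QuantFieldWriter {
    final int dim;

    QuantFieldWriter(int dim) {
      this.dim = dim;
    }

    /** Shell plus a float[dim] of dimension sums; no flat vector data. */
    long quantizationOverheadBytesUsed() {
      return SHELL_BYTES + (long) dim * Float.BYTES;
    }
  }

  static long writerRamBytesUsed(FlatDelegate delegate, List<QuantFieldWriter> fields) {
    long total = SHELL_BYTES;
    total += delegate.ramBytesUsed(); // all flat data, byte and float fields alike
    for (QuantFieldWriter f : fields) {
      total += f.quantizationOverheadBytesUsed(); // quantization state only
    }
    return total;
  }

  public static void main(String[] args) {
    FlatDelegate delegate = new FlatDelegate();
    delegate.addField(1_000, 128, Float.BYTES); // float field, has a QuantFieldWriter
    delegate.addField(1_000, 128, Byte.BYTES);  // byte field, no QuantFieldWriter
    List<QuantFieldWriter> quantFields = List.of(new QuantFieldWriter(128));
    System.out.println(writerRamBytesUsed(delegate, quantFields) + " bytes");
  }
}
```

In this model the byte field shows up in the total even though it has no per-field quantized writer, because the delegate is queried directly.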

3. dimensionSums array not counted

The float[dimension] array used for centroid calculation during flush was not included in ramBytesUsed() for Lucene104ScalarQuantizedVectorsWriter and Lucene102BinaryQuantizedVectorsWriter.
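As a rough idea of how much a missing float[dimension] array amounts to, the sketch below estimates its footprint. The header size and alignment are assumptions typical of a 64-bit JVM with compressed object pointers, and the helper names are hypothetical; real code would rely on the JVM-aware estimator rather than hard-coded constants.

```java
// Rough estimate only: ARRAY_HEADER_BYTES and OBJECT_ALIGNMENT are assumed
// values typical of a 64-bit JVM with compressed oops, not measured ones.
final class DimensionSumsSketch {
  static final long ARRAY_HEADER_BYTES = 16;
  static final long OBJECT_ALIGNMENT = 8;

  /** Rounds a size up to the next multiple of the assumed object alignment. */
  static long alignUp(long bytes) {
    return (bytes + OBJECT_ALIGNMENT - 1) / OBJECT_ALIGNMENT * OBJECT_ALIGNMENT;
  }

  /** Estimated footprint of a float[length]: header plus 4 bytes per element. */
  static long floatArrayBytes(int length) {
    return alignUp(ARRAY_HEADER_BYTES + (long) length * Float.BYTES);
  }

  public static void main(String[] args) {
    for (int dim : new int[] {3, 128, 1024}) {
      System.out.println("float[" + dim + "] ~ " + floatArrayBytes(dim) + " bytes");
    }
  }
}
```

The amount per field is small next to the vector payload, but it scales with dimension and with the number of quantized fields being buffered.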

iprithv · 2026-04-28T15:15:26Z

@mikemccand could you please take a look at this when you get a chance? Thanks!

shubhamvishu

Thanks for taking this up! I left a few comments.

shubhamvishu · 2026-05-01T15:37:09Z

+
    @Override
    public long ramBytesUsed() {
      long size = SHALLOW_SIZE;


Should this be removed like in Lucene102BinaryQuantizedVectorsWriter so it's not double counted in both #ramBytesUsed and quantizationOverheadBytesUsed?

I've updated this to match the pattern in Lucene102 and Lucene104.

There's no actual double counting, though: the writer-level ramBytesUsed() never calls field.ramBytesUsed(). It only calls field.quantizationOverheadBytesUsed(). The FieldWriter.ramBytesUsed() method exists purely to satisfy the Accountable interface for standalone introspection (e.g. tests, debugging), not to feed the writer's own accounting. In Lucene99's case, quantizationOverheadBytesUsed() and the old SHALLOW_SIZE return the same value anyway (Lucene99's FieldWriter has no extra quant state like magnitudes or dimensionSums), so it was functionally identical, just inconsistent.

Thanks @shubhamvishu!

The FieldWriter.ramBytesUsed() method exists purely to satisfy the Accountable interface for standalone introspection (e.g. tests, debugging), not to feed the writer's own accounting.

Ahh, this makes sense and gives me clarity now. Thanks!

So the FieldWriter no longer contributes flat vector data to the overall accounting, even though it still reports that data when asked directly (a call the main accounting path does not make). Could you add a comment above flatFieldVectorsWriter.ramBytesUsed() in FieldWriter#ramBytesUsed noting that it plays no part in the overall accounting and exists solely to honor the API contract, so others aren't confused?

shubhamvishu · 2026-05-01T15:53:01Z

    public long ramBytesUsed() {
-      long size = SHALLOW_SIZE;
+      long size = quantizationOverheadBytesUsed();
      size += flatFieldVectorsWriter.ramBytesUsed();


So the raw delegate above would now be responsible for accounting for vector data for both floats and bytes, and hence we switched to calling only the overhead part here? But then won't we double count floats here, with both flatFieldVectorsWriter.ramBytesUsed and the newly added rawVectorDelegate.ramBytesUsed?

Yes, rawVectorDelegate is now the single source of truth for all flat vector data (both byte and float32).

No double counting happens: FieldWriter.flatFieldVectorsWriter is the same Java object that rawVectorDelegate holds internally as the per-field writer. It's what this.rawVectorDelegate.addField(fieldInfo) returns and what is then passed into new FieldWriter(fieldInfo, rawVectorDelegate). So rawVectorDelegate.ramBytesUsed() already accounts for those float vectors.

The writer level loop then calls field.quantizationOverheadBytesUsed(), which only counts the FieldWriter shell + magnitudes + dimensionSums, NOT flatFieldVectorsWriter. FieldWriter.ramBytesUsed() (which does include flatFieldVectorsWriter.ramBytesUsed()) is never called from the writer level accounting. It's there solely for the Accountable interface. So each byte of flat float data is counted exactly once through rawVectorDelegate.
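The ownership argument can be checked with a small model. All names below are hypothetical simplifications of the classes discussed in this thread, and `SHELL_BYTES` is an assumed constant; the point is only that the delegate and the FieldWriter hold the same per-field object, so summing `field.ramBytesUsed()` at the writer level would count that object twice.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model with hypothetical names; it only mirrors the ownership
// described above and is not the Lucene implementation.
final class SharedFieldWriterSketch {
  static final long SHELL_BYTES = 48; // assumed fixed per-object overhead

  static final class FlatFieldWriter {
    long payloadBytes;

    long ramBytesUsed() {
      return SHELL_BYTES + payloadBytes;
    }
  }

  static final class RawDelegate {
    final List<FlatFieldWriter> fields = new ArrayList<>();

    FlatFieldWriter addField() {
      FlatFieldWriter w = new FlatFieldWriter();
      fields.add(w); // the delegate keeps the same instance it hands out
      return w;
    }

    long ramBytesUsed() {
      long total = SHELL_BYTES;
      for (FlatFieldWriter w : fields) {
        total += w.ramBytesUsed();
      }
      return total;
    }
  }

  static final class FieldWriter {
    final FlatFieldWriter flat;

    FieldWriter(FlatFieldWriter flat) {
      this.flat = flat;
    }

    long quantizationOverheadBytesUsed() {
      return SHELL_BYTES;
    }

    // Complete view for standalone inspection; not used by the writer total.
    long ramBytesUsed() {
      return quantizationOverheadBytesUsed() + flat.ramBytesUsed();
    }
  }

  public static void main(String[] args) {
    RawDelegate delegate = new RawDelegate();
    FieldWriter field = new FieldWriter(delegate.addField());
    field.flat.payloadBytes = 4_096;

    long countedOnce = delegate.ramBytesUsed() + field.quantizationOverheadBytesUsed();
    long doubleCounted = delegate.ramBytesUsed() + field.ramBytesUsed();
    System.out.println("same object: " + (field.flat == delegate.fields.get(0)));
    System.out.println("excess if field.ramBytesUsed() were summed: "
        + (doubleCounted - countedOnce));
  }
}
```

In this model the excess equals exactly the shared flat writer's own size, which is why the writer-level loop sums only the overhead method.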

Thanks @shubhamvishu!

shubhamvishu · 2026-05-01T16:12:03Z

Would it be better if, instead of splitting into (normal + quantization overhead), we kept everything specific to FieldWriter in its #ramBytesUsed and everything else in Lucene[XYZ]ScalarQuantizedVectorsWriter#ramBytesUsed? I may well be confused by this accounting code (let me know if that's the case), but it would be good to keep the accounting local. Right now we don't use FieldWriter for quantized fields, as you mentioned, but we do call quantizationOverheadBytesUsed in both places. Could we simplify by keeping the accounting local?

mikemccand

Thank you @iprithv -- I haven't had time to review more deeply -- I left a small question about the byte[] vector input case.

mikemccand · 2026-05-01T15:28:36Z

                  (RamUsageEstimator.NUM_BYTES_OBJECT_REF
                      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER)
-          + vectors.size() * (long) dim * Float.BYTES;
+          + vectors.size() * (long) dim * fieldInfo.getVectorEncoding().byteSize;


Whoa, so this means that if a Lucene user was indexing vectors coming in as byte[] (e.g. they pre-quantize outside of Lucene), we were incorrectly counting them as 4X larger RAM usage, and IW would flush way too early?

Yes, exactly. Before the fix, ramBytesUsed() always multiplied by Float.BYTES (4) regardless of encoding, so a byte[] vector field was reported as 4x its actual memory cost, causing IndexWriter to flush up to 4x too early for byte-encoded vector fields. After this goes in, it switches to fieldInfo.getVectorEncoding().byteSize, which is 1 for BYTE and 4 for FLOAT32, giving the correct cost in both cases. Thanks @mikemccand!
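A back-of-the-envelope calculation shows what "up to 4x too early" means for the number of buffered vectors. The 16 MB buffer and 768 dimensions are assumed example values, `vectorsUntilFlush` is a hypothetical helper, and every per-document cost other than the vector payload is ignored, so actual flush points will differ.

```java
// Back-of-the-envelope arithmetic; ignores every per-document cost other than
// the vector payload, so real flush points will differ.
final class FlushEstimateSketch {
  /** Number of vectors that fit before the accounted bytes reach the buffer limit. */
  static long vectorsUntilFlush(long bufferBytes, int dim, int accountedBytesPerElement) {
    return bufferBytes / ((long) dim * accountedBytesPerElement);
  }

  public static void main(String[] args) {
    long bufferBytes = 16L * 1024 * 1024; // assumed 16 MB RAM buffer
    int dim = 768;                        // assumed dimension
    long before = vectorsUntilFlush(bufferBytes, dim, Float.BYTES);
    long after = vectorsUntilFlush(bufferBytes, dim, Byte.BYTES);
    System.out.println("byte vectors buffered before flush, old accounting: " + before);
    System.out.println("byte vectors buffered before flush, new accounting: " + after);
  }
}
```

Under these assumptions the old accounting reaches the buffer limit after roughly a quarter as many byte vectors as the corrected accounting.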

@mikemccand just wanted to touch base with you on this, in case it got buried. Thanks!

iprithv · 2026-05-06T21:25:00Z

Would it be better if, instead of splitting into (normal + quantization overhead), we kept everything specific to FieldWriter in its #ramBytesUsed and everything else in Lucene[XYZ]ScalarQuantizedVectorsWriter#ramBytesUsed? I may well be confused by this accounting code (let me know if that's the case), but it would be good to keep the accounting local. Right now we don't use FieldWriter for quantized fields, as you mentioned, but we do call quantizationOverheadBytesUsed in both places. Could we simplify by keeping the accounting local?

@shubhamvishu I think the current design is the right one, and there is no correct alternative for this specific structure.

If we made FieldWriter.ramBytesUsed() return only quantizationOverheadBytesUsed() and called field.ramBytesUsed() at the writer level instead of quantizationOverheadBytesUsed():

Byte vector fields would still need separate handling: they have no FieldWriter at all, so rawVectorDelegate.ramBytesUsed() would still be required.
It would break the Accountable contract: FieldWriter.ramBytesUsed() would return an incomplete number when queried in isolation.
It would gain nothing: we would still need two calls at the writer level, rawVectorDelegate for byte fields plus field.ramBytesUsed() for float32 overhead.

I think the quantizationOverheadBytesUsed() method is the right abstraction because the delegate already "owns" the flat data layer and the writer should only know about the extra quant state on top. Everything is accounted for exactly once.

Thanks @shubhamvishu!

shubhamvishu · 2026-05-18T17:51:27Z

@iprithv Thanks for iterating here; this looks ready. I think I now understand my confusion around double accounting (which, apparently, we are not doing). I left a small comment above asking for that to be noted in an inline code comment.

Could you also run luceneutil benchmarks with your PR to see how this impacts vector indexing, merging, etc.?

iprithv · 2026-05-20T01:26:21Z

@iprithv Thanks for iterating here; this looks ready. I think I now understand my confusion around double accounting (which, apparently, we are not doing). I left a small comment above asking for that to be noted in an inline code comment.

Could you also run luceneutil benchmarks with your PR to see how this impacts vector indexing, merging, etc.?

@shubhamvishu sure, done. added the comment. Thanks!

this change is just fixing ram accounting, no changes to actual indexing or merge logic. for float vectors, nothing changes. for byte vectors, we were overcounting memory before (around 4x), so now it just reports the correct usage. this mainly helps avoid early flushes from IndexWriter. so indexing/merging performance shouldn’t really change.

I still ran KnnGraphTester with 50k cohere vectors (1024d, 8 threads, hnsw):
index time
main → 5.31 sec
this PR → 5.09 sec

no regression. both runs behave the same (same segments, same merges). as expected since this is only a ram accounting fix. Thanks!

github-actions Bot added module:core/codecs module:test-framework labels Apr 24, 2026

github-actions Bot added this to the 11.0.0 milestone Apr 24, 2026

Fix undercounting of RAM used by vectors buffered in in-memory segmen…

918fa44

…ts (apache#15901)

iprithv force-pushed the fix/vector-ram-accounting-undercount branch from f97d7cf to 918fa44 Compare April 24, 2026 21:11

Merge branch 'main' into fix/vector-ram-accounting-undercount

d87ecfa

shubhamvishu reviewed May 1, 2026

mikemccand reviewed May 6, 2026

Merge branch 'apache:main' into fix/vector-ram-accounting-undercount

f05f0db

review changes

a7d64b5

iprithv requested review from mikemccand and shubhamvishu May 6, 2026 21:29

Merge branch 'main' into fix/vector-ram-accounting-undercount

fd283d3

review changes

bdc0a8c


3 participants
