
Fix undercounting of RAM used by vectors buffered in in-memory segments #15982

Open
iprithv wants to merge 6 commits into apache:main from iprithv:fix/vector-ram-accounting-undercount

Contributor

@iprithv iprithv commented Apr 24, 2026

Description

Vector RAM accounting in ramBytesUsed() had three bugs causing IndexWriter to undercount memory usage for buffered vectors, leading to delayed flush decisions and higher than expected memory consumption.

Bugs Fixed

Fixes #15901

1. BufferingKnnVectorsWriter hardcoded Float.BYTES for all encodings

Byte vectors (VectorEncoding.BYTE) were reported as 4x their actual size because ramBytesUsed() always multiplied by Float.BYTES (4) instead of Byte.BYTES (1). This is technically an overcount for byte vectors, so the error runs in the opposite direction: it masks the undercounting elsewhere and produces incorrect flush thresholds.
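
A minimal sketch of the sizing error described above. The `Encoding` enum is a hypothetical stand-in for Lucene's `VectorEncoding` and its `byteSize` field, the method names are illustrative, and object and array header overhead is ignored, so this is not the actual BufferingKnnVectorsWriter code.

```java
final class VectorSizeSketch {
  /** Hypothetical stand-in for VectorEncoding: bytes used per dimension. */
  enum Encoding {
    BYTE(1),
    FLOAT32(4);

    final int byteSize;

    Encoding(int byteSize) {
      this.byteSize = byteSize;
    }
  }

  /** Old behavior as described in the PR: every dimension is assumed to take Float.BYTES. */
  static long oldEstimate(int numVectors, int dim) {
    return numVectors * (long) dim * Float.BYTES;
  }

  /** Fixed behavior: scale by the encoding's real per-dimension size. */
  static long newEstimate(int numVectors, int dim, Encoding encoding) {
    return numVectors * (long) dim * encoding.byteSize;
  }

  public static void main(String[] args) {
    int numVectors = 1000;
    int dim = 768;
    System.out.println("old estimate (any encoding): " + oldEstimate(numVectors, dim));
    System.out.println("new estimate (BYTE):         " + newEstimate(numVectors, dim, Encoding.BYTE));
    System.out.println("new estimate (FLOAT32):      " + newEstimate(numVectors, dim, Encoding.FLOAT32));
  }
}
```

For FLOAT32 the two estimates agree, which is why only byte-encoded fields were affected by this particular bug.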

2. Quantized writers never counted rawVectorDelegate RAM

Lucene104ScalarQuantizedVectorsWriter, Lucene99ScalarQuantizedVectorsWriter, and Lucene102BinaryQuantizedVectorsWriter all wrap a rawVectorDelegate (Lucene99FlatVectorsWriter). For FLOAT32 fields, the delegate's field-level data was counted indirectly through FieldWriter.flatFieldVectorsWriter.ramBytesUsed(). But for BYTE fields, which bypass the quantized FieldWriter entirely, the delegate was never queried, making byte vector RAM completely invisible (48 bytes reported for hundreds of KB of actual data).

Refactored all three writers to call rawVectorDelegate.ramBytesUsed() at the writer level for all flat vector data, and quantizationOverheadBytesUsed() for quantization-specific state (magnitudes, dimensionSums) to avoid double-counting.

3. dimensionSums array not counted

The float[dimension] array used for centroid calculation during flush was not included in ramBytesUsed() for Lucene104ScalarQuantizedVectorsWriter and Lucene102BinaryQuantizedVectorsWriter.
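
For a sense of scale, here is a rough estimate of what one uncounted float[dimension] array costs. The 16-byte array header and 8-byte alignment are assumptions typical of a 64-bit JVM with compressed object pointers; Lucene derives such values from RamUsageEstimator, which this self-contained sketch does not use.

```java
final class DimensionSumsSketch {
  // Assumption: 16-byte array header and 8-byte object alignment (JVM dependent).
  static final long ASSUMED_ARRAY_HEADER_BYTES = 16;
  static final long ASSUMED_ALIGNMENT_BYTES = 8;

  /** Rounds a size up to the next multiple of the assumed object alignment. */
  static long alignUp(long bytes) {
    long remainder = bytes % ASSUMED_ALIGNMENT_BYTES;
    return remainder == 0 ? bytes : bytes + (ASSUMED_ALIGNMENT_BYTES - remainder);
  }

  /** Approximate heap footprint of a float[length] under the assumptions above. */
  static long floatArrayBytes(int length) {
    return alignUp(ASSUMED_ARRAY_HEADER_BYTES + (long) length * Float.BYTES);
  }

  public static void main(String[] args) {
    System.out.println("float[1024] ~ " + floatArrayBytes(1024) + " bytes");
  }
}
```

At 1024 dimensions this comes to roughly 4 KB per field: small next to the buffered vectors, but still part of the writer's real footprint.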

@iprithv iprithv force-pushed the fix/vector-ram-accounting-undercount branch from f97d7cf to 918fa44 Compare April 24, 2026 21:11
Contributor Author

iprithv commented Apr 28, 2026

@mikemccand could you please take a look at this when you get a chance? Thanks!

Contributor

@shubhamvishu shubhamvishu left a comment

Thanks for taking this up! I left a few comments.


@Override
public long ramBytesUsed() {
long size = SHALLOW_SIZE;
Contributor

Should this be removed, like in Lucene102BinaryQuantizedVectorsWriter, so it's not double counted in both #ramBytesUsed and quantizationOverheadBytesUsed?

Contributor Author

I've updated it to match the pattern in Lucene102 and Lucene104.

There's no actual double counting though: the writer-level ramBytesUsed() never calls field.ramBytesUsed(). It only calls field.quantizationOverheadBytesUsed(). The FieldWriter.ramBytesUsed() method exists purely to satisfy the Accountable interface for standalone introspection (e.g. tests, debugging), not to feed the writer's own accounting. In Lucene99's case, quantizationOverheadBytesUsed() and the old SHALLOW_SIZE return the same value anyway (Lucene99's FieldWriter has no extra quantization state like magnitudes or dimensionSums), so it was functionally identical, just inconsistent.

Thanks @shubhamvishu!

Contributor

> The FieldWriter.ramBytesUsed() method exists purely to satisfy the Accountable interface for standalone introspection (e.g. tests, debugging), not to feed the writer's own accounting.

Ahh, this makes sense and gives me clarity now. Thanks!

So the FieldWriter does not track any flat vector data anymore, even though it does if asked specifically (which we should call on the main code path for accounting). Could you add a comment above flatFieldVectorsWriter.ramBytesUsed() in FieldWriter#ramBytesUsed mentioning that it is a no-op for overall accounting (and exists solely to maintain API correctness/contract), so others are not confused?

  public long ramBytesUsed() {
-   long size = SHALLOW_SIZE;
+   long size = quantizationOverheadBytesUsed();
    size += flatFieldVectorsWriter.ramBytesUsed();
Contributor

So the raw delegate above would now be responsible for accounting for vector data for both floats and bytes, and hence we switched to calling only the overhead part here? But then will we not double count it for floats here, with flatFieldVectorsWriter.ramBytesUsed and also rawVectorDelegate.ramBytesUsed (the newly added one)?

Contributor Author

Yes, rawVectorDelegate is now the single source of truth for all flat vector data (both byte and float32).

No double counting happens: FieldWriter.flatFieldVectorsWriter is the same Java object that rawVectorDelegate holds internally as the per-field writer. It is what this.rawVectorDelegate.addField(fieldInfo) returns, and it is then passed into new FieldWriter(fieldInfo, rawVectorDelegate). So rawVectorDelegate.ramBytesUsed() already accounts for those float vectors.

The writer level loop then calls field.quantizationOverheadBytesUsed(), which only counts the FieldWriter shell + magnitudes + dimensionSums, NOT flatFieldVectorsWriter. FieldWriter.ramBytesUsed() (which does include flatFieldVectorsWriter.ramBytesUsed()) is never called from the writer level accounting. It's there solely for the Accountable interface. So each byte of flat float data is counted exactly once through rawVectorDelegate.
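
The single-counting argument above can be illustrated with a toy model. All classes below are simplified, hypothetical stand-ins that only borrow names from the discussion (they are not the Lucene classes), and the byte counts are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class QuantizedAccountingSketch {
  /** Toy per-field flat writer: holds the raw vector bytes for one field. */
  static final class FlatFieldWriter {
    long vectorBytes;

    long ramBytesUsed() {
      return vectorBytes;
    }
  }

  /** Toy raw delegate: owns every per-field flat writer, for byte and float32 fields alike. */
  static final class RawDelegate {
    final List<FlatFieldWriter> fields = new ArrayList<>();

    FlatFieldWriter addField() {
      FlatFieldWriter writer = new FlatFieldWriter();
      fields.add(writer);
      return writer;
    }

    long ramBytesUsed() {
      long size = 0;
      for (FlatFieldWriter field : fields) {
        size += field.ramBytesUsed();
      }
      return size;
    }
  }

  /** Toy quantized field writer: shares the delegate's flat writer and adds quantization state. */
  static final class FieldWriter {
    final FlatFieldWriter flat;
    final long overheadBytes;

    FieldWriter(FlatFieldWriter flat, long overheadBytes) {
      this.flat = flat;
      this.overheadBytes = overheadBytes;
    }

    long quantizationOverheadBytesUsed() {
      return overheadBytes;
    }

    /** Standalone view (overhead plus flat data); not used by the writer-level sum. */
    long ramBytesUsed() {
      return overheadBytes + flat.ramBytesUsed();
    }
  }

  final RawDelegate rawDelegate = new RawDelegate();
  final List<FieldWriter> quantizedFields = new ArrayList<>();

  /** Writer-level accounting: flat data once via the delegate, plus per-field overhead. */
  long ramBytesUsed() {
    long size = rawDelegate.ramBytesUsed();
    for (FieldWriter field : quantizedFields) {
      size += field.quantizationOverheadBytesUsed();
    }
    return size;
  }

  /** Builds one quantized float32 field and one byte field that has no FieldWriter. */
  static QuantizedAccountingSketch example() {
    QuantizedAccountingSketch writer = new QuantizedAccountingSketch();
    FlatFieldWriter floatFlat = writer.rawDelegate.addField();
    floatFlat.vectorBytes = 4_000;
    writer.quantizedFields.add(new FieldWriter(floatFlat, 100));
    FlatFieldWriter byteFlat = writer.rawDelegate.addField();
    byteFlat.vectorBytes = 1_000; // visible only through the delegate
    return writer;
  }
}
```

In this model the writer reports 4,000 + 1,000 bytes of flat data plus 100 bytes of overhead, while the float field's standalone ramBytesUsed() still returns a complete 4,100 when queried in isolation.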

Thanks @shubhamvishu!

Contributor

shubhamvishu commented May 1, 2026

Would it be better if, instead of (normal + quantization overhead), we kept things specific to FieldWriter in its #ramBytesUsed and everything else in Lucene[XYZ]ScalarQuantizedVectorsWriter#ramBytesUsed? Very likely I am confused by this part of the accounting code (let me know if that's the case), but it would be good to keep the accounting local. Right now we don't use FieldWriter for quantized fields, as you mentioned, but we do call quantizationOverheadBytesUsed in both places. Could we simplify by keeping the accounting local?

Member

@mikemccand mikemccand left a comment

Thank you @iprithv -- I haven't had time to review more deeply -- I left a small question about the byte[] vector input case.

        (RamUsageEstimator.NUM_BYTES_OBJECT_REF
            + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER)
-       + vectors.size() * (long) dim * Float.BYTES;
+       + vectors.size() * (long) dim * fieldInfo.getVectorEncoding().byteSize;
Member

Whoa, so this means that if a Lucene user was indexing vectors coming in as byte[] (like they pre-quantize, outside of Lucene), we were incorrectly counting them as 4X larger RAM usage, and IW would flush way too early?

Contributor Author

@iprithv iprithv May 6, 2026

Yes, exactly. Before the fix, ramBytesUsed() always multiplied by Float.BYTES (4) regardless of encoding, so a byte[] vector field was reported as 4x its actual memory cost, causing IndexWriter to flush up to 4x too early for byte-encoded vector fields. After this goes in, it switches to fieldInfo.getVectorEncoding().byteSize, which is 1 for BYTE and 4 for FLOAT32, giving the correct cost in both cases. Thanks @mikemccand!
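
A back-of-the-envelope sketch of the "up to 4x too early" effect. It assumes, purely for illustration, that the vector payload is the only thing counted against a fixed RAM budget; the real IndexWriter tracks much more state and its flush policy is more involved. The class and method names are hypothetical.

```java
final class FlushThresholdSketch {
  /** Number of whole vectors buffered before reported usage reaches the RAM budget. */
  static long vectorsBeforeFlush(long ramBudgetBytes, int dim, int reportedBytesPerDim) {
    return ramBudgetBytes / ((long) dim * reportedBytesPerDim);
  }

  public static void main(String[] args) {
    long budget = 16L * 1024 * 1024; // hypothetical 16 MiB buffer
    int dim = 768;
    // Byte vectors reported at Float.BYTES per dimension (old) vs. 1 byte (fixed).
    System.out.println("old: " + vectorsBeforeFlush(budget, dim, Float.BYTES));
    System.out.println("new: " + vectorsBeforeFlush(budget, dim, Byte.BYTES));
  }
}
```

Under these assumptions, a 768-dimension byte field would reach the budget after 5,461 vectors with the old estimate and after 21,845 with the corrected one.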

Contributor Author

@mikemccand just wanted to touch base with you on this, in case it got buried. Thanks!

Contributor Author

iprithv commented May 6, 2026

> Would it be better if, instead of (normal + quantization overhead), we kept things specific to FieldWriter in its #ramBytesUsed and everything else in Lucene[XYZ]ScalarQuantizedVectorsWriter#ramBytesUsed? Very likely I am confused by this part of the accounting code (let me know if that's the case), but it would be good to keep the accounting local. Right now we don't use FieldWriter for quantized fields, as you mentioned, but we do call quantizationOverheadBytesUsed in both places. Could we simplify by keeping the accounting local?

@shubhamvishu I think the current design is the right one, and there is no correct alternative for this specific structure.

If we made FieldWriter.ramBytesUsed() return only quantizationOverheadBytesUsed() and called field.ramBytesUsed() at the writer level instead of quantizationOverheadBytesUsed():

  1. Byte vector fields still need separate handling: they have no FieldWriter at all, so rawVectorDelegate.ramBytesUsed() would still be required.
  2. It breaks the Accountable contract: FieldWriter.ramBytesUsed() would return an incomplete number when queried in isolation.
  3. It gains nothing: we would still need two calls at the writer level, rawVectorDelegate for byte fields plus field.ramBytesUsed() for float32 overhead.

I think the quantizationOverheadBytesUsed() method is the right abstraction because the delegate already "owns" the flat data layer and the writer should only know about the extra quant state on top. Everything is accounted for exactly once.

Thanks @shubhamvishu!

@iprithv iprithv requested review from mikemccand and shubhamvishu May 6, 2026 21:29
@shubhamvishu
Contributor

@iprithv Thanks for iterating here, this looks ready. I think I now understand my confusion around double accounting (which we are apparently not doing). I added a small comment above about mentioning that in an inline code comment.

Could you also run luceneutil benchmarks with your PR to see how this impacts vector indexing, merging, etc.?

Contributor Author

iprithv commented May 20, 2026

> @iprithv Thanks for iterating here, this looks ready. I think I now understand my confusion around double accounting (which we are apparently not doing). I added a small comment above about mentioning that in an inline code comment.
>
> Could you also run luceneutil benchmarks with your PR to see how this impacts vector indexing, merging, etc.?

@shubhamvishu sure, done. added the comment. Thanks!

this change is just fixing ram accounting, no changes to actual indexing or merge logic. for float vectors, nothing changes. for byte vectors, we were overcounting memory before (around 4x), so now it just reports the correct usage. this mainly helps avoid early flushes from IndexWriter. so indexing/merging performance shouldn’t really change.

I still ran KnnGraphTester with 50k cohere vectors (1024d, 8 threads, hnsw):
index time
main → 5.31 sec
this PR → 5.09 sec

no regression. both runs behave the same (same segments, same merges). as expected since this is only a ram accounting fix. Thanks!


Development

Successfully merging this pull request may close these issues.

Are we undercounting RAM used by vectors buffered in an in-memory segment?

3 participants