Fix undercounting of RAM used by vectors buffered in in-memory segments#15982
Fix undercounting of RAM used by vectors buffered in in-memory segments#15982iprithv wants to merge 6 commits into
Conversation
f97d7cf to
918fa44
Compare
|
@mikemccand could you please take a look at this when you get a chance? Thanks! |
shubhamvishu
left a comment
There was a problem hiding this comment.
Thanks for taking this up! I left a few comments.
|
|
||
| @Override | ||
| public long ramBytesUsed() { | ||
| long size = SHALLOW_SIZE; |
There was a problem hiding this comment.
Should this be removed like in Lucene102BinaryQuantizedVectorsWriter so its not double counted in both #ramByesUsed and quantizationOverheadBytesUsed?
There was a problem hiding this comment.
I've updated matching the pattern in Lucene102 and Lucene104.
There's no actual double counting though, the writer level ramBytesUsed() never calls field.ramBytesUsed(). It only calls field.quantizationOverheadBytesUsed(). The FieldWriter.ramBytesUsed() method exists purely to satisfy the Accountable interface for standalone introspection (e.g. tests, debugging), not to feed the writer's own accounting. In Lucene99's case, quantizationOverheadBytesUsed() and the old SHALLOW_SIZE return the same value anyway (Lucene99's FieldWriter has no extra quant state like magnitudes or dimensionSums), so it was functionally identical, just inconsistent.
Thanks @shubhamvishu!
There was a problem hiding this comment.
The FieldWriter.ramBytesUsed() method exists purely to satisfy the Accountable interface for standalone introspection (e.g. tests, debugging), not to feed the writer's own accounting.
Ahh this makes sense and gives me clarity now. Thanks !
So the FieldWriter does not track any flat vector data anymore even though it does if asked specifically(which we should call on the main code path for accounting). Could you add a comment above flatFieldVectorsWriter.ramBytesUsed() in FieldWriter#ramBytesUsed to mention this is a no-op in case of overall accounting(but solely exists for maintaining api correctness/contract) so others are also not confused?
| public long ramBytesUsed() { | ||
| long size = SHALLOW_SIZE; | ||
| long size = quantizationOverheadBytesUsed(); | ||
| size += flatFieldVectorsWriter.ramBytesUsed(); |
There was a problem hiding this comment.
So the raw delegate above would now be responsible to account for vector data for both float and bytes and hence we switched to call the overhead part in this? But then will we not double count it for floats her with flatFieldVectorsWriter.ramBytesUsed and also rawVectorDelegate.ramBytesUsed(the newly added one)?
There was a problem hiding this comment.
Yes, rawVectorDelegate is now the single source of truth for all flat vector data (both byte and float32).
No double counting happens, FieldWriter.flatFieldVectorsWriter is the same Java object that rawVectorDelegate holds internally as the per-field writer, it's what this.rawVectorDelegate.addField(fieldInfo) returns and then passes into new FieldWriter(fieldInfo, rawVectorDelegate). So rawVectorDelegate.ramBytesUsed() already accounts for those float vectors.
The writer level loop then calls field.quantizationOverheadBytesUsed(), which only counts the FieldWriter shell + magnitudes + dimensionSums, NOT flatFieldVectorsWriter. FieldWriter.ramBytesUsed() (which does include flatFieldVectorsWriter.ramBytesUsed()) is never called from the writer level accounting. It's there solely for the Accountable interface. So each byte of flat float data is counted exactly once through rawVectorDelegate.
Thanks @shubhamvishu!
|
Would it be better if instead of (normal + quantization overhead) we see it like keeping things specific to FieldWriter in its |
mikemccand
left a comment
There was a problem hiding this comment.
Thank you @iprithv -- I haven't had time to review more deeply -- I left a small question about the byte[] vector input case.
| (RamUsageEstimator.NUM_BYTES_OBJECT_REF | ||
| + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER) | ||
| + vectors.size() * (long) dim * Float.BYTES; | ||
| + vectors.size() * (long) dim * fieldInfo.getVectorEncoding().byteSize; |
There was a problem hiding this comment.
Whoa, so this means, if Lucene user was index vectors coming in as byte[] (like they pre-quantize, outside of Lucene), we were incorrectly counting them as 4X larger RAM usage, and IW would flush way too early?
There was a problem hiding this comment.
Yes, exactly. Before the fix, ramBytesUsed() always multiplied by Float.BYTES (4) regardless of encoding, so a byte[] vector field was reported as 4x its actual memory cost causing IndexWriter to flush up to 4x too early for byte encoded vector fields. After this goes in, it switches to fieldInfo.getVectorEncoding().byteSize, which is 1 for BYTE and 4 for FLOAT32, giving the correct cost in both cases. Thanks @mikemccand!
There was a problem hiding this comment.
@mikemccand just wanted to touch base with you on this, in case it got buried. Thanks!
@shubhamvishu I think the current design is the right one, and there is no correct alternative for this specific structure. If we made FieldWriter.ramBytesUsed() return only quantizationOverheadBytesUsed() and called field.ramBytesUsed() at the writer level instead of quantizationOverheadBytesUsed():
I think the quantizationOverheadBytesUsed() method is the right abstraction because the delegate already "owns" the flat data layer and the writer should only know about the extra quant state on top. Everything is accounted for exactly once. Thanks @shubhamvishu! |
|
@iprithv Thanks for iterating here and looks ready. I think I understand now what was my confusion around double accounting(which we are not doing apparently). I added a small comment above to mention that in a inline code comment. Could you also run luceneutil benchmarks with you PR to see how this impacts the vector indexing or merging etc? |
@shubhamvishu sure, done. added the comment. Thanks! this change is just fixing ram accounting, no changes to actual indexing or merge logic. for float vectors, nothing changes. for byte vectors, we were overcounting memory before (around 4x), so now it just reports the correct usage. this mainly helps avoid early flushes from IndexWriter. so indexing/merging performance shouldn’t really change. I still ran KnnGraphTester with 50k cohere vectors (1024d, 8 threads, hnsw): no regression. both runs behave the same (same segments, same merges). as expected since this is only a ram accounting fix. Thanks! |
Description
Vector RAM accounting in
ramBytesUsed()had three bugs causing IndexWriter to undercount memory usage for buffered vectors, leading to delayed flush decisions and higher than expected memory consumption.Bugs Fixed
Fixes #15901
1.
BufferingKnnVectorsWriterhardcodedFloat.BYTESfor all encodingsByte vectors (
VectorEncoding.BYTE) were reported as 4x their actual size becauseramBytesUsed()always multiplied byFloat.BYTES(4) instead ofByte.BYTES(1). This is technically an overcount for byte vectors, but it's wrong in the opposite direction, it masks the undercounting elsewhere and produces incorrect flush thresholds.2. Quantized writers never counted
rawVectorDelegateRAMLucene104ScalarQuantizedVectorsWriter,Lucene99ScalarQuantizedVectorsWriter, andLucene102BinaryQuantizedVectorsWriterall wrap arawVectorDelegate(Lucene99FlatVectorsWriter). For FLOAT32 fields, the delegate's field-level data was counted indirectly throughFieldWriter.flatFieldVectorsWriter.ramBytesUsed(). But for BYTE fields, which bypass the quantizedFieldWriterentirely, the delegate was never queried, making byte vector RAM completely invisible (48 bytes reported for hundreds of KB of actual data).Refactored all three writers to call
rawVectorDelegate.ramBytesUsed()at the writer level for all flat vector data, andquantizationOverheadBytesUsed()for quantization-specific state (magnitudes, dimensionSums) to avoid double-counting.3.
dimensionSumsarray not countedThe
float[dimension]array used for centroid calculation during flush was not included inramBytesUsed()forLucene104ScalarQuantizedVectorsWriterandLucene102BinaryQuantizedVectorsWriter.