Improve REST Support for lazy snapshot loading#16207
Improve REST Support for lazy snapshot loading#16207grantatspothero wants to merge 1 commit intoapache:mainfrom
Conversation
e0ba204 to
e606251
Compare
c9bc366 to
c756769
Compare
|
Hi @grantatspothero , |
|
Our problem was excessive memory usage due to caching TableMetadata on the client side. Storing a Example:
Note: this is "resident set size" not "total allocations" which tends to be significantly higher due to intermediate allocations of parsing JSON. For multi-tenant coordinator services (eg: query engines, cache services) this memory usage is a problem. The biggest memory hog is by far the snapshots array, but snapshotLog is the next biggest. Since iceberg already defers snapshots, it seemed reasonable to defer snapshotLog. |
It is becoming more common to have large numbers of snapshots in iceberg due to prevalence of streaming ingestion/low latency commits. See mailing list discussions: https://www.mail-archive.com/dev@iceberg.apache.org/msg12764.html This doesn't solve the full problem mentioned in that mailing list thread (writes still pay the full cost of writing snapshots/snapshotLog), but it does solve the problem for readers. And for query engine/caching usecases, reads >> writes so this could be beneficial. |
c756769 to
083fb92
Compare
Previously only lazily loaded snapshots
083fb92 to
3c6b025
Compare
|
Thanks you for the explanation, @grantatspothero ! I can take a look at the code, if the improvement is simple enough, I don't see why not to include. If it's messy or complicated, we might need some community support to get it through. |
Two different definitions of cache:
|
|
Thanks again for the explanation @grantatspothero ! In the meantime, I think logically the fix is 2 parts:
|
|
Big +1 to everything @gaborkaszab said in #16207 (comment) , that also is where I was at in my reply on the dev-list, but thank you @grantatspothero for the explanation of the use case. I'll take a look at this PR today |
Walking through the two points you mentioned above:
Summary: we need both the client side and server side change. Server side suppression of historical snapshots in REFS mode and client side lazy loading of historical snapshots when requested. Can you help me understand what your commit is trying to test above? It removes the server side change to suppress historical snapshots in REFs mode, and as mentioned above we need both parts. |
|
@grantatspothero I disagree with 2) though, or I misunderstand the motivation :) Let me explain: I might overthink this. In case you have full control on the REST catalog server implementation (i.e. you have your own proprietary implementation) then you're fine here. In case you don't and you connect to any REST catalogs out there, then you have no control on whether they send filtered or full snapshotLog in REFS mode. Sorry if I got this too long or complicated, hope it makes sense! |
|
Re: Polaris, good to know. Sounds like an example that this PR could be valuable in standardizing approach of handling snapshotLog. Re: Other REST catalog implementations, thank you for the explanation that makes sense. For arbitrary REST catalog implementations, there is no guarantee on the performance of using |
|
Just wanted to mention that I attempted essentially the same thing in #8173 a while ago but there were som concerns about increased complexity |
|
Hey @grantatspothero , Now, this PR might still makes sense, IMO, if we want to be foolproof on the client side and not break internal state of the TableMetadata if a catalog returns a reduced snapshot log for REFS mode for some reason (maybe misinterpreting the spec), and then we load the snapshots lazily (but not the log). But then, we kind of guard against some behavior of another projects that in fact breaks the contract of the REST spec. I'm somewhat hesitant to do that, but lean to have this PR anyway. I'm curious of others opinions here. Taking one step back, what I still don't entirely get is this: @grantatspothero you mentioned that you'd like to reduce memory pressure by not keeping entire snapshot log in REFS mode. Could you help me understand the full picture how you try to achieve that? This PR doesn't help you with that. |
Under this definition I agree this PR changes the behavior of REFS mode and that is not good. That said, the spec is ambiguous as written given different catalog implementations are already interpreting it differently. Taking a step back, what was the purpose of introducing REFS mode? I thought it was introduced as a performance optimization to allow clients to have a slim representation of TableMetadata for common operations while falling back to the full representation for less common operations. This PR seems to fit into that goal of REFS mode, but maybe REFS mode was introduced for a different reason?
Consider a multitenant coordinator service (ie: a query engine like trino). If the coordinator service does a read only query that doesn't access historical snapshots, then loading a slim version of table metadata in the client is enough to satisfy query planning entirely. The full table metadata must only be pulled into the client for read queries using time travel or write queries. Both these types of queries happen significantly less often than read only queries over the current snapshot. |
Thank you for the context, I missed that PR. Given it is a few years later and low latency commit into iceberg is becoming more prevalent it might be worth another round of feedback from @rdblue |
|
Thanks for the explanation @grantatspothero !
I get this, what I don't get is how this PR helps with achieving this. If a REST server sends a full snapshot log we still put that full log into the TableMetadata, this PR doesn't help you with reducing the memory pressure, then, right?
I think this is partially correct: this PR changes the behavior of REFS mode for the reference REST catalog we use for tests. |
Yes agreed, this PR is only helpful if REFS mode is defined per spec as "truncate both snapshots and snapshotLog to only snapshot entries referenced by refs". You are suggesting that is not the current definition of REFS mode per spec. I'm suggesting the spec is ambiguous as currently written and to meet the goals of REFS mode (slim metadata loading on clients) it might be useful to be more explicit in the spec. where "The snapshots" could refer to just snapshots array or snapshots array and snapshotLog. Right now it is ambiguous and some implementations have interpreted that differently. |
|
FYI, I opened a PR just to clarify the original intent of the REFS mode. cc @nastra Now with this PR, the question is if we want to be robust enough on the client-side to not to have inconsistent |
IMO the spec is pretty clear, because it mentions |
Previously this PR added support for lazy snapshot loading: https://github.com/apache/iceberg/pull/6850/changes
This PR improves the lazy loading by supporting lazy loading of
snapshotLog.For tables with high numbers of snapshots (eg: tables with low latency commits) this can result in significant memory savings.
Considerations:
setSnapshotsSupplier