Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -71,10 +71,21 @@
* </ul>
* <p>
* Streaming reads support carry-over removal, update detection, and net change
* computation. Net change collapses are kept in the state store keyed by row identity;
* row identities only touched in the latest observed commit are held back until either a
* later commit (with strictly greater `_commit_timestamp`) advances the global watermark
* past them, or the source terminates.
* computation. Two streaming-specific behaviors to be aware of:
* <ul>
* <li><b>Output is buffered until the watermark advances past the commit.</b>
* When a micro-batch ingests a commit, that commit's output rows are
* buffered in state and not emitted in the same batch. They are emitted
* by a later micro-batch -- whichever one advances the watermark past
* the commit's {@code _commit_timestamp}. The last commit's output is
* emitted when the source terminates.</li>
* <li><b>netChanges only merges changes that are buffered together.</b>
* When each row identity appears in at most one commit within any
* buffered window, the streaming output is the same as
* {@code computeUpdates}. Cross-commit merging only happens when
* several commits touch the same row before the earliest one's output
* has been released. For full-range collapse, use a batch read.</li>
* </ul>
* <p>
* <b>Pushdown contract.</b> When any post-processing pass applies (carry-over
* removal, update detection, or netChanges), Spark only pushes predicates
Expand Down