Fix cell duplication when YStore diverges from disk#376
Conversation
Fixes cell duplication and DuplicateCellId warnings that occur when YStore state diverges from the notebook file on disk, typically after disconnect/reconnect beyond document_cleanup_delay where the server
loads from YStore then detects out-of-sync with disk.
Root cause: YNotebook.set() would insert new cells when encountering retained cells in the wrong position, creating duplicates instead of reordering existing cells. This happens when:
- Server loads doc from YStore after room cleanup
- Compares with model["content"] (file on disk)
- Detects out-of-sync and calls set() to reload from file
- File has same cell IDs but different order than YStore
Solution: Use pycrdt Array.move() to reorder retained cells instead of recreating them. This:
- Prevents duplication when cell order differs between YStore and disk
- Preserves CRDT structure (observers, undo scope, granular updates)
- Fixes the root cause in the document model layer (not server layer)
- Handles already-corrupted state with duplicate IDs
- O(n) complexity for pure reordering of retained cells
Implementation:
- Deduplicate cell IDs when building old_ycells_by_id mapping
- Remove duplicate IDs during pruning phase
- Build id_to_index map for O(1) retained cell lookups
- Use Array.move(cur, index) to reorder retained cells
- Update id_to_index only for affected range after moves (not entire map)
- Rebuild id_to_index after inserts (rare, so O(n) rebuild acceptable)
- Use .get("id") instead of ["id"] for safer access
- Append new_cell (not old_cell snapshot) to new_cell_list
- Add defensive fallthrough if retained cell not found
Complexity:
- Pure reordering: O(n) with range updates
- With inserts: O(n) per insert for map rebuild
- Worst case: O(n²) if many inserts, but inserts are uncommon in
out-of-sync recovery (most cells are retained)
Tests:
- test_set_reorder_does_not_duplicate_cells
- test_set_removes_preexisting_duplicate_ids
- test_set_reorder_with_mixed_operations
- test_set_simple_adjacent_swap
Fixes [jupyter-server/jupyter-collaboration#526](jupyterlab/jupyter-collaboration#526)
Fixes [jupyterlab/jupyterlab#14031](jupyterlab/jupyterlab#14031)
for more information, see https://pre-commit.ci
Co-authored-by: David Brochart <david.brochart@gmail.com>
Co-authored-by: David Brochart <david.brochart@gmail.com>
for more information, see https://pre-commit.ci
Co-authored-by: David Brochart <david.brochart@gmail.com>
Co-authored-by: David Brochart <david.brochart@gmail.com>
|
Unfortunately I had to remove the |
…exity without move
|
refactored and simplified the change since we now just have O(n^2) complexity without being able to use |
|
On the other hand it might be time to release https://github.com/yjs/yjs/releases/tag/v14.0.0-0 @dmonad ? |
|
It is unlikely that this happens anytime soon. The work lives in another branch of the yjs repo. |
|
But except that it needs to be merged back into the Yjs repo, anything else preventing to get that feature in? It seems natural for Yjs to support it if Yrs supports it. |
|
Time. I currently focus on other work. The move feature conflicts with the new suggestion / attribution feature. I want to backport it eventually, but I'm short on time and resources as-is. |
| # Use delete+recreate instead of move() for yjs 13.x compatibility | ||
| # (yjs 13.x doesn't support the move operation that pycrdt generates) | ||
| del self._ycells[cur] | ||
| self._ycells.insert(index, self.create_ycell(new_cell)) |
There was a problem hiding this comment.
Instead of creating the ycell anew, can we insert the existing model to avoid side-effects (like cursor jumping/cell outputs re-rendering)?
There was a problem hiding this comment.
Unfortunately the tests are failing because pycrdt orphans objects in that case which is why I originally chose to use the YArray.move method but that was not an option either it turns out so we're stuck with this until we can get on yjs 14.x
assert nb.cell_number == 3
> ids = [cell["id"] for cell in nb.get()["cells"]]
^^^^^^^^^^
E KeyError: 'id'
tests/test_ynotebook.py:355: KeyError
=========================== short test summary info ============================
FAILED tests/test_pycrdt_yjs.py::test_ypy_yjs_0[0] - KeyError('source') [sing...
FAILED tests/test_pycrdt_yjs.py::test_ypy_yjs_1[1] - TimeoutError() [single e...
FAILED tests/test_ynotebook.py::test_set_reorder_does_not_duplicate_cells - K...
FAILED tests/test_ynotebook.py::test_set_reorder_with_mixed_operations - KeyE...
FAILED tests/test_ynotebook.py::test_set_simple_adjacent_swap - KeyError: 'id'
======================== 5 failed, 37 passed in 14.85s =========================
krassowski
left a comment
There was a problem hiding this comment.
Thank you @jordanhboxer!
It's a pity we cannot use move() but hopefully it will be possible in the future.
I tested it locally and intend to merge and publish in a patch release today. I will wait a few hours in case if @davidbrochart has any outstanding comments.
Co-authored-by: David Brochart <david.brochart@gmail.com>
of course glad I could help! yeah might be worth adding a todo somewhere to update this when the move to yjs 14.x happens |
davidbrochart
left a comment
There was a problem hiding this comment.
Thanks @jordanhboxer and @krassowski.
Fixes cell duplication and
DuplicateCellIdwarnings that occur whenYStorestate diverges from the notebook file on disk, typically after disconnect/reconnect beyonddocument_cleanup_delaywhere the server loads fromYStorethen detects out-of-sync with disk.Root cause:
YNotebook.set()would insert new cells when encountering retained cells in the wrong position, creating duplicates instead of reordering existing cells. This happens when:YStoreafter room cleanupmodel["content"](file on disk)set()to reload from fileYStoreSolution: Use pycrdt
Array.move()to reorder retained cells instead of recreating them. This:YStoreand diskImplementation:
old_ycells_by_idmappingid_to_indexmap for O(1) retained cell lookupsArray.move(cur, index)to reorder retained cellsid_to_indexonly for affected range after moves (not entire map)id_to_indexafter inserts (O(n) rebuild).get("id")instead of["id"]for safer accessnew_cell(notold_cellsnapshot) tonew_cell_listComplexity:
Tests:
test_set_reorder_does_not_duplicate_cellstest_set_removes_preexisting_duplicate_idstest_set_reorder_with_mixed_operationstest_set_simple_adjacent_swapFixes jupyter-server/jupyter-collaboration#526
Fixes jupyterlab/jupyterlab#14031