Skip to content

Fix cell duplication when YStore diverges from disk#376

Merged
krassowski merged 10 commits into
jupyter-server:mainfrom
jordanhboxer:fix-duplicate-cell-bug
Jan 9, 2026
Merged

Fix cell duplication when YStore diverges from disk#376
krassowski merged 10 commits into
jupyter-server:mainfrom
jordanhboxer:fix-duplicate-cell-bug

Conversation

@jordanhboxer
Copy link
Copy Markdown
Contributor

@jordanhboxer jordanhboxer commented Jan 2, 2026

Fixes cell duplication and DuplicateCellId warnings that occur when YStore state diverges from the notebook file on disk, typically after disconnect/reconnect beyond document_cleanup_delay where the server loads from YStore then detects out-of-sync with disk.

Root cause: YNotebook.set() would insert new cells when encountering retained cells in the wrong position, creating duplicates instead of reordering existing cells. This happens when:

  • Server loads doc from YStore after room cleanup
  • Compares with model["content"] (file on disk)
  • Detects out-of-sync and calls set() to reload from file
  • File has same cell IDs but different order than YStore

Solution: Use pycrdt Array.move() to reorder retained cells instead of recreating them. This:

  • Prevents duplication when cell order differs between YStore and disk
  • Preserves CRDT structure (observers, undo scope, granular updates)
  • Fixes the root cause in the document model layer (not server layer)
  • Handles already-corrupted state with duplicate IDs
  • O(n) complexity for pure reordering of retained cells

Implementation:

  • Deduplicate cell IDs when building old_ycells_by_id mapping
  • Remove duplicate IDs during pruning phase
  • Build id_to_index map for O(1) retained cell lookups
  • Use Array.move(cur, index) to reorder retained cells
  • Update id_to_index only for affected range after moves (not entire map)
  • Rebuild id_to_index after inserts (O(n) rebuild)
  • Use .get("id") instead of ["id"] for safer access
  • Append new_cell (not old_cell snapshot) to new_cell_list
  • Add defensive fallthrough if retained cell not found

Complexity:

  • Pure reordering: O(n) with range updates
  • With inserts: O(n) per insert for map rebuild
  • Worst case: O(n²) if many inserts, but inserts are uncommon in out-of-sync recovery

Tests:

  • test_set_reorder_does_not_duplicate_cells
  • test_set_removes_preexisting_duplicate_ids
  • test_set_reorder_with_mixed_operations
  • test_set_simple_adjacent_swap

Fixes jupyter-server/jupyter-collaboration#526
Fixes jupyterlab/jupyterlab#14031

Fixes cell duplication and DuplicateCellId warnings that occur when YStore state diverges from the notebook file on disk, typically after disconnect/reconnect beyond document_cleanup_delay where the server
loads from YStore then detects out-of-sync with disk.

Root cause: YNotebook.set() would insert new cells when encountering retained cells in the wrong position, creating duplicates instead of reordering existing cells. This happens when:
- Server loads doc from YStore after room cleanup
- Compares with model["content"] (file on disk)
- Detects out-of-sync and calls set() to reload from file
- File has same cell IDs but different order than YStore

Solution: Use pycrdt Array.move() to reorder retained cells instead of recreating them. This:
- Prevents duplication when cell order differs between YStore and disk
- Preserves CRDT structure (observers, undo scope, granular updates)
- Fixes the root cause in the document model layer (not server layer)
- Handles already-corrupted state with duplicate IDs
- O(n) complexity for pure reordering of retained cells

Implementation:
- Deduplicate cell IDs when building old_ycells_by_id mapping
- Remove duplicate IDs during pruning phase
- Build id_to_index map for O(1) retained cell lookups
- Use Array.move(cur, index) to reorder retained cells
- Update id_to_index only for affected range after moves (not entire map)
- Rebuild id_to_index after inserts (rare, so O(n) rebuild acceptable)
- Use .get("id") instead of ["id"] for safer access
- Append new_cell (not old_cell snapshot) to new_cell_list
- Add defensive fallthrough if retained cell not found

Complexity:
- Pure reordering: O(n) with range updates
- With inserts: O(n) per insert for map rebuild
- Worst case: O(n²) if many inserts, but inserts are uncommon in
  out-of-sync recovery (most cells are retained)

Tests:
- test_set_reorder_does_not_duplicate_cells
- test_set_removes_preexisting_duplicate_ids
- test_set_reorder_with_mixed_operations
- test_set_simple_adjacent_swap

Fixes [jupyter-server/jupyter-collaboration#526](jupyterlab/jupyter-collaboration#526)
Fixes [jupyterlab/jupyterlab#14031](jupyterlab/jupyterlab#14031)
@krassowski krassowski added the bug Something isn't working label Jan 2, 2026
Comment thread jupyter_ydoc/ynotebook.py Outdated
Comment thread jupyter_ydoc/ynotebook.py Outdated
Comment thread tests/test_ynotebook.py Outdated
Comment thread tests/test_ynotebook.py Outdated
jordanhboxer and others added 5 commits January 3, 2026 10:50
Co-authored-by: David Brochart <david.brochart@gmail.com>
Co-authored-by: David Brochart <david.brochart@gmail.com>
Co-authored-by: David Brochart <david.brochart@gmail.com>
Co-authored-by: David Brochart <david.brochart@gmail.com>
@jordanhboxer
Copy link
Copy Markdown
Contributor Author

Unfortunately I had to remove the YArray.move method because it was producing binary that was incompatible with yjs 13.x. Found this out after trying to read a notebook in the UI after making out of band moves and got hit with Caught error while handling a Yjs update TypeError: hh[(e & Ft)] is not a function. This was confirmed with v14 release notes https://github.com/yjs/yjs/releases/tag/v14.0.0-0

@jordanhboxer
Copy link
Copy Markdown
Contributor Author

jordanhboxer commented Jan 5, 2026

refactored and simplified the change since we now just have O(n^2) complexity without being able to use move

@davidbrochart
Copy link
Copy Markdown
Collaborator

On the other hand it might be time to release https://github.com/yjs/yjs/releases/tag/v14.0.0-0 @dmonad ?

@dmonad
Copy link
Copy Markdown

dmonad commented Jan 6, 2026

It is unlikely that this happens anytime soon. The work lives in another branch of the yjs repo.

@davidbrochart
Copy link
Copy Markdown
Collaborator

But except that it needs to be merged back into the Yjs repo, anything else preventing to get that feature in? It seems natural for Yjs to support it if Yrs supports it.

@dmonad
Copy link
Copy Markdown

dmonad commented Jan 6, 2026

Time. I currently focus on other work. The move feature conflicts with the new suggestion / attribution feature. I want to backport it eventually, but I'm short on time and resources as-is.

Comment thread jupyter_ydoc/ynotebook.py
# Use delete+recreate instead of move() for yjs 13.x compatibility
# (yjs 13.x doesn't support the move operation that pycrdt generates)
del self._ycells[cur]
self._ycells.insert(index, self.create_ycell(new_cell))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating the ycell anew, can we insert the existing model to avoid side-effects (like cursor jumping/cell outputs re-rendering)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the tests are failing because pycrdt orphans objects in that case which is why I originally chose to use the YArray.move method but that was not an option either it turns out so we're stuck with this until we can get on yjs 14.x

             assert nb.cell_number == 3
     >       ids = [cell["id"] for cell in nb.get()["cells"]]
                    ^^^^^^^^^^
     E       KeyError: 'id'

     tests/test_ynotebook.py:355: KeyError
     =========================== short test summary info ============================
     FAILED tests/test_pycrdt_yjs.py::test_ypy_yjs_0[0] - KeyError('source') [sing...
     FAILED tests/test_pycrdt_yjs.py::test_ypy_yjs_1[1] - TimeoutError() [single e...
     FAILED tests/test_ynotebook.py::test_set_reorder_does_not_duplicate_cells - K...
     FAILED tests/test_ynotebook.py::test_set_reorder_with_mixed_operations - KeyE...
     FAILED tests/test_ynotebook.py::test_set_simple_adjacent_swap - KeyError: 'id'
     ======================== 5 failed, 37 passed in 14.85s =========================

Copy link
Copy Markdown
Collaborator

@krassowski krassowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jordanhboxer!

It's a pity we cannot use move() but hopefully it will be possible in the future.

I tested it locally and intend to merge and publish in a patch release today. I will wait a few hours in case if @davidbrochart has any outstanding comments.

Comment thread tests/test_ynotebook.py Outdated
Comment thread tests/test_ynotebook.py Outdated
Comment thread tests/test_ynotebook.py Outdated
Comment thread tests/test_ynotebook.py Outdated
Comment thread tests/test_ynotebook.py Outdated
Comment thread tests/test_ynotebook.py Outdated
Co-authored-by: David Brochart <david.brochart@gmail.com>
@jordanhboxer
Copy link
Copy Markdown
Contributor Author

Thank you @jordanhboxer!

It's a pity we cannot use move() but hopefully it will be possible in the future.

I tested it locally and intend to merge and publish in a patch release today. I will wait a few hours in case if @davidbrochart has any outstanding comments.

of course glad I could help! yeah might be worth adding a todo somewhere to update this when the move to yjs 14.x happens

Copy link
Copy Markdown
Collaborator

@davidbrochart davidbrochart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Damaged notebook (duplicate cells) in RTC-active hosted hub

4 participants