Add support for PyMongo Async and N+1 Dereference/select_related using Pipeline by arunsureshkumar · Pull Request #2904 · MongoEngine/mongoengine

arunsureshkumar · 2026-01-05T07:45:02Z

This PR contributes to #2902, which tracks the ongoing effort to add native async support to MongoEngine using PyMongo’s official async API (>= 4.14).

It includes foundational changes toward async-first workflows, improvements to select_related via aggregation pipelines, and related updates across core internals, tests, CI, and documentation.

Notes

- Async support is native, not layered on top of sync behavior
- Focus on unified sync/async code paths and improved performance
- Version and compatibility changes apply (see issue for details)

📌 Details, motivation, and full scope are documented in the issue: #2902
🚧 Work is still in progress — feedback is welcome.

- Refactored the core ORM to support PyMongo’s native async API
- Unified sync and async code paths across documents, querysets, and transactions
- Replaced legacy async implementations
- Removed deprecated and compatibility code

BREAKING CHANGE:
- Removed legacy async behavior
- Removed LazyReferenceField
- Removed GenericLazyReferenceField
- GenericReferenceField now requires `choices`
- Dropped support for PyMongo < 4.14
- Dropped support for MongoDB < 4.2

…uilder stages

- Extract query normalization, match planning, lookup planning, stage building, and tail stages
- Introduce clear aggregation pipeline architecture aligned with MongoDB stages
- Reduce PipelineBuilder to a small orchestration layer
- Improve readability, isolation, and long-term maintainability

…ilter-only lookups

Body:
- refactor StageBuilder traversal/handlers for readability
- keep $lookup unfiltered for correct hydration; apply foreign predicates via local $filter
- emit explicit _missing_reference markers so deref raises DoesNotExist
- reduce duplication with shared helpers and structured dispatch

- Normalize refIds generation across scalar and container fields
- Use reduce-based flattening for ListField reference lookups
- Ensure missing references emit `{_missing_reference, _ref}` only
- Fix select_related pipelines to match MongoDB 4.2+ semantics
- Expand pipeline builder tests for nested and container references

…project

- Use uv for dependency management and builds
- Simplify GitHub Actions matrix and MongoDB setup
- Replace custom CI scripts with uv-based dependency management, actions
- Match tox environment with GitHub Action matrix

…registry cleanup

- Update assertions to use `await` where needed for async compatibility.
- Introduce `_DocumentRegistry.clear()` and `_CollectionRegistry.clear()` calls in test setups and teardowns.
- Normalize test workflows to ensure proper registry state management across synchronous and asynchronous environments.
- Simplify pipeline builder tests by removing unnecessary async and ensuring compatibility with recent updates.

abhinand-c · 2026-01-05T11:39:53Z

@bagerard @rozza @hmarr
Could you help us with a review and feedback?

…ression

When a filter condition targets a reference/list-of-reference field (e.g. articles__headline='Hello') and select_related is used, the $addFields hydration stage now applies a $filter on the docs_alias array instead of using the full unfiltered result. This ensures the hydrated field contains only the matching documents rather than all fetched documents.

For ListField references the filtered array is assigned directly; for scalar ReferenceField the first matching element is extracted via $arrayElemAt (or null if none match).

Also remove unused `Union` import from synchronous queryset.
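The hydration stage described in this commit might look roughly like the sketch below. It is an illustration only, not the PR's code: the function name, the `articles__docs` alias, and the use of `$ifNull` to produce the null fallback are assumptions.

```python
# Hypothetical sketch of a filtered hydration stage; names are illustrative.
def build_hydration_stage(field, docs_alias, cond, is_list):
    """Build an $addFields stage that hydrates `field` from the joined docs.

    `cond` is an aggregation expression evaluated against `$$doc`.
    """
    # Keep only the joined documents that satisfy the foreign predicate.
    filtered = {
        "$filter": {"input": f"${docs_alias}", "as": "doc", "cond": cond}
    }
    if is_list:
        # ListField(ReferenceField): assign the filtered array directly.
        value = filtered
    else:
        # Scalar ReferenceField: take the first match. $ifNull is an assumed
        # way to turn "no element" into an explicit null.
        value = {"$ifNull": [{"$arrayElemAt": [filtered, 0]}, None]}
    return {"$addFields": {field: value}}


stage = build_hydration_stage(
    "articles",
    "articles__docs",
    {"$eq": ["$$doc.headline", "Hello"]},
    is_list=True,
)
```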

When _walk_lookups recursed into a ListField(ReferenceField) subtree (e.g. before_child -> parent), it passed embedded_list_path=None, causing the nested ref to use _add_structured_ref_lookup with a dotted path over an array. This generated $indexOfArray(ids, array_of_ids) which always returned -1, writing {_missing_reference: True} to every element even when the referenced document existed.

Fix: pass embedded_list_path=full_path so the recursive call uses _add_embedded_list_structured_ref_lookup, which correctly uses $map to update each array element's field individually.
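The failure mode can be reproduced in plain Python. The helper below only mimics how `$indexOfArray` searches for a single value; the sample ids and the `children` list are made up for illustration.

```python
def index_of_array(array, search):
    """Mimic $indexOfArray: index of `search` as one element, or -1."""
    for i, item in enumerate(array):
        if item == search:
            return i
    return -1


joined_ids = ["p1", "p2", "p3"]
children = [{"parent": "p2"}, {"parent": "p3"}]

# A dotted path over an array yields the whole list of ids, so the search
# value is itself an array and never equals any single id.
dotted_path_value = [child["parent"] for child in children]
buggy_index = index_of_array(joined_ids, dotted_path_value)  # -1

# Resolving each element separately (what $map does) finds every parent.
fixed_indexes = [index_of_array(joined_ids, c["parent"]) for c in children]  # [1, 2]
```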

…= 5.0

- StageBuilder now builds the joined-docs hash {id_str: doc} once in an outer $let and uses $getField for O(1) lookup per ref leaf. Cuts hydration cost for List/Map/Dict of ReferenceField from O(N*M) to O(N+M) against large joined collections. Falls back to legacy $indexOfArray when MongoDB < 5.0.
- PipelineBuilder accepts mongo_version; all 4 caller sites resolve the queryset's effective alias (matching _get_collection's logic, with using(None,None) guard) so multi-cluster setups probe the correct cluster.
- get_mongodb_version/async_get_mongodb_version now accept an alias and cache per-alias to avoid a server_info() roundtrip on every aggregation. Disconnect clears only the disconnected alias's entry.

Cleanup bundled in:
- Consolidate is_list_of_embedded / embedded_doc_type into Schema; drop the duplicated copies in StageBuilder and utils.py.
- Delegate LookupPlanner._get_field_by_db_part and MatchPlanner's local field-lookup closure to Schema.resolve_field_name.
- Remove the verbatim duplicate of needs_aggregation in pipeline_builder.py (utils.py is the canonical home, exported via __init__).
- Fix __init__.py NameError from referencing modules after star-import.
- Drop dead `ids` $let variable in _build_value_expr that computed an unused $map on every hydrated document.
- Merge the two sequential `if isinstance(field, DictField)` blocks in _walk_lookups into one explicit dispatch.
- Expand pipeline_builder/README.md with the missing schema.py / utils.py, data flow, both build paths, and design invariants.
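The complexity difference between the two hydration strategies can be sketched in plain Python. This is an analogy for the pipeline expressions, not the PR's code; both function names are hypothetical, and the missing-reference marker follows the `{_missing_reference, _ref}` shape mentioned in an earlier commit.

```python
def hydrate_by_scan(ref_ids, joined_docs):
    """Legacy shape: scan the joined docs for every reference, O(N*M)."""
    result = []
    for rid in ref_ids:
        match = next((d for d in joined_docs if d["_id"] == rid), None)
        result.append(match or {"_missing_reference": True, "_ref": rid})
    return result


def hydrate_by_hash(ref_ids, joined_docs):
    """Optimized shape: build {id_str: doc} once, then O(1) per ref, O(N+M)."""
    by_id = {str(d["_id"]): d for d in joined_docs}
    return [
        by_id.get(str(rid), {"_missing_reference": True, "_ref": rid})
        for rid in ref_ids
    ]


joined = [{"_id": 1, "name": "a"}, {"_id": 2, "name": "b"}]
refs = [2, 1, 99]
hydrated = hydrate_by_hash(refs, joined)
```

Both functions return the same result; only the amount of work per reference differs.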

- Enable ruff-format hook in .pre-commit-config.yaml (ruff-check stays disabled until the ~6,700-error backlog, mostly F403/F405 from `import *` in tests and E501, is triaged separately).
- Run `ruff format` once across the codebase to establish a clean baseline.
- Run `ruff check --select F401 --fix` to drop 32 unused imports.
- Trailing-newline fix in pipeline_builder/README.md from end-of-file-fixer.

No functional changes: purely whitespace, formatting, and dead-import cleanup.

Split the 2,559-line fields.py monolith into a logical folder hierarchy:

- string/ - StringField, URLField, EmailField (individual files)
- numeric/ - IntField, FloatField, DecimalField, Decimal128Field
- datetime/ - DateTimeField, DateField, ComplexDateTimeField
- complex/ - ListField, DictField, MapField (renamed from container/)
- document/ - EmbeddedDocumentField, GenericEmbeddedDocumentField, DynamicField
- reference/ - ReferenceField, GenericReferenceField
- file/ - BinaryField, FileField, ImageField, GridFSProxy
- geo/ - GeoPointField + 6 GeoJSON types (individual files)
- boolean.py, enum.py, uuid.py, sequence.py (single-file modules)
- exceptions.py - GridFSError, ImproperlyConfigured

All imports remain backward compatible via fields/__init__.py re-exports.

Tests pass: 597 field tests (299 sync + 298 async).

Split the 689-line base/fields.py into individual class files:

- base_field.py - BaseField (260 lines, core field descriptor)
- complex_base_field.py - ComplexBaseField (217 lines, for lists/dicts)
- object_id_field.py - ObjectIdField (31 lines, ObjectId wrapper)
- geo_json_base_field.py - GeoJsonBaseField (159 lines, GeoJSON validation)

All imports remain backward compatible via base/fields/__init__.py.

Tests pass: 198 base field tests (99 sync + 99 async).

Split the 516-line base/datastructures.py into individual class files:

- helpers.py - mark_as_changed_wrapper decorators (26 lines)
- base_dict.py - BaseDict (74 lines, change-tracking dict)
- base_list.py - BaseList (106 lines, change-tracking list)
- embedded_document_list.py - EmbeddedDocumentList (173 lines, queryable embedded doc list)
- strict_dict.py - StrictDict (85 lines, slot-based efficient dict)
- lazy_reference.py - LazyReference (70 lines, deferred document loading)

All imports remain backward compatible via base/datastructures/__init__.py.

Tests pass: 60 dereference tests + 31 embedded document list tests.

Add ZonedDateTimeField that stores both UTC time and timezone name, enabling accurate time comparisons while preserving the original timezone for frontend display.

Storage format:
- MongoDB: {"utc": ISODate(...), "tz": "Asia/Kolkata"}
- Python: timezone-aware datetime in original timezone

Key features:
- DST-safe: stores timezone name (e.g., "America/New_York"), not offset
- Query support: start_time__utc__gte for time queries, start_time__tz for timezone queries
- Automatic index expansion: 'start_time' → 'start_time.utc' in meta.indexes
- Works with both sync and async APIs

Tests cover storage, retrieval, DST handling, querying, ordering, and indexing.
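The storage format above can be illustrated with a small standalone round trip built on the standard-library `zoneinfo` module. The two helper names are hypothetical and are not the field's actual methods; the sketch assumes the value carries a named `ZoneInfo` timezone and that the system timezone database is available.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_stored(value):
    """Convert an aware datetime into the {"utc": ..., "tz": ...} shape."""
    # ZoneInfo objects expose the IANA name as `.key`; fixed offsets do not.
    tz_name = getattr(value.tzinfo, "key", None)
    if tz_name is None:
        raise ValueError("expected a datetime with a named ZoneInfo timezone")
    return {"utc": value.astimezone(timezone.utc), "tz": tz_name}


def from_stored(stored):
    """Rebuild the aware datetime in its original timezone."""
    utc_value = stored["utc"]
    if utc_value.tzinfo is None:
        # Drivers may return naive datetimes that are implicitly UTC.
        utc_value = utc_value.replace(tzinfo=timezone.utc)
    return utc_value.astimezone(ZoneInfo(stored["tz"]))


local = datetime(2026, 1, 5, 13, 15, tzinfo=ZoneInfo("Asia/Kolkata"))
stored = to_stored(local)
restored = from_stored(stored)
```

Storing the zone name rather than an offset is what lets the restored value follow DST rules for zones that have them.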

- Add ZonedDateTimeField to API reference
- Add ZonedDateTimeField to field list in defining-documents guide
- Fix Sphinx 9.x incompatibility with readthedocs_ext (only load on ReadTheDocs)
- Update dependencies to latest versions with environment markers for Sphinx
  - Sphinx 8.1.3 for Python 3.10-3.11, 9.1.0 for Python 3.12+
  - ruff 0.15, pre-commit 4.6, pytest 9.0.3, coverage 7.14, pillow 12.2
  - tox 4.54, tox-uv 1.35.2, uv_build 0.11.16

MongoDB 5.0-7.0 support $getField but require the 'field' parameter to be a constant, not a variable expression. The pipeline builder was using {"$getField": {"field": {"$toString": "$$rid"}, ...}} which works on MongoDB 4.4 (lenient) and 8.0+ (relaxed), but fails on 5.0-7.0 with error 5654601: "$getField requires 'field' to evaluate to a constant".

Changed the version check to only enable $getField optimization on MongoDB >= 8.0, falling back to the legacy $indexOfArray approach on earlier versions. This trades O(1) lookup performance for compatibility on 5.0-7.0, while MongoDB 8.0+ still gets the optimized path.

Tested on MongoDB 5.0.31, 6.0.28, 7.0.34 - all tests pass.
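A version gate of this kind could be as small as the sketch below. The function name and the tuple representation of the server version are assumptions for illustration; the PR's actual check may be structured differently.

```python
def use_getfield_lookup(mongo_version):
    """Return True when the $getField hash lookup should be used.

    `mongo_version` is assumed to be a tuple such as (7, 0, 34).
    """
    # Only 8.0+ accepts a non-constant 'field' expression, per the commit.
    return tuple(mongo_version[:2]) >= (8, 0)


strategies = {
    version: "getField" if use_getfield_lookup(version) else "indexOfArray"
    for version in [(5, 0, 31), (6, 0, 28), (7, 0, 34), (8, 0, 0)]
}
```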

The tox -a command outputs environment names separated by newlines, but tox -e expects comma-separated values. This was causing CI failures with: 'provided environments not found in configuration file'. Added 'tr "\n" "," | sed "s/,$//"' to convert the newline-separated list to comma-separated format that tox expects.
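For readers less familiar with `tr` and `sed`, the same conversion is sketched here in Python. The function name and the sample environment names are hypothetical; the CI workflow itself uses the shell pipeline quoted above.

```python
def to_tox_env_arg(tox_list_output):
    """Turn newline-separated env names into the comma list `tox -e` expects."""
    names = [line.strip() for line in tox_list_output.splitlines() if line.strip()]
    # Joining avoids the trailing comma that the sed step strips in the shell.
    return ",".join(names)


env_arg = to_tox_env_arg("py310-mg414\npy311-mg415\n")
```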

The 'readthedocs' builder doesn't exist in Sphinx. Changed the html-readthedocs target to use the standard 'html' builder with the -T -E flags for strict checking and fresh build. This fixes the CI build_doc_dryrun job that was failing with: 'Builder name readthedocs not registered or available through entry point'

Split the build-n-publish job into two separate jobs:

1. build-release: Builds wheel and sdist, uploads as artifacts
2. publish-to-pypi: Downloads artifacts and publishes to PyPI

Benefits:
- Better separation of concerns
- Build artifacts can be verified before publishing
- Failed publish doesn't require rebuilding
- Follows GitHub Actions best practices for release workflows

The publish job depends on build-release and both only run on tag creation (refs/tags/v*).

- Add pymongo-version to test matrix to isolate test runs
- Run one Python × MongoDB × PyMongo combination per job
- Change from tox run-parallel to single tox run per job
- Update cache key to include PyMongo version

This prevents transaction/lock conflicts that occurred when multiple PyMongo versions ran concurrently against the same MongoDB instance.

Matrix: 5 Python × 6 MongoDB × 4 PyMongo = 120 jobs

- Add job name template with proper capitalization
- Change pymongo-version format from "414" to "4.14" for readability
- Update tox env construction to strip dots from pymongo version

Job names now show as: "test (Python 3.10, MongoDB 4.4, PyMongo 4.14)" instead of: "test (3.10, 4.4, 414)"
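The dot-stripping step can be sketched as follows. The `py…-mg…` environment naming pattern is an assumption made for illustration, as are the function names; only the job-name format is taken from the commit message.

```python
def tox_env_name(python_version, pymongo_version):
    """Build a tox env name from dotted versions (naming pattern assumed)."""
    py = python_version.replace(".", "")
    pm = pymongo_version.replace(".", "")  # "4.14" -> "414"
    return f"py{py}-mg{pm}"


def job_name(python_version, mongodb_version, pymongo_version):
    """Readable job name in the format quoted in the commit message."""
    return (
        f"test (Python {python_version}, MongoDB {mongodb_version}, "
        f"PyMongo {pymongo_version})"
    )


env = tox_env_name("3.10", "4.14")
name = job_name("3.10", "4.4", "4.14")
```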

Adopt Python's standard terminology for timezone-aware datetimes ("aware" vs "naive") instead of "zoned" vs "unzoned".

Changes:
- Rename ZonedDateTimeField class to AwareDateTimeField
- Update all error messages and docstrings
- Rename file from zoned_datetime_field.py to aware_datetime_field.py
- Update imports in mongoengine/fields/__init__.py
- Update imports in mongoengine/fields/datetime/__init__.py
- Rename test files and update all test references

This is a breaking change for code using ZonedDateTimeField. Users should update their imports to use AwareDateTimeField.

arunsureshkumar and others added 30 commits December 27, 2025 22:43

Added: Complex test case for multiple switch_db and switch_collection…

5ad1e9c

… scenarios

refactor: fields

589ae61

refactor: public api expose import

7d95955

refactor!: remove BaseQuerySet from base module

dfd7e0a

BaseQuerySet is now defined only in the synchronous queryset implementation.

improved PipelineBuilder

d88278a

improved PipelineBuilder

339af1f

add: TestQuerysetLookupMatch

improved PipelineBuilder

16dd790

refactor(pipeline_builder): improved stages.

28fd42a

refactor(pipeline_builder): improved stages.

e2b7c70

feat: Converted to use pyproject tomal files

f21c459

feat: Updated pre-commit hooks

37c9a52

Documentation updated with async support

d6e454d

Update changelog for MongoEngine's native async PyMongo migration

9140a2b

Revamp and expand README documentation to include detailed async supp…

c34c302

…ort, updated installation steps, supported MongoDB versions, and improved examples.

Remove no_dereference from API reference documentation

68816ac

refactor(Docker): simplify local MongoDB docker compose using officia…

2fec8ee

…l image and init script.

feat(build): Migrate from Poetry to uv for better tox support, and fa…

258a145

…ster dep management

feat(test): migrate tox to uv with tox-uv runner

6376a24

fix(test): Fix UTC import in test to support python 3.10

94e3633

fix(test): Fix DB connection name to support tox parallel runners

c57e9df

fix(deps, tox): Migrate docs dependency into pyproject docs group, up…

689ae95

…date tox env deps

Merge remote-tracking branch 'origin/feat/async' into test

fa21035

# Conflicts: # docs/requirements.txt

fix(tests): Fix DB names to support tox parallel runner

122f401

fix: Lint, end of files

91b7c62

fix(tests): fix parallel race condition

613e25f

arunsureshkumar added 6 commits January 3, 2026 10:41

ci: remove MongoDB 4.2 from test matrix in GitHub Actions config

43e89f2

docs: remove MongoDB 4.2 from supported versions in README

e3f77f9

refactor(context_managers): improve consistency in docstrings, commen…

ddb4d99

…ts, and async example usage in `query_counter` and `async_query_counter`

docs: remove async-gridfs guide reference, enhance GenericReferenceFi…

4ea0e43

…eld docstring

refactor(registry): remove unused Dict and Tuple imports in `coll…

1b1307f

…ection.py`

abhinand-c and others added 23 commits January 6, 2026 16:09

fix(tests): Set read concern as local, and remove drop_collection for…

c8149d7

… reducing the flakiness of transaction

fix: enable test for filteration in pickle

49ddd20

fix(signals): send_async for _FakeSignal fallback, for async interface.

187b50e

fix(tests): Remove drop_collection for reducing the flakiness of tran…

eb31d14

…saction

Merge pull request #5 from strollby/fix/test_pickle_support_filteration

2d0acbf

Fix/test support for pickle filteration

Merge pull request #4 from strollby/feat/async-fix

9c63038

Reducing the flakiness of transaction

test: add PyMongo 4.16/4.17 and MongoDB 8.3 to test matrix

82d2cf6

- Add pymongo 4.16 and 4.17 to tox environments - Add MongoDB 8.3 to CI workflow matrix - Keep all MongoDB versions (4.4-8.3) for backward compatibility

arunsureshkumar wants to merge 66 commits into MongoEngine:master from strollby:feat/async
