Add support for PyMongo Async and N+1 Dereference/select_related using Pipeline #2904
Draft
arunsureshkumar wants to merge 66 commits into
- Refactored the core ORM to support PyMongo’s native async API
- Unified sync and async code paths across documents, querysets, and transactions
- Replaced legacy async implementations
- Removed deprecated and compatibility code

BREAKING CHANGE:
- Removed legacy async behavior
- Removed LazyReferenceField
- Removed GenericLazyReferenceField
- GenericReferenceField now requires `choices`
- Dropped support for PyMongo < 4.14
- Dropped support for MongoDB < 4.2
BaseQuerySet is now defined only in the synchronous queryset implementation.
add: TestQuerysetLookupMatch
…uilder stages
- Extract query normalization, match planning, lookup planning, stage building, and tail stages
- Introduce clear aggregation pipeline architecture aligned with MongoDB stages
- Reduce PipelineBuilder to a small orchestration layer
- Improve readability, isolation, and long-term maintainability
…ilter-only lookups
Body:
- refactor StageBuilder traversal/handlers for readability
- keep $lookup unfiltered for correct hydration; apply foreign predicates via local $filter
- emit explicit _missing_reference markers so deref raises DoesNotExist
- reduce duplication with shared helpers and structured dispatch
- Normalize refIds generation across scalar and container fields
- Use reduce-based flattening for ListField reference lookups
- Ensure missing references emit `{_missing_reference, _ref}` only
- Fix select_related pipelines to match MongoDB 4.2+ semantics
- Expand pipeline builder tests for nested and container references
…ort, updated installation steps, supported MongoDB versions, and improved examples.
…l image and init script.
…ster dep management
…date tox env deps
# Conflicts: # docs/requirements.txt
…project
- Use uv for dependency management and builds
- Simplify GitHub Actions matrix and MongoDB setup
- Replace custom CI scripts with uv-based dependency management, actions
- Match tox environment with GitHub Action matrix
…registry cleanup
- Update assertions to use `await` where needed for async compatibility.
- Introduce `_DocumentRegistry.clear()` and `_CollectionRegistry.clear()` calls in test setups and teardowns.
- Normalize test workflows to ensure proper registry state management across synchronous and asynchronous environments.
- Simplify pipeline builder tests by removing unnecessary async and ensuring compatibility with recent updates.
…ts, and async example usage in `query_counter` and `async_query_counter`
… reducing the flakiness of transaction
Fix/test support for pickle filtering
Reducing the flakiness of transaction
…ression
When a filter condition targets a reference/list-of-reference field (e.g. articles__headline='Hello') and select_related is used, the $addFields hydration stage now applies a $filter on the docs_alias array instead of using the full unfiltered result. This ensures the hydrated field contains only the matching documents rather than all fetched documents.
For ListField references the filtered array is assigned directly; for scalar ReferenceField the first matching element is extracted via $arrayElemAt (or null if none match).
Also remove unused `Union` import from synchronous queryset.
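The hydration change described in this commit message could be sketched as a small stage builder. This is a minimal illustration, not the PR's actual StageBuilder code: the function name `build_hydration_stage`, its parameters, the alias `articles__docs`, and the `$ifNull` wrapper used to yield null are assumptions. Only `$addFields`, `$filter`, and `$arrayElemAt` come from the commit message.

```python
def build_hydration_stage(field, docs_alias, cond, is_list):
    """Build an $addFields stage that hydrates `field` from the joined docs.

    Hypothetical helper: `docs_alias` is the array produced by an earlier
    $lookup, and `cond` is an aggregation expression over `$$doc`.
    """
    # Keep only the joined documents that satisfy the foreign predicate.
    filtered = {
        "$filter": {
            "input": f"${docs_alias}",
            "as": "doc",
            "cond": cond,
        }
    }
    if is_list:
        # ListField of references: assign the filtered array directly.
        value = filtered
    else:
        # Scalar reference: take the first match; $ifNull (an assumption here)
        # turns a missing result into an explicit null.
        value = {"$ifNull": [{"$arrayElemAt": [filtered, 0]}, None]}
    return {"$addFields": {field: value}}


headline_is_hello = {"$eq": ["$$doc.headline", "Hello"]}
list_stage = build_hydration_stage("articles", "articles__docs", headline_is_hello, True)
scalar_stage = build_hydration_stage("article", "article__docs", headline_is_hello, False)
```

The stages are plain dicts, so they can be inspected or printed before being appended to a pipeline.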
When _walk_lookups recursed into a ListField(ReferenceField) subtree
(e.g. before_child -> parent), it passed embedded_list_path=None,
causing the nested ref to use _add_structured_ref_lookup with a dotted
path over an array. This generated $indexOfArray(ids, array_of_ids)
which always returned -1, writing {_missing_reference: True} to every
element even when the referenced document existed.
Fix: pass embedded_list_path=full_path so the recursive call uses
_add_embedded_list_structured_ref_lookup, which correctly uses $map
to update each array element's field individually.
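The failure mode and the fix can be imitated in plain Python. This is only an analogy under stated assumptions: `index_of_array` mimics the "return -1 when not found" behaviour of `$indexOfArray`, and the sample documents and the `parent` field name are made up for illustration.

```python
def index_of_array(array, value):
    """Python stand-in for $indexOfArray: position of `value`, or -1."""
    try:
        return array.index(value)
    except ValueError:
        return -1


# Hypothetical joined documents and embedded list elements.
joined = [{"_id": 1, "name": "a"}, {"_id": 2, "name": "b"}]
ids = [doc["_id"] for doc in joined]
elements = [{"parent": 1}, {"parent": 2}]

# Buggy path: a dotted path over an array resolves to an array of ids,
# so the search value is a list and never equals any single id.
dotted_value = [element["parent"] for element in elements]
buggy_index = index_of_array(ids, dotted_value)


def hydrate_each(elements):
    """Fixed path: resolve the reference per element, like a $map would."""
    hydrated = []
    for element in elements:
        position = index_of_array(ids, element["parent"])
        if position >= 0:
            resolved = joined[position]
        else:
            resolved = {"_missing_reference": True}
        hydrated.append({**element, "parent": resolved})
    return hydrated


fixed = hydrate_each(elements)
```

Here `buggy_index` is -1 even though both referenced documents exist, while `fixed` carries the resolved document in each element.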
…= 5.0
- StageBuilder now builds the joined-docs hash {id_str: doc} once in an outer
$let and uses $getField for O(1) lookup per ref leaf. Cuts hydration cost
for List/Map/Dict of ReferenceField from O(N*M) to O(N+M) against large
joined collections. Falls back to legacy $indexOfArray when MongoDB < 5.0.
- PipelineBuilder accepts mongo_version; all 4 caller sites resolve the
queryset's effective alias (matching _get_collection's logic, with
using(None,None) guard) so multi-cluster setups probe the correct cluster.
- get_mongodb_version/async_get_mongodb_version now accept an alias and
cache per-alias to avoid a server_info() roundtrip on every aggregation.
Disconnect clears only the disconnected alias's entry.
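The complexity claim in the first bullet can be illustrated outside MongoDB. The sketch below is a plain-Python analogy, not the aggregation expressions the PR emits: `hydrate_by_scan` plays the role of the `$indexOfArray` fallback, `hydrate_by_hash` the role of the `{id_str: doc}` hash built once, and all function names and sample data are hypothetical.

```python
def hydrate_by_scan(ref_ids, joined):
    """O(N*M): scan the joined docs once per reference."""
    result = []
    for ref_id in ref_ids:
        match = next((doc for doc in joined if doc["_id"] == ref_id), None)
        if match is None:
            match = {"_missing_reference": True, "_ref": ref_id}
        result.append(match)
    return result


def hydrate_by_hash(ref_ids, joined):
    """O(N+M): build the id -> doc mapping once, then look up each reference."""
    by_id = {str(doc["_id"]): doc for doc in joined}
    return [
        by_id.get(str(ref_id), {"_missing_reference": True, "_ref": ref_id})
        for ref_id in ref_ids
    ]


joined_docs = [{"_id": 1, "title": "x"}, {"_id": 2, "title": "y"}]
refs = [2, 1, 3]
scan_result = hydrate_by_scan(refs, joined_docs)
hash_result = hydrate_by_hash(refs, joined_docs)
```

Both functions return the same list; only the number of comparisons differs as the joined collection grows.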
Cleanup bundled in:
- Consolidate is_list_of_embedded / embedded_doc_type into Schema; drop the
duplicated copies in StageBuilder and utils.py.
- Delegate LookupPlanner._get_field_by_db_part and MatchPlanner's local
field-lookup closure to Schema.resolve_field_name.
- Remove the verbatim duplicate of needs_aggregation in pipeline_builder.py
(utils.py is the canonical home, exported via __init__).
- Fix __init__.py NameError from referencing modules after star-import.
- Drop dead `ids` $let variable in _build_value_expr that computed an unused
$map on every hydrated document.
- Merge the two sequential `if isinstance(field, DictField)` blocks in
_walk_lookups into one explicit dispatch.
- Expand pipeline_builder/README.md with the missing schema.py / utils.py,
data flow, both build paths, and design invariants.
- Enable ruff-format hook in .pre-commit-config.yaml (ruff-check stays disabled until the ~6,700-error backlog, mostly F403/F405 from `import *` in tests and E501, is triaged separately).
- Run `ruff format` once across the codebase to establish a clean baseline.
- Run `ruff check --select F401 --fix` to drop 32 unused imports.
- Trailing-newline fix in pipeline_builder/README.md from end-of-file-fixer.
No functional changes: purely whitespace, formatting, and dead-import cleanup.
Split the 2,559-line fields.py monolith into a logical folder hierarchy:
- string/ - StringField, URLField, EmailField (individual files)
- numeric/ - IntField, FloatField, DecimalField, Decimal128Field
- datetime/ - DateTimeField, DateField, ComplexDateTimeField
- complex/ - ListField, DictField, MapField (renamed from container/)
- document/ - EmbeddedDocumentField, GenericEmbeddedDocumentField, DynamicField
- reference/ - ReferenceField, GenericReferenceField
- file/ - BinaryField, FileField, ImageField, GridFSProxy
- geo/ - GeoPointField + 6 GeoJSON types (individual files)
- boolean.py, enum.py, uuid.py, sequence.py (single-file modules)
- exceptions.py - GridFSError, ImproperlyConfigured
All imports remain backward compatible via fields/__init__.py re-exports.
Tests pass: 597 field tests (299 sync + 298 async).
Split the 689-line base/fields.py into individual class files:
- base_field.py - BaseField (260 lines, core field descriptor)
- complex_base_field.py - ComplexBaseField (217 lines, for lists/dicts)
- object_id_field.py - ObjectIdField (31 lines, ObjectId wrapper)
- geo_json_base_field.py - GeoJsonBaseField (159 lines, GeoJSON validation)
All imports remain backward compatible via base/fields/__init__.py.
Tests pass: 198 base field tests (99 sync + 99 async).
Split the 516-line base/datastructures.py into individual class files:
- helpers.py - mark_as_changed_wrapper decorators (26 lines)
- base_dict.py - BaseDict (74 lines, change-tracking dict)
- base_list.py - BaseList (106 lines, change-tracking list)
- embedded_document_list.py - EmbeddedDocumentList (173 lines, queryable embedded doc list)
- strict_dict.py - StrictDict (85 lines, slot-based efficient dict)
- lazy_reference.py - LazyReference (70 lines, deferred document loading)
All imports remain backward compatible via base/datastructures/__init__.py.
Tests pass: 60 dereference tests + 31 embedded document list tests.
Add ZonedDateTimeField that stores both UTC time and timezone name,
enabling accurate time comparisons while preserving the original timezone
for frontend display.
Storage format:
- MongoDB: {"utc": ISODate(...), "tz": "Asia/Kolkata"}
- Python: timezone-aware datetime in original timezone
Key features:
- DST-safe: stores timezone name (e.g., "America/New_York"), not offset
- Query support: start_time__utc__gte for time queries, start_time__tz for timezone queries
- Automatic index expansion: 'start_time' → 'start_time.utc' in meta.indexes
- Works with both sync and async APIs
Tests cover storage, retrieval, DST handling, querying, ordering, and indexing.
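The storage format above can be sketched with the standard library alone. This is a hedged illustration of the round trip, not the field's implementation: `to_storage` and `from_storage` are hypothetical helper names, the sketch assumes the value carries a `zoneinfo.ZoneInfo` timezone (so a zone name is available), and it assumes the stored `utc` value comes back as an aware datetime.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_storage(value):
    """Convert an aware datetime into the {"utc": ..., "tz": ...} shape."""
    if value.tzinfo is None:
        raise ValueError("naive datetimes are not accepted")
    # ZoneInfo objects expose the IANA name as `.key`; fixed offsets do not.
    tz_name = getattr(value.tzinfo, "key", None)
    if tz_name is None:
        raise ValueError("a named timezone is required, not a bare offset")
    return {"utc": value.astimezone(timezone.utc), "tz": tz_name}


def from_storage(stored):
    """Rebuild the aware datetime in its original timezone."""
    return stored["utc"].astimezone(ZoneInfo(stored["tz"]))


start_time = datetime(2024, 1, 15, 9, 0, tzinfo=ZoneInfo("Asia/Kolkata"))
stored = to_storage(start_time)
restored = from_storage(stored)
```

Storing the zone name rather than the offset is what lets the conversion back to local time follow DST rules for the date in question.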
- Add ZonedDateTimeField to API reference
- Add ZonedDateTimeField to field list in defining-documents guide
- Fix Sphinx 9.x incompatibility with readthedocs_ext (only load on ReadTheDocs)
- Update dependencies to latest versions with environment markers for Sphinx
  - Sphinx 8.1.3 for Python 3.10-3.11, 9.1.0 for Python 3.12+
  - ruff 0.15, pre-commit 4.6, pytest 9.0.3, coverage 7.14, pillow 12.2
  - tox 4.54, tox-uv 1.35.2, uv_build 0.11.16
MongoDB 5.0-7.0 support $getField but require the 'field' parameter to be
a constant, not a variable expression. The pipeline builder was using
{"$getField": {"field": {"$toString": "$$rid"}, ...}} which works on
MongoDB 4.4 (lenient) and 8.0+ (relaxed), but fails on 5.0-7.0 with
error 5654601: "$getField requires 'field' to evaluate to a constant".
Changed the version check to only enable $getField optimization on
MongoDB >= 8.0, falling back to the legacy $indexOfArray approach on
earlier versions.
This trades O(1) lookup performance for compatibility on 5.0-7.0, while
MongoDB 8.0+ still gets the optimized path.
Tested on MongoDB 5.0.31, 6.0.28, 7.0.34 - all tests pass.
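The revised version gate amounts to a tuple comparison. The following is a minimal sketch assuming the server version is available as a tuple of integers; the function name `pick_ref_lookup_strategy` and the returned labels are hypothetical, not names from the PR.

```python
def pick_ref_lookup_strategy(mongo_version):
    """Choose the hydration strategy from a (major, minor, ...) version tuple.

    Hypothetical helper mirroring the rule in the commit message:
    $getField with a computed field name only on MongoDB >= 8.0,
    the legacy $indexOfArray path everywhere else.
    """
    major_minor = tuple(mongo_version[:2])
    if major_minor >= (8, 0):
        return "getField"
    return "indexOfArray"


strategies = {
    version: pick_ref_lookup_strategy(version)
    for version in [(5, 0, 31), (6, 0, 28), (7, 0, 34), (8, 0, 0)]
}
```

Comparing only the first two components keeps patch releases from affecting the decision.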
The tox -a command outputs environment names separated by newlines, but tox -e expects comma-separated values. This was causing CI failures with: 'provided environments not found in configuration file'. Added 'tr "\n" "," | sed "s/,$//"' to convert the newline-separated list to comma-separated format that tox expects.
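For readers who prefer to see the transformation outside the shell, the same conversion can be written in Python. This is an equivalent sketch rather than what the CI script runs: `to_tox_env_arg` is a hypothetical name, the environment names are placeholders, and unlike the `tr`/`sed` pipeline it also drops blank lines.

```python
def to_tox_env_arg(tox_list_output):
    """Turn newline-separated environment names into a comma-separated string."""
    names = [line.strip() for line in tox_list_output.splitlines() if line.strip()]
    return ",".join(names)


# Placeholder environment names; the real ones come from the tox configuration.
sample_output = "env-a\nenv-b\nenv-c\n"
env_arg = to_tox_env_arg(sample_output)
```

The resulting string has no trailing comma, which is what the `sed "s/,$//"` step achieves in the shell version.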
The 'readthedocs' builder doesn't exist in Sphinx. Changed the html-readthedocs target to use the standard 'html' builder with the -T -E flags for strict checking and fresh build. This fixes the CI build_doc_dryrun job that was failing with: 'Builder name readthedocs not registered or available through entry point'
Split the build-n-publish job into two separate jobs:
1. build-release: Builds wheel and sdist, uploads as artifacts
2. publish-to-pypi: Downloads artifacts and publishes to PyPI
Benefits:
- Better separation of concerns
- Build artifacts can be verified before publishing
- Failed publish doesn't require rebuilding
- Follows GitHub Actions best practices for release workflows
The publish job depends on build-release and both only run on tag creation (refs/tags/v*).
- Add pymongo 4.16 and 4.17 to tox environments
- Add MongoDB 8.3 to CI workflow matrix
- Keep all MongoDB versions (4.4-8.3) for backward compatibility
- Add pymongo-version to test matrix to isolate test runs
- Run one Python × MongoDB × PyMongo combination per job
- Change from tox run-parallel to single tox run per job
- Update cache key to include PyMongo version
This prevents transaction/lock conflicts that occurred when multiple PyMongo versions ran concurrently against the same MongoDB instance.
Matrix: 5 Python × 6 MongoDB × 4 PyMongo = 120 jobs
- Add job name template with proper capitalization
- Change pymongo-version format from "414" to "4.14" for readability
- Update tox env construction to strip dots from pymongo version
Job names now show as: "test (Python 3.10, MongoDB 4.4, PyMongo 4.14)"
instead of: "test (3.10, 4.4, 414)"
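The job count follows from the Cartesian product of the three matrix axes. The sketch below only checks the arithmetic: the axis values are generic placeholders because the exact version lists are not spelled out here, and only the axis sizes (5, 6, 4) come from the commit message.

```python
from itertools import product

# Placeholder axis values; only the sizes 5, 6 and 4 are taken from the message.
python_versions = [f"python-{i}" for i in range(5)]
mongodb_versions = [f"mongodb-{i}" for i in range(6)]
pymongo_versions = [f"pymongo-{i}" for i in range(4)]

# One CI job per (Python, MongoDB, PyMongo) combination.
jobs = list(product(python_versions, mongodb_versions, pymongo_versions))
job_count = len(jobs)
```

Each tuple in `jobs` corresponds to one isolated run, which is why no two PyMongo versions share a MongoDB instance at the same time.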
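The naming change can be pictured as two small string helpers. This is a sketch of the idea in Python rather than the workflow's actual expression syntax: `job_name` and `pymongo_factor` are hypothetical names, and how the stripped value is combined into the full tox environment name is not shown because that layout is not given here.

```python
def job_name(python, mongodb, pymongo):
    """Human-readable job label in the format quoted in the commit message."""
    return f"test (Python {python}, MongoDB {mongodb}, PyMongo {pymongo})"


def pymongo_factor(pymongo):
    """Strip dots so a readable version like "4.14" becomes "414"."""
    return pymongo.replace(".", "")


label = job_name("3.10", "4.4", "4.14")
factor = pymongo_factor("4.14")
```

Keeping the dotted form in the matrix and stripping it only when building the tox environment name keeps the job list readable.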
Adopt Python's standard terminology for timezone-aware datetimes
("aware" vs "naive") instead of "zoned" vs "unzoned".
Changes:
- Rename ZonedDateTimeField class to AwareDateTimeField
- Update all error messages and docstrings
- Rename file from zoned_datetime_field.py to aware_datetime_field.py
- Update imports in mongoengine/fields/__init__.py
- Update imports in mongoengine/fields/datetime/__init__.py
- Rename test files and update all test references
This is a breaking change for code using ZonedDateTimeField.
Users should update their imports to use AwareDateTimeField.
This PR contributes to #2902, which tracks the ongoing effort to add native async support to MongoEngine using PyMongo’s official async API (>= 4.14).
It includes foundational changes toward async-first workflows, improvements to `select_related` via aggregation pipelines, and related updates across core internals, tests, CI, and documentation.
Notes
📌 Details, motivation, and full scope are documented in the issue: #2902
🚧 Work is still in progress — feedback is welcome.