Skip to content

Retry transient transaction failures in the Mongo adapter#889

Merged
abnegate merged 1 commit into main from fix-mongo-transaction-write-conflict-retry
Jun 12, 2026

Conversation

@ChiragAgg5k ChiragAgg5k commented Jun 12, 2026

Member

What does this PR do?

Gives Mongo::withTransaction the same retry-with-backoff behavior every other adapter already inherits from Adapter::withTransaction: up to 2 retries with 50ms/100ms backoff on transient failures.

Mongo is the only adapter that overrides withTransaction without reproducing the base retry loop, so transient MongoDB errors — most notably E112 WriteConflict, which MongoDB documents as retryable ("Please retry your operation or multi-document transaction") — surface immediately as 500s instead of being retried.

Non-transient typed exceptions (Duplicate, Restricted, Authorization, Relationship, Conflict, Limit) still throw immediately without retrying, matching the base adapter's exclusion list. The existing rollback/session-cleanup semantics are preserved as-is and now run before each retry, so every attempt starts with a fresh session.
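To make the control flow concrete, here is a rough Python sketch of the retry loop described above. It is not the adapter's PHP code. The exception classes are hypothetical stand-ins for the typed exceptions named above, and the session hooks are passed in as callables. The sketch assumes the linear `sleep * (attempts + 1)` backoff cited in the Greptile review, and it also assumes that the last failure is rethrown, which leaves the trailing `TransactionException` as an unreachable sentinel.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Hypothetical stand-ins for the adapter's typed, non-retryable exceptions.
class DuplicateException(Exception): ...
class RestrictedException(Exception): ...
class AuthorizationException(Exception): ...
class RelationshipException(Exception): ...
class ConflictException(Exception): ...
class LimitException(Exception): ...
class TransactionException(Exception): ...


NON_RETRYABLE = (
    DuplicateException, RestrictedException, AuthorizationException,
    RelationshipException, ConflictException, LimitException,
)


def with_transaction(
    callback: Callable[[], T],
    start: Callable[[], None],
    commit: Callable[[], None],
    rollback_and_cleanup: Callable[[], None],
    retries: int = 2,
    sleep_s: float = 0.05,
) -> T:
    """Run callback in a transaction, retrying transient failures.

    Up to `retries` retries with linear backoff sleep_s * (attempt + 1),
    i.e. 50 ms and then 100 ms with the defaults.
    """
    for attempt in range(retries + 1):
        try:
            start()
            result = callback()
            commit()
            return result
        except Exception as exc:
            # Roll back and reset session state so the next attempt starts fresh.
            rollback_and_cleanup()
            if isinstance(exc, NON_RETRYABLE) or attempt >= retries:
                raise  # non-retryable, or retry budget exhausted
            time.sleep(sleep_s * (attempt + 1))
    # Sentinel only: the loop above always returns or raises.
    raise TransactionException("Failed to execute transaction")
```

The sketch folds rollback and session reset into one callback for brevity. The PR description says the PHP code keeps its existing rollback/session-cleanup path, which the Greptile review describes as running in a `finally` block.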

Test Plan

  • composer lint passes.
  • Verified the retry path against the production failure below: the conflicting write is a one-shot update, so the first retry succeeds.

Related PRs and Issues

This is the root cause of the flaky testInvalidSSRSource Sites E2E failures on appwrite/appwrite MongoDB CI jobs (e.g. https://github.com/appwrite/appwrite/actions/runs/27395946536/job/80963312794). The builds worker updates the site document's latestDeployment* attributes at the same time as the test's cleanup issues DELETE /v1/sites/:siteId; the delete transaction hits E112 WriteConflict and bubbles up as 500 general_unknown. SQL adapters don't flake, both because InnoDB row locks make the delete wait and because the withTransaction they inherit already retries.
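A toy Python model of that race follows. A version check stands in for MongoDB's write-conflict detection, and the conflict is checked at a single point rather than at the moment of the write, so it is a simplification. All names are hypothetical. The worker's one-shot update bumps the document version while the delete is in flight. The first attempt therefore conflicts, and the retry, which takes a fresh snapshot, succeeds.

```python
from typing import Callable, List


class WriteConflict(Exception):
    """Stand-in for MongoDB error 112 (WriteConflict)."""


class Document:
    def __init__(self) -> None:
        self.version = 0
        self.deleted = False


def bump(doc: Document) -> None:
    """Hypothetical worker write (e.g. updating latestDeployment*)."""
    doc.version += 1


def delete_with_retry(
    doc: Document,
    concurrent_writes: List[Callable[[Document], None]],
    retries: int = 2,
) -> int:
    """Delete `doc`, retrying on conflict; return the number of attempts used.

    One pending worker write, if any remain, lands between the delete's
    snapshot and its write, which is what triggers the conflict.
    """
    for attempt in range(retries + 1):
        seen = doc.version                  # transaction snapshot
        if concurrent_writes:               # racing worker update
            concurrent_writes.pop()(doc)
        if doc.version == seen:             # no conflicting write: commit
            doc.deleted = True
            return attempt + 1
        if attempt == retries:
            raise WriteConflict("E112 WriteConflict: retries exhausted")
    raise AssertionError("unreachable when retries >= 0")
```

With a single racing write, `delete_with_retry(Document(), [bump])` needs two attempts, matching the "first retry succeeds" observation in the test plan. Only sustained contention (three or more racing writes here) exhausts the budget.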

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced database transaction reliability with automatic retry capability for transient failures
    • Improved error handling and recovery for specific database exception scenarios
    • Strengthened transaction state cleanup and resource management

coderabbitai Bot commented Jun 12, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 67d7cdf6-1d00-4575-8e02-9ff768e30a5c

📥 Commits

Reviewing files that changed from the base of the PR and between 5608004 and 298837f.

📒 Files selected for processing (1)
  • src/Database/Adapter/Mongo.php

📝 Walkthrough

Mongo.php now imports specific exception types (Authorization, Conflict, Limit, Relationship, Restricted) and refactors withTransaction() to retry failed transactions with a linearly increasing sleep backoff, clean up session/transaction state on each attempt, immediately rethrow non-retryable exceptions, and throw TransactionException after exhausting retries.

Changes

MongoDB Transaction Resilience

| Layer / File(s) | Summary |
|---|---|
| Exception imports for transaction error handling (src/Database/Adapter/Mongo.php) | Adds use imports for Authorization, Conflict, Limit, Relationship, and Restricted exception classes to support granular transaction error classification. |
| Transaction retry logic with state cleanup (src/Database/Adapter/Mongo.php) | withTransaction() is refactored from a single try/catch to a retry-capable wrapper that loops with a fixed count, sleeps between attempts, rolls back and cleans up Mongo session/transaction state in a finally block, rethrows immediately for duplicate/authorization/restricted/relationship/conflict/limit exceptions, and throws TransactionException after retry budget exhaustion. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • utopia-php/database#677: Both PRs refactor withTransaction() with granular exception imports and add retry/bail-out logic with proper transaction/session state cleanup.
  • utopia-php/database#817: Both PRs modify withTransaction() in Mongo.php—one adding retry/exception rethrow logic, the other adjusting transaction state handling.

Suggested reviewers

  • abnegate

Poem

🐰 A rabbit hops through MongoDB's gates,
With retries swift and cleaned-up states,
When conflicts rise or limits bind,
No retry loops—just rethrow kind,
Clean sessions leave no trace behind! 🌾

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding retry logic for transient transaction failures in the Mongo adapter, which directly matches the core objective of the PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

greptile-apps Bot commented Jun 12, 2026

Contributor

Greptile Summary

This PR gives Mongo::withTransaction the same retry-with-backoff loop that the base Adapter::withTransaction already provides, fixing transient E112 WriteConflict failures that were surfacing as 500s instead of being silently retried.

  • The retry loop (up to 2 retries, 50 ms/100 ms backoff) mirrors the base adapter exactly — same attempt count, same $sleep * ($attempts + 1) formula, same non-retryable exception exclusion list (Duplicate, Restricted, Authorization, Relationship, Conflict, Limit).
  • Session/state cleanup (endSessions, $this->inTransaction = 0, $this->session = null) is performed inside a finally block on every failed attempt so each retry starts with a fresh session — verified against startTransaction()'s if (!$this->session) guard.
  • The unreachable throw new TransactionException(...) after the loop is a safe dead-code sentinel matching the base adapter.
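The fresh-session guarantee rests on two points the review checks. First, startTransaction() only creates a session when none exists. Second, the cleanup nulls it. The minimal Python model below illustrates why each retry then gets a new session, and why the guarded end after rollback avoids ending the same session twice. `FakeAdapter` and its integer "sessions" are hypothetical stand-ins for the adapter and real driver sessions.

```python
import itertools
from typing import List, Optional


class FakeAdapter:
    """Hypothetical model of the adapter's session/transaction state."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)      # fake session ids
        self.session: Optional[int] = None
        self.in_transaction = 0
        self.ended: List[int] = []          # sessions that were ended

    def start_transaction(self) -> int:
        if not self.session:                # mirrors `if (!$this->session)`
            self.session = next(self._ids)
        self.in_transaction += 1
        return self.session

    def rollback_transaction(self) -> None:
        # Per the review, rollback already ends the session itself.
        if self.session:
            self.ended.append(self.session)
            self.session = None

    def cleanup(self) -> None:
        # Guarded end in withTransaction's finally: a no-op after rollback.
        if self.session:
            self.ended.append(self.session)
        self.in_transaction = 0
        self.session = None
```

A failed attempt followed by rollback and cleanup leaves `session` as `None`, so the next `start_transaction()` creates session 2 instead of reusing session 1. Session 1 appears in `ended` exactly once.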

Confidence Score: 5/5

Safe to merge — the change is a faithful port of the base adapter's retry loop into the Mongo override, with no altered semantics beyond adding the retry path.

The retry loop, backoff math, non-retryable exception list, and per-attempt state cleanup are all verified to match the base Adapter::withTransaction. Session cleanup via the finally block correctly nulls $this->session before startTransaction() is called on the next attempt, and rollbackTransaction() already ends the session internally in all paths, so the if ($this->session) guard in withTransaction's finally avoids any double-end. No logic divergence from the base adapter was found.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| src/Database/Adapter/Mongo.php | Adds retry-with-backoff loop to withTransaction, mirroring base Adapter::withTransaction; non-retryable exception list, backoff math, and session/state cleanup all match the base adapter correctly. |

Reviews (2): Last reviewed commit: "(fix): retry transient transaction failu..."

@ChiragAgg5k

Member Author

Load test results (local appwrite/appwrite stack, MongoDB 8.2.5 single-node replica set)

Validated this branch inside the full Appwrite stack (appwrite/appwrite#12587 runs it through CI in parallel).

Hammer test — for each iteration, create a document then fire 2× PATCH + 1× DELETE on it simultaneously via curl_multi, forcing concurrent Mongo transactions onto the same document:

| Version | Racing requests | 5xx responses | Rate |
|---|---:|---:|---:|
| 5.9.0 (no retry) | 1,200 | 127 | 10.6% |
| This branch | 1,200 | 0 | 0% |
| This branch (extended) | 3,000 | 2 | 0.07% |
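The Rate column is 5xx responses divided by racing requests. The quick Python check below recomputes it from the table's counts. It is illustrative only; the published figures are rounded.

```python
def error_rate(errors: int, requests: int) -> float:
    """Share of racing requests that returned a 5xx, as a percentage."""
    return 100.0 * errors / requests


# (5xx responses, racing requests) taken from the load-test table.
ROWS = {
    "5.9.0 (no retry)": (127, 1200),
    "This branch": (0, 1200),
    "This branch (extended)": (2, 3000),
}

if __name__ == "__main__":
    for name, (errors, requests) in ROWS.items():
        print(f"{name}: {error_rate(errors, requests):.2f}%")
```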

Every 5xx on 5.9.0 had the production signature: E112 WriteConflict / Transaction aborted surfacing as 500 general_unknown. The 2 residual errors in the extended run each took ~170ms, because all 3 attempts (0/50ms/100ms backoff) were exhausted under sustained 3-way contention. That matches the base adapter's retry semantics, and the contention is far heavier than the real-world pattern (one worker write vs. one delete).
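The ~170ms figure is consistent with the backoff schedule. A request that fails every attempt sleeps 50 + 100 = 150ms in total, which leaves roughly 20ms for the three transaction attempts themselves. The small Python sketch below computes that budget, assuming the 50ms base sleep and the linear `sleep * (attempts + 1)` formula quoted in the Greptile review.

```python
def backoff_total_ms(retries: int = 2, sleep_ms: int = 50) -> int:
    """Total time slept when every attempt fails.

    Assumes linear backoff: sleep_ms * (attempt + 1) after each failed
    attempt except the last, i.e. 50 ms + 100 ms with the defaults.
    """
    return sum(sleep_ms * (attempt + 1) for attempt in range(retries))


if __name__ == "__main__":
    observed_ms = 170  # approximate latency of each residual error
    slept = backoff_total_ms()
    print(slept, observed_ms - slept)  # 150 20
```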

Scenario-faithful test — reproduction of the flaky testInvalidSSRSource flow: failing site deployment, poll at 50ms until status=failed, immediately DELETE /v1/sites/:siteId while the builds worker is still writing latestDeployment* to the site document:

| Version | Iterations | DELETE → 204 | 5xx |
|---|---:|---:|---:|
| This branch | 62 | 62 | 0 |
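The polling step of that reproduction can be sketched as a loop. It checks the deployment status every 50ms and returns as soon as it reads `failed`, which is the moment the DELETE is issued. `poll_until` and the injected `fetch_status` callable are hypothetical; the real test drives the Appwrite HTTP API.

```python
import time
from typing import Callable


def poll_until(
    fetch_status: Callable[[], str],
    target: str = "failed",
    interval_s: float = 0.05,
    timeout_s: float = 30.0,
) -> int:
    """Poll until fetch_status() returns `target`; return the number of polls.

    Raises TimeoutError if the target status is not seen within timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        polls += 1
        if fetch_status() == target:
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status never became {target!r}")
        time.sleep(interval_s)
```

Returning immediately on `failed` is what lets the DELETE land while the builds worker may still be writing latestDeployment* to the site document.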

The previously flaky Tests / E2E / MongoDB (dedicated) / Sites CI job also passes on appwrite/appwrite#12587.

@ChiragAgg5k ChiragAgg5k marked this pull request as ready for review June 12, 2026 06:13
@abnegate abnegate merged commit 6989524 into main Jun 12, 2026
22 checks passed
@abnegate abnegate deleted the fix-mongo-transaction-write-conflict-retry branch June 12, 2026 08:33