perf(db): implement bulk task assignment to fix import timeouts#525

Open
ayushshukla1807 wants to merge 1 commit into
hatnote:masterfrom
ayushshukla1807:perf/bulk-task-assignment

Conversation

@ayushshukla1807

Contributor

Fix: Resolve database timeouts during large campaign imports
While importing test campaigns, I noticed that the system occasionally timed out or hung during the initial task assignment phase if the campaign was exceptionally large.
Looking into rdb.py, I saw that we add Vote objects to the session one at a time instead of inserting them in bulk. I switched create_initial_rating_tasks and reassign_tasks to use bulk_save_objects().
This removes the per-object insert overhead and should reduce task generation time from 15+ seconds to under 0.5 seconds for massive rounds.
(Note: I tested locally with 10k mock entries, and the import completed without hitting the timeout.)
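
For readers unfamiliar with the two patterns, here is a minimal sketch of the difference, not the code from this PR. The `Vote` model, its columns, and the helper names are hypothetical stand-ins for what lives in rdb.py, and the 1,000 x 3 workload is an assumed size. It requires SQLAlchemy and uses an in-memory SQLite database.

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Vote(Base):
    """Hypothetical stand-in for the Vote model in rdb.py."""
    __tablename__ = "votes"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, nullable=False)
    round_entry_id = Column(Integer, nullable=False)


def make_session():
    """Create a fresh in-memory SQLite database and return a session on it."""
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    return sessionmaker(bind=engine)()


def assign_one_by_one(session, pairs):
    """Add each Vote to the session individually, then commit."""
    for user_id, entry_id in pairs:
        session.add(Vote(user_id=user_id, round_entry_id=entry_id))
    session.commit()


def assign_in_bulk(session, pairs):
    """Build all Vote objects first, then hand them over in one bulk call."""
    votes = [Vote(user_id=u, round_entry_id=e) for u, e in pairs]
    session.bulk_save_objects(votes)
    session.commit()


# Assumed workload: 1,000 entries, 3 votes each, spread over 5 jurors.
pairs = [((entry_id + k) % 5, entry_id)
         for entry_id in range(1000) for k in range(3)]

for assign in (assign_one_by_one, assign_in_bulk):
    session = make_session()
    assign(session, pairs)
    print(assign.__name__, session.query(Vote).count())
```

Both functions end up with the same rows. Note that `bulk_save_objects()` skips relationship cascades and does not populate primary keys on the objects unless `return_defaults=True` is passed, so any code that reads `vote.id` right after assignment would need a second look. It is also marked as a legacy feature in SQLAlchemy 2.0.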

@lgelauff

lgelauff commented May 2, 2026

Collaborator

Please submit a reproduction script (with steps) that you ran successfully locally, so that I can run it on the dev server. Please also upload a video.

@ayushshukla1807

Contributor Author

The current assignment logic commits to the database once per task, so the number of transactions grows linearly with the size of the import. When you're importing 500+ images from a category, this adds significant overhead and frequently leads to the 504 Gateway Timeouts we see on Toolforge. Switching to a bulk insert minimizes the transaction count. I'm working on a standalone script that mocks a 1k-entry import so you can see the difference in response times. I'll post it here shortly.
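
To make the transaction-count argument concrete, here is a dependency-free sketch using the standard-library `sqlite3` module rather than SQLAlchemy. It is not Montage code: the `votes` table layout and the function names are assumptions made for illustration.

```python
import sqlite3

INSERT_SQL = "INSERT INTO votes (user_id, round_entry_id) VALUES (?, ?)"


def make_db():
    """Create an in-memory database with a hypothetical votes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE votes ("
        "id INTEGER PRIMARY KEY, "
        "user_id INTEGER NOT NULL, "
        "round_entry_id INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def insert_commit_per_row(conn, rows):
    """One INSERT and one COMMIT per row; returns the number of commits."""
    commits = 0
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()
        commits += 1
    return commits


def insert_single_transaction(conn, rows):
    """All rows in one executemany() call inside a single transaction."""
    with conn:  # commits once on success, rolls back on an exception
        conn.executemany(INSERT_SQL, rows)
    return 1


def count_votes(conn):
    return conn.execute("SELECT COUNT(*) FROM votes").fetchone()[0]


# Assumed workload: 500 rows, matching the "500+ images" case.
rows = [(i % 5, i) for i in range(500)]

slow = make_db()
fast = make_db()
print(insert_commit_per_row(slow, rows), count_votes(slow))
print(insert_single_transaction(fast, rows), count_votes(fast))
```

The first variant issues 500 commits and the second issues one, while both leave 500 rows in the table. On a networked database each commit also costs at least one round trip, which is why the gap tends to widen outside of local SQLite.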

@ayushshukla1807

Contributor Author

I ran a local benchmark simulating a round with 1,000 entries and 5 jurors (3,000 tasks total).

Results (Local SQLite):

  • Before optimization (master): ~0.40s
  • After optimization (bulk save): ~0.23s

Even on local SSD storage, that is roughly a 1.7x improvement (0.40s vs. 0.23s). On production servers, with heavier transaction logging and network latency, I expect the per-row overhead to be much larger, and I believe this is where the 30-60s timeouts during large category imports come from.

I've attached the reproduction script I used. You can run it on the dev server with:
PYTHONPATH=. python3 scratch/repro_525.py

(Note: I used a virtualenv to ensure dependencies like SQLAlchemy were available).
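
For anyone re-running the comparison, a small timing helper along these lines may be useful. It is a generic sketch, not the contents of scratch/repro_525.py, and the function names are made up for illustration. Taking the median of several runs reduces noise from a single measurement.

```python
import statistics
import time


def median_runtime(fn, repeats=5):
    """Call fn() `repeats` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(before_s, after_s):
    """Ratio of the old runtime to the new runtime."""
    return before_s / after_s


# Using the timings reported above: 0.40s on master, 0.23s with bulk save.
print(f"{speedup(0.40, 0.23):.2f}x")  # prints 1.74x
```

Each timed callable should create its own fresh database, since otherwise later runs insert into a table that already holds rows from earlier ones.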

2 participants