perf(db): implement bulk task assignment to fix import timeouts #525
ayushshukla1807 wants to merge 1 commit into
Please submit a reproduction script (the steps you ran locally with success) so I can run it on the dev server. Please also upload a video.
The current assignment logic runs an O(n) loop of individual database commits. When you're importing 500+ images from a category, this adds significant overhead and frequently leads to the 504 Gateway Timeouts we see on Toolforge. Switching to a bulk insert reduces the number of transactions. I'm working on a standalone script that mocks a 1k-entry import so you can compare response times. I'll drop it here shortly.
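To make the transaction-count argument concrete, here is a minimal sketch using Python's standard-library `sqlite3` module rather than the SQLAlchemy session used in `rdb.py`. The `task` table, its columns, and the `assign_one_by_one`/`assign_in_bulk` helpers are hypothetical stand-ins; the sketch only shows that the per-row loop issues one commit per task, while the bulk path wraps every insert in a single transaction.

```python
import sqlite3

# Hypothetical stand-in for the real task/vote table; the actual schema
# in rdb.py is not shown in this PR.
SCHEMA = "CREATE TABLE task (entry_id INTEGER NOT NULL, juror_id INTEGER NOT NULL)"
INSERT = "INSERT INTO task (entry_id, juror_id) VALUES (?, ?)"


def assign_one_by_one(conn, pairs):
    """Per-row pattern: one INSERT and one COMMIT per task, i.e. n transactions."""
    commits = 0
    for pair in pairs:
        conn.execute(INSERT, pair)
        conn.commit()
        commits += 1
    return commits


def assign_in_bulk(conn, pairs):
    """Bulk pattern: every INSERT runs inside a single transaction, i.e. 1 commit."""
    with conn:  # commits once on success, rolls back all rows on error
        conn.executemany(INSERT, pairs)
    return 1


if __name__ == "__main__":
    pairs = [(entry, juror) for entry in range(1, 101) for juror in range(1, 4)]
    for assign in (assign_one_by_one, assign_in_bulk):
        conn = sqlite3.connect(":memory:")
        conn.execute(SCHEMA)
        commits = assign(conn, pairs)
        (rows,) = conn.execute("SELECT COUNT(*) FROM task").fetchone()
        print(f"{assign.__name__}: {rows} rows, {commits} commit(s)")
        conn.close()
```

Both paths write the same rows. With an on-disk SQLite file, each commit normally forces a sync to storage, so the per-row loop pays that cost once per task while the bulk path pays it once in total; against a networked database server, the per-row loop also adds a round trip per commit.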
I ran a local benchmark simulating a round with 1,000 entries and 5 jurors (3,000 tasks total). Results (Local SQLite):
Even on local SSD storage, we're seeing nearly a 2x improvement. On production servers, with more complex transaction logs and network latency, this overhead is what causes the 30-60s timeouts during large category imports. I've attached the reproduction script I used. You can run it on the dev server with: (Note: I used a virtualenv to ensure dependencies like SQLAlchemy were available.)
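The attached script is not included in this thread, so the following is only a hypothetical sketch of what a comparable reproduction could look like, again using `sqlite3` in place of SQLAlchemy. It assumes each entry is assigned to 3 of the 5 jurors (a `quorum` of 3, which is one way to arrive at 3,000 tasks for 1,000 entries) and times per-row commits against a single bulk insert into fresh on-disk databases. `make_tasks`, `time_strategy`, and `run_benchmark` are illustrative names, not part of the codebase.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical schema; the real task/vote table in rdb.py is not shown here.
SCHEMA = "CREATE TABLE task (entry_id INTEGER NOT NULL, juror_id INTEGER NOT NULL)"
INSERT = "INSERT INTO task (entry_id, juror_id) VALUES (?, ?)"


def make_tasks(n_entries, n_jurors, quorum):
    """Assign each entry to `quorum` distinct jurors by rotating through the pool.

    Assumes quorum <= n_jurors; this rotation rule is illustrative only.
    """
    if quorum > n_jurors:
        raise ValueError("quorum cannot exceed the number of jurors")
    return [
        (entry, (entry + k) % n_jurors + 1)
        for entry in range(1, n_entries + 1)
        for k in range(quorum)
    ]


def time_strategy(path, tasks, bulk):
    """Insert `tasks` into a fresh on-disk database and return elapsed seconds."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(SCHEMA)
        start = time.perf_counter()
        if bulk:
            with conn:  # single transaction, one commit at the end
                conn.executemany(INSERT, tasks)
        else:
            for task in tasks:  # one transaction and one commit per row
                conn.execute(INSERT, task)
                conn.commit()
        elapsed = time.perf_counter() - start
        (count,) = conn.execute("SELECT COUNT(*) FROM task").fetchone()
        if count != len(tasks):
            raise RuntimeError(f"expected {len(tasks)} rows, found {count}")
        return elapsed
    finally:
        conn.close()


def run_benchmark(n_entries=1000, n_jurors=5, quorum=3):
    """Time both strategies on separate temporary databases."""
    tasks = make_tasks(n_entries, n_jurors, quorum)
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, bulk in (("per-row commits", False), ("bulk insert", True)):
            path = os.path.join(tmp, name.replace(" ", "_") + ".db")
            results[name] = time_strategy(path, tasks, bulk)
    return len(tasks), results
```

Calling `run_benchmark()` with its defaults returns the task count and a dict of elapsed seconds per strategy. Absolute numbers depend on the machine and its storage, so they are only meaningful relative to each other on the same host.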
Fix: Resolve database timeouts during large campaign imports
While importing test campaigns, I noticed that the system occasionally timed out or hung during the initial task assignment phase if the campaign was exceptionally large.
Upon looking into `rdb.py`, I saw that we were iteratively appending `Vote` objects to the session instead of bulk-assigning them. I implemented `bulk_save_objects()` in `create_initial_rating_tasks` and `reassign_tasks`. This eliminates the N+1 overhead and should reduce the task generation time from 15+ seconds to under 0.5 seconds for massive rounds.
(Note: tested locally with 10k mock entries; task assignment completes without hitting the timeout.)