fix: drop stale ReadyForQuery expectation when server enters COPY IN mode #6
NikolayS wants to merge 1 commit into
When a client sends Bind+Execute+Sync for a COPY FROM STDIN statement,
pgdog adds a ReadyForQuery expectation for that Sync. But PostgreSQL
ignores Sync during COPY IN mode (protocol spec §55.2.6) and never sends
ReadyForQuery for it. The stale entry stays in the queue, done() never
returns true, and the connection is never returned to the pool.
Call remove_one_rfq() in forward() when we see CopyInResponse ('G') to
drop the ReadyForQuery that will never arrive.
Verified with end-to-end integration test using tokio-postgres copy_in():
- WITHOUT fix: query timeout - CopyDone hangs because state machine is desynced
- WITH fix: COPY completes, subsequent queries work normally
https://claude.ai/code/session_01PQvrbw2xJHgQBXtASWHFcv
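The mechanism described in the commit message can be illustrated with a toy model. This is only a sketch: `RfqTracker`, `run_copy_cycle`, and the `apply_fix` flag are hypothetical names made up for this example, not pgdog's actual types, and the real state machine tracks more than a counter. Client messages use the protocol codes P, B, E, S (Parse, Bind, Execute, Sync), d (CopyData), and c (CopyDone); server messages use 1, 2, G, C, Z (ParseComplete, BindComplete, CopyInResponse, CommandComplete, ReadyForQuery).

```rust
/// Toy model (hypothetical, not pgdog's real type): counts the
/// ReadyForQuery ('Z') messages the proxy still expects from the server.
struct RfqTracker {
    pending_rfq: usize,
    apply_fix: bool,
}

impl RfqTracker {
    fn new(apply_fix: bool) -> Self {
        RfqTracker { pending_rfq: 0, apply_fix }
    }

    /// Every client Sync ('S') adds one ReadyForQuery expectation.
    fn on_client(&mut self, code: char) {
        if code == 'S' {
            self.pending_rfq += 1;
        }
    }

    /// 'Z' satisfies one expectation. With the fix, CopyInResponse ('G')
    /// also drops one, because the server will ignore the earlier Sync.
    fn on_server(&mut self, code: char) {
        match code {
            'G' if self.apply_fix => self.pending_rfq = self.pending_rfq.saturating_sub(1),
            'Z' => self.pending_rfq = self.pending_rfq.saturating_sub(1),
            _ => {}
        }
    }

    /// The connection is clean only when nothing is still expected.
    fn done(&self) -> bool {
        self.pending_rfq == 0
    }
}

/// Replays one COPY FROM STDIN cycle sent via extended protocol.
fn run_copy_cycle(apply_fix: bool) -> bool {
    let mut proxy = RfqTracker::new(apply_fix);
    "PBES".chars().for_each(|c| proxy.on_client(c)); // Parse, Bind, Execute, Sync
    "12G".chars().for_each(|c| proxy.on_server(c)); // server enters COPY IN mode
    "dcS".chars().for_each(|c| proxy.on_client(c)); // CopyData, CopyDone, Sync
    "CZ".chars().for_each(|c| proxy.on_server(c)); // only one ReadyForQuery arrives
    proxy.done()
}

fn main() {
    println!("without fix: done = {}", run_copy_cycle(false));
    println!("with fix:    done = {}", run_copy_cycle(true));
}
```

In this model the cycle ends with one expectation left over when the fix is off, and with none when the fix is on.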
98dfd00 to dae4519
End-to-end demo: step by step
The test
use bytes::{BufMut, BytesMut};
use futures_util::SinkExt;
use tokio_postgres::NoTls;

// Connect to pgdog (port 6432), not directly to PostgreSQL
let (conn, connection) = tokio_postgres::connect(
    "host=127.0.0.1 user=pgdog dbname=pgdog password=pgdog port=6432",
    NoTls,
).await.unwrap();
// Drive the connection on a background task; the client cannot make progress otherwise
tokio::spawn(async move { connection.await.unwrap(); });
// Create table
conn.batch_execute(
    "DROP TABLE IF EXISTS _copy_test;
    CREATE TABLE _copy_test (id BIGINT, value TEXT);",
).await.unwrap();
// COPY FROM STDIN: tokio-postgres sends this via extended protocol
// (Parse, Bind, Execute, Sync), not simple query (Q)
let sink = conn
    .copy_in("COPY _copy_test (id, value) FROM STDIN")
    .await.unwrap();
// Send 10 rows in COPY text format (tab-separated columns, one row per line)
let mut buf = BytesMut::new();
for i in 0..10_i64 {
    buf.put_slice(format!("{}\trow_{}\n", i, i).as_bytes());
}
futures_util::pin_mut!(sink);
sink.send(buf.freeze()).await.unwrap();
let rows_copied = sink.finish().await.unwrap(); // ← fails WITHOUT fix
assert_eq!(rows_copied, 10);
// Query after COPY: proves the connection is still usable
let rows = conn.query("SELECT count(*) FROM _copy_test", &[]).await.unwrap();
let count: i64 = rows[0].get(0);
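assert_eq!(count, 10);
What tokio-postgres sends on the wire
When PostgreSQL enters COPY IN mode after sending CopyInResponse ('G'), it ignores the Sync that was sent with Parse+Bind+Execute and never sends ReadyForQuery for it.
What happens inside pgdog WITHOUT the fix
pgdog's protocol state queue gets a ReadyForQuery expectation for that first Sync. Then the Sync sent with CopyDone adds a second one. That stale ReadyForQuery will never be satisfied because PostgreSQL already ignored the first Sync.
Result WITHOUT fix
sink.finish() fails with a query timeout.
The fix (one line)
In forward(), in the existing handler for CopyInResponse:
'G' => {
    self.state.prepend('G');
    self.state.remove_one_rfq(); // ← the fix
}
Result WITH fix
COPY completes, subsequent SELECT returns 10 rows, connection is clean.
Generated by Claude Code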
Driver investigation: which PostgreSQL clients trigger this bug?
We tested four popular drivers against pgdog without the fix, running the same pattern: COPY FROM STDIN, then a subsequent SELECT query.
Results
Only tokio-postgres triggers the bug. All other drivers use simple query protocol (Q).
Why only tokio-postgres?
tokio-postgres's copy_in() sends the COPY statement via extended protocol (Parse, Bind, Execute, Sync).
All other drivers use simple query:
Drivers not yet tested
The key question for any driver
Does it send Sync before the COPY data flow begins? If yes, PostgreSQL ignores that Sync during COPY IN mode, and pgdog's state machine will have a stale ReadyForQuery expectation that never gets fulfilled.
Lev's upstream fix
Lev independently confirmed the bug and created pgdogdev#886. His approach is more defensive: it does not rely on removing a single ReadyForQuery.
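The key question for any driver (does it send Sync before the COPY data flow begins?) can be phrased as a small check over the client messages a driver emits. This is a sketch under assumptions: `ClientMsg`, `sends_sync_before_copy_data`, and the two example sequences are hypothetical names for illustration, and real driver traffic would have to be captured on the wire to feed such a check.

```rust
/// Hypothetical subset of frontend messages relevant to COPY FROM STDIN.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ClientMsg {
    Query,
    Parse,
    Bind,
    Execute,
    Sync,
    CopyData,
    CopyDone,
}

/// Returns true when a Sync follows Execute before any CopyData or
/// CopyDone, which is the pattern that leaves a stale ReadyForQuery
/// expectation in the proxy.
fn sends_sync_before_copy_data(msgs: &[ClientMsg]) -> bool {
    let mut executed = false;
    for msg in msgs {
        match msg {
            ClientMsg::Execute => executed = true,
            ClientMsg::Sync if executed => return true,
            ClientMsg::CopyData | ClientMsg::CopyDone => return false,
            _ => {}
        }
    }
    false
}

/// Extended-protocol COPY, as described for tokio-postgres in this thread.
fn extended_copy() -> Vec<ClientMsg> {
    vec![
        ClientMsg::Parse,
        ClientMsg::Bind,
        ClientMsg::Execute,
        ClientMsg::Sync,
        ClientMsg::CopyData,
        ClientMsg::CopyDone,
        ClientMsg::Sync,
    ]
}

/// Simple-query COPY: a single Query message, then the COPY data.
fn simple_copy() -> Vec<ClientMsg> {
    vec![ClientMsg::Query, ClientMsg::CopyData, ClientMsg::CopyDone]
}

fn main() {
    println!("extended: {}", sends_sync_before_copy_data(&extended_copy()));
    println!("simple:   {}", sends_sync_before_copy_data(&simple_copy()));
}
```

The check only answers the Sync-ordering question; it says nothing about how a particular proxy reacts to the sequence.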
Comprehensive driver testing: which clients trigger the COPY desync bug?
We tested or researched PostgreSQL drivers across all major languages. The question for each: does it send Sync before the COPY data flow begins?
Verified end-to-end against pgdog WITHOUT the fix
All three extended-protocol drivers fail through pgdog, each with a different symptom of the same root cause (stale ReadyForQuery). All simple-query drivers pass.
Full driver survey (tested + researched)
Triggers the bug (Extended protocol with Sync for COPY):
Does NOT trigger the bug (Simple query for COPY):
ORMs that depend on the underlying driver: Drizzle (postgres.js OR pg), Kysely (pg default), Slonik (pg), SQLAlchemy (psycopg2/psycopg3/asyncpg/pg8000), TypeORM (pg), MikroORM (configurable).
Impact
6 drivers across Rust, TypeScript/JavaScript, Python, Elixir, and Swift use extended protocol for COPY and would trigger this bug. This includes the primary/default drivers for:
This is not a niche tokio-postgres-only issue.
Bug
When a client sends a COPY FROM STDIN statement via extended query protocol (Parse+Bind+Execute+Sync), pgdog adds a ReadyForQuery expectation to its protocol state queue for that Sync. But PostgreSQL ignores Sync during COPY IN mode (protocol spec §55.2.6) and never sends ReadyForQuery for it. The stale entry stays in the queue, done() never returns true, and the connection is never returned to the pool.
tokio-postgres (the most popular Rust PostgreSQL client) uses exactly this pattern: it sends COPY via extended protocol as Parse+Bind+Execute+Sync, then CopyData..., then CopyDone+Sync. PostgreSQL ignores the first Sync (it's already in COPY IN mode by then), producing only one ReadyForQuery instead of the two that pgdog expects.
Exact message sequence that causes desync
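As a rough illustration of the sequence described above, the sketch below counts how many Sync messages the client sends and how many ReadyForQuery messages a server would return for them. It is a simplified happy-path model, not PostgreSQL's actual implementation: the function names are hypothetical, every Execute is assumed to run a COPY ... FROM STDIN statement, and errors and CopyFail are not modeled.

```rust
/// Number of Sync ('S') messages in a client message sequence.
/// A proxy that expects one ReadyForQuery per Sync expects this many.
fn syncs_sent(client_msgs: &str) -> usize {
    client_msgs.chars().filter(|&c| c == 'S').count()
}

/// Number of ReadyForQuery messages the server sends in this model.
/// Assumption: every Execute ('E') starts COPY IN mode, CopyDone ('c')
/// ends it, and a Sync received while in COPY IN mode is ignored.
fn ready_for_query_sent(client_msgs: &str) -> usize {
    let mut in_copy_in = false;
    let mut rfq = 0;
    for msg in client_msgs.chars() {
        match msg {
            'E' => in_copy_in = true,
            'c' => in_copy_in = false,
            'S' if !in_copy_in => rfq += 1,
            _ => {}
        }
    }
    rfq
}

fn main() {
    // Parse, Bind, Execute, Sync, CopyData x2, CopyDone, Sync
    let sequence = "PBESddcS";
    println!("Sync messages sent by client: {}", syncs_sent(sequence));
    println!("ReadyForQuery sent by server: {}", ready_for_query_sent(sequence));
}
```

For this sequence the two counts differ by one, which is the gap that leaves one expectation unfulfilled in the proxy.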
Consequences
- done() never returns true
- … (query.rs:282-289)
- rollback() fails with RollbackFailed
Verified end-to-end
Integration test using tokio-postgres::copy_in() through pgdog (integration/rust/tests/tokio_postgres/copy.rs):
- WITHOUT fix: FATAL: query timeout at sink.finish(), because pgdog's state machine is desynced and can't complete the COPY
- WITH fix: SELECT count(*) returns correct results (passes in 0.09s)
Fix
When forward() receives CopyInResponse ('G'), call remove_one_rfq() to drop the ReadyForQuery that will never arrive. This makes the proxy resilient to clients that send Sync with the initial Parse+Bind+Execute for COPY statements.
Note: pgdog already handles COPY via extended protocol in prepared_statements.rs: CopyDone, CopyFail, CopyData in handle() (lines 180-188) and CopyInResponse ('G') in forward() (line 229). The fix adds one call to the existing 'G' handler.
Tests
- test_copy_in_with_client_double_sync: exercises the full sequence through PreparedStatements::forward() (the real code path), asserts clean state after the COPY cycle completes
- test_copy_in_extended_protocol: end-to-end test using tokio-postgres::copy_in() through pgdog, verifying both COPY completion and subsequent query success
https://claude.ai/code/session_01PQvrbw2xJHgQBXtASWHFcv