ts_netstack: poor TCP accept performance #28

@dylan-tailscale

Description

Under load, our TCP listener implementation accepts as few as ~8% of connection attempts (that is, it replies to a SYN with a SYN/ACK); up to 92% of connection attempts are rejected by our netstack with a RST/ACK back to the client.

Those numbers come from a Python script I ran against the axum example, with 10 processes each attempting a full connect/GET /index.html/close cycle 10,000 times, with no sleeps between successive attempts. Dropping the concurrency to 2 processes attempting 10 cycles each, with a one-second sleep between cycles, raised the accept rate to ~50% at most. I've confirmed the RST/ACK responses via packet captures. Once a connection is established, there don't seem to be any issues with the HTTP GET request, response, or close parts of the cycle; the problem appears to be limited to the TCP 3-way handshake.
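For anyone wanting to reproduce this, a load test along these lines can be sketched as below. This is not the original script, only a minimal approximation of the cycle described above; the names `attempt_cycle`, `run_worker`, and `run_load` are made up for illustration, and counting `ConnectionRefusedError` as a RST'd handshake is an assumption that should still be checked against packet captures.

```python
import socket
import time
from collections import Counter
from multiprocessing import Pool

# Minimal HTTP/1.1 request; the Host header is filled in per call.
REQUEST = b"GET /index.html HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n"


def attempt_cycle(host, port, timeout=5.0):
    """Run one connect / GET /index.html / close cycle and label the outcome."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"  # the SYN was answered with a RST
    except OSError:
        return "connect_error"  # timeout, unreachable host, etc.
    with sock:
        try:
            sock.sendall(REQUEST % host.encode())
            while sock.recv(4096):  # drain the response until the peer closes
                pass
        except OSError:
            return "http_error"
    return "ok"


def run_worker(args):
    """Run `cycles` attempts in this process, sleeping `sleep_s` between them."""
    host, port, cycles, sleep_s = args
    counts = Counter()
    for _ in range(cycles):
        counts[attempt_cycle(host, port)] += 1
        if sleep_s:
            time.sleep(sleep_s)
    return counts


def run_load(host, port, procs=10, cycles=10_000, sleep_s=0.0):
    """Fan out `procs` worker processes and merge their outcome counts."""
    with Pool(procs) as pool:
        results = pool.map(run_worker, [(host, port, cycles, sleep_s)] * procs)
    return sum(results, Counter())
```

With the defaults, `run_load(host, port)` approximates the heavy run (10 processes, 10,000 cycles each, no sleeps), and `run_load(host, port, procs=2, cycles=10, sleep_s=1.0)` approximates the lighter one. The accept rate is then everything except the `"refused"` count divided by the total number of attempts. Call `run_load` from under an `if __name__ == "__main__":` guard so that `multiprocessing` works on platforms that spawn workers.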

In Chrome/Chromium, this becomes visible to users visiting a tailscale-rs-powered HTTP server. Chrome races multiple TCP connects (three in my packet captures) to the HTTP server on a new connection/refresh, and displays "Connection Refused" to the user if any one of these connects is rejected with a RST/ACK, until one of the other two connect attempts succeeds. When a connect succeeds and Chrome gets an HTTP response, it replaces the "Connection Refused" error with the contents of the page, but it's a poor user experience to see "Connection Refused" even briefly before the actual page contents are displayed. Firefox seems to race only two connect attempts, and only displays "Connection Refused" if all of the connect attempts fail, so the problem isn't visible in Firefox, but it is still present.

Originally reported by @apenwarr using tailscale-rs with the Python bindings, and now reproducible with the axum example via the test script.

Requirements:

  • Our accept rate is "reasonable"; a 100% accept rate under heavy load is obviously preferable, but I'm open to discussing what a reasonable accept rate is.
    • RST/ACK responses are reduced to match the "reasonable" accept rate, compared via packet captures.
  • We no longer see "Connection Refused" displayed temporarily in Chrome/Chromium for requests to HTTP servers built with tailscale-rs.

Labels: bug (Something isn't working), performance (Improve throughput, latency, resource usage, etc.)
