feat: Add replicator connect_to support for outbound requests. #5983

Merged: willholley merged 1 commit into main from wh/connect_to on May 13, 2026

Conversation

Member

@willholley willholley commented Apr 27, 2026

Overview

This adds a feature to the CouchDB replicator to override the DNS target for specific host
patterns (including wildcards) when making outbound requests. This has the same semantics as curl's [--connect-to](https://curl.se/docs/manpage.html#--connect-to) option:

For a request intended for the "HOST1:PORT1" pair, connect to "HOST2:PORT2" instead. This option is only used to establish the network connection. It does NOT affect the hostname/port number that is used for TLS/SSL (e.g. SNI, certificate verification) or for the application protocols.

There is a new configuration option to specify the overrides:

[replicator]
connect_to = host:port:target:targetport, host2:port2:target2:targetport2

The replicator resolves the configured host patterns to the alternative connection targets while
preserving the request URL host (applies to regular requests and session-auth requests).

Note this depends on the connect_to option in ibrowse.

Testing recommendations

Testing this with TLS is a bit involved, as it relies on setting up an SNI proxy. I did it using nginx in Docker, with the attached configuration, to proxy to a cloudant.com database. The proxy was running on a non-standard port (e.g. 8443) so that any replications connecting directly to cloudant.com would fail.

I then set connect_to = *.cloudant.com:443:127.0.0.1:8443 in default.ini and configured a replication from myaccount.cloudant.com/mydb. The test succeeds if the proxy logged the connection and the replication completed.

The feature also works without TLS: for instance, you can use it to direct an arbitrary hostname to your local CouchDB. If you set connect_to = *.cloudant.com:80:127.0.0.1:15984 and CouchDB is running on 127.0.0.1:15984, a replication with http://foo.cloudant.com/db1 as its source or target will be redirected to 127.0.0.1:15984/db1.

nginx.conf.zip

Related Issues or Pull Requests

Checklist

  • This is my own work, I did not use AI, LLM's or similar technology
  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • Documentation changes were made in the src/docs folder
  • Documentation changes were backported (separated PR) to affected branches

@willholley willholley force-pushed the wh/connect_to branch 3 times, most recently from 6d51f89 to 13796f6 April 28, 2026 20:37
Contributor

@nickva nickva left a comment

This is very nice! I didn't get to play with it locally; I just did a quick look-over and left some comments first.

@willholley willholley marked this pull request as ready for review May 5, 2026 16:27
Contributor

@nickva nickva left a comment

It looks great and is almost there, but I noticed a few more things about setting SNI and IPv6 handling.

@willholley willholley force-pushed the wh/connect_to branch 2 times, most recently from a13803a to 64b42db May 7, 2026 08:06
@willholley willholley changed the title from "feat: Add replicator DNS override support for outbound requests." to "feat: Add replicator connect_to support for outbound requests." May 8, 2026
@willholley willholley force-pushed the wh/connect_to branch 5 times, most recently from 71cd112 to 6373f89 May 11, 2026 12:28
PatternPort = binary_to_integer(PatternPortBin),
TargetPort = binary_to_integer(TargetPortBin),
% Strip brackets from IPv6 addresses in targets
TargetHost = string:trim(TargetHost0, both, "[]"),
Contributor

Even if we strip the [] brackets from an IPv6 address, when we pass an IP like "::1" to gen_tcp|ssl:connect/3 I think it will still fail; it can only handle a tuple:

(node1@127.0.0.1)15> ssl:connect("[2607:f8b0:4023:100d::8b]", 443, [{verify, verify_none}]).
{error,nxdomain}

(node1@127.0.0.1)16> ssl:connect("2607:f8b0:4023:100d::8b", 443, [{verify, verify_none}]).
{error,nxdomain}

> ssl:connect({9735,63664,16419,4109,0,0,0,139}, 443, [{verify, verify_none}]).
{ok,{sslsocket,{gen_tcp,#Port<0.28>,tls_connection, undefined}, [<0.57527.0>,<0.57526.0>]}}

(address is from dig AAAA google.com)

I wonder if we could then take the TargetHost and parse it with a helper such as:

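% Return an IP tuple when Bin is a literal IPv4/IPv6 address;
% otherwise treat it as a hostname and pass it through unchanged.
parse_target(Bin) ->
    case inet:parse_strict_address(binary_to_list(Bin)) of
        {ok, Tuple} -> {ok, Tuple};
        _ -> {ok, Bin}
    end.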

Then connect_to can get either a string or a tuple.

Member Author

Yep - good shout. I've updated the branch to strictly parse the TargetHost address and, if valid, store it as a tuple which gets passed through to connect_to. In other cases, it assumes the TargetHost is a hostname and passes it through as a string.
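
For readers following the thread, the behaviour described above could look roughly like the sketch below. It is not the code merged in this PR: the module name `connect_to_target_sketch` and the return shape (a bare tuple or a bare string) are assumptions made for illustration. Only `string:trim/3` and `inet:parse_strict_address/1` are real OTP calls.

```erlang
-module(connect_to_target_sketch).
-export([parse_target/1]).

%% Hypothetical helper, for illustration only.
%% Takes the target host as captured from a connect_to entry (a binary)
%% and returns either an inet IP tuple or a hostname string.
parse_target(TargetHost0) when is_binary(TargetHost0) ->
    %% Strip the brackets that wrap an IPv6 literal, e.g. <<"[::1]">>.
    TargetHost = string:trim(TargetHost0, both, "[]"),
    case inet:parse_strict_address(binary_to_list(TargetHost)) of
        %% A literal IPv4 or IPv6 address: keep it as a tuple.
        {ok, IpTuple} -> IpTuple;
        %% Anything else is assumed to be a hostname.
        {error, einval} -> binary_to_list(TargetHost)
    end.
```

Under these assumptions, `parse_target(<<"[2001:db8::1]">>)` returns `{8193,3512,0,0,0,0,0,1}`, while `parse_target(<<"proxy.internal">>)` returns `"proxy.internal"`.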

Entry = string:trim(Entry0),
% Regex: HOST:PORT:TARGET:TARGET_PORT where TARGET can be [IPv6]
% Reject IPv6 patterns (starting with [), ensure non-empty captures
Pattern = "^([^:\\[]+):([0-9]+):([^:]+|\\[[^\\]]+\\]):([0-9]+)$",
Contributor

It's a bit of a gnarly expression, but I couldn't come up with anything smaller; after staring at it for a while, it does seem right.

% Regex: HOST:PORT:TARGET:TARGET_PORT where TARGET can be [IPv6]
% Reject IPv6 patterns (starting with [), ensure non-empty captures
Pattern = "^([^:\\[]+):([0-9]+):([^:]+|\\[[^\\]]+\\]):([0-9]+)$",
case re:run(Entry, Pattern, [{capture, all_but_first, binary}]) of
Contributor

all_but_first

Hadn't seen that used before, that's neat:

All but the first matching subpattern, that is, all explicitly captured subpatterns, but not the complete matching part of the subject string. This is useful if the regular expression as a whole matches a large part of the subject, but the part you are interested in is in an explicitly captured subpattern.
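
To make the effect of `all_but_first` concrete, here is a minimal sketch that runs the pattern quoted above against a single entry. The module name `connect_to_entry_sketch`, the function `parse_entry/1` and its return shape are hypothetical and are not taken from the PR. Only the regex and the `re:run/3` options come from the diff.

```erlang
-module(connect_to_entry_sketch).
-export([parse_entry/1]).

%% Hypothetical wrapper around the regex from the diff, for illustration.
parse_entry(Entry0) ->
    Entry = string:trim(Entry0),
    %% HOST:PORT:TARGET:TARGET_PORT where TARGET can be [IPv6]
    Pattern = "^([^:\\[]+):([0-9]+):([^:]+|\\[[^\\]]+\\]):([0-9]+)$",
    %% all_but_first drops the whole-match element, so the result list
    %% holds exactly the four explicitly captured groups, as binaries.
    case re:run(Entry, Pattern, [{capture, all_but_first, binary}]) of
        {match, [Host, Port, Target, TargetPort]} ->
            {ok, {Host, binary_to_integer(Port),
                  Target, binary_to_integer(TargetPort)}};
        nomatch ->
            {error, invalid_entry}
    end.
```

With this sketch, `parse_entry("*.cloudant.com:443:127.0.0.1:8443")` yields `{ok, {<<"*.cloudant.com">>, 443, <<"127.0.0.1">>, 8443}}`, and an entry whose pattern side starts with `[`, such as `"[::1]:443:127.0.0.1:8443"`, yields `{error, invalid_entry}`. A bracketed IPv6 target is captured with its brackets still attached, which is why the diff strips them in a later step.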

Contributor

@nickva nickva left a comment

+1 nicely done, Will!

Don't forget to squash the fixup commits.

This adds a feature to the CouchDB replicator
to override the connection target for specific host
patterns (including wildcards) when making
outbound requests. This is similar to the
`--connect-to` option in curl.

One use case is when requests need
to be routed via a transparent SNI proxy, e.g.
for network egress monitoring, and specifying
overrides in /etc/hosts or similar isn't sufficient
or possible (e.g. due to lack of wildcard support).

It adds a new configuration option to
specify the overrides:

```
[replicator]
connect_to = patternhost:port:target:targetport,..
```

The replicator resolves the configured host patterns
to the alternative connection targets while
preserving the request URL host (applies to
regular requests and session-auth requests)
and rewriting the port as necessary.

If using HTTPS, SNI is set to the
original hostname.

The `pattern` can be a hostname, including leading
wildcards e.g. `*.example.com`. Targets must be
IP addresses. IPv6 addresses are supported using
bracketed notation e.g. `[2001:db8::1]`.
@willholley willholley merged commit a6c5828 into main May 13, 2026
59 checks passed
@willholley willholley deleted the wh/connect_to branch May 13, 2026 09:02