feat: Add replicator connect_to support for outbound requests #5983
nickva left a comment:
This is very nice! I didn't get to play with it locally; I just did a quick look-over and left some comments first.
nickva left a comment:
It looks great, almost there, but I noticed a few more things about setting SNI and IPv6 handling.
```erlang
PatternPort = binary_to_integer(PatternPortBin),
TargetPort = binary_to_integer(TargetPortBin),
% Strip brackets from IPv6 addresses in targets
TargetHost = string:trim(TargetHost0, both, "[]"),
```
Even if we strip the [] brackets from an IPv6 address, I think passing an IP string like "::1" to gen_tcp:connect/3 or ssl:connect/3 will still fail; it can only handle a tuple:
```
(node1@127.0.0.1)15> ssl:connect("[2607:f8b0:4023:100d::8b]", 443, [{verify, verify_none}]).
{error,nxdomain}
(node1@127.0.0.1)16> ssl:connect("2607:f8b0:4023:100d::8b", 443, [{verify, verify_none}]).
{error,nxdomain}
> ssl:connect({9735,63664,16419,4109,0,0,0,139}, 443, [{verify, verify_none}]).
{ok,{sslsocket,{gen_tcp,#Port<0.28>,tls_connection, undefined}, [<0.57527.0>,<0.57526.0>]}}
```
(The address is from `dig AAAA google.com`.)
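I wonder if we could then take TargetHost and parse it with a helper such as:

```erlang
parse_target(Bin) ->
    case inet:parse_strict_address(binary_to_list(Bin)) of
        % A valid IPv4/IPv6 literal: connect using the address tuple
        {ok, Tuple} -> {ok, Tuple};
        % Otherwise treat it as a hostname
        {error, einval} -> {ok, Bin}
    end.
```

Then connect_to can get either a string or a tuple.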
Yep, good shout. I've updated the branch to strictly parse the TargetHost address and, if valid, store it as a tuple that gets passed through to connect_to. Otherwise, it assumes TargetHost is a hostname and passes it through as a string.
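A minimal sketch of the approach described in the reply above, assuming the captured target arrives as a binary that may still carry IPv6 brackets. The module name `connect_to_target` and function `parse_target/1` are hypothetical; this is not the code from the branch.

```erlang
-module(connect_to_target).
-export([parse_target/1]).

%% Hypothetical helper: turn a captured target host into something
%% gen_tcp/ssl can connect to. IP literals (IPv4, or IPv6 with or without
%% brackets) become address tuples; anything else is treated as a
%% hostname and returned as a string.
parse_target(Bin) when is_binary(Bin) ->
    Str = binary_to_list(Bin),
    %% Drop the surrounding [] from bracketed IPv6 literals.
    Unbracketed = string:trim(Str, both, "[]"),
    case inet:parse_strict_address(Unbracketed) of
        {ok, Tuple} -> {ok, Tuple};
        {error, einval} -> {ok, Unbracketed}
    end.
```

For example, `parse_target(<<"[2607:f8b0:4023:100d::8b]">>)` yields the same tuple that worked in the `ssl:connect/3` session above.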
```erlang
Entry = string:trim(Entry0),
% Regex: HOST:PORT:TARGET:TARGET_PORT where TARGET can be [IPv6]
% Reject IPv6 patterns (starting with [), ensure non-empty captures
Pattern = "^([^:\\[]+):([0-9]+):([^:]+|\\[[^\\]]+\\]):([0-9]+)$",
```
It's a bit of a gnarly expression, but I couldn't come up with anything smaller; after staring at it for a while, it does seem right.
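To see what the pattern accepts, here is a hypothetical `parse_entry/1` (in a module `connect_to_entry`; both names are illustrative) that reuses the regex from the diff and converts the ports to integers. It is a sketch of how one entry could be parsed, not the patch's actual function. Note that a bracketed IPv6 target keeps its brackets in the capture, which is why the bracket stripping discussed above is still needed.

```erlang
-module(connect_to_entry).
-export([parse_entry/1]).

%% Hypothetical parser for one connect_to entry of the form
%% HOST:PORT:TARGET:TARGET_PORT, using the same pattern as the diff.
%% Returns {ok, {Host, Port, Target, TargetPort}} with binaries for the
%% hosts and integers for the ports, or {error, invalid} on no match.
parse_entry(Entry0) ->
    Entry = string:trim(Entry0),
    Pattern = "^([^:\\[]+):([0-9]+):([^:]+|\\[[^\\]]+\\]):([0-9]+)$",
    case re:run(Entry, Pattern, [{capture, all_but_first, binary}]) of
        {match, [Host, PortBin, Target, TargetPortBin]} ->
            {ok, {Host, binary_to_integer(PortBin),
                  Target, binary_to_integer(TargetPortBin)}};
        nomatch ->
            {error, invalid}
    end.
```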
```erlang
Pattern = "^([^:\\[]+):([0-9]+):([^:]+|\\[[^\\]]+\\]):([0-9]+)$",
case re:run(Entry, Pattern, [{capture, all_but_first, binary}]) of
```
`all_but_first`: I hadn't seen that used before; that's neat:

> All but the first matching subpattern, that is, all explicitly captured subpatterns, but not the complete matching part of the subject string. This is useful if the regular expression as a whole matches a large part of the subject, but the part you are interested in is in an explicitly captured subpattern.
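A small illustration of the difference between `all` and `all_but_first` on the same subject; the module name and the simplified pattern are made up for the example.

```erlang
-module(capture_demo).
-export([demo/0]).

%% Run the same regex twice: `all` returns the whole match first,
%% followed by the groups; `all_but_first` returns only the groups.
demo() ->
    Subject = "example.com:443",
    Pattern = "^([^:]+):([0-9]+)$",
    {match, All} = re:run(Subject, Pattern, [{capture, all, binary}]),
    {match, Groups} = re:run(Subject, Pattern, [{capture, all_but_first, binary}]),
    {All, Groups}.
```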
nickva left a comment:
+1 nicely done, Will!
Don't forget to squash the fixup commits.
This adds a feature to the CouchDB replicator to override the connection target for specific host patterns (including wildcards) when making outbound requests. This is similar to the `--connect-to` option in curl. One use case is when requests need to be routed via a transparent SNI proxy, e.g. for network egress monitoring, and specifying overrides in /etc/hosts or similar isn't sufficient or possible (e.g. due to lack of wildcard support).

This adds a new configuration option to specify the overrides:

```
[replicator]
connect_to = patternhost:port:target:targetport,..
```

The replicator resolves the configured host patterns to the alternative connection targets while preserving the request URL host (this applies to regular requests and session-auth requests) and rewriting the port as necessary. If using https, SNI is set to the original hostname. The `pattern` can be a hostname, including leading wildcards, e.g. `*.example.com`. Targets must be IP addresses. IPv6 addresses are supported using bracketed notation, e.g. `[2001:db8::1]`.
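One way to express the SNI behaviour with OTP's `ssl` options, as a hypothetical helper (`connect_to_sni:ssl_opts/1` is not from the patch): when the TCP connection goes to an override target, the client still announces the original URL host via `server_name_indication`. SNI cannot carry an IP literal (RFC 6066), so the sketch disables it in that case; whether the patch does the same is an assumption. In the actual feature the connection is opened by ibrowse, so the real wiring will differ.

```erlang
-module(connect_to_sni).
-export([ssl_opts/1]).

%% Hypothetical helper: build the SNI option for the *original* URL host,
%% independent of which address the TCP connection actually goes to.
%% IP-literal hosts are not valid SNI names, so SNI is disabled for them.
ssl_opts(OrigHost) when is_list(OrigHost) ->
    case inet:parse_strict_address(OrigHost) of
        {ok, _} -> [{server_name_indication, disable}];
        {error, _} -> [{server_name_indication, OrigHost}]
    end.
```

The result would be combined with the other TLS options, e.g. `ssl:connect(TargetIp, TargetPort, connect_to_sni:ssl_opts("myaccount.cloudant.com") ++ OtherOpts)`.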
Overview
This adds a feature to the CouchDB replicator to override the DNS target for specific host patterns (including wildcards) when making outbound requests. This has the same semantics as curl's [--connect-to](https://curl.se/docs/manpage.html#--connect-to) option.

There is a new configuration option to specify the overrides:

```
[replicator]
connect_to = patternhost:port:target:targetport,..
```
The replicator resolves the configured host patterns to the alternative connection targets while
preserving the request URL host (applies to regular requests and session-auth requests).
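A sketch of the host-pattern lookup, under assumptions the PR does not spell out: a leading `*.` matches any subdomain but not the bare domain, host comparison is case-insensitive, the port must match exactly, and the first matching rule wins. The module `connect_to_match` and its functions are hypothetical.

```erlang
-module(connect_to_match).
-export([lookup/3]).

%% Hypothetical lookup over parsed rules of the form
%% {HostPattern, Port, Target, TargetPort}, with HostPattern as a string.
%% Returns {ok, {Target, TargetPort}} for the first matching rule, or
%% nomatch if the request should connect to the URL host as usual.
lookup(Host, Port, Rules) ->
    LHost = string:lowercase(Host),
    Matches = [{Target, TargetPort}
               || {Pattern, RulePort, Target, TargetPort} <- Rules,
                  RulePort =:= Port,
                  host_matches(string:lowercase(Pattern), LHost)],
    case Matches of
        [First | _] -> {ok, First};
        [] -> nomatch
    end.

%% Leading wildcard: the host must end with ".example.com" and have at
%% least one character before that suffix.
host_matches([$*, $. | Rest], Host) ->
    Suffix = [$. | Rest],
    lists:suffix(Suffix, Host) andalso length(Host) > length(Suffix);
%% Anything else is compared literally.
host_matches(Pattern, Host) ->
    Pattern =:= Host.
```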
Note this depends on the `connect_to` option in ibrowse.

Testing recommendations
Testing this with TLS is a bit involved, as it relies on setting up an SNI proxy. I did it using `nginx` in Docker, with the attached configuration, to proxy to a cloudant.com database. The proxy was running on a non-standard port (e.g. 8443) so that any replications connecting directly to cloudant.com would fail.

I then set `connect_to = *.cloudant.com:443:127.0.0.1:8443` in default.ini and configured a replication from `myaccount.cloudant.com/mydb`. The test succeeds if the proxy logged the connection and the replication completed.

The feature also works without TLS: you can use it to direct an arbitrary hostname to your local CouchDB, for instance. E.g. if you use `connect_to = *.cloudant.com:80:127.0.0.1:15984` and CouchDB is running on `127.0.0.1:15984`, set up a replication with `http://foo.cloudant.com/db1` as source or target and it will be redirected to `127.0.0.1:15984/db1`.

nginx.conf.zip
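The non-TLS example above can be modelled as a hypothetical planning step: parse the URL, fall back to the scheme's default port, and, if the rule matches, connect to the target while the URL host is kept for the request itself. Only a single leading-wildcard rule is handled, only `http`/`https` schemes are supported, and the names (`connect_to_url`, `plan/2`) are illustrative rather than taken from the patch.

```erlang
-module(connect_to_url).
-export([plan/2]).

%% Hypothetical: decide where to open the TCP connection for Url, given
%% one rule {"*.suffix", RulePort, Target, TargetPort}. The URL host is
%% returned unchanged so it can still be used for the request.
plan(Url, {[$*, $. | Rest], RulePort, Target, TargetPort}) ->
    #{scheme := Scheme, host := Host} = Parsed = uri_string:parse(Url),
    %% No explicit port in the URL: use the scheme default.
    Port = maps:get(port, Parsed, default_port(Scheme)),
    Suffix = [$. | Rest],
    Matches = Port =:= RulePort
        andalso lists:suffix(Suffix, Host)
        andalso length(Host) > length(Suffix),
    case Matches of
        true -> #{connect => {Target, TargetPort}, url_host => Host};
        false -> #{connect => {Host, Port}, url_host => Host}
    end.

default_port("http") -> 80;
default_port("https") -> 443.
```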
Related Issues or Pull Requests
Checklist
- `rel/overlay/etc/default.ini`
- `src/docs` folder