
Parallelize routing #1292

Merged
jcoupey merged 10 commits into VROOM-Project:master from michael-struwe-mischok:parallelize-routing
Nov 3, 2025

Conversation

@michael-struwe-mischok
Contributor

@michael-struwe-mischok michael-struwe-mischok commented Oct 7, 2025

Issue

#1291

Tasks

  • Update CHANGELOG.md (remove if irrelevant)
  • review

@jcoupey
Collaborator

jcoupey commented Oct 8, 2025

Thanks for submitting a PR! This fixes #1218.

@jcoupey jcoupey added this to the v1.15.0 milestone Oct 8, 2025
@jcoupey
Collaborator

jcoupey commented Oct 15, 2025

I've made a couple of nitpicking adjustments so that the parallelization code looks similar to other places in the codebase, and I've been testing this in a relevant setup: a remote osrm-routed server with poor bandwidth. The result is as expected: for an instance with 40 routes, routing time (mostly network time) drops from ~17s to ~4s.

The problem arises if we stretch the test to many more routes: for an instance with 400 routes, the parallelization somehow throttled the OSRM server and I ended up with [Error] Failed to connect to XX.XX.XX.XX:5000. This is especially frustrating as it can happen after a long search and spoil everything at the routing stage.

We should probably limit parallelization in a configurable way, maybe re-using the -t value from options.

Note: the same limitation theoretically applies to the parallelized matrix computations, but the number of parallel requests is much lower as it's bounded by the number of profiles in use.

@michael-struwe-mischok
Contributor Author

michael-struwe-mischok commented Oct 28, 2025

Added a limit to the parallelization using a semaphore. This starts all the threads, but has all except nb_thread of them wait. I think it shouldn't be a big problem to start too many idle threads since at this point the bottleneck is I/O, not CPU/RAM usage.
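
For readers skimming the diff, here's a minimal sketch of that gating pattern, assuming C++20; fetch_route and the function shape are hypothetical stand-ins, not the actual VROOM code:

```cpp
#include <chrono>
#include <cstddef>
#include <semaphore>
#include <thread>
#include <vector>

// Hypothetical stand-in for one blocking routing request to OSRM.
void fetch_route(std::size_t /*route_index*/) {
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
}

// One thread per route is started up front, but at most nb_thread of
// them are past the semaphore at any given time, so at most nb_thread
// requests are in flight.
void route_all(std::size_t nb_routes, unsigned nb_thread) {
  std::counting_semaphore<128> semaphore(nb_thread);

  std::vector<std::thread> threads;
  threads.reserve(nb_routes);
  for (std::size_t i = 0; i < nb_routes; ++i) {
    threads.emplace_back([&semaphore, i] {
      semaphore.acquire();  // wait for a free slot
      fetch_route(i);       // network-bound work
      semaphore.release();  // hand the slot to the next waiting thread
    });
  }
  for (auto& t : threads) {
    t.join();
  }
}
```

While blocked on acquire(), the surplus threads only cost their stack memory, which matches the point above about I/O, not CPU/RAM, being the bottleneck.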

Tried this with vroom-docker to confirm it reacts to the threads config option & doesn't break catastrophically :-).

Collaborator

@jcoupey jcoupey left a comment


The use of counting_semaphore looks like a very neat solution, thanks for the update!

Out of curiosity, what is the reason for choosing a template value of 128?

@jcoupey
Collaborator

jcoupey commented Oct 31, 2025

I've been running further tests with a remote OSRM server and low bandwidth. Already having a few threads (4 or 8) makes a huge difference: the overall routing time is more than 6x faster. Then for e.g. 32 and 64 threads, the overall routing time is slightly higher, so it looks like the thread overhead exceeds the benefit.

Maybe that hints toward using a maximum of 32, which would also be in line with the rest of the parallelization strategy. @michael-struwe-mischok do you have any other input? What do you think?

@michael-struwe-mischok
Contributor Author

From how I understand the mentions of LeastMaxValue and max in https://en.cppreference.com/w/cpp/thread/counting_semaphore.html, the template value lets us request a maximum for the semaphore, but the actual maximum delivered by the implementation may be higher. E.g. if we say 128 and nb_thread is 200, the implementation only guarantees support for a count of at least 128, so starting the internal counter at 200 is only safe if the implementation's actual maximum happens to allow it. So it's a bit like allocating enough memory for what we need.
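
A tiny sketch of that reading of counting_semaphore (values are illustrative only):

```cpp
#include <iostream>
#include <semaphore>

int main() {
  // LeastMaxValue = 128: the implementation must support a count of at
  // least 128, and max() reports the (possibly larger) actual maximum.
  using Gate = std::counting_semaphore<128>;
  std::cout << "actual maximum: " << Gate::max() << '\n';

  // Initializing the counter above Gate::max() is not allowed, which
  // is what motivates clamping the initial count further down.
  Gate gate(128);
  gate.acquire();
  gate.release();
}
```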

The 128 was just a guess at how many parallel requests might be a good idea. We could also choose a higher maximum, e.g. 512: someone might configure their OSRM to be powerful enough for that many requests, or their network conditions may make it appropriate. (I assume the overhead of a too-high maximum is pretty low.)

Maybe we shouldn't use nb_thread for the number of parallel requests and should configure it separately instead. nb_thread is more about how much CPU is available to VROOM, while this is about how many requests to OSRM should be in flight at once. The default could be something like nb_thread * 4.

Example scenario:

  • Someone runs VROOM and OSRM on two machines that both have 4 cores
  • They configure both VROOM and OSRM to use 4 threads
  • Now if VROOM sends nb_thread requests in parallel, it sends a burst of 4 requests, waits for all of them to come back at once, then sends the next burst of 4
  • Instead, if VROOM sends e.g. 32 requests in parallel but OSRM can only actually handle 4 at a time, OSRM receives 32 requests and returns a burst of 4 responses, at which point the next 4 requests are already waiting. While those 4 responses are flying over the network, OSRM still has the other 28 requests to keep its CPU busy
  • The optimal value is the number of parallel requests that is just high enough to fully use the OSRM CPU without overwhelming it. Taking the network as a constant we cannot change, the bottleneck to exploit is keeping the OSRM CPU busy with our requests: total time = network-towards-OSRM + OSRM-CPU-for-all-the-requests + network-back-to-VROOM (see the worked numbers below)
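
To put made-up numbers on that (purely illustrative, not from the tests above): assume a 100 ms round trip, 25 ms of OSRM CPU per request, 4 OSRM threads and 32 routes. Bursts of 4 cost roughly 8 * (100 ms + 25 ms) = 1000 ms, since OSRM idles during every round trip. With all 32 requests in flight, the total is roughly one round trip plus the CPU time, 100 ms + 32 * 25 ms / 4 = 300 ms, because fresh work is already queued while responses travel back.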

However, in an ideal setup you should probably run OSRM and VROOM on the same machine to minimize network time. In that case, nb_thread should already be a pretty good value.

@jcoupey
Collaborator

jcoupey commented Oct 31, 2025

You're right that this should ideally be configured separately, but it's very routing-specific, so I don't really feel like having another dedicated parameter for it.

As your example shows, this is highly dependent on the routing deployment (not only osrm-routed but potential load balancing on top of it, not to mention other routing engines), so it's hard for us to make a generic guess. What we want to avoid is regressions where users suddenly start hitting errors because we changed the routing process internally, so we have to be careful and conservative about the defaults.

Using std::counting_semaphore<32> semaphore(nb_thread); as a default should be safe, while transparently providing a speedup. If someone uses -t 4, routing might indeed be less efficient than with 32 threads, as your example points out, but that's still a huge improvement over the previous behavior.

On the other hand, users whose routing setup allows higher request rates can increase the default value (we could even make the 128 or 32 a constexpr variable configurable somewhere). @michael-struwe-mischok what do you think?

@michael-struwe-mischok
Contributor Author

👍 In general I don't want to block this on details, so feel free to just do something that seems appropriate :-)

I think std::counting_semaphore<32> semaphore(nb_thread); is fine, just two nitpicks:

  • If the 32 is there as a maximum to avoid breakage in some situations, it would be better to do something like std::counting_semaphore<32> semaphore(min(32, nb_thread)); so that the maximum actually applies, because the implementation of counting_semaphore may use a higher maximum than the template value
  • I think we could use something like nb_thread * 2 here, which should be a bit faster while still being careful

@jcoupey
Collaborator

jcoupey commented Nov 3, 2025

Good point about taking the min. I've introduced a constexpr unsigned MAX_ROUTING_THREADS set to 32 so it is straightforward to change the behavior for anyone who dives a bit into the code.

I kept the nb_thread value (rather than e.g. nb_thread * 2): as long as a single parameter controls all parallelization, this feels more consistent. I know the number of threads used for routing requests does not really make sense in terms of CPU usage if you have a remote routing server, but it does if the routing server is on the same machine, and that's somewhat expected in a basic setup since we look for a local OSRM instance by default.
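
For context, a sketch of the merged shape; MAX_ROUTING_THREADS and the min-clamp come from this thread, everything else is a hypothetical simplification rather than the actual VROOM code:

```cpp
#include <algorithm>
#include <semaphore>

// Caps concurrent routing requests in a single, easy-to-find spot.
constexpr unsigned MAX_ROUTING_THREADS = 32;

// nb_thread is the value from the -t option.
void route_all(unsigned nb_thread) {
  // Clamp the initial count so the cap holds even when the
  // implementation's actual semaphore maximum exceeds the template
  // value, per the nitpick above.
  std::counting_semaphore<MAX_ROUTING_THREADS> semaphore(
      std::min(MAX_ROUTING_THREADS, nb_thread));

  // ... one thread per route, each wrapping its request in
  // acquire()/release(), as in the earlier sketch ...
}
```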

@jcoupey
Collaborator

jcoupey commented Nov 3, 2025

For the record, I've also run some quick tests using libosrm and noticed the same magnitude of routing time reduction.

@jcoupey jcoupey merged commit a91f5c9 into VROOM-Project:master Nov 3, 2025
4 checks passed