walk/git: split ls-files and stat paths concurrently #694
Open
Mic92 wants to merge 2 commits into
When the format loop exits early (Ctrl+C, or `on-unmatched = fatal`), walker.Close() is called while producers are still blocked on a send: git/jj on a full 64 KB stdout pipe (~1.6k paths), the filesystem walker on a full filesCh. As a result errgroup.Wait() hangs until the process is killed with kill -9.

Have Close() fire a cancellation signal first: a context for the git/jj readers (which also kills the child via exec.CommandContext) and a done channel that the filepath.Walk callback selects against.

Regression tests overflow the buffer and assert that Close() returns without Read() being called.
I noticed this while working on nixfmt-rs: treefmt was slower than it should be...
walk/git: run ls-files --cached / --others separately and stat in parallel
When --cached and --others are passed together, git only flushes output
after the untracked scan is done. On nixpkgs that is ~1.5s before the
first formatter can be spawned. After that we still Lstat every path
sequentially.
So run the two ls-files queries concurrently (the index-only one streams
right away) and do the Lstat calls in a worker pool.
On nixpkgs with --no-cache and nixfmt-rs this brings the wall time from
3.26s to 1.79s on a 64-core machine; the first formatter spawns after
80ms instead of 1564ms.
The same paths are emitted, just no longer in a deterministic order. As
far as I can tell nothing in the scheduler depends on that.