walk/git: split ls-files and stat paths concurrently #694

Open
Mic92 wants to merge 2 commits into main from git-walker-parallel

Mic92 (Member) commented May 3, 2026

walk/git: run ls-files --cached / --others separately and stat in parallel
When --cached and --others are passed together, git only flushes output
after the untracked scan is done. On nixpkgs that is ~1.5s before the
first formatter can be spawned. After that we still Lstat every path
sequentially.

So run the two ls-files queries concurrently (the index-only one streams
right away) and do the Lstat calls in a worker pool.

On nixpkgs with --no-cache and nixfmt-rs this brings the wall time from
3.26s to 1.79s on a 64-core machine; the first formatter spawns after
80ms instead of 1564ms.

The same paths are emitted, just no longer in a deterministic order. As
far as I can tell nothing in the scheduler depends on that.

Mic92 added 2 commits May 3, 2026 22:23
…allel

When --cached and --others are passed together, git only flushes output
after the untracked scan is done. On nixpkgs that is ~1.5s before the
first formatter can be spawned. After that we still Lstat every path
sequentially.

So run the two ls-files queries concurrently (the index-only one streams
right away) and do the Lstat calls in a worker pool.

On nixpkgs with --no-cache and nixfmt-rs this brings the wall time from
3.26s to 1.79s on a 64-core machine; the first formatter spawns after
80ms instead of 1564ms.

The same paths are emitted, just no longer in a deterministic order. As
far as I can tell nothing in the scheduler depends on that.

When the format loop exits early (Ctrl+C, or `on-unmatched = fatal`),
walker.Close() is called while producers are still blocked on send --
git/jj on a full 64 KB stdout pipe (~1.6k paths), the filesystem walker
on a full filesCh -- so errgroup.Wait() hangs until kill -9.

Have Close() fire a cancellation signal first: a context for the git/jj
readers (also kills the child via exec.CommandContext) and a done
channel that filepath.Walk selects against. Regression tests overflow
the buffer and assert Close() returns without Read() being called.

Mic92 (Member, Author) commented May 3, 2026

I noticed this while working on nixfmt-rs: treefmt was slower than it should be...
