metal/ggml: keep q4_0 decode on the mat-vec kernel up to 8 rows by czoli1976 · Pull Request #16 · czoli1976/tract

czoli1976 · 2026-06-14T20:01:47Z

The q4_0 matrix-vector kernel is bandwidth-bound on the weight read and stays cheaper than the tiled GEMM up to ~8 activation rows, but the dispatcher switched to GEMM at m > 4, so 5–8-row q4 decode (batched, or speculative / lookahead) paid the full GEMM cost for no gain. This raises the q4 mat-vec row cap to 8; f16/f32 stay at 4.
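The dispatch rule described above can be sketched as follows. This is a minimal illustration, not the actual tract code: the names `WeightKind`, `Kernel`, `mat_vec_row_cap` and `select_kernel` are hypothetical, and only the thresholds (8 rows for q4_0, 4 for f16/f32) come from this PR.

```rust
// Hypothetical sketch of the row-count dispatch; type and function names
// are illustrative assumptions, not tract's real API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WeightKind {
    Q40,
    F16,
    F32,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kernel {
    MatVec,
    Gemm,
}

/// Largest activation-row count `m` still routed to the mat-vec kernel.
/// The values are the ones stated in this PR (measured on Apple GPUs).
pub fn mat_vec_row_cap(kind: WeightKind) -> usize {
    match kind {
        WeightKind::Q40 => 8,
        WeightKind::F16 | WeightKind::F32 => 4,
    }
}

/// Pick the kernel for a product with `m` activation rows.
pub fn select_kernel(kind: WeightKind, m: usize) -> Kernel {
    if m <= mat_vec_row_cap(kind) {
        Kernel::MatVec
    } else {
        Kernel::Gemm
    }
}
```

Under these assumptions, `select_kernel(WeightKind::Q40, 6)` yields `Kernel::MatVec`, whereas the previous cap of 4 would have sent the same call to the GEMM path.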

Stacked on sonos#2366 (perf/metal-ggml-f16-roundtrip).

Perf

Forward-pass latency, Qwen3-1.7B q40ef16, Metal (Apple M-series), 256-token past, median ms/pass:

| tokens/pass | main (ms) | +sonos#2366 (ms) | +sonos#2366 +this (ms) |
|---|---|---|---|
| 1 | 30.5 | 26.5 | 26.6 |
| 4 | 44.0 | 43.2 | 41.7 |
| 6 | 72.2 | 75.5 | 55.6 |
| 8 | 72.1 | 76.3 | 71.3 |
| 12 | 72.8 | 77.0 | 76.0 |

The 5–8-row band now lands on the mat-vec path (m=6: −26% vs sonos#2366). Single-token decode (m=1) and prefill (m≥12) are unchanged. Downstream this turns k=4 speculative decoding on Qwen3-1.7B from a slowdown (~0.81×) into a ~1.19× speedup, and benefits any small-batch q4 decode.
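For reference, the quoted percentage is the relative change between the last two columns of the table. A small sketch of that arithmetic, with a hypothetical helper name (`relative_change_pct`) and the m=6 and m=8 medians copied from the table:

```rust
/// Relative change from `before_ms` to `after_ms`, in percent.
/// Negative values mean the pass got faster.
pub fn relative_change_pct(before_ms: f64, after_ms: f64) -> f64 {
    (after_ms - before_ms) / before_ms * 100.0
}

/// (tokens/pass, +sonos#2366 median ms, +sonos#2366 +this median ms),
/// taken from the perf table in this PR.
pub const ROWS_5_TO_8: [(usize, f64, f64); 2] = [(6, 75.5, 55.6), (8, 76.3, 71.3)];
```

With these inputs, `relative_change_pct(75.5, 55.6)` comes out at roughly −26.4%, matching the m=6 figure above, and the m=8 row at roughly −6.6%.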

The crossover (8) is measured on Apple GPUs and would ideally be device-tuned.

🤖 Generated with Claude Code

The q4_0 matrix-vector kernel is bandwidth-bound on the weight read and stays cheaper than the tiled GEMM up to ~8 activation rows, but the dispatcher switched to GEMM at m>4, making 5-8-row q4 decode (batched or speculative) needlessly slow. Raise the q4 mat-vec row cap to 8; f16/f32 stay at 4.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

czoli1976 · 2026-06-14T21:06:34Z

Superseded by sonos#2369 (same branch, opened upstream against main, stacked on sonos#2366).

czoli1976 closed this Jun 14, 2026

czoli1976 wants to merge 1 commit into perf/metal-ggml-f16-roundtrip from perf/metal-q4-gemv-rows
