Pre-insert layouts of basic integers for consteval perf by 197g · Pull Request #156718 · rust-lang/rust

197g · 2026-05-18T14:56:16Z

While investigating build performance for image, we noticed in perf samples that a surprising amount of time was spent in consteval, more specifically in calls to layout_of. That was curious: we use a few const tables, but none of extreme size or with complicated computation, and really only with primitive types. Instrumenting a debug build (printf debugging the types being queried) revealed this pattern; the following is the tail end of the summary:

RDTSC      count    type
---------  -------  ----
2968652    48376    u8
2975486    75647    usize
3372234    2024     std::num::NonZero<usize>
3494159    65639    FnDef(DefId(2:705 ~ core[93c5]::f64::{impl#0}::to_bits), [])
6016422    107343   ()
7589954    145971   FnDef(DefId(21:947 ~ pxfm[d8cc]::common::fmla), [])
10072816   175205   i64
15194244   290049   i32
22053787   392617   u32
22284234   322397   bool
40392452   615785   u64
58157921   2110     std::alloc::Layout
179631693  2996946  f64
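
To make the two questions below easier to weigh, the per-query cost can be derived from the table. This is a small sketch that assumes the RDTSC column is the sum of tick deltas over all queries for that type (the post does not say so explicitly); the `Row` struct and `ticks_per_query` helper are names invented for the illustration.

```rust
/// One row of the summary: summed RDTSC ticks (assumed), query count, type name.
struct Row {
    ticks: u64,
    count: u64,
    ty: &'static str,
}

/// Average ticks per `layout_of` query for one row.
fn ticks_per_query(row: &Row) -> f64 {
    row.ticks as f64 / row.count as f64
}

fn main() {
    // A few rows copied from the table above.
    let rows = [
        Row { ticks: 2_975_486, count: 75_647, ty: "usize" },
        Row { ticks: 40_392_452, count: 615_785, ty: "u64" },
        Row { ticks: 58_157_921, count: 2_110, ty: "std::alloc::Layout" },
        Row { ticks: 179_631_693, count: 2_996_946, ty: "f64" },
    ];
    for row in &rows {
        println!("{:<20} {:>10.1} ticks/query", row.ty, ticks_per_query(row));
    }
}
```

Under that assumption, f64 costs on the order of 60 ticks per query and dominates only through its count, while std::alloc::Layout costs roughly 27,500 ticks per query despite being queried only 2110 times.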

Several questions arise:

- Is it necessary to query f64 and other primitive types so often? The counts fit a query per evaluation of each line in MIR rather than a query per line. There is a cache of layouts for the locals of a block, but it is only local, and it is constructed within the query for the const evaluation of that block.
- What's happening while querying the layout of std::alloc::Layout?

In an attempt to mitigate the first, and considering that usize is currently always used for indexing into an array (possibly one of the large uses), I've attempted to pre-intern the layouts of common integer types and special-case them.

This is obviously fraught with peril; it must not disagree with the other layout computations, and a second source of truth is risky in that regard. So the main question is whether to continue down this path or to find a way to reduce the count, perhaps through more clever type assignment in expression evaluation.
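
The idea and its risk can be sketched with a toy model. Everything here is hypothetical: `Ty`, `Layout`, `PreInterned` and `Interp` are stand-ins invented for the example and are not rustc's types, and alignment is simplified to equal size. The sketch shows a fast path that answers common integers from a pre-built table, a counter for how often the general path is taken, and how a duplicated computation can drift out of agreement.

```rust
use std::cell::Cell;

/// Toy stand-in for a compiler type (hypothetical, not rustc's `Ty`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    U8,
    U64,
    Usize,
    /// Any type that still goes through the general computation.
    Other { size: u64, align: u64 },
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Layout {
    size: u64,
    align: u64,
}

/// Layouts built once; they depend only on the target's pointer width.
struct PreInterned {
    u8_layout: Layout,
    u64_layout: Layout,
    usize_layout: Layout,
}

struct Interp {
    pre: PreInterned,
    /// Counts how often the general (expensive) path was taken.
    slow_queries: Cell<u64>,
}

impl Interp {
    fn new(pointer_bytes: u64) -> Self {
        // Simplification: alignment equals size, which is not true on every target.
        let int = |bytes: u64| Layout { size: bytes, align: bytes };
        Interp {
            pre: PreInterned {
                u8_layout: int(1),
                u64_layout: int(8),
                usize_layout: int(pointer_bytes),
            },
            slow_queries: Cell::new(0),
        }
    }

    /// Special-cases the common integers before falling back to the general path.
    fn layout_of(&self, ty: Ty) -> Layout {
        match ty {
            Ty::U8 => self.pre.u8_layout,
            Ty::U64 => self.pre.u64_layout,
            Ty::Usize => self.pre.usize_layout,
            other => self.layout_of_general(other),
        }
    }

    fn layout_of_general(&self, ty: Ty) -> Layout {
        self.slow_queries.set(self.slow_queries.get() + 1);
        match ty {
            Ty::Other { size, align } => Layout { size, align },
            Ty::U8 => Layout { size: 1, align: 1 },
            Ty::U64 => Layout { size: 8, align: 8 },
            // Second source of truth: hard-coding 8 disagrees with the
            // pre-interned entry whenever the pointer width is not 8 bytes.
            Ty::Usize => Layout { size: 8, align: 8 },
        }
    }
}

fn main() {
    let interp = Interp::new(8);
    for _ in 0..1000 {
        interp.layout_of(Ty::Usize); // array indexing hits this constantly
    }
    interp.layout_of(Ty::Other { size: 24, align: 8 });
    println!("general-path queries: {}", interp.slow_queries.get());
}
```

In this toy, the thousand usize lookups never reach the general path. The deliberately hard-coded `Ty::Usize` arm shows the failure mode: with `Interp::new(4)` the two paths would return different layouts for the same type.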

Performance results have been okay. I had trouble reliably producing numbers in debug builds in the first place: sometimes the instrumentation did not seem to trigger, so there is definitely something I don't understand about the eval. In a --release profile build I seem to observe a moderate effect of ~1-2% overall.

@RalfJung we chatted about related topics, hopefully this isn't taking too much of your time from other matters.

Contrary to locals this did not have a query cache but the fixed usize type is used constantly, while projecting places into (array) indices.

rustbot · 2026-05-18T14:56:29Z

Some changes occurred to the CTFE machinery

cc @RalfJung, @oli-obk, @lcnr

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

rustbot · 2026-05-18T14:56:32Z

r? @Kivooeo

rustbot has assigned @Kivooeo.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

Owners of files modified in this PR: compiler, types
compiler, types expanded to 73 candidates
Random selection from 19 candidates

lqd · 2026-05-18T15:09:47Z

@bors try @rust-timer queue

Consteval layout perf

rust-bors · 2026-05-18T17:22:08Z

☀️ Try build successful (CI)
Build commit: 960cc11 (960cc11f53c88c2a75400df4d7224293ba40d345, parent: 5ea817c65e4896167300b7d2550781b98da9901a)

rust-timer · 2026-05-18T18:03:06Z

Finished benchmarking commit (960cc11): comparison URL.

Overall result: ❌✅ regressions and improvements - please read:

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

Next, please: If you can, justify the regressions found in this try perf run in writing along with @rustbot label: +perf-regression-triaged. If not, fix the regressions and do another perf run. Neutral or positive results will clear the label automatically.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	2
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.0%	[-2.5%, -0.2%]	14
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results (primary -1.6%, secondary 1.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	4.1%	[0.8%, 7.4%]	2
Improvements ✅ (primary)	-1.6%	[-2.7%, -0.5%]	2
Improvements ✅ (secondary)	-1.1%	[-1.5%, -0.6%]	2
All ❌✅ (primary)	-1.6%	[-2.7%, -0.5%]	2

Cycles

Results (primary 1.6%, secondary 2.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	1.6%	[1.6%, 1.6%]	1
Regressions ❌ (secondary)	4.7%	[3.3%, 6.6%]	5
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-4.8%	[-5.8%, -3.8%]	2
All ❌✅ (primary)	1.6%	[1.6%, 1.6%]	1

Binary size

This perf run didn't have relevant results for this metric.

Bootstrap: 511.688s -> 511.616s (-0.01%)
Artifact size: 400.54 MiB -> 402.60 MiB (0.52%)

Kivooeo · 2026-05-18T20:28:38Z

is perf good? i still feel like i'm unable to read it, can someone translate it please

-1,0% sounds good

Mark-Simulacrum · 2026-05-20T12:45:22Z

> This is obviously fraught with peril; it must not disagree with the other layout computations and a second source of truth is risky in that regard. So the main question is whether to continue down this path or find a way to reduce the count by doing more clever type assignment in expression evaluation maybe.

Is there a chance we could call the 'real' layout computation during that pre-interning step? I think we do this in a number of other places, though the re-entrancy can be a bit tricky.

IIUC, though, there's not really two sources of truth here -- the real layout computation should never be hit with the pre-interned types, right? If so, can we add unreachable!() or similar into the real layout computation?
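
A minimal sketch of the `unreachable!()` suggestion, under the assumption that every pre-interned type is intercepted before the general computation. The `Ty` and `Layout` types and the 64-bit `USIZE_LAYOUT` constant are hypothetical; only the name `layout_of_uncached` is taken from this PR's commit titles, and its body here is invented.

```rust
/// Toy type representation (hypothetical, not rustc's `Ty`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    Usize,
    Array { elem_size: u64, elem_align: u64, len: u64 },
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Layout {
    size: u64,
    align: u64,
}

/// Pre-interned entry; assumes a 64-bit target for the sake of the example.
const USIZE_LAYOUT: Layout = Layout { size: 8, align: 8 };

/// Entry point: pre-interned types are answered here and never forwarded.
fn layout_of(ty: Ty) -> Layout {
    match ty {
        Ty::Usize => USIZE_LAYOUT,
        other => layout_of_uncached(other),
    }
}

/// General computation: reaching it with a pre-interned type is a bug.
fn layout_of_uncached(ty: Ty) -> Layout {
    match ty {
        Ty::Usize => unreachable!("usize is pre-interned and must not reach the general path"),
        Ty::Array { elem_size, elem_align, len } => Layout {
            size: elem_size * len,
            align: elem_align,
        },
    }
}

fn main() {
    println!("{:?}", layout_of(Ty::Usize));
    println!("{:?}", layout_of(Ty::Array { elem_size: 4, elem_align: 4, len: 16 }));
}
```

The guard turns a silent disagreement into a loud failure, but it only holds if no caller bypasses the entry point and invokes the general computation directly.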

> What's happening while querying the layout of std::alloc::Layout?

This seems like a good idea to dig into, not sure why that would be specifically so slow...

197g · 2026-05-20T15:33:27Z

> Is there a chance we could call the 'real' layout computation during that pre-interning step?

The problem with this direction is that a LayoutCx<'_> for this call does not exist; we're only interning from the special branches where the layout does not depend on the precise environment (apart from TargetDataLayout). But using the pre-interned layouts for the converse seems roughly feasible? In consteval we can make an educated heuristic guess at the most common types and speculatively short-circuit through those.

The interesting part to me is that this would also avoid a bit of odd double work. When miri hits an expression of common primitive types it builds up a complete Ty. The code for computing its layout then matches on that runtime value and after distinguishing Int from Uint, from floats, builds up a different runtime value of type abi::Primitive in every branch that then itself gets passed on, and will get matched yet another time to fill its range information. If we instead dispatch to different layout code paths earlier based on the Ty we avoid that buildup-teardown repetition. Most of this is not too consequential in terms of performance due to query caching except if we use the first disambiguation to avoid calling into the general layout query system in the first place (as in what this PR does). Although if you're calling lots of functions they all get their own separate empty-layout FnDef interned despite definitely sharing layout.

This is a pattern: building up the layout of primitives mostly constructs one value by filling in fields with constants. That value is then passed off to a more generic function that again dispatches or computes on those fields. (E.g. also happens for FnPtr(..), sized pointers, …). That's a lot of branches taken in the nested function (LayoutData::scalar) that are redundant if they were lifted instead by computing the LayoutData independently.
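
The build-up and teardown described in the last two paragraphs can be illustrated with a toy model. None of this is rustc code: `Primitive` is only loosely modeled on `abi::Primitive` with invented fields, `scalar_layout` stands in for the generic helper, and alignment is simplified to equal size. The generic path matches on the type, builds an intermediate value, and matches again; the direct path returns the finished layout from a single match.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    I32,
    U32,
    U64,
    F64,
}

/// Intermediate runtime value (hypothetical fields).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Primitive {
    Int { bytes: u64, signed: bool },
    Float { bytes: u64 },
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Layout {
    primitive: Primitive,
    size: u64,
    align: u64,
    /// Largest bit pattern in the valid range.
    valid_range_end: u128,
}

/// First match: type to intermediate value.
fn primitive_of(ty: Ty) -> Primitive {
    match ty {
        Ty::I32 => Primitive::Int { bytes: 4, signed: true },
        Ty::U32 => Primitive::Int { bytes: 4, signed: false },
        Ty::U64 => Primitive::Int { bytes: 8, signed: false },
        Ty::F64 => Primitive::Float { bytes: 8 },
    }
}

/// Second match: the generic helper takes the value apart again.
fn scalar_layout(primitive: Primitive) -> Layout {
    let bytes = match primitive {
        Primitive::Int { bytes, .. } | Primitive::Float { bytes } => bytes,
    };
    Layout {
        primitive,
        size: bytes,
        align: bytes,
        valid_range_end: (1u128 << (bytes * 8)) - 1,
    }
}

fn layout_generic(ty: Ty) -> Layout {
    scalar_layout(primitive_of(ty))
}

/// Lifted version: one match, every field already a constant.
fn layout_direct(ty: Ty) -> Layout {
    let (primitive, bytes, end) = match ty {
        Ty::I32 => (Primitive::Int { bytes: 4, signed: true }, 4, u32::MAX as u128),
        Ty::U32 => (Primitive::Int { bytes: 4, signed: false }, 4, u32::MAX as u128),
        Ty::U64 => (Primitive::Int { bytes: 8, signed: false }, 8, u64::MAX as u128),
        Ty::F64 => (Primitive::Float { bytes: 8 }, 8, u64::MAX as u128),
    };
    Layout { primitive, size: bytes, align: bytes, valid_range_end: end }
}

fn main() {
    for ty in [Ty::I32, Ty::U32, Ty::U64, Ty::F64] {
        assert_eq!(layout_generic(ty), layout_direct(ty));
        println!("{:?}: {:?}", ty, layout_direct(ty));
    }
}
```

Both paths must stay in agreement, which is why the example checks them against each other; whether lifting pays off in practice depends on how often the query cache already hides the cost.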

Ideally we would also communicate to the call to layout_of information on what we know about the parameter: if it is likely to be primitive or not and speculation is worth it; or conversely if we know that it is definitely not primitive and you'll just want to query the layout via the query system. That's a micro-optimization though, much bigger fish to fry.

197g · 2026-05-20T20:41:23Z

It was quite a lot easier than I expected to have the real layout computation hit the pre-interned cache; just extending the list to all the signed and unsigned variants is sufficient. That works out as a single source of truth here.
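
A rough sketch of that arrangement, with all names except `layout_of_uncached` invented for the example and the function body being a toy rather than the actual change: integer layouts are computed in exactly one place, and both the consteval fast path and the general computation read from the same table.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum IntTy { I8, I16, I32, I64, U8, U16, U32, U64 }

/// Listed in declaration order so `int as usize` indexes the table.
const ALL_INTS: [IntTy; 8] = [
    IntTy::I8, IntTy::I16, IntTy::I32, IntTy::I64,
    IntTy::U8, IntTy::U16, IntTy::U32, IntTy::U64,
];

#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    Int(IntTy),
    Array { elem: IntTy, len: u64 },
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Int { signed: bool },
    Aggregate,
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Layout {
    size: u64,
    align: u64,
    kind: Kind,
}

struct PreInterned {
    ints: [Layout; 8],
}

impl PreInterned {
    fn new() -> Self {
        // The only place in this sketch where integer layouts are computed.
        let ints = ALL_INTS.map(|int| {
            let (bytes, signed) = match int {
                IntTy::I8 => (1, true),
                IntTy::I16 => (2, true),
                IntTy::I32 => (4, true),
                IntTy::I64 => (8, true),
                IntTy::U8 => (1, false),
                IntTy::U16 => (2, false),
                IntTy::U32 => (4, false),
                IntTy::U64 => (8, false),
            };
            // Simplification: alignment equals size.
            Layout { size: bytes, align: bytes, kind: Kind::Int { signed } }
        });
        PreInterned { ints }
    }

    fn int(&self, int: IntTy) -> Layout {
        self.ints[int as usize]
    }
}

/// General computation: integers come from the table, nothing is recomputed.
fn layout_of_uncached(pre: &PreInterned, ty: Ty) -> Layout {
    match ty {
        Ty::Int(int) => pre.int(int),
        Ty::Array { elem, len } => {
            let elem = pre.int(elem);
            Layout { size: elem.size * len, align: elem.align, kind: Kind::Aggregate }
        }
    }
}

/// Consteval fast path: skips the general computation for integers.
fn const_eval_layout_of(pre: &PreInterned, ty: Ty) -> Layout {
    match ty {
        Ty::Int(int) => pre.int(int),
        other => layout_of_uncached(pre, other),
    }
}

fn main() {
    let pre = PreInterned::new();
    for int in ALL_INTS {
        assert_eq!(const_eval_layout_of(&pre, Ty::Int(int)), layout_of_uncached(&pre, Ty::Int(int)));
    }
    println!("{:?}", const_eval_layout_of(&pre, Ty::Array { elem: IntTy::U32, len: 16 }));
}
```

Because neither path contains its own integer arithmetic, the two cannot disagree by construction in this toy, which is the property the comment describes.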

197g added 4 commits May 18, 2026 16:35

Cache ty-and-layout of usize

64d664c

Contrary to locals this did not have a query cache but the fixed usize type is used constantly, while projecting places into (array) indices.

Pre-Intern layout of usize

01a788d

Revert later: ty specific timings

92c9f5f

Precompute more layouts

bb1539e

rustbot added the S-waiting-on-review and T-compiler labels May 18, 2026

rustbot assigned Kivooeo May 18, 2026

rustbot added the S-waiting-on-perf label May 18, 2026

rust-bors Bot pushed a commit that referenced this pull request May 18, 2026

Auto merge of #156718 - 197g:consteval-layout-perf, r=<try>

960cc11

Consteval layout perf

rustbot added the perf-regression label and removed the S-waiting-on-perf label May 18, 2026

197g changed the title ~~Consteval layout perf~~ Pre-insert layouts of basic integers for consteval perf May 18, 2026

Use pre-interned layouts for layout_of_uncached

cc30d54

197g wants to merge 5 commits into rust-lang:main from 197g:consteval-layout-perf


6 participants
