Pre-insert layouts of basic integers for consteval perf #156718
Unlike locals, this did not have a query cache, but the fixed usize type is used constantly while projecting places into (array) indices.
r? @Kivooeo
rustbot has assigned @Kivooeo.
@bors try @rust-timer queue
Finished benchmarking commit (960cc11): comparison URL.
Overall result: ❌✅ regressions and improvements - please read:
Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.
Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (primary -1.6%, secondary 1.5%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (primary 1.6%, secondary 2.0%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: This perf run didn't have relevant results for this metric.
Bootstrap: 511.688s -> 511.616s (-0.01%)
is perf good? i still feel like i'm unable to read it, can someone translate it please? -1,0% sounds good
Is there a chance we could call the 'real' layout computation during that pre-interning step? I think we do this in a number of other places, though the re-entrancy can be a bit tricky. IIUC, though, there's not really two sources of truth here -- the real layout computation should never be hit with the pre-interned types, right? If so, can we add unreachable!() or similar into the real layout computation?
This seems like a good idea to dig into, not sure why that would be specifically so slow...
The problem of this direction is that a
The interesting part to me is that this would also avoid a bit of odd double work. When miri hits an expression of common primitive types it builds up a complete
This is a pattern: building up the layout of primitives mostly constructs one value by filling in fields with constants. That value is then passed off to a more generic function that again dispatches or computes on those fields. (E.g. also happens for
Ideally we would also communicate to the call to
It was quite a lot easier than I thought to have the real layout computation hit the pre-interned cache; just extending the list to all the signed and unsigned variants is sufficient. That works out as a single source of truth here.
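To make the "single source of truth" point concrete, here is a minimal, self-contained sketch. It does not use rustc's real types or APIs: `Prim`, `SimpleLayout`, `LayoutCx`, `compute_layout` and `ALL_INTS` are hypothetical names, and alignment is assumed to equal size purely for illustration. The pre-interned table is filled by calling the one real computation, so the fast path cannot disagree with it.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a primitive type; this is not rustc's `Ty`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Prim {
    I8, I16, I32, I64, I128, Isize,
    U8, U16, U32, U64, U128, Usize,
    F64, // deliberately left out of the pre-interned list
}

/// Hypothetical, heavily simplified layout: size and alignment in bytes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SimpleLayout {
    pub size: u64,
    pub align: u64,
}

/// All signed and unsigned integer variants, i.e. the pre-interned set.
pub const ALL_INTS: [Prim; 12] = [
    Prim::I8, Prim::I16, Prim::I32, Prim::I64, Prim::I128, Prim::Isize,
    Prim::U8, Prim::U16, Prim::U32, Prim::U64, Prim::U128, Prim::Usize,
];

/// The one "real" computation. Assumption: align == size, which real
/// targets do not guarantee.
pub fn compute_layout(ty: Prim, pointer_bytes: u64) -> SimpleLayout {
    let size = match ty {
        Prim::I8 | Prim::U8 => 1,
        Prim::I16 | Prim::U16 => 2,
        Prim::I32 | Prim::U32 => 4,
        Prim::I64 | Prim::U64 | Prim::F64 => 8,
        Prim::I128 | Prim::U128 => 16,
        Prim::Isize | Prim::Usize => pointer_bytes,
    };
    SimpleLayout { size, align: size }
}

pub struct LayoutCx {
    pointer_bytes: u64,
    pre_interned: HashMap<Prim, SimpleLayout>,
    /// Counts how often the non-cached path ran, for demonstration only.
    pub slow_path_calls: u32,
}

impl LayoutCx {
    /// Fill the table once, using the real computation for every entry.
    pub fn new(pointer_bytes: u64) -> Self {
        let pre_interned: HashMap<Prim, SimpleLayout> = ALL_INTS
            .iter()
            .map(|&ty| (ty, compute_layout(ty, pointer_bytes)))
            .collect();
        LayoutCx { pointer_bytes, pre_interned, slow_path_calls: 0 }
    }

    /// The general entry point checks the pre-interned table first.
    pub fn layout_of(&mut self, ty: Prim) -> SimpleLayout {
        if let Some(&layout) = self.pre_interned.get(&ty) {
            return layout;
        }
        self.slow_path_calls += 1;
        compute_layout(ty, self.pointer_bytes)
    }
}
```

With `LayoutCx::new(8)`, repeated `layout_of(Prim::Usize)` calls never reach the slow path, while `layout_of(Prim::F64)` still does. Whether the actual change is structured this way is not something this sketch claims.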
While investigating build performance for `image`, we noted through `perf` samples that a surprising amount of time was spent in consteval, and more specifically in calls to `layout_of`. That was curious since we use a few const tables, but definitely not of extreme size nor with complicated computation, and really only with primitive types. Through instrumenting a `debug` build (printf debugging the types being queried), this pattern emerged at the tail end of the summary:
Several questions arise:
`f64` and other primitive types so often? This fits a query per evaluation of the lines in MIR instead of a query per line itself. There is a cache of layouts for the locals of a block, but it is only local, and that cache is constructed within the query for a const evaluation of a block.
`std::alloc::Layout`?
In an attempt to mitigate the first, and considering that `usize` is necessarily used for indexing into an array at the moment, maybe one of the large uses, I've attempted to pre-intern the layouts of common integer types and special-case them out.
This is obviously fraught with peril; it must not disagree with the other layout computations, and a second source of truth is risky in that regard. So the main question is whether to continue down this path or to find a way to reduce the count, maybe by doing more clever type assignment in expression evaluation.
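The "query per evaluation rather than per line" observation can be illustrated with a toy tally. This is only a sketch of the counting argument, not of the interpreter: `QueryTally` and `eval_fill_table` are hypothetical names, and the assumption that each iteration asks for exactly one `usize` and one `f64` layout is made up for the example.

```rust
use std::collections::HashMap;

/// Hypothetical tally of layout queries keyed by type name; not a rustc API.
#[derive(Default)]
pub struct QueryTally {
    pub counts: HashMap<&'static str, usize>,
}

impl QueryTally {
    /// Record one layout query for `ty`.
    pub fn layout_of(&mut self, ty: &'static str) {
        *self.counts.entry(ty).or_insert(0) += 1;
    }
}

/// Toy model of interpreting `table[i] = i as f64` for `i` in `0..n`.
/// Assumption: every evaluated iteration queries `usize` once (index
/// projection) and `f64` once (the stored value).
pub fn eval_fill_table(n: usize, tally: &mut QueryTally) -> Vec<f64> {
    let mut table = vec![0.0; n];
    for i in 0..n {
        tally.layout_of("usize");
        tally.layout_of("f64");
        table[i] = i as f64;
    }
    table
}
```

Under these assumptions a single source line that fills a 256-entry table produces 256 queries per type, so the query count scales with the number of evaluations and not with the number of lines.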
Performance results have been okay. I've had problems reliably producing numbers in debug builds in the first place; sometimes the instrumentation did not seem to trigger, so there was definitely something I don't understand about the eval. In a `--release` profile build I seem to observe a moderate effect of ~1-2% overall.
@RalfJung we chatted about related topics, hopefully this isn't taking too much of your time from other matters.