[GC] Precompute: Fix ref.eq comparisons of structs with nested effects (#7857)
kripken merged 4 commits into WebAssembly:main from
Conversation
Some basic questions:
The correctness issue is because we need the caching (as I forgot). The caching ensures that each time we evaluate an expression we get the same canonical allocation, keeping identity stable.

Yes, the two modes are for when we replace and when we don't. When we want to replace an expression, we can't ignore side effects like `local.tee`.
I can understand that if we ignore side effects (like `local.tee`) …
It is just that we need two separate maps so we don't "mix" things. Imagine we compute

`(struct.new (local.tee ))`

If we compute it when we don't care about effects, we can store PTR in the cache (the pointer to the allocation that represents it, in the interpreter). Then, later, if we do care about effects and we read from that cache, we'd get PTR, but that is wrong: if we ignored the cache, we'd get the right result (when effects matter), which is to stop at the `local.tee`.

Basically, the two modes can get different results for computing the same expression. To preserve the property that the cache returns the value we would have computed anyhow, we need one cache for each.
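To make the "one cache per mode" point concrete, here is a minimal sketch (illustrative names only, not Binaryen's actual code): keying the cache by (expression, mode) is what stops a result computed while ignoring effects from leaking into the effect-respecting mode.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Stand-in for a Binaryen Expression*; the real code caches interpreter
// values, here we just use strings to show the mode-dependence.
using Expression = std::string;

enum class Mode { IgnoreEffects, RespectEffects };

// One logical cache per mode: the key includes the mode.
std::map<std::pair<const Expression*, Mode>, std::string> cache;

// Simulated evaluation: ignoring effects we reach the allocation ("PTR");
// respecting effects, the local.tee inside stops us ("fail").
std::string evaluate(const Expression* expr, Mode mode) {
  return mode == Mode::IgnoreEffects ? "PTR" : "fail";
}

std::string precompute(const Expression* expr, Mode mode) {
  auto key = std::make_pair(expr, mode);
  auto it = cache.find(key);
  if (it != cache.end()) {
    // Cache hit: guaranteed to be what this mode would compute anyhow.
    return it->second;
  }
  auto result = evaluate(expr, mode);
  cache.emplace(key, result);
  return result;
}
```

With a single shared cache, the second call below would wrongly return the PTR stored by the first; with per-mode keys it recomputes and stops at the tee.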
Were you gonna put something in
So in this case, if there is a …
aheejin left a comment
If #7857 (comment) is a correct interpretation of your intention, LGTM. (Please let me know if not 😅)
Yes, and yes, exactly.
tlively left a comment
Any guesses as to why the fuzzer did not catch this? Do we need to emit more ref.eq?
I'm really not sure how this was missed by the fuzzer. Still thinking about it / concerned...
…hich to keep (#7863)

The main changes here are:

* Rather than precompute sometimes while ignoring effects of tees etc. and sometimes not, do so in a single manner: while considering the effects carefully and deciding which children to keep.
* This lets us remove the dual cache from #7857, as now there is a single mode.

But really, this is a rewrite of that core logic from scratch in a cleaner and less hackish way, while fixing issues with the dual cache and even the earlier single cache that it fixed:

* We must have a single cached object for each expression. Having a dual cache opened us up to bugs, because it turns out we might actually cache an object in the propagate phase, and use it in the main phase, and each used a different cache. Now both phases do the same thing, so there is no risk.
* We must compute effects when there are effects, because they are state in the PrecomputingExpressionRunner, a source of bugs with all previous caches. I realized that the solution here is simple: note when there are effects, and if so, just compute them. This is fine, because the quadratic case happens in global objects, which have no effects anyhow (and even inside functions it is rare to have such effects). And, after computing the effects, we use the single cached heap location, keeping identity stable (a key fix here).

Then, the main visitExpression is straightforward: compute in the most general manner: NOT trying to replace the entire expression, which requires no side effects, but allowing them, and looking at the children afterwards to see which are actually needed. This is necessary to avoid a regression in this PR, but it actually ends up as a progression, since we can handle more cases, like `(ref.eq (tee) (get))`. Before, the tee would stop us, and propagate doesn't handle this if it isn't written to a local, but now we can just compute it, and keep the tee around.

Also, I figured out how to avoid the monotonically increasing code size problem, which e.g. GUFA has, where you see expression A, figure out it evaluates to constant C, but has effects you must keep, so you emit (A, C). That lets the constant get optimized, but if you run twice you can add C twice, unless you carefully look at the parent, which is annoying. Here, that is avoided because while we may add such a constant, regressing size, we still make progress because we remove the main expression itself - we may keep some children, but never the parent, so the increase is bounded.

This improves not just GC code but is an improvement in some Emscripten size tests. There are also some minor theoretical regressions, as a few tests show, but those are things other passes handle better (like `(return (return ..))`), so they only happen when running the pass by itself (production code using the full pipeline should only get better). This is also a slight improvement in compile times.
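The "keep only the children that are actually needed" idea can be sketched roughly as follows (a simplified illustration with made-up names, not Binaryen's actual code): when a parent precomputes to a constant but has effectful children, emit a block of those children followed by the constant. The parent itself is always removed, which is what keeps the code-size increase bounded.

```cpp
#include <string>
#include <vector>

// Toy expression node: just enough structure to show the transformation.
struct Expr {
  std::string text;
  bool hasEffects = false;
  std::vector<Expr> children;
};

// Replace a precomputed parent with its constant value, keeping only the
// children that have side effects, wrapped in a block:
//   (block <effectful children, dropped> <constant>)
Expr replaceWithValue(const Expr& parent, const std::string& constant) {
  std::vector<Expr> kept;
  for (const auto& child : parent.children) {
    if (child.hasEffects) {
      kept.push_back(child); // e.g. a local.tee we must preserve
    }
  }
  if (kept.empty()) {
    return Expr{constant}; // nothing to keep: just the constant
  }
  Expr block{"block", true, kept};
  block.children.push_back(Expr{constant});
  return block;
}
```

For `(ref.eq (tee) (get))` this keeps the tee, drops the get, and appends the computed constant, never re-emitting the `ref.eq` itself.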
In the testcase here, the nested `tee` makes the outer struct not precomputable when we care about effects (`tee` is an effect). But we could precompute it otherwise, and that ended up causing confusion, with no caching being done of the canonical allocation for it, meaning each time we scanned it we saw another struct, different in identity.
To fix this, implement the TODO to cache in both modes. I thought that would be useful only for speed, but it turns out it is necessary for correctness. Luckily doing so is quite simple.