Peel the peeler #1519

ltratt · 2024-12-19T16:22:26Z

This PR gets more optimisations from peeling, in essence by propagating more constants from the header into the body. The easiest way to see this is that we can now write a trivial test for peeling, showing that a guard has been removed in the body (83c9f16).

The main enabling commit is 4858809, which rewrites all of an instruction's operands to pick up everything we have learnt from the analyser.

However, to get to the point that we can do that safely, we have to make a number of changes. The most fundamental change is ab09557, which in essence moves instruction-changing from the dup_and_remap method to the optimiser.

Not everything about this PR is perfect: I've left a couple of pub(crate) fields, I've introduced a RefCell (in a way that offends me to my core), and I've not worried about making the optimiser work very efficiently. However, overall, I think this nudges us further in the right direction, which is reflected in 4% boost to big_loop. More will come in the future, with luck.

This is the "main" function for this `impl`, so having it in the middle of other functions is confusing. No functional change.

This isn't a big deal on its own, but will make some things more obvious soon.

Whilst here, use `InstIdx::max()` as the dummy value: except in the unlucky case where we magically end up with exactly `InstIdx::max()` instructions after peeling, this will make errors more likely to stand out.

We *might* need to put this back (perhaps as `&mut`) later, but for now, this enables inlining.

Also make the doc string for `dup_and_remap` more explicit about its guarantees.

Previously for `TraceHeaderStart` and `TraceHeaderEnd` this function did both remapping and copying e.g.: ``` m.trace_body_start = m.trace_header_start ``` This meant that calling this function more than once did very surprising things. This commit doesn't fix all the surprises -- it's still not idempotent -- but it now has fewer surprises than before. `TraceHeaderStart|End` now does nothing surprising, and `TraceBodyStart|End` is where the non-idempotency now resides: however, in practise, it's now "benign" idempotency, even if I still don't like it. For the time being, I have chosen to make the relevant `Module` fields `pub(crate)` partly as a reminder that something fishy is going on here. I can't quite work out what the right long-term fix is here, so I'll come back to it.

Right now optimising each instruction as its copied probably works OK, but it's going to be fragile in the long-term. In particular, this puts a *very* strong constraint on `iidx_map` that I suspect we'll forget. This commit turns this into a stage two thing: first we peel, creating a body that's virtually identical to the header. Once that's completely done, we then run the optimiser over the complete body. This means that optimising the header and body now run in conceptually the same manner: it's a bit slower (though probably unmeasureably so!) but less likely for us to shoot ourselves in the foot.

We will soon need this.

Previously we only picked up results from the analyser if we performed other optimisations. So imagine we had this: ``` %3: i8 = ... %4: i1 = eq %3, 7 ... call f(%3) ``` then previously the call wouldn't become `f(7i8)` but would stay as `f(%3)`. This commit rewrites *every* instruction so that we pick up all the results of the analyser. We could do this more efficiently (e.g. it's not worth rewriting instructions we've already optimised), but this is good enough for now, and it leads to a meaningful difference e.g. about a 4% improvement on big_loop.

ykrt/src/compile/jitc_yk/opt/mod.rs

ptersilie · 2024-12-19T16:44:45Z

ykrt/src/compile/jitc_yk/opt/mod.rs

+        let skipping = self
+            .m
+            .iter_skipping_insts()
+            .skip_while(|(_, inst)| !matches!(inst, Inst::TraceBodyStart))


Didn't know about skip_while, but that's very neat.

ptersilie · 2024-12-19T16:46:37Z

ykrt/src/compile/jitc_yk/jit_ir/mod.rs

@@ -1800,6 +1795,16 @@ impl Inst {
                m.trace_body_end = m.trace_body_end.iter().map(|op| mapper(m, op)).collect();
                Inst::TraceBodyEnd
            }
+            Inst::TraceHeaderEnd => {
+                // Copy the header label into the body while remapping the operands.
+                m.trace_header_end = m.trace_header_end.iter().map(|op| mapper(m, op)).collect();


Now you've lost me. Why are we remapping the header? Shouldn't the header stay as it was before we peeled?

This is where we discover constants from the analyser. [Well, not here, but opt/mod.rs calls this, and finds out constants this way.]

Oh, this makes more sense after seeing the following commit. Just to check I got this right: you added this because you later use this function for other things than just peeling off a loop (e.g. to propagate analyser info). When peeling the loop, this function is never called with TraceHeaderEnd so we never remap this at that time.

ptersilie · 2024-12-19T18:30:55Z

Good stuff.

ltratt · 2024-12-19T19:20:26Z

The failure seems to be ASAN putting in extra code that makes our peeling test bork. AFAICS there is no way to write the (quite tight) test we want if ASAN does this, so I've just ignore-ifed that test under ASAN in the force push.

Previously it was impractical for us to write a test for peeling: limitations in the optimiser meant that you needed a complex set of things to go right for peeling to end up having visibly good effects. The previous commit to this has made it possible for us to write a small test that makes the effect of peeling obvious: in this case we are able to elide a guard in the body. Irritatingly, we have to have minor variants for hwt/swt, because the latter inserts a store.

ltratt · 2024-12-19T23:04:08Z

Groan, so swt generates different IR than hwt, so I've had to duplicate the test for each tracer with two lines differing both in ...... (i.e. "a line needs to be here but I don't care what"). Force pushed the update.

ltratt added 9 commits December 19, 2024 16:18

Move function.

446ae36

This is the "main" function for this `impl`, so having it in the middle of other functions is confusing. No functional change.

Make clear that we can't optimise a TraceHeaderStart.

9ece1e2

This isn't a big deal on its own, but will make some things more obvious soon.

Make iidx_map map to InstIdxs not usizes.

1805d8d

Whilst here, use `InstIdx::max()` as the dummy value: except in the unlucky case where we magically end up with exactly `InstIdx::max()` instructions after peeling, this will make errors more likely to stand out.

Remove &.

64799f5

We *might* need to put this back (perhaps as `&mut`) later, but for now, this enables inlining.

"Variable" is the correct SSA term for these functions.

da34ac0

Also make the doc string for `dup_and_remap` more explicit about its guarantees.

Have dup_and_remap_vars take a Module.

f3220fc

We will soon need this.

ltratt assigned ptersilie Dec 19, 2024

ptersilie reviewed Dec 19, 2024

View reviewed changes

ykrt/src/compile/jitc_yk/opt/mod.rs Show resolved Hide resolved

ptersilie reviewed Dec 19, 2024

View reviewed changes

ptersilie added this pull request to the merge queue Dec 19, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 19, 2024

ltratt force-pushed the peel_the_peeler branch from 83c9f16 to 65a8641 Compare December 19, 2024 19:20

ptersilie added this pull request to the merge queue Dec 19, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 19, 2024

ltratt force-pushed the peel_the_peeler branch from 65a8641 to a0da661 Compare December 19, 2024 22:46

ltratt force-pushed the peel_the_peeler branch from a0da661 to 4963d5b Compare December 19, 2024 22:50

vext01 added this pull request to the merge queue Dec 20, 2024

Merged via the queue into ykjit:master with commit 8a8e8a9 Dec 20, 2024
2 checks passed

ltratt deleted the peel_the_peeler branch December 20, 2024 11:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Peel the peeler #1519

Peel the peeler #1519

ltratt commented Dec 19, 2024

ptersilie Dec 19, 2024

ptersilie Dec 19, 2024

ltratt Dec 19, 2024

ptersilie Dec 19, 2024

ltratt Dec 19, 2024

ptersilie commented Dec 19, 2024

ltratt commented Dec 19, 2024

ltratt commented Dec 19, 2024

Peel the peeler #1519

Peel the peeler #1519

Conversation

ltratt commented Dec 19, 2024

ptersilie Dec 19, 2024

Choose a reason for hiding this comment

ptersilie Dec 19, 2024

Choose a reason for hiding this comment

ltratt Dec 19, 2024

Choose a reason for hiding this comment

ptersilie Dec 19, 2024

Choose a reason for hiding this comment

ltratt Dec 19, 2024

Choose a reason for hiding this comment

ptersilie commented Dec 19, 2024

ltratt commented Dec 19, 2024

ltratt commented Dec 19, 2024