Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peel the peeler #1519

Merged
merged 10 commits into from
Dec 20, 2024
Merged

Peel the peeler #1519

merged 10 commits into from
Dec 20, 2024

Conversation

ltratt
Copy link
Contributor

@ltratt ltratt commented Dec 19, 2024

This PR gets more optimisations from peeling, in essence by propagating more constants from the header into the body. The easiest way to see this is that we can now write a trivial test for peeling, showing that a guard has been removed in the body (83c9f16).

The main enabling commit is 4858809, which rewrites all of an instruction's operands to pick up everything we have learnt from the analyser.

However, to get to the point that we can do that safely, we have to make a number of changes. The most fundamental change is ab09557, which in essence moves instruction-changing from the dup_and_remap method to the optimiser.

Not everything about this PR is perfect: I've left a couple of pub(crate) fields, I've introduced a RefCell (in a way that offends me to my core), and I've not worried about making the optimiser work very efficiently. However, overall, I think this nudges us further in the right direction, which is reflected in 4% boost to big_loop. More will come in the future, with luck.

This is the "main" function for this `impl`, so having it in the middle
of other functions is confusing. No functional change.
This isn't a big deal on its own, but will make some things more obvious
soon.
Whilst here, use `InstIdx::max()` as the dummy value: except in the
unlucky case where we magically end up with exactly `InstIdx::max()`
instructions after peeling, this will make errors more likely to stand
out.
We *might* need to put this back (perhaps as `&mut`) later, but for now,
this enables inlining.
Also make the doc string for `dup_and_remap` more explicit about its
guarantees.
Previously for `TraceHeaderStart` and `TraceHeaderEnd` this function did
both remapping and copying e.g.:

```
m.trace_body_start = m.trace_header_start
```

This meant that calling this function more than once did very surprising
things.

This commit doesn't fix all the surprises -- it's still not idempotent
-- but it now has fewer surprises than before. `TraceHeaderStart|End`
now does nothing surprising, and `TraceBodyStart|End` is where the
non-idempotency now resides: however, in practise, it's now "benign"
idempotency, even if I still don't like it.

For the time being, I have chosen to make the relevant `Module` fields
`pub(crate)` partly as a reminder that something fishy is going on here.
I can't quite work out what the right long-term fix is here, so I'll
come back to it.
Right now optimising each instruction as its copied probably works OK,
but it's going to be fragile in the long-term. In particular, this puts
a *very* strong constraint on `iidx_map` that I suspect we'll forget.

This commit turns this into a stage two thing: first we peel, creating a
body that's virtually identical to the header. Once that's completely
done, we then run the optimiser over the complete body. This means that
optimising the header and body now run in conceptually the same manner:
it's a bit slower (though probably unmeasureably so!) but less likely
for us to shoot ourselves in the foot.
Previously we only picked up results from the analyser if we performed
other optimisations. So imagine we had this:

```
%3: i8 = ...
%4: i1 = eq %3, 7
...
call f(%3)
```

then previously the call wouldn't become `f(7i8)` but would stay as
`f(%3)`. This commit rewrites *every* instruction so that we pick up all
the results of the analyser. We could do this more efficiently (e.g.
it's not worth rewriting instructions we've already optimised), but this
is good enough for now, and it leads to a meaningful difference e.g.
about a 4% improvement on big_loop.
let skipping = self
.m
.iter_skipping_insts()
.skip_while(|(_, inst)| !matches!(inst, Inst::TraceBodyStart))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know about skip_while, but that's very neat.

@@ -1800,6 +1795,16 @@ impl Inst {
m.trace_body_end = m.trace_body_end.iter().map(|op| mapper(m, op)).collect();
Inst::TraceBodyEnd
}
Inst::TraceHeaderEnd => {
// Copy the header label into the body while remapping the operands.
m.trace_header_end = m.trace_header_end.iter().map(|op| mapper(m, op)).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now you've lost me. Why are we remapping the header? Shouldn't the header stay as it was before we peeled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where we discover constants from the analyser. [Well, not here, but opt/mod.rs calls this, and finds out constants this way.]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this makes more sense after seeing the following commit. Just to check I got this right: you added this because you later use this function for other things than just peeling off a loop (e.g. to propagate analyser info). When peeling the loop, this function is never called with TraceHeaderEnd so we never remap this at that time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

@ptersilie ptersilie added this pull request to the merge queue Dec 19, 2024
@ptersilie
Copy link
Contributor

Good stuff.

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 19, 2024
@ltratt
Copy link
Contributor Author

ltratt commented Dec 19, 2024

The failure seems to be ASAN putting in extra code that makes our peeling test bork. AFAICS there is no way to write the (quite tight) test we want if ASAN does this, so I've just ignore-ifed that test under ASAN in the force push.

@ptersilie ptersilie added this pull request to the merge queue Dec 19, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 19, 2024
Previously it was impractical for us to write a test for peeling:
limitations in the optimiser meant that you needed a complex set of
things to go right for peeling to end up having visibly good effects.
The previous commit to this has made it possible for us to write a small
test that makes the effect of peeling obvious: in this case we are able
to elide a guard in the body. Irritatingly, we have to have minor
variants for hwt/swt, because the latter inserts a store.
@ltratt
Copy link
Contributor Author

ltratt commented Dec 19, 2024

Groan, so swt generates different IR than hwt, so I've had to duplicate the test for each tracer with two lines differing both in ...... (i.e. "a line needs to be here but I don't care what"). Force pushed the update.

@vext01 vext01 added this pull request to the merge queue Dec 20, 2024
Merged via the queue into ykjit:master with commit 8a8e8a9 Dec 20, 2024
2 checks passed
@ltratt ltratt deleted the peel_the_peeler branch December 20, 2024 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants