-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Peel the peeler #1519
Peel the peeler #1519
Conversation
This is the "main" function for this `impl`, so having it in the middle of other functions is confusing. No functional change.
This isn't a big deal on its own, but will make some things more obvious soon.
Whilst here, use `InstIdx::max()` as the dummy value: except in the unlucky case where we magically end up with exactly `InstIdx::max()` instructions after peeling, this will make errors more likely to stand out.
Also make the doc string for `dup_and_remap` more explicit about its guarantees.
Previously for `TraceHeaderStart` and `TraceHeaderEnd` this function did both remapping and copying e.g.: ``` m.trace_body_start = m.trace_header_start ``` This meant that calling this function more than once did very surprising things. This commit doesn't fix all the surprises -- it's still not idempotent -- but it now has fewer surprises than before. `TraceHeaderStart|End` now does nothing surprising, and `TraceBodyStart|End` is where the non-idempotency now resides: however, in practise, it's now "benign" idempotency, even if I still don't like it. For the time being, I have chosen to make the relevant `Module` fields `pub(crate)` partly as a reminder that something fishy is going on here. I can't quite work out what the right long-term fix is here, so I'll come back to it.
Right now optimising each instruction as its copied probably works OK, but it's going to be fragile in the long-term. In particular, this puts a *very* strong constraint on `iidx_map` that I suspect we'll forget. This commit turns this into a stage two thing: first we peel, creating a body that's virtually identical to the header. Once that's completely done, we then run the optimiser over the complete body. This means that optimising the header and body now run in conceptually the same manner: it's a bit slower (though probably unmeasureably so!) but less likely for us to shoot ourselves in the foot.
We will soon need this.
Previously we only picked up results from the analyser if we performed other optimisations. So imagine we had this: ``` %3: i8 = ... %4: i1 = eq %3, 7 ... call f(%3) ``` then previously the call wouldn't become `f(7i8)` but would stay as `f(%3)`. This commit rewrites *every* instruction so that we pick up all the results of the analyser. We could do this more efficiently (e.g. it's not worth rewriting instructions we've already optimised), but this is good enough for now, and it leads to a meaningful difference e.g. about a 4% improvement on big_loop.
let skipping = self | ||
.m | ||
.iter_skipping_insts() | ||
.skip_while(|(_, inst)| !matches!(inst, Inst::TraceBodyStart)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't know about skip_while
, but that's very neat.
@@ -1800,6 +1795,16 @@ impl Inst { | |||
m.trace_body_end = m.trace_body_end.iter().map(|op| mapper(m, op)).collect(); | |||
Inst::TraceBodyEnd | |||
} | |||
Inst::TraceHeaderEnd => { | |||
// Copy the header label into the body while remapping the operands. | |||
m.trace_header_end = m.trace_header_end.iter().map(|op| mapper(m, op)).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now you've lost me. Why are we remapping the header? Shouldn't the header stay as it was before we peeled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where we discover constants from the analyser. [Well, not here, but opt/mod.rs
calls this, and finds out constants this way.]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this makes more sense after seeing the following commit. Just to check I got this right: you added this because you later use this function for other things than just peeling off a loop (e.g. to propagate analyser info). When peeling the loop, this function is never called with TraceHeaderEnd
so we never remap this at that time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep!
Good stuff. |
83c9f16
to
65a8641
Compare
The failure seems to be ASAN putting in extra code that makes our peeling test bork. AFAICS there is no way to write the (quite tight) test we want if ASAN does this, so I've just |
65a8641
to
a0da661
Compare
Previously it was impractical for us to write a test for peeling: limitations in the optimiser meant that you needed a complex set of things to go right for peeling to end up having visibly good effects. The previous commit to this has made it possible for us to write a small test that makes the effect of peeling obvious: in this case we are able to elide a guard in the body. Irritatingly, we have to have minor variants for hwt/swt, because the latter inserts a store.
a0da661
to
4963d5b
Compare
Groan, so swt generates different IR than hwt, so I've had to duplicate the test for each tracer with two lines differing both in |
This PR gets more optimisations from peeling, in essence by propagating more constants from the header into the body. The easiest way to see this is that we can now write a trivial test for peeling, showing that a guard has been removed in the body (83c9f16).
The main enabling commit is 4858809, which rewrites all of an instruction's operands to pick up everything we have learnt from the analyser.
However, to get to the point that we can do that safely, we have to make a number of changes. The most fundamental change is ab09557, which in essence moves instruction-changing from the
dup_and_remap
method to the optimiser.Not everything about this PR is perfect: I've left a couple of
pub(crate)
fields, I've introduced aRefCell
(in a way that offends me to my core), and I've not worried about making the optimiser work very efficiently. However, overall, I think this nudges us further in the right direction, which is reflected in 4% boost to big_loop. More will come in the future, with luck.