-
Notifications
You must be signed in to change notification settings - Fork 729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assertion failure at omr/compiler/x/codegen/OMRCodeGenerator.cpp:2122: cursorInstruction->getEstimatedBinaryLength() >= self()->getBinaryBufferCursor() - instructionStart when PROD_WITH_ASSUMES is enabled #17442
Comments
So if I'm interpreting this (and the source code) correctly, this assertion occurs during a stage of code generation in which generated code is converted into binary in preparation for execution or storage in the shared class cache. The assertion appears inside a while loop that generates binary for all instructions after the interpreter entry point (which possibly means code that is interpreted rather than jitted?), and seems to be firing during this build because an estimation of the length of a binary encoded instruction (17) is shorter than the actual length of that binary encoded instruction (18). I'm unsure of why an assert would be firing in this situation. If the binary instruction has already been generated, why does it matter that a previous prediction of its length was wrong? I can see how that might be useful information to help us create better estimates in the future, but it doesn't seem like something that we would want to halt the compiler for. Of course it is very likely that I don't actually understand what's really going on here (in fact, I'm not sure why this length estimation happens at all), but since either the assert or the code it's testing must be at fault, my best guess is that the assert is incorrect. |
Generally, this kind of assert would appear if there is a mismatch between how many bytes of instructions we expected a method to consume versus how many it actually consumed when the binary was emitted (we need to "estimate" the size of the code buffer first in order to allocate space for it). Every instruction emitted conservatively estimates how many bytes it might need, and this should always be >= the number of bytes actually consumed. What this assert is saying is that for some instruction the running estimate got behind the number of actual bytes emitted (meaning we've started to overrun the code buffer). It would be helpful if this assert actually printed the instruction address (i.e., Once you know the instruction where the problem first appeared you can inspect its |
An 18-byte instruction is unusually long, I'm not sure I've ever seen that before. Seems to be tripping up on a LEA instruction, which in fact, is not 18 bytes.
|
Maximum instruction length on x86 is 15 bytes architecturally. Either something is clearly encoded wrong here, or this is a pseudo instruction (e.g., a vgnop) that is reporting a large length. |
I dumped the buffer (18 bytes), looks like an address load instruction was inserted, which explains why so many bytes is required.
@dylanjtuttle You can start by looking at the estimation logic here. |
Hi @BradleyWood, I'm a little confused by what you mean (I'm a new intern on the opt team and I'm still learning how the system works). Is this a situation in which an erroneous |
The LEA instruction itself looks normal. However, sometimes at codegen, the memory reference cannot be encoded in the instruction. When this happens, another instruction is inserted to help load the base address of the memory reference. In this case the estimate of 17 bytes (should be at least 18) is too small to account for the address load instruction. You need to investigate why the estimate is too low. |
To put what Brad said in more concrete terms, memory references can consolidate registers by inserting instructions. Here, for example: https://github.com/eclipse/omr/blob/8091cedda274b476e9e9d4da24128348a6c66aa4/compiler/x/codegen/OMRMemoryReference.cpp#L860 Did you get a log (even a partial one up to the point of assert) for the method this occurs in? |
If I'm understanding the situation correctly, I don't think I can run this particular method (at least with PROD_WITH_ASSUMES enabled) because since this assert happens while the code is building, so I don't end up with a java executable that I can run the method with. So far I've just been building the code repeatedly until the assert happens. Is there a way to get a log from that? |
You can guard the assert with an environment variable to get a build.
After your build, you can try to get a log of one of the crashing methods by defining the variable.
|
That's really helpful, thank you! I'll give that a try. |
I think I'm beginning to spin my tires a little, so I thought it would be a good time to ask for some more help. I've inserted a variety of debug print statements in different places to try and pin down where the estimation goes wrong, but none of them print the instruction address/length/estimate that actually triggers the assert. I have theories about why that might be, but ultimately I don't think getting the correct print statements would tell me enough to figure out where the problem is anyway, so I've sort of abandoned this approach. For the record, I have figured out how to get the logs working thanks to some help from @bhavanisn, so that's nice! I've also been trying to go through the estimation logic to see how we estimate the length for LEA instructions specifically. I looked through At the same time, I've been trying to learn how the code generation for LEA works and how it decides when to add the extra MOV. I feel like I'm beginning to understand it, but I'm still confused because I can't find any corresponding logic on the estimation side. Are we failing this assert because the estimation logic simply doesn't account for this case? Or does it account for it somewhere (incorrectly) and I haven't been able to find it yet? I've noticed that there are a lot of other methods called |
As we discussed, the estimates produced for the memory reference are correct at 15 bytes. It looks like count is short because it does not account for the REX prefix. I verified this be checking the calculated rex bits at the time of estimation, which happened to be 0. Obviously, the instruction has a rex prefix, so the key thing to investigate is why those bits are 0, and not correctly calculated when the length of the LEA instruction is estimated. I suspect it is a side effect of inserting the address load instruction. |
I suspect that the problem is here. _indexRegister changes at the time of code-generation which requires the REX prefix. At this point, the estimate has already been calculated. |
The assertion at
/home/jenkins/workspace/Build_JDK11_x86-64_linux_Personal/omr/compiler/x/codegen/OMRCodeGenerator.cpp:2122: cursorInstruction->getEstimatedBinaryLength() >= self()->getBinaryBufferCursor() - instructionStart
fires when building OpenJDK 11 on
x86-64_linux
with the-DPROD_WITH_ASSUMES
flag enabled inopenj9/runtime/compiler/CMakeLists.txt
.Note that this assertion fires during the build process and not while running tests like the other failures under
PROD_WITH_ASSUMES
as discovered last year.Link to the Jenkins Build.
Stack trace:
The text was updated successfully, but these errors were encountered: