-
Notifications
You must be signed in to change notification settings - Fork 729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assertion failure at omr/compiler/il/OMRNode.cpp:4487: 0 #17559
Comments
This assertion is found in This assert's condition is always false, so its firing is dependent on the if/else statement it sits in. The assert is in the else block of an if statement with condition:
The
That means if we make it into the else block,
|
Just tagging @r30shah for his awareness. |
If this helps, that call to complex genericAnalyser is coming from |
Thanks @r30shah! A few updates:
note the offending node As I mentioned in my previous comment, |
|
I still haven't been able to find anything conclusive, but I thought I would give an update since I have made some interesting discoveries. I've been having issues reproducing this assert on my Z VM, it would fire during the build but not when I guarded it with an environment variable (a method suggested by Brad here that I've been using without issues until now). After spending a few days trying to fix this, I eventually went back to my old method of continuously rebuilding until the assert fires, but now I know that I can check the jitdump and add extra debug statements to it. Here is the result of that, with some extra comments I've added after the fact: [ 0x3ff42376f60] fence Relative [ 000003FF42070240 ] BBStart <block_5> (frequency 2775) (extension of previous block) (in loop 0)
============================================================
; Live regs: GPR=5 FPR=0 VRF=0 {GPR_ 0x3ff58574260, &GPR_ 0x3ff585741f0, &GPR_ 0x3ff58574180, &GPR_ 0x3ff58574110, &GPR_ 0x3ff585740a0}
------------------------------
n303n ( 0) ifacmpne --> block_15 BBStart at n299n (ProfiledGuard/VftTest ) [ 0x3ff58283e70] bci=[0,0,-] rc=0 vc=764 vn=- li=5 udi=- nc=3 flg=0x1020
n301n ( 1) aloadi <vft-symbol>[#322 Shadow] [flags 0x18607 0x0 ] (X!=0 ) [ 0x3ff58283dd0] bci=[-1,48,-] rc=1 vc=764 vn=- li=5 udi=- nc=1 flg=0x4
n631n ( 9) ==>aRegLoad (in &GPR_ 0x3ff58574180) (X!=0 X>=0 SeenRealReference )
n302n ( 1) aconst 0xdfd700 (java/util/WeakHashMap$Entry.class) (classPointerConstant X!=0 X>=0 X<=0 ) [ 0x3ff58283e20] bci=[0,0,-] rc=1 vc=764 vn=- li=5 udi=- nc=0 flg=0x10304
n636n ( 1) GlRegDeps () [ 0x3ff5828a680] bci=[0,0,-] rc=1 vc=764 vn=- li=5 udi=- nc=5 flg=0x20
n629n ( 5) ==>aRegLoad (in &GPR_ 0x3ff585740a0)
n630n ( 5) ==>aRegLoad (in &GPR_ 0x3ff58574110)
n631n ( 9) ==>aRegLoad (in &GPR_ 0x3ff58574180) (X!=0 X>=0 SeenRealReference )
n632n ( 8) ==>aRegLoad (in &GPR_ 0x3ff585741f0) (X>=0 SeenRealReference )
n633n ( 5) ==>iRegLoad (in GPR_ 0x3ff58574260) (cannotOverflow SeenRealReference )
------------------------------
<!--Evaluation of offending node-->
EVALUATING NODE n303n
<!--Evaluation of node's first child-->
EVALUATING NODE n301n
<!--Call to checkAndAllocateReferenceRegister() in Z TreeEvaluator-->
aload reg contains ref: 0
<!--Evaluation of node's first grandchild-->
EVALUATING NODE n631n
<!--Evaluation of node's second child-->
EVALUATING NODE n302n
<!--A second evaluation of node's first child?-->
EVALUATING NODE n301n
<!--Call to setEvaluationPriority on node's third child-->
NODE n636n IN setEvaluationPriority, _register: 0x00000000
isAssert is NOT NULL, assert should be enabled... node n636n
<!--A second call to setEvaluationPriority on node's third child-->
NODE n636n IN setEvaluationPriority, _register: 0x00000001
isAssert is NOT NULL, assert should be enabled... node n636n
<!--The offending call to setEvaluationPriority on node-->
whichChildToEvaluate: calling setEvaluationPriority on node n303n
NODE n303n IN setEvaluationPriority, _register: 0x582b38e0
isAssert is NOT NULL, assert should be enabled... node n303n
NODE n303n HAS ALREADY BEEN EVALUATED
</recompilation rc=0 queued=1>
</jitDump> There are a few things to take away from this. First, the offending node which causes the assertion has a different global index (
As it turns out (thanks you @bhavanisn for the help), this
an
|
Hi @dylanjtuttle with respect to node numbers, are you checking the glibal index from the jit dump and matchi git with the assert message? If this is the case, a jitdump is a different compilation than the failing one. As you fail with assert during jit compilation, it triggers recompilation of the method with tracing to provide more diagnostic information. Now coming to ASSERT, I did not get enough time to check out the jitdump, but on surface, calling on set evaluation priority on branch node is something I have difficult time understanding. I will checkout jitdump you provided tomorrow and touch base with you. |
Hey @r30shah, I was matching the global index from the jitdump to the assert message. Thanks for the clarification! As for
It is definitely odd that according to my debug statements, this In any case, I think the important thing to take away here is the value of I am now essentially certain that |
Yes, but I would like to know how it is getting set. One way it can get set is by calling a Btw I can get it failing on my environment with debug build and checking out the _register field when we fail with Assert, it is not NULL, but I am not sure it contains a valid data as well.
I created a small unit test that could have |
I think it's actually not getting set. The value returned by [1]. virtualGuardInfo() *** I was incorrect about this, this is not how unions work! |
Yes, So in failing case, it is reading the memory content that actually represents a valid |
I don't think changing the tests is the solution, because *** I was still incorrect about this, this is still not how unions work! |
@r30shah In this case where it fails is evaluating an |
@bhavanisn can you point where you want to set this to NULL? Actually now taking a detailed look at how Now, taking a step back before jumping into the fix, I would like to understand why there is a behavioral difference for profiled guard (Or virtual guards) evaluation between Z and X. Especially on Z, when it fails, bytes in the Union represents seemed a valid Virtual Guard. Looking at the #17559 (comment), it is actually NULL on X. We should understand this difference first. |
@r30shah I was thinking to set
|
@bhavanisn Ahh, I see, okk, that could work, though I would like to understand why on X it is NULL and on Z it is not. |
Adding this belatedly to the 0.41 release - it would be good to at least be able to build sanity.functional with PROD_WITH_ASSUMES, even if there are still crashes running it. |
The problem can also be reproduced on x - removing the arch:z label |
If this is reproducible on x as well, I think we should think in the direction @bhavanisn suggested in #17559 (comment). i.e, we should initialize the value before returning |
I'm worried that there is no guarantee |
True, looking at the code-gen helpers where we would need this information. Most of the code-gens would call Further thinking, if we should just check [1]. https://github.com/eclipse/omr/blob/9659cbcdcef670fda03609c058e5cb04c58c6511/compiler/compile/OMRCompilation.cpp#L1553C19-L1553C39 |
Sometime back @dylanjtuttle did try initializing the |
I'm going to move this to the 0.43 release, so you won't have to "double deliver" any fix. |
On my local x86 VM, I replaced the
The exact number of occurrences seems to vary based on how much code is being compiled during the JVM build, but the highest number I've gotten is 2,818 individual situations in which this assert "would" fire if it were enabled. There is a subset of repeat nodes that have It would probably be useful to do this analysis on Z as well just to make sure we see the same results, but it seems pretty conclusive that this problem occurs specifically and exclusively for tree top |
The nodes that trigger this assertion failure (on x86) follow the path:
Thus, I propose this is where we add the check to ensure That is, change ...
if (!node->getOpCode().isIf() && !node->getOpCode().isTreeTop())
{
node->setEvaluationPriority(nodePriority);
}
return bestChild; Is it worth trying to figure out if there are any other cases in which we shouldn't call |
It seems that the only real restriction on Although I had said earlier that setting the evaluation priority might not make sense for nodes for which |
That makes sense! I will test that change and report back. |
I've tested the change and found a variety of CRIU failures on power and Z. On Z, it seems like the tests are failing because CRIU is not enabled, which seems strange because not all of the CRIU tests are failing. On power, the failures seem to mostly involve the following error messages:
|
@dylanjtuttle, I think it's very unlikely that those failures would be related to your change. |
Daryl agreed when I spoke to him yesterday, but wanted to review the change. I'm going to open a PR and see what he thinks. |
From an off-line conversation, I understand @0xdaryl had some concerns about preventing I spent a little time looking at the history of the |
Fixed by OMR pull request eclipse-omr/omr#7492. Closing. |
Issue Number: 17559 |
The assertion at
/home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/omr/compiler/il/OMRNode.cpp:4487: 0
fires when building OpenJDK on
s390x_linux
with the-DPROD_WITH_ASSUMES
flag enabled inopenj9/runtime/compiler/CMakeLists.txt
.Link to the Jenkins build.
Stack trace:
The text was updated successfully, but these errors were encountered: