Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failure at omr/compiler/il/OMRNode.cpp:4487: 0 #17559

Closed
dylanjtuttle opened this issue Jun 8, 2023 · 33 comments
Closed

Assertion failure at omr/compiler/il/OMRNode.cpp:4487: 0 #17559

dylanjtuttle opened this issue Jun 8, 2023 · 33 comments
Assignees
Labels

Comments

@dylanjtuttle
Copy link
Contributor

dylanjtuttle commented Jun 8, 2023

The assertion at

/home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/omr/compiler/il/OMRNode.cpp:4487: 0

fires when building OpenJDK on s390x_linux with the -DPROD_WITH_ASSUMES flag enabled in openj9/runtime/compiler/CMakeLists.txt.

Link to the Jenkins build.

Stack trace:

16:36:49  Compiling 4 files for BUILD_JIGSAW_TOOLS
16:36:49  Assertion failed at /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/omr/compiler/il/OMRNode.cpp:4487: 0
16:36:49  VMState: 0x0005ff04
16:36:49  	setEvaluationPriority cannot be called after the node has already been evaluated
16:36:49  compiling java/util/HashMap.putVal(ILjava/lang/Object;Ljava/lang/Object;ZZ)Ljava/lang/Object; at level: warm
16:36:49  #0: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xaf2d0e) [0x20040af2d0e]
16:36:49  #1: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xb01a90) [0x20040b01a90]
16:36:49  #2: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x731660) [0x20040731660]
16:36:49  #3: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x7319cc) [0x200407319cc]
16:36:49  #4: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x710e9c) [0x20040710e9c]
16:36:49  #5: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x691e0a) [0x20040691e0a]
16:36:49  #6: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xb2ad72) [0x20040b2ad72]
16:36:49  #7: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xbf30d8) [0x20040bf30d8]
16:36:49  #8: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xbf3c38) [0x20040bf3c38]
16:36:49  #9: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xbfcd60) [0x20040bfcd60]
16:36:49  #10: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xb49ad4) [0x20040b49ad4]
16:36:49  #11: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x692962) [0x20040692962]
16:36:49  #12: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x14495e) [0x2004014495e]
16:36:49  #13: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x6a2164) [0x200406a2164]
16:36:49  #14: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x6a30ce) [0x200406a30ce]
16:36:49  #15: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x69f528) [0x2004069f528]
16:36:49  #16: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x6d43a2) [0x200406d43a2]
16:36:49  #17: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18d1cc) [0x2004018d1cc]
16:36:49  #18: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18e438) [0x2004018e438]
16:36:49  #19: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9prt29.so(+0x34546) [0x2003acb4546]
16:36:49  #20: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18b758) [0x2004018b758]
16:36:49  #21: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18be10) [0x2004018be10]
16:36:49  #22: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18a72e) [0x2004018a72e]
16:36:49  #23: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18ad12) [0x2004018ad12]
16:36:49  #24: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18adc2) [0x2004018adc2]
16:36:49  #25: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9prt29.so(+0x34546) [0x2003acb4546]
16:36:49  #26: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18b25e) [0x2004018b25e]
16:36:49  #27: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9thr29.so(+0x5a14) [0x2003ad85a14]
16:36:49  #28: /lib64/libpthread.so.0(+0x8312) [0x2003a188312]
16:36:49  #29: /lib64/libc.so.6(+0x10e232) [0x2003a40e232]
16:36:49  
16:36:49  JIT: crashed while compiling java/util/HashMap.putVal(ILjava/lang/Object;Ljava/lang/Object;ZZ)Ljava/lang/Object; (recoverable 0)
16:36:49  #0: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xaf2d0e) [0x20040af2d0e]
16:36:49  #1: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xb01a90) [0x20040b01a90]
16:36:49  #2: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x1732ac) [0x200401732ac]
16:36:49  #3: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9prt29.so(+0x3365c) [0x2003acb365c]
16:36:49  #4: [0x2003bb8c2c0]
16:36:49  #5: /lib64/libpthread.so.0(raise+0x30) [0x2003a190bc0]
16:36:49  #6: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x7317fe) [0x200407317fe]
16:36:49  #7: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x7319d2) [0x200407319d2]
16:36:49  #8: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x710e9c) [0x20040710e9c]
16:36:49  #9: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x691e0a) [0x20040691e0a]
16:36:49  #10: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xb2ad72) [0x20040b2ad72]
16:36:49  #11: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xbf30d8) [0x20040bf30d8]
16:36:49  #12: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xbf3c38) [0x20040bf3c38]
16:36:49  #13: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xbfcd60) [0x20040bfcd60]
16:36:49  #14: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0xb49ad4) [0x20040b49ad4]
16:36:49  #15: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x692962) [0x20040692962]
16:36:49  #16: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x14495e) [0x2004014495e]
16:36:49  #17: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x6a2164) [0x200406a2164]
16:36:49  #18: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x6a30ce) [0x200406a30ce]
16:36:49  #19: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x69f528) [0x2004069f528]
16:36:49  #20: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x6d43a2) [0x200406d43a2]
16:36:49  #21: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18d1cc) [0x2004018d1cc]
16:36:49  #22: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18e438) [0x2004018e438]
16:36:49  #23: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9prt29.so(+0x34546) [0x2003acb4546]
16:36:49  #24: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18b758) [0x2004018b758]
16:36:49  #25: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18be10) [0x2004018be10]
16:36:49  #26: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18a72e) [0x2004018a72e]
16:36:49  #27: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18ad12) [0x2004018ad12]
16:36:49  #28: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9jit29.so(+0x18adc2) [0x2004018adc2]
16:36:49  #29: /home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/build/linux-s390x-normal-server-release/jdk/lib/default/libj9prt29.so(+0x34546) [0x2003acb4546]
16:36:49  Unhandled exception
16:36:49  Type=Unhandled trap vmState=0x0005ff04
16:36:49  J9Generic_Signal_Number=00000108 Signal_Number=00000005 Error_Value=00000000 Signal_Code=fffffffa
16:36:49  Handler1=000002003A9CCC18 Handler2=000002003ACB33D8
16:36:49  gpr0=0000000000000000 gpr1=000002003BB94910 gpr2=0000000000000000 gpr3=0000000000001B24
16:36:49  gpr4=0000000000000005 gpr5=0000020000001B1D gpr6=000002003BB8C888 gpr7=0000020000000001
16:36:49  gpr8=0000020040D67494 gpr9=0000020040CDF06C gpr10=0000000000001187 gpr11=0000020040FB40D0
16:36:49  gpr12=000002003A199000 gpr13=0000020040D6C600 gpr14=00000200407317FE gpr15=000002003BB8C748
16:36:49  psw=000002003A190BC0 mask=0705200180000000 fpc=0008fe00 bea=000002003A357DA0
16:36:49  fpr0 4095000000000000 (f: 0.000000, d: 1.344000e+03)
16:36:49  fpr1 4118000000000000 (f: 0.000000, d: 3.932160e+05)
16:36:49  fpr2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
16:36:49  fpr3 47b73e5c00000000 (f: 0.000000, d: 3.089603e+37)
16:36:49  fpr4 4095000000000000 (f: 0.000000, d: 1.344000e+03)
16:36:49  fpr5 00000000a000c000 (f: 2684403712.000000, d: 1.326272e-314)
16:36:49  fpr6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
16:36:49  fpr7 3e3a35ce00000000 (f: 0.000000, d: 6.102532e-09)
16:36:49  fpr8 000000000c5114a8 (f: 206640288.000000, d: 1.020939e-315)
16:36:49  fpr9 000003feec4c8c90 (f: 3964439808.000000, d: 2.170638e-311)
16:36:49  fpr10 000000000c511330 (f: 206639920.000000, d: 1.020937e-315)
16:36:49  fpr11 0000000000892df0 (f: 8990192.000000, d: 4.441745e-317)
16:36:49  fpr12 000000000c5114b8 (f: 206640320.000000, d: 1.020939e-315)
16:36:49  fpr13 0000000000000001 (f: 1.000000, d: 4.940656e-324)
16:36:49  fpr14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
16:36:49  fpr15 0000000000892de0 (f: 8990176.000000, d: 4.441737e-317)
16:36:49  Module=/lib64/libpthread.so.0
16:36:49  Module_base_address=000002003A180000 Symbol=raise
16:36:49  Symbol_address=000002003A190B90
16:36:49  
16:36:49  Method_being_compiled=java/util/HashMap.putVal(ILjava/lang/Object;Ljava/lang/Object;ZZ)Ljava/lang/Object;
16:36:49  Target=2_90_20230607_1546 (Linux 3.10.0-1160.90.1.el7.s390x)
16:36:49  CPU=s390x (4 logical CPUs) (0x1ec5df000 RAM)
16:36:49  ----------- Stack Backtrace -----------
16:36:49  raise+0x30 (0x000002003A190BC0 [libpthread.so.0+0x10bc0])
16:36:49  _ZN2TR4trapEv+0x5e (0x00000200407317FE [libj9jit29.so+0x7317fe])
16:36:49  _ZN2TR9assertionEPKciS1_S1_z+0xba (0x00000200407319D2 [libj9jit29.so+0x7319d2])
16:36:49  _ZN3OMR4Node21setEvaluationPriorityEi+0x64 (0x0000020040710E9C [libj9jit29.so+0x710e9c])
16:36:49  _ZN3OMR13CodeGenerator20whichChildToEvaluateEPN2TR4NodeE+0x1ba (0x0000020040691E0A [libj9jit29.so+0x691e0a])
16:36:49  _ZN32TR_S390BinaryCommutativeAnalyser15genericAnalyserEPN2TR4NodeEN3OMR10InstOpCode8MnemonicES5_S5_bPNS0_11LabelSymbolENS3_1Z10InstOpCode19S390BranchConditionESA_+0x82 (0x0000020040B2AD72 [libj9jit29.so+0xb2ad72])
16:36:49  _Z37generateS390CompareAndBranchOpsHelperPN2TR4NodeEPNS_13CodeGeneratorEN3OMR1Z10InstOpCode19S390BranchConditionES7_RS7_PNS_11LabelSymbolE+0x2100 (0x0000020040BF30D8 [libj9jit29.so+0xbf30d8])
16:36:49  _Z22generateS390CompareOpsPN2TR4NodeEPNS_13CodeGeneratorEN3OMR1Z10InstOpCode19S390BranchConditionES7_+0x20 (0x0000020040BF3C38 [libj9jit29.so+0xbf3c38])
16:36:49  _Z25generateS390CompareBranchPN2TR4NodeEPNS_13CodeGeneratorEN3OMR10InstOpCode8MnemonicENS4_1Z10InstOpCode19S390BranchConditionES9_b+0x8b8 (0x0000020040BFCD60 [libj9jit29.so+0xbfcd60])
16:36:49  _ZN3OMR1Z13TreeEvaluator17ificmpeqEvaluatorEPN2TR4NodeEPNS2_13CodeGeneratorE+0x20c (0x0000020040B49AD4 [libj9jit29.so+0xb49ad4])
16:36:49  _ZN3OMR13CodeGenerator8evaluateEPN2TR4NodeE+0x152 (0x0000020040692962 [libj9jit29.so+0x692962])
16:36:49  _ZN2J913CodeGenerator22doInstructionSelectionEv+0xd4e (0x000002004014495E [libj9jit29.so+0x14495e])
16:36:49  _ZN3OMR12CodeGenPhase32performInstructionSelectionPhaseEPN2TR13CodeGeneratorEPNS1_12CodeGenPhaseE+0x7c (0x00000200406A2164 [libj9jit29.so+0x6a2164])
16:36:49  _ZN3OMR12CodeGenPhase10performAllEv+0x10e (0x00000200406A30CE [libj9jit29.so+0x6a30ce])
16:36:49  _ZN3OMR13CodeGenerator12generateCodeEv+0x60 (0x000002004069F528 [libj9jit29.so+0x69f528])
16:36:49  _ZN3OMR11Compilation7compileEv+0xd02 (0x00000200406D43A2 [libj9jit29.so+0x6d43a2])
16:36:49  _ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0x4e4 (0x000002004018D1CC [libj9jit29.so+0x18d1cc])
16:36:49  _ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0x3e8 (0x000002004018E438 [libj9jit29.so+0x18e438])
16:36:49  omrsig_protect+0x366 (0x000002003ACB4546 [libj9prt29.so+0x34546])
16:36:49  _ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x350 (0x000002004018B758 [libj9jit29.so+0x18b758])
16:36:49  _ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x1e8 (0x000002004018BE10 [libj9jit29.so+0x18be10])
16:36:49  _ZN2TR24CompilationInfoPerThread14processEntriesEv+0x42e (0x000002004018A72E [libj9jit29.so+0x18a72e])
16:36:49  _ZN2TR24CompilationInfoPerThread3runEv+0xc2 (0x000002004018AD12 [libj9jit29.so+0x18ad12])
16:36:49  _Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x9a (0x000002004018ADC2 [libj9jit29.so+0x18adc2])
16:36:49  omrsig_protect+0x366 (0x000002003ACB4546 [libj9prt29.so+0x34546])
16:36:49  _Z21compilationThreadProcPv+0x1fe (0x000002004018B25E [libj9jit29.so+0x18b25e])
16:36:49  thread_wrapper+0x114 (0x000002003AD85A14 [libj9thr29.so+0x5a14])
16:36:49  start_thread+0xea (0x000002003A188312 [libpthread.so.0+0x8312])
16:36:49   (0x000002003A40E232 [libc.so.6+0x10e232])
16:36:49  ---------------------------------------
16:36:49  JVMDUMP039I Processing dump event "gpf", detail "" at 2023/06/07 13:36:48 - please wait.
16:36:49  JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Build_JDK11_s390x_linux_Personal/make/core.20230607.133648.6941.0001.dmp' in response to an event
@dylanjtuttle
Copy link
Contributor Author

This assertion is found in OMR::Node::setEvaluationPriority(), a method which is used to set a node's evaluation priority, which can be used during code generation to determine the order in which nodes are evaluated.

This assert's condition is always false, so its firing is dependent on the if/else statement it sits in. The assert is in the else block of an if statement with condition:

_unionA._register == 0 || ((uintptr_t)(_unionA._register) & 1)

The _unionA._register field of a node is 0 if the node has not been evaluated and its priority is unknown, odd (i.e. the low bit is set) if the node has not been evaluated and its priority is known, and even if the node has already been evaluated. Therefore:

  • _unionA._register == 0 is true if this node has not yet been evaluated and its priority is unknown.
  • (uintptr_t)(_unionA._register) & 1 is true if this node has not yet been evaluated, but its priority is known (notice the mask checking the low bit).

That means if we make it into the else block, _unionA._register is non-zero, but also does not have its low bit set, meaning it is even and therefore this node has already been evaluated. Of course, if a node has already been evaluated, it makes no sense to set its evaluation priority.

OMR::Node::setEvaluationPriority() is called from two places, OMR::CodeGenerator::whichChildToEvaluate() and OMR::Node::getEvaluationPriority(), which itself is called in many other places. I'm having trouble reproducing this failure on my local Z VM, but once I can figure that out I'll be able to get a much better idea of what is causing this failure.

@0xdaryl
Copy link
Contributor

0xdaryl commented Jun 15, 2023

Just tagging @r30shah for his awareness.

@r30shah
Copy link
Contributor

r30shah commented Jun 15, 2023

If this helps, that call to complex genericAnalyser is coming from generateS390CompareAndBranchOpsHelper probably one of the call from [1]. That helper function is used to generate branch instructions based on the comparison of opcodes (Like if compares). @dylanjtuttle The failure also generates jitdump, may be you have that tree captured in the jitdump. In any case, let me know if you need any help with debugging this on Z.

[1]. https://github.com/eclipse/omr/blob/ebd1c96347609b37244329f7f07525fef331e318/compiler/z/codegen/OMRTreeEvaluator.cpp#L5209-L5236

@dylanjtuttle
Copy link
Contributor Author

Thanks @r30shah! A few updates:

  • The node that causes this failure is n284n, which is an ifacmpne (see the jitdump)
  • The failure occurs during instruction selection: see the very end of the attached jitdump:
============================================================
; Live regs: GPR=5 FPR=0 VRF=0 {&GPR_     0x2008ef773a0, &GPR_     0x2008ef77330, GPR_     0x2008ef772c0, &GPR_     0x2008ef77250, &GPR_     0x2008ef771e0}
------------------------------
 n284n    (  0)  ifacmpne --> block_15 BBStart at n280n (ProfiledGuard/VftTest )                      [     0x2008ec84880] bci=[0,0,-] rc=0 vc=767 vn=- li=5 udi=- nc=3 flg=0x1020
 n282n    (  1)    aloadi  <vft-symbol>[#322  Shadow] [flags 0x18607 0x0 ] (X!=0 )                    [     0x2008ec847e0] bci=[-1,48,-] rc=1 vc=767 vn=- li=5 udi=- nc=1 flg=0x4
 n611n    (  9)      ==>aRegLoad (in &GPR_     0x2008ef771e0) (X!=0 X>=0 SeenRealReference )
 n283n    (  1)    aconst 0x11c700 (java/util/WeakHashMap$Entry.class) (classPointerConstant X!=0 X>=0 X<=0 )  [     0x2008ec84830] bci=[0,0,-] rc=1 vc=767 vn=- li=5 udi=- nc=0 flg=0x10304
 n618n    (  1)    GlRegDeps ()                                                                       [     0x2008ec8b0e0] bci=[0,0,-] rc=1 vc=767 vn=- li=5 udi=- nc=5 flg=0x20
 n611n    (  9)      ==>aRegLoad (in &GPR_     0x2008ef771e0) (X!=0 X>=0 SeenRealReference )
 n612n    (  8)      ==>aRegLoad (in &GPR_     0x2008ef77250) (X>=0 SeenRealReference )
 n613n    (  5)      ==>iRegLoad (in GPR_     0x2008ef772c0) (cannotOverflow SeenRealReference )
 n614n    (  5)      ==>aRegLoad (in &GPR_     0x2008ef77330)
 n615n    (  5)      ==>aRegLoad (in &GPR_     0x2008ef773a0)
------------------------------
aload reg contains ref: 0
</recompilation rc=0 queued=1>
</jitDump>

note the offending node n284n.

As I mentioned in my previous comment, setEvaluationPriority() is called from OMR::CodeGenerator::whichChildToEvaluate() and getEvaluationPriority(). We can eliminate getEvaluationPriority() as a suspect because we only reach the call if _unionA._register == 0, which we already know (from my previous comment) isn't true since we reached the assert. That means the error likely occurs either in whichChildToEvaluate() or something that calls it.

@dylanjtuttle
Copy link
Contributor Author

dylanjtuttle commented Jun 15, 2023

whichChildToEvaluate() loops through all of the children of a node and returns the index of the child with the highest priority. Right before it returns that index, it sets the priority of the node to one more than the priority of that highest priority child. Normally this wouldn't be an issue, since (I assume) we evaluate nodes in a post-order fashion. But in order for us to fail this assert, it seems like we must have gotten into a situation where we evaluated a node before at least one of its children. I think this assertion failure is trying to tell us that we're generating code for an ifacmpne instruction, but we haven't evaluated the values that we're supposed to compare or the address we're supposed to jump to yet?

@dylanjtuttle
Copy link
Contributor Author

I still haven't been able to find anything conclusive, but I thought I would give an update since I have made some interesting discoveries.

I've been having issues reproducing this assert on my Z VM, it would fire during the build but not when I guarded it with an environment variable (a method suggested by Brad here that I've been using without issues until now). After spending a few days trying to fix this, I eventually went back to my old method of continuously rebuilding until the assert fires, but now I know that I can check the jitdump and add extra debug statements to it. Here is the result of that, with some extra comments I've added after the fact:

[     0x3ff42376f60]                          fence   Relative [ 000003FF42070240 ] BBStart <block_5> (frequency 2775) (extension of previous block) (in loop 0)

============================================================
; Live regs: GPR=5 FPR=0 VRF=0 {GPR_     0x3ff58574260, &GPR_     0x3ff585741f0, &GPR_     0x3ff58574180, &GPR_     0x3ff58574110, &GPR_     0x3ff585740a0}
------------------------------
 n303n    (  0)  ifacmpne --> block_15 BBStart at n299n (ProfiledGuard/VftTest )                      [     0x3ff58283e70] bci=[0,0,-] rc=0 vc=764 vn=- li=5 udi=- nc=3 flg=0x1020
 n301n    (  1)    aloadi  <vft-symbol>[#322  Shadow] [flags 0x18607 0x0 ] (X!=0 )                    [     0x3ff58283dd0] bci=[-1,48,-] rc=1 vc=764 vn=- li=5 udi=- nc=1 flg=0x4
 n631n    (  9)      ==>aRegLoad (in &GPR_     0x3ff58574180) (X!=0 X>=0 SeenRealReference )
 n302n    (  1)    aconst 0xdfd700 (java/util/WeakHashMap$Entry.class) (classPointerConstant X!=0 X>=0 X<=0 )  [     0x3ff58283e20] bci=[0,0,-] rc=1 vc=764 vn=- li=5 udi=- nc=0 flg=0x10304
 n636n    (  1)    GlRegDeps ()                                                                       [     0x3ff5828a680] bci=[0,0,-] rc=1 vc=764 vn=- li=5 udi=- nc=5 flg=0x20
 n629n    (  5)      ==>aRegLoad (in &GPR_     0x3ff585740a0)
 n630n    (  5)      ==>aRegLoad (in &GPR_     0x3ff58574110)
 n631n    (  9)      ==>aRegLoad (in &GPR_     0x3ff58574180) (X!=0 X>=0 SeenRealReference )
 n632n    (  8)      ==>aRegLoad (in &GPR_     0x3ff585741f0) (X>=0 SeenRealReference )
 n633n    (  5)      ==>iRegLoad (in GPR_     0x3ff58574260) (cannotOverflow SeenRealReference )
------------------------------
<!--Evaluation of offending node-->
EVALUATING NODE n303n

<!--Evaluation of node's first child-->
EVALUATING NODE n301n

<!--Call to checkAndAllocateReferenceRegister() in Z TreeEvaluator-->
aload reg contains ref: 0

<!--Evaluation of node's first grandchild-->
EVALUATING NODE n631n

<!--Evaluation of node's second child-->
EVALUATING NODE n302n

<!--A second evaluation of node's first child?-->
EVALUATING NODE n301n

<!--Call to setEvaluationPriority on node's third child-->
NODE n636n IN setEvaluationPriority, _register: 0x00000000
isAssert is NOT NULL, assert should be enabled... node n636n

<!--A second call to setEvaluationPriority on node's third child-->
NODE n636n IN setEvaluationPriority, _register: 0x00000001
isAssert is NOT NULL, assert should be enabled... node n636n

<!--The offending call to setEvaluationPriority on node-->
whichChildToEvaluate: calling setEvaluationPriority on node n303n
NODE n303n IN setEvaluationPriority, _register: 0x582b38e0
isAssert is NOT NULL, assert should be enabled... node n303n
NODE n303n HAS ALREADY BEEN EVALUATED
</recompilation rc=0 queued=1>
</jitDump>

There are a few things to take away from this. First, the offending node which causes the assertion has a different global index (n303n) than the one I've identified from previous builds on Jenkins (n284n). More interestingly, it's also different from the one that shows up in the stack trace when I modify the assertion message:

Assertion failed at /root/code/openj9-openjdk-jdk11/omr/compiler/il/OMRNode.cpp:4505: isAssert == NULL
VMState: 0x0005ff04
	setEvaluationPriority cannot be called after the node (n284n) has already been evaluated
compiling java/util/WeakHashMap.get(Ljava/lang/Object;)Ljava/lang/Object; at level: warm

As it turns out (thanks you @bhavanisn for the help), this WeakHashMap.get method gets recompiled. During the first compilation, we have the one and only occurrence of the infamous n284n:

n55n      BBStart <block_5> (freq 2775) (extension of previous block) (in loop 0)             [     0x3ffa287f0f0] bci=[-1,47,-] rc=0 vc=764 vn=- li=5 udi=- nc=0
n284n     ifacmpne --> block_15 BBStart at n280n (ProfiledGuard/VftTest )                     [     0x3ffa2883880] bci=[0,0,-] rc=0 vc=764 vn=- li=5 udi=- nc=3 flg=0x1020
n282n       aloadi  <vft-symbol>[#322  Shadow] [flags 0x18607 0x0 ] (X!=0 )                   [     0x3ffa28837e0] bci=[-1,48,-] rc=1 vc=764 vn=- li=5 udi=- nc=1 flg=0x4
n612n         ==>aRegLoad
n283n       aconst 0x1ea1700 (java/util/WeakHashMap$Entry.class) (classPointerConstant X!=0 X>=0 X<=0 )  [     0x3ffa2883830] bci=[0,0,-] rc=1 vc=764 vn=- li=5 udi=- nc=0 flg=0x10304
n617n       GlRegDeps ()                                                                      [     0x3ffa288a090] bci=[0,0,-] rc=1 vc=764 vn=- li=5 udi=- nc=5 flg=0x20
n610n         ==>aRegLoad
n611n         ==>aRegLoad
n612n         ==>aRegLoad
n613n         ==>aRegLoad
n614n         ==>iRegLoad
n277n     BBEnd </block_5>                                                                    [     0x3ffa2883650] bci=[0,1,-] rc=0 vc=764 vn=- li=5 udi=- nc=0

an ifacmpne inside block 5 which may branch to block 15, just like in my previous comments. Our other infamous node n303n appears in the second compilation, and as you can see in the excerpt I pasted above, it is also an ifacmpne inside block 5 which may branch to block 15.

  • If the global index of a node is changing, why does traceMsg print the new index, but the assertion message print the old index?
  • n303n does get evaluated, but setEvaluationPriority doesn't get called twice (the line NODE n303n IN setEvaluationPriority only appears once in the jitdump). I can see other nodes getting evaluated without first having their priority set. If n303n was already evaluated without needing to have its priority set first, why would this function ever be called on this type of node in the first place?
  • Node n636n has two values of _unionA._register when setEvaluationPriority is called on it. First 0 (not yet evaluated, priority unknown), and then 1 (not yet evaluated, priority = 1). However, when setEvaluationPriority is called on n303n, it has a value of 1479227616 base 10 (evaluated). As far as I know, this could be a perfectly normal priority, but it doesn't follow the pattern of simple small integers, so I'm a little suspicious. Is it possible we're accessing uninitialized memory? The number does always end up being even (which is what causes the assertion to fail), so maybe not.

@r30shah
Copy link
Contributor

r30shah commented Jun 21, 2023

Hi @dylanjtuttle with respect to node numbers, are you checking the glibal index from the jit dump and matchi git with the assert message?

If this is the case, a jitdump is a different compilation than the failing one. As you fail with assert during jit compilation, it triggers recompilation of the method with tracing to provide more diagnostic information.

Now coming to ASSERT, I did not get enough time to check out the jitdump, but on surface, calling on set evaluation priority on branch node is something I have difficult time understanding.
One suggestion I would give to checkout in parallel is to check if similar analyzer is called on x86 on ifacmp... node.

I will checkout jitdump you provided tomorrow and touch base with you.

@dylanjtuttle
Copy link
Contributor Author

dylanjtuttle commented Jun 21, 2023

Hey @r30shah, I was matching the global index from the jitdump to the assert message. Thanks for the clarification!

As for setEvaluationPriority, I ran a few methods on x86 to see when it gets called and on what type of nodes. I learned that setEvaluationPriority can get called on ifacmpne, so that part probably isn't the problem:

============================================================
; Live regs: GPR=4 FPR=0 VRF=0 {&GPR_0146, &GPR_0130, GPR_0129, &GPR_0128}
------------------------------
 n131n    (  0)  ifacmpne --> block_9 BBStart at n129n ()                                             [0x7fb6ae4eb8b0] bci=[-1,81,632] rc=0 vc=1183 vn=- li=8 udi=- nc=3 flg=0x20
 n123n    (  2)    ==>l2a (in &GPR_0146)
 n648n    (  2)    ==>aRegLoad (in &GPR_0128) (SeenRealReference )
 n655n    (  1)    GlRegDeps ()                                                                       [0x7fb6ae4f5c70] bci=[-1,81,632] rc=1 vc=1183 vn=- li=8 udi=- nc=4 flg=0x20
 n654n    (  1)      PassThrough rdi                                                                  [0x7fb6ae4f5c20] bci=[-1,74,632] rc=1 vc=1183 vn=- li=8 udi=- nc=1
 n123n    (  2)        ==>l2a (in &GPR_0146)
 n648n    (  2)      ==>aRegLoad (in &GPR_0128) (SeenRealReference )
 n649n    (  1)      ==>iRegLoad (in GPR_0129) (cannotOverflow SeenRealReference )
 n650n    (  2)      ==>aRegLoad (in &GPR_0130) (X!=0 X>=0 SeenRealReference )
------------------------------
EVALUATING node n131n
setEvaluationPriority on node n655n, _unionA._register = 0x0000000000000000
Updated register for node n655n, _unionA._register = 0x0000000000000001
setEvaluationPriority on node n654n, _unionA._register = 0x0000000000000000
Updated register for node n654n, _unionA._register = 0x0000000000000001
setEvaluationPriority on node n654n, _unionA._register = 0x0000000000000001
Updated register for node n654n, _unionA._register = 0x0000000000000003
setEvaluationPriority on node n655n, _unionA._register = 0x0000000000000001
Updated register for node n655n, _unionA._register = 0x0000000000000005
Calling setEvaluationPriority on node n131n...
setEvaluationPriority on node n131n, _unionA._register = 0x0000000000000000
Updated register for node n131n, _unionA._register = 0x0000000000000007
EVALUATING node n648n
EVALUATING node n123n
EVALUATING node n655n
EVALUATING node n654n
EVALUATING node n123n
EVALUATING node n648n
EVALUATING node n649n
EVALUATING node n650n

It is definitely odd that according to my debug statements, this ifacmpne node n131n seems to get evaluated before its evaluation priority is set, but note that the value of _unionA._register is 0 when the call happens (which means that the node hasn't been evaluated and its priority is unknown) and then 5 after the method finishes (which means that the node hasn't been evaluated and its priority is 5). You could probably chalk up the weird order of the print statements to concurrency weirdness?

In any case, I think the important thing to take away here is the value of _unionA._register. I switched its format specifier from 0x%08x to 0x%p and cast it to a void pointer, just to see what happened (in case it was confusing why there were a different number of digits than in the last jitlog). Regardless of its format specifier, every value of _unionA._register that I've seen except for n303n has a value of zero or a simple, small odd number, like 1, 3, or 5. n303n has a different value every run, but it's always a large even number. With the new format specifier, it looks like 0x000003FF404B4930.

I am now essentially certain that _unionA._register is either getting set/initialized incorrectly, or isn't getting set/initialized at all, and this is the reason for the assertion failure.

@r30shah
Copy link
Contributor

r30shah commented Jun 21, 2023

I am now essentially certain that _unionA._register is either getting set/initialized incorrectly, or isn't getting set/initialized at all, and this is the reason for the assertion failure.

Yes, but I would like to know how it is getting set. One way it can get set is by calling a setRegister on the node [1], but if you see that function, in the beginning it has a Fatal Assert for someone calling it for branch node so definitely it is not set that way. Now when node is created, the _register field in the unionA should be NULL.

Btw I can get it failing on my environment with debug build and checking out the _register field when we fail with Assert, it is not NULL, but I am not sure it contains a valid data as well.

(gdb) p _unionA._register
$5 = (TR::Register *) 0x3ffda528550
(gdb) p *( _unionA._register)
$6 = {<OMR::Z::Register> = {<OMR::Register> = {_vptr.Register = 0x0, _flags = {_flags = 1023},
      _backingStorage = 0x100000007, _pinningArrayPointer = 0x403ff00000000, _assignedRegister = 0x3ffda45fa00,
      _siblingRegister = 0x3ffda49d320, _startOfRange = 0x0, _endOfRange = 0x400000000,
      _startOfRangeNode = 0x12c900, _totalUseCount = 0, _futureUseCount = 0, _outOfLineUseCount = 0,
      _association = 0, _kind = 1023, _index = 3679438272}, _liveRegisterInfo = {
      _liveRegister = 0x100000000000000, _interference = 16777216}, _memRef = 0x0}, <No data fields>}

I created a small unit test that could have ProfiledGuard/VftTest and was able to reproduce this 100% of the time. You can also create one (Or I can provide you one offline) by having a virtual calls in your unit test method can call that method enough times such that inliner inlines the method in higher opt level with profiled-guard test.

[1]. https://github.com/eclipse/omr/blob/59b55e309607f81e20ae068c62c06d9b36359d04/compiler/il/OMRNode.cpp#L4428

@dylanjtuttle
Copy link
Contributor Author

dylanjtuttle commented Jun 21, 2023

I think it's actually not getting set. The value returned by virtualGuardInfo() [1] is non-null, which means that _unionA is actually representing a guard, not a register. Therefore, trying to access _unionA._register would sensibly return garbage uninitialized memory.***

[1]. virtualGuardInfo()

*** I was incorrect about this, this is not how unions work!

@r30shah
Copy link
Contributor

r30shah commented Jun 21, 2023

Yes, So in failing case, it is reading the memory content that actually represents a valid TR_VirtualGuard* and reads it as TR::Register*. Looking at the test, I see that it check if low bit is tagged or not so if node is not evaluated or priority unknown, I think testing _unionA._register == 0 is incorrect in first place, and wonder it should be _unionA._register & 1 == 0? I incorrectly interpreted the comments in the code. This comparison would not work.

@dylanjtuttle
Copy link
Contributor Author

dylanjtuttle commented Jun 21, 2023

I don't think changing the tests is the solution, because _unionA._register isn't real data, it's just garbage (_unionA._guard has the real data in it). Maybe we could check if virtualGuardInfo() == NULL first, but*** I don't think we should be calling setEvaluationPriority on this type of node at all. setEvaluationPriority is getting called from OMR::CodeGenerator::whichChildToEvaluate, but I don't know where that is getting called from yet. That's probably the next thing to figure out.

*** I was still incorrect about this, this is still not how unions work!

@bhavanisn
Copy link
Contributor

@r30shah In this case where it fails is evaluating an if and in setRegister it is just returning NULL in case of if. Would it help if we set _unionA._register =NULL before the return. This way it will be initialized.

@r30shah
Copy link
Contributor

r30shah commented Jun 21, 2023

@r30shah In this case where it fails is evaluating an if and in setRegister it is just returning NULL in case of if. Would it help if we set _unionA._register =NULL before the return. This way it will be initialized.

@bhavanisn can you point where you want to set this to NULL?

Actually now taking a detailed look at how _unionA._register is used, I think even #17559 (comment) would not work. A 0 in the _register means, that node is not evaluated and priority is unknown. A low bit tagged 1 means, the node is not evaluated and this field contains a priority. In rest of the case, it would hold a valid TR::Register * meaning node is evaluated.

Now, taking a step back before jumping into the fix, I would like to understand why there is a behavioral difference for profiled guard (Or virtual guards) evaluation between Z and X. Especially on Z, when it fails, bytes in the Union represents seemed a valid Virtual Guard. Looking at the #17559 (comment), it is actually NULL on X. We should understand this difference first.

@bhavanisn
Copy link
Contributor

@r30shah I was thinking to set _unionA.register=NULL here.

TR::Register *
OMR::Node::setRegister(TR::Register *reg)
   {
   if (_opCode.isIf())
      {
      TR_ASSERT_FATAL_WITH_NODE(self(), reg == NULL, "if node with register");
  <<<<Set some initial value for _unionA.register here>>>>
      return NULL;
      }

@r30shah
Copy link
Contributor

r30shah commented Jun 21, 2023

@bhavanisn Ahh, I see, okk, that could work, though I would like to understand why on X it is NULL and on Z it is not.

@dylanjtuttle dylanjtuttle self-assigned this Jul 21, 2023
@hzongaro
Copy link
Member

Adding this belatedly to the 0.41 release - it would be good to at least be able to build sanity.functional with PROD_WITH_ASSUMES, even if there are still crashes running it.

@hzongaro
Copy link
Member

The problem can also be reproduced on x - removing the arch:z label

@hzongaro hzongaro removed the arch:z label Aug 28, 2023
@r30shah
Copy link
Contributor

r30shah commented Aug 28, 2023

If this is reproducible on x as well, I think we should think in the direction @bhavanisn suggested in #17559 (comment).

i.e, we should initialize the value before returning NULL. Though with what, that is something we should check as I see in the code base, interpretation of the _unionA.register is different in case if it not evaluated / evaluated.

@hzongaro
Copy link
Member

I'm worried that there is no guarantee _unionA._guard will not be used during code generation. Taking a look at this, I wonder whether a better approach would be to have OMR::Node::setEvaluationPriority test isIf() as well - or perhaps having it check ILOpCode::isTreeTop(). Would setting the evaluation priority ever make sense for a treetop node?

@r30shah
Copy link
Contributor

r30shah commented Aug 28, 2023

I'm worried that there is no guarantee _unionA._guard will not be used during code generation. Taking a look at this, I wonder whether a better approach would be to have OMR::Node::setEvaluationPriority test isIf() as well - or perhaps having it check ILOpCode::isTreeTop(). Would setting the evaluation priority ever make sense for a treetop node?

True, looking at the code-gen helpers where we would need this information. Most of the code-gens would call findVirtualGuardInfo [1], which would call the virtualGuardInfo [2]. So You are right, initializing to some value would still cause issue as code-gen do rely on that information shared in the union.

Further thinking, if we should just check isIf or actually skip setting evaluation priority on tree-top node. You are right, it does not make sense to set evaluation priority on the tree-top node.

[1]. https://github.com/eclipse/omr/blob/9659cbcdcef670fda03609c058e5cb04c58c6511/compiler/compile/OMRCompilation.cpp#L1553C19-L1553C39
[2]. https://github.com/eclipse/omr/blob/9659cbcdcef670fda03609c058e5cb04c58c6511/compiler/il/OMRNode.cpp#L6759

@bhavanisn
Copy link
Contributor

If this is reproducible on x as well, I think we should think in the direction @bhavanisn suggested in #17559 (comment).

i.e, we should initialize the value before returning NULL. Though with what, that is something we should check as I see in the code base, interpretation of the _unionA.register is different in case if it not evaluated / evaluated.

Sometime back @dylanjtuttle did try initializing the _unionA.register to NULL but it still caused the issue. It is right to look at other options as suggested by @r30shah

@hzongaro
Copy link
Member

I'm going to move this to the 0.43 release, so you won't have to "double deliver" any fix.

@dylanjtuttle
Copy link
Contributor Author

dylanjtuttle commented Sep 13, 2023

On my local x86 VM, I replaced the TR_ASSERT with a print statement that prints the opcode, the global index, and whether the node is a treetop, since if this assert condition is satisfied more than once. As it turns out, it is. Here's a small excerpt from the console log:

setEvaluationPriority: node (n96n), opcode is ifacmpne, node is a tree top
setEvaluationPriority: node (n739n), opcode is ifacmpne, node is a tree top
setEvaluationPriority: node (n758n), opcode is ifacmpne, node is a tree top
setEvaluationPriority: node (n219n), opcode is ifacmpne, node is a tree top
setEvaluationPriority: node (n389n), opcode is ifacmpne, node is a tree top
setEvaluationPriority: node (n351n), opcode is ifacmpne, node is a tree top

The exact number of occurrences seems to vary based on how much code is being compiled during the JVM build, but the highest number I've gotten is 2,818 individual situations in which this assert "would" fire if it were enabled. There is a subset of repeat nodes that have setEvaluationPriority called on them two, three, or more times, but the majority are unique. I wrote a python script which verifies that every single instance involves an ifacmpne node which is a tree top.

It would probably be useful to do this analysis on Z as well just to make sure we see the same results, but it seems pretty conclusive that this problem occurs specifically and exclusively for tree top ifacmpne nodes. With that in mind, not calling setEvaluationPriority if isIf and isTreeTop are true like Rahil @r30shah was suggesting seems like a plausible solution.

@dylanjtuttle
Copy link
Contributor Author

dylanjtuttle commented Sep 13, 2023

The nodes that trigger this assertion failure (on x86) follow the path:

  • OMR::X86::TreeEvaluator::integerIfCmpneEvaluator
  • OMR::X86::TreeEvaluator::compareIntegersForEquality
  • TR_X86CompareAnalyser::integerCompareAnalyser
  • OMR::CodeGenerator::whichChildToEvaluate
  • OMR::Node::setEvaluationPriority

whichChildToEvaluate is called from integerCompareAnalyser because both children of the compare node need to be evaluated in order to compare them, so we need to decide which child to evaluate first. That makes perfect sense. setEvaluationPriority is called on the compare node from whichChildToEvaluate because once the evaluation priority of the children is set, we set the evaluation priority of their parent to be one more than the priority of the highest priority child. That makes perfect sense if the parent is, for example, an arithmetic or boolean operator, but doesn't make sense if the parent is a tree top ifacmpne node.

Thus, I propose this is where we add the check to ensure isIf and isTreeTop are false before we call setEvaluationPriority on the parent.

That is, change OMR::CodeGenerator::whichChildToEvaluate, line 298 to:

...
if (!node->getOpCode().isIf() && !node->getOpCode().isTreeTop())
   {
   node->setEvaluationPriority(nodePriority);
   }
return bestChild;

Is it worth trying to figure out if there are any other cases in which we shouldn't call setEvaluationPriority?

@hzongaro
Copy link
Member

Is it worth trying to figure out if there are any other cases in which we shouldn't call setEvaluationPriority?

It seems that the only real restriction on _unionA._register is that it not be used by nodes that have an opcode for which isIf() is true. I don't think it's worth checking for other cases.

Although I had said earlier that setting the evaluation priority might not make sense for nodes for which isTreeTop() is true, there are places where a node that has isTreeTop() might not be at the top of a tree - for instance, such a node can appear as a child of a compressedrefs node. I don't know whether there are — or could be in the future — cases where a node for which isTreeTop() is true could appear as a child with sibling nodes. In such a case, setting evaluation priority might make sense. Given that, I would suggest erring on the side of caution by testing only for isIf() before calling setEvaluationPriority.

@dylanjtuttle
Copy link
Contributor Author

That makes sense! I will test that change and report back.

@dylanjtuttle
Copy link
Contributor Author

I've tested the change and found a variety of CRIU failures on power and Z. On Z, it seems like the tests are failing because CRIU is not enabled, which seems strange because not all of the CRIU tests are failing. On power, the failures seem to mostly involve the following error messages:

  • Could not dump the JVM processes
  • java.io.IOException: Cannot run program "gdb"
  • Assertion failed at /home/jenkins/workspace/Build_JDK11_ppc64le_linux_Personal/openj9/runtime/compiler/control/CompilationThread.cpp:2929: !isCheckpointInProgress()

@hzongaro
Copy link
Member

@dylanjtuttle, I think it's very unlikely that those failures would be related to your change.

@dylanjtuttle
Copy link
Contributor Author

Daryl agreed when I spoke to him yesterday, but wanted to review the change. I'm going to open a PR and see what he thinks.

@hzongaro
Copy link
Member

hzongaro commented Sep 22, 2023

From an off-line conversation, I understand @0xdaryl had some concerns about preventing setEvaluationPriority from being called for any node for which isIf() is true.

I spent a little time looking at the history of the _unionA field, and learned that the _guard field was only added to _unionA last year. I had been assuming that it had a much longer history, and that it really was expected that the evaluation priority shouldn't be used for in "if" node. I see I was mistaken, so perhaps a different reworking would be more appropriate - though I don't want to suggest something drastic that would involve pulling _guard out of _unionA. . . .

@hzongaro
Copy link
Member

Fixed by OMR pull request eclipse-omr/omr#7492. Closing.

Copy link

Issue Number: 17559
Status: Closed
Actual Components: comp:jit
Actual Assignees: No one :(
PR Assignees: No one :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants