Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IProfiler assertion failure while building JDK with compiler in debug mode: some entries were still locked on shutdown #20344

Closed
cjjdespres opened this issue Oct 15, 2024 · 4 comments · Fixed by #20375

Comments

@cjjdespres
Copy link
Contributor

cjjdespres commented Oct 15, 2024

I tried building the compiler in debug mode, with the environment variable OPENJ9_JAVA_OPTIONS='-Xjit:ignoreAssert -Xaot:ignoreAssert' set so that the full JDK would at least be built, and got the following assertion failure:

=== Output from failing command(s) repeated here ===
* For target support_images_jmods__create_java.sql.rowset.jmod_exec:
Assertion failed at /home/despresc/dev/openj9-openjdk-jdk21/openj9/runtime/compiler/control/HookedByTheJit.cpp:4164: lockedEntries == 0
	some entries were still locked on shutdown

Unhandled exception
Type=Unhandled trap vmState=0x00040000
J9Generic_Signal_Number=00000108 Signal_Number=00000005 Error_Value=00000000 Signal_Code=fffffffa
Handler1=00007FB78330DE30 Handler2=00007FB782E80480
RDI=0000000000000002 RSI=00007FB7894C8AB0 RAX=0000000000000000 RBX=0000000000000005
RCX=00007FB788C7FBAF RDX=0000000000000000 R8=0000000000000000 R9=00007FB7894C8AB0
R10=0000000000000008 R11=0000000000000246 R12=00007FB7894CA248 R13=00007FB7894CA270
R14=00007FB7894CA270 R15=00007FB78847D018
RIP=00007FB788C7FBAF GS=0000 FS=0000 RSP=00007FB7894C8AB0
EFlags=0000000000000246 CS=0033 RBP=0000000000000000 ERR=0000000000000000
TRAPNO=0000000000000000 OLDMASK=0000000000000000 CR2=0000000000000000
xmm0=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm1=0000003000000010 (f: 16.000000, d: 1.018558e-312)
xmm2=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm3=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm4=00000000000000ff (f: 255.000000, d: 1.259867e-321)
xmm5=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm6=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm7=00007fb7894ca810 (f: 2303502336.000000, d: 6.937979e-310)
xmm8=4022000000000000 (f: 0.000000, d: 9.000000e+00)
xmm9=4022aaaaaaaaaaab (f: 2863311616.000000, d: 9.333333e+00)
xmm10=0014195b1403141f (f: 335746080.000000, d: 2.795116e-308)
xmm11=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm12=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14=0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15=0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/lib64/libpthread.so.0
Module_base_address=00007FB788C6D000 Symbol=raise
Symbol_address=00007FB788C7FAA0
Target=2_90_20241008_000000 (Linux 4.18.0-553.22.1.el8_10.x86_64)
CPU=amd64 (8 logical CPUs) (0x7c7916000 RAM)
----------- Stack Backtrace -----------
raise+0x10f (0x00007FB788C7FBAF [libpthread.so.0+0x12baf])
_ZN2TR4trapEv+0x40 (0x00007FB78156DD40 [libj9jit29.so+0x942d40])
 (0x00007FB78156DECC [libj9jit29.so+0x942ecc])
 (0x00007FB78156DE8A [libj9jit29.so+0x942e8a])
JitShutdown+0x2fe (0x00007FB780E11E7E [libj9jit29.so+0x1e6e7e])
J9VMDllMain+0x20a0 (0x00007FB780E0CEE0 [libj9jit29.so+0x1e1ee0])
runJ9VMDllMain+0x121 (0x00007FB78332AFA1 [libj9vm29.so+0x5bfa1])
runShutdownStage+0x10d (0x00007FB7833230CD [libj9vm29.so+0x540cd])
exitJavaVM+0x1fd (0x00007FB78332214D [libj9vm29.so+0x5314d])
JVM_Halt_Impl+0x63 (0x00007FB77FDC28D3 [libjclse29.so+0x6a8d3])
ffi_call_unix64+0x52 (0x00007FB78348803A [libj9vm29.so+0x1b903a])
ffi_call_int+0x39a (0x00007FB78348790A [libj9vm29.so+0x1b890a])
_ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x162cb (0x00007FB78338AEBB [libj9vm29.so+0xbbebb])
bytecodeLoopCompressed+0xd8 (0x00007FB783374BE8 [libj9vm29.so+0xa5be8])
 (0x00007FB78341CEC2 [libj9vm29.so+0x14dec2])

I thought it was a little strange that there was an assertion failure at all, but it seems that in the definition in OMR:

https://github.com/eclipse/omr/blob/46f74f6599f42ec07c72791f752738d10f814fd1/compiler/infra/Assert.cpp#L142

only the options of the current TR::Compilation are checked, and this assert evidently happened at shutdown, outside of any compilation. That function should probably check the top-level TR::Options::getCmdLineOptions() as a fallback if the comp doesn't exist, but that's a separate issue.

This is the first time I've rebased onto the latest master in a few weeks, so this may have arisen due to some commit in that time frame. Though, I've not built with -Xjit:ignoreAssert -Xaot:ignoreAssert before, so this may be older.

Copy link

Issue Number: 20344
Status: Open
Recommended Components: comp:vm, comp:gc, comp:build
Recommended Assignees: jasonfengj9, tajila, keithc-ca

@cjjdespres
Copy link
Contributor Author

My guess is that this is very rare, and I can't reproduce it. The core that I got from the failure is unhelpful as well, since the assert is located just after the releaseAllEntries() call, which releases all the locks in the entries, making it so that I can't even tell what entries were still locked at shutdown.

@cjjdespres
Copy link
Contributor Author

It definitely seems like the locks on entries are always released if acquired. The only time a lock isn't released in the scope of a lockEntry() is in TR_IPBCDataCallGraph::canBePersisted() (and canBeSerialized, but that should only run if JITServer is enabled), and that will have happened only if canBePersisted returned IPBC_ENTRY_CAN_PERSIST.

So maybe what had to have happened was:

  1. Thread A calls findOrCreateEntry, looking for a profiling entry for a particular pc. One is not found.
  2. Thread B calls findOrCreateEntry for the same pc, and again a profiling entry is not found.
  3. Thread A creates an entry for pc and adds it to the hash table.
  4. Thread C calls TR_IProfiler::persistIprofileInfo, finds pc and the entry from (3) in walkILTreeForEntries, sees that it is valid and can be persisted, and adds pc to the pcEntries array it's accumulating. At this point the entry from (3) is locked.
  5. Thread D creates an entry for a PC in the same bucket as pc.
  6. Thread B creates a new entry for pc and adds it to the hash table.
  7. Thread C gets to the point where the entries in pcEntries are released and uses profilingSample to look up the entry for pc. It finds the entry from (6) and "releases" it, which of course does nothing since that entry isn't locked. The entry from (3) remains locked.

If that's the case, the assert probably isn't an indication that anything is too wrong. At least, the situation is no worse than the race between threads A and B above, which is known and documented in a comment in findOrCreateEntry. I think the lock remaining set will only mean that entry (3) will never be eligible for storage in the SCC or serialization for delivery to a JITServer. Also, that entry will be shadowed by entry (6) for any search of the hashtable that happens after (6) gets added, so this race will not be visible for very long.

Copy link

Issue Number: 20344
Status: Closed
Actual Components: None :(
Actual Assignees: No one :(
PR Assignees: cjjdespres

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant