AIX SIGQUIT heap dump trigger failed and caused crash with 'Invalid JIT return address' #20036

Closed
RobMcCreadyJHA opened this issue Aug 22, 2024 · 13 comments

Comments

@RobMcCreadyJHA

Java -version output

java 11.0.21 2023-10-17
IBM Semeru Runtime Certified Edition 11.0.21.0 (build 11.0.21+9)
Eclipse OpenJ9 VM 11.0.21.0 (build openj9-0.41.0, JRE 11 AIX ppc64-64-Bit Compressed References 20231122_711 (JIT enabled, AOT enabled)
OpenJ9 - 461bf3c
OMR - 5eee6ad9d
JCL - 6f4cc08025 based on jdk-11.0.21+9)

AIX LPAR on a Power10
M/C Model Name : 9105-22A
Firmware ver. : FW1030.30 (ML1030_065)
AIX version : 7200-05-07-2346

Summary of problem

The Java service is configured with '-Xdump:java+heap:events=user'. While troubleshooting an issue, we ran kill -3 to get a heap dump, and the service crashed instead with the following output. We have used kill -3 on this service in the past without issue.

JVMDUMP039I Processing dump event "user", detail "" at 2024/08/21 16:31:11 - please wait.
JVMDUMP027W The requested heapdump has not been produced because another component is holding the VM exclusive lock.
JVMDUMP032I JVM requested Java dump using 'javacore.*****.txt' in response to an event

*** Invalid JIT return address 00000007C2958870 in 00000001100A79A0
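
For reference, kill -3 delivers SIGQUIT, which is the signal behind the "user" dump event named in the -Xdump option. The following is a minimal Python sketch of the same trigger, assuming a POSIX host. The function name `request_user_dump` and its `sender` parameter are hypothetical, introduced only for illustration; they are not part of OpenJ9.

```python
import os
import signal


def request_user_dump(pid, sender=os.kill):
    """Send SIGQUIT (signal 3) to a process, like `kill -3 <pid>`.

    `sender` is injectable so the call can be dry-run without signalling
    a real process; it is an illustrative parameter, not an OpenJ9 API.
    """
    if pid <= 0:
        # os.kill treats 0 and negative PIDs as process groups; refuse them.
        raise ValueError("pid must be a positive process ID")
    sender(pid, signal.SIGQUIT)
    return int(signal.SIGQUIT)
```

Passing a recording function as `sender` lets the helper be exercised safely; with the default `os.kill` it signals the target process for real.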

Diagnostic files

From the core dump file:

NULL
3XMTHREADINFO Anonymous native thread
3XMTHREADINFO1 (native thread ID:0x698024D, native priority: 0x0, native policy:UNKNOWN)
3XMTHREADINFO3 Native callstack:
4XENATIVESTACK omrintrospect_threads_startDo_with_signal+0x678 (0x0900000010F18C7C [libj9prt29.so+0x7ec7c])
4XENATIVESTACK protectedStartDo+0x2c (0x0900000010FBE890 [libj9dmp29.so+0x2a890])
4XENATIVESTACK omrsig_protect+0x4fc (0x0900000010EF91E0 [libj9prt29.so+0x5f1e0])
4XENATIVESTACK _ZN18JavaCoreDumpWriter28writeThreadsWithNativeStacksEv+0xc7c (0x0900000010FBE380 [libj9dmp29.so+0x2a380])
4XENATIVESTACK protectedWriteThreadsWithNativeStacks+0x10 (0x0900000010FBBAD4 [libj9dmp29.so+0x27ad4])
4XENATIVESTACK omrsig_protect+0x4fc (0x0900000010EF91E0 [libj9prt29.so+0x5f1e0])
4XENATIVESTACK _ZN18JavaCoreDumpWriter18writeThreadSectionEv+0x468 (0x0900000010FB73AC [libj9dmp29.so+0x233ac])
4XENATIVESTACK protectedWriteSection+0x28 (0x0900000010FB3D4C [libj9dmp29.so+0x1fd4c])
4XENATIVESTACK omrsig_protect+0x4fc (0x0900000010EF91E0 [libj9prt29.so+0x5f1e0])
4XENATIVESTACK _ZN18JavaCoreDumpWriterC2EPKcP16J9RASdumpContextP14J9RASdumpAgent+0x5d0 (0x0900000010FB2754 [libj9dmp29.so+0x1e754])
4XENATIVESTACK runJavadump+0x20 (0x0900000010FC7C04 [libj9dmp29.so+0x33c04])
4XENATIVESTACK 0x0900000010F976D4
4XENATIVESTACK 0x0900000010F9CDF4
4XENATIVESTACK 0x0900000010EF91E0
4XENATIVESTACK 0x0900000010F9CA40
4XENATIVESTACK 0x0900000010FE9BE0
4XENATIVESTACK 0x0900000012E2852C
4XENATIVESTACK 0x0900000010EF91E0
4XENATIVESTACK 0x0900000012E282AC
4XENATIVESTACK 0x0900000010EFDD54
4XENATIVESTACK 0x0900000010F63590
4XENATIVESTACK 0x090000000056204C
NULL
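
To compare a native callstack like the one above against another javacore, it can help to pull the frames into a structured form. This is a rough Python sketch that only covers the two frame shapes visible in this excerpt (symbolized frames and bare addresses) and ignores any text before the 4XENATIVESTACK tag. The names `FRAME_RE` and `parse_native_stack` are hypothetical, and the pattern is an assumption based on these lines rather than a full description of the javacore format.

```python
import re

# Two frame shapes appear in the excerpt:
#   symbol+0xOFF (0xADDR [library+0xLIBOFF])
#   0xADDR
FRAME_RE = re.compile(
    r"4XENATIVESTACK\s+"
    r"(?:(?P<symbol>\S+)\+0x(?P<offset>[0-9a-fA-F]+)\s+"
    r"\(0x(?P<addr>[0-9a-fA-F]+)\s+"
    r"\[(?P<lib>[^+\]]+)\+0x(?P<lib_off>[0-9a-fA-F]+)\]\)"
    r"|0x(?P<raw>[0-9a-fA-F]+))"
)


def parse_native_stack(text):
    """Return one dict per 4XENATIVESTACK line, in file order."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a native stack frame line
        if match.group("raw") is not None:
            # Unsymbolized frame: only the address is known.
            frames.append({
                "symbol": None,
                "library": None,
                "address": int(match.group("raw"), 16),
            })
        else:
            frames.append({
                "symbol": match.group("symbol"),
                "library": match.group("lib"),
                "address": int(match.group("addr"), 16),
            })
    return frames
```

Listing the `symbol` values of the parsed frames gives a quick view of the call path (here, the javacore writer walking native threads), which is easier to diff between two dumps than raw text.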

@hangshao0
Contributor

Might be the same issue as https://www.ibm.com/support/pages/apar/IJ50275.

@tajila
Contributor

tajila commented Aug 26, 2024

@RobMcCreadyJHA Does this issue still occur with newer builds?

@RobMcCreadyJHA
Author

@hangshao0 @tajila - We have not been able to reproduce the issue. We have hundreds of installs and this is the only time we have ever seen this crash behavior on a kill -3. Is there any way to confirm from the stack trace that this is the same issue from the APAR and fixed in the latest release?

@tajila
Contributor

tajila commented Aug 26, 2024

@manqingl Can you verify?

@manqingl

Yes, this does look like the known issue (as @hangshao0 mentioned earlier): https://www.ibm.com/support/pages/apar/IJ50275.

@RobMcCreadyJHA If you have the system core file and send it to me, I will double check.

@RobMcCreadyJHA
Author

@manqingl Can I email the core dump to you, or send it by another secure method? I can't post it publicly to GitHub.

@manqingl

@RobMcCreadyJHA: A system core file is normally too big to share by email.

If you are part of IBM, then you can open a Salesforce ticket according to this website.

If the system core file contains external customer information, please open a Salesforce ticket through the standard process.

Once you have the Salesforce ticket, the system core file can be uploaded to our customer service host system in Germany.

@tajila
Contributor

tajila commented Aug 27, 2024

> @manqingl Can I email the core dump to you or another secure method? I can't post them publicly to GitHub.

You can also host it on Google Drive, Dropbox, or something similar.

@manqingl

If it contains external customer info, I prefer not to download the file to my local machine at all. For materials containing external customer info, we need to use the IBM customer service system. For internal materials, we can use IBM Box. @RobMcCreadyJHA @tajila

@RobMcCreadyJHA
Author

I believe we have access to IBM Box. I'm checking with one of our contacts to confirm, and I'll upload there if possible.

@manqingl

manqingl commented Sep 5, 2024

@RobMcCreadyJHA: The javacore file shared through Box was from a user event, and it is not enough to diagnose the OOME issue (even though the OOME was triggered while handling a user event). A system core file from the OOME would provide more conclusive evidence. If you collected a system core file from the OOME, please open a Salesforce case for us and upload the system core files to EcuRep (please do not share them with me directly, because a system core file may contain external customers' data).

For now, this does look like the known issue (https://www.ibm.com/support/pages/apar/IJ50275), which has the following key characteristics:

  1. The issue happened while a javacore file was being generated.
  2. The message "*** Invalid JIT return address 00000007C2958870 in 00000001100A79A0" was printed.
  3. A native OOM error occurred.
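
As a rough aid for checking other captured console output against the first two observations, the Python sketch below searches for the two messages quoted in this issue. The names `check_signature` and `second_value` are hypothetical. The native OOM observation is deliberately not checked, because no message text for it appears in this thread, and a match here is only a hint, not confirmation of the APAR.

```python
import re

# Message text copied from the console output quoted in this issue.
INVALID_JIT_RE = re.compile(
    r"\*\*\* Invalid JIT return address ([0-9A-Fa-f]+) in ([0-9A-Fa-f]+)"
)
JAVACORE_REQUEST_RE = re.compile(r"JVMDUMP032I JVM requested Java dump")


def check_signature(log_text):
    """Report which of the two quoted messages appear in log_text."""
    match = INVALID_JIT_RE.search(log_text)
    return {
        # Observation 1: a javacore was being generated.
        "javacore_requested": bool(JAVACORE_REQUEST_RE.search(log_text)),
        # Observation 2: the "Invalid JIT return address" message.
        "invalid_jit_return_address": match is not None,
        "return_address": match.group(1) if match else None,
        # The hex value printed after "in"; its meaning is not
        # stated in the thread, so it is kept under a neutral name.
        "second_value": match.group(2) if match else None,
    }
```

Calling `check_signature` on saved stderr from the service returns a small dictionary that can be logged or compared across hosts.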

@RobMcCreadyJHA
Author

We did not find any other dump files on the host. We have not been able to reproduce the issue, and we are rolling out a JDK update to customers. Closing the issue in the hope that the updated JDK includes the fix as expected. Thanks.


Issue Number: 20036
Status: Closed
Actual Components: comp:jit, userRaised
Actual Assignees: No one :(
PR Assignees: No one :(


5 participants