OpenJDK java/lang/Thread/virtual/stress/TimedGet Invalid JIT return address #19249

pshipton · 2024-04-01T14:33:22Z

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Release_testList_0/16/ - p8-java1-ibm07
jdk_lang_j9_0
java/lang/Thread/virtual/stress/TimedGet.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Release_testList_0/16/openjdk_test_output.tar.gz

18:33:51  STDOUT:
18:33:51  STDERR:
18:33:51  
18:33:51  
18:33:51  *** Invalid JIT return address 00000100114ABE44 in 0000010023C8BA70
18:33:51  
18:33:51  22:19:06.319 0x10023c92e00    j9vm.249    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK21_ppc64_aix_Release/openj9/runtime/vm/swalk.c:1629: ((0 ))

pshipton · 2024-04-01T14:35:00Z

20x x 5 grinder on single test https://openj9-jenkins.osuosl.org/job/Grinder/3426/ - passed

1x x 8 grinder on jdk_lang_j9_0 https://openj9-jenkins.osuosl.org/job/Grinder/3427/ - passed
1x x 8 grinder on jdk_lang_j9_0 https://openj9-jenkins.osuosl.org/job/Grinder/3432/ - 1 failure in java/lang/Thread/virtual/stress/Skynet.java#default

hzongaro · 2024-04-01T14:52:12Z

@BradleyWood, may I ask you to take a look at this failure? It's targeted for the 0.44 release, so it's high priority.

pshipton · 2024-04-01T15:15:55Z

> It's targeted for the 0.44 release, so it's high priority.

I added it while we investigate since the failure occurred in a release build. If the frequency of the failure is low, we can move it out.

BradleyWood · 2024-04-01T18:56:16Z

@pshipton Have there been any failures on platforms other than PPC AIX?

pshipton · 2024-04-01T19:00:34Z

Not recently. There is also the closed issue #17163

pshipton · 2024-04-01T20:11:26Z

@tajila fyi

pshipton · 2024-04-01T23:22:32Z

Perhaps related to #18910

BradleyWood · 2024-04-02T16:46:47Z

No success yet on reproducing the problem locally

pshipton · 2024-04-11T12:59:49Z

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/179 - p8-java1-ibm02
jdk_lang_j9_0
java/lang/Thread/virtual/stress/Skynet.java#default

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/179/openjdk_test_output.tar.gz

20:37:28  *** Invalid JIT return address 00000100114AE880 in 00000100236FBA70
20:37:28  
20:37:28  00:22:45.178 0x10023703100    j9vm.249    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK21_ppc64_aix_Nightly/openj9/runtime/vm/swalk.c:1629: ((0 ))

pshipton · 2024-04-11T17:46:03Z

Trying a 5x x 5 jdk_lang_j9_0 internal grinder - passed, all machines are 7.2 p8 or p9
4x x 5 on 7.1 machines grinder - no 7.1 machines available for test

pshipton · 2024-04-18T12:16:16Z

https://openj9-jenkins.osuosl.org/job/Test_openjdk22_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/44
jdk_lang_j9_0
java/lang/Thread/virtual/stress/TimedGet.java

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk22_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/44/openjdk_test_output.tar.gz

JamesKingdon · 2024-07-10T21:40:08Z

Matching symptoms have been reported by a customer (on x86)

pshipton · 2024-08-27T19:03:17Z

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_OpenJDK21_testList_0/28

pshipton · 2024-08-27T19:03:45Z

Added the userRaised label due to #19249 (comment)

vij-singh · 2024-09-24T12:44:45Z

@BradleyWood Any new updates to this one?

pshipton · 2024-09-24T15:03:39Z

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/289

BradleyWood · 2024-09-24T17:38:22Z

This issue looks specific to AIX PPC. @JamesKingdon If you have any details that might indicate the customer issue is caused by the same problem as this issue, please let me know. Otherwise, I would like to ask @zl-wang to assign this to someone on the power team.

JamesKingdon · 2024-09-24T18:08:15Z

Hi Brad, I'm going to have to start putting internal case numbers on comments like the one above. I'm currently not able to locate the case that prompted that comment.

vij-singh · 2024-09-26T15:14:02Z

@zl-wang Hi Julian - could you assign someone to work on this one? (0.48 target)

zl-wang · 2024-09-26T15:16:45Z

@rmnattas could you take this up?

zl-wang · 2024-10-02T13:32:22Z

@tajila @dmitripivkine see the virtual-thread's stack back trace above: there is no interpreter frame at all ... every frame is with JIT-ed code.

@rmnattas i am wondering if there were code changes in JIT metadata look-up (jitGetMapsFromPC and jitGetExceptionTable etc) recently ... causing jitInfo to be NULL.

rmnattas · 2024-10-09T18:58:36Z

As before, using the continuation J9VMThread (generated on the stack for the purposes of stack-walking), we find the following stack-trace

0x0000010011b3aadc {java/lang/VirtualThread.park} JIT [0x100315fbeb0]
0x000001001192c7a4 {java/util/concurrent/LinkedTransferQueue$DualNode.await} JIT [0x100315fbfb0]
0x0000010011b2f760 {java/util/concurrent/SynchronousQueue$Transferer.xferLifo} JIT [0x100315fc0e0]
0x0000010011b2f068 {java/util/concurrent/SynchronousQueue.xfer} JIT [0x100315fc1a0]
0x0000010011b34790 {java/util/concurrent/SynchronousQueue.take} JIT [0x100315fc1f0]
0x000001001192b2fc {Skynet$Channel.receive} JIT [0x100315fc280]
0x00000100117458fc {Skynet.skynet} JIT [0x100315fc2b0]
0x0000010011b341ec {Skynet.lambda$skynet$1} JIT [0x100315fc320]      <-- walkState->pc / Invalid JIT return address
0x0000010011b34044 {Skynet$$Lambda/0x00000000243074b8.run} JIT [0x100315fc370]
0x0000010011742834 {java/lang/Thread.runWith} JIT [0x100315fc3a0]
0x0000010011931b9c {java/lang/VirtualThread.run} JIT [0x100315fc450]
0x0000010011b337e0 {jdk/internal/vm/Continuation.enter} JIT [0x100315fc520]
0x090000001ac61334 {libj9jit29.so}{} [0x100315fbeb0]

The method {Skynet.lambda$skynet$1} was recompiled and exists twice

Searching for JIT'ed methods for the J9Method 0x0000010024325848 {Skynet.lambda$skynet$1}
    J9Class            J9Method               Start          Len {ClassPath/Name.MethodName}
--------------------------------------------------------------------------------------------
(0x0000010024325600 0x0000010024325848) 0x0000010011b34104    87 {Skynet.lambda$skynet$1(LSkynet$Channel;III)V}
(0x0000010024325600 0x0000010024325848) 0x0000010011b47f04   538 {Skynet.lambda$skynet$1(LSkynet$Channel;III)V}

The first compilation's entry point was patched to branch to the pre-prologue, but the parked virtual thread still has that body live on its stack.

0x10011b34154 {Skynet.lambda$skynet$1} -7                 00000000 invalid instruction
0x10011b34158 {Skynet.lambda$skynet$1} -6            >    7c0802a6 mflr      r0 <<< ^+4
0x10011b3415c {Skynet.lambda$skynet$1} -5            |    481efff5 bl        0x10011d24150 Trampoline {libj9jit29.so}{_samplingPatchCallSite} +0   
0x10011b34160 {Skynet.lambda$skynet$1} -4            |    00000100 invalid instruction
0x10011b34164 {Skynet.lambda$skynet$1} -3            |    11494670 maddhd    r10, r9, r8, r25
0x10011b34168 {Skynet.lambda$skynet$1} -2            |    e96f0050 ld        r11, 0x50(r15) J9VMThread.stackOverflowMark // stack or async check 
0x10011b3416c {Skynet.lambda$skynet$1} -1            |    00100250 invalid instruction
0x10011b34170 {Skynet.lambda$skynet$1} +0            |    e86e0018 ld        r3, 0x18(r14)
0x10011b34174 {Skynet.lambda$skynet$1} +1            |    808e0010 lwz       r4, 0x10(r14)
0x10011b34178 {Skynet.lambda$skynet$1} +2            |    80ae0008 lwz       r5, 8(r14)
0x10011b3417c {Skynet.lambda$skynet$1} +3            |    80ce0000 lwz       r6, 0(r14)
0x10011b34180 {Skynet.lambda$skynet$1} +4            *    4bffffd8 b         0x10011b34158 U>> ^-6   // <-- jitEntry
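
The two branch encodings in the listing above can be cross-checked by hand. The following is a minimal sketch in C, assuming the standard PowerPC I-form branch layout (primary opcode 18, a 24-bit LI field, then the AA and LK bits). The helper names `branch_target` and `branch_links` are illustrative only and are not part of OpenJ9 or KCA.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the target of a PowerPC I-form branch (b, ba, bl, bla).
 * insn_addr is the address of the instruction, insn its 32-bit encoding. */
static uint64_t branch_target(uint64_t insn_addr, uint32_t insn)
{
    assert((insn >> 26) == 18);          /* primary opcode 18 only */

    /* LI field plus the two implied low zero bits: a 26-bit signed displacement. */
    int64_t disp = (int64_t)(insn & 0x03FFFFFCu);
    if (disp & 0x02000000) {
        disp -= 0x04000000;              /* sign-extend from 26 bits */
    }

    int absolute = (int)((insn >> 1) & 1u);   /* AA bit: absolute target when set */
    return absolute ? (uint64_t)disp : insn_addr + (uint64_t)disp;
}

/* LK bit: set for bl/bla, which also record the return address in LR. */
static int branch_links(uint32_t insn)
{
    return (int)(insn & 1u);
}
```

With the values from the listing, `branch_target(0x10011b34180, 0x4bffffd8)` gives 0x10011b34158 (the `mflr r0` at -6), and `branch_target(0x10011b3415c, 0x481efff5)` gives 0x10011d24150, the trampoline address shown for `_samplingPatchCallSite`.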

When the parked virtual thread stack is walked it crashes with *** Invalid JIT return address 0000010011B341EC in 0000010024004A80.

Given that the first body's metadata still exists (one indication is that KCA finds it when walking the AVLTree), I don't see why it crashes.
There's a cache that's used instead of walking the AVLTree, but it's not used for walking the continuation VThreads, hence it's not the cause.

Looking at the core in a different way,

To find the VThreadContinuation, I looked through the stack frames.

0x090000001a2ef578 {libj9vm29.so}{invalidJITReturnAddress} [0x10024003c00]
0x090000001ac7cd64 {libj9jit29.so}{jitWalkStackFrames} [0x10024003c70]
0x090000001a2ed058 {libj9vm29.so}{walkStackFrames} [0x10024003d00]
0x090000001a392a8c {libj9vm29.so}{walkContinuationStackFrames} [0x10024003df0]
0x090000001c8b8f9c {libj9gc_full29.so}{_ZN28GC_VMThreadStackSlotIterator21scanContinuationSlotsEP10J9VMThreadP8J9ObjectPvPFvP8J9JavaVMPS3_S4_P16J9StackWalkStatePKvEbb} [0x10024004a10]
0x090000001cb17830 {libj9gc_full29.so}{_ZN22MM_GlobalMarkingScheme10scanObjectEP19MM_EnvironmentVLHGCP8J9ObjectNS_10ScanReasonE} [0x10024004d40]
0x090000001cb18234 {libj9gc_full29.so}{_ZN22MM_GlobalMarkingScheme19markLiveObjectsScanEP19MM_EnvironmentVLHGC} [0x10024004e10]
...

The caller markLiveObjectsScan has the objectPtr (VThreadContinuation in this case) in r27, and the callee stores r27 at 0xa8(r1). Using the SP of the callee we find the VThreadContinuation.

(kca) what (0x10024004d40+0xa8)
0x10024004de8: 0x0a00000007060f20 Obj - {java/lang/VirtualThread$VThreadContinuation}

Finding the VirtualThread

> !vthreads | grep -i 7060f20
!continuationstack 0x00000100315fabd0 !j9vmcontinuation 0x00000100315fabd0 !j9object 0x0A00000007060F20 (Continuation) !j9object 0x0A0000000829B110 (VThread) -

Showing the same stack-trace found before for the unmounted thread

> !continuationstack 0x00000100315fabd0
<100315fabd0>   !j9method 0x0000010023251020   jdk/internal/vm/Continuation.yieldImpl(Z)Z
<100315fabd0>   !j9method 0x0000010023250EC0   jdk/internal/vm/Continuation.yield0()Z
<100315fabd0>   !j9method 0x0000010023250EA0   jdk/internal/vm/Continuation.yield(Ljdk/internal/vm/ContinuationScope;)Z
<100315fabd0>   !j9method 0x000001002323F5E0   java/lang/VirtualThread.yieldContinuation()Z
<100315fabd0>   !j9method 0x000001002323F6C0   java/lang/VirtualThread.park()V
<100315fabd0>   !j9method 0x0000010023267C80   java/lang/Access.parkVirtualThread()V
<100315fabd0>   !j9method 0x00000100244FC218   jdk/internal/misc/VirtualThreads.park()V
<100315fabd0>   !j9method 0x00000100232369D0   java/util/concurrent/locks/LockSupport.park()V
<100315fabd0>   !j9method 0x00000100243ED0A8   java/util/concurrent/LinkedTransferQueue$DualNode.await(Ljava/lang/Object;JLjava/lang/Object;Z)Ljava/lang/Object;
<100315fabd0>   !j9method 0x00000100240E5F38   java/util/concurrent/SynchronousQueue$Transferer.xferLifo(Ljava/lang/Object;J)Ljava/lang/Object;
<100315fabd0>   !j9method 0x0000010024331BD8   java/util/concurrent/SynchronousQueue.xfer(Ljava/lang/Object;J)Ljava/lang/Object;
<100315fabd0>   !j9method 0x0000010024331C98   java/util/concurrent/SynchronousQueue.take()Ljava/lang/Object;
<100315fabd0>   !j9method 0x0000010024325BF8   Skynet$Channel.receive()Ljava/lang/Object;
<100315fabd0>   !j9method 0x0000010024325828   Skynet.skynet(LSkynet$Channel;III)V
<100315fabd0>   !j9method 0x0000010024325848   Skynet.lambda$skynet$1(LSkynet$Channel;III)V
<100315fabd0>   !j9method 0x0000010024325F68   Skynet$$Lambda/0x00000000243074b8.run()V
<100315fabd0>   !j9method 0x000001002321E480   java/lang/Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V
<100315fabd0>   !j9method 0x000001002323F520   java/lang/VirtualThread.run(Ljava/lang/Runnable;)V
<100315fabd0>   !j9method 0x00000100244F0D50   java/lang/VirtualThread$VThreadContinuation$1.run()V
<100315fabd0>   !j9method 0x0000010023250E60   jdk/internal/vm/Continuation.enter(Ljdk/internal/vm/Continuation;)V
<100315fabd0>                           JNI call-in frame
<100315fabd0>                           Native method frame

And no carrier thread:

!j9object 0x0A0000000829B110 (VThread)
...
        Ljava/lang/Thread; carrierThread = !fj9object 0x0 (offset = 208) (java/lang/VirtualThread)
                I state = 0x00000004 (offset = 224) (java/lang/VirtualThread)  // PARKED

@babsingh, any suggestions given similarity to #18910 , possibly duplicate?

babsingh · 2024-10-09T20:00:42Z

> @babsingh, any suggestions given similarity to #18910 , possibly duplicate?

@rmnattas Can you provide me a link to the core file, which was used during the above analysis?

rmnattas · 2024-10-09T20:07:11Z

Build got auto-removed but uploaded it here: https://ibm.box.com/s/jx79twimu30p1gm181ku1mymzxz0ozwm
Core in: aqa-tests/TKG/output_17271688367563/jdk_lang_j9_0/work/java/lang/Thread/virtual/stress/Skynet_default

babsingh · 2024-10-09T21:11:22Z

@rmnattas It's not a duplicate of #18910. Unlike #18910, the virtual thread is completely unmounted and not in an intermediate unsteady state in the above core file.

> !j9object 0x0A0000000829B110
!J9Object 0x0A0000000829B110 {
	struct J9Class* clazz = !j9class 0x1002323FF00 // java/lang/VirtualThread
	...
	Ljdk/internal/vm/Continuation; cont = !fj9object 0xa00000007060f20 (offset = 192) (java/lang/VirtualThread) <--- Contains the unmounted VThread's stack
	Ljava/lang/Thread; carrierThread = !fj9object 0x0 (offset = 208) (java/lang/VirtualThread) <--- NULL means the VThread is unmounted
	J inspectorCount = 0x0000000000000000 (offset = 176) (java/lang/VirtualThread) <hidden> <--- 0 means the VThread is in a steady state

> !fj9object 0xa00000007060f20
!J9Object 0x0A00000007060F20 {
	struct J9Class* clazz = !j9class 0x100244F1800 // java/lang/VirtualThread$VThreadContinuation
	J vmRef = 0x00000100315FABD0 (offset = 8) (jdk/internal/vm/Continuation) <--- Struct that stores the unmounted VThread's stack
 	...
}

> !j9vmcontinuation 0x00000100315FABD0
J9VMContinuation at 0x100315fabd0 { <--- Is JIT walking the stack through using walkContinuationStackFrames?
  Fields for J9VMContinuation:
	0x0: UDATA* arg0EA = !j9x 0x00000100315FBEB0
	0x8: UDATA* bytecodes = !j9x 0x0000000000000000
	0x10: UDATA* sp = !j9x 0x00000100315FBE88
	0x18: U8* pc = !j9x 0x0000000000000003
	0x20: struct J9Method* literals = !j9method 0x0000000000000000
	0x28: UDATA* stackOverflowMark = !j9x 0x00000100315FBDB0
	0x30: UDATA* stackOverflowMark2 = !j9x 0x00000100315FBDB0
	0x38: struct J9JavaStack* stackObject = !j9javastack 0x00000100315FADB0
	0x40: struct J9JITDecompilationInfo* decompilationStack = !j9jitdecompilationinfo 0x0000000000000000
	0x48: UDATA* j2iFrame = !j9x 0x0000000000000000
	0x50: struct J9JITGPRSpillArea jitGPRs = !j9jitgprspillarea 0x00000100315FAC20
	0x160: struct J9I2JState i2jState = !j9i2jstate 0x00000100315FAD30
	0x180: struct J9VMEntryLocalStorage* oldEntryLocalStorage = !j9vmentrylocalstorage 0x0000000000000000
	0x188: UDATA dropFlags = 0x0000000000000000 (0)

> When the parked virtual thread stack is walked it crashes with ...

Can you confirm if the JIT is walking the unmounted vthread's stack using walkContinuationStackFrames?

Here is some existing code which walks the stack of an unmounted virtual thread:

openj9/runtime/jvmti/jvmtiHelpers.cpp

Lines 2101 to 2109 in 96ba473

    
if (IS_JAVA_LANG_VIRTUALTHREAD(currentThread, threadObject)) {
	if (NULL != targetThread) {
		walkState->walkThread = targetThread;
		rc = vm->walkStackFrames(currentThread, walkState);
	} else {
		j9object_t contObject = (j9object_t)J9VMJAVALANGVIRTUALTHREAD_CONT(currentThread, threadObject);
		J9VMContinuation *continuation = J9VMJDKINTERNALVMCONTINUATION_VMREF(currentThread, contObject);
		vm->internalVMFunctions->walkContinuationStackFrames(currentThread, continuation, threadObject, walkState);
	}

If walkContinuationStackFrames is being used by the JIT, can you verify whether the JIT-specific fields in J9VMContinuation are valid?
J9VMContinuation contains fields from J9VMThread in order to store the vthread's state while it is unmounted, with minimal data. Are any JIT-specific fields from J9VMThread that are needed to preserve the vthread's state missing from J9VMContinuation?

zl-wang · 2024-10-09T22:24:32Z

we indeed can see walkContinuationStackFrames is being used by GC above in #19249 (comment) as below:

0x090000001a2ef578 {libj9vm29.so}{invalidJITReturnAddress} [0x10024003c00]
0x090000001ac7cd64 {libj9jit29.so}{jitWalkStackFrames} [0x10024003c70]
0x090000001a2ed058 {libj9vm29.so}{walkStackFrames} [0x10024003d00]
0x090000001a392a8c {libj9vm29.so}{walkContinuationStackFrames} [0x10024003df0]
0x090000001c8b8f9c {libj9gc_full29.so}{_ZN28GC_VMThreadStackSlotIterator21scanContinuationSlotsEP10J9VMThreadP8J9ObjectPvPFvP8J9JavaVMPS3_S4_P16J9StackWalkStatePKvEbb} [0x10024004a10]
0x090000001cb17830 {libj9gc_full29.so}{_ZN22MM_GlobalMarkingScheme10scanObjectEP19MM_EnvironmentVLHGCP8J9ObjectNS_10ScanReasonE} [0x10024004d40]
0x090000001cb18234 {libj9gc_full29.so}{_ZN22MM_GlobalMarkingScheme19markLiveObjectsScanEP19MM_EnvironmentVLHGC} [0x10024004e10]
...

now, with the comment in #19249 (comment), it points to potential problems in the faked J9VMThread that is set up in order to walk the VThread. who sets up that faked J9VMThread? walkContinuationStackFrames? does it make sure everything needed for the walk is captured/copied in the set-up?

zl-wang · 2024-10-10T13:59:57Z

@TobiAjila one thing we noticed early on: the faked J9VMThread is not 256-byte aligned, as below:

struct J9VMThread* walkThread = 0x0000010024003ec0

however, all low-order 8 bits have other meanings for locking (at least). has anybody made sure there are no interferences in stack walk from these bits set (effectively)?
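
The alignment observation can be checked with simple bit arithmetic. This is a small sketch in C; the helper names `low_bits` and `is_aligned` are made up for illustration, and only the walkThread address is taken from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of addr that must be zero for it to be aligned to `alignment`. */
static uint64_t low_bits(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0); /* power of two */
    return addr & (alignment - 1);
}

/* Nonzero when addr is a multiple of `alignment`. */
static int is_aligned(uint64_t addr, uint64_t alignment)
{
    return low_bits(addr, alignment) == 0;
}
```

For the address above, `low_bits(0x0000010024003ec0, 256)` is 0xc0, so `is_aligned(0x0000010024003ec0, 256)` returns 0. Any code that treats the low-order 8 bits of a J9VMThread pointer as flag bits would therefore see bits 6 and 7 set for this stack-allocated thread.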

zl-wang · 2024-10-10T14:26:59Z

@babsingh @TobiAjila from dissecting the core file and observing the failure, the difference is as below:

it asserted in the run with "Invalid JIT return address" because metaData cannot be retrieved for code addr: 0000010011B341EC
while ((walkState->jitInfo = jitGetExceptionTable(walkState)) != NULL) { // i.e. returning NULL on this line
however, KCA can retrieve the metaData in core without problems on the same address as below (assembly listing with GCMap metaData inserted):

0x10011b341e8 GCMap  Bytecode -1:6  StackMap: 
0x10011b341e8 {Skynet.lambda$skynet$1} +30   -1:6    |||| 4bc11599 bl        0x10011745780 ^{Skynet.skynet} +4    <<< +28 
0x10011b341ec {Skynet.lambda$skynet$1} +31           |||| e80e0048 ld        r0, 0x48(r14) (SP+72) = 0x10011b34044

in case this difference sparks some ideas for you ...
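
To make the failure mode concrete, here is a simplified model in C of a walk loop that stops as soon as a return address has no covering metadata record. It is only a sketch: `CodeRange`, `lookup_metadata` and `walk_frames` are hypothetical stand-ins, the linear search replaces whatever structure the real lookup uses, and none of this is the actual jitGetExceptionTable or jitWalkStackFrames code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one compiled body's metadata record. */
typedef struct {
    uint64_t start_pc;   /* first code address covered */
    uint64_t end_pc;     /* one past the last code address covered */
} CodeRange;

/* Return the record covering pc, or NULL when no record covers it. */
static const CodeRange *lookup_metadata(const CodeRange *table, size_t count,
                                        uint64_t pc)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (pc >= table[i].start_pc && pc < table[i].end_pc) {
            return &table[i];
        }
    }
    return NULL;
}

/* Walk a list of return addresses, stopping at the first one without
 * metadata. Returns the number of frames walked successfully. */
static size_t walk_frames(const CodeRange *table, size_t count,
                          const uint64_t *return_pcs, size_t depth)
{
    size_t walked = 0;
    while (walked < depth &&
           lookup_metadata(table, count, return_pcs[walked]) != NULL) {
        walked++;
    }
    return walked;   /* walked < depth models the invalid return address case */
}
```

In this model a result smaller than `depth` corresponds to the assertion in the thread. The puzzle described above is that the live lookup returned NULL for an address whose record a post-mortem tool can still find.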

babsingh · 2024-10-10T14:43:47Z

++ @fengxue-IS for insights.

fengxue-IS · 2024-10-10T15:42:38Z

in walkContinuationStackFrames vm uses J9_STACKWALK_NO_JIT_CACHE to disable thread cache to avoid having to manage a single use cache.

@zl-wang @rmnattas is there any dependency on vmThread->jitArtifactSearchCache for such recompiled methods?
Looking at the code path, the carrierThread which the vthread mounted to before unmount could be still holding the jitInfo in its jitArtifactSearchCache but the continuation stack will not have any reference to that cache since we are using a temp stack thread for walking.

zl-wang · 2024-10-10T16:06:05Z

> is there any dependency on vmThread->jitArtifactSearchCache for such recompiled methods?

@fengxue-IS i don't think so. from walker perspective, there is no difference between recompiled and non-recompiled method. in the background, once a method is recompiled, the space occupied by the previous compilation is queued to the FaintBlockList (i.e. candidate to be freed as early as next GC, when stack-walking doesn't find it active on any thread. when it is freed, the associated metaData range is adjusted as well). in this case, obviously the FaintBlock hasn't been freed yet since it is still active on that VThread at least and KCA can still find its metaData.

JamesKingdon · 2024-10-10T16:28:21Z

We have a new case TS017553893 with an assert from swalk.c:1629, is there anything I should look for in the corefiles to increase confidence that it's the same issue as being explored here?

zl-wang · 2024-10-10T17:05:14Z

@JamesKingdon curious on which platform this new case is ...
that assert can be called from two places: jitWalkStackFrames and jitWalkResolveMethodFrame both appeared to call that assert upon failing to look up the metaData for a jit-ed code address. so, at least, you can identify in the core file below to increase the confidence it is a dup to this one:

the complained jit-ed code address is still valid (the containing compilation might have been recompiled as in this one)
KCA at least can still find its metaData
not sure if this is significant to the happening though: the java application is using virtual-thread

JamesKingdon · 2024-10-10T17:31:05Z

The problem is happening on amd64, unfortunately I think the corefile is probably truncated - I'm having a lot of trouble making sense of it. The stack was

0x00007f2c63888aff {libj9vm29.so}{invalidJITReturnAddress} [0x7f2c63fe3390]
0x00007f2c629ff87c {libj9jit29.so}{jitWalkStackFrames} [0x7f2c63fe33b0]
0x00007f2c63887836 {libj9vm29.so}{walkStackFrames} [0x7f2c63fe3410]
0x00007f2c63842870 {libj9vm29.so}{Fast_java_lang_Throwable_fillInStackTrace} [0x7f2c63fe3490]

I can't identify the jit return address, but it looks like code. One thing that is catching my eye is that there are signs of native crypto code on the failing stack.

rmnattas · 2024-10-10T20:19:21Z

@babsingh as mentioned, walkContinuationStackFrames is being used, and I don't see the stack-walking code needing other values from J9VMThread.
Regarding valid values, the J9VMThread has vmThread.pc = 0x03, which may be similar to the mac/x86 issue, but I can't reproduce the failure to log the value. I'm not sure what would change it here if it's walking JITed frames, i.e. it should remain in jitWalkStackFrames.

All non-zero values:

(kca) s J9VMThread 0x0000010024003ec0
J9VMThread (2832 bytes)
                             struct J9JavaVM*  javaVM = 0x0000010010131e10 (offset: 8)
                                       UDATA*  arg0EA = 0x00000100315fbeb0 (offset: 16)
                                           UDATA*  sp = 0x00000100315fbe88 (offset: 32)
                                              U8*  pc = 0x0000000000000003 (offset: 40) // Warning: ptr is not correctly aligned!
                            UDATA*  stackOverflowMark = 0x00000100315fbdb0 (offset: 80)
                           UDATA*  stackOverflowMark2 = 0x00000100315fbdb0 (offset: 88)
                     struct J9JavaStack*  stackObject = 0x00000100315fadb0 (offset: 312)
     struct J9VMEntryLocalStorage*  entryLocalStorage = 0x0000010024003e60 (offset: 600)
        struct J9VMGCSublistFragment  gcRememberedSet   (addr: 0x0000010024004128) (offset: 616)
 struct MM_GCRememberedSetFragment  sATBBarrierRememberedSetFragment   (addr: 0x0000010024004158) (offset: 664)
        struct J9StackWalkState  inlineStackWalkState   (addr: 0x00000100240041c0) (offset: 768)
 struct J9ModronThreadLocalHeap  allocateThreadLocalHeap   (addr: 0x0000010024004480) (offset: 1472)
 struct J9ModronThreadLocalHeap  nonZeroAllocateThreadLocalHeap   (addr: 0x00000100240044b0) (offset: 1520)
               struct J9DLTInformationBlock  dltBlock   (addr: 0x0000010024004590) (offset: 1744)
        j9objectmonitor_t[]  objectMonitorLookupCache   (addr: 0x0000010024004790) (offset: 2256)
                        void*  jitArtifactSearchCache = 0x0000000000000001 (offset: 2600) // Warning: ptr is not correctly aligned!
                  struct J9GSParameters  gsParameters   (addr: 0x0000010024004938) (offset: 2680)

Also, not sure if it was supposed to be for another field, but I was told that it indicates the type of what's being executed?

                        J9SF_FRAME_TYPE_NATIVE_METHOD = 0x0 / 0x3 (constant)
                      J9SF_FRAME_TYPE_JIT_JNI_CALLOUT = 0x0 / 0x6 (constant)
                          J9SF_FRAME_TYPE_JIT_RESOLVE = 0x0 / 0x5 (constant)
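
If the small pc value really is a frame-type tag, as tentatively suggested above, the mapping for the three constants listed can be sketched in C as follows. Only those three values come from the KCA output; the enum names without the J9SF prefix and the `describe_pc` helper are hypothetical, and any other value is reported as unlisted rather than guessed.

```c
#include <stdint.h>

/* Values copied from the KCA output above; other special values are not
 * listed in this thread and are not modeled here. */
enum {
    FRAME_TYPE_NATIVE_METHOD   = 0x3,
    FRAME_TYPE_JIT_RESOLVE     = 0x5,
    FRAME_TYPE_JIT_JNI_CALLOUT = 0x6
};

/* Map a pc value to the name of the matching constant, if any. */
static const char *describe_pc(uint64_t pc)
{
    switch (pc) {
    case FRAME_TYPE_NATIVE_METHOD:
        return "J9SF_FRAME_TYPE_NATIVE_METHOD";
    case FRAME_TYPE_JIT_RESOLVE:
        return "J9SF_FRAME_TYPE_JIT_RESOLVE";
    case FRAME_TYPE_JIT_JNI_CALLOUT:
        return "J9SF_FRAME_TYPE_JIT_JNI_CALLOUT";
    default:
        return "not one of the three listed constants";
    }
}
```

Under that reading, the pc of 0x3 in both the J9VMContinuation and the faked J9VMThread would match J9SF_FRAME_TYPE_NATIVE_METHOD, which is at least consistent with the "Native method frame" entry at the bottom of the !continuationstack output.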

rmnattas · 2024-10-11T15:35:36Z

Suggesting moving to 0.49.
Given that this is not a duplicate of the other POWER DAA issues, it's not as frequent or as high-risk a failure.
Continuing investigating in the meantime.

JamesKingdon · 2024-10-21T11:49:45Z

The asserts from swalk.c in TS017553893 were caused by the new memory disclaiming feature. The backing files were being written to an nfs mounted directory and the majority of the jit data cache was reading as 0. Using -Xjit:disableDataCacheDisclaiming,disableIprofilerDataDisclaiming avoided the crashes.

zl-wang · 2024-10-21T11:55:09Z

> The backing files were being written to an nfs mounted directory and the majority of the jit data cache was reading as 0.

is that a defined NFS behaviour?

JamesKingdon · 2024-10-21T11:59:29Z

Not as far as I could find, it looks like mmap is supported on nfs with some caveats about distributed sharing. I'm wondering if there's some unexpected behaviour around us immediately unlinking the backing files. It took me a little longer to recognize the problem because the files didn't have our omrvmem* names, but had the special .nfs* names for pending deletes.
Another complexity, in case it's relevant; the nfs client and server were colocated, so while we would have been writing to an nfs mounted directory the core file was showing the underlying physical files in the server filesystem.

hzongaro · 2024-12-10T03:26:01Z

@rmnattas, I am assuming this should move out to the 0.51 release.

JasonFengJ9 · 2025-01-10T22:58:55Z

openjdk21_j9_sanity.openjdk_ppc64le_linux(sles15le-rtp-rt7-1)

[2025-01-06T10:12:23.959Z] variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage Mode501 -XXgc:fvtest_forceCopyForwardHybridMarkCompactRatio=10
[2025-01-06T10:12:23.959Z] JVM_OPTIONS:  -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -Xjit -Xgcpolicy:balanced -Xnocompressedrefs -XXgc:fvtest_forceCopyForwardHybridMarkCompactRatio=10 -Xverbosegclog 

[2025-01-06T10:49:06.182Z] TEST: java/lang/Thread/virtual/stress/TimedGet.java

[2025-01-06T10:49:06.183Z] *** Invalid JIT return address 00007FFF292007FC in 00007FFF1275BA68
[2025-01-06T10:49:06.183Z] 
[2025-01-06T10:49:06.183Z] 10:48:50.819 0x7ffed8002400    j9vm.249    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/build-scripts/jobs/jdk21u/jdk21u-linux-ppc64le-openj9-IBM/workspace/build/src/openj9/runtime/vm/swalk.c:1633: ((0 ))

[2025-01-06T10:49:06.184Z] TEST RESULT: Failed. Unexpected exit from test [exit code: 255]
[2025-01-06T10:49:06.184Z] --------------------------------------------------
[2025-01-06T10:54:34.133Z] Test results: passed: 936; failed: 1
[2025-01-06T10:55:09.930Z] Report written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_ppc64le_linux_testList_0/aqa-tests/TKG/output_17361557429618/jdk_lang_j9_1/report/html/report.html
[2025-01-06T10:55:09.930Z] Results written to /home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_ppc64le_linux_testList_0/aqa-tests/TKG/output_17361557429618/jdk_lang_j9_1/work
[2025-01-06T10:55:09.930Z] Error: Some tests failed or other problems occurred.
[2025-01-06T10:55:09.930Z] -----------------------------------
[2025-01-06T10:55:09.930Z] jdk_lang_j9_1_FAILED

pshipton · 2025-01-17T13:11:41Z

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_2/371

pshipton added comp:jit test failure labels Apr 1, 2024

pshipton added this to the Release 0.44 (Java 8, 11, 17, 21) Apr refresh milestone Apr 1, 2024

hzongaro assigned BradleyWood Apr 1, 2024

pshipton modified the milestones: Release 0.44 (Java 8, 11, 17, 21) Apr refresh, Release 0.46 (Java 8, 11, 17, 21, 22) July refresh Apr 15, 2024

hzongaro modified the milestones: Release 0.46 (Java 8, 11, 17, 21, 22) July refresh, Release 0.48 (Java 8, 11, 17, 21, 23) October refresh Jun 14, 2024

pshipton added the userRaised label Aug 27, 2024

hangshao0 mentioned this issue Sep 18, 2024

DaaLoadTest_daa1_5m_0_FAILED Invalid JIT return address && j9vm.249 ASSERTION FAILED at openj9/runtime/vm/swalk.c:1633: ((0 )) #20180

Closed

0xdaryl assigned rmnattas and unassigned BradleyWood Oct 8, 2024

pshipton modified the milestones: Release 0.48 (Java 8, 11, 17, 21, 23) October refresh, Release 0.49 (Java 8, 11, 17, 21, 23) January refresh Oct 11, 2024

a7ehuo mentioned this issue Nov 8, 2024

Clean up references to old opcodes #20521

Merged

1 task

hzongaro mentioned this issue Nov 28, 2024

Stop recognizing UTF16_Encoder.encodeUTF16 methods #20613

Merged

hzongaro modified the milestones: Release 0.49 (Java 8, 11, 17, 21, 23) January refresh, Release 0.51 (Java 8, 11, 17, 21, 24) Apr refresh Dec 10, 2024
