MiniMix_3h_0 FAILED j9mm.107 ASSERTION FAILED at src/openj9/runtime/gc_base/OwnableSynchronizerObjectBuffer.cpp:75: ((false && (object != _head))) #20395

Closed
JasonFengJ9 opened this issue Oct 22, 2024 · 12 comments


@JasonFengJ9
Member

Failure link

From internal Test_openjdk17_j9_special.system_x86-64_mac_testList_0 (osxrt12)

java version "17.0.13-beta" 2024-10-15
IBM Semeru Runtime Certified Edition 17.0.13+11-202410191506 (build 17.0.13-beta+11-202410191506)
Eclipse OpenJ9 VM 17.0.13+11-202410191506 (build master-dc934fae56, JRE 17 Mac OS X amd64-64-Bit Compressed References 20241019_796 (JIT enabled, AOT enabled)
OpenJ9   - dc934fae56
OMR      - 18d8727b6
JCL      - 04522b3643 based on jdk-17.0.13+11)

Rerun in Grinder - Change TARGET to run only the failed test targets

Optional info

Failure output (captured from console output)

[2024-10-19T17:44:05.174Z] variation: Mode551
[2024-10-19T17:44:05.174Z] JVM_OPTIONS:  -XX:+UseCompressedOops -Xjit -Xgcpolicy:balanced 

[2024-10-19T20:03:52.367Z] LT  16:03:48.896 - Completed 77.6%. Number of tests started=1554400 (+7204)
[2024-10-19T20:04:08.716Z] STF 16:04:07.145 - Heartbeat: Process LT  is still running
[2024-10-19T20:04:09.492Z] LT  16:04:08.940 - Completed 77.8%. Number of tests started=1560417 (+6017)
[2024-10-19T20:04:15.714Z] LT  stderr 20:04:14.636 0x19067200    j9mm.107    *   ** ASSERTION FAILED ** at /Users/jenkins/workspace/build-scripts/jobs/jdk17u/jdk17u-mac-x64-openj9-IBM/workspace/build/src/openj9/runtime/gc_base/OwnableSynchronizerObjectBuffer.cpp:75: ((false && (object != _head)))
[2024-10-19T20:04:15.714Z] LT  stderr JVMDUMP039I Processing dump event "traceassert", detail "" at 2024/10/19 16:04:14 - please wait.

[2024-10-19T20:05:04.431Z] MiniMix_3h_0_FAILED

50x internal Grinder


Issue Number: 20395
Status: Open
Recommended Components: comp:gc, comp:vm, comp:test
Recommended Assignees: babsingh, jasonfengj9, pshipton

@dmitripivkine
Contributor

This is an attempt to add an Ownable Synchronizer object to the list a second time.

@dmitripivkine
Contributor

@hzongaro FYI: most likely this is a JIT issue. Most Own. Sync. objects are added by the JIT. I am going to triage to find more information.

@dmitripivkine
Contributor

@hzongaro FYI: most likely this is a JIT issue. Most Own. Sync. objects are added by the JIT. I am going to triage to find more information.

Sorry for the premature call; this looks like a GC-internal issue.

@pshipton
Member

FYI the grinder is on a 3h test. That's 30+ hours on each of 5 machines.

@dmitripivkine
Contributor

dmitripivkine commented Oct 22, 2024

I believe we see the problem, and it is a very marginal scenario. I don't think it is practically reproducible:
A Copy Forward abort was detected while scanning an Own. Sync. object and copying the objects it references. By that point the object had already been added to the list. However, during the abort and the transition to Mark Compact Continuation, we remember this object in the work packet so that it is rescanned. During that rescan we attempted to add the object to the buffer a second time, which triggered the assert. This code has not been changed for a few years, so this does not look like a regression.
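
For readers following along, the sequence described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the OpenJ9 code: `Object`, `ToyOwnableSynchronizerList`, `scanAddingUnconditionally` and `simulateAbortThenRescan` are hypothetical names, the `copySucceeds` flag stands in for whether copying the referenced objects completed, and the toy list counts duplicate adds where the real buffer asserts.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a heap object; not an OpenJ9 type.
struct Object {
    int id;
};

// Toy list of ownable synchronizer objects. It counts duplicate adds
// instead of asserting, so the scenario can be observed safely.
class ToyOwnableSynchronizerList {
public:
    void add(const Object *object) {
        if (!_seen.insert(object).second) {
            _duplicateAdds += 1; // the condition the assertion in the issue guards against
        }
        _entries.push_back(object);
    }
    std::size_t duplicateAdds() const { return _duplicateAdds; }

private:
    std::vector<const Object *> _entries;
    std::unordered_set<const Object *> _seen;
    std::size_t _duplicateAdds = 0;
};

// Scan one object. The object is added to the list unconditionally; if the
// copy of its referents fails (the "abort"), it is also queued for rescan.
bool scanAddingUnconditionally(const Object *object, bool copySucceeds,
                               ToyOwnableSynchronizerList &list,
                               std::vector<const Object *> &rescanPacket) {
    list.add(object);
    if (!copySucceeds) {
        rescanPacket.push_back(object);
        return false;
    }
    return true;
}

// One object whose first scan aborts and whose rescan succeeds.
// Returns how many duplicate adds the list observed.
std::size_t simulateAbortThenRescan() {
    ToyOwnableSynchronizerList list;
    std::vector<const Object *> rescanPacket;
    Object object{1};

    scanAddingUnconditionally(&object, false, list, rescanPacket);
    assert(rescanPacket.size() == 1);

    // Drain the packet into a local vector so the loop does not iterate
    // a container that the scan may append to.
    std::vector<const Object *> pending;
    pending.swap(rescanPacket);
    for (const Object *queued : pending) {
        scanAddingUnconditionally(queued, true, list, rescanPacket);
    }
    return list.duplicateAdds();
}
```

In this model the same object reaches `add` twice, once on the aborted scan and once on the rescan, which corresponds to the second add that the comment identifies as the trigger.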

@JasonFengJ9
Member Author

Aborted the grinder as per #20395 (comment) to free up the machines.

@pshipton
Member

FYI that didn't really stop the child jobs, just the parent job. The child jobs just started the next iteration. I've stopped them all. Also confirmed that the first iteration all passed; we didn't replicate the problem in 5 runs.

@JasonFengJ9
Member Author

So "Click here to forcibly terminate running steps" doesn't abort the child jobs.
Do we have to log in to the machine manually to kill the processes?

@pshipton
Member

Continue any discussion about the grinder in #20396 so we're not adding many more comments to this issue about a GC crash.

dmitripivkine added a commit to dmitripivkine/openj9 that referenced this issue Nov 13, 2024
If Ownable Synchronizer object scan has not been successful (caused Copy
Forward abort) it should not be added to the list right away. This
object is remembered in the work packet and is going to be rescanned for
the second time.

Relates eclipse-openj9#20395

Signed-off-by: Dmitri Pivkine <[email protected]>
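
The idea in the commit message can be sketched as follows. This is a minimal illustration under stated assumptions and not the change made in the referenced commit: `Object`, `ToyCollectorState`, `scanDeferringAddOnAbort` and `listSizeAfterAbortAndRescan` are hypothetical names, and `copySucceeds` stands in for whether the scan completed without a Copy Forward abort.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap object; not an OpenJ9 type.
struct Object {
    int id;
};

// Toy collector state: the ownable synchronizer list and the work packet
// that holds objects waiting to be rescanned.
struct ToyCollectorState {
    std::vector<const Object *> ownableSynchronizerList;
    std::vector<const Object *> rescanPacket;
};

// Scan one object. It is added to the list only when the scan completes;
// an aborted scan queues it for rescan, and the rescan performs the add.
void scanDeferringAddOnAbort(const Object *object, bool copySucceeds,
                             ToyCollectorState &state) {
    if (copySucceeds) {
        state.ownableSynchronizerList.push_back(object);
    } else {
        state.rescanPacket.push_back(object);
    }
}

// One object whose first scan aborts and whose rescan succeeds.
// Returns how many times the object ended up on the list.
std::size_t listSizeAfterAbortAndRescan() {
    ToyCollectorState state;
    Object object{1};

    scanDeferringAddOnAbort(&object, false, state);
    assert(state.ownableSynchronizerList.empty()); // nothing added on abort

    // Drain the packet into a local vector before rescanning.
    std::vector<const Object *> pending;
    pending.swap(state.rescanPacket);
    for (const Object *queued : pending) {
        scanDeferringAddOnAbort(queued, true, state);
    }
    return state.ownableSynchronizerList.size();
}
```

With the add deferred, the object appears on the list exactly once even though it is scanned twice.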
@dmitripivkine
Contributor

fixed via #20589


Issue Number: 20395
Status: Closed
Actual Components: comp:gc, test failure
Actual Assignees: No one :(
PR Assignees: dmitripivkine

theresa-m pushed a commit to theresa-m/openj9 that referenced this issue Nov 15, 2024
If Ownable Synchronizer object scan has not been successful (caused Copy
Forward abort) it should not be added to the list right away. This
object is remembered in the work packet and is going to be rescanned for
the second time.

Relates eclipse-openj9#20395

Signed-off-by: Dmitri Pivkine <[email protected]>