Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize recycled continuations in createContinuation before usage #18649

Merged
merged 1 commit into from
Dec 19, 2023

Conversation

babsingh
Copy link
Contributor

Currently, recycled continuations are reset and initialized in
freeContinuation at the end of their use. There are also gaps where
the recycled continuation might retain old values and not be properly
reset.

The above approach leads to corruption and segfaults as seen in
the runJavaThread failure reported in #16728.

To fix the above crash, all continuations (new and recycled) are
initialized and reset in createContinuation just before the
continuation begins execution.

Related: #16728

Currently, recycled continuations are reset and initialized in
freeContinuation at the end of their use. There are also gaps where
the recycled continuation might retain old values and not be properly
reset.

The above approach leads to corruption and segfaults as seen in
the runJavaThread failure reported in eclipse-openj9#16728.

To fix the above crash, all continuations (new and recycled) are
initialized and reset in createContinuation just before the
continuation begins execution.

Related: eclipse-openj9#16728

Signed-off-by: Babneet Singh <[email protected]>
@babsingh
Copy link
Contributor Author

jenkins test sanity.functional,sanity.openjdk zlinux,win,aix jdk21

@tajila
Copy link
Contributor

tajila commented Dec 19, 2023

There are also gaps where
the recycled continuation might retain old values and not be properly
reset.

How do the gaps arise? Were the recycled continuations not being fully reset?

@babsingh
Copy link
Contributor Author

How do the gaps arise? Were the recycled continuations not being fully reset?

Yes, continuations not being fully reset is part of the reason. In freeContinuation -> recycleContinuation, memset for the continuation was missing, and we were not setting all the continuation fields that are being set in createContinuation. Also, few fields in the continuation->stackObject were not being reset after the old stacks were freed. The most likely assumption for this approach was that those fields either stay unchanged or don't impact a continuation's functioning.

I also tried to mirror the continuation init code from createContinuation in freeContinuation -> recycleContinuation. This didn't help. There is more at play, which I couldn't figure out. The missing piece is AIX specific because the current implementation works on all other platforms.

Further, the runJavaThread crash started to occur more frequently after we reduced the stack increment size from 16KB (default) to 2KB in openj9-openjdk-jdk21#82. The crash was previously hidden behind the OutOfMemoryError, and showed up once stack grow was triggered more often.

@babsingh
Copy link
Contributor Author

babsingh commented Dec 19, 2023

JDK21 win sanity.openjdk build is taking too much time. Since all other builds have passed, it might not be worth to wait for its completion.

Ran a 100x Skynet grinder (25x on 4 different AIX machines): https://openj9-jenkins.osuosl.org/job/Grinder/3148. Ran the grinder with -Xgcpolicy:optthruput since #18637 wasn't merged when the PR builds were launched.

@pshipton Can you please review/merge this PR?

babsingh added a commit to babsingh/aqa-tests that referenced this pull request Dec 19, 2023
@babsingh babsingh requested a review from pshipton December 19, 2023 20:44
babsingh added a commit to babsingh/aqa-tests that referenced this pull request Dec 19, 2023
babsingh added a commit to babsingh/aqa-tests that referenced this pull request Dec 19, 2023
babsingh added a commit to babsingh/aqa-tests that referenced this pull request Dec 19, 2023
babsingh added a commit to babsingh/aqa-tests that referenced this pull request Dec 19, 2023
babsingh added a commit to babsingh/aqa-tests that referenced this pull request Dec 19, 2023
@tajila
Copy link
Contributor

tajila commented Dec 19, 2023

I had noticed that recycled stacks where appear beneath current stacks, in some cases the stack walker would walk both. Also, the carrier sp would change between vthread mount and unmount. I think some of that can be explained by not resetting all the fields.

Given the issue is not reproduceable, and the proposed change is correct, I say we go ahead with this.

@babsingh
Copy link
Contributor Author

babsingh commented Dec 19, 2023

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_x86-64_windows_Personal/20/

JDK21 win sanity.openjdk build job was aborted by the infra scripts because the timeout limit was hit.

16:39:25  Cancelling nested steps due to timeout
16:39:25  Sending interrupt signal to process

@tajila
Copy link
Contributor

tajila commented Dec 19, 2023

No failures in windows testing

@pshipton pshipton merged commit 81c115a into eclipse-openj9:master Dec 19, 2023
10 of 12 checks passed
JasonFengJ9 pushed a commit to adoptium/aqa-tests that referenced this pull request Dec 20, 2023
JasonFengJ9 pushed a commit to adoptium/aqa-tests that referenced this pull request Dec 20, 2023
JasonFengJ9 pushed a commit to adoptium/aqa-tests that referenced this pull request Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants