Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve static field ref in CP if possible #20770

Merged
merged 1 commit into from
Dec 11, 2024

Conversation

cjjdespres
Copy link
Contributor

Some constant pool entries may be unresolved when attempting to relocate a method. If the classes involved are all loaded or initialized properly and the entry is a static field ref, then resolveStaticFieldRef() will be able to resolve the entry, allowing relocation to proceed.

@cjjdespres cjjdespres requested a review from dsouzai as a code owner December 6, 2024 15:18
@cjjdespres
Copy link
Contributor Author

Attn @mpirvu. This seems to eliminate the staticClassFromCPValidationFailure relo failures, at least in my testing.

@mpirvu mpirvu self-assigned this Dec 6, 2024
// This CP entry could still be unresolved, which would cause
// getClassOfStaticFromCP() to fail. Attempt to resolve it manually to reduce
// the chances of failure.
if (cpIndex != -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we call TR_ResolvedJ9Method::getClassOfStaticFromCP() first and only if that fails try to resolve at compile time?
I am assuming that most of the time getClassOfStaticFromCP() will succeed so we don't pay the price of getting VM access.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TR_ResolvedJ9Method::getClassOfStaticFromCP acquires vmaccess, so as the code stands, we'd be doing it twice. Three times if the approach is going to be:

if (!TR_ResolvedJ9Method::getClassOfStaticFromCP) {
  resolveStaticFieldRef;
  getClassOfStaticFromCP;
}

it would be nice if J9::VMAccessCriticalSection checked if we already have vmaccess and do an early return, though that would have the side effect of not checking if the compilation should be interrupted as often.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to put this in getClassOfStaticFromCP instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm that is a question for @jdmpapin :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the code already has VM access it won't acquire it again, so we can do

if (cp != -1)
   {
   TR::VMAccessCriticalSection getClassFromConstantPool(_fej9);
   TR_OpaqueClassBlock * clazz = TR_ResolvedJ9Method::getClassOfStaticFromCP(_fej9, beholderCP, cpIndex);
    if (!clazz)
       {
        _fej9->_vmFunctionTable->resolveStaticFieldRef(_fej9->vmThread(), NULL, beholderCP, cpIndex, J9_RESOLVE_FLAG_JIT_COMPILE_TIME, NULL);
        clazz = TR_ResolvedJ9Method::getClassOfStaticFromCP(_fej9, beholderCP, cpIndex);
       }
   } 
return validateSymbol(classID, clazz);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to put this in getClassOfStaticFromCP instead?

I think we might as well. I don't know of any reason why we shouldn't try to compile-time resolve if we call getClassOfStaticFromCP() from somewhere else

@mpirvu
Copy link
Contributor

mpirvu commented Dec 6, 2024

@dsouzai I would appreciate your review on this on this PR as well.

@cjjdespres
Copy link
Contributor Author

There is also a chance these failures are really due to #20730 (comment), which I will double-check.

if (cpIndex != -1)
{
TR::VMAccessCriticalSection getClassFromConstantPool(_fej9);
_fej9->_vmFunctionTable->resolveStaticFieldRef(_fej9->vmThread(), NULL, beholderCP, cpIndex, J9_RESOLVE_FLAG_JIT_COMPILE_TIME, NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the code, it looks like we call jitCTResolveStaticFieldRefWithMethod to do compile time resolution of static fields. Ideally we'd call the same thing, but we do have the problem of needing to poof up a J9Method, so resolveStaticFieldRef seems ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This made me worry about whether the J9Method could affect the result, but I think it's fine.

The method is only used to determine whether it's OK to store to a static final field. If at compile-time we generated a resolved static field store, then the store was allowed at the time, so one of the following will hold:

  • It was non-final. On (successful) load, it will still be non-final because we validate the class chain of the defining class.
  • It was final and defined by the same class as the method containing the putstatic. On (successful) load, this will still be the case because the class ref in the static field ref CP entry names the containing class (i.e. the class that owns the CP entry), and we validate the class chain of the defining class of each inlined method. In this case the class file version must be <53 or the putstatic must occur within <clinit>, both of which are also captured in the class chain.

Also, when generating IL for putstatic, all of the above is independent of how we validate classOfStatic(), which is only used for generating a write barrier

@cjjdespres
Copy link
Contributor Author

There is also a chance these failures are really due to #20730 (comment), which I will double-check.

The failures are still present with the corrected version of that PR.

Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Comment for this one and for TR_ResolvedJ9Method::getClassOfStaticFromCP needs to be changed/added saying that if field is not resolved, the code will attempt to resolve at compile time.
I was worried about using this method to determine which blocks are cold based on seeing unresolved references, but I don't see such usage.

@mpirvu mpirvu added the comp:vm label Dec 9, 2024
@mpirvu
Copy link
Contributor

mpirvu commented Dec 9, 2024

@gacholio Could you please review this PR as well since it touches VM code? Thanks

Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@mpirvu
Copy link
Contributor

mpirvu commented Dec 10, 2024

I will start testing proactively while while waiting for other potential reviewer.

@mpirvu
Copy link
Contributor

mpirvu commented Dec 10, 2024

jenkins test sanity all jdk21

@mpirvu
Copy link
Contributor

mpirvu commented Dec 10, 2024

Compilations errors:

11:06:47  [100%] Linking CXX shared library ../libj9jit29.so
11:07:55  ../lib/libj9jit_vm.a(ctsupport.cpp.o): In function `jitGetClassOfFieldFromCP':
11:07:55  /home/jenkins/workspace/Build_JDK21_aarch64_linux_Personal/openj9/runtime/jit_vm/ctsupport.cpp:103: undefined reference to `resolveStaticFieldRef'
11:07:55  collect2: error: ld returned 1 exit status

@mpirvu
Copy link
Contributor

mpirvu commented Dec 10, 2024

jenkins test sanity all jdk21

@mpirvu
Copy link
Contributor

mpirvu commented Dec 11, 2024

jenkins test extended,sanity.system,extended.system xlinux jdk21

@mpirvu
Copy link
Contributor

mpirvu commented Dec 11, 2024

aarch64 failed jdk_vector_double128_j9_0

jdk_vector_double128_j9_0/work" not found: creating
...
4:01:39  java.lang.NullPointerException: Cannot throw exception because the return value of "java.lang.Class.newInternalError(java.lang.Exception)" is null

On AIX we have an infra problem:

13:21:15  Running test jdk_security2_1 ...
13:21:15  ===============================================
13:21:15  jdk_security2_1 Start Time: Tue Dec 10 18:03:06 2024 Epoch Time (ms): 1733853786469
13:21:15  variation: Mode650
13:21:15  JVM_OPTIONS:  -XX:-UseCompressedOops -Xverbosegclog 
13:21:15  { \
13:21:15  echo "";	echo "TEST SETUP:"; \
13:21:15  "/home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Personal_testList_2/jdkbinary/j2sdk-image/bin/java" -Xshareclasses:destroyAll; "/home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Personal_testList_2/jdkbinary/j2sdk-image/bin/java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done"; \
13:21:15  mkdir -p "/home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Personal_testList_2/aqa-tests/TKG/../TKG/output_17338527993068/jdk_security2_1"; \
13:21:15  cd "/home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Personal_testList_2/aqa-tests/TKG/../TKG/output_17338527993068/jdk_security2_1"; \
...
13:29:22  Cannot contact p8-java1-ibm12: java.lang.InterruptedException
13:34:30  gmake[4]: *** [autoGen.mk:169: jdk_security2_1] Killed
13:34:30  gmake[4]: Leaving directory '/home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Personal_testList_2/aqa-tests/openjdk'
13:34:30  gmake[3]: *** [/home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Personal_testList_2/aqa-tests/TKG/../TKG/settings.mk:361: testList-openjdk] Error 2
13:34:30  gmake[3]: Leaving directory '/home/jenkins/workspace/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Personal_testList_2/aqa-tests'
13:34:30  gmake[2]: *** [settings.mk:361: testList-..] Error 2

@cjjdespres
Copy link
Contributor Author

That aarch64 error in jdk_vector_double128_j9_0:

14:01:39  test Double128VectorTests.unsliceDouble128VectorTestsMasked(double[-i * 5], double[i + 1], mask[false]): failure
14:01:39  java.lang.NullPointerException: Cannot throw exception because the return value of "java.lang.Class.newInternalError(java.lang.Exception)" is null
14:01:39  	at java.base/java.lang.Class.copyFields(Class.java:4580)
14:01:39  	at java.base/java.lang.Class.lookupCachedFields(Class.java:4594)
14:01:39  	at java.base/java.lang.Class.getDeclaredFields(Class.java:1044)
14:01:39  	at java.base/jdk.internal.vm.vector.VectorSupport.isNonCapturingLambda(VectorSupport.java:732)
14:01:39  	at java.base/jdk.internal.vm.vector.VectorSupport.shuffleToVector(VectorSupport.java:262)
14:01:39  	at jdk.incubator.vector/jdk.incubator.vector.Double128Vector$Double128Shuffle.toVector(Double128Vector.java:800)
14:01:39  	at jdk.incubator.vector/jdk.incubator.vector.Double128Vector$Double128Shuffle.toVector(Double128Vector.java:764)
14:01:39  	at jdk.incubator.vector/jdk.incubator.vector.AbstractShuffle.checkIndexes(AbstractShuffle.java:126)
14:01:39  	at jdk.incubator.vector/jdk.incubator.vector.DoubleVector.rearrangeTemplate(DoubleVector.java:2245)
14:01:39  	at jdk.incubator.vector/jdk.incubator.vector.Double128Vector.rearrange(Double128Vector.java:439)
14:01:39  	at jdk.incubator.vector/jdk.incubator.vector.Double128Vector.rearrange(Double128Vector.java:41)
14:01:39  	at jdk.incubator.vector/jdk.incubator.vector.DoubleVector.sliceTemplate(DoubleVector.java:2131)
14:01:39  	at jdk.incubator.vector/jdk.incubator.vector.DoubleVector.unsliceTemplate(DoubleVector.java:2194)
14:01:39  	at jdk.incubator.vector/jdk.incubator.vector.Double128Vector.unslice(Double128Vector.java:424)
14:01:39  	at jdk.incubator.vector/jdk.incubator.vector.Double128Vector.unslice(Double128Vector.java:41)
14:01:39  	at Double128VectorTests.unsliceDouble128VectorTestsMasked(Double128VectorTests.java:3743)

looks like #20389.

@mpirvu mpirvu merged commit e832477 into eclipse-openj9:master Dec 11, 2024
25 of 28 checks passed
@mpirvu
Copy link
Contributor

mpirvu commented Dec 17, 2024

FYI: I compared nightly builds OpenJ9-JDK17-x86-64_linux-20241211-000850 and OpenJ9-JDK17-x86-64_linux-20241212-001148 from AcmeAirEE8 rampup point of view. The latter (which includes this change) had a significantly worse rampup for the first warm run. The subsequent warm runs were good and even slightly better than the baseline from Dec11. Looking at compilation stats I observed that the warm run that had bad rampup also had many AOT compilations. This happens because my runs are short (2 minutes) and with just 1P there isn't enough time to perform all the needed AOT compilations in the cold run.
The hypothesis is that AOT compilations have become more expensive with this change. I ran a batch of 10 cold runs with no recompilation and CompCPU is about 5% larger.

Results for jdk: /team/mpirvu/sdks/OpenJ9-JDK17-x86-64_linux-20241211-000850 and opts: -Xjit:inhibitRecompilation -Xaot:forceaot,inhibitRecompilation -Xms256m -Xmx256m -Xscmx150m -Xshareclasses:name=acmeairscc,cacheDir=/tmp
Throughput stats: Avg=12545.3  StdDev=   83.8  Min=12452.4  Max=12724.5  Max/Min=   2% CI95=    0.5% numSamples= 10
Footprint stats:  Avg=  327.1  StdDev=    7.8  Min=  315.8  Max=  338.3  Max/Min=   7% CI95=    1.7% numSamples= 10
Comp CPU stats:   Avg=   75.7  StdDev=    2.0  Min=   71.9  Max=   78.7  Max/Min=  10% CI95=    1.9% numSamples= 10
StartupTime stats:Avg= 7454.5  StdDev=  317.5  Min= 6815.0  Max= 7935.0  Max/Min=  16% CI95=    3.0% numSamples= 10

Results for jdk: /team/mpirvu/sdks/OpenJ9-JDK17-x86-64_linux-20241212-001148 and opts: -Xjit:inhibitRecompilation -Xaot:forceaot,inhibitRecompilation -Xms256m -Xmx256m -Xscmx150m -Xshareclasses:name=acmeairscc,cacheDir=/tmp
Throughput stats: Avg=12468.2  StdDev=  152.2  Min=12295.6  Max=12760.1  Max/Min=   4% CI95=    0.9% numSamples=  9
Footprint stats:  Avg=  330.3  StdDev=    4.7  Min=  322.3  Max=  336.3  Max/Min=   4% CI95=    1.0% numSamples= 10
Comp CPU stats:   Avg=   79.8  StdDev=    1.9  Min=   77.7  Max=   83.6  Max/Min=   8% CI95=    1.7% numSamples= 10
StartupTime stats:Avg= 7438.6  StdDev=  374.8  Min= 6821.0  Max= 7864.0  Max/Min=  15% CI95=    3.6% numSamples= 10

In another experiment I ran with no AOT and just warm compilations, but I didn't see an increase in CompCPU, so the effecr is limited to AOT compilations.
Maybe we should consider doing the CP resolution only for AOT loads and not for all compilations.

@cjjdespres
Copy link
Contributor Author

I can move it back to the SVM, changed to use a single VM access. I don't think it's critical to have it be active during code generation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants