Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to enforce/disable IProfiler during startup phase #18381

Merged
merged 1 commit into from
Nov 3, 2023

Conversation

rmnattas
Copy link
Contributor

@rmnattas rmnattas commented Oct 31, 2023

Making a new option, -XX:[+/-]IProfileDuringStartupPhase, that enforce/disable IProfiler during the startup phase.

If the option is not used the default existing algorithm will determine the behaviour.
This option will replaces the TR_DisableNoIProfilerDuringStartupPhase environment variable and overrides -Xjit:noIProfilerDuringStartupPhase.

This option will be used in cold-runs/populate_scc scripts to ensure more IProfiler information are being collected and stored in the SCC.

Docs: eclipse-openj9/openj9-docs#1204

@rmnattas
Copy link
Contributor Author

@vijaysun-omr @mpirvu

@rmnattas rmnattas marked this pull request as draft October 31, 2023 13:22
@rmnattas rmnattas force-pushed the alwaysProfileXXOption branch from e3489bd to d46f9a4 Compare October 31, 2023 13:42
@mpirvu mpirvu self-assigned this Oct 31, 2023
Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need both -XX:+ and -XX:- versions of the option. Not only this is consistent with the rest of the options, but allows the user more flexibility:

  • If nothing is specified, the JVM will disable the IProfiler collection for warm runs that use a non-bootstrap SCC and enable for all other runs
  • If -XX:+ is specified, the JVM will always collect IProfiler information during start-up, including warm runs with an explicit SCC
  • If -XX:- is specified, the JVM will never collect IProfiler information during start-up, even if it's a cold run or if it's a bootstrap SCC

For reference, see how the JIT processes the other XX options using the ExternalOptions enum and _externalOptionStrings array.
Options may need to have a second look after a CRIU restore.

runtime/oti/jvminit.h Outdated Show resolved Hide resolved
@rmnattas
Copy link
Contributor Author

@mpirvu Would it also be called -XX:[+/-]alwaysIProfileDuringStartupPhase? As -XX:-alwaysIProfileDuringStartupPhase can be understood differently than "never allow". And just -XX:[+/-]iProfileDuringStartupPhase doesn't completely imply the enforcement in the + version. I'm more inclined to the -XX:[+/-]iProfileDuringStartupPhase naming.

Also in regards to the -Xjit:noIProfilerDuringStartupPhase, would that option remain or be removed, as an equivalent to the -XX:-alwaysIProfileDuringStartupPhase?

@mpirvu
Copy link
Contributor

mpirvu commented Oct 31, 2023

I'm more inclined to the -XX:[+/-]iProfileDuringStartupPhase naming.

I am ok with that name.

Also in regards to the -Xjit:noIProfilerDuringStartupPhase, would that option remain or be removed, as an equivalent to the -XX:-alwaysIProfileDuringStartupPhase?

We can remove it later. Being a JIT option, we don't expect people to use in (except as a temporary workaround at our suggestion).

@vijaysun-omr
Copy link
Contributor

I skimmed the code and agree with Marius' feedback about +/- being needed. Otherwise it looks okay to me.

@rmnattas rmnattas force-pushed the alwaysProfileXXOption branch 2 times, most recently from 8b939d0 to 53d0c8b Compare October 31, 2023 20:07
@rmnattas rmnattas changed the title Add option to enforce IProfiler during startup phase Add option to enforce/disable IProfiler during startup phase Oct 31, 2023
@rmnattas
Copy link
Contributor Author

This PR is ready to be merged

@rmnattas rmnattas marked this pull request as ready for review October 31, 2023 20:29
Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation takes away 2 bit flags from the vm->extendedRuntimeFlags2 for something that the VM is not interested in.
I am suggesting the following:

  • add two new entries to the ExternalOptions enum corresponding to the 2 new options
  • add the names of the two new options to the _externalOptionsString
  • in fePostProcess() when we set TR_NoIProfilerDuringStartupPhase, use FIND_ARG_IN_VMARGS(EXACT_MATCH, ...) to detect which scenario we are in and set TR_NoIProfilerDuringStartupPhase appropriately. Search for similar examples in the code.
  • modify OptionsPostRestore::iterateOverExternalOptions() to process the new options post CRIU restore.

This approach does not require any modification to the VM code.

@rmnattas
Copy link
Contributor Author

rmnattas commented Nov 1, 2023

modify OptionsPostRestore::iterateOverExternalOptions() to process the new options post CRIU restore.

I'll have it do the same behaviour on the CRIU restore side as a non-CRIU run, which what would a user expect.
Alternatively we can have it optimizes the effect of the option, and whether TR_NoIProfilerDuringStartupPhase is set, for a CRIU restore context. A question here is if a JVM can still be in the startup phase after restore.

New option: `-XX:[+/-]IProfileDuringStartupPhase`
Overrides `-Xjit:noIProfilerDuringStartupPhase`

Imitate the behaviour for CRIU restore

Signed-off-by: Abdulrahman Alattas <[email protected]>
@rmnattas rmnattas force-pushed the alwaysProfileXXOption branch from 53d0c8b to a92489d Compare November 2, 2023 14:11
@rmnattas
Copy link
Contributor Author

rmnattas commented Nov 2, 2023

Latest commit changes:

  • Use ExternalOptions instead of a vm->extendedRuntimeFlags2
  • Change option name from -XX:[+/-]iProfileDuringStartupPhase to -XX:[+/-]IProfileDuringStartupPhase (capitalize the "i") to be more consistent with the rest of the codebase.
  • Imitate the -XX:[+/-]IProfileDuringStartupPhase option and TR_NoIProfilerDuringStartupPhase default behaviours in CRIU restore.

Possible TODO:
The variable naming in feLatePostProcess() is not consistent. Causing the need of a separate scope to match the expected variable naming from the macro. Alternatively a change to the naming of the variables in the method is needed.

{
// The FIND_ARG_IN_VMARGS macro expect the J9JavaVM to be in the `vm` variable, instead of `javaVM`
// The method uses the `vm` variable for the TR_J9VMBase
J9JavaVM * vm = javaVM;
const char *xxIProfileDuringStartupPhase = J9::Options::_externalOptionStrings[J9::ExternalOptions::XXplusIProfileDuringStartupPhase];
const char *xxDisableIProfileDuringStartupPhase = J9::Options::_externalOptionStrings[J9::ExternalOptions::XXminusIProfileDuringStartupPhase];
int32_t xxIProfileDuringStartupPhaseArgIndex = FIND_ARG_IN_VMARGS(EXACT_MATCH, xxIProfileDuringStartupPhase, 0);
int32_t xxDisableIProfileDuringStartupPhaseArgIndex = FIND_ARG_IN_VMARGS(EXACT_MATCH, xxDisableIProfileDuringStartupPhase, 0);
if (xxIProfileDuringStartupPhaseArgIndex > xxDisableIProfileDuringStartupPhaseArgIndex)
self()->setOption(TR_NoIProfilerDuringStartupPhase, false); // Override -Xjit:noIProfilerDuringStartupPhase
else if (xxDisableIProfileDuringStartupPhaseArgIndex >= 0)
self()->setOption(TR_NoIProfilerDuringStartupPhase);
}

Other than that the PR should be ready to merge.

Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mpirvu
Copy link
Contributor

mpirvu commented Nov 2, 2023

jenkins compile all jdk17

@mpirvu
Copy link
Contributor

mpirvu commented Nov 2, 2023

jenkins test sanity zlinux jdk17

@mpirvu
Copy link
Contributor

mpirvu commented Nov 3, 2023

There were 2 failures on zLinux, but I don't see how they are related to this PR

Testing: Create CRIU checkpoint image and restore once - testMXBeanUpTime
Test start time: 2023/11/02 21:59:20 Coordinated Universal Time
Running command: bash /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Personal_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Personal_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Personal_testList_1/jdkbinary/j2sdk-image/bin/java " -Xjit:count=0 -XX:+JVMPortableRestoreMode  -Xtrace:print={j9criu,j9jcl.533} --add-exports java.base/openj9.internal.criu=ALL-UNNAMED" org.openj9.criu.TimeChangeTest testMXBeanUpTime 1 false false
Time spent starting: 1 milliseconds
Time spent executing: 27556 milliseconds
Test result: FAILED
Output from test:
 [OUT] start running script
 [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
 [OUT] export LD_BIND_NOT=on
 [OUT] /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Personal_testList_1/jdkbinary/j2sdk-image/bin/java -XX:+EnableCRIUSupport  -Xjit:count=0 -XX:+JVMPortableRestoreMode  -Xtrace:print={j9criu,j9jcl.533} --add-exports java.base/openj9.internal.criu=ALL-UNNAMED -cp /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Personal_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criu.jar org.openj9.criu.TimeChangeTest testMXBeanUpTime 1
 [OUT] 21:59:38.611*0x1781800           j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1698962378609) vmStartTime (1698962360320) criuTimeDeltaMillis (0)
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Thu Nov 02 21:59:40 UTC 2023, System.currentTimeMillis(): 1698962380826, System.nanoTime(): 1698962380822026076
 [OUT] 21:59:43.630 0x1781800          j9criu.9        > Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
 [OUT] 21:59:43.630 0x1781800          j9criu.18       - Taking a checkpoint with active clinit
 [OUT] 21:59:43.644 0x1781800          j9criu.13       - Before checkpoint, checkpointNanoTimeMonotonic = 1698962383640066984, checkpointNanoUTCTime = 1698962383644254473
 [OUT] 21:59:44.388 0x1781800          j9criu.12       - Current syslogOptions: error,vital
 [OUT] 21:59:44.388 0x1781800          j9criu.7        - Before checkpoint criu_dump(), j9time_nano_time() returns 1698962384384886938, j9time_current_time_nanos() returns 1698962384389073960
 [OUT] 21:59:47.357 0x1781800          j9criu.15       - After checkpoint criu_dump(), j9time_nano_time() returns 1698962387353212625, j9time_current_time_nanos() returns 1698962387357409669
 [OUT] 21:59:47.357 0x1781800          j9criu.14       - After restore, restoreNanoUTCTime = 1698962387357409669, checkpointNanoUTCTime = 1698962383644254473, checkpointRestoreTimeDelta = 3713155196, restoreNanoTimeMonotonic = 1698962387353212625, checkpointNanoTimeMonotonic = 1698962383640066984, nanoTimeMonotonicClockDelta = 3713145641
 [OUT] JVMJITM044W Some or all compiled code in the code cache invalidated post restore.
 [OUT] 21:59:47.425 0x1781800          j9criu.10       < Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
 [OUT] 21:59:47.428 0x1781800           j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1698962387428) vmStartTime (1698962360320) criuTimeDeltaMillis (3713)
 [OUT] FAILED: testMXBeanUpTime() - uptimeAfterCheckpoint 23395 can't be greater than uptimeBeforeCheckpoint 18289 + 1500
esting: Create CRIU checkpoint image and restore once - testMXBeanUpTime
Test start time: 2023/11/02 21:54:16 Coordinated Universal Time
Running command: bash /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuScript.sh /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Personal_testList_0/jdkbinary/j2sdk-image/bin/java " -Xjit:count=0 -XX:+CRIURestoreNonPortableMode  -Xtrace:print={j9criu,j9jcl.533} --add-exports java.base/openj9.internal.criu=ALL-UNNAMED" org.openj9.criu.TimeChangeTest testMXBeanUpTime 1 false false
Time spent starting: 123 milliseconds
Time spent executing: 17099 milliseconds
Test result: FAILED
Output from test:
 [OUT] start running script
 [OUT] export GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVEC,-XSAVE,-AVX2,-ERMS,-AVX,-AVX_Fast_Unaligned_Load
 [OUT] export LD_BIND_NOT=on
 [OUT] /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Personal_testList_0/jdkbinary/j2sdk-image/bin/java -XX:+EnableCRIUSupport  -Xjit:count=0 -XX:+CRIURestoreNonPortableMode  -Xtrace:print={j9criu,j9jcl.533} --add-exports java.base/openj9.internal.criu=ALL-UNNAMED -cp /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_s390x_linux_Personal_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criu.jar org.openj9.criu.TimeChangeTest testMXBeanUpTime 1
 [OUT] 21:54:29.300*0x1e44800           j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1698962069298) vmStartTime (1698962056740) criuTimeDeltaMillis (0)
 [OUT] Performing CRIUSupport.checkpointJVM(), current thread name: main, Thu Nov 02 21:54:29 UTC 2023, System.currentTimeMillis(): 1698962069526, System.nanoTime(): 1698962069523936975
 [OUT] 21:54:30.806 0x1e44800          j9criu.9        > Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
 [OUT] 21:54:30.806 0x1e44800          j9criu.18       - Taking a checkpoint with active clinit
 [OUT] 21:54:30.821 0x1e44800          j9criu.13       - Before checkpoint, checkpointNanoTimeMonotonic = 1698962070818375088, checkpointNanoUTCTime = 1698962070822044197
 [OUT] 21:54:31.034 0x1e44800          j9criu.12       - Current syslogOptions: error,vital
 [OUT] 21:54:31.035 0x1e44800          j9criu.7        - Before checkpoint criu_dump(), j9time_nano_time() returns 1698962071032047212, j9time_current_time_nanos() returns 1698962071035716204
 [OUT] 21:54:33.504 0x1e44800          j9criu.15       - After checkpoint criu_dump(), j9time_nano_time() returns 1698962073501723152, j9time_current_time_nanos() returns 1698962073505402123
 [OUT] 21:54:33.504 0x1e44800          j9criu.14       - After restore, restoreNanoUTCTime = 1698962073505402123, checkpointNanoUTCTime = 1698962070822044197, checkpointRestoreTimeDelta = 2683357926, restoreNanoTimeMonotonic = 1698962073501723152, checkpointNanoTimeMonotonic = 1698962070818375088, nanoTimeMonotonicClockDelta = 2683348064
 [OUT] JVMJITM044W Some or all compiled code in the code cache invalidated post restore.
 [OUT] 21:54:33.587 0x1e44800          j9criu.10       < Java_org_eclipse_openj9_criu_CRIUSupport_checkpointJVMImpl
 [OUT] 21:54:33.587 0x1e44800           j9jcl.533      - RuntimeMXBeanImpl_getUptimeImpl timeNow (1698962073587) vmStartTime (1698962056740) criuTimeDeltaMillis (2683)
 [OUT] FAILED: testMXBeanUpTime() - uptimeAfterCheckpoint 14164 can't be greater than uptimeBeforeCheckpoint 12558 + 1500

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants