Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serializing and caching results of AOT compilations at the JITServer #13962

Merged

Conversation

AlexeyKhrabrov
Copy link
Contributor

This PR implements serialization of AOT methods compiled at the JITServer and storing them in its AOT cache.

@AlexeyKhrabrov
Copy link
Contributor Author

@mpirvu This PR is ready for review.
@dsouzai Could you please review it as well since it touches quite a bit of AOT compiler code?
Thanks.

@mpirvu mpirvu self-assigned this Nov 19, 2021
@mpirvu mpirvu added the comp:jitserver Artifacts related to JIT-as-a-Service project label Nov 19, 2021
Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I only have some minor comments

runtime/compiler/runtime/JITServerAOTDeserializer.cpp Outdated Show resolved Hide resolved
runtime/compiler/control/JITServerCompilationThread.cpp Outdated Show resolved Hide resolved
runtime/compiler/control/MethodToBeCompiled.hpp Outdated Show resolved Hide resolved
@mpirvu
Copy link
Contributor

mpirvu commented Nov 23, 2021

In a follow on PR we may want to add a filter through client command line options, about which methods should be stored in the AOT server side cache: -Xaot:{methodname}(dontStoreInAOTServerCache)

@mpirvu
Copy link
Contributor

mpirvu commented Nov 23, 2021

There is a merge conflict due to my latest fix.

@dsouzai
Copy link
Contributor

dsouzai commented Nov 23, 2021

Please don't force push, in the middle of a review right now :)

@AlexeyKhrabrov
Copy link
Contributor Author

In a follow on PR we may want to add a filter through client command line options, about which methods should be stored in the AOT server side cache: -Xaot:{methodname}(dontStoreInAOTServerCache)

Yes, that would be helpful for diagnostics etc., I'll add the option in a future PR. Should there be separate options for store and load or just a single one for bypassing AOT cache altogether for a given method?

@AlexeyKhrabrov
Copy link
Contributor Author

There is a merge conflict due to my latest fix.

I guess we also need to wait until #13973 is merged because JITServer tests would currently fail.

@mpirvu
Copy link
Contributor

mpirvu commented Nov 23, 2021

Yes, I will not run tests before #13973 is merged and this code rebased.

Copy link
Contributor

@dsouzai dsouzai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks reasonable; however I have a few questions/suggestions. Also, you should add doxygen comments for all the new APIs you added (including the helper methods).

void
J9::Compilation::addSerializationRecord(const AOTCacheRecord *record, uintptr_t reloDataOffset)
{
TR_ASSERT(_aotCacheStore, "Trying to add serialization record for compilation that is not an AOT cache store");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this TR_ASSERT_FATAL; TR_ASSERT isn't on by default.

@@ -40,9 +40,11 @@ namespace J9 { typedef J9::Compilation CompilationConnector; }
#include "infra/Statistics.hpp"
#include "env/CompilerEnv.hpp"
#include "env/OMRMemory.hpp"
#include "env/PersistentCollections.hpp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to guarded by #ifdef J9VM_OPT_JITSERVER.

Comment on lines +245 to +248
acaRecord->setClassChainIdentifyingLoaderOffsetInSharedCache(reloTarget, classChainIdentifyingLoaderOffsetInSharedCache,
self(), classChainRecord);
acaRecord->setClassChainForInlinedMethod(reloTarget, classChainOffsetInSharedCache, self(), classChainRecord);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you have to use two different Class Chain records here? Because these are two different class chains.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same goes for all other places below that have setClassChainIdentifyingLoaderOffsetInSharedCache and setClassChain*

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AOT cache identifies class loaders by name of the 1st loaded class, not by its class chain. This class loader ID is obtained from a class chain record as the class loader ID of the root class of the chain. The setter implementations in relocation records do the right thing and call addClassChainSerializationRecord() for class chains and addClassLoaderSerializationRecord() for class chains identifying loaders. I'll explain this in the doc comments that I'll add for the add*SerializationRecord() methods.

J9::AheadOfTimeCompile::addClassSerializationRecord(const AOTCacheClassChainRecord *classChainRecord,
const uintptr_t *romClassOffsetAddr)
{
const AOTCacheClassRecord *record = classChainRecord ? classChainRecord->records()[0] : NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain how this is all laid out? classChainRecord is of type

class AOTCacheClassChainRecord final : public AOTCacheListRecord<ClassChainSerializationRecord, AOTCacheClassRecord>

which means
const R *const *records() const { return (const R *const *)_data.end(); }
becomes
const AOTCacheClassRecord *const *records() const { return (const AOTCacheClassRecord *const *)_data.end(); }
where D _data is now ClassChainSerializationRecord _data.

Searching for end() takes me to AOTSerializationRecord::end(), which is

const uint8_t *end() const { return (const uint8_t *)this + size(); }

I don't really understand how the data is laid out, because at first glance this looks like we're going out of bounds.. I think this needs a bunch of documentation to describe how all of this fits together,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The layout of AOTCacheClassChainRecord is as follows:

  • AOTCacheClassChainRecord header
  • ClassChainSerializationRecord header
  • ClassSerializationRecord data: array of N class IDs
  • array of N AOTCacheClassRecord pointers

I'll add comments in JITServerAOTCache.hpp to better describe the layout. I'll also move classChainRecord->records()[0] to a separate helper method in AOTCacheClassChainRecord.

Comment on lines 5535 to 5540
{
uint32_t locationSize = 1; // see OMR::RuntimeAssumption::isForAddressMaterializationSequence
if (reloFlags(reloTarget) & needsFullSizeRuntimeAssumption)
locationSize = sizeof(uintptr_t);
uint32_t locationSize = 1; // see OMR::RuntimeAssumption::isForAddressMaterializationSequence
if (reloFlags(reloTarget) & needsFullSizeRuntimeAssumption)
locationSize = sizeof(uintptr_t);
createClassRedefinitionPicSite((void*)-1, (void*)reloLocation, locationSize, true, getMetadataAssumptionList(reloRuntime->exceptionTable()));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well fix up the position of the brackets as well. Also maybe add brackets for the if clause.

runtime/compiler/control/JITClientCompilationThread.cpp Outdated Show resolved Hide resolved
runtime/compiler/control/JITClientCompilationThread.cpp Outdated Show resolved Hide resolved
@mpirvu
Copy link
Contributor

mpirvu commented Nov 23, 2021

This PR is ready for a rebase because #13973 was merged.

@AlexeyKhrabrov AlexeyKhrabrov force-pushed the jitserver_aot_serialization branch from 80c116c to 428c643 Compare November 24, 2021 04:16
@AlexeyKhrabrov
Copy link
Contributor Author

Addressed the comments and rebased.

@mpirvu
Copy link
Contributor

mpirvu commented Nov 24, 2021

Jenkins compile all jdk8

@mpirvu
Copy link
Contributor

mpirvu commented Nov 24, 2021

Let's wait for the compilation tests. Frankly, I thought they would be done by now.

@mpirvu
Copy link
Contributor

mpirvu commented Nov 24, 2021

aarch64_linux compilation is failing with

09:00:38  /home/jenkins/workspace/Build_JDK8_aarch64_linux_Personal/openj9/runtime/compiler/aarch64/codegen/J9AheadOfTimeCompile.cpp: In member function 'virtual void J9::ARM64::AheadOfTimeCompile::processRelocations()':
09:00:38  /home/jenkins/workspace/Build_JDK8_aarch64_linux_Personal/openj9/runtime/compiler/aarch64/codegen/J9AheadOfTimeCompile.cpp:75:45: error: invalid use of member function 'TR::Compilation* OMR::AheadOfTimeCompile::comp()' (did you forget the '()' ?)
09:00:38            TR::SymbolValidationManager *svm = comp->getSymbolValidationManager();
09:00:38                                               ^~~~
09:00:38  /home/jenkins/workspace/Build_JDK8_aarch64_linux_Personal/openj9/runtime/compiler/aarch64/codegen/J9AheadOfTimeCompile.cpp:75:49: error: base operand of '->' is not a pointer
09:00:38            TR::SymbolValidationManager *svm = comp->getSymbolValidationManager();
09:00:38                                                   ^~
09:00:38  runtime/compiler/CMakeFiles/j9jit.dir/build.make:2476: recipe for target 'runtime/compiler/CMakeFiles/j9jit.dir/aarch64/codegen/J9AheadOfTimeCompile.cpp.o' failed
09:00:38  gmake[4]: *** [runtime/compiler/CMakeFiles/j9jit.dir/aarch64/codegen/J9AheadOfTimeCompile.cpp.o] Error 1
09:00:38  gmake[4]: *** Waiting for unfinished jobs....

comp needs to be replaced with self()->comp()

@AlexeyKhrabrov
Copy link
Contributor Author

comp needs to be replaced with self()->comp()

Oops, copy-paste mistake during refactoring. Will push the fix shortly.

For each client SCC offset written into AOT relocation data during
out-of-process AOT compilations, we remember the corresponding AOT
cache record (creating it on demand if necessary) and the offset
into AOT relocation data where the SCC offset is stored. The list of
<AOT cache record, relocation data offset> pairs is stored in the
compilation object. This list will be used to store the serialized
AOT method in the JITServer AOT cache at the end of the compilation.

Signed-off-by: Alexey Khrabrov <[email protected]>
Check that the AOT method header version is correct and that each
SCC offset being updated is located within AOT relocation data.

Signed-off-by: Alexey Khrabrov <[email protected]>
Information from the client needed to create the record (class chain,
etc.) and method index are included with the compilation request.

Signed-off-by: Alexey Khrabrov <[email protected]>
The code previously incorrectly assumed that the only case when the
client can abort the compilation early is when it replies to the message
requesting the whole CHTable, at which point the last processed critical
sequence number is not yet updated in the client session.

Aborts can also occur when requesting uncached classes or VMInfo from
the client, which can happen after the sequence number is already
updated. To distinguish these two cases, we add a flag to keep track
of whether the sequence number was already updated or not.

Signed-off-by: Alexey Khrabrov <[email protected]>
Signed-off-by: Alexey Khrabrov <[email protected]>
…dRemote

Move the newly added fields used by out-of-process compilations stored
in AOT cache to CompilationInfoPerThreadRemote to reduce the memory
overhead - there is one instance per server compilation thread rather
than per compilation queue entry on both the server and the client.

Signed-off-by: Alexey Khrabrov <[email protected]>
@AlexeyKhrabrov AlexeyKhrabrov force-pushed the jitserver_aot_serialization branch from 428c643 to 287721e Compare November 24, 2021 16:08
@AlexeyKhrabrov
Copy link
Contributor Author

Done. By the way, most of the code in AheadOfTimeCompile::processRelocations() is the same across architectures and can be factored out.

@mpirvu
Copy link
Contributor

mpirvu commented Nov 24, 2021

Jenkins compile all jdk8

@mpirvu
Copy link
Contributor

mpirvu commented Nov 24, 2021

jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit jdk11

@mpirvu
Copy link
Contributor

mpirvu commented Nov 24, 2021

jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit jdk11

@mpirvu
Copy link
Contributor

mpirvu commented Nov 25, 2021

Tests on zLinux passed.
Tests on pLinux failed due to infra
Tests on xLinux timedout

13:26:30  Found a total of 20 nodes with the 'ci.role.test&&hw.arch.x86&&sw.os.linux' label
[Pipeline] echo
13:26:30  machine limit is 20
[Pipeline] timeout
13:26:30  Timeout set to expire in 1 hr 0 min
[Pipeline] {
[Pipeline] copyArtifacts
[Pipeline] }
[Pipeline] // timeout
[Pipeline] echo
13:26:31  Cannot get cached TRSS JSON data. Skipping copyArtifacts...
[Pipeline] timeout
13:26:31  Timeout set to expire in 2 hr 0 min
[Pipeline] {
[Pipeline] copyArtifacts
15:26:31  Cancelling nested steps due to timeout
[Pipeline] }
[Pipeline] // timeout
[Pipeline] echo
15:26:31  Cannot run copyArtifacts from test.getDependency. Skipping copyArtifacts...
[Pipeline] sh
[Pipeline] }
[Pipeline] // timestamps
[Pipeline] }
[Pipeline] // stage
[Pipeline] echo
terminateJavaProcesses iteration: 1
[Pipeline] echo
PROCESSCATCH: Checking for any leftover java processes on the machine
[Pipeline] sh
[Pipeline] }
[Pipeline] // timeout
[Pipeline] echo
Exception: java.lang.InterruptedException
[Pipeline] }
[Pipeline] // node
[Pipeline] }
[Pipeline] // stage
[Pipeline] End of Pipeline
Finished: FAILURE

@mpirvu
Copy link
Contributor

mpirvu commented Nov 25, 2021

Actually, the tests on xLinux appear to have passed now: https://openj9-jenkins.osuosl.org/job/PullRequest-OpenJ9/1241/

@mpirvu
Copy link
Contributor

mpirvu commented Nov 25, 2021

Jenkins compile all jdk8

@mpirvu
Copy link
Contributor

mpirvu commented Nov 25, 2021

All compilation tests have passed as well: https://openj9-jenkins.osuosl.org/job/PullRequest-OpenJ9/1247/

@mpirvu
Copy link
Contributor

mpirvu commented Nov 25, 2021

@dsouzai If you agree, please approve this PR. Thanks

@mpirvu mpirvu merged commit 73dac1a into eclipse-openj9:master Nov 25, 2021
@AlexeyKhrabrov AlexeyKhrabrov deleted the jitserver_aot_serialization branch November 25, 2021 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jitserver Artifacts related to JIT-as-a-Service project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants