-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add spark350emr shim layer [EMR] #10202
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -595,6 +595,27 @@ | |
<module>delta-lake/delta-stub</module> | ||
</modules> | ||
</profile> | ||
<profile> | ||
<id>release350emr</id> | ||
<activation> | ||
<property> | ||
<name>buildver</name> | ||
<value>350emr</value> | ||
</property> | ||
</activation> | ||
<properties> | ||
<buildver>350emr</buildver> | ||
<spark.version>${spark350emr.version}</spark.version> | ||
<spark.test.version>${spark350emr.version}</spark.test.version> | ||
<parquet.hadoop.version>1.13.1</parquet.hadoop.version> | ||
<iceberg.version>${spark330.iceberg.version}</iceberg.version> | ||
<slf4j.version>2.0.7</slf4j.version> | ||
</properties> | ||
<modules> | ||
<module>shim-deps/emr</module> | ||
<module>delta-lake/delta-stub</module> | ||
</modules> | ||
</profile> | ||
<profile> | ||
<id>release351</id> | ||
<activation> | ||
|
@@ -782,6 +803,7 @@ | |
<spark332db.version>3.3.2-databricks</spark332db.version> | ||
<spark341db.version>3.4.1-databricks</spark341db.version> | ||
<spark350.version>3.5.0</spark350.version> | ||
<spark350emr.version>3.5.0-amzn-0</spark350emr.version> | ||
<spark351.version>3.5.1-SNAPSHOT</spark351.version> | ||
<mockito.version>3.12.4</mockito.version> | ||
<scala.plugin.version>4.3.0</scala.plugin.version> | ||
|
@@ -791,6 +813,7 @@ | |
<guava.cdh.version>30.0-jre</guava.cdh.version> | ||
<arrow.cdh.version>2.0.0</arrow.cdh.version> | ||
<slf4j.version>1.7.30</slf4j.version> | ||
<log4j.version>2.20.0</log4j.version> | ||
<flatbuffers.java.version>1.11.0</flatbuffers.java.version> | ||
<hadoop.client.version>3.3.1</hadoop.client.version> | ||
<iceberg.version>0.13.2</iceberg.version> | ||
|
@@ -831,7 +854,8 @@ | |
340, | ||
341, | ||
342, | ||
350 | ||
350, | ||
350emr | ||
</noSnapshot.buildvers> | ||
<snapshot.buildvers> | ||
351 | ||
|
@@ -933,6 +957,34 @@ | |
<artifactId>jucx</artifactId> | ||
<version>${ucx.version}</version> | ||
</dependency> | ||
<!-- --> | ||
<dependency> | ||
<groupId>org.slf4j</groupId> | ||
<artifactId>slf4j-api</artifactId> | ||
<version>${slf4j.version}</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.apache.logging.log4j</groupId> | ||
<artifactId>log4j-slf4j2-impl</artifactId> | ||
<version>${log4j.version}</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.apache.logging.log4j</groupId> | ||
<artifactId>log4j-api</artifactId> | ||
<version>${log4j.version}</version> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.apache.logging.log4j</groupId> | ||
<artifactId>log4j-core</artifactId> | ||
<version>${log4j.version}</version> | ||
</dependency> | ||
<dependency> | ||
<!-- API bridge between log4j 1 and 2 --> | ||
<groupId>org.apache.logging.log4j</groupId> | ||
<artifactId>log4j-1.2-api</artifactId> | ||
<version>${log4j.version}</version> | ||
</dependency> | ||
<!-- --> | ||
Comment on lines
+960
to
+987
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Curious why all of these were added? Especially wondering about the log4j dependencies since we normally want to use slf4j APIs when logging instead of hitting log4j. Normally we leave these sort of dependencies out because we explicitly want to pull in whatever version Spark is using and compile against that. The risk of explicitly pulling this in is that it might conflict with the version used by the particular Spark version we're compiling against and instead of a compile time problem we get a runtime problem. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added this because I encountered errors related to log4j & slf4j such as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That seems to imply there's an issue with the test classpath where the APIs that were available at compile-time aren't present at runtime. I'd have Maven dump the compile and test classpaths and see if you can find jars that are missing in test vs. compile that would explain it. We should be picking these up as transitive dependencies from the Spark jars. Maybe there's an issue with the EMR Spark dependencies where that's somehow not the case (but only at test runtime?!). |
||
<dependency> | ||
<groupId>org.slf4j</groupId> | ||
<artifactId>jul-to-slf4j</artifactId> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<!-- | ||
Copyright (c) 2023, NVIDIA CORPORATION. | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); | ||
you may not use this file except in compliance with the License. | ||
You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
--> | ||
<project xmlns="http://maven.apache.org/POM/4.0.0" | ||
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> | ||
<modelVersion>4.0.0</modelVersion> | ||
|
||
<parent> | ||
<groupId>com.nvidia</groupId> | ||
<artifactId>rapids-4-spark-parent_2.12</artifactId> | ||
<version>24.02.0-SNAPSHOT</version> | ||
<relativePath>../../pom.xml</relativePath> | ||
</parent> | ||
<artifactId>rapids-4-spark-emr-bom</artifactId> | ||
<packaging>pom</packaging> | ||
<description>EMR Shim Dependencies</description> | ||
<version>24.02.0-SNAPSHOT</version> | ||
|
||
<properties> | ||
<rapids.module>../shim-deps/emr</rapids.module> | ||
</properties> | ||
|
||
<!-- | ||
This module is going to be used as a provided dependency. | ||
The dependencies below are compile-scope so that they are propagated to dependents as provided. | ||
--> | ||
<dependencies> | ||
<dependency> | ||
<groupId>org.apache.spark</groupId> | ||
<artifactId>spark-sql_${scala.binary.version}</artifactId> | ||
<scope>compile</scope> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.apache.spark</groupId> | ||
<artifactId>spark-hive_${scala.binary.version}</artifactId> | ||
<scope>compile</scope> | ||
<exclusions> | ||
<exclusion> | ||
<groupId>org.apache.arrow</groupId> | ||
<artifactId>*</artifactId> | ||
</exclusion> | ||
</exclusions> | ||
</dependency> | ||
</dependencies> | ||
</project> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<!-- | ||
Copyright (c) 2023, NVIDIA CORPORATION. | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); | ||
you may not use this file except in compliance with the License. | ||
You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
--> | ||
<project xmlns="http://maven.apache.org/POM/4.0.0" | ||
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> | ||
<modelVersion>4.0.0</modelVersion> | ||
|
||
<parent> | ||
<groupId>com.nvidia</groupId> | ||
<artifactId>rapids-4-spark-parent_2.12</artifactId> | ||
<version>24.02.0-SNAPSHOT</version> | ||
<relativePath>../../pom.xml</relativePath> | ||
</parent> | ||
<artifactId>rapids-4-spark-emr-bom</artifactId> | ||
<packaging>pom</packaging> | ||
<description>EMR Shim Dependencies</description> | ||
<version>24.02.0-SNAPSHOT</version> | ||
|
||
<properties> | ||
<rapids.module>../shim-deps/emr</rapids.module> | ||
</properties> | ||
|
||
<!-- | ||
This module is going to be used as a provided dependency. | ||
The dependencies below are compile-scope so that they are propagated to dependents as provided. | ||
--> | ||
<dependencies> | ||
<dependency> | ||
<groupId>org.apache.spark</groupId> | ||
<artifactId>spark-sql_${scala.binary.version}</artifactId> | ||
<scope>compile</scope> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.apache.spark</groupId> | ||
<artifactId>spark-hive_${scala.binary.version}</artifactId> | ||
<scope>compile</scope> | ||
<exclusions> | ||
<exclusion> | ||
<groupId>org.apache.arrow</groupId> | ||
<artifactId>*</artifactId> | ||
</exclusion> | ||
</exclusions> | ||
</dependency> | ||
</dependencies> | ||
</project> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we actually should have the fix in our spark351 shim (for the Spark 3.5.1 SNAPSHOT), so you should be able to incorporate the multiplication fix into the EMR shim layer. See #9859 and #9962.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'll try to incorporate the mentioned fix.