Example Code Throws Null Pointer Exception with 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12) #285
Replies: 15 comments 1 reply
-
Hyperspace currently supports only Spark 2.4; support for Spark 3.0 is on the way: #85.
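For anyone who wants to fail fast instead of hitting the NullPointerException, here is a minimal sketch of a version guard. It assumes the statement above (only the 2.4 line is supported at the time of this thread); `sparkMajorMinor` and `isSupportedByHyperspace` are hypothetical helper names, not part of Hyperspace.

```scala
// Hypothetical helper: extract (major, minor) from a Spark version string such as "3.0.1".
def sparkMajorMinor(version: String): (Int, Int) = {
  val parts = version.split('.')
  (parts(0).toInt, parts(1).toInt)
}

// Assumption taken from this thread: only Spark 2.4.x is supported.
def isSupportedByHyperspace(version: String): Boolean =
  sparkMajorMinor(version) == ((2, 4))
```

In a notebook, one could call `isSupportedByHyperspace(spark.version)` before creating any index and stop early if it returns `false`.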
-
@sezruby Thanks! I also suspect a runtime version issue. I will create a new cluster with Spark 2.4.
-
One more comment: @sezruby Spark 2.4 uses Scala 2.11. By default, Hyperspace seems to be compiled with Scala 2.12.
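A quick way to spot this kind of mismatch is to compare the Scala binary version of the running cluster with the suffix of the library artifact. The sketch below assumes the usual `_2.11` / `_2.12` artifact suffix convention; the artifact names and the helpers `scalaBinaryVersion` and `artifactMatchesRuntime` are illustrative, not something confirmed in this thread.

```scala
import scala.util.Properties

// Reduce a full Scala version ("2.11.12") to its binary version ("2.11").
// Defaults to the Scala library version of the running JVM.
def scalaBinaryVersion(full: String = Properties.versionNumberString): String =
  full.split('.').take(2).mkString(".")

// Hypothetical check: does the artifact's suffix match the runtime's Scala binary version?
def artifactMatchesRuntime(artifactId: String, runtimeScala: String): Boolean =
  artifactId.endsWith("_" + scalaBinaryVersion(runtimeScala))
```

Calling `scalaBinaryVersion()` with no argument in a notebook cell reports the cluster's own Scala binary version.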
-
Yes, it's 2.12 by default, so you need to change the version to 2.11 in
-
Or if you are using
, or do the following to build both:
-
I switched to Databricks runtime 6.4 (includes Apache Spark 2.4.5, Scala 2.11) with ADLS Gen2 as my storage (I also tried DBFS). It throws a new exception when I try to create an index.
-
@sezruby I will first make it work with Spark 2.4 and rebuild the runtime for Spark 3.0 later.
-
I believe
-
Glad to know it is not my problem. I can spend some time to see if we can work around it or get some insights from DB engineering. I am working closely with them on other projects.
-
Thank you for your interest, @wangmiao1981! We are sorry you are running into this issue. We tested Hyperspace against open-source Apache Spark, so there may be a few minor modifications needed to make it work with Databricks. If you manage to find any workarounds or the changes needed to make it work after your discussion with DB engineering, please feel free to open a PR! Thanks again!
-
Response from Databricks: I am following up with them on how to import it in the source code, because serializableStatuses is a private class, as far as I know. I may be able to submit a PR once I get more details from Databricks.
-
@rapoth Our team member Andrei got Hyperspace working with 5.5 LTS (Spark 2.4.3) last night. He will submit a PR soon.
-
This is great news! Thank you for the update; we look forward to the PR!
-
Fixed by #303.
-
Describe the issue
I tried to run some example code in a Spark notebook with 7.3 LTS (includes Apache Spark 3.0.1, Scala 2.12), and it throws a NullPointerException.
To Reproduce
Create a notebook and copy & paste the following code:
`
import org.apache.spark.sql.SparkSession
import com.microsoft.hyperspace._
import com.microsoft.hyperspace.Hyperspace
import com.microsoft.hyperspace.index.IndexConfig
val spark = SparkSession
  .builder()
  .appName("Hyperspace example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()
// Needed for .toDF on local collections; must come before the first .toDF call.
import spark.implicits._
val departments = Seq(
  (10, "Accounting", "New York"),
  (20, "Research", "Dallas"),
  (30, "Sales", "Chicago"),
  (40, "Operations", "Boston"))
// NOTE: the definition of `employees` was not included in the post.
// The rows below are illustrative placeholders (empId, empName, deptId).
val employees = Seq(
  (7369, "SMITH", 20),
  (7499, "ALLEN", 30))
// Save example data records as Parquet.
val deptLocation = "dbfs:/departments/"
departments
  .toDF("deptId", "deptName", "location")
  .write
  .mode("overwrite")
  .parquet(deptLocation)
val empLocation = "dbfs:/employees/"
employees
  .toDF("empId", "empName", "deptId")
  .write
  .mode("overwrite")
  .parquet(empLocation)
// Read the data back and describe the indexes to build.
val hyperspace = new Hyperspace(spark)
val deptDF = spark.read.parquet(deptLocation)
val empDF = spark.read.parquet(empLocation)
val deptIndexConfig = IndexConfig("deptIndex", Seq("deptId"), Seq("deptName"))
val empIndexConfig = IndexConfig("empIndex", Seq("empId"), Seq("empName"))
// NOTE: the index-creation call was not included in the post; the stack trace
// shows that Hyperspace.createIndex was invoked after this point.
`
Exceptions:
`
java.lang.NullPointerException
at com.microsoft.hyperspace.index.sources.FileBasedSourceProviderManager.run(FileBasedSourceProviderManager.scala:96)
at com.microsoft.hyperspace.index.sources.FileBasedSourceProviderManager.signature(FileBasedSourceProviderManager.scala:80)
at com.microsoft.hyperspace.index.FileBasedSignatureProvider.$anonfun$fingerprintVisitor$1(FileBasedSignatureProvider.scala:53)
at com.microsoft.hyperspace.index.FileBasedSignatureProvider.$anonfun$fingerprintVisitor$1$adapted(FileBasedSignatureProvider.scala:51)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:190)
at com.microsoft.hyperspace.index.FileBasedSignatureProvider.fingerprintVisitor(FileBasedSignatureProvider.scala:51)
at com.microsoft.hyperspace.index.FileBasedSignatureProvider.signature(FileBasedSignatureProvider.scala:40)
at com.microsoft.hyperspace.index.IndexSignatureProvider.signature(IndexSignatureProvider.scala:45)
at com.microsoft.hyperspace.actions.CreateActionBase.getIndexLogEntry(CreateActionBase.scala:64)
at com.microsoft.hyperspace.actions.CreateAction.logEntry(CreateAction.scala:38)
at com.microsoft.hyperspace.actions.Action.begin(Action.scala:50)
at com.microsoft.hyperspace.actions.Action.run(Action.scala:90)
at com.microsoft.hyperspace.actions.Action.run$(Action.scala:83)
at com.microsoft.hyperspace.actions.CreateAction.run(CreateAction.scala:30)
at com.microsoft.hyperspace.index.IndexCollectionManager.create(IndexCollectionManager.scala:43)
at com.microsoft.hyperspace.index.CachingIndexCollectionManager.create(CachingIndexCollectionManager.scala:77)
at com.microsoft.hyperspace.Hyperspace.createIndex(Hyperspace.scala:43)
at line04b8f3115af7439a97e3ee6d9355b81c80.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(command-1356655024485756:1)
at line04b8f3115af7439a97e3ee6d9355b81c80.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(command-1356655024485756:70)
at line04b8f3115af7439a97e3ee6d9355b81c80.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(command-1356655024485756:72)
at line04b8f3115af7439a97e3ee6d9355b81c80.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(command-1356655024485756:74)`
Expected behavior
Environment