Upgrades to Numpy 2 but Numpy 1.x needed for Databricks #41

Open
gdubs89 opened this issue Sep 26, 2024 · 3 comments

gdubs89 commented Sep 26, 2024

Hi,
When I use the suggested init script:

#!/bin/bash
/databricks/python/bin/pip install --upgrade dask[complete] dask-databricks
dask databricks run

once the cluster has started up and run the script, I can't do anything at all when I open a notebook. Before even importing anything, if I just try to run print(1+2), I get:

Failure starting repl. Try detaching and re-attaching the notebook.

at com.databricks.spark.chauffeur.ExecContextState.processInternalMessage(ExecContextState.scala:347)

I have restarted the cluster multiple times, cleared the state, detached and re-attached the notebook, etc.

The Databricks assistant thinks it's related to numpy versions, and indeed, checking through the driver logs, there are a number of references to:

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.1 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

I'm using Databricks Runtime 15.4 LTS. I tried using the init script as-is, and I've also tried pinning to earlier releases of dask-databricks, but I always get the same issue.
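For concreteness, pinning dask-databricks in the init script would look something like the sketch below; the version number is only a placeholder, not a release known to avoid the problem:

#!/bin/bash
# Placeholder pin of dask-databricks to an older release (version is illustrative only)
/databricks/python/bin/pip install --upgrade "dask[complete]" "dask-databricks==0.3.*"
dask databricks run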

@jacobtomlinson
Collaborator

What happens if you limit it to numpy 1?

/databricks/python/bin/pip install --upgrade dask[complete] dask-databricks "numpy==1.*"


gdubs89 commented Sep 26, 2024

yup, that fixes it, thanks

jacobtomlinson changed the title from "init script corrupts the cluster" to "Upgrades to Numpy 2 but Numpy 1.x needed for Databricks" on Sep 26, 2024
@jacobtomlinson
Collaborator

My guess is that Databricks will adopt Numpy 2 at some point. But in the meantime we should avoid upgrading it. I'll leave this issue open so we can make a docs fix for now.
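A minimal sketch of what the updated init script in the docs could look like, assuming the fix is simply the numpy pin suggested above:

#!/bin/bash
# Keep numpy on 1.x so modules compiled against NumPy 1.x in the Databricks runtime keep working
/databricks/python/bin/pip install --upgrade "dask[complete]" dask-databricks "numpy==1.*"
dask databricks run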
