[GA-1534] interp_hybrid_to_pressure failing on large datasets #633
Comments
Hi, it seems the same thing is happening to me, and I can't use the workaround with smaller data. I would greatly appreciate any solution to this.

Original post: https://discourse.pangeo.io/t/code-hangs-while-saving-dataset-to-disk-using-to-netcdf/4413

I am trying to save MIROC6_historical (gs://cmip6/CMIP6/CMIP/MIROC/MIROC6/historical/r1i1p1f1/6hrLev/ta/gn/v20191114/). After reading the above as ds1, I interpolate to get one pressure level:

    import numpy as np
    import geocat.comp as gc

    ta = ds1.ta
    ps = ds1.ps
    hyam = ds1.a
    hybm = ds1.b
    p0 = ds1.p0
    new_levels = np.array([85000])
    ta_850 = gc.interpolation.interp_hybrid_to_pressure(
        ta, ps, hyam, hybm, p0,
        new_levels=new_levels, lev_dim=None, method='linear',
        extrapolate=False, variable=None, t_bot=None, phi_sfc=None)

Now I want to save the dataset to my local cluster. The code runs forever and the process never completes. Any help in solving this will be appreciated. Thank you so much for your attention and participation.
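For context, here is a rough single-column sketch of what a hybrid-to-pressure interpolation does: the pressure at each hybrid level is a * p0 + b * ps, and the field is then interpolated linearly in pressure to the requested level. This is only an illustration under those assumptions, not geocat-comp's implementation; the helper names and the sample coefficients are hypothetical.

```python
from bisect import bisect_left


def hybrid_pressures(a, b, p0, ps):
    """Pressure at each hybrid level of one column: p_k = a_k * p0 + b_k * ps."""
    return [ak * p0 + bk * ps for ak, bk in zip(a, b)]


def interp_column_to_pressure(values, pressures, target):
    """Linearly interpolate `values` (given at `pressures`) to `target`.

    Returns None when `target` lies outside the column's pressure range,
    which mirrors the idea of extrapolate=False (no extrapolation).
    """
    # Sort the column by increasing pressure so bisect can be used.
    pairs = sorted(zip(pressures, values))
    p_sorted = [p for p, _ in pairs]
    v_sorted = [v for _, v in pairs]

    if target < p_sorted[0] or target > p_sorted[-1]:
        return None

    i = bisect_left(p_sorted, target)
    if p_sorted[i] == target:
        return v_sorted[i]

    # target lies strictly between levels i-1 and i.
    p_lo, p_hi = p_sorted[i - 1], p_sorted[i]
    weight = (target - p_lo) / (p_hi - p_lo)
    return v_sorted[i - 1] + weight * (v_sorted[i] - v_sorted[i - 1])


# Hypothetical three-level column (values chosen only for illustration).
a = [0.0, 0.1, 0.2]
b = [1.0, 0.8, 0.5]
p0 = 100000.0          # reference pressure in Pa
ps = 100000.0          # surface pressure in Pa
ta = [290.0, 285.0, 270.0]

levels = hybrid_pressures(a, b, p0, ps)
ta_850 = interp_column_to_pressure(ta, levels, 85000.0)
```

Each column and each time step is handled independently in this picture, which is why the work can be split along the time dimension.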
Hi @sudhansu-s-rath, thanks for the note! It's interesting that you're seeing this with […]. Could you share which version of geocat-comp you're using and a bit more information about where you're running this?

If you're looking for a workaround, you might try working on smaller temporal subsets of the data. If you're up for some deeper troubleshooting, I'd take a look at the diagnostics and task graphs from Dask if you haven't already.

I hope to do some additional profiling of this function this week (it really hasn't been tested at scale) and might have some better suggestions then as well.
Hi @kafitzgerald, thanks for your response.
@sudhansu-s-rath I'm still looking into this, but wanted to get back to you with at least a few suggestions.

The dataset you're working with appears to be quite large. When processing the full dataset, memory usage is significant (often spilling to disk in my case, which slows things down considerably) and the Dask task graph is rather large. It's possible that data transfer from cloud storage is playing a role here as well. I did have some luck using […].

It sounds like you're interested in running this for the full dataset/simulation. Breaking the work into processing steps and then appending to a file or writing multiple intermediate files might be a good approach here, since the calculations themselves are largely independent in time, and it would avoid a lot of the memory-related issues. I think that's probably the most efficient approach currently available, especially if subsetting to a specific area or time of interest first isn't an option.

You may also find some relevant tips and resources for optimization and profiling here: https://docs.xarray.dev/en/latest/user-guide/dask.html#optimization-tips
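The "process in steps and write intermediate files" idea could be sketched roughly as follows. This is a minimal sketch, not a tested recipe for this dataset: `time_blocks` and `write_in_blocks` are hypothetical helper names, the block size and the output file naming are arbitrary, and `compute` stands in for whatever per-block calculation you run (for example, a function wrapping the interp_hybrid_to_pressure call). It assumes `ds` is an xarray Dataset with a `time` dimension and that `compute` returns an object with a `to_netcdf` method.

```python
from pathlib import Path


def time_blocks(n_times, block_size):
    """Yield slices that cover range(n_times) in consecutive blocks."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, n_times, block_size):
        yield slice(start, min(start + block_size, n_times))


def write_in_blocks(ds, compute, out_dir, block_size=120):
    """Run `compute` on one temporal block at a time and write each result.

    ds         : xarray.Dataset with a "time" dimension (assumed)
    compute    : callable taking a subset of ds and returning an object
                 with a .to_netcdf(path) method (hypothetical stand-in)
    out_dir    : directory for the intermediate files
    block_size : number of time steps per block (arbitrary default)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    for i, block in enumerate(time_blocks(ds.sizes["time"], block_size)):
        subset = ds.isel(time=block)        # select one temporal block
        result = compute(subset)            # per-block calculation
        path = out_dir / f"block_{i:04d}.nc"
        result.to_netcdf(path)              # write before moving on
        paths.append(path)
    return paths
```

Because each block is written before the next one is processed, only one block's task graph and intermediate data need to be handled at a time. The intermediate files can later be reopened together, for example with xarray's `open_mfdataset`.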
Noting this here for tracking purposes.

While some performance improvements were made in #592, we've gotten another report about problems with interp_hybrid_to_pressure. In particular, it's failing while operating on a larger dataset on Casper. I've been able to replicate the issue, and it seems to be related to a very large Dask task graph. I'm guessing it's a combination of our internal usage of Dask in geocat-comp and the size of the dataset. We didn't see this failure in earlier testing because the dataset was much smaller; indeed, when you subset the dataset temporally and run the function again, it executes successfully.

I've followed up with a temporary workaround, but it'd be good to prioritize this. This function gets a good bit of use, and the problem is likely to come up again. I also suspect there are still a lot of performance improvements to be made here.