diff --git a/README.md b/README.md
index 1445dba..530f8e2 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,12 @@
 # Scalable
-[v0.6.1](https://github.com/JGCRI/scalable/tree/0.6.1)
+[v0.6.2](https://github.com/JGCRI/scalable/tree/0.6.2)
 
 Scalable is a Python library which aids in running complex workflows on HPCs by orchestrating multiple containers, requesting appropriate HPC jobs to the scheduler, and providing a python environment for distributed computing. It's designed to be primarily used with JGCRI Climate Models but can be easily adapted for any arbitrary uses.
 
+## Documentation
+
+The documentation for Scalable is hosted on [readthedocs](https://scalable.readthedocs.io).
+
 ## Installation
 
 Use the package manager [pip](https://pip.pypa.io/en/stable/) to install scalable.
diff --git a/docs/conf.py b/docs/conf.py
index 5caa4f0..f2b35c7 100755
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -15,7 +15,7 @@
 project = 'Scalable'
 copyright = '2024, Joint Global Change Research Institute'
 author = 'Shashank Lamba, Pralit Patel'
-release = '0.6.0'
+release = '0.6.2'
 
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
diff --git a/docs/images/error1.png b/docs/images/error1.png
new file mode 100644
index 0000000..c0463fa
Binary files /dev/null and b/docs/images/error1.png differ
diff --git a/docs/index.rst b/docs/index.rst
index 85d1fd2..1cc7a58 100755
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -33,7 +33,6 @@ Contents:
 ---------
 
 .. toctree::
-   :maxdepth: 1
    :caption: API
 
    workers
@@ -41,14 +40,17 @@ Contents:
    functions
 
 .. toctree::
-   :maxdepth: 1
    :caption: How-tos
 
    cache_hash
    container
 
 .. toctree::
-   :maxdepth: 1
    :caption: Demo
 
    demo
+
+.. toctree::
+   :caption: Common Issues
+
+   issues
diff --git a/docs/issues.rst b/docs/issues.rst
new file mode 100644
index 0000000..ee21cea
--- /dev/null
+++ b/docs/issues.rst
@@ -0,0 +1,62 @@
+Common Issues
+=============
+
+Pickling Error
+--------------
+
+This page outlines some of the common problems and caveats still present in
+the current version of scalable. While some of them are being worked on, others
+may be inherent to dask.
+
+To start with, let's look at something which is already used in the
+:doc:`demo`:
+
+.. code-block:: python
+
+    @cacheable
+    def run_stitches(recipe, output_path):
+        import stitches
+        import dask
+        ## The dask config is set to synchronous to avoid any issues.
+        with dask.config.set(scheduler="synchronous"):
+            outputs = stitches.gridded_stitching(output_path, recipe)
+        return outputs
+
+The above code is a simple function that runs
+`stitches `_. The primary code line which
+runs stitches is run under the dask.config.set context manager. The scheduler is
+set to synchronous in this case. The alternative would've been to write this
+function as:
+
+.. code-block:: python
+
+    @cacheable
+    def run_stitches(recipe, output_path):
+        import stitches
+        outputs = stitches.gridded_stitching(output_path, recipe)
+        return outputs
+
+The above code should've worked well. However, the following error is thrown
+when the function is called with a dask client (scalable):
+
+.. image:: images/error1.png
+   :align: center
+
+The error thrown above is a pickling error. This happens because dask tries to
+use multiple different workers to make the dask task graph. However, since our
+workers have different environments, the `run_stitches` task cannot be pickled
+by other workers. Therefore, whenever this issue is encountered, it is
+recommended to set the scheduler to be "synchronous" which means that it will
+pickle the task and run it on the same specified worker.
+
+General Errors
+--------------
+
+There can also be just general errors which are either thrown by dask or are
+manifested in the form of workers which didn't connect or slurm errors. There
+are mechanisms within Scalable which should warn about any workers which
+couldn't connect for whatever reason. However, as a rule of thumb, restarting
+the cluster and the workflow is the best way to resolve any one-time errors.
+HPC systems can be unreliable and throw unknown errors sometimes. As always,
+please feel free to open an issue
+`here `_ for any persistent issues.
\ No newline at end of file
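The new `docs/issues.rst` attributes the pickling error to workers having different environments. One general way such a mismatch can surface is sketched below using only the Python standard library. This is an illustration under stated assumptions, not a reproduction of the failure in `images/error1.png`: the module name `only_on_worker_a` is hypothetical, and the two "workers" are simulated by adding and removing a directory from `sys.path` in a single process.

```python
import importlib
import os
import pickle
import sys
import tempfile

MODULE_NAME = "only_on_worker_a"  # hypothetical module, created only for this demo

with tempfile.TemporaryDirectory() as env_a:
    # "Worker A" environment: contains a module that defines the task.
    with open(os.path.join(env_a, MODULE_NAME + ".py"), "w") as handle:
        handle.write("def task(x):\n    return x + 1\n")

    sys.path.insert(0, env_a)
    importlib.invalidate_caches()
    module = importlib.import_module(MODULE_NAME)

    # pickle records the module and function name, not the function body.
    payload = pickle.dumps(module.task)

    # "Worker B" environment: the module is no longer importable.
    sys.path.remove(env_a)
    del sys.modules[MODULE_NAME]
    importlib.invalidate_caches()

    try:
        pickle.loads(payload)
        outcome = "loaded"
    except ImportError as exc:  # ModuleNotFoundError subclasses ImportError
        outcome = type(exc).__name__

print(outcome)
```

Run as a script, this should print `ModuleNotFoundError`: the payload only names the function, so it cannot be loaded where the defining module is missing. Keeping a task on a single worker, as the synchronous scheduler setting in the diff does, avoids handing such a payload to a differently provisioned environment.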