Merge pull request #12 from JGCRI/develop
Releasing v0.6.2
sash19 authored Dec 24, 2024
2 parents 70f27d9 + c9c2012 commit e0aa6a6
Showing 5 changed files with 73 additions and 5 deletions.
6 changes: 5 additions & 1 deletion README.md
@@ -1,8 +1,12 @@
# Scalable
[v0.6.1](https://github.com/JGCRI/scalable/tree/0.6.1)
[v0.6.2](https://github.com/JGCRI/scalable/tree/0.6.2)

Scalable is a Python library which aids in running complex workflows on HPCs by orchestrating multiple containers, submitting appropriate HPC job requests to the scheduler, and providing a Python environment for distributed computing. It is designed primarily for use with JGCRI Climate Models but can be easily adapted for arbitrary use cases.

## Documentation

The documentation for Scalable is hosted on [readthedocs](https://scalable.readthedocs.io).

## Installation

Use the package manager [pip](https://pip.pypa.io/en/stable/) to install scalable.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -15,7 +15,7 @@
project = 'Scalable'
copyright = '2024, Joint Global Change Research Institute'
author = 'Shashank Lamba, Pralit Patel'
release = '0.6.0'
release = '0.6.2'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
Binary file added docs/images/error1.png
8 changes: 5 additions & 3 deletions docs/index.rst
Expand Up @@ -33,22 +33,24 @@ Contents:
---------

.. toctree::
:maxdepth: 1
:caption: API

workers
caching
functions

.. toctree::
:maxdepth: 1
:caption: How-tos

cache_hash
container

.. toctree::
:maxdepth: 1
:caption: Demo

demo

.. toctree::
:caption: Common Issues

issues
62 changes: 62 additions & 0 deletions docs/issues.rst
@@ -0,0 +1,62 @@
Common Issues
=============

This page outlines some of the common problems and caveats still present in
the current version of scalable. While some of them are being worked on, others
may be inherent to dask.

Pickling Error
--------------

To start with, let's look at a function which is already used in the
:doc:`demo`:

.. code-block:: python

    @cacheable
    def run_stitches(recipe, output_path):
        import stitches
        import dask
        ## The dask config is set to synchronous to avoid any issues.
        with dask.config.set(scheduler="synchronous"):
            outputs = stitches.gridded_stitching(output_path, recipe)
        return outputs

The above code is a simple function that runs
`stitches <https://github.com/JGCRI/stitches>`_. The line which actually runs
stitches is executed under the ``dask.config.set`` context manager, with the
scheduler set to ``"synchronous"``. The alternative would have been to write
this function as:

.. code-block:: python

    @cacheable
    def run_stitches(recipe, output_path):
        import stitches
        outputs = stitches.gridded_stitching(output_path, recipe)
        return outputs

The above code looks like it should work just as well. However, the following
error is thrown when the function is called through a dask client (scalable):

.. image:: images/error1.png
:align: center

The error shown above is a pickling error. It happens because dask tries to
use several different workers to build the dask task graph. Since our workers
have different environments, however, the ``run_stitches`` task cannot be
pickled by the other workers. Whenever this issue is encountered, it is
therefore recommended to set the scheduler to ``"synchronous"``, which makes
dask pickle the task and run it on the single specified worker.
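
As a rough sketch, a call to the function above might look like the snippet
below, which submits ``run_stitches`` through a plain dask ``Client``. The
client setup and the worker name ``"stitches_worker"`` are illustrative
assumptions here, not part of the scalable API:

.. code-block:: python

    # A minimal sketch, assuming an already-running cluster object and a
    # hypothetical worker name; adjust both to your own setup.
    from dask.distributed import Client

    client = Client(cluster)  # cluster is assumed to exist already
    future = client.submit(run_stitches, recipe, output_path,
                           workers=["stitches_worker"])
    outputs = future.result()

Because ``run_stitches`` sets ``scheduler="synchronous"`` internally, the
nested stitches work then runs entirely on that one worker instead of being
re-pickled across workers.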

General Errors
--------------

There can also be general errors which are either thrown by dask directly or
manifest as workers which never connected, or as Slurm errors. There are
mechanisms within Scalable which should warn about any workers which couldn't
connect for whatever reason. However, as a rule of thumb, restarting the
cluster and the workflow is the best way to resolve any one-time errors. HPC
systems can be unreliable and sometimes throw unknown errors. As always,
please feel free to open an issue
`here <https://github.com/JGCRI/scalable/issues>`_ for any persistent issues.
