Releasing v0.6.2 #12

Merged: 7 commits, Dec 24, 2024
6 changes: 5 additions & 1 deletion README.md
@@ -1,8 +1,12 @@
# Scalable
-[v0.6.1](https://github.com/JGCRI/scalable/tree/0.6.1)
+[v0.6.2](https://github.com/JGCRI/scalable/tree/0.6.2)

Scalable is a Python library which aids in running complex workflows on HPCs by orchestrating multiple containers, requesting appropriate HPC jobs from the scheduler, and providing a Python environment for distributed computing. It's designed to be primarily used with JGCRI Climate Models but can be easily adapted for other uses.

+## Documentation
+
+The documentation for Scalable is hosted on [readthedocs](https://scalable.readthedocs.io).
+
## Installation

Use the package manager [pip](https://pip.pypa.io/en/stable/) to install scalable.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -15,7 +15,7 @@
project = 'Scalable'
copyright = '2024, Joint Global Change Research Institute'
author = 'Shashank Lamba, Pralit Patel'
-release = '0.6.0'
+release = '0.6.2'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
Binary file added docs/images/error1.png
8 changes: 5 additions & 3 deletions docs/index.rst
@@ -33,22 +33,24 @@ Contents:
---------

.. toctree::
   :maxdepth: 1
   :caption: API

   workers
   caching
   functions

.. toctree::
   :maxdepth: 1
   :caption: How-tos

   cache_hash
   container

.. toctree::
   :maxdepth: 1
   :caption: Demo

   demo

.. toctree::
   :caption: Common Issues

   issues
62 changes: 62 additions & 0 deletions docs/issues.rst
@@ -0,0 +1,62 @@
Common Issues
=============

This page outlines some common problems and caveats that remain in the
current version of Scalable. Some of them are being worked on; others may be
inherent to dask.

Pickling Error
--------------

To start with, let's look at something which is already used in the
:doc:`demo`:

.. code-block:: python

    @cacheable
    def run_stitches(recipe, output_path):
        import stitches
        import dask
        # The dask config is set to synchronous to avoid any issues.
        with dask.config.set(scheduler="synchronous"):
            outputs = stitches.gridded_stitching(output_path, recipe)
        return outputs

The above code is a simple function that runs
`stitches <https://github.com/JGCRI/stitches>`_. The line that actually runs
stitches is executed inside the ``dask.config.set`` context manager, with the
scheduler set to ``"synchronous"``. The alternative would have been to write
this function as:

.. code-block:: python

    @cacheable
    def run_stitches(recipe, output_path):
        import stitches
        outputs = stitches.gridded_stitching(output_path, recipe)
        return outputs

This version looks like it should work. However, the following error is thrown
when the function is called through a dask client (Scalable):

.. image:: images/error1.png
   :align: center

The error shown above is a pickling error. It happens because dask tries to
use multiple different workers to build the dask task graph. However, since
our workers have different environments, the ``run_stitches`` task cannot be
pickled by the other workers. Therefore, whenever this issue is encountered,
it is recommended to set the scheduler to ``"synchronous"``, which means that
dask will pickle the task and run it on the same specified worker.
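
To make the "different environments" point concrete, here is a minimal sketch that uses only the standard library ``pickle`` module as a stand-in for dask's own serialization, which differs in detail. The module name ``only_here`` and the function ``simulate_cross_environment_unpickle`` are hypothetical and are not part of Scalable, dask, or stitches. The sketch pickles a function in one "environment" and then tries to load it in another where the defining module is missing, which is roughly the situation a worker in a different container is in.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path


def simulate_cross_environment_unpickle():
    """Pickle a function, then unpickle it where its module is missing.

    Returns the name of the exception raised while unpickling, or None
    if unpickling unexpectedly succeeds.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module that exists only in the "sending" environment.
        Path(tmp, "only_here.py").write_text("def task(x):\n    return x + 1\n")
        sys.path.insert(0, tmp)
        try:
            module = importlib.import_module("only_here")
            # Plain pickle stores a function by reference (module name plus
            # function name), not by value.
            payload = pickle.dumps(module.task)
        finally:
            # Simulate a worker whose container lacks the module.
            sys.path.remove(tmp)
            sys.modules.pop("only_here", None)
        importlib.invalidate_caches()

    try:
        pickle.loads(payload)
    except ModuleNotFoundError as exc:
        return type(exc).__name__
    return None


print(simulate_cross_environment_unpickle())
```

This is only an illustration of the general failure mode, not a reproduction of the exact error in the screenshot. Keeping the work on a single worker, as the synchronous scheduler does, avoids handing the task to a process that cannot resolve it.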

General Errors
--------------

There can also be more general errors, which are either thrown by dask or show
up as workers that did not connect or as Slurm errors. Scalable has mechanisms
that should warn about any workers that could not connect for whatever reason.
However, as a rule of thumb, restarting the cluster and the workflow is the
best way to resolve one-time errors. HPC systems can be unreliable and
sometimes throw unknown errors. As always, please feel free to open an issue
`here <https://github.com/JGCRI/scalable/issues>`_ for any persistent
problems.
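
The "restart the cluster and the workflow" advice can be sketched as a small retry wrapper. This is a generic pattern rather than anything provided by Scalable: ``run_with_restarts`` and the ``run_once`` callable are hypothetical names, and ``run_once`` is assumed to create a fresh cluster, run the whole workflow, and shut the cluster down again.

```python
import time


def run_with_restarts(run_once, max_attempts=3, wait_seconds=0.0):
    """Call run_once() until it succeeds or max_attempts is exhausted.

    run_once is a hypothetical, user-supplied callable that is assumed to
    start a fresh cluster, run the workflow, and return its result.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_once()
        except Exception as exc:  # one-time HPC errors vary widely
            last_error = exc
            print(f"attempt {attempt} failed: {exc!r}")
            if attempt < max_attempts:
                time.sleep(wait_seconds)
    raise RuntimeError(
        f"workflow failed after {max_attempts} attempts"
    ) from last_error
```

If every attempt fails, the error is more likely persistent than one-time, and the wrapper re-raises with the last failure attached so that it can be included in an issue report.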