diff --git a/README.md b/README.md
index dee7a13311..cff9eb1abf 100644
--- a/README.md
+++ b/README.md
@@ -9,149 +9,39 @@
+
+![Python Version](https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Finstadeepai%2FMava%2Fdevelop%2Fpyproject.toml)
+[![Tests](https://github.com/instadeepai/Mava/actions/workflows/ci.yaml/badge.svg)](https://github.com/instadeepai/Mava/actions/workflows/ci.yaml)
+[![License](https://img.shields.io/badge/License-Apache%202.0-orange.svg)](https://opensource.org/licenses/Apache-2.0)
+[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
+[![MyPy](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)
+[![ArXiv](https://img.shields.io/badge/ArXiv-2410.01706-b31b1b.svg)](https://arxiv.org/abs/2410.01706)
+[![Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/instadeepai/Mava/blob/develop/examples/Quickstart.ipynb)
+
## Welcome to Mava! 🦁
-[**Installation**](#installation-) | [**Quickstart**](#quickstart-)
+[**Installation**](#installation-) | [**Getting started**](#getting-started-)
-Mava provides simplified code for quickly iterating on ideas in multi-agent reinforcement learning (MARL) with useful implementations of MARL algorithms in JAX allowing for easy parallelisation across devices with JAX's `pmap`. Mava is a project originating in the Research Team at [InstaDeep](https://www.instadeep.com/).
-
-To join us in these efforts, please feel free to reach out, raise issues or read our [contribution guidelines](#contributing-) (or just star ๐ to stay up to date with the latest developments)!
+Mava allows researchers to experiment with multi-agent reinforcement learning (MARL) at lightning speed. The single-file JAX implementations are built for rapid research iteration: hack, modify, and test new ideas fast. Our [state-of-the-art algorithms][sable] scale seamlessly across devices. Created for researchers, by the research team at [InstaDeep](https://www.instadeep.com).
-## Overview ๐ฆ
+## Highlights 🦁
-Mava currently offers the following building blocks for MARL research:
-
-- ๐ฅ **Implementations of MARL algorithms**: Implementations of multi-agent PPO systems that follow both the Centralised Training with Decentralised Execution (CTDE) and Decentralised Training with Decentralised Execution (DTDE) MARL paradigms.
-- ๐ฌ **Environment Wrappers**: Example wrappers for mapping Jumanji environments to an environment that is compatible with Mava. At the moment, we support [Robotic Warehouse][jumanji_rware] and [Level-Based Foraging][jumanji_lbf] with plans to support more environments soon. We have also recently added support for the SMAX environment from [JaxMARL][jaxmarl].
-- ๐ **Educational Material**: [Quickstart notebook][quickstart] to demonstrate how Mava can be used and to highlight the added value of JAX-based MARL.
+- 🥑 **Implementations of MARL algorithms**: Implementations of current state-of-the-art MARL algorithms that are distributed and effectively make use of available accelerators.
+- 🍬 **Environment Wrappers**: We provide first-class support for a few JAX-based MARL environment suites through the use of wrappers; however, new environments can easily be added by using the existing wrappers as a guide.
- 🧪 **Statistically robust evaluation**: Mava natively supports logging to json files which adhere to the standard suggested by [Gorsane et al. (2022)][toward_standard_eval]. This enables easy downstream experiment plotting and aggregation using the tools found in the [MARL-eval][marl_eval] library.
-
-## Performance and Speed ๐
-
-### SMAX
-For comparing Mavaโs stability to other JAX-based baseline algorithms, we train Mavaโs recurrent IPPO and MAPPO systems on a broad range of [SMAX][smax] tasks. In all cases we do not rerun baselines but instead take results for final win rates from the [JaxMARL technical report](https://arxiv.org/pdf/2311.10090.pdf). For the full SMAX experiments results, please see the following [page](docs/smax_benchmark.md).
-
-Mava Recurrent IPPO and MAPPO performance on the `3s5z`, `6h_vs_8z` and `3s5z_vs_3s6z` SMAX tasks.
-
-
-### Robotic Warehouse
-
-All of the experiments below were performed using an NVIDIA Quadro RTX 4000 GPU with 8GB Memory.
-
-In order to show the utility of end-to-end JAX-based MARL systems and JAX-based environments we compare the speed of Mava against [EPyMARL][epymarl] as measured in total training wallclock time on simple [Robotic Warehouse][rware] (RWARE) tasks with 2 and 4 agents. Our aim is to illustrate the speed increases that are possible with using end-to-end Jax-based systems and we do not necessarily make an effort to achieve optimal performance. For EPyMARL, we use the hyperparameters as recommended by [Papoudakis et al. (2020)](https://arxiv.org/pdf/2006.07869.pdf) and for Mava we performed a basic grid search. In both cases, systems were trained up to 20 million total environment steps using 16 vectorised environments.
-
-Mava feedforward MAPPO performance on the `tiny-2ag`, `tiny-4ag` and `small-4ag` RWARE tasks.
-
-
-
-### ๐ An important note on the differences in converged performance
-
-In order to benefit from the wallclock speed-ups afforded by JAX-based systems it is required that environments also be written in JAX. It is for this reason that Mava does not use the exact same version of the RWARE environment as EPyMARL but instead uses a JAX-based implementation of RWARE found in [Jumanji][jumanji_rware], under the name RobotWarehouse. One of the notable differences in the underlying environment logic is that RobotWarehouse will not attempt to resolve agent collisions but will instead terminate an episode when agents do collide. In our experiments, this appeared to make the environment more challenging. For this reason we show the performance of Mava on Jumanji with and without termination upon collision indicated with `w/o collision` in the figure legends. For a more detailed discussion, please see the following [page](docs/jumanji_rware_comparison.md).
-
-### Level-Based Foraging
-Mava also supports [Jumanji][jumanji_lbf]'s LBF. We evaluate Mava's recurrent MAPPO system on LBF, against [EPyMARL][epymarl] (we used original [LBF](https://github.com/semitable/lb-foraging) for EPyMARL) in 2 and 4 agent settings up to 20 million timesteps. Both systems were trained using 16 vectorized environments. For the EPyMARL systems we use a NVIDIA A100 GPU and for the Mava systems we use a GeForce RTX 3050 laptop GPU with 4GB of memory. To show how Mava can generalise to different hardware, we also train the Mava systems on a TPU v3-8. We plan to publish comprehensive performance benchmarks for all Mava's algorithms across various LBF scenarios soon.
-
-Mava Recurrent MAPPO performance on the `2s-8x8-2p-2f-coop` and `15x15-4p-3fz` Level-Based Foraging tasks.
-
-
-### ๐งจ Steps per second experiments using vectorised environments
-
-Furthermore, we illustrate the speed of Mava by showing the steps per second as the number of parallel environments is increased. These steps per second scaling plots were computed using a standard laptop GPU, specifically an RTX-3060 GPU with 6GB memory.
-
-Mava steps per second scaling with increased vectorised environments and total training run time for 20M environment steps.
-
-
-## Code Philosophy ๐ง
-
-The current code in Mava is adapted from [PureJaxRL][purejaxrl] which provides high-quality single-file implementations with research-friendly features. In turn, PureJaxRL is inspired by the code philosophy from [CleanRL][cleanrl]. Along this vein of easy-to-use and understandable RL codebases, Mava is not designed to be a modular library and is not meant to be imported. Our repository focuses on simplicity and clarity in its implementations while utilising the advantages offered by JAX such as `pmap` and `vmap`, making it an excellent resource for researchers and practitioners to build upon.
+- 🖥️ **JAX Distribution Architectures for Reinforcement Learning**: Mava supports both [Podracer][anakin_paper] architectures for scaling RL systems. The first of these is _Anakin_, which can be used when environments are written in JAX. This enables end-to-end JIT compilation of the full MARL training loop for fast experiment run times on hardware accelerators. The second is _Sebulba_, which can be used when environments are not written in JAX. Sebulba is particularly useful when running RL experiments where a hardware accelerator can interact with many CPU cores at a time. A minimal sketch of the Anakin pattern is shown after this list.
+- ⚡ **Blazingly fast experiments**: All of the above allow for very quick experiment run times, especially when compared to other non-JAX based MARL libraries.
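
To make the Anakin idea concrete, here is a minimal, self-contained JAX sketch (the environment dynamics, policy and update rule are toy placeholders, not Mava's actual code): the environment step is `vmap`-ed over a batch of environments, the learner step is `pmap`-ed over devices, and the whole loop is JIT-compiled as a result.

```python
import jax
import jax.numpy as jnp

# Toy stand-ins for an environment transition and a learner update.
# Illustrative only; Mava's real systems live in mava/systems/.
def env_step(state, action):
    next_state = state + action              # dummy dynamics
    reward = -jnp.abs(next_state).sum()      # dummy reward
    return next_state, reward

def learner_step(params, states, key):
    actions = jax.random.normal(key, states.shape)          # dummy policy
    states, rewards = jax.vmap(env_step)(states, actions)   # vmap over the env batch
    params = params + 1e-3 * rewards.mean()                 # dummy "update"
    return params, states

# pmap replicates the (jitted) learner step across all available devices.
pmapped_step = jax.pmap(learner_step, axis_name="device")

num_devices = jax.local_device_count()
states = jnp.zeros((num_devices, 4, 2))                      # (devices, envs, obs)
params = jnp.zeros((num_devices,))
keys = jax.random.split(jax.random.PRNGKey(0), num_devices)
params, states = pmapped_step(params, states, keys)
```

In the Sebulba setting, by contrast, environments are stepped on CPU actor processes while the accelerator is reserved for inference and learning.
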
## Installation 🎬
-At the moment Mava is not meant to be installed as a library, but rather to be used as a research tool.
-
-You can use Mava by cloning the repo and pip installing as follows:
+At the moment Mava is not meant to be installed as a library, but rather to be used as a research tool. We recommend cloning the Mava repo and pip installing as follows:
```bash
git clone https://github.com/instadeepai/mava.git
@@ -162,18 +52,18 @@ pip install -e .
We have tested `Mava` on Python 3.11 and 3.12, but earlier versions may also work. Specifically, we use Python 3.10 for the Quickstart notebook on Google Colab since Colab uses Python 3.10 by default. Note that because the installation of JAX differs depending on your hardware accelerator,
we advise users to explicitly install the correct JAX version (see the [official installation guide](https://github.com/google/jax#installation)). For more in-depth installation guides including Docker builds and virtual environments, please see our [detailed installation guide](docs/DETAILED_INSTALL.md).
-## Quickstart โก
+## Getting started ⚡
-To get started with training your first Mava system, simply run one of the system files. e.g.,
+To get started with training your first Mava system, simply run one of the system files:
```bash
-python mava/systems/ff_ippo.py
+python mava/systems/ppo/anakin/ff_ippo.py
```
-Mava makes use of Hydra for config management. In order to see our default system configs please see the `mava/configs/` directory. A benefit of Hydra is that configs can either be set in config yaml files or overwritten from the terminal on the fly. For an example of running a system on the LBF environment, the above code can simply be adapted as follows:
+Mava makes use of [Hydra](https://github.com/facebookresearch/hydra) for config management. Our default system configs can be found in the `mava/configs/` directory. A benefit of Hydra is that configs can either be set in config yaml files or overridden from the terminal on the fly. For an example of running a system on the Level-based Foraging environment, the above command can simply be adapted as follows:
```bash
-python mava/systems/ff_ippo.py env=lbf
+python mava/systems/ppo/anakin/ff_ippo.py env=lbf
```
Different scenarios can also be run by making the following config updates from the terminal:
@@ -182,11 +72,72 @@ Different scenarios can also be run by making the following config updates from
python mava/systems/ppo/anakin/ff_ippo.py env=rware env/scenario=tiny-4ag
```
-Additionally, we also have a [Quickstart notebook][quickstart] that can be used to quickly create and train your first Multi-agent system.
+Additionally, we have a [Quickstart notebook][quickstart] that can be used to quickly create and train your first multi-agent system.
+
+### Algorithms
+
+Mava has implementations of multiple on- and off-policy multi-agent algorithms that follow the independent learners (IL), centralised training with decentralised execution (CTDE) and heterogeneous agent learning paradigms. Aside from MARL learning paradigms, we also include implementations that follow the Anakin and Sebulba architectures to enable scalable training by default. The architecture that is relevant for a given problem depends on whether the environment being used is written in JAX or not. For more information on these paradigms, please see [here][anakin_paper].
-## Advanced Usage ๐ฝ
+| Algorithm  | Variants | Continuous | Discrete | Anakin | Sebulba | Paper | Docs |
+|------------|----------|------------|----------|--------|---------|-------|------|
+| PPO        | [`ff_ippo.py`](mava/systems/ppo/anakin/ff_ippo.py) | ✅ | ✅ | ✅ | ✅ | [Link](https://arxiv.org/abs/2011.09533) | [Link](mava/systems/ppo/README.md) |
+|            | [`ff_mappo.py`](mava/systems/ppo/anakin/ff_mappo.py) | ✅ | ✅ | ✅ |  | [Link](https://arxiv.org/abs/2103.01955) | [Link](mava/systems/ppo/README.md) |
+|            | [`rec_ippo.py`](mava/systems/ppo/anakin/rec_ippo.py) | ✅ | ✅ | ✅ |  | [Link](https://arxiv.org/abs/2011.09533) | [Link](mava/systems/ppo/README.md) |
+|            | [`rec_mappo.py`](mava/systems/ppo/anakin/rec_mappo.py) | ✅ | ✅ | ✅ |  | [Link](https://arxiv.org/abs/2103.01955) | [Link](mava/systems/ppo/README.md) |
+| Q Learning | [`rec_iql.py`](mava/systems/q_learning/anakin/rec_iql.py) |  | ✅ | ✅ |  | [Link](https://arxiv.org/abs/1511.08779) | [Link](mava/systems/q_learning/README.md) |
+|            | [`rec_qmix.py`](mava/systems/q_learning/anakin/rec_qmix.py) |  | ✅ | ✅ |  | [Link](https://arxiv.org/abs/1803.11485) | [Link](mava/systems/q_learning/README.md) |
+| SAC        | [`ff_isac.py`](mava/systems/sac/anakin/ff_isac.py) | ✅ |  | ✅ |  | [Link](https://arxiv.org/abs/1801.01290) | [Link](mava/systems/sac/README.md) |
+|            | [`ff_masac.py`](mava/systems/sac/anakin/ff_masac.py) | ✅ |  | ✅ |  |  | [Link](mava/systems/sac/README.md) |
+|            | [`ff_hasac.py`](mava/systems/sac/anakin/ff_hasac.py) | ✅ |  | ✅ |  | [Link](https://arxiv.org/abs/2306.10715) | [Link](mava/systems/sac/README.md) |
+| MAT        | [`mat.py`](mava/systems/mat/anakin/mat.py) | ✅ | ✅ | ✅ |  | [Link](https://arxiv.org/abs/2205.14953) | [Link](mava/systems/mat/README.md) |
+| Sable      | [`ff_sable.py`](mava/systems/sable/anakin/ff_sable.py) | ✅ | ✅ | ✅ |  | [Link](https://arxiv.org/abs/2410.01706) | [Link](mava/systems/sable/README.md) |
+|            | [`rec_sable.py`](mava/systems/sable/anakin/rec_sable.py) | ✅ | ✅ | ✅ |  | [Link](https://arxiv.org/abs/2410.01706) | [Link](mava/systems/sable/README.md) |
+### Environments
-Mava can be used in a wide array of advanced systems. As an example, we demonstrate recording experience data from one of our PPO systems into a [Flashbax](https://github.com/instadeepai/flashbax) `Vault`. This vault can then easily be integrated into offline MARL systems, such as those found in [OG-MARL](https://github.com/instadeepai/og-marl). See the [Advanced README](./examples/advanced_usage/README.md) for more information.
+These are the environments which Mava supports _out of the box_. To add a new environment, please use the [existing wrapper implementations](mava/wrappers/) as an example; a rough, illustrative sketch of the idea is shown after the table below. We also indicate whether each environment is implemented in JAX or not. JAX-based environments can be used with algorithms that follow the Anakin distribution architecture, while non-JAX environments can be used with algorithms following the Sebulba architecture.
+
+
+| Environment | Action space | JAX | Non-JAX | Paper | JAX Source | Non-JAX Source |
+|---------------------------------|---------------------|-----|---------|-------|------------|----------------|
+| Multi-Robot Warehouse | Discrete | ✅ | ✅ | [Link](http://arxiv.org/abs/2006.07869) | [Link](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/robot_warehouse) | [Link](https://github.com/semitable/robotic-warehouse) |
+| Level-based Foraging | Discrete | ✅ | ✅ | [Link](https://arxiv.org/abs/2006.07169) | [Link](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/lbf) | [Link](https://github.com/semitable/lb-foraging) |
+| StarCraft Multi-Agent Challenge | Discrete | ✅ | ✅ | [Link](https://arxiv.org/abs/1902.04043) | [Link](https://github.com/FLAIROx/JaxMARL/tree/main/jaxmarl/environments/smax) | [Link](https://github.com/uoe-agents/smaclite) |
+| Multi-Agent Brax | Continuous | ✅ |  | [Link](https://arxiv.org/abs/2003.06709) | [Link](https://github.com/FLAIROx/JaxMARL/tree/main/jaxmarl/environments/mabrax) |  |
+| Matrax | Discrete | ✅ |  | [Link](https://www.cs.toronto.edu/~cebly/Papers/_download_/multirl.pdf) | [Link](https://github.com/instadeepai/matrax) |  |
+| Multi Particle Environments | Discrete/Continuous | ✅ |  | [Link](https://arxiv.org/abs/1706.02275) | [Link](https://github.com/FLAIROx/JaxMARL/tree/main/jaxmarl/environments/mpe) |  |
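
The sketch below illustrates the rough shape of such a wrapper, assuming a toy third-party environment that returns per-agent dictionaries. The class names and method signatures here are made up for illustration and are not Mava's actual wrapper interfaces (see `mava/wrappers/` for those); the key idea is simply mapping per-agent outputs into stacked arrays with the agents on a leading axis.

```python
from typing import Any, Dict, Tuple
import numpy as np

class ToyThirdPartyEnv:
    """Hypothetical non-JAX environment returning per-agent dictionaries."""
    num_agents = 2

    def reset(self) -> Dict[str, np.ndarray]:
        return {f"agent_{i}": np.zeros(4, dtype=np.float32) for i in range(self.num_agents)}

    def step(self, actions: Dict[str, int]) -> Tuple[Dict[str, np.ndarray], Dict[str, float], bool, Dict[str, Any]]:
        obs = {a: np.random.randn(4).astype(np.float32) for a in actions}
        rewards = {a: 0.0 for a in actions}
        return obs, rewards, False, {}

class StackedAgentWrapper:
    """Sketch: stack per-agent observations and rewards into arrays (agents on axis 0)."""

    def __init__(self, env: ToyThirdPartyEnv):
        self._env = env
        self._agents = [f"agent_{i}" for i in range(env.num_agents)]

    def reset(self) -> np.ndarray:
        obs = self._env.reset()
        return np.stack([obs[a] for a in self._agents])

    def step(self, actions: np.ndarray) -> Tuple[np.ndarray, np.ndarray, bool]:
        obs, rewards, done, _ = self._env.step(
            {a: int(actions[i]) for i, a in enumerate(self._agents)}
        )
        stacked_obs = np.stack([obs[a] for a in self._agents])
        stacked_rewards = np.array([rewards[a] for a in self._agents], dtype=np.float32)
        return stacked_obs, stacked_rewards, done

wrapped = StackedAgentWrapper(ToyThirdPartyEnv())
joint_obs = wrapped.reset()  # shape: (num_agents, obs_dim)
joint_obs, team_rewards, done = wrapped.step(np.zeros(2, dtype=np.int64))
```
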
+
+## Performance and Speed 🚀
+We have performed a rigorous benchmark across 45 different scenarios and 6 different environment suites to validate the performance of Mava's algorithm implementations. For more detailed results, please see our [Sable paper][sable]; for all hyperparameters, please see the accompanying [website](https://sites.google.com/view/sable-marl).
+Mava's algorithm performance: each algorithm was tuned for 40 trials with the TPE optimizer and benchmarked over 10 seeds for each scenario. Environments, from top left: Multi-Robot Warehouse (aggregated over 15 scenarios), Level-based Foraging (aggregated over 7 scenarios), StarCraft Multi-Agent Challenge in JAX (aggregated over 11 scenarios), Connector (aggregated over 4 scenarios), Multi-Agent Brax (aggregated over 5 scenarios) and Multi Particle Environments (aggregated over 3 scenarios).
+
+
+## Code Philosophy 🧘
+
+The original code in Mava was adapted from [PureJaxRL][purejaxrl], which provides high-quality single-file implementations with research-friendly features. In turn, PureJaxRL is inspired by the code philosophy of [CleanRL][cleanrl]. In this vein of easy-to-use and understandable RL codebases, Mava is not designed to be a modular library and is not meant to be imported. Our repository focuses on simplicity and clarity in its implementations while utilising the advantages offered by JAX, such as `pmap` and `vmap`, making it an excellent resource for researchers and practitioners to build upon. A notable difference between Mava and CleanRL is that Mava creates small utilities for heavily re-used elements, such as networks and logging; we've found that this, in addition to Hydra configs, greatly improves the readability of the algorithms.
## Contributing 🤝
@@ -196,17 +147,16 @@ Please read our [contributing docs](docs/CONTRIBUTING.md) for details on how to
We plan to iteratively expand Mava in the following increments:
-- ๐ด Support for more environments.
-- ๐ More robust recurrent systems.
-- ๐ณ Support for non JAX-based environments.
-- ๐ฆพ Support for off-policy algorithms.
-- ๐ Continuous action space environments and algorithms.
+- [x] Support for more environments.
+- [x] More robust recurrent systems.
+- [x] Support for non JAX-based environments.
+- [ ] Add Sebulba versions of more algorithms.
+- [x] Support for off-policy algorithms.
+- [x] Continuous action space environments and algorithms.
+- [ ] Allow systems to easily scale across multiple TPUs/GPUs.
Please do follow along as we develop this next phase!
-## TensorFlow 2 Mava:
-Originally Mava was written in Tensorflow 2. Support for the TF2-based framework and systems has now been fully **deprecated**. If you would still like to use it, please install `v0.1.3` of Mava (i.e. `pip install id-mava==0.1.3`).
-
## See Also 🔎
**InstaDeep's MARL ecosystem in JAX.** In particular, we suggest users check out the following sister repositories:
@@ -260,3 +210,4 @@ The development of Mava was supported with Cloud TPUs from Google's [TPU Researc
[toward_standard_eval]: https://arxiv.org/pdf/2209.10485.pdf
[marl_eval]: https://github.com/instadeepai/marl-eval
[smax]: https://github.com/FLAIROx/JaxMARL/tree/main/jaxmarl/environments/smax
+[sable]: https://arxiv.org/pdf/2410.01706
diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md
index 791237ddc9..b7a7a14614 100644
--- a/docs/CONTRIBUTING.md
+++ b/docs/CONTRIBUTING.md
@@ -39,7 +39,7 @@ pre-commit run --all-files
## Naming Conventions
### Branch Names
-We name our feature and bugfix branches as follows - `feature/[BRANCH-NAME]`, `bugfix/[BRANCH-NAME]` or `maintenance/[BRANCH-NAME]`. Please ensure `[BRANCH-NAME]` is hyphen delimited.
+We name our feature and bugfix branches as follows: `feat/[BRANCH-NAME]` or `fix/[BRANCH-NAME]`. Please ensure `[BRANCH-NAME]` is hyphen-delimited.
### Commit Messages
We follow the conventional commits [standard](https://www.conventionalcommits.org/en/v1.0.0/).
diff --git a/docs/DETAILED_INSTALL.md b/docs/DETAILED_INSTALL.md
index 28547c8aa5..04a499b72a 100644
--- a/docs/DETAILED_INSTALL.md
+++ b/docs/DETAILED_INSTALL.md
@@ -1,12 +1,11 @@
# Detailed installation guide
### Virtual environment
-We recommend using `conda` for package management. These instructions should allow you to install and run mava.
+We recommend using [uv](https://docs.astral.sh/uv/) for package management. These instructions should allow you to install and run Mava.
-1. Create and activate a virtual environment
+1. Install `uv`
```bash
-conda create -n mava python=3.12
-conda activate mava
+curl -LsSf https://astral.sh/uv/install.sh | sh
```
2. Clone mava
@@ -15,19 +14,22 @@ git clone https://github.com/instadeepai/Mava.git
cd mava
```
-3. Install the dependencies
+3. Create and activate a virtual environment and install requirements
```bash
-pip install -e .
+uv venv -p=3.12
+source .venv/bin/activate
+uv pip install -e .
```
-4. Install jax on your accelerator. The example below is for an NVIDIA GPU, please the [official install guide](https://github.com/google/jax#installation) for other accelerators
+4. Install JAX on your accelerator. The example below is for an NVIDIA GPU; please see the [official install guide](https://github.com/google/jax#installation) for other accelerators.
+Note that the JAX version we use will change over time, so please check the [requirements.txt](../requirements/requirements.txt) for our latest tested JAX version.
```bash
-pip install "jax[cuda12]==0.4.30"
+uv pip install "jax[cuda12]==0.4.30"
```
5. Run a system!
```bash
-python mava/systems/ppo/ff_ippo.py env=rware
+python mava/systems/ppo/anakin/ff_ippo.py env=rware
```
### Docker
@@ -50,4 +52,4 @@ If you are having trouble with dependencies we recommend using our docker image
For example, `make run example=mava/systems/ppo/anakin/ff_ippo.py`.
- Alternatively, run bash inside a docker container with mava installed by running `make bash`, and from there systems can be run as follows: `python dir/to/system.py`.
+ Alternatively, run bash inside a docker container with Mava installed by running `make bash`, and from there systems can be run as follows: `python dir/to/system.py`.
diff --git a/docs/images/algo_images/sable-arch.png b/docs/images/algo_images/sable-arch.png
new file mode 100644
index 0000000000..1fd92c6c8f
Binary files /dev/null and b/docs/images/algo_images/sable-arch.png differ
diff --git a/docs/images/benchmark_results/connector.png b/docs/images/benchmark_results/connector.png
new file mode 100644
index 0000000000..5931b9c594
Binary files /dev/null and b/docs/images/benchmark_results/connector.png differ
diff --git a/docs/images/benchmark_results/lbf.png b/docs/images/benchmark_results/lbf.png
new file mode 100644
index 0000000000..34be250000
Binary files /dev/null and b/docs/images/benchmark_results/lbf.png differ
diff --git a/docs/images/benchmark_results/legend.jpg b/docs/images/benchmark_results/legend.jpg
new file mode 100644
index 0000000000..3b9070a569
Binary files /dev/null and b/docs/images/benchmark_results/legend.jpg differ
diff --git a/docs/images/benchmark_results/mabrax.png b/docs/images/benchmark_results/mabrax.png
new file mode 100644
index 0000000000..13d8edff49
Binary files /dev/null and b/docs/images/benchmark_results/mabrax.png differ
diff --git a/docs/images/benchmark_results/mpe.png b/docs/images/benchmark_results/mpe.png
new file mode 100644
index 0000000000..76157b2472
Binary files /dev/null and b/docs/images/benchmark_results/mpe.png differ
diff --git a/docs/images/benchmark_results/rware.png b/docs/images/benchmark_results/rware.png
new file mode 100644
index 0000000000..91d9edf46d
Binary files /dev/null and b/docs/images/benchmark_results/rware.png differ
diff --git a/docs/images/benchmark_results/smax.png b/docs/images/benchmark_results/smax.png
new file mode 100644
index 0000000000..a4aeddd47d
Binary files /dev/null and b/docs/images/benchmark_results/smax.png differ
diff --git a/docs/images/lbf_results/15x15-4p-3f_rec_mappo.png b/docs/images/lbf_results/15x15-4p-3f_rec_mappo.png
deleted file mode 100644
index fb01f398ea..0000000000
Binary files a/docs/images/lbf_results/15x15-4p-3f_rec_mappo.png and /dev/null differ
diff --git a/docs/images/lbf_results/2s-8x8-2p-2f-coop_rec_mappo.png b/docs/images/lbf_results/2s-8x8-2p-2f-coop_rec_mappo.png
deleted file mode 100644
index 081527a384..0000000000
Binary files a/docs/images/lbf_results/2s-8x8-2p-2f-coop_rec_mappo.png and /dev/null differ
diff --git a/docs/images/lbf_results/legend_rec_mappo.png b/docs/images/lbf_results/legend_rec_mappo.png
deleted file mode 100644
index 489499f78a..0000000000
Binary files a/docs/images/lbf_results/legend_rec_mappo.png and /dev/null differ
diff --git a/docs/images/rware_results/ff_ippo/small-4ag.png b/docs/images/rware_results/ff_ippo/small-4ag.png
deleted file mode 100644
index a43b2d6d98..0000000000
Binary files a/docs/images/rware_results/ff_ippo/small-4ag.png and /dev/null differ
diff --git a/docs/images/rware_results/ff_ippo/tiny-2ag.png b/docs/images/rware_results/ff_ippo/tiny-2ag.png
deleted file mode 100644
index df43e2077f..0000000000
Binary files a/docs/images/rware_results/ff_ippo/tiny-2ag.png and /dev/null differ
diff --git a/docs/images/rware_results/ff_ippo/tiny-4ag.png b/docs/images/rware_results/ff_ippo/tiny-4ag.png
deleted file mode 100644
index 3962e8e735..0000000000
Binary files a/docs/images/rware_results/ff_ippo/tiny-4ag.png and /dev/null differ
diff --git a/docs/images/rware_results/ff_mappo/main_readme/legend.png b/docs/images/rware_results/ff_mappo/main_readme/legend.png
deleted file mode 100644
index c7239b7197..0000000000
Binary files a/docs/images/rware_results/ff_mappo/main_readme/legend.png and /dev/null differ
diff --git a/docs/images/rware_results/ff_mappo/main_readme/small-4ag-1.png b/docs/images/rware_results/ff_mappo/main_readme/small-4ag-1.png
deleted file mode 100644
index a899f00704..0000000000
Binary files a/docs/images/rware_results/ff_mappo/main_readme/small-4ag-1.png and /dev/null differ
diff --git a/docs/images/rware_results/ff_mappo/main_readme/tiny-2ag-1.png b/docs/images/rware_results/ff_mappo/main_readme/tiny-2ag-1.png
deleted file mode 100644
index 6cd8086d2b..0000000000
Binary files a/docs/images/rware_results/ff_mappo/main_readme/tiny-2ag-1.png and /dev/null differ
diff --git a/docs/images/rware_results/ff_mappo/main_readme/tiny-4ag-1.png b/docs/images/rware_results/ff_mappo/main_readme/tiny-4ag-1.png
deleted file mode 100644
index 0a89c9dcd5..0000000000
Binary files a/docs/images/rware_results/ff_mappo/main_readme/tiny-4ag-1.png and /dev/null differ
diff --git a/docs/images/rware_results/ff_mappo/small-4ag.png b/docs/images/rware_results/ff_mappo/small-4ag.png
deleted file mode 100644
index 5ecfbbdfa0..0000000000
Binary files a/docs/images/rware_results/ff_mappo/small-4ag.png and /dev/null differ
diff --git a/docs/images/rware_results/ff_mappo/tiny-2ag.png b/docs/images/rware_results/ff_mappo/tiny-2ag.png
deleted file mode 100644
index e16f4bbfa4..0000000000
Binary files a/docs/images/rware_results/ff_mappo/tiny-2ag.png and /dev/null differ
diff --git a/docs/images/rware_results/ff_mappo/tiny-4ag.png b/docs/images/rware_results/ff_mappo/tiny-4ag.png
deleted file mode 100644
index 59f259c5c1..0000000000
Binary files a/docs/images/rware_results/ff_mappo/tiny-4ag.png and /dev/null differ
diff --git a/docs/images/rware_results/rec_ippo/small-4ag.png b/docs/images/rware_results/rec_ippo/small-4ag.png
deleted file mode 100644
index edab2f32c0..0000000000
Binary files a/docs/images/rware_results/rec_ippo/small-4ag.png and /dev/null differ
diff --git a/docs/images/rware_results/rec_ippo/tiny-2ag.png b/docs/images/rware_results/rec_ippo/tiny-2ag.png
deleted file mode 100644
index 82f2e25e2d..0000000000
Binary files a/docs/images/rware_results/rec_ippo/tiny-2ag.png and /dev/null differ
diff --git a/docs/images/rware_results/rec_ippo/tiny-4ag.png b/docs/images/rware_results/rec_ippo/tiny-4ag.png
deleted file mode 100644
index d224507dda..0000000000
Binary files a/docs/images/rware_results/rec_ippo/tiny-4ag.png and /dev/null differ
diff --git a/docs/images/rware_results/rec_mappo/small-4ag.png b/docs/images/rware_results/rec_mappo/small-4ag.png
deleted file mode 100644
index 534847212e..0000000000
Binary files a/docs/images/rware_results/rec_mappo/small-4ag.png and /dev/null differ
diff --git a/docs/images/rware_results/rec_mappo/tiny-2ag.png b/docs/images/rware_results/rec_mappo/tiny-2ag.png
deleted file mode 100644
index 2927ca5cbf..0000000000
Binary files a/docs/images/rware_results/rec_mappo/tiny-2ag.png and /dev/null differ
diff --git a/docs/images/rware_results/rec_mappo/tiny-4ag.png b/docs/images/rware_results/rec_mappo/tiny-4ag.png
deleted file mode 100644
index ee5f390a48..0000000000
Binary files a/docs/images/rware_results/rec_mappo/tiny-4ag.png and /dev/null differ
diff --git a/docs/images/smax_results/10m_vs_11m.png b/docs/images/smax_results/10m_vs_11m.png
deleted file mode 100644
index c50c8ccccc..0000000000
Binary files a/docs/images/smax_results/10m_vs_11m.png and /dev/null differ
diff --git a/docs/images/smax_results/27m_vs_30m.png b/docs/images/smax_results/27m_vs_30m.png
deleted file mode 100644
index 2ed84c5831..0000000000
Binary files a/docs/images/smax_results/27m_vs_30m.png and /dev/null differ
diff --git a/docs/images/smax_results/2s3z.png b/docs/images/smax_results/2s3z.png
deleted file mode 100644
index ca34009eec..0000000000
Binary files a/docs/images/smax_results/2s3z.png and /dev/null differ
diff --git a/docs/images/smax_results/3s5z.png b/docs/images/smax_results/3s5z.png
deleted file mode 100644
index bc4f6fb6dd..0000000000
Binary files a/docs/images/smax_results/3s5z.png and /dev/null differ
diff --git a/docs/images/smax_results/3s5z_vs_3s6z.png b/docs/images/smax_results/3s5z_vs_3s6z.png
deleted file mode 100644
index db06e43fe7..0000000000
Binary files a/docs/images/smax_results/3s5z_vs_3s6z.png and /dev/null differ
diff --git a/docs/images/smax_results/3s_vs_5z.png b/docs/images/smax_results/3s_vs_5z.png
deleted file mode 100644
index db63fc8433..0000000000
Binary files a/docs/images/smax_results/3s_vs_5z.png and /dev/null differ
diff --git a/docs/images/smax_results/5m_vs_6m.png b/docs/images/smax_results/5m_vs_6m.png
deleted file mode 100644
index a52b5fb7d7..0000000000
Binary files a/docs/images/smax_results/5m_vs_6m.png and /dev/null differ
diff --git a/docs/images/smax_results/6h_vs_8z.png b/docs/images/smax_results/6h_vs_8z.png
deleted file mode 100644
index e76ae9bf2b..0000000000
Binary files a/docs/images/smax_results/6h_vs_8z.png and /dev/null differ
diff --git a/docs/images/smax_results/legend.png b/docs/images/smax_results/legend.png
deleted file mode 100644
index ed607b332d..0000000000
Binary files a/docs/images/smax_results/legend.png and /dev/null differ
diff --git a/docs/images/speed_results/ff_mappo_speed_comparison.png b/docs/images/speed_results/ff_mappo_speed_comparison.png
deleted file mode 100644
index 44f7ee821e..0000000000
Binary files a/docs/images/speed_results/ff_mappo_speed_comparison.png and /dev/null differ
diff --git a/docs/images/speed_results/mava_sps_results.png b/docs/images/speed_results/mava_sps_results.png
deleted file mode 100644
index 8393ea2bb1..0000000000
Binary files a/docs/images/speed_results/mava_sps_results.png and /dev/null differ
diff --git a/docs/images/speed_results/speed.png b/docs/images/speed_results/speed.png
new file mode 100644
index 0000000000..0099d33192
Binary files /dev/null and b/docs/images/speed_results/speed.png differ
diff --git a/docs/jumanji_rware_comparison.md b/docs/jumanji_rware_comparison.md
deleted file mode 100644
index 3d041ef124..0000000000
--- a/docs/jumanji_rware_comparison.md
+++ /dev/null
@@ -1,74 +0,0 @@
-# Differences in performance using Jumanji's version of RWARE
-
-There is a core difference in the way collisions are handled in the stateless JAX-based implementation of RWARE (called RobotWarehouse) found in [Jumanji][jumanji_rware] and the [original RWARE][original_rware] environment.
-
-As mentioned in the original repo, collisions are handled as follows:
- > The dynamics of the environment are also of particular interest. Like a real, 3-dimensional warehouse, the robots can move beneath the shelves. Of course, when the robots are loaded, they must use the corridors, avoiding any standing shelves.
->
->Any collisions are resolved in a way that allows for maximum mobility. When two or more agents attempt to move to the same location, we prioritise the one that also blocks others. Otherwise, the selection is done arbitrarily. The visuals below demonstrate the resolution of various collisions.
-
-In contrast to the collision resolution strategy above, the current version of the Jumanji implementation will not handle collisions dynamically but instead terminates an episode upon agent collision. In our experience, this appeared to make the task at hand more challenging and made it easier for agents to get trapped in local optima where episodes are never rolled out for the maximum length.
-
-To investigate this, we ran our algorithms on a version of Jumanji's RWARE where episodes do not terminate upon agent collision, but rather multiple agents are allowed to occupy the same grid position. This setup is not identical to that of the original environment but represents a closer version to its dynamics, allowing agents to easily reach the end of an episode.
-
-Please see below for Mava's recurrent and feedforward implementations of IPPO and MAPPO on the regular version of Jumanji as well as the adapted version of Jumanji without termination upon agent collision.
-
-Mava feedforward MAPPO performance on the `tiny-2ag`, `tiny-4ag` and `small-4ag` RWARE tasks.
-Mava feedforward IPPO performance on the `tiny-2ag`, `tiny-4ag` and `small-4ag` RWARE tasks.
-Mava recurrent IPPO performance on the `tiny-2ag`, `tiny-4ag` and `small-4ag` RWARE tasks.
-Mava recurrent MAPPO performance on the `tiny-2ag`, `tiny-4ag` and `small-4ag` RWARE tasks.
-
-
-
-[jumanji_rware]: https://instadeepai.github.io/jumanji/environments/robot_warehouse/
-[original_rware]: https://github.com/semitable/robotic-warehouse
diff --git a/docs/smax_benchmark.md b/docs/smax_benchmark.md
deleted file mode 100644
index dc840b51fd..0000000000
--- a/docs/smax_benchmark.md
+++ /dev/null
@@ -1,43 +0,0 @@
-# StarCraft Multi-Agent Challenge in JAX
-
-We trained Mavaโs recurrent systems on eight SMAX scenarios. The outcomes were then compared to the final win rates reported by [Rutherford et al., 2023](https://arxiv.org/pdf/2311.10090.pdf). To ensure fair comparisons we also train Mava's system up to 10 million timesteps with 64 vectorised environments.
-
-Win rate results are shown for the `2s3z`, `3s_vs_5z`, `3s5z_vs_3s6z`, `3s5z`, `5m_vs_6m`, `6h_vs_8z`, `10m_vs_11m` and `27m_vs_30m` SMAX scenarios.
diff --git a/mava/systems/mat/README.md b/mava/systems/mat/README.md
new file mode 100644
index 0000000000..8f2bc67bf0
--- /dev/null
+++ b/mava/systems/mat/README.md
@@ -0,0 +1,6 @@
+# Multi-agent Transformer
+
+We provide an implementation of the Multi-agent Transformer (MAT) algorithm in JAX. MAT casts cooperative multi-agent reinforcement learning as a sequence modelling problem where agent observations and actions are treated as a sequence. At each timestep, the observations of all agents are encoded, and these encoded observations are then used for auto-regressive action selection.
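
As a rough illustration of that flow, here is a minimal sketch of auto-regressive action selection. The encoder and decoder below are toy stand-ins, not MAT's actual transformer blocks: each agent's action is sampled conditioned on the encoded observations and on the actions already chosen for earlier agents in the sequence.

```python
import jax
import jax.numpy as jnp

# Toy stand-ins for the MAT encoder and decoder (illustrative only).
def encode(obs):
    # obs: (num_agents, obs_dim) -> encoded observations: (num_agents, hidden_dim)
    return jnp.tanh(obs @ jnp.ones((obs.shape[-1], 8)))

def decode_logits(encoded, prev_actions, agent):
    # Condition agent `agent` on all encoded observations and earlier agents' actions.
    context = encoded[agent] + encoded.mean(axis=0) + 0.01 * prev_actions.sum()
    return context[:4]  # pretend there are 4 discrete actions

def autoregressive_select(key, obs):
    num_agents = obs.shape[0]
    encoded = encode(obs)
    actions = jnp.zeros((num_agents,), dtype=jnp.int32)
    for m in range(num_agents):  # agents form the sequence dimension
        key, subkey = jax.random.split(key)
        logits = decode_logits(encoded, actions[:m], m)
        actions = actions.at[m].set(jax.random.categorical(subkey, logits))
    return actions

actions = autoregressive_select(jax.random.PRNGKey(0), jnp.zeros((3, 6)))
```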
+
+## Relevant paper:
+* [Multi-Agent Reinforcement Learning is a Sequence Modeling Problem](https://arxiv.org/pdf/2205.14953)
diff --git a/mava/systems/ppo/README.md b/mava/systems/ppo/README.md
new file mode 100644
index 0000000000..75754f19a6
--- /dev/null
+++ b/mava/systems/ppo/README.md
@@ -0,0 +1,17 @@
+# Proximal Policy Optimization
+
+We provide the following four multi-agent extensions to [PPO](https://arxiv.org/pdf/1707.06347) following the Anakin architecture.
+
+* [ff-IPPO](../../systems/ppo/anakin/ff_ippo.py)
+* [ff-MAPPO](../../systems/ppo/anakin/ff_mappo.py)
+* [rec-IPPO](../../systems/ppo/anakin/rec_ippo.py)
+* [rec-MAPPO](../../systems/ppo/anakin/rec_mappo.py)
+
+In all cases, IPPO implies an implementation following the independent learners MARL paradigm, while MAPPO implies that the implementation follows the centralised training with decentralised execution paradigm by having a centralised critic during training. The `ff` and `rec` prefixes in the system names indicate whether the policy networks are MLPs or have a [GRU](https://arxiv.org/pdf/1406.1078) memory module to help learning under partial observability in the environment.
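
For intuition, the toy sketch below (not Mava's actual networks) contrasts the two critic conventions: an independent critic values each agent's own observation, while the centralised critic used by MAPPO conditions on the joint observation of all agents during training.

```python
import jax.numpy as jnp

num_agents, obs_dim = 4, 8
obs = jnp.ones((num_agents, obs_dim))   # per-agent observations at one timestep

# Independent (IPPO-style) critic: one value per agent from its own observation only.
w_independent = jnp.ones((obs_dim,))
per_agent_values = obs @ w_independent            # shape: (num_agents,)

# Centralised (MAPPO-style) critic: a value computed from the joint observation.
joint_obs = obs.reshape(-1)                       # shape: (num_agents * obs_dim,)
w_centralised = jnp.ones((num_agents * obs_dim,))
central_value = joint_obs @ w_centralised         # scalar, uses global information
```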
+
+In addition to the Anakin-based implementations, we also include a Sebulba-based implementation of [ff-IPPO](../../systems/ppo/sebulba/ff_ippo.py) which can be used on environments that are not written in JAX and adhere to the Gymnasium API.
+
+## Relevant papers:
+* [Proximal Policy Optimization Algorithms](https://arxiv.org/pdf/1707.06347)
+* [The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games](https://arxiv.org/pdf/2103.01955)
+* [Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?](https://arxiv.org/pdf/2011.09533)
diff --git a/mava/systems/q_learning/README.md b/mava/systems/q_learning/README.md
new file mode 100644
index 0000000000..eef858cd9a
--- /dev/null
+++ b/mava/systems/q_learning/README.md
@@ -0,0 +1,14 @@
+# Q Learning
+
+We provide two Q-Learning based systems that follow the independent learners and centralised training with decentralised execution paradigms:
+
+* [rec-IQL](../../systems/q_learning/anakin/rec_iql.py)
+* [rec-QMIX](../../systems/q_learning/anakin/rec_qmix.py)
+
+`rec-IQL` is a multi-agent version of DQN that uses double DQN and a GRU memory module, while `rec-QMIX` is an implementation of QMIX in JAX that uses monotonic value function decomposition.
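
For intuition on the monotonic value function decomposition, here is a toy single-layer mixing sketch in JAX (the shapes and hypernetwork are illustrative placeholders, not Mava's implementation): per-agent utilities are combined with non-negative, state-conditioned weights, so the joint value is monotonic in each agent's utility.

```python
import jax
import jax.numpy as jnp

def monotonic_mix(agent_qs, state, hyper_w, hyper_b):
    """Toy QMIX-style mixer: monotonicity is enforced via non-negative weights."""
    weights = jnp.abs(state @ hyper_w)   # state-conditioned weights, shape (num_agents,)
    bias = state @ hyper_b               # scalar bias, unconstrained
    return jnp.dot(weights, agent_qs) + bias

num_agents, state_dim = 3, 5
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
hyper_w = jax.random.normal(k1, (state_dim, num_agents))
hyper_b = jax.random.normal(k2, (state_dim,))
q_tot = monotonic_mix(jnp.array([1.0, -0.5, 2.0]), jnp.ones(state_dim), hyper_w, hyper_b)
```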
+
+## Relevant papers:
+* [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/pdf/1312.5602)
+* [Multiagent Cooperation and Competition with Deep Reinforcement Learning](https://arxiv.org/pdf/1511.08779)
+* [QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/1803.11485)
diff --git a/mava/systems/sable/README.md b/mava/systems/sable/README.md
new file mode 100644
index 0000000000..92e19b693e
--- /dev/null
+++ b/mava/systems/sable/README.md
@@ -0,0 +1,24 @@
+# Sable
+
+Sable is an algorithm that was developed by the research team at InstaDeep. It casts MARL as a sequence modelling problem and leverages the [advantage decomposition theorem](https://arxiv.org/pdf/2108.08612) through auto-regressive action selection for convergence guarantees, and it can scale to thousands of agents by leveraging the memory efficiency of Retentive Networks.
+
+We provide two Anakin based implementations of Sable:
+* [ff-sable](../../systems/sable/anakin/ff_sable.py)
+* [rec-sable](../../systems/sable/anakin/rec_sable.py)
+
+Here the `ff` prefix implies that the algorithm retains no memory over time but treats only the agents as the sequence dimension, while `rec` implies that the algorithm maintains memory over both agents and time for long-context memory in partially observable environments.
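
The toy sketch below illustrates this distinction only (the retention block is a simple placeholder, not Sable's actual Retentive Network): `ff` processes each timestep with a fresh hidden state, whereas `rec` carries the hidden state across timesteps as well as across agents.

```python
import jax.numpy as jnp

def toy_retention(tokens, hidden):
    # Stand-in for a retention block: mixes the agent tokens with a carried hidden state.
    new_hidden = hidden * 0.9 + tokens.mean(axis=0)
    return tokens + new_hidden, new_hidden

obs_per_step = [jnp.ones((3, 8)) * t for t in range(4)]  # 4 timesteps, 3 agents, dim 8
hidden = jnp.zeros(8)
for obs_t in obs_per_step:
    out_ff, _ = toy_retention(obs_t, jnp.zeros(8))   # ff: fresh memory each timestep
    out_rec, hidden = toy_retention(obs_t, hidden)   # rec: memory persists over time
```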
+
+For an overview of how the algorithm works, please see the diagram below. For a more detailed overview please see our associated [paper](https://arxiv.org/pdf/2410.01706).
+
+
+
+
+
+
+
+*Sable architecture and execution.* The encoder receives all agent observations $o_t^1,\dots,o_t^N$ from the current timestep $t$ along with a hidden state $h\_{t-1}^{\text{enc}}$ representing past timesteps and produces encoded observations $\hat{o}\_t^1,\dots,\hat{o}\_t^N$, observation-values $v \left( \hat{o}\_t^1 \right),\dots,v \left( \hat{o}\_t^N \right) $, and a new hidden state $h_t^{\text{enc}}$.
+The decoder performs recurrent retention over the current action $a_t^{m-1}$, followed by cross attention with the encoded observations, producing the next action $a_t^m$. The initial hidden states for recurrence over agents in the decoder at the current timestep are $( h\_{t-1}^{\text{dec}\_1},h\_{t-1}^{\text{dec}\_2})$, and by the end of the decoding process, it generates the updated hidden states $(h_t^{\text{dec}_1},h_t^{\text{dec}_2})$.
+
+## Relevant paper:
+* [Performant, Memory Efficient and Scalable Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/2410.01706)
+* [Retentive Network: A Successor to Transformer for Large Language Models](https://arxiv.org/pdf/2307.08621)
diff --git a/mava/systems/sac/README.md b/mava/systems/sac/README.md
new file mode 100644
index 0000000000..af7b274113
--- /dev/null
+++ b/mava/systems/sac/README.md
@@ -0,0 +1,16 @@
+# Soft Actor-Critic
+
+We provide the following three multi-agent extensions to the Soft Actor-Critic (SAC) algorithm.
+
+* [ff-ISAC](../../systems/sac/anakin/ff_isac.py)
+* [ff-MASAC](../../systems/sac/anakin/ff_masac.py)
+* [ff-HASAC](../../systems/sac/anakin/ff_hasac.py)
+
+`ISAC` is an implementation following the independent learners MARL paradigm, while `MASAC` follows the centralised training with decentralised execution paradigm by having a centralised critic during training. `HASAC` follows the heterogeneous-agent learning paradigm through sequential policy updates. The `ff` prefix to the algorithm names indicates that the algorithms use MLP-based policy networks.
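
A toy sketch of what "sequential policy updates" means in practice (illustrative only, not Mava's implementation): agents are updated one at a time, so each later agent's update can take the freshly updated policies of earlier agents into account, whereas simultaneous updates all work from the same snapshot.

```python
import jax.numpy as jnp

def update_agent(policy_params, all_params):
    # Toy "update": nudge an agent's params towards the current team average.
    return policy_params + 0.1 * (jnp.mean(jnp.stack(all_params), axis=0) - policy_params)

params = [jnp.zeros(4), jnp.ones(4), 2.0 * jnp.ones(4)]

# Sequential (HASAC-style): each agent sees the already-updated params of earlier agents.
for i in range(len(params)):
    params[i] = update_agent(params[i], params)

# Simultaneous (ISAC/MASAC-style): all agents update from the same snapshot.
snapshot = [p for p in params]
params = [update_agent(p, snapshot) for p in snapshot]
```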
+
+## Relevant papers
+* [Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](https://arxiv.org/pdf/1801.01290)
+* [Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](https://arxiv.org/pdf/1706.02275)
+* [Robust Multi-Agent Control via Maximum Entropy Heterogeneous-Agent Reinforcement Learning](https://arxiv.org/pdf/2306.10715)