Above all, OpenSpiel is designed to be easy to install and use, easy to understand, easy to extend (“hackable”), and general/broad. OpenSpiel is built around two main design criteria:
-
Keep it simple. Simple choices are preferred to more complex ones. The code should be readable, usable, and extendable by non-experts in the programming language(s), and especially by researchers from potentially different fields. OpenSpiel provides reference implementations that are used to learn from and prototype with, rather than fully-optimized / high-performance code that would require additional assumptions (narrowing the scope / breadth) or advanced (or lower-level) language features.
-
Keep it light. Dependencies can be problematic for long-term compatibility, maintenance, and ease of use. Unless there is strong justification, we tend to avoid introducing dependencies to keep things easy to install and more portable.
Contributions to this project must be accompanied by a Contributor License Agreement (CLA). See CONTRIBUTING.md for the details.
Here, we outline our intentions for the future, giving an overview of what we hope to add over the coming years. We also suggest a number of contributions that we would like to see, but have not had the time to add ourselves.
Before making a contribution to OpenSpiel, please read the guidelines. We also kindly request that you contact us before writing any large piece of code, in case (a) we are already working on it and/or (b) it's something we have already considered and may have some design advice on its implementation. Please also note that some games may have copyrights which might require legal approval. Otherwise, happy hacking!
The following list is both a Call for Contributions and an idealized road map. We are planning to add some of these ourselves (and, in some cases, already have implementations that were just not tested well enough to make the release!). Contributions are certainly not limited to these suggestions!
-
AlphaZero. An implementation of AlphaZero. Preferably, an implementation that closely matches the pseudo-code provided in the paper.
-
Baselines for Monte Carlo CFR. Implementations of the variance-reduction techniques for MCCFR (Ref1, Ref2).
-
Checkers / Draughts. This is a classic game and an important one in the history of game AI ("Checkers is solved").
-
Chinese Checkers / Halma. Chinese Checkers is the canonical multiplayer (more than two players) perfect information game. Currently, OpenSpiel does not contain any games in this category.
-
Correlated Equilibrium. There is a simple linear program that can be solved to find a correlated equilibrium in a normal-form game (see Section 4.6 of Shoham & Leyton-Brown '09). This would be a nice complement to the existing solving of zero-sum games in python/algorithms/lp_solver.py.
-
Deep TreeStrap. An implementation of TreeStrap (see Bootstrapping from Game Tree Search), except with a DQN-like replay buffer, storing value targets obtained from minimax searches. We have an initial implementation, but it is not yet ready for release. We also hope to support PyTorch for this algorithm.
-
Double Neural Counterfactual Regret Minimization. This is a technique similar to Regression CFR that uses a robust sampling technique and a new network architecture that predicts both the cumulative regret and the average strategy. (Ref)
-
Differentiable Games and Algorithms. For example, Symplectic Gradient Adjustment (Ref).
-
Emergent Communication Algorithms. For example, RIAL and/or DIAL and CommNet.
-
Emergent Communication Games. Referential games such as the ones in Ref1, Ref2, Ref3.
-
Extensive-form Evolutionary Dynamics. There have been a number of different evolutionary dynamics suggested for the sequential games, such as state-coupled replicator dynamics (Ref), sequence-form replicator dynamics (Ref1, Ref2), sequence-form Q-learning (Ref), and the logit dynamics (Ref).
-
Game Query/Customization API. There is no easy way to retrieve game-specific information since all the algorithms interact with the general API only. But sometimes this is necessary, such as when a technique is being tested or specialized on one game. There is also no way to change the representation of observations without changing the implementation of the game. This module would expose game-specific information via queries and customization without having to hack the game implementations directly.
-
General Games Wrapper. There are several general game engine languages and databases of general games that currently exist, for example within the general game-playing project and the Ludii General Game System. A very nice addition to OpenSpiel would be a game that interprets games represented in these languages and presents them as OpenSpiel games. This could lead to the potential of evaluating learning agents on hundreds to thousands of games.
-
Go API. We currently have a prototype Go API similar to the Python API. It is exposed using cgo via a C API much like the CFFI Python bindings from the Hanabi Learning Environment. It is not currently ready for release, but releasing it should be possible in a future update.
-
Grid Worlds. There are currently three grid world games in OpenSpiel: Markov soccer, the coin game, and cooperative box-pushing. There could be more, especially ones that have been used in multiagent RL such as Laser Tag and Gathering from Ref1, Ref2.
-
Hanabi Learning Environment Wrapper. Provide a game that wraps the Hanabi Learning Environment. We do have a working prototype, but it is not yet ready for release.
-
Heuristic Payoff Tables and Empirical Game-Theoretic Analysis. Methods found in Analyzing Complex Strategic Interactions in Multi-Agent Systems, Methods for Empirical Game-Theoretic Analysis, An evolutionary game-theoretic analysis of poker strategies, Ref4.
-
MacOS support. We would like to officially support macOS, if possible. We do not anticipate any problems, as all the dependencies are available via brew, but we have not tested this yet.
-
Minimax-Q and other classic MARL algorithms. Minimax-Q is a classic multiagent reinforcement learning algorithm (Markov games as a framework for multi-agent reinforcement learning). Other classic algorithms, such as Correlated Q-learning, NashQ, and Friend-or-Foe Q-learning (Friend-or-foe q-learning in general-sum games), would be welcome as well.
-
Nash Averaging. An evaluation tool first described in Re-evaluating Evaluation.
-
Negotiation Games. A game similar to the negotiation game presented in Ref1, Ref2. Also, Colored Trails (Modeling how Humans Reason about Others with Partial Information, Metastrategies in the Colored Trails game).
-
Opponent Modeling / Shaping Algorithms. For example, DRON, LOLA, and Stable Opponent Shaping.
-
PyTorch. While we officially support TensorFlow, the API is agnostic to the library that is used for learning. We would like to have some examples and support for PyTorch as well in the future.
-
Repeated Games. There is currently no explicit support for repeated games. Supporting repeated games as one sequential game could be useful for applying RL algorithms. This could take the form of another game transform, where intermediate rewards are given for game instances. It could also support random termination, as found in the literature and in tournaments.
-
Sequential Social Dilemmas. Sequential social dilemmas, such as the ones found in Ref1, Ref2. Wolfpack could be a nice one, since pursuit-evasion games have been common in the literature (Ref). Also the coin games from Ref1 and Ref2, and Clamity, Cleanup and/or Harvest from Ref3, Ref4.
-
Single-Agent Games and Environments. There are currently no single-player (i.e. solitaire) games or traditional RL environments implemented (in C++, accessible to the entire code base) despite the API supporting the use case. Games that fit into the category, such as Morpion and Klondike, and traditional RL environments such as grid worlds, that have been used commonly in AI research, would be welcome contributions.
-
Structured Action Spaces. Currently, actions are integers between 0 and some value. There is no easy way to interpret what each action means in a game-specific way. Nor is there any way to easily represent a composite action in terms of its parts. A structured action space could represent actions as a sequence of values (like information states and observations, possibly including shapes), which can be learned instead of mappings to flat numbers. Then, each game could have a mapping from the structured action to the action taken.
-
TF_Trajectories. The source code currently includes batch inference code for running a batch of episodes using TensorFlow directly from C++ (in contrib/). It has not yet been tested with CMake and public TensorFlow. We would like to officially support this and move it into the core library.
-
Value Iteration for Simultaneous Move Games. The current implementation of value iteration does not support simultaneous move games, even though the necessary LP-solving routines already exist. Adding this support should be a simple change.
-
Windows Support. We would like to officially support Windows, if possible.
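To illustrate the Correlated Equilibrium item above: the feasible set of that linear program is defined by incentive constraints saying that no player gains, in expectation, by deviating from a recommended action. The sketch below only checks those constraints for a given joint distribution in a two-player normal-form game. It does not solve the LP and does not use python/algorithms/lp_solver.py. The function name, the nested-list payoff layout, and the Chicken payoffs are assumptions made for the example.

```python
from itertools import product


def is_correlated_equilibrium(row_payoffs, col_payoffs, dist, tol=1e-9):
    """Checks the correlated-equilibrium constraints for a two-player game.

    row_payoffs[r][c] and col_payoffs[r][c] are the players' utilities and
    dist[r][c] is the probability of recommending the joint action (r, c).
    This layout is a hypothetical choice for the sketch.
    """
    n_rows, n_cols = len(dist), len(dist[0])

    # The recommendation distribution must be a valid probability distribution.
    total = sum(sum(row) for row in dist)
    if abs(total - 1.0) > tol or any(p < -tol for row in dist for p in row):
        return False

    # Row player: being told to play r, switching to d must not help.
    for r, d in product(range(n_rows), repeat=2):
        gain = sum(dist[r][c] * (row_payoffs[d][c] - row_payoffs[r][c])
                   for c in range(n_cols))
        if gain > tol:
            return False

    # Column player: being told to play c, switching to d must not help.
    for c, d in product(range(n_cols), repeat=2):
        gain = sum(dist[r][c] * (col_payoffs[r][d] - col_payoffs[r][c])
                   for r in range(n_rows))
        if gain > tol:
            return False
    return True


# Game of Chicken (illustrative payoffs): action 0 = Dare, action 1 = Chicken.
CHICKEN_ROW = [[0, 7], [2, 6]]
CHICKEN_COL = [[0, 2], [7, 6]]

if __name__ == "__main__":
    mediated = [[0.0, 1 / 3], [1 / 3, 1 / 3]]  # never recommends (Dare, Dare)
    uniform = [[0.25, 0.25], [0.25, 0.25]]
    print(is_correlated_equilibrium(CHICKEN_ROW, CHICKEN_COL, mediated))
    print(is_correlated_equilibrium(CHICKEN_ROW, CHICKEN_COL, uniform))
```

A full contribution would treat the entries of dist as LP variables, add an objective such as expected social welfare, and pass the same constraints to a solver.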
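The Repeated Games item above describes playing a stage game repeatedly as one sequential game, with intermediate rewards and random termination. A minimal standalone sketch of that idea follows. It does not use the OpenSpiel API. The prisoner's dilemma payoffs, the policy signature, and the names play_repeated_game and continue_prob are all hypothetical.

```python
import random

# Prisoner's dilemma stage game (illustrative payoffs): 0 = cooperate, 1 = defect.
STAGE_PAYOFFS = {
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}


def tit_for_tat(player, history):
    """Cooperates first, then copies the opponent's previous action."""
    if not history:
        return 0
    return history[-1][1 - player]


def always_defect(player, history):
    """Defects in every round, ignoring the history."""
    return 1


def play_repeated_game(policies, continue_prob, rng, max_rounds=1000):
    """Plays the stage game repeatedly and returns (totals, per-round rewards).

    After each round the game continues with probability continue_prob,
    and it always stops after max_rounds rounds.
    """
    totals = [0, 0]
    rewards_per_round = []
    history = []
    for _ in range(max_rounds):
        actions = tuple(policy(player, history)
                        for player, policy in enumerate(policies))
        rewards = STAGE_PAYOFFS[actions]
        rewards_per_round.append(rewards)  # intermediate reward for this round
        totals = [t + r for t, r in zip(totals, rewards)]
        history.append(actions)
        if rng.random() >= continue_prob:  # random termination
            break
    return totals, rewards_per_round


if __name__ == "__main__":
    rng = random.Random(0)
    totals, rewards = play_repeated_game([tit_for_tat, always_defect], 0.9, rng)
    print(len(rewards), totals)
```

In OpenSpiel itself this would more naturally be a game transform wrapping an existing game, so that any algorithm using the general API could consume the per-round rewards.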
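For the Structured Action Spaces item above, one simple way to map a composite action to the flat integer actions used today is a mixed-radix encoding. The sketch below assumes each action is a fixed-length tuple of components with known sizes. The function names and the example shape are hypothetical and not part of OpenSpiel.

```python
def flatten_action(parts, shape):
    """Encodes a structured action (tuple of components) as one integer.

    shape[i] is the number of values component i can take, so the flat
    action lies in the range [0, product of shape).
    """
    if len(parts) != len(shape):
        raise ValueError("parts and shape must have the same length")
    action = 0
    for value, size in zip(parts, shape):
        if not 0 <= value < size:
            raise ValueError(f"component {value} is out of range for size {size}")
        action = action * size + value
    return action


def unflatten_action(action, shape):
    """Decodes a flat integer action back into its structured components."""
    parts = []
    for size in reversed(shape):
        action, value = divmod(action, size)
        parts.append(value)
    return tuple(reversed(parts))


if __name__ == "__main__":
    # Hypothetical board-game move: (row, column, direction) on an 8x8 board
    # with 4 possible directions.
    shape = (8, 8, 4)
    flat = flatten_action((2, 3, 1), shape)
    print(flat, unflatten_action(flat, shape))
```

With a mapping like this, a learning algorithm could operate on the structured components while the game continues to receive a single integer action.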