alternate_pg

Code for the paper "An Alternate Policy Gradient Estimator for Softmax Policies" (https://arxiv.org/abs/2112.11622), published at AISTATS 2022.
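As background for the estimator named in the paper title, here is a minimal NumPy sketch of the standard (likelihood-ratio) policy gradient for a softmax policy over logits on a multi-armed bandit. It computes the exact gradient and a single-sample REINFORCE estimate whose average approaches it. This is not the paper's alternate estimator. The arm means, noise level, and function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(theta):
    # Shift by the max before exponentiating for numerical stability.
    z = np.exp(theta - np.max(theta))
    return z / z.sum()

def exact_softmax_pg(theta, mean_rewards):
    """Exact gradient of J(theta) = sum_a pi(a) r(a) with respect to the logits."""
    pi = softmax(theta)
    # d pi(a) / d theta_b = pi(a) * (1[a == b] - pi(b)), so grad = pi * (r - pi . r).
    return pi * (mean_rewards - pi @ mean_rewards)

def reinforce_estimate(theta, mean_rewards, rng, noise_std=1.0):
    """One-sample REINFORCE estimate (e_A - pi) * R, with A ~ pi and Gaussian reward noise."""
    pi = softmax(theta)
    a = rng.choice(len(pi), p=pi)
    reward = mean_rewards[a] + noise_std * rng.standard_normal()
    score = -pi.copy()
    score[a] += 1.0  # grad of log pi(a) w.r.t. the logits is e_a - pi
    return score * reward

rng = np.random.default_rng(0)
theta = np.zeros(3)
r = np.array([1.0, 0.5, 0.0])  # hypothetical arm means, not the repository's testbed
exact = exact_softmax_pg(theta, r)
mc = np.mean([reinforce_estimate(theta, r, rng) for _ in range(20000)], axis=0)
print(exact)  # approximately [0.167, 0.0, -0.167]
print(mc)     # close to the exact gradient, up to Monte Carlo noise
```

Because the expected score is zero, the REINFORCE estimate is unbiased but noisy. The Gaussian reward noise directly inflates its variance.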

Each experimental setting has its own code (all of it requires NumPy, SciPy, and Matplotlib):

  • bandits (3-armed bandit testbed with normally distributed reward noise; also contains code for plotting the policy update directions on the policy simplex)
  • tabular (linear chain environment with REINFORCE, using exact gradients)
  • linear (online actor-critic (AC) with linear function approximation plus tile coding, using softmax and escort-transform policies, with optional entropy regularization; requires additional files to run the environments and the tile coder, so see the help file in the folder)
  • neural (online AC with neural networks; also contains the DotReacher environment; requires PyTorch)
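The linear setting uses both softmax and escort-transform policies. The following minimal NumPy sketch shows the two parameterizations and their log-probability gradients. It assumes the common escort-p definition pi(a) = |theta_a|^p / sum_b |theta_b|^p with p = 2. The README does not state the exponent or how the repository computes the preferences theta from tile-coded features, so both are assumptions, and the function names are hypothetical.

```python
import numpy as np

def softmax_policy(theta):
    """Softmax: pi(a) = exp(theta_a) / sum_b exp(theta_b)."""
    z = np.exp(theta - np.max(theta))  # shift by the max for numerical stability
    return z / z.sum()

def escort_policy(theta, p=2.0):
    """Escort transform: pi(a) = |theta_a|^p / sum_b |theta_b|^p.

    p = 2 is an assumed default, not taken from the repository.
    Assumes at least one entry of theta is nonzero.
    """
    w = np.abs(theta) ** p
    return w / w.sum()

def softmax_log_grad(theta, a):
    """grad_theta log pi(a) under softmax: e_a - pi."""
    g = -softmax_policy(theta)
    g[a] += 1.0
    return g

def escort_log_grad(theta, a, p=2.0):
    """grad_theta log pi(a) under the escort transform: p * (e_a - pi) / theta.

    Assumes every entry of theta is nonzero (log pi is not differentiable
    where some theta_b == 0).
    """
    g = -escort_policy(theta, p)
    g[a] += 1.0
    return p * g / theta

theta = np.array([3.0, 1.0, -1.0])
print(softmax_policy(theta))      # approximately [0.867, 0.117, 0.016]
print(escort_policy(theta))       # [9/11, 1/11, 1/11], approximately [0.818, 0.091, 0.091]
print(softmax_log_grad(theta, 2))
print(escort_log_grad(theta, 2))  # approximately [-0.545, -0.182, -1.818]
```

Under the escort transform, the probability depends only on the magnitudes |theta_a|. The gradient therefore pushes a negative preference further from zero to raise that action's probability, as the last line shows.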