In this project we studied how different RL algorithms explore the parameter space of the policy, in order to explain their differences in performance on the Pendulum benchmark. This is a new version of Olivier Sigaud's code that adds CEM (Cross-Entropy Method), new visualisations, and a Beta policy. Slides from a short presentation about the kinds of experiments you can run with this code are available here (in French).
Visualization of the reward landscape over the parameter space around the first five policies learned by CEM and PG:
Visualization of the reward landscape over the parameter space around a good policy learned by CEM:
To launch a comparison between Policy Gradient (REINFORCE or a custom PG) and CEM, you can use:
python3 --experiment comparison --env_name Pendulum-v0 --policy_type normal --nb_cycles 100 --nb_repet 1 --nb_eval 1 --eval_freq 20 --nb_trajs_cem 1 --reinforce True --nb_trajs_pg 20 --population 15 --lr_actor 1e-4
Plots and models are saved in the /data folder.
For classic REINFORCE, use --reinforce True. Otherwise, you can build your own policy gradient algorithm, for example:
python3 --experiment pg --env_name Pendulum-v0 --policy_type normal --critic_update_method dataset --study_name discount --gamma 0.99 --lr_critic 1e-2 --gradients sum+baseline --critic_estim_method td --nb_trajs_pg 20
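To give an idea of what --gamma and a baseline do in a policy gradient update, here is a minimal sketch of discounted returns with a baseline subtracted. It is not the code of this repository: the function name `discounted_returns` is hypothetical, and the baseline used here is simply the mean return, whereas the command above learns a critic for that purpose.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Walk the trajectory backwards so each return reuses the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Toy trajectory (assumption: three steps with a reward of 1 each).
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)  # [1.75, 1.5, 1.0]

# Assumption: a constant mean-return baseline stands in for the learned critic.
advantages = returns - returns.mean()
```

The values in `advantages` are what would weight the log-probability gradients of the actions taken. Subtracting a baseline leaves the expected gradient unchanged and usually reduces its variance.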
To study CEM, you can use:
python3 --experiment cem --population 20 --elites_frac 0.2 --sigma 1 --nb_trajs_cem 2
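The role of --population, --elites_frac and --sigma can be illustrated with a minimal CEM sketch. This is an illustration under assumptions, not the implementation in this repository: the `cem` function, the toy objective `toy_score`, the diagonal Gaussian, and the small noise floor on the standard deviation are all hypothetical choices. In the real experiments the score of a parameter vector would be the return of the corresponding policy on Pendulum.

```python
import numpy as np


def cem(score_fn, dim, population=20, elites_frac=0.2, sigma=1.0,
        nb_cycles=50, seed=0):
    """Maximise score_fn with a diagonal-Gaussian Cross-Entropy Method."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, sigma)  # initial exploration noise
    nb_elites = max(1, int(round(population * elites_frac)))
    for _ in range(nb_cycles):
        # Sample a population of parameter vectors around the current mean.
        samples = mean + std * rng.standard_normal((population, dim))
        scores = np.array([score_fn(s) for s in samples])
        # Keep the best-scoring samples and refit the Gaussian on them.
        elites = samples[np.argsort(scores)[-nb_elites:]]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-3  # noise floor (assumption)
    return mean


def toy_score(theta):
    """Toy objective (assumption): maximal when every parameter equals 1."""
    return -np.sum((theta - 1.0) ** 2)


best = cem(toy_score, dim=3)
```

With a population of 20 and an elite fraction of 0.2, the four best samples of each cycle define the next sampling distribution.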
pip install -r requirements.txt
- Compatibility between CEM and Bernoulli Policy.
- Fix plot axes and the problem with the last evaluation.
- Make a "CustomNetwork" class with dim of NN and policy type as arguments.
- Compatibility with all types of policy.
- Rebuild the comparison code so that it runs both algorithms independently and then builds the plots.
- Rethink compatibility with Vignettes.
- Make a better wrapper for Beta policy actions.
- Make sure everything is in English.
- Translate the results slides into English.