Lecture Module: Reinforcement Learning, Department of Computer Science, University College London. Supervisors: Prof. Hado Van Hasselt, Matteo Hessel, and Diana Borsa
All data was provided by UCL
-
Agent Implementation: Random, UCB, epsilon-greedy, and REINFORCE agents. Ran four different kinds of experiment and analysed the results.
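As a rough illustration of two of the agents listed above, the sketch below shows epsilon-greedy and UCB action selection for a multi-armed bandit, plus a sample-average value update. It is a minimal sketch under assumptions, not the coursework code: the function names, the tie-breaking rule, the exploration constant c, and the bonus form c * sqrt(log t / n) are all illustrative choices. The REINFORCE agent is left out.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a uniformly random arm with probability epsilon, else a greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties uniformly among the greedy arms (an assumed convention).
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def ucb(q_values, counts, t, c=2.0):
    """UCB-style selection: value estimate plus an exploration bonus.

    t is the current time step (t >= 1); c is an assumed exploration constant.
    """
    # Pull every arm once first, since the bonus is undefined for n == 0.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    scores = [q + c * math.sqrt(math.log(t) / n)
              for q, n in zip(q_values, counts)]
    return scores.index(max(scores))


def update_mean(q_values, counts, action, reward):
    """Incremental sample-average update of the chosen arm's estimate."""
    counts[action] += 1
    q_values[action] += (reward - q_values[action]) / counts[action]
```

With epsilon set to 0 the first function is purely greedy. With equal counts the UCB bonus is the same for every arm, so the arm with the highest estimate wins.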
-
Learning Algorithms for Sequential Decision Problems: Tabular Reinforcement Learning, TD Learning, Policy Iteration, Q-learning agents [General Q-learning, Sarsa, Expected Sarsa, Double Q-learning], and analysed each result.
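The tabular agents listed above differ mainly in the bootstrap target they use. The sketch below contrasts the Q-learning, Sarsa, and Expected Sarsa targets under assumed conventions: next_q is the list of action values at the next state, next_policy is a hypothetical list of action probabilities at that state, and the caller is assumed to pass zeros for next_q when the next state is terminal. The names are illustrative and not taken from the coursework.

```python
def q_learning_target(reward, next_q, gamma):
    """Off-policy target: bootstrap from the greedy next action."""
    return reward + gamma * max(next_q)


def sarsa_target(reward, next_q, next_action, gamma):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * next_q[next_action]


def expected_sarsa_target(reward, next_q, next_policy, gamma):
    """Bootstrap from the expected next value under next_policy."""
    expected = sum(p * q for p, q in zip(next_policy, next_q))
    return reward + gamma * expected


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a step of size alpha towards the target."""
    q[state][action] += alpha * (target - q[state][action])
```

For example, with next_q = [1.0, 3.0], reward 1.0 and gamma 0.5, the Q-learning target is 2.5, the Sarsa target for next action 0 is 1.5, and the Expected Sarsa target under a uniform policy is 2.0.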
-
Analysed Q-learning, Double Q-learning, and Target Q-learning
-
Off-policy Bellman Operators with Function Approximation, with an analysis of each result