Marcello Restelli (marcello.restelli

Detailed description of the topics

Reinforcement learning deals with solving sequential decision-making problems when little or no prior information is available.

Solving a sequential decision-making problem means finding its optimal control policy.

Using reinforcement-learning algorithms, the optimal policy is learned through the direct interaction between the agent (or controller) and the system to be controlled.
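
As a minimal sketch of learning through direct interaction, the following tabular Q-learning example (Q-learning is one of the algorithms listed under 2) below) learns a policy on a tiny simulated system. The 5-state chain environment, the function names (`step`, `q_learning`, `greedy_policy`) and all parameter values are assumptions made for illustration; they are not taken from the course material.

```python
import random

# Hypothetical 5-state chain: states 0..4, actions 0 (left) and 1 (right).
# Entering state 4 yields reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)
GOAL = N_STATES - 1


def step(state, action):
    """Simulated system: the agent observes only the next state and reward."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values from interaction alone (no model of `step` is used)."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore; ties broken at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next action, 0 at terminal states.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

In this toy setting the greedy action learned for states 0 to 3 is 1 (move right), which is the optimal behaviour for the chain. The entry for the terminal state 4 is never updated and carries no meaning.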

The course will introduce the main modeling frameworks, analyze the most relevant reinforcement-learning techniques, and finally show some interesting applications of these techniques to real-world domains.

1) Models

* Finite Markov Decision Processes

* Continuous Markov Decision Processes

* Partially Observable Markov Decision Processes

* Semi-Markov Decision Processes

* Markov Games

2) Algorithms

* Value Iteration-based algorithms (Q-learning, SARSA, TD(lambda))

* Policy Iteration-based algorithms (actor-critic methods, LSPI)

* Policy Search algorithms (policy gradient methods and stochastic search techniques)

* Exploration techniques (R-MAX, model-based Interval Estimation)

* Model-free vs Model-based algorithms

* Batch algorithms (Fitted Q-iteration)

* Function approximation in Reinforcement Learning algorithms

* Hierarchical Learning (options, HAMs, MAX-Q)

* Multi-Agent Learning techniques (basic elements)

3) Applications

* Autonomic Computing

* Robot Control

* Water Resources Management

* Portfolio Management

Exam

The course evaluation can take the form of an oral examination or a project on topics related to the course material.

Schedule

The course will start in March and will last for 5 weeks, with two two-hour classes per week.

Bibliography

Dimitri P. Bertsekas and John Tsitsiklis,

Richard S. Sutton and Andrew G. Barto,

Csaba Szepesvári,
