Sebi Wette

TUD Lecture on RL #2

Lecture 2: Markov Decision Processes

An agent learns by interacting with its "environment". How can we formalize this?

MDPs
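
As a rough illustration of the interaction being formalized, here is a minimal sketch of an agent acting in a tiny MDP. Everything in it is an assumption made for the example and not taken from the lecture: the two states `A` and `B`, the actions `stay` and `move`, the transition probabilities, the rewards, and the helper names `step`, `run_episode`, and `always_move`.

```python
import random

# Hypothetical two-state MDP, invented purely for illustration.
# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "A": {"stay": [(1.0, "A", 0.0)], "move": [(0.8, "B", 1.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(1.0, "B", 0.5)], "move": [(1.0, "A", 0.0)]},
}


def step(state, action, rng):
    """Sample (next_state, reward) given the current state and action."""
    outcomes = TRANSITIONS[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def run_episode(policy, start="A", horizon=5, seed=0):
    """Run the agent-environment loop for a fixed number of steps."""
    rng = random.Random(seed)
    state, trajectory = start, []
    for _ in range(horizon):
        action = policy(state)                    # agent picks an action
        next_state, reward = step(state, action, rng)  # environment responds
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def always_move(state):
    """A trivial deterministic policy: choose 'move' in every state."""
    return "move"


trajectory = run_episode(always_move)
```

Note that the next state and reward depend only on the current state and action, which is the Markov property that gives the MDP its name.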

Episodes

Returns
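
A minimal sketch of how a return could be computed from a reward sequence, assuming the common discounted definition in which the k-th future reward is weighted by `gamma**k`. The function name `discounted_return` and the discount factor `gamma` are illustrative assumptions, not notation taken from the lecture.

```python
def discounted_return(rewards, gamma):
    """Return sum over k of gamma**k * rewards[k].

    Accumulates backwards using G = r + gamma * G_next.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With `gamma = 1` this reduces to the plain sum of rewards, which is only well defined for episodes of finite length.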

Goals and rewards

Policies and value functions

Value function

Bellman Equations

Further reading and supplementary material

#RL-Lecture