Background
I'm currently working on the background chapter of my thesis and want to share its structure and a rough first draft here.
My governing principles:
The background should contain exactly what the reader needs to understand the central argument and methodology, and it should be clearly separated from related work. It should lay the conceptual foundation (i.e., definitions, formalism, mechanisms), while related work surveys "who did what" (ecosystem lineage, method families, existing benchmarks).
The goal is to lay the groundwork so that my later argument is only a short step (the point itself is made in the problem statement and methodology). I do not want to argue in the background, only to "foreshadow" (perhaps by signaling relevance with light forward-pointing references, e.g., "This factorization of environment change into transition and reward components will prove central to the modification taxonomy introduced in Chapter X").
TL;DR: state field facts neutrally, foreshadow their relevance with a pointer, but withhold the load-bearing claim.
What background is needed?
RL preliminaries
- MDP tuple, then the extension to POMDPs (Sutton & Barto; Puterman 1994; Kaelbling et al. 1998)
- Policy
- Value/return
- Objective
I will deliberately keep this lean, because I only need enough to define the terms and set up the distinction my whole taxonomy rests on: the MDP factorizes into a transition function and a reward function (Pan et al., CRL survey). Also worth a look: Feng, Huang, Zhang & Magliacane, "Factored Adaptation for Non-Stationary RL".
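A compact version of these definitions could look like the following. The notation (roughly Sutton & Barto style, with reward written as R(s, a) and an assumed start-state distribution μ) is my own choice for illustration, not fixed thesis notation; in the POMDP case the policy would condition on observations or histories rather than states.

```latex
\begin{align*}
  % MDP: transition function P and reward function R are separate components
  \mathcal{M} &= (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
    P(s' \mid s, a),\; R(s, a),\; \gamma \in [0, 1) \\
  % POMDP: adds an observation space and observation function; the agent sees o, not s
  \mathcal{M}_{\mathrm{PO}} &= (\mathcal{S}, \mathcal{A}, P, R, \Omega, O, \gamma), \qquad
    O(o \mid s', a) \\
  % Discounted return from time t
  G_t &= \textstyle\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \\
  % State value under a policy pi(a | s)
  V^{\pi}(s) &= \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right] \\
  % Objective, with assumed start-state distribution mu
  J(\pi) &= \mathbb{E}_{s_0 \sim \mu}\left[ V^{\pi}(s_0) \right]
\end{align*}
```

Keeping P and R as separate entries of the tuple is what later allows a change to the environment to be described by the component it touches: P, R, or both.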
Algorithm background (maybe "merge" into RL)
Since I'm using PPO, a short treatment of policy-gradient and actor-critic methods, and of PPO specifically (Schulman et al. 2017), would probably be good.
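The core of that treatment is PPO's clipped surrogate objective. A minimal NumPy sketch, assuming per-sample log-probabilities under the new and old policy and precomputed advantage estimates, might look like this; the function name `ppo_clip_objective` and its argument names are hypothetical, and the default `clip_eps=0.2` is just a commonly used value.

```python
import numpy as np


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective L^CLIP (Schulman et al. 2017), as a sketch.

    Returns the mean over samples of
        min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r = pi_new(a|s) / pi_old(a|s) is computed from log-probabilities.
    This quantity is maximized; a training loss would use its negative.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    ratio = np.exp(logp_new - logp_old)  # probability ratio r_t(theta)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: keep the smaller of the two terms for each sample
    return float(np.mean(np.minimum(unclipped, clipped)))
```

For example, a sample with ratio 1.5 and advantage 1.0 contributes 1.2 rather than 1.5, because the ratio is clipped at 1 + ε. This cap on the incentive to move the policy far from the old one is what the written treatment would need to explain.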
The CRL setting
- Abel et al. 2023 and Khetarpal et al. 2020: define the continual setting
- Non-stationary stream of tasks, agent that never stops learning
- The stability-plasticity trade-off and its two failure modes:
  - Catastrophic forgetting (losing performance on earlier tasks)
  - Loss of plasticity / primacy bias / dormant neurons (the network progressively losing the ability to learn new things)
- CRL method families (taxonomy only): regularization / replay / parameter isolation
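If loss of plasticity needs a concrete illustration, one common operationalization is the τ-dormant neuron score from Sokar et al. (2023, "The Dormant Neuron Phenomenon in Deep Reinforcement Learning"): a neuron's mean absolute activation, normalized by the layer-wide average, counts as dormant if it falls at or below a threshold τ. The sketch below assumes one layer's activations have already been collected on a batch of inputs; the function name `dormant_fraction` and the convention for an all-silent layer are my own choices.

```python
import numpy as np


def dormant_fraction(activations, tau=0.0):
    """Fraction of tau-dormant neurons in one layer (sketch).

    activations: array of shape (num_inputs, num_neurons) holding the layer's
    outputs on a batch of inputs. A neuron's score is its mean absolute
    activation divided by the layer average of that quantity; the neuron
    counts as dormant if its score is <= tau.
    """
    acts = np.abs(np.asarray(activations, dtype=float))
    per_neuron = acts.mean(axis=0)   # E_x |h_i(x)| for each neuron i
    layer_mean = per_neuron.mean()   # average over all neurons in the layer
    if layer_mean == 0.0:
        return 1.0                   # assumed convention: every neuron is silent
    scores = per_neuron / layer_mean
    return float(np.mean(scores <= tau))
```

Tracking this fraction across a task sequence is one way to make "progressively losing the ability to learn" measurable rather than purely descriptive.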
Atari as a testbed
- Conceptual justification (not the entire ecosystem)
- Why Atari/ALE is the standard RL benchmark and which properties make it suitable
Get in Touch
If you have any feedback, feel free to reach out at mail@sebastianwette.de. I would be happy to chat about it.