Background

18 Jun, 2026

I'm currently working on the background chapter for the thesis and wanted to share the structure and rough first draft of it here.

My governing principles:

The background should contain exactly what the reader needs to understand the central argument and methodology, and it should be clearly separated from related work. It should lay the conceptual foundation (i.e. definitions, formalism, mechanisms), while related work surveys "who did what" (ecosystem lineage, method families, existing benchmarks).

The goal is to lay the groundwork so that my later argument is only a short step away (the point itself gets made in the problem statement and methodology). I do not want to argue in the background, only to "foreshadow": signal relevance with light forward-pointing references, e.g. "This factorization of environment change into transition and reward components will prove central to the modification taxonomy introduced in Chapter X".

TLDR: state field facts neutrally, foreshadow their relevance with a pointer, but withhold the load-bearing claim.

What background is needed?

RL preliminaries

MDP tuple, extending to POMDP (Sutton & Barto; Puterman 1994; Kaelbling et al. 1998)
Policy
Value/return
Objective

I will deliberately keep this lean because I only need enough to define the terms and to set up the distinction my whole taxonomy rests on: an MDP factorizes into a transition function $P$ and a reward function $R$ (Pan et al., CRL survey). Also worth a look: Feng, Huang, Zhang & Magliacane, "Factored Adaptation for Non-Stationary RL".
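As a rough sketch of the notation this section would need, here is one standard way to write the pieces down. The symbols ($\mathcal{S}$, $\mathcal{A}$, $\gamma$, $\Omega$, $O$, $G_t$, $J$) follow common textbook usage and are my working assumption, not final thesis notation. The last line is only an illustration of how a change between two environments could be split into a transition part and a reward part.

$$
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad P(s' \mid s, a), \qquad R(s, a)
$$

$$
\text{POMDP:} \quad (\mathcal{S}, \mathcal{A}, P, R, \Omega, O, \gamma), \qquad O(o \mid s', a)
$$

$$
G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}, \qquad J(\pi) = \mathbb{E}_{\pi}\left[ G_0 \right]
$$

$$
\mathcal{M} \to \mathcal{M}': \quad P' \neq P \ \text{(transition change)} \quad \text{and/or} \quad R' \neq R \ \text{(reward change)}
$$

Lowercase $r_t$ denotes the sampled reward at step $t$, to keep it apart from the reward function $R$.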

Algorithm background (maybe "merge" into RL)

I'm using PPO, so a short treatment of policy-gradient and actor-critic methods, and of PPO specifically, would probably be good (Schulman et al. 2017).
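For reference, the core of PPO is the clipped surrogate objective from Schulman et al. 2017, $L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t\right)\right]$, where $r_t(\theta)$ is the probability ratio between the new and the old policy. The snippet below is a minimal plain-Python sketch of that one formula, not a training loop and not my actual implementation. The function name `ppo_clip_objective`, its argument names, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    All three are plain lists of floats of equal length (hypothetical inputs).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio into [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the objective stops rewarding ratio increases beyond $1+\epsilon$. With a negative advantage, the unclipped term is kept when it is the smaller one, so moving towards a bad action is still penalized in full.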

The CRL setting

Abel et al. 2023 and Khetarpal et al. 2020: define the continual setting
Non-stationary stream of tasks, agent that never stops learning
Stability-plasticity and the two failure modes:
- Catastrophic forgetting (losing old task performance)
- Loss of plasticity / primacy bias / dormant neurons (network progressively losing the ability to learn new things)
CRL method families (taxonomy only): regularization / replay / parameter isolation
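To make the "non-stationary stream of tasks" a bit more tangible, here is a toy sketch. It assumes tasks are tiny deterministic tabular MDPs that share states and actions, which is far simpler than anything in Atari. `Task`, `changed_components` and the three example tasks are hypothetical names made up for this illustration and are not taken from the cited papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A tiny deterministic tabular MDP (toy assumption).

    P[s][a] is the next state, R[s][a] is the reward.
    """
    name: str
    P: tuple
    R: tuple


def changed_components(prev, nxt):
    """Return which MDP components differ between two consecutive tasks."""
    changed = []
    if prev.P != nxt.P:
        changed.append("transition")
    if prev.R != nxt.R:
        changed.append("reward")
    return changed


# Two states, two actions; each later task modifies exactly one component.
base = Task("base", P=((0, 1), (1, 0)), R=((0.0, 1.0), (0.0, 0.0)))
reward_shift = Task("reward_shift", P=base.P, R=((1.0, 0.0), (0.0, 0.0)))
dynamics_shift = Task("dynamics_shift", P=((1, 0), (0, 1)), R=reward_shift.R)

# The agent would face these tasks one after another without a reset.
stream = [base, reward_shift, dynamics_shift]
switches = [changed_components(a, b) for a, b in zip(stream, stream[1:])]
```

Here `switches` records, for each task boundary in the stream, whether the transition function, the reward function, or both changed.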

Atari as a testbed

Conceptual justification (not the entire ecosystem)
Why Atari/ALE is the standard RL benchmark + what properties make it suitable

Get in Touch

If you have any feedback you want to share with me feel free to reach out at mail@sebastianwette.de. I would be more than happy to chat about it.

Sebi Wette