Related Work
After finishing the "Background" part, I am now focusing on the "Related Work" chapter. Again, I thought I might share the structure and a rough first draft.
Background
- "[...] Atari is employed in this thesis not as a subject of study, but as a controllable testbed in which appearance, dynamics, and rewards can be independently modified."
- Clean transition into related work.
Related Work
Atari Environment Ecosystem
- This subsection covers the framework lineage (who built what, and what did each add?), ending in JAXtari. Nothing on "why Atari" (already covered in Background).
- ALE as the "spine" (bellemare/machado)
- OCAtari for native OC-state (delfosse)
- HackAtari for modular modifications (delfosse)
- JAXtari (JAX-based Atari, modifications, OC-states ...)
- JAX-native env context (matthews, bonnet, radji, young) -> (simplified) Atari precedent
- Close by naming the inherited combination of JAXtari (OC, Mods, JAX)
Continual RL Methods
- This subsection surveys the method families and situates the four baselines as one representative each (three-family taxonomy already introduced in Background via De Lange, just reference briefly)
- Regularization: EWC (kirkpatrick) as baseline and e.g. Progress & Compress (schwarz) as another example
- Replay/Rehearsal: A-GEM (chaudhry) as baseline, GEM (lopez-paz) for lineage, and CLEAR (rolnick) as another prominent example
- Architecture: PackNet (mallya) as baseline, Progressive Nets (rusu) and Modular Composition (mendez) as other examples.
- Emerging family: world models for CRL (kessler)
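To make the regularization family concrete, here is a minimal sketch of the quadratic penalty that EWC adds to the new-task loss. It assumes flat parameter lists and a precomputed diagonal Fisher estimate; the function and argument names are hypothetical and not taken from any baseline implementation.

```python
from typing import Sequence


def ewc_penalty(
    params: Sequence[float],
    anchor_params: Sequence[float],
    fisher_diag: Sequence[float],
    lam: float,
) -> float:
    """Return (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    params:        current parameters (theta), flattened (assumption)
    anchor_params: parameters stored after the previous task (theta*)
    fisher_diag:   diagonal Fisher information estimate, one value per parameter
    lam:           penalty strength (hyperparameter)
    """
    if not (len(params) == len(anchor_params) == len(fisher_diag)):
        raise ValueError("all inputs must have the same length")
    # Parameters with high Fisher values are pulled back more strongly.
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )
```

The penalty is zero when the parameters have not moved from the anchor, and it grows fastest along the directions the Fisher estimate marks as important for the earlier task.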
Continual RL Benchmarks
- This subsection applies the confound critique + carves out the gap. Referenced benchmarks are "sorted": ones that are further away from this thesis -> ones that are closer.
- Cross-environment, conflate all axes:
- CORA (powers): Cross-game Atari. Configurable CRL platform with forgetting/transfer/retention metrics (position my standardized protocol relative to this).
- Continual World (wolczyk): robotics-focused. Forgetting is observable but not clearly attributable.
- JAX-native but different setting: MEAL (tomilin) -> multi-agent, Overcooked-layout variation (I can claim "only single-agent JAX-native")
- Single-axis / controlled shift / Intra-environment
- KAGE-Bench (cherepanov): "Pixel-based reinforcement learning agents often fail under purely visual distribution shift even when latent dynamics and rewards are unchanged, but existing benchmarks entangle multiple sources of shift and hinder systematic analysis" -> generalization and not CRL, but JAX native and isolates visual axis for a custom platformer environment
- TAPE (pan) isolates dynamics axis, but also only for OOD generalization benchmarking (cellular automata) -> Idea of isolated axis is in the zeitgeist
- ANOVA (rusu) -> ALE's built-in difficulty modes (single game), but benchmarks transfer and is not clearly single-axis
- COOM (tomilin): visual/scenario-axis sequence on Doom (intra-environment) -> "COOM presents a meticulously crafted suite of task sequences set within visually distinct 3D environments"
- CRL Maze (lomonaco): 3D maze, visually distinct task sequences also on Doom
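Since several of these benchmarks report forgetting, a small sketch of one common definition may help when positioning the protocol: peak performance on a task (at or after its own training phase) minus performance at the end of the sequence. This is an illustrative variant and not necessarily the formula used by CORA or any other benchmark listed here; the function name and the matrix layout are assumptions.

```python
from typing import List, Sequence


def forgetting_per_task(perf: Sequence[Sequence[float]]) -> List[float]:
    """Compute forgetting for every task except the last one.

    perf[i][j] is the evaluation score on task j after training on
    tasks 0..i in sequence (square matrix, assumed layout).
    Forgetting of task j = max over i in [j, n-2] of perf[i][j],
    minus perf[n-1][j].
    """
    n = len(perf)
    if n < 2 or any(len(row) != n for row in perf):
        raise ValueError("perf must be a square matrix with at least 2 tasks")
    final = perf[-1]
    return [
        max(perf[i][j] for i in range(j, n - 1)) - final[j]
        for j in range(n - 1)
    ]


# Hypothetical scores for a three-task sequence.
scores = [
    [10.0, 0.0, 0.0],
    [6.0, 8.0, 0.0],
    [4.0, 7.0, 9.0],
]
per_task = forgetting_per_task(scores)
average_forgetting = sum(per_task) / len(per_task)
```

With a single base game and one shift axis per sequence, the same matrix can be filled per shift type, which is what makes forgetting attributable rather than merely observable.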
The Gap
- Single-axis controlled shift is already in the air BUT only one axis at a time, framed as zero-shot generalization rather than sequential continual learning and not on Atari.
- No existing benchmark performs a controlled cross-type comparison of visual vs. dynamic vs. reward shift, under continual/forgetting evaluation, on a JAX-native single-base-game (Atari) substrate with pixel/OC parity