Sebi Wette

Some marginal notes on initial research

Reading List

While I found a lot of material to read, I will try to keep it as lean as possible to start off.

I also need to refresh and/or broaden my knowledge of RL in general (since I think it's a good starting point to assume that I know little, or that what I know is probably wrong). But at least for today, that's not the point. I will, however, start to revisit basic RL topics in a more systematic way soon.

Here is what I plan on reading, although I might change it spontaneously:


A Survey on Continual Reinforcement Learning

The paper starts off with an intro to RL and the problem that DRL agents struggle to transfer knowledge efficiently across tasks in order to adapt to new environments. Typically they need to be re-trained from scratch. There is, however, a lot of research on enabling these agents to do better in such scenarios and avoid catastrophic forgetting.

This is the field of Continual RL (also called lifelong or incremental learning).

Background

In this section the authors provide a fairly deep refresher on RL, starting with "what's an MDP?" and building from there. They then do the same for Continual Learning (CL = a paradigm in ML that focuses on incrementally updating models to adapt to new tasks while maintaining performance on previous tasks).
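To pin the refresher down for myself, here is the standard MDP setup the paper builds on (my own notation summary, not copied from the survey):

```latex
% An MDP is a tuple (S, A, P, R, \gamma):
%   S: state space, A: action space
%   P(s' \mid s, a): transition function
%   R(s, a): reward function, \gamma \in [0, 1): discount factor
% The agent seeks a policy \pi that maximizes the expected discounted return:
J(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right]
```

CRL then replaces the single fixed MDP with a sequence of (possibly changing) MDPs that the agent must adapt to without forgetting.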

The authors also mention three categories of strategies to overcome the main issues in CL (forgetting & transfer).

Overview of Research in CRL

  1. Lifelong Adaptation: agent is trained on a sequence of tasks and its performance is evaluated only on new tasks
  2. Non-Stationarity Learning: tasks in the sequence differ in their reward functions or transition functions (but share the same underlying logic)
  3. Task Incremental Learning: tasks in the sequence differ from one another in both reward function and transition function -> more distinct. Some even have different state and action spaces.
  4. Task Agnostic Learning: agent is trained on a sequence of tasks without full knowledge of task labels or identities
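To make the non-stationarity setting concrete for myself, here is a minimal toy sketch (all names are my own, not from the survey): a bandit whose transition logic stays fixed while the reward function changes from task to task.

```python
# Toy sketch of "non-stationarity learning": tasks share the same
# underlying logic, but the reward function changes between tasks.
class NonStationaryBandit:
    def __init__(self, reward_fns):
        self.reward_fns = reward_fns  # one reward function per task
        self.task_id = 0

    def step(self, action):
        # same "dynamics" across tasks; only the reward differs
        return self.reward_fns[self.task_id](action)

    def next_task(self):
        # advance the task sequence (the agent is NOT re-trained from scratch)
        self.task_id = (self.task_id + 1) % len(self.reward_fns)

env = NonStationaryBandit([lambda a: float(a == 0), lambda a: float(a == 1)])
r_task0 = env.step(0)   # action 0 is rewarded under task 0
env.next_task()
r_task1 = env.step(0)   # the same action is no longer rewarded under task 1
```

A continual learner would have to notice this shift and adapt its policy without wiping out what worked on task 0.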

Method Review

In this section the authors present a taxonomy of CRL Methods by asking "what knowledge is stored and/or transferred". They come up with four categories:

  1. Policy-focused (policy re-use; policy decomposition, e.g. Progressive NNs or hierarchical decomposition; policy merging, e.g. distillation, regularization, and EWC)
  2. Experience-focused (Direct Replay e.g. CLEAR, Generative Replay)
  3. Dynamics-focused (Related to Model-Based RL -> learn model of env's dynamics to predict future state/reward. Direct Modeling, Indirect Modeling)
  4. Reward-focused
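As a note to self on the regularization branch of the policy-focused category: the EWC idea is to penalize parameters for drifting away from the previous task's optimum, weighted by how important each parameter was (via a Fisher-information estimate). A minimal sketch of just the penalty term, with illustrative names:

```python
# Sketch of the EWC regularizer: L_total = L_new_task + penalty, where
# penalty = (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2
# theta_star: parameters after the previous task; fisher: per-parameter importance.
def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    return 0.5 * lam * sum(
        f * (t - ts) ** 2 for t, ts, f in zip(theta, theta_star, fisher)
    )

# A parameter the old task considered important (high Fisher value) is
# penalized more for moving than an unimportant one.
penalty = ewc_penalty(theta=[1.0, 2.0], theta_star=[0.0, 2.0],
                      fisher=[4.0, 0.1], lam=1.0)
```

Here only the first parameter moved, and it carries a high Fisher weight, so the penalty comes almost entirely from it.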

Later on they also write a bit more about other "emerging" research directions such as task detection.

Applications

A Definition of Continual Reinforcement Learning

They start by describing RL as "finding a solution" and contrast it with CRL as "endless adaptation". Further, they ask whether this endless adaptation might be a better way to model the RL problem. They go on to state that the community lacks a clean and general definition of CRL.

OC-Atari, HackAtari & Co.

The paper categorizes modifications into 4 paradigms:

  1. Visual Domain Adaptation (e.g. changing colors)
  2. Dynamics Adaptation (alter gameplay by introducing small perturbations)
  3. Curriculum Reinforcement Learning: task complexity is incrementally increased -> learn different skills in a structured manner
  4. Reward Signal Adaptation: modify the reward function in order to test the agent's ability to adapt to new objectives
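The reward signal adaptation paradigm can be pictured as a thin wrapper around an environment's step function that swaps in a new objective. This is just my own sketch of the idea; the wrapper and function names are not HackAtari's actual API.

```python
# Hedged sketch of "reward signal adaptation": intercept the environment's
# reward and replace it with an adapted objective, so we can test whether
# a trained agent adapts to the new goal.
class RewardAdaptationWrapper:
    def __init__(self, env, new_reward_fn):
        self.env = env
        self.new_reward_fn = new_reward_fn

    def step(self, action):
        obs, reward, done = self.env.step(action)
        # replace the original reward with the adapted objective
        return obs, self.new_reward_fn(obs, action, reward), done

class DummyEnv:
    def step(self, action):
        return 0, 1.0, False  # obs, reward, done

# e.g. invert the objective: what used to be rewarded is now punished
wrapped = RewardAdaptationWrapper(DummyEnv(), lambda obs, a, r: -r)
obs, reward, done = wrapped.step(0)
```

The same wrapper pattern would also cover the visual and dynamics paradigms by intercepting observations or transitions instead of rewards.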

Related work: CRL benchmarks -> the Atari benchmarks need to be expanded. HackAtari provides new applications such as benchmarking for generalization (and I would also add forgetting here).

CLEVA-Compass

The authors pick up on continual learning (and its problems). Since there is no agreed-upon formal definition "beyond the idea to continuously observe data [...]", things like reproducibility, interpretation of results, and comparability are an even bigger problem in CL.

With this paper (which is a reproducibility work) they try to promote transparency and comparability of reported results for the CL case -> the Continual Learning EValuation Assessment (CLEVA) Compass: a visual representation that offers an intuitive chart to identify a work's priorities and context in the broader literature landscape. It also enables a way to determine how methods differ in terms of reported metrics, where they resemble each other, and what elements would be missing for a fairer comparison.

Get in Touch

If you have any feedback you want to share with me feel free to reach out at mail@sebastianwette.de. I would be more than happy to chat about it.


#thesis