Johannes Ackermann

I am a fourth-year PhD student at the University of Tokyo, working on Reinforcement Learning under the supervision of Masashi Sugiyama, and a part-time researcher at RIKEN AIP.

I previously interned at Sakana AI and at Preferred Networks. Prior to starting my PhD, I worked on applied ML for Optical Communication at Huawei, obtained a B.Sc. and M.Sc. in Electrical Engineering and Information Technology from the Technical University of Munich, and wrote my Master’s Thesis at ETH Zurich’s Disco Group.

I am particularly interested in the nature of “tasks” in RL, here defined as the combination of transition dynamics and reward function:

  • Changing Tasks: How can we deal with tasks that change during dataset collection for Offline RL [RLC1], or with dynamics shift during deployment [RLC2]?

  • Structure of Tasks: In Multi-Task RL, all tasks are usually treated as equally (dis)similar. I investigated how to identify and use task relations by learning continuous task spaces [Thesis, Chapter 3] and task clusterings [ECML-PKDD1].

  • Task Specification: We showed that reward models learned from human preferences (RLHF) need off-policy corrections [COLM1]. I also (co-)investigated ways to aggregate rewards beyond the simple discounted sum, such as the min, max, range, or variance of the rewards [RLC3]; a short illustrative sketch follows below.
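
To make the reward-aggregation point concrete, here is a minimal toy sketch in plain NumPy (made-up reward values, not the recursive formulation from [RLC3]) comparing the usual discounted return with a few alternative aggregations of the same reward sequence:

```python
import numpy as np

# Toy reward sequence from a single episode (made-up numbers).
rewards = np.array([1.0, 0.0, 2.0, -1.0, 3.0])
gamma = 0.99

# Standard RL objective: the discounted sum of rewards.
discounted_return = np.sum(gamma ** np.arange(len(rewards)) * rewards)

# Alternative ways to aggregate the same reward sequence.
alternatives = {
    "min": rewards.min(),                    # worst step, risk-averse
    "max": rewards.max(),                    # best step, peak performance
    "range": rewards.max() - rewards.min(),  # spread between best and worst
    "variance": rewards.var(),               # variability of the rewards
}

print(f"discounted sum: {discounted_return:.3f}")
for name, value in alternatives.items():
    print(f"{name}: {value:.3f}")
```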

I’m always happy to chat about research, so feel free to reach out by e-mail or socials!

news

Jul 08, 2025 Off-Policy Corrected Reward Modeling for RLHF has been accepted at COLM 2025 :tada:
May 10, 2025 Two papers (1, 2) accepted at RLC 2025 :tada:
May 17, 2024 Our work on Offline Reinforcement Learning from Datasets with Structured Non-Stationarity was accepted at RLC 2024 :tada:

selected publications

  1. Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
    Johannes Ackermann, Takashi Ishida, and Masashi Sugiyama
    In Conference on Language Modeling (COLM) 2025, Oct 2025
  2. Recursive Reward Aggregation
    Yuting Tang, Yivan Zhang, Johannes Ackermann, Yu-Jie Zhang, Soichiro Nishimori, and Masashi Sugiyama
    In Reinforcement Learning Conference (RLC) 2025, Aug 2025
  3. Offline Reinforcement Learning with Domain-Unlabeled Data
    Soichiro Nishimori, Xin-Qiang Cai, Johannes Ackermann, and Masashi Sugiyama
    In Reinforcement Learning Conference (RLC) 2025, Aug 2025
  4. Offline Reinforcement Learning from Datasets with Structured Non-Stationarity
    Johannes Ackermann, Takayuki Osa, and Masashi Sugiyama
    In Reinforcement Learning Conference (RLC) 2024, Aug 2024
  5. High-Resolution Image Editing via Multi-Stage Blended Diffusion
    Johannes Ackermann and Minjun Li
    In NeurIPS Machine Learning for Creativity and Design Workshop 2022, Dec 2022
  6. Unsupervised Task Clustering for Multi-Task Reinforcement Learning
    Johannes Ackermann, Oliver Richter, and Roger Wattenhofer
    In ECML-PKDD 2021, Sep 2021
  7. Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics
    Johannes Ackermann, Volker Gabler, Takayuki Osa, and Masashi Sugiyama
    In Deep Reinforcement Learning Workshop at NeurIPS, Dec 2019