Johannes Ackermann

I am on the job market for postdoc or industry positions starting Fall 2026 or Spring 2027. Please reach out if you have a position matching my interests.

I am a final-year PhD student at the University of Tokyo, working on Reinforcement Learning under the supervision of Masashi Sugiyama, and a part-time researcher at RIKEN AIP.

I previously interned at Sakana AI and at Preferred Networks and worked on applied ML at Huawei. I obtained a B.Sc. and an M.Sc. in Electrical Engineering and Information Technology from the Technical University of Munich and wrote my Master’s Thesis at ETH Zurich’s Disco Group.

My main research interest is the nature of rewards in RL post-training.

  • Reward Hacking / Reward Specification in LLM post-training: Desired behaviors for LLMs are often complex to specify via rule-based rewards, so post-training often relies on reward models or other LLMs as judges. These are naturally imperfect, and we need to account for their imperfections in our training methods. In our recent publication [ICML1], we rephrased post-training as trying to maximize both reward and reward accuracy. We then showed that flatness is connected to reward accuracy, so gradient regularization can be used to preserve reward accuracy in both RLHF and RLVR tasks! We previously showed that reward models learned from human preferences need off-policy corrections during training [COLM1]. We also have two forthcoming papers about reward hacking: one through a robust optimization lens [ICMLWorkshop1] and one specifically for coding tasks through a Bayes error perspective [ICMLWorkshop2].

I have also pursued a few other directions over the years:

  • Changing or Structured Tasks: How can we deal with changing tasks during dataset collection for Offline RL [RLC1] or dynamics shift during deployment [RLC2]? In Multi-Task RL, all tasks are usually treated as equally (dis)similar. Can we instead identify and use task relations by learning continuous task spaces [Thesis, Chapter 3] or task clusterings [ECML-PKDD1]? I also (co-)investigated different ways to accumulate rewards beyond the simple discounted sum [RLC3], such as range, min, max, and variance.

  • Optimization Dynamics: I have a side interest in optimization dynamics, which led to an internship paper about black-box optimizers [ICML2] and also blends into my RL work [ICML1].

  • Applications: I am also interested in applications of general ML. I previously worked on applied ML for optical communication [ECOC] and applied diffusion models for high-resolution image generation [Workshop].

I’m always happy to chat about research, so feel free to reach out by e-mail or socials!

news

Jun 07, 2026 Two more workshop papers at ICML 2026! Mitigating Reward Hacking in RLHF via Advantage Sign Robustness and Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests! :tada:
May 01, 2026 Two papers accepted at ICML 2026! Gradient Regularization prevents Reward Hacking in RLHF/RLVR and Bridging Spherical Black Box Optimizers! :tada:
Jul 08, 2025 Off-Policy Corrected Reward Modeling for RLHF has been accepted at COLM 2025 :tada:

selected publications

  1. Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests
    Thanawat Lodkaew, Johannes Ackermann, Soichiro Nishimori, Nontawat Charoenphakdee, Masashi Sugiyama, and Takashi Ishida
    In ICML 2026 Workshop: Statistical Frameworks for Uncertainty in Agentic Systems, Jul 2026
  2. Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
    Johannes Ackermann, Michael Noukhovitch, Takashi Ishida, and Masashi Sugiyama
    In ICML 2026, Jul 2026
  3. Bridging Spherical Black-Box Optimizers
    Johannes Ackermann and Stefano Peluchetti
    In ICML 2026, Jul 2026
  4. Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
    Johannes Ackermann, Takashi Ishida, and Masashi Sugiyama
    In Conference on Language Modeling (COLM) 2025, Oct 2025
  5. Recursive Reward Aggregation
    Yuting Tang, Yivan Zhang, Johannes Ackermann, Yu-Jie Zhang, Soichiro Nishimori, and Masashi Sugiyama
    In Reinforcement Learning Conference (RLC) 2025, Aug 2025
  6. Offline Reinforcement Learning from Datasets with Structured Non-Stationarity
    Johannes Ackermann, Takayuki Osa, and Masashi Sugiyama
    In Reinforcement Learning Conference (RLC) 2024, Aug 2024
  7. Unsupervised Task Clustering for Multi-Task Reinforcement Learning
    Johannes Ackermann, Oliver Richter, and Roger Wattenhofer
    In ECML-PKDD 2021, Sep 2021
  8. Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics
    Johannes Ackermann, Volker Gabler, Takayuki Osa, and Masashi Sugiyama
    In Deep Reinforcement Learning Workshop at NeurIPS, Dec 2019