Notable publication: Reinforcement Learning vs Supervised Learning for Code Refactoring
In “Reinforcement Learning vs Supervised Learning: A Tug of War to Generate Refactored Code Accurately,” Indranil Palit and Tushar Sharma investigate how best to align code language models for automated refactoring. They propose a reinforcement learning–based approach that fine-tunes sequence-to-sequence generative models and aligns them using the Proximal Policy Optimization (PPO) algorithm.
The method uses successful code compilation and the presence of the desired refactoring in the generated code as reward signals, enabling the model to learn accurate extract-method refactorings for Java source code. The paper appears at EASE 2025 (Evaluation and Assessment in Software Engineering), a CORE A–ranked conference.
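A minimal sketch of how such a composite reward might be computed. The actual checks in the paper (invoking a Java compiler and detecting the extract-method refactoring) are replaced here with simple stand-in heuristics purely for illustration; the function names and scoring scheme are assumptions, not the authors' implementation.

```python
def braces_balanced(code: str) -> bool:
    """Stand-in for a real compilation check: verify brace nesting."""
    depth = 0
    for ch in code:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def compute_reward(generated_code: str) -> float:
    """Combine two binary signals into a scalar reward for PPO.

    `compiles` and `has_refactoring` are illustrative proxies for the
    paper's reward signals: code compilation and the presence of the
    desired extract-method refactoring in the output.
    """
    compiles = braces_balanced(generated_code)
    has_refactoring = "private " in generated_code  # proxy: a new helper method
    return float(compiles) + float(has_refactoring)
```

For example, a well-formed output containing an extracted helper method would score 2.0, while a truncated, non-compiling output with no new method would score 0.0.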
Read the paper: https://tusharma.in/preprints/EASE2025_RL_Refactoring.pdf