Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training
- Abstract: Reinforcement learning (RL) has become a central component of post-training large language models (LLMs), yet little is understood about how RL adaptation is distributed across transformer layers.
- Existing approaches typically update all model parameters uniformly, implicitly assuming that every layer contributes similarly to the gains obtained during RL post-training.
- In this work, we challenge this assumption through a systematic layer-wise study of RL training.
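The contrast above, between updating all parameters uniformly and concentrating RL updates in a single layer, comes down to deciding which parameters receive gradient updates. The sketch below is illustrative only and is not the authors' code. It assumes Hugging Face-style parameter names of the form `model.layers.<index>.<submodule>`, and the helper `trainable_mask` is hypothetical. It marks exactly one decoder layer as trainable and leaves everything else (embeddings, other layers, final norm, LM head) frozen.

```python
import re
from typing import Dict, List

# Assumed (hypothetical) naming convention: decoder blocks are named
# "model.layers.<index>.<submodule>...", as in many Hugging Face decoder models.
LAYER_PATTERN = re.compile(r"^model\.layers\.(\d+)\.")


def trainable_mask(param_names: List[str], layer_index: int) -> Dict[str, bool]:
    """Map each parameter name to True only if it belongs to decoder layer `layer_index`."""
    mask: Dict[str, bool] = {}
    for name in param_names:
        match = LAYER_PATTERN.match(name)
        # Parameters outside any decoder block (embeddings, final norm, LM head)
        # never match the pattern, so they stay frozen.
        mask[name] = match is not None and int(match.group(1)) == layer_index
    return mask


if __name__ == "__main__":
    names = [
        "model.embed_tokens.weight",
        "model.layers.0.self_attn.q_proj.weight",
        "model.layers.1.mlp.up_proj.weight",
        "model.layers.11.mlp.up_proj.weight",
        "lm_head.weight",
    ]
    for name, trainable in trainable_mask(names, layer_index=1).items():
        print(f"{name}: {'train' if trainable else 'freeze'}")
```

Under this assumption, full-parameter RL corresponds to marking every name as trainable. In PyTorch, a mask like this could be applied by iterating over `model.named_parameters()` and setting each parameter's `requires_grad` to the mask value, so that only the selected layer is updated by the optimizer.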
Sources: arXiv