WALAR - a lyf07 Collection

WALAR

Updated Mar 17

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

lyf07/LLaMAX3-8B-Alpaca-WALAR

Translation • 8B • Updated Mar 21 • 3

lyf07/Qwen3-8B-WALAR

Translation • 8B • Updated Mar 21 • 4

lyf07/Translategemma-4B-it-WALAR

Translation • 769k • Updated Mar 21 • 104 • 3

Mending the Holes: Mitigating Reward Hacking in Reinforcement Learning for Multilingual Translation

Paper • 2603.13045 • Published Mar 13 • 2