On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 192
Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b Viewer • Updated Jan 31 • 306k • 2.23k • 320