TeichAI/LFM2.5-1.2B-Thinking-Pony-Alpha-Distill Text Generation • 1B • Updated about 21 hours ago • 59 • 2
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models Paper • 2602.04649 • Published 9 days ago • 11