AnIdealRing

SmartDazi

AI & ML interests

None yet

Recent Activity

upvoted a paper 11 days ago

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

upvoted a paper 14 days ago

Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models

upvoted a paper 21 days ago

RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents

Organizations

SmartDazi's datasets

None public yet

SmartDazi (AnIdealRing)
