Taeho Hwang

doubleyyh · ThisIsHwang

AI & ML interests

None yet

Recent Activity

reacted to sergiopaniego's post with 🚀 about 23 hours ago

TRL v0.27.0 is out!! 🥳 It includes GDPO, the latest variant of GRPO for multi-reward RL ✨ GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence — developed by @sliuau @SimonX et al. Explore the paper: https://huggingface.co/papers/2601.05242 Explore the full set of changes here: https://github.com/huggingface/trl/releases/tag/v0.27.0

liked a Space 12 days ago

SamsungResearch/TRUEBench

upvoted a paper 3 months ago

Adaptive Multi-Agent Response Refinement in Conversational Systems

Organizations

New activity in nvidia/Llama-Nemotron-Post-Training-Dataset 7 months ago

Regarding Instruction Following SFT dataset.

#22 opened 7 months ago by

doubleyyh (Taeho Hwang) – Community Activity
