Taeho Hwang
doubleyyh
AI & ML interests
None yet
Recent Activity
Reacted to sergiopaniego's post about 23 hours ago
TRL v0.27.0 is out!! 🥳
It includes GDPO, the latest variant of GRPO for multi-reward RL ✨
GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence. Developed by @sliuau, @SimonX, et al.
Explore the paper: https://huggingface.co/papers/2601.05242
Explore the full set of changes here:
https://github.com/huggingface/trl/releases/tag/v0.27.0
Liked a Space 12 days ago
SamsungResearch/TRUEBench
Upvoted a paper 3 months ago
Adaptive Multi-Agent Response Refinement in Conversational Systems