Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR Paper • 2602.05261 • Published 17 days ago • 49
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published Dec 18, 2025 • 219
Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding Paper • 2512.17532 • Published Dec 19, 2025 • 67
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 Text Generation • 32B • Updated 1 day ago • 977k • 641
Post 2446 NEW: @mistralai released a fantastic family of multimodal models, Ministral 3. You can fine-tune them for free on Colab using TRL ⚡️, with support for both SFT and GRPO. Links to the notebooks:
- SFT: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_ministral3_vl.ipynb
- GRPO: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_ministral3_vl.ipynb
- TRL and more examples: https://huggingface.co/docs/trl/index
2 replies · 🔥 8