Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper
• 2411.03562
• Published
• 69
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
Paper
• 2502.06060
• Published
• 38
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper
• 2502.14499
• Published
• 194
SurveyX: Academic Survey Automation via Large Language Models
Paper
• 2502.14776
• Published
• 100
Why Do Multi-Agent LLM Systems Fail?
Paper
• 2503.13657
• Published
• 48
Scaling Test-time Compute for LLM Agents
Paper
• 2506.12928
• Published
• 63
AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
Paper
• 2507.08616
• Published
• 15
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Paper
• 2507.09477
• Published
• 88
Agentic Reinforced Policy Optimization
Paper
• 2507.19849
• Published
• 158
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper
• 2508.03680
• Published
• 136
Efficient Agents: Building Effective Agents While Reducing Cost
Paper
• 2508.02694
• Published
• 86
WideSearch: Benchmarking Agentic Broad Info-Seeking
Paper
• 2508.07999
• Published
• 110
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
Paper
• 2508.05748
• Published
• 141
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper
• 2508.13167
• Published
• 129
Provable Benefits of In-Tool Learning for Large Language Models
Paper
• 2508.20755
• Published
• 11
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
• 2509.02547
• Published
• 230
GEM: A Gym for Agentic LLMs
Paper
• 2510.01051
• Published
• 90
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
Paper
• 2509.26354
• Published
• 18
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper
• 2510.05592
• Published
• 107
Multi-Agent Tool-Integrated Policy Optimization
Paper
• 2510.04678
• Published
• 31
Don't Just Fine-tune the Agent, Tune the Environment
Paper
• 2510.10197
• Published
• 30
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
Paper
• 2510.09577
• Published
• 8
Agentic Entropy-Balanced Policy Optimization
Paper
• 2510.14545
• Published
• 106
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
Paper
• 2510.18821
• Published
• 18
AgentFold: Long-Horizon Web Agents with Proactive Context Management
Paper
• 2510.24699
• Published
• 71
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
Paper
• 2511.11373
• Published
• 14
Latent Collaboration in Multi-Agent Systems
Paper
• 2511.20639
• Published
• 121
Agentic Learner with Grow-and-Refine Multimodal Semantic Memory
Paper
• 2511.21678
• Published
• 12
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
Paper
• 2512.04324
• Published
• 154
DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems
Paper
• 2512.06749
• Published
• 28
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
Paper
• 2512.17008
• Published
• 11
Nested Browser-Use Learning for Agentic Information Seeking
Paper
• 2512.23647
• Published
• 19
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards
Paper
• 2601.06021
• Published
• 47
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents
Paper
• 2601.07264
• Published
• 24
MAXS: Meta-Adaptive Exploration with LLM Agents
Paper
• 2601.09259
• Published
• 95
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development
Paper
• 2601.11077
• Published
• 65
LLM-in-Sandbox Elicits General Agentic Intelligence
Paper
• 2601.16206
• Published
• 84
Behavior Knowledge Merge in Reinforced Agentic Models
Paper
• 2601.13572
• Published
• 24
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
Paper
• 2601.15876
• Published
• 90
DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents
Paper
• 2601.20975
• Published
• 9
AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
Paper
• 2602.03786
• Published
• 85
LatentMem: Customizing Latent Memory for Multi-Agent Systems
Paper
• 2602.03036
• Published
• 14
Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems
Paper
• 2602.08847
• Published
• 26
Multi-agent cooperation through in-context co-player inference
Paper
• 2602.16301
• Published
• 15
Towards a Science of AI Agent Reliability
Paper
• 2602.16666
• Published
• 12
ResearchGym: Evaluating Language Model Agents on Real-World AI Research
Paper
• 2602.15112
• Published
• 18
Discovering Multiagent Learning Algorithms with Large Language Models
Paper
• 2602.16928
• Published
• 11