Article Community Evals: Because we're done trusting black-box leaderboards over the community +5 3 days ago • 32
Open ASR Leaderboard 🏆 1.21k Explore speech recognition model benchmarks and request evaluations
Article Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks +2 Nov 21, 2025 • 25
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints Paper • 2601.18137 • Published 12 days ago • 25