Building on HF

Nathan Habib PRO

SaylorTwift

huggingface

AI & ML interests

Evals

Recent Activity

upvoted an article 1 day ago

Article

Community Evals: Because we're done trusting black-box leaderboards over the community

+5

3 days ago

•

32

New activity in MathArena/aime_2025 2 days ago

adds-evalyaml

#2 opened 9 days ago by

published an article 3 days ago

Article

Community Evals: Because we're done trusting black-box leaderboards over the community

+5

3 days ago

•

32

New activity in MathArena/hmmt_nov_2025 4 days ago

adds-evalyaml

#1 opened 8 days ago by

Upload eval.yaml

#3 opened 4 days ago by

updated a Space 4 days ago

Aime 25

View and analyze log files with interactive inspection tools

published a Space 4 days ago

Aime 25

View and analyze log files with interactive inspection tools

liked a Space 7 days ago

Open ASR Leaderboard

Explore speech recognition model benchmarks and request evaluations

upvoted an article 7 days ago

Article

Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks

+2

Nov 21, 2025

•

25

liked 2 datasets 7 days ago

speechcolab/gigaspeech

Updated Nov 23, 2023 • 5.47k • 145

facebook/voxpopuli

Viewer • Updated 7 days ago • 1.26M • 7.4k • 142

liked a model 9 days ago

Qwen/Qwen2.5-7B-Instruct

Text Generation • 8B • Updated Jan 12, 2025 • 10.8M • 1.06k

liked a dataset 10 days ago

Cloudriver/PhyX

Viewer • Updated Dec 22, 2025 • 17k • 1.82k • 24

liked a model 10 days ago

moonshotai/Kimi-K2.5

Image-Text-to-Text • 171B • Updated 1 day ago • 274k • 1.78k

liked 4 datasets 10 days ago

allenai/ai2_arc

Viewer • Updated Dec 21, 2023 • 7.79k • 276k • 313

nyuuzyou/google-code-archive

Viewer • Updated 5 days ago • 65.8M • 1.54k • 68

mercor/apex-agents

Updated about 3 hours ago • 14.9k • 85

Anthropic/EconomicIndex

Updated 22 days ago • 10.5k • 448

upvoted a paper 10 days ago

DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints

Paper • 2601.18137 • Published 12 days ago • 25

liked a dataset 10 days ago

LEXam-Benchmark/LEXam

Viewer • Updated 11 days ago • 7.54k • 535 • 39
