Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines? Paper • 2602.14111 • Published 3 days ago • 49
Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines? Paper • 2602.14111 • Published 3 days ago • 49
OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features Paper • 2509.22033 • Published Sep 26, 2025 • 19
The Rogue Scalpel: Activation Steering Compromises LLM Safety Paper • 2509.22067 • Published Sep 26, 2025 • 28
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Paper • 2503.18878 • Published Mar 24, 2025 • 119