AI & ML interests

Principled evaluation of mechanistic interpretability methods.

mib-bench (Mechanistic Interpretability Benchmark)
