AI & ML interests

Probing, contrast-consistent search, inference-time intervention, truthfulness, deception, mechanistic interpretability, RLHF

Decept's models

None public yet
Decept (Truthfulness & Deception Research Team)