I gave a short presentation on Anthropic’s work “Towards Monosemanticity: Decomposing Language Models With Dictionary Learning.”

Slides can be accessed here.