Explained: Towards Monosemanticity in LLMs
I gave a short presentation on Anthropic’s work Towards Monosemanticity: Decomposing Language Models With Dictionary Learning.
Slides can be accessed here.