GitHub

Code repository

Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
Python