WangResearchLab/SteeringSafety
A benchmark for evaluating effectiveness and entanglement in representation steering across seven safety-relevant perspectives