Ongoing work with high school students I mentor on mitigating LLM over-refusal with fine-grained refusal tokens has been accepted to the NeurIPS 2025 Mechanistic Interpretability Workshop (≈300 submissions)!