If we harden the models to ignore emotional framing entirely, we kill the very empathy that makes LLMs useful. If we leave them soft, they remain vulnerable to manipulation.
This is the . The model isn't refusing to answer; it is refusing to adopt a specific voice. It defaults to a corporate-safe average, effectively sanding down the jagged edges that make writing interesting. tonal jailbreak
: Embedding a request within a specific tone—such as compassionate, fearful, or curious—can shift the model's perceived intent, making it more likely to provide restricted information. If we harden the models to ignore emotional
"The
: Research into Audio Language Models (ALMs) shows that the literal "tone of voice" in audio inputs can be manipulated to conduct "audio-originated" jailbreak attacks. The model isn't refusing to answer; it is
But as Large Language Models (LLMs) become more sophisticated, a new, more subtle vulnerability has emerged. It doesn’t rely on role-playing tricks (like the famous "DAN" prompt) or obfuscated code. Instead, it relies on .
The Tonal Jailbreak: Explaining the Linguistic Strategy to Bypass AI Safety