● builder: System prompts that establish compliant personas may structurally undermine refusal behavior in deployed 7–8B models, warranting review of persona framing in production prompts.
● researcher: This reframes refusal as a two-stage mechanism gated by persona, requiring multi-direction intervention models rather than single refusal-direction ablations.
● policy: Persona-based jailbreaks are mechanistically grounded here; safety evaluations that test refusal in isolation underestimate real-world bypass risk.