Study Reveals Poetic Prompts Can Bypass AI Safety; Tricking Major Chatbots Into Harmful Replies – Mashable India

Artificial intelligence (AI) chatbots are trained to provide helpful information while blocking harmful or dangerous content. Most major AI systems refuse to answer prompts involving cyberattacks, weapons, manipulation, or any activity that violates safety standards. However, new research reveals an unexpected loophole: simply rewriting the request in poetic form may be enough to bypass these safeguards.
A study conducted by Icaro Lab, a joint initiative between Sapienza University of Rome and the DexAI think tank, examined whether creative, lyrical prompts could break through safety filters built into large language models (LLMs). The findings were alarming. By reshaping malicious prompts into poems, researchers were able to trick all 25 chatbots tested, including those from Google, OpenAI, Anthropic, Meta, and xAI.
According to the study, poetic phrasing enabled researchers to obtain harmful responses 62% of the time on average. Some of the most advanced AI models responded incorrectly as much as 90% of the time, indicating that even cutting-edge safety systems are vulnerable to this form of bypass.
AI poetry exploit exposes safety laws
Poetry can bypass AI safety guardrails: Study @samikshaa3 brings you more
62% success rate in jailbreaking 25 AI models pic.twitter.com/jXqpsBZHad— WION (@WIONews) December 1, 2025
The prompts used in the tests covered categories such as cybercrime, harmful persuasion, and CBRN-related concerns (Chemical, Biological, Radiological and Nuclear). When written directly and clearly, models typically blocked them. But when rewritten metaphorically, using symbolism, rhythmic language, and abstract imagery, many chatbots failed to interpret the underlying harmful intent.
The researchers found a 34.99% higher chance of eliciting unsafe responses when prompts were written poetically instead of in plain language.
The core reason lies in how safety systems currently work. Most LLM safety mechanisms rely on detecting specific keywords, patterns, and sentence structures commonly associated with harmful content. Poetic language disrupts these patterns. Metaphors, fragmented syntax, unusual word order, and artistic ambiguity make it harder for the system to recognise intent.
As the study explains, chatbots may interpret a poetic request as a harmless creative exercise rather than a dangerous inquiry. As a result, the model may generate information it would normally restrict.
This highlights a critical flaw: AI models are not yet good at understanding the deeper purpose behind creative phrasing. When safety checks focus mainly on surface-level text patterns, users can disguise malicious intent simply by adopting an artistic style.
The researchers did not publicly release the full set of prompts they used, citing safety reasons. However, their findings underline the urgent need for more robust safeguards, ones capable of analysing intent, not just wording.
As AI systems continue to evolve, the study warns that overlooking creative language could leave models open to manipulation, raising important questions about the future of AI safety.
SEE ALSO: Google’s Nano Banana Pro Shocks Users As Viral Comparison Shows AI Images Nearly Identical To Real Photos