Malicious Actors Exploiting AI Chatbot Jailbreaking Tips – Security Boulevard

The Home of the Security Bloggers Network
Home » Security Boulevard (Original) » Malicious Actors Exploiting AI Chatbot Jailbreaking Tips
Malicious actors are joining forces—or at least sharing trade secrets—about how to jailbreak AI chatbots like ChatGPT. In other words, they’re acting in concert to get around the ethical and safety guardrails companies have imposed on the fast-developing technology.
“While AI jailbreaking is still in its experimental phase, it allows for the creation of uncensored content without much consideration for the potential consequences,” researchers from SlashNext, who discovered the activity, wrote in a blog post.
“Jailbreaks take advantage of weaknesses in the chatbot’s prompting system,” the researchers explained, adding that “users issue specific commands that trigger an unrestricted mode, causing the AI to disregard its built-in safety measures and guidelines.”
As a result, a chatbot is able to respond without the burden of restrictions on output.
Callie Guenther, cyber threat researcher senior manager at Critical Start, said that recent advancements in AI are groundbreaking, and “while AI jailbreaking is still somewhat nascent, its potential applications—and the concerns they raise—are vast because they allow for content generation with little oversight, which can be particularly alarming when considered in the context of the cyberthreat landscape.”
Among the jailbreak prompts used are straightforward commands and abstract narratives “designed to coax the chatbot into bypassing its constraint,” the SlashNext researchers wrote. “The overall goal is to find specific language that convinces the AI to unleash its full, uncensored potential.”
More recently, SlashNext has observed a proliferation of online communities where members are exploring the potential of AI and testing chatbot boundaries. “Members in these communities exchange jailbreaking tactics, strategies, and prompts to gain unrestricted access to chatbot capabilities,” the researchers said. “These communities foster collaboration among users who are eager to expand the limits of AI through shared experimentation and lessons learned.”
They point to the “Anarchy” method of jailbreaking as an example. In this case, a commanding tone is used “to trigger an unrestricted mode in AI chatbots, specifically targeting ChatGPT.”
When input commands that challenge the chatbot’s limitations are imposed, “users can witness its unhinged abilities firsthand,” they said, detailing “an example of a jailbroken session that offers insights into enhancing the effectiveness of a phishing email and augmenting its persuasiveness.”
Not surprisingly, this genuine interest in pushing the boundaries of AI and chatbots has sparked the interest of cybercriminals keen on building malicious AI tools, which they advertise on cybercrime forums as leveraging unique large language models (LLMs).
“The trend began with a tool called WormGPT, which claimed to employ a custom LLM. Subsequently, other variations emerged, such as EscapeGPT, BadGPT, DarkGPT and Black Hat GPT,” SlashNext noted.
“One of the largest concerns with these prompt-based LLMs (especially publicly available and open source LLMs) is securing against prompt injection vulnerabilities and attacks, similar to the security problems previously faced with SQL-based injections,” said Nicole Carignan, vice president of strategic cyber AI at Darktrace.
“A threat actor can take control of the LLM and force it to produce malicious outputs because of the implicit confusion between the control and data planes in LLMs. By crafting a prompt that can manipulate the LLM to use its prompt as an instruction set, the actor can control the LLM’s response,” she said. “It’s no surprise that threat actors have figured out how to profit from this by offering anonymous interfaces to jailbroken LLMs. This is just one example of how generative AI is upskilling the more novice threat actors.”
But SlashNext’s observations have led researchers to believe “that the majority of these tools do not genuinely utilize custom LLMs, with the exception of WormGPT.”
Rather, the tools “use interfaces that connect to jailbroken versions of public chatbots like ChatGPT, disguised through a wrapper,” they said. “In essence, cybercriminals exploit jailbroken versions of publicly accessible language models like OpenGPT, falsely presenting them as custom LLMs.
“The rise of communities seeking to exploit new technologies isn’t novel,” Guenther pointed out. “With every significant technological leap—whether it was the introduction of smartphones, personal computers or even the internet itself—there have always been both enthusiasts seeking to maximize potential and malicious actors looking for vulnerabilities to exploit.”
But she is concerned with the rise in cybercriminal activity around AI jailbreaking. “Malicious actors are not only devising tools that act as interfaces to jailbroken versions of popular chatbots, but they are also marketing them as unique, custom-built language models,” she said. “In most cases, as our research indicates, these are not custom models but repurposed, jailbroken iterations of platforms like ChatGPT. The primary allure for cybercriminals? Anonymity. Through these interfaces, they can harness AI’s expansive capabilities for illicit purposes, all while remaining undetected.”
But Shawn Surber, senior director of technical account management at Tanium, believed cybercriminals aren’t gaining traction with these actions yet. “I’m not seeing much evidence that it’s really making a significant difference. While there are certainly advantages to non-native speakers in crafting better phishing text or for inexperienced coders to hack together malware more quickly, there’s nothing indicating that professional cybercriminals are gaining any advantage from AI,” he said. “It feels like Black Friday on the dark web. The sellers are all hyping their product to buyers who aren’t doing their own research. ‘Caveat Emptor’ apparently still has meaning, even in the modern malware marketplace.”
When Surber first started reading about “jailbroken” LLMs, he said he was “far more worried that we’d be hearing about malicious actors compromising the AI-driven chatbots that are becoming ubiquitous on legitimate websites,” said Surber.  “To me, that’s a far greater hazard to the common consumer than a phishing email with better grammar.”
“But that doesn’t mean GPT-style AIs aren’t a threat. Rather, we haven’t yet figured out exactly what that threat will be,” he said. “The advantage to the defenders is that with all of this hyper-focus, we’re all looking carefully into the future of AI in cybersecurity and hopefully closing the more serious vulnerabilities before they’re ever exploited.”
No doubt cybercriminals will continue their assault as ChatGPT and other AI systems continue to rise, but SlashNext researchers said some of the concerns can be mitigated if companies focus on responsible innovation and enhance safeguards around AI and its use.
“Organizations like OpenAI are already taking proactive measures to enhance the security of their chatbots,” they said. “They conduct red team exercises to identify vulnerabilities, enforce access controls and diligently monitor for malicious activity.”
The ultimate goal for the organizations “is to develop chatbots that can resist attempts to compromise their safety while continuing to provide valuable services to users.”
Guenther said defenders have two options. “First, they can assist in research on how to secure LLMs from prompt-based injection and share those learnings with the community,” she said. “It has been great to see the security community and the public sector already engaging in this through public events like the AI red teaming exercise at DEF CON.”
And “they can use AI to defend at scale against more sophisticated social engineering attacks,” she added. “It will take a growing arsenal of defensive AI to effectively protect systems in the age of offensive AI, and we are already making significant progress on this front.”
From the time she was 10 years old and her father gave her an electric typewriter for Christmas, Teri Robinson knew she wanted to be a writer. What she didn’t know is how the path from graduate school at LSU, where she earned a Masters degree in Journalism, would lead her on a decades-long journey from her native Louisiana to Washington, D.C. and eventually to New York City where she established a thriving practice as a writer, editor, content specialist and consultant, covering cybersecurity, business and technology, finance, regulatory, policy and customer service, among other topics; contributed to a book on the first year of motherhood; penned award-winning screenplays; and filmed a series of short movies. Most recently, as the executive editor of SC Media, Teri helped transform a 30-year-old, well-respected brand into a digital powerhouse that delivers thought leadership, high-impact journalism and the most relevant, actionable information to an audience of cybersecurity professionals, policymakers and practitioners.
teri-robinson has 195 posts and counting.See all posts by teri-robinson
More Webinars
Security Boulevard Logo White