OpenAI says its AI browser may never be fully secure from hackers – Fortune

OpenAI has said that some attack methods against AI browsers like ChatGPT Atlas are likely here to stay, raising questions about whether AI agents can ever safely operate across the open web.
The main issue is a type of attack called “prompt injection,” where hackers hide malicious instructions in websites, documents, or emails that can trick the AI agent into doing something harmful. For example, an attacker could embed hidden commands in a webpage—perhaps in text that is invisible to the human eye but looks legitimate to an AI—that override a user’s instructions and tell an agent to share a user’s emails, or drain someone’s bank account.
Following the launch of OpenAI’s ChatGPT Atlas browser in October, security researchers were quick to demonstrate how a few words hidden in a Google Doc or clipboard link could manipulate the AI agent’s behavior. Cybersecurity firm Brave, also published findings showing that indirect prompt injection is a systematic challenge affecting multiple AI-powered browsers, including Perplexity’s Comet.
“Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully ‘solved,'” OpenAI wrote in a blog post Monday, adding that “agent mode” in ChatGPT Atlas “expands the security threat surface.”
“We’re optimistic that a proactive, highly responsive rapid response loop can continue to materially reduce real-world risk over time,” the company said.
OpenAI’s approach to the problem is to use an AI-powered attacker of its own—essentially a bot trained through reinforcement learning to act like a hacker seeking ways to sneak malicious instructions to AI agents. The bot can test attacks in simulation, observe how the target AI would respond, then refine its approach and try again repeatedly.
“Our [reinforcement learning]-trained attacker can steer an agent into executing sophisticated, long-horizon harmful workflows that unfold over tens (or even hundreds) of steps,” OpenAI wrote. “We also observed novel attack strategies that did not appear in our human red teaming campaign or external reports.”
However, some cybersecurity experts are skeptical that OpenAI’s approach can address the fundamental problem.
“What concerns me is that we’re trying to retrofit one of the most security-sensitive pieces of consumer software with a technology that’s still probabilistic, opaque, and easy to steer in subtle ways,” Charlie Eriksen, a security researcher at Aikido Security, told Fortune.
“Red-teaming and AI-based vulnerability hunting can catch obvious failures, but they don’t change the underlying dynamic. Until we have much clearer boundaries around what these systems are allowed to do and whose instructions they should listen to, it’s reasonable to be skeptical that the tradeoff makes sense for everyday users right now,” he said. “I think prompt injection will remain a long-term problem … You could even argue that this is a feature, not a bug.”
Security researchers also previously told Fortune that while a lot of cybersecurity risks were essentially a continuous cat-and-mouse game, the deep access that AI agents need—such as users’ passwords and permission to take actions on a user’s behalf—posed such a vulnerable threat opportunity it was unclear if their advantages were worth the risk.
George Chalhoub, assistant professor at UCL Interaction Centre, said that the risk is severe because prompt injection “collapses the boundary between the data and the instructions,” potentially turning an AI agent “from a helpful tool to a potential attack vector against the user” that could extract emails, steal personal data, or access passwords.
“That’s what makes AI browsers fundamentally risky,” Eriksen said. “We’re delegating authority to a system that wasn’t designed with strong isolation or a clear permission model. Traditional browsers treat the web as untrusted by default. Agentic browsers blur that line by allowing content to shape behavior, not just be displayed.”
The U.K.’s National Cyber Security Centre has also warned that prompt injection attacks against generative AI systems are a long‑term issue that may never be fully eliminated. Instead of assuming these attacks can be completely stopped, the agency advises security teams to design systems so that the damage from a successful prompt injection is limited, and to focus on reducing both the likelihood and impact of data exposure or other harmful outcomes.
OpenAI recommends users give agents specific instructions rather than providing broad access with vague directions like “take whatever action is needed.” The company also said Atlas is trained to get user confirmation before sending messages or making payments.
“Wide latitude makes it easier for hidden or malicious content to influence the agent, even when safeguards are in place,” OpenAI said in the blogpost.
Beatrice Nolan is a tech reporter on Fortune’s AI team, covering artificial intelligence and emerging technologies and their impact on work, industry, and culture. She's based in Fortune's London office and holds a bachelor’s degree in English from the University of York. You can reach her securely via Signal at beatricenolan.08
© 2025 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries. FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.