AI chatbots can be persuaded to break rules using basic psych tricks
A new study from researchers at the University of Pennsylvania shows that AI models can be persuaded to break their own rules using several classic psychological tricks, The Verge reports.
In the study, the Penn researchers tested seven persuasion techniques on OpenAI’s GPT-4o mini model: authority, commitment, liking, reciprocity, scarcity, social proof, and unity.
The most effective technique turned out to be commitment. By first getting the model to comply with a seemingly innocuous request, the researchers could then escalate to responses that broke its rules. In one example, the model first agreed to use mild insults and went on to accept harsher ones.
Techniques such as flattery and peer pressure also had an effect, albeit a weaker one; even so, they measurably increased the likelihood of the model giving in to forbidden requests.
This article originally appeared on our sister publication PC för Alla and was translated and localized from Swedish.