AI chatbots like ChatGPT and Gemini will agree with you even when you’re wrong, new study finds

Most people these days interact with one chatbot or another, depending on their preferences. But no matter which chatbot you use, you may have noticed that the AI seems to agree with you far more readily than your friends or family members usually would. Now a study confirms that this is indeed the case.
A new study, posted as a preprint on the arXiv server, analyzed 11 leading AI chatbots, including OpenAI’s ChatGPT, Anthropic’s Claude, Meta’s Llama, DeepSeek and Google Gemini.
The researchers found that turning to AI chatbots for personal advice comes with “insidious risks”. Many current chatbots validated users even when their posts contained themes of manipulation, deception, or self-harm. This, the researchers argue, makes people less inclined to take prosocial steps, such as repairing a relationship, and more convinced that they were right.
Moreover, chatbots that engage in sycophantic behaviour – being overly agreeable with users – were also rated as higher quality, and users said they would use them again, which creates an incentive for the models to keep agreeing with users.
Myra Cheng, a computer scientist at Stanford University and one of the authors of the study, told the Guardian that “social sycophancy” in AI chatbots is a huge problem.
“Our key concern is that if models are always affirming people, then this may distort people’s judgments of themselves, their relationships, and the world around them. It can be hard to even realise that models are subtly, or not-so-subtly, reinforcing their existing beliefs, assumptions, and decisions,” Cheng told the publication.
The research showed that these leading chatbots endorsed a user’s actions about 50% more often than humans did when asked for personal advice.
The researchers told Nature that they also tested the impact of sycophantic behaviour on solving mathematical problems. They designed experiments using 504 problems from mathematics competitions held this year, altering each theorem statement to introduce subtle errors and then asking four LLMs to provide proofs for the flawed statements.
The idea behind this experiment was to check whether the sycophantic behaviour of the AI chatbots would lead them to fail to detect the errors in the statements.
Among the analyzed chatbots, OpenAI’s GPT-5 showed the least sycophantic behaviour, generating overly agreeable responses 29% of the time. Meanwhile, DeepSeek’s V3.1 model was found to be the most sycophantic, generating overly agreeable responses 70% of the time.
The researchers say that these LLMs have the capability to spot the errors in mathematical statements, but they “just assumed what the user says is correct”.