AI shouldn’t be trusted with your mental health, teen finds – Science News Explores

Her research suggests that chatbots aren’t suitable substitutes for human therapists
Zeynep Demirbas wanted to see if AI models like ChatGPT could be trusted with people’s mental health.
Society for Science
By Maria Temming
2 hours ago
Inspiration for Zeynep Demirbas’ research struck during a chat with a family friend. That friend, a psychologist, said some health insurance companies were pushing the use of artificial intelligence, such as ChatGPT, for mental health. The idea: AI might be less costly and easier to access than human therapists.
That worried Zeynep, 14. She knew that ChatGPT often gave wrong answers or agreed with incorrect statements. Could this type of AI, known as a large language model — or LLM — really be trusted with our mental health?
To find out, she tested whether several LLMs could detect stress in human text. She gave the models a dataset of more than 3,500 Reddit posts. Human raters had labeled each one as containing stress or not. Zeynep asked the models to identify which posts showed stress.
To judge how well the models did, Zeynep calculated something called an F1-score for each one. This score considers how many stress-containing posts the models accurately spotted. It also accounts for how often the models missed cases of stress and how often they mislabeled posts as showing stress.
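For readers who want to see the arithmetic, here is a minimal sketch of how an F1-score can be computed for a yes-or-no task like this one. The article does not say exactly how Zeynep ran her calculation, so the function name and the eight labels below are made up for illustration. A 1 means a post shows stress and a 0 means it does not.

```python
def f1_score(true_labels, predicted_labels):
    """Return (precision, recall, F1) for labels where 1 means 'stress'."""
    pairs = list(zip(true_labels, predicted_labels))
    # Stress posts the model correctly spotted
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    # Calm posts the model mislabeled as showing stress
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Stress posts the model missed
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)

    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example: human labels versus one model's guesses for eight posts
human_labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_labels = [1, 1, 1, 0, 1, 0, 0, 0]

precision, recall, f1 = f1_score(human_labels, model_labels)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case the model finds three of the four stress posts and raises one false alarm, so precision, recall and F1 all come out to 0.75. A model only earns a high F1 if it both catches most cases of stress and avoids flagging posts that don't show it.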
An LLM specifically designed for mental health did the best. It scored about 82 percent. ChatGPT scored only about 74 percent.
An aspiring computer scientist, Zeynep did this project as an eighth grader at Transit Middle School in East Amherst, N.Y. Her research earned her a finalist spot in the 2025 Thermo Fisher Scientific Junior Innovators Challenge. The Society for Science runs this program (and also publishes Science News Explores).
Here, Zeynep shares her research experiences and advice.
ChatGPT performing badly was “really surprising,” Zeynep says. It did even worse than the “random-forest” model. This model makes predictions by using a collection of decision trees, sort of like a super-complex flow chart. Random-forest is “supposed to be a very simple and old technique. So I just put it in as a baseline,” Zeynep says. “That was very interesting — how something so small and simple was able to beat an LLM [like ChatGPT] that used millions of parameters and had so much coding go into it.”
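To picture what "a collection of decision trees" means, here is a toy sketch of the voting idea behind a random forest. It is not Zeynep's model. A real random forest learns its trees from data, with each tree seeing a random slice of the examples. The three hand-written trees and the features they check (such as worry_words) are hypothetical and only show how the trees' answers get combined.

```python
from collections import Counter


def tree_one(post):
    # Each tree is a small flowchart of yes-or-no questions
    if post["worry_words"] >= 2:
        return "stress"
    if post["exclamation_marks"] >= 3:
        return "stress"
    return "no stress"


def tree_two(post):
    if post["sleep_mentions"] >= 1:
        return "stress" if post["worry_words"] >= 1 else "no stress"
    return "no stress"


def tree_three(post):
    if post["word_count"] > 200:
        return "stress" if post["worry_words"] >= 1 else "no stress"
    return "no stress"


def forest_predict(post, trees=(tree_one, tree_two, tree_three)):
    """Ask every tree for its answer, then return the most common one."""
    votes = Counter(tree(post) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical features pulled from one long, worried post
example_post = {
    "worry_words": 1,
    "exclamation_marks": 0,
    "sleep_mentions": 1,
    "word_count": 250,
}
print(forest_predict(example_post))
```

Here the first tree votes "no stress" while the other two vote "stress," so the forest's answer is "stress." Because the final call is a majority vote, one tree's mistake can be outvoted by the others.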
“We should be mindful with AI, because it doesn’t really have an acceptable grade in mental health,” Zeynep says. “That doesn’t mean that LLMs are bad, because they’re for general use. They’re not necessarily meant for mental health.” Her data led her to conclude that LLMs should not be replacing human therapists. Instead, these models might help identify people who are struggling and refer them to a mental health professional.
“One way I feel I could expand it is seeing whether LLMs carry biases toward different genders,” Zeynep says. “How would it respond [differently] if it was girls or guys?” Zeynep has read that sometimes doctors dismiss the symptoms of female patients because they think that women are exaggerating. The doctors’ assumption is an example of personal bias. Since LLMs are trained on texts written by people, they can pick up human biases, Zeynep says. She’s curious whether LLMs would show gender biases similar to human doctors.
artificial intelligence: A type of knowledge-based decision-making exhibited by machines or computers. The term also refers to the field of study in which scientists try to create machines or computer software capable of intelligent behavior.
bias: The tendency to hold a particular perspective or preference that favors some thing, some group or some choice. Scientists often “blind” subjects to the details of a test (don’t tell them what it is) so that their biases will not affect the results.
coding: (in computing) A slang term for developing computer programming — or software — that performs a particular, desired computational task.
decision tree: (in computer science) A type of learning algorithm (or precise series of process steps) to classify things or make predictions. It asks a step-wise series of questions with two possible answers. Based on the answer, it goes through this process again, asking a new question of that first solution, again with two possible answers or outcomes. It goes through this process again and again until it identifies what appears to be the best option among what initially had been a series of possible answers.
flowchart: A series of step-by-step actions or decisions that have been mapped as a series of boxes in a diagram. Lines connecting the boxes (and sometimes symbols) identify the order of steps that come from moving from one box to another (and sometimes decision points where some choice must be made).
gender: The complex relationship between someone’s body, their identity and often how their culture tries to assign them roles and behaviors. Gender and biological sex are often incorrectly used to mean the same thing. Gender identity includes binary (female or male) and nonbinary (genderfluid, genderqueer and more). People share some of their gender identity by their choice of pronouns; for example, he, she or they are common ones. Someone’s gender can be the same or different from the sex that individual was assigned at birth.
large language model: (in computing) Language models are a type of machine learning. They attempt to predict upcoming words (in text or speech) and then present those predictions using words that almost anyone should understand. The models learn to do this by reviewing large quantities of text or speech. As their name would imply, large language models train using enormous troves of data. They organize and make sense of those data using “neural nets” — a scheme patterned a bit off of the pathways of nerves in the human brain (and for whose development the 2024 Nobel Prize in physics was awarded). Large language models don’t just learn words, but also phrases made of many words. They can even learn from the context in which a new phrase and idea is worded (meaning the words that accompany those phrases or in which those phrases have been embedded).
mental health: A term for someone’s emotional, psychological and social well-being. It refers to how people behave on their own and how they interact with others. It includes how people make choices, handle stress and manage fear or anxiety. Poor mental health can be triggered by disease or might reflect a short-term response to life’s challenges. It can occur in people of any age, from babies to the elderly.
middle school: A designation for grades six through eight in the U.S. educational system. It comes immediately prior to high school. Some school systems break their age groups slightly differently, including sixth grade as part of elementary school and then referring to grades seven and eight as “junior” high school.
model: A simulation of a real-world event (usually using a computer) that has been developed to predict one or more likely outcomes. Or an individual that is meant to display how something would work in or look on others.
parameter: A condition of some situation to be studied or defined that can be quantified or in some way measured.
psychologist: A scientist or mental-health professional who studies the mind, especially in relation to actions and behaviors. Some work with people. Others may conduct experiments with animals (usually rodents) to test how their minds respond to different stimuli and conditions.
random forest model: A common type of algorithm used in machine learning (a type of artificial intelligence). To arrive at a final conclusion, this computer model first considers and accounts for decisions made by two or more decision trees.
Society for Science: A nonprofit organization created in 1921 and based in Washington, D.C. Since its founding, the Society has been promoting not only public engagement in scientific research but also the public understanding of science. It created and continues to run three renowned science competitions: the Regeneron Science Talent Search (begun in 1942), the Regeneron International Science and Engineering Fair (initially launched in 1950) and the middle-school MASTERS competition (from 2010 to 2022) that morphed into the Thermo Fisher Scientific Junior Innovators Challenge (and launched in 2023). The Society also publishes award-winning journalism: in Science News (launched in 1922) and Science News Explores (created in 2003).
stress: (in psychology) A mental, physical, emotional or behavioral reaction to an event or circumstance (stressor) that disturbs a person or animal’s usual state of being or places increased demands on a person or animal; psychological stress can be either positive or negative.
symptom: A physical or mental indicator generally regarded to be characteristic of a disease. Sometimes a single symptom — especially a general one, such as fever or pain — can be a sign of any of many different types of injury or disease.
transit: (in astronomy) The passing of a planet, asteroid or comet across the face of a star, or of a moon across the face of a planet.
Maria Temming is the Assistant Managing Editor at Science News Explores. She has bachelor’s degrees in physics and English, and a master’s in science writing.
Founded in 2003, Science News Explores is a free, award-winning online publication dedicated to providing age-appropriate science news to learners, parents and educators. The publication, as well as Science News magazine, is published by the Society for Science, a nonprofit 501(c)(3) membership organization dedicated to public engagement in scientific research and education.
© Society for Science & the Public 2000–2025. All rights reserved.
