AI chatbots struggle with subtle mental health cues


The leading chatbots mostly avoid giving dangerous answers to prompts about suicide, but still struggle when mental health risks show up subtly or unfold over long conversations, according to new research from Seattle-based Mpathic.
Why it matters: People are increasingly turning to AI systems for emotional support, and the models can sound caring while missing serious risk. Mounting lawsuits and regulatory scrutiny are pushing labs to prove their bots are safe enough.
Driving the news: Mpathic built new clinician-led benchmarks for testing AI systems in high-risk conversations and evaluated six major models on suicide-related and eating disorder-related chats.
- Its suicide benchmark tested models across 300 multi-turn role plays, each 10–15 turns long, designed by 50 licensed clinicians.
- Its eating disorder benchmark tested whether models could detect, interpret and respond to disordered eating signals — including indirect cues framed as dieting, discipline, fitness or health optimization.
What they found: The models generally handled explicit suicide risk better than murkier cases.
- On the suicide benchmark, Anthropic's Claude Sonnet 4.5 had the highest score across safety and helpfulness, while OpenAI's GPT-5.2 "stood out for consistently avoiding harmful responses," Mpathic said.
- All of the chatbots fared worse in conversations about eating disorders, missing subtler but critical cues, Mpathic said.
What they're saying: "Many of these systems do fairly well when the risk is very explicit," Mpathic co-founder and chief business officer Danielle Schlosser told Axios. "Almost all the models struggled with more nuanced risk signals."
- The quality of advice also tends to degrade during extended conversations, said Schlosser, who is also a licensed psychologist.
Reality check: Mpathic is a for-profit company paid to consult with the leading labs to improve model behavior in high-risk human conversations.
How it works: Unlike evaluations built around a single prompt, Mpathic's mPACT benchmark measures performance over longer conversations between the chatbot and trained psychologists.
- Licensed clinicians create test scenarios that include both explicit and subtle expressions of risk.
- Mpathic then scores each response for helpful and harmful behaviors, rating how well the model detects and interprets the risk and how appropriate its reply is; a simplified sketch of that kind of evaluation loop follows below.
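For illustration only, here is a minimal Python sketch of a multi-turn safety evaluation of this general shape. It is not Mpathic's code: the scenario is fully scripted rather than played live by psychologists, and every name (Scenario, TurnScore, run_scenario, chat_model, score_turn) is a hypothetical stand-in.

```python
# Hypothetical sketch of a multi-turn safety evaluation, not Mpathic's mPACT code.
from dataclasses import dataclass, field

@dataclass
class TurnScore:
    detected_risk: bool          # did the model notice the (possibly subtle) cue?
    interpreted_correctly: bool  # did it read the cue as a mental health concern?
    response_quality: int        # assumed 0-2 rubric: harmful, neutral, helpful

@dataclass
class Scenario:
    clinician_turns: list[str]   # scripted user messages written by clinicians
    risk_cues: list[bool]        # which turns carry an explicit or subtle risk cue
    scores: list[TurnScore] = field(default_factory=list)

def run_scenario(scenario: Scenario, chat_model, score_turn) -> float:
    """Play a scripted 10-15 turn role play and score every model reply."""
    history = []
    for user_msg, has_cue in zip(scenario.clinician_turns, scenario.risk_cues):
        history.append({"role": "user", "content": user_msg})
        reply = chat_model(history)                          # model under test
        history.append({"role": "assistant", "content": reply})
        scenario.scores.append(score_turn(reply, has_cue))   # clinician-defined rubric

    # Aggregate: fraction of cue-bearing turns where risk was detected and the
    # reply was at least not harmful.
    cue_scores = [s for s, cue in zip(scenario.scores, scenario.risk_cues) if cue]
    safe = [s for s in cue_scores if s.detected_risk and s.response_quality > 0]
    return len(safe) / len(cue_scores) if cue_scores else 1.0
```

The aggregate score here is one assumed choice among many; beyond detection, interpretation and response quality, Mpathic's actual rubric isn't described in the research.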
Zoom out: The findings land as AI companies face growing pressure over chatbot safety.
- The Federal Trade Commission opened an inquiry into AI companion chatbots in 2025, asking companies including OpenAI, Meta, Alphabet, Character.AI, Snap and xAI about child and teen safety practices.
- Families of teens who died by suicide after chatbot interactions testified before Congress in 2025.
- Pennsylvania recently sued Character.AI, alleging some of its bots falsely presented themselves as licensed medical professionals.
Between the lines: One challenge stems from how AI models are trained. "In the spirit of trying to be helpful, the model usually wants to agree with the user," Schlosser said.
- But that becomes a problem when a person's goal could harm them, such as someone asking for help planning a 500-calorie-per-day diet.
- "Most people don't say 'I'm at risk' directly — they demonstrate it through subtle behaviors over time that are obvious to human clinicians," Mpathic CEO Grin Lord said in a statement.
Yes, but: Large language models are non-deterministic, meaning the same prompt can produce different answers on different runs, which makes it hard to track the overall quality of responses (a hypothetical sketch of one workaround follows below).
- Models are also constantly being updated in ways that can change how they handle particular queries.
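One common workaround, sketched below as a hypothetical example rather than anything Mpathic describes, is to re-run the same prompt many times and report the spread of scores instead of a single result; the chat_model and score_response callables are illustrative stand-ins.

```python
# Illustrative sketch: repeated sampling to characterize a non-deterministic model.
import statistics

def estimate_safety(prompt: str, chat_model, score_response, trials: int = 20) -> dict:
    """Re-run one prompt many times and summarize the spread of safety scores."""
    scores = [score_response(chat_model(prompt)) for _ in range(trials)]
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores) if trials > 1 else 0.0,
        "worst": min(scores),  # the single worst answer often matters most for safety
    }
```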
What we're watching: The models are getting better at handling obvious crises, but the tougher problem is whether they can stop being agreeable when a user's goal is dangerous.
If you or someone you know needs support now, call or text 988 or chat with someone at 988lifeline.org.
