AI Models Excel in Identifying Suicidal Thought Responses

RAND Corporation

Two artificial intelligence platforms are nearly on par with -- or sometimes surpass -- mental health professionals in evaluating appropriate responses to people who exhibit suicidal thoughts, according to a new RAND study.

Though the researchers did not evaluate these models' direct interactions with suicidal individuals, the findings underscore the importance of safe design and rigorous testing, and may provide lessons for those developing tools such as mental health apps built on AI.

The study used a standard assessment tool to test the knowledge of three major, large language models -- ChatGPT by OpenAI, Claude by Anthropic and Gemini by Google. The project is among the first to gauge the knowledge of AI tools about suicide.

The assessment is designed to evaluate an individual's knowledge about what constitutes appropriate responses to a series of statements that might be made by someone who is experiencing suicidal ideation.

Researchers had each of the large language models respond to the assessment tool, comparing the scores of the AI models against previous studies that assessed the knowledge of groups such as K-12 teachers, master's-level psychology students, and practicing mental health professionals.

All three AI models showed a consistent tendency to overrate the appropriateness of clinician responses to suicidal thoughts, suggesting room for improvement in their calibration. However, the overall performance of ChatGPT and Claude proved comparable to that of professional counselors, nurses and psychiatrists as assessed during other studies.

The findings are published by the Journal of Medical Internet Research.

"In evaluating appropriate interactions with individuals expressing suicidal ideation, we found these large language models can be surprisingly discerning," said Ryan McBain, the study's lead author and a senior policy researcher at RAND, a nonprofit research organization. "However, the bias of these models to rate responses as more appropriate than they are -- at least according to clinical experts -- indicates they should be further improved."

Suicide is one of the leading causes of death among individuals under the age of 50 in the U.S., with the rate of suicide growing sharply in recent years.

Large language models have drawn widespread attention as a potential vehicle for helping or harming individuals who are depressed and at risk of suicide. The models are designed to interpret and generate human-like text responses to written and spoken queries, and they include broad health applications.

To assess the knowledge of the three large language models, researchers used an assessment known as the Suicidal Ideation Response Inventory (SIRI-2) that poses 24 hypothetical scenarios in which a patient exhibits depressive symptoms and suicidal ideation, followed by possible clinician responses.

The final score produced by Gemini was roughly equivalent to past scores produced by K-12 school staff prior to suicide intervention skills training. The final score produced by ChatGPT was closer to those exhibited by doctoral students in clinical psychology or master's-level counselors. Claude exhibited the strongest performance, surpassing scores observed even among individuals who recently completed suicide intervention skills training, as well as scores from studies with psychiatrists and other mental health professionals.

"Our goal is to help policymakers and tech developers recognize both the promise and the limitations of using large language models in mental health," McBain said. "We are pressure testing a benchmark that could be used by tech platforms building mental health care, which would be especially impactful in communities that have limited resources. But caution is essential -- these AI models aren't replacements for crisis lines or professional care."

Researchers say that future studies should include directly studying how AI tools respond to questions that might be posted by people who are having suicidal ideation or are experiencing another type of mental health crisis.

Support for the study was provided by the National Institute of Mental Health. Other authors of the study are Jonathan H Cantor, Li Ang Zhang, Aaron Kofner, Joshua Breslau and Bradley Stein, all of RAND; Olesya Baker, Fang Zhang and Hao Yu, all of the Harvard School of Medicine; Alyssa Halbisen of the Harvard Pilgrim Health Care Institute; and Ateev Mehrotra of the Brown University School of Public Health.

RAND Health Care promotes healthier societies by improving health care systems in the United States and other countries.

/Public Release. This material from the originating organization/author(s) might be of the point-in-time nature, and edited for clarity, style and length. Mirage.News does not take institutional positions or sides, and all views, positions, and conclusions expressed herein are solely those of the author(s).View in full here.