Cheng-Che Chen, Justin A. Chen, Chih-Sung Liang, Yu-Hsuan Lin
Title: Large language models may struggle to detect culturally embedded filicide-suicide risks
DOI: 10.1016/j.ajp.2025.104395
Journal: Asian Journal of Psychiatry, Volume 105, Article 104395 (JCR Q1, Psychiatry; Impact Factor 3.8)
Publication date: 2025-02-10 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S1876201825000383
Citations: 0
Abstract
This study examines the capacity of six large language models (LLMs)—GPT-4o, GPT-o1, DeepSeek-R1, Claude 3.5 Sonnet, Sonar Large (LLaMA-3.1), and Gemma-2-2b—to detect risks of domestic violence, suicide, and filicide-suicide in the Taiwanese flash fiction “Barbecue”. The story, narrated by a six-year-old girl, depicts family tension and subtle cues of potential filicide-suicide through charcoal-burning, a culturally recognized method in Taiwan. Each model was tasked with interpreting the story’s risks, with roles simulating different levels of mental health expertise. Results showed that all models detected domestic violence; however, only GPT-o1, Claude 3.5 Sonnet, and Sonar Large identified the risk of suicide based on cultural cues. GPT-4o, DeepSeek-R1, and Gemma-2-2b missed the suicide risk, interpreting the mother’s isolation as merely a psychological response. Notably, none of the models comprehended the cultural context behind the mother sparing her daughter, reflecting a gap in LLMs' understanding of non-Western sociocultural nuances. These findings highlight the limitations of LLMs in addressing culturally embedded risks, which are essential to effective mental health assessment.
Journal overview:
The Asian Journal of Psychiatry serves as a comprehensive resource for psychiatrists, mental health clinicians, neurologists, physicians, mental health students, and policymakers. Its goal is to facilitate the exchange of research findings and clinical practices between Asia and the global community. The journal focuses on psychiatric research relevant to Asia, covering preclinical, clinical, service system, and policy development topics. It also highlights the socio-cultural diversity of the region in relation to mental health.