Reasoning language models for more transparent prediction of suicide risk.

BMJ Mental Health · IF 4.9 · PSYCHIATRY
Thomas H McCoy, Roy H Perlis
DOI: 10.1136/bmjment-2025-301654
{"title":"推理语言模型更透明地预测自杀风险。","authors":"Thomas H McCoy,Roy H Perlis","doi":"10.1136/bmjment-2025-301654","DOIUrl":null,"url":null,"abstract":"BACKGROUND\r\nWe previously demonstrated that a large language model could estimate suicide risk using hospital discharge notes.\r\n\r\nOBJECTIVE\r\nWith the emergence of reasoning models that can be run on consumer-grade hardware, we investigated whether these models can approximate the performance of much larger and costlier models.\r\n\r\nMETHODS\r\nFrom 458 053 adults hospitalised at one of two academic medical centres between 4 January 2005 and 2 January 2014, we identified 1995 who died by suicide or accident, and matched them with 5 control individuals. We used Llama-DeepSeek-R1 8B to generate predictions of risk. Beyond discrimination and calibration, we examined the aspects of model reasoning-that is, the topics in the chain of thought-associated with correct or incorrect predictions.\r\n\r\nFINDINGS\r\nThe cohort included 1995 individuals who died by suicide or accidental death and 9975 individuals matched 5:1, totalling 11 954 discharges and 58 933 person-years of follow-up. In Fine and Grey regression, hazard as estimated by the Llama3-distilled model was significantly associated with observed risk (unadjusted HR 4.65 (3.58-6.04)). The corresponding c-statistic was 0.64 (0.63-0.65), modestly poorer than the GPT4o model (0.67 (0.66-0.68)). In chain-of-thought reasoning, topics including Substance Abuse, Surgical Procedure, and Age-related Comorbidities were associated with correct predictions, while Fall-related Injury was associated with incorrect prediction.\r\n\r\nCONCLUSIONS\r\nApplication of a reasoning model using local, consumer-grade hardware only modestly diminished performance in stratifying suicide risk.\r\n\r\nCLINICAL IMPLICATIONS\r\nSmaller models can yield more secure, scalable and transparent risk prediction.","PeriodicalId":72434,"journal":{"name":"BMJ mental health","volume":"28 1","pages":""},"PeriodicalIF":4.9000,"publicationDate":"2025-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reasoning language models for more transparent prediction of suicide risk.\",\"authors\":\"Thomas H McCoy,Roy H Perlis\",\"doi\":\"10.1136/bmjment-2025-301654\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"BACKGROUND\\r\\nWe previously demonstrated that a large language model could estimate suicide risk using hospital discharge notes.\\r\\n\\r\\nOBJECTIVE\\r\\nWith the emergence of reasoning models that can be run on consumer-grade hardware, we investigated whether these models can approximate the performance of much larger and costlier models.\\r\\n\\r\\nMETHODS\\r\\nFrom 458 053 adults hospitalised at one of two academic medical centres between 4 January 2005 and 2 January 2014, we identified 1995 who died by suicide or accident, and matched them with 5 control individuals. We used Llama-DeepSeek-R1 8B to generate predictions of risk. Beyond discrimination and calibration, we examined the aspects of model reasoning-that is, the topics in the chain of thought-associated with correct or incorrect predictions.\\r\\n\\r\\nFINDINGS\\r\\nThe cohort included 1995 individuals who died by suicide or accidental death and 9975 individuals matched 5:1, totalling 11 954 discharges and 58 933 person-years of follow-up. 
In Fine and Grey regression, hazard as estimated by the Llama3-distilled model was significantly associated with observed risk (unadjusted HR 4.65 (3.58-6.04)). The corresponding c-statistic was 0.64 (0.63-0.65), modestly poorer than the GPT4o model (0.67 (0.66-0.68)). In chain-of-thought reasoning, topics including Substance Abuse, Surgical Procedure, and Age-related Comorbidities were associated with correct predictions, while Fall-related Injury was associated with incorrect prediction.\\r\\n\\r\\nCONCLUSIONS\\r\\nApplication of a reasoning model using local, consumer-grade hardware only modestly diminished performance in stratifying suicide risk.\\r\\n\\r\\nCLINICAL IMPLICATIONS\\r\\nSmaller models can yield more secure, scalable and transparent risk prediction.\",\"PeriodicalId\":72434,\"journal\":{\"name\":\"BMJ mental health\",\"volume\":\"28 1\",\"pages\":\"\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-05-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMJ mental health\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1136/bmjment-2025-301654\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"0\",\"JCRName\":\"PSYCHIATRY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMJ mental health","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1136/bmjment-2025-301654","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"PSYCHIATRY","Score":null,"Total":0}
Citations: 0

Abstract

BACKGROUND: We previously demonstrated that a large language model could estimate suicide risk using hospital discharge notes.

OBJECTIVE: With the emergence of reasoning models that can run on consumer-grade hardware, we investigated whether these models can approximate the performance of much larger and costlier models.

METHODS: From 458 053 adults hospitalised at one of two academic medical centres between 4 January 2005 and 2 January 2014, we identified 1995 who died by suicide or accident and matched each with 5 control individuals. We used Llama-DeepSeek-R1 8B to generate risk predictions. Beyond discrimination and calibration, we examined aspects of model reasoning (that is, the topics in the chain of thought) associated with correct or incorrect predictions.

FINDINGS: The cohort included 1995 individuals who died by suicide or accidental death and 9975 individuals matched 5:1, totalling 11 954 discharges and 58 933 person-years of follow-up. In Fine and Gray regression, hazard as estimated by the Llama3-distilled model was significantly associated with observed risk (unadjusted HR 4.65 (3.58-6.04)). The corresponding c-statistic was 0.64 (0.63-0.65), modestly poorer than the GPT-4o model (0.67 (0.66-0.68)). In chain-of-thought reasoning, topics including Substance Abuse, Surgical Procedure and Age-related Comorbidities were associated with correct predictions, while Fall-related Injury was associated with incorrect predictions.

CONCLUSIONS: Applying a reasoning model on local, consumer-grade hardware only modestly diminished performance in stratifying suicide risk.

CLINICAL IMPLICATIONS: Smaller models can yield more secure, scalable and transparent risk prediction.
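For readers who want to experiment with this kind of pipeline, the sketch below shows one way to score a discharge note with a locally run, distilled reasoning model via Hugging Face transformers. The checkpoint name, prompt wording and 0-100 risk scale are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch: prompt a locally run distilled reasoning model to estimate
# post-discharge risk from a discharge note. Checkpoint, prompt and risk scale
# are assumptions for illustration only.
import re
from transformers import pipeline

# A DeepSeek-R1 distillation into Llama 3 8B that fits consumer-grade hardware.
generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    device_map="auto",
)

def score_note(discharge_note: str) -> tuple[str, float | None]:
    """Return the model's reasoning text and a parsed numeric risk estimate."""
    prompt = (
        "You are reviewing a hospital discharge note. Think step by step, then "
        "give a single number from 0 to 100 for the patient's risk of death by "
        "suicide or accident after discharge, formatted as 'RISK: <number>'.\n\n"
        f"Discharge note:\n{discharge_note}\n"
    )
    out = generator(prompt, max_new_tokens=512, do_sample=False)[0]["generated_text"]
    completion = out[len(prompt):]                   # keep only newly generated text
    match = re.search(r"RISK:\s*(\d+(?:\.\d+)?)", completion)
    risk = float(match.group(1)) if match else None  # None if no parseable score
    return completion, risk
```

Discrimination for such scores can be summarised as a c-statistic against observed follow-up. The snippet below is a minimal sketch using lifelines' concordance_index with tiny synthetic placeholder arrays; it is not the paper's Fine and Gray analysis, which additionally accounts for competing risks.

```python
# Sketch of a c-statistic for per-discharge risk scores against time-to-event
# follow-up. The arrays are synthetic placeholders for illustration only.
import numpy as np
from lifelines.utils import concordance_index

follow_up_years = np.array([1.2, 4.0, 0.5, 3.1, 2.7])   # time to event or censoring
event_observed  = np.array([1,   0,   1,   0,   0])     # 1 = death, 0 = censored
risk_scores     = np.array([80., 15., 65., 30., 20.])   # model risk estimates

# lifelines treats higher scores as predicting longer survival, so pass the
# negated risk so that higher risk corresponds to earlier events.
c_stat = concordance_index(follow_up_years, -risk_scores, event_observed)
print(f"c-statistic: {c_stat:.2f}")
```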