A multi-stage large language model framework for extracting suicide-related social determinants of health.

IF 5.4 Q1 MEDICINE, RESEARCH & EXPERIMENTAL
Song Wang, Yishu Wei, Haotian Ma, Max Lovitt, Kelly Deng, Yuan Meng, Zihan Xu, Jingze Zhang, Yunyu Xiao, Ying Ding, Xuhai Xu, Joydeep Ghosh, Yifan Peng
{"title":"A multi-stage large language model framework for extracting suicide-related social determinants of health.","authors":"Song Wang, Yishu Wei, Haotian Ma, Max Lovitt, Kelly Deng, Yuan Meng, Zihan Xu, Jingze Zhang, Yunyu Xiao, Ying Ding, Xuhai Xu, Joydeep Ghosh, Yifan Peng","doi":"10.1038/s43856-025-01114-z","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Understanding social determinants of health (SDoH) factors contributing to suicide incidents is crucial for early intervention and prevention. However, data-driven approaches to this goal face challenges such as long-tailed factor distributions, analyzing pivotal stressors preceding suicide incidents, and limited model explainability.</p><p><strong>Methods: </strong>We present a multi-stage large language model framework to enhance SDoH factor extraction from unstructured text. Our approach was compared to other state-of-the-art language models (i.e., pre-trained BioBERT and GPT-3.5-turbo) and reasoning models (i.e., DeepSeek-R1). We also evaluated how the model's explanations help people annotate SDoH factors more quickly and accurately. The analysis included both automated comparisons and a pilot user study.</p><p><strong>Results: </strong>We show that our proposed framework demonstrates performance boosts in the overarching task of extracting SDoH factors and in the finer-grained tasks of retrieving relevant context. Additionally, we show that fine-tuning a smaller, task-specific model achieves comparable or better performance with reduced inference costs. The multi-stage design not only enhances extraction but also provides intermediate explanations, improving model explainability.</p><p><strong>Conclusions: </strong>Our approach improves both the accuracy and transparency of extracting suicide-related SDoH from unstructured texts. These advancements have the potential to support early identification of individuals at risk and inform more effective prevention strategies.</p>","PeriodicalId":72646,"journal":{"name":"Communications medicine","volume":"5 1","pages":"404"},"PeriodicalIF":5.4000,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12480878/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Communications medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1038/s43856-025-01114-z","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, RESEARCH & EXPERIMENTAL","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Understanding social determinants of health (SDoH) factors contributing to suicide incidents is crucial for early intervention and prevention. However, data-driven approaches to this goal face challenges such as long-tailed factor distributions, analyzing pivotal stressors preceding suicide incidents, and limited model explainability.

Methods: We present a multi-stage large language model framework to enhance SDoH factor extraction from unstructured text. Our approach was compared to other state-of-the-art language models (i.e., pre-trained BioBERT and GPT-3.5-turbo) and reasoning models (i.e., DeepSeek-R1). We also evaluated how the model's explanations help people annotate SDoH factors more quickly and accurately. The analysis included both automated comparisons and a pilot user study.

Results: We show that our proposed framework demonstrates performance boosts in the overarching task of extracting SDoH factors and in the finer-grained tasks of retrieving relevant context. Additionally, we show that fine-tuning a smaller, task-specific model achieves comparable or better performance with reduced inference costs. The multi-stage design not only enhances extraction but also provides intermediate explanations, improving model explainability.

Conclusions: Our approach improves both the accuracy and transparency of extracting suicide-related SDoH from unstructured texts. These advancements have the potential to support early identification of individuals at risk and inform more effective prevention strategies.

用于提取与自杀相关的健康社会决定因素的多阶段大语言模型框架。
背景:了解导致自杀事件的健康社会决定因素(SDoH)对于早期干预和预防至关重要。然而,实现这一目标的数据驱动方法面临着一些挑战,如长尾因素分布、分析自杀事件前的关键压力源以及有限的模型可解释性。方法:我们提出了一个多阶段的大型语言模型框架,以增强从非结构化文本中提取SDoH因子。我们的方法与其他最先进的语言模型(即预训练的BioBERT和GPT-3.5-turbo)和推理模型(即DeepSeek-R1)进行了比较。我们还评估了模型的解释如何帮助人们更快、更准确地注释SDoH因子。分析包括自动比较和试点用户研究。结果:我们表明,我们提出的框架在提取SDoH因素的总体任务和检索相关上下文的细粒度任务中表现出性能提升。此外,我们还表明,微调一个更小的、特定于任务的模型可以在降低推理成本的情况下实现相当或更好的性能。多级设计不仅增强了提取,而且提供了中间解释,提高了模型的可解释性。结论:我们的方法提高了从非结构化文本中提取自杀相关SDoH的准确性和透明度。这些进展有可能支持早期识别有风险的个体,并为更有效的预防战略提供信息。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信