A multi-stage large language model framework for extracting suicide-related social determinants of health.

IF 5.4 Q1 MEDICINE, RESEARCH & EXPERIMENTAL

Communications medicine. Pub Date: 2025-09-29. DOI: 10.1038/s43856-025-01114-z

Song Wang, Yishu Wei, Haotian Ma, Max Lovitt, Kelly Deng, Yuan Meng, Zihan Xu, Jingze Zhang, Yunyu Xiao, Ying Ding, Xuhai Xu, Joydeep Ghosh, Yifan Peng

Communications medicine 5(1), 404. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12480878/pdf/


Abstract

Background: Understanding social determinants of health (SDoH) factors contributing to suicide incidents is crucial for early intervention and prevention. However, data-driven approaches to this goal face challenges such as long-tailed factor distributions, the difficulty of analyzing pivotal stressors preceding suicide incidents, and limited model explainability.

Methods: We present a multi-stage large language model framework to enhance SDoH factor extraction from unstructured text. Our approach was compared to other state-of-the-art language models (i.e., pre-trained BioBERT and GPT-3.5-turbo) and reasoning models (i.e., DeepSeek-R1). We also evaluated how the model's explanations help people annotate SDoH factors more quickly and accurately. The analysis included both automated comparisons and a pilot user study.

Results: We show that our proposed framework demonstrates performance boosts in the overarching task of extracting SDoH factors and in the finer-grained tasks of retrieving relevant context. Additionally, we show that fine-tuning a smaller, task-specific model achieves comparable or better performance with reduced inference costs. The multi-stage design not only enhances extraction but also provides intermediate explanations, improving model explainability.

Conclusions: Our approach improves both the accuracy and transparency of extracting suicide-related SDoH from unstructured texts. These advancements have the potential to support early identification of individuals at risk and inform more effective prevention strategies.
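
The abstract does not spell out the individual stages, so the following is only a minimal sketch of what a multi-stage extraction pipeline with intermediate explanations could look like. It assumes two stages (retrieve relevant context, then label SDoH factors), and it replaces the language-model calls with simple keyword matching so that it runs on its own. The factor names, the keyword lexicon, and all function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical lexicon standing in for a language model (assumption, not from the paper).
FACTOR_KEYWORDS: Dict[str, List[str]] = {
    "job_problem": ["laid off", "unemployed"],
    "financial_problem": ["debt", "eviction"],
    "relationship_problem": ["divorce", "breakup"],
}


@dataclass
class Extraction:
    factor: str    # predicted SDoH factor label
    evidence: str  # sentence retrieved in stage 1, kept as the explanation


def retrieve_context(text: str) -> List[str]:
    """Stage 1 (assumed): keep only sentences that mention a possible stressor."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    keywords = [k for ks in FACTOR_KEYWORDS.values() for k in ks]
    return [s for s in sentences if any(k in s.lower() for k in keywords)]


def extract_factors(contexts: List[str]) -> List[Extraction]:
    """Stage 2 (assumed): label each retrieved sentence with matching factors."""
    results: List[Extraction] = []
    for sentence in contexts:
        lowered = sentence.lower()
        for factor, keywords in FACTOR_KEYWORDS.items():
            if any(k in lowered for k in keywords):
                results.append(Extraction(factor, sentence))
    return results


def run_pipeline(text: str) -> List[Extraction]:
    """Chain the two stages; the stage-1 output doubles as the explanation."""
    return extract_factors(retrieve_context(text))


if __name__ == "__main__":
    note = "He was laid off in March. He enjoyed fishing. His divorce was finalized last year."
    for item in run_pipeline(note):
        print(f"{item.factor}: {item.evidence}")
```

Keeping the retrieved sentence alongside each label is one simple way to expose an intermediate explanation that a human annotator can check; in a real system each stage would be a model call rather than a keyword lookup.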

