Depression level prediction via textual and acoustic analysis

IF 6.3 · Zone 2 (Medicine) · JCR Q1 (Biology)
Jisun Hong, Jihun Lee, Daegil Choi, Jaehyo Jung
Computers in Biology and Medicine, vol. 190, Article 110009
Published: 2025-03-28 · Journal Article · Not open access
DOI: 10.1016/j.compbiomed.2025.110009
URL: https://www.sciencedirect.com/science/article/pii/S0010482525003609
Citations: 0

Abstract

Much research on automatic depression diagnosis has used video data to capture depression-related cues, but collecting such data is difficult because of privacy concerns. Voice data, by contrast, offer a less intrusive means of assessment and can be analyzed for features such as simple tones, the expression of negative emotions, and a focus on oneself. Multimodal depression-level prediction from speech and text has recently gained traction, but most studies overlook the temporal alignment of the two modalities, which limits their analysis of how speech content interacts with intonation. To overcome this limitation, this study introduces timestamp-integrated multimodal encoding for depression (TIMEX-D), which synchronizes the acoustic features of human speech with the corresponding text and predicts depression levels on the basis of their relationship. TIMEX-D comprises three main components: a timestamp extraction block that extracts timestamps from speech and text; a multimodal encoding block that extends the positional encoding of transformers to mimic human speech recognition; and a depression analysis block that predicts depression levels with lower model complexity than existing transformers. In experiments on the DAIC-WOZ and EDAIC datasets, TIMEX-D achieved accuracies of 99.17% and 99.81%, respectively, outperforming previous methods by approximately 13%. The effectiveness of TIMEX-D in predicting depression levels can enhance mental health diagnostics and monitoring across various contexts.
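The abstract does not say how TIMEX-D aligns the two modalities or how it extends positional encoding, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that each word carries a start and end time in seconds, that acoustic frames are spaced at a fixed hop, that frames are mean-pooled per word, and that the standard sinusoidal encoding is evaluated at the word's start time rather than at its integer position. The function names (`timestamp_encoding`, `pool_frames`, `align_words`) and all parameters are hypothetical.

```python
import math
from typing import List, Tuple


def timestamp_encoding(t_seconds: float, dim: int = 8,
                       base: float = 10000.0) -> List[float]:
    """Sinusoidal encoding evaluated at a continuous time in seconds.

    Assumption: same form as the usual transformer positional encoding,
    with the integer token index replaced by a timestamp.
    """
    enc: List[float] = []
    for i in range(0, dim, 2):
        freq = 1.0 / (base ** (i / dim))
        enc.append(math.sin(t_seconds * freq))
        enc.append(math.cos(t_seconds * freq))
    return enc[:dim]


def pool_frames(frames: List[List[float]], hop_s: float,
                start_s: float, end_s: float) -> List[float]:
    """Mean-pool the frames whose start time lies in [start_s, end_s).

    Frame k is assumed to start at k * hop_s seconds. If no frame falls
    inside the span, a zero vector of the same width is returned.
    """
    width = len(frames[0]) if frames else 0
    selected = [f for k, f in enumerate(frames)
                if start_s <= k * hop_s < end_s]
    if not selected:
        return [0.0] * width
    return [sum(col) / len(selected) for col in zip(*selected)]


def align_words(words: List[Tuple[str, float, float]],
                frames: List[List[float]], hop_s: float,
                dim: int = 8) -> List[Tuple[str, List[float], List[float]]]:
    """Pair each word with its pooled acoustic vector and a time code."""
    aligned = []
    for token, start_s, end_s in words:
        acoustic = pool_frames(frames, hop_s, start_s, end_s)
        time_code = timestamp_encoding(start_s, dim)
        aligned.append((token, acoustic, time_code))
    return aligned
```

Calling `align_words([("i", 0.0, 1.0), ("feel", 1.0, 2.0)], frames, hop_s=0.5)` on a list of per-frame feature vectors returns one `(token, acoustic, time_code)` triple per word. A downstream model could then consume text and acoustic features that share the same time reference, which is the kind of synchronization the abstract describes.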
Source journal
Computers in Biology and Medicine
Category: Engineering and Technology, Biomedical Engineering
CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days
Journal description: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. The journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, it aims to facilitate progress and innovation in the use of computers in biology and medicine.