Depression level prediction via textual and acoustic analysis
Jisun Hong, Jihun Lee, Daegil Choi, Jaehyo Jung
Computers in Biology and Medicine, vol. 190, Article 110009. Published 2025-03-28. DOI: 10.1016/j.compbiomed.2025.110009
https://www.sciencedirect.com/science/article/pii/S0010482525003609
Abstract
Extensive research on automatic depression diagnosis has used video data to capture related cues, but data collection is challenging because of privacy concerns. By contrast, voice data offer a less intrusive assessment method and can be analyzed for features such as simple tones, the expression of negative emotions, and a focus on oneself. Multimodal depression-level prediction using speech and text data has recently gained traction, but most studies overlook the temporal alignment of these modalities, which limits their analysis of the interaction between speech content and intonation. To overcome these limitations, this study introduces timestamp-integrated multimodal encoding for depression (TIMEX-D), which synchronizes the acoustic features of human speech with the corresponding text data to predict depression levels on the basis of their relationship. TIMEX-D comprises three main components: a timestamp extraction block that extracts timestamps from speech and text, a multimodal encoding block that extends positional encoding from transformers to mimic human speech recognition, and a depression analysis block that predicts depression levels while reducing model complexity compared with existing transformers. In experiments using the DAIC-WOZ and EDAIC datasets, TIMEX-D achieved accuracies of 99.17% and 99.81%, respectively, outperforming previous methods by approximately 13%. The effectiveness of TIMEX-D in predicting depression levels can enhance mental health diagnostics and monitoring across various contexts.
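The abstract does not give the form of the timestamp-integrated encoding, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a standard sinusoidal positional encoding evaluated at real timestamps (in seconds) instead of token indices, so that a word and an acoustic frame occurring at the same moment receive the same encoding. The function name `time_encoding`, the 10 ms frame hop, and the example word onsets are all hypothetical.

```python
import numpy as np

def time_encoding(times_s, d_model=16, max_period=10000.0):
    """Sinusoidal encoding evaluated at real-valued timestamps (seconds).

    Assumption: this mirrors the usual transformer positional encoding,
    with the integer position replaced by a timestamp. d_model must be even.
    """
    times = np.asarray(times_s, dtype=np.float64)[:, None]   # shape (n, 1)
    i = np.arange(d_model // 2, dtype=np.float64)[None, :]   # shape (1, d_model/2)
    angles = times / (max_period ** (2.0 * i / d_model))     # shape (n, d_model/2)
    enc = np.empty((times.shape[0], d_model))
    enc[:, 0::2] = np.sin(angles)   # even dimensions
    enc[:, 1::2] = np.cos(angles)   # odd dimensions
    return enc

# Hypothetical inputs: word onset times from a transcript and
# acoustic frame times at an assumed 10 ms hop.
word_starts = [0.00, 0.42, 0.90]
frame_times = np.arange(0, 1.0, 0.01)

word_pe = time_encoding(word_starts)    # one row per word
frame_pe = time_encoding(frame_times)   # one row per acoustic frame

# Index of the acoustic frame closest in time to each word onset.
nearest = [int(np.argmin(np.abs(frame_times - t))) for t in word_starts]
```

Because both modalities are encoded on a shared time axis, `word_pe[k]` closely matches `frame_pe[nearest[k]]`, which is the kind of cross-modal correspondence a temporally aligned model could exploit.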
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.