RADIANCE: Reliable and interpretable depression detection from speech using transformer
Authors: Anup Kumar Gupta, Ashutosh Dhamaniya, Puneet Gupta
Journal: Computers in Biology and Medicine, vol. 183, Article 109325 (published 2024-11-02)
DOI: 10.1016/j.compbiomed.2024.109325
URL: https://www.sciencedirect.com/science/article/pii/S0010482524014100
Citations: 0
Abstract
Depression is a common but severe mental disorder that impairs an individual's ability to function normally in day-to-day life. A majority of depressed individuals remain undiagnosed due to factors such as social stigma and a shortage of healthcare professionals. Consequently, several speech-based Machine Learning and Deep Learning (DL) models have been proposed for automatic depression detection, with the latter generally outperforming the former. However, DL models are black boxes and offer no transparency, whereas healthcare professionals prefer models that are interpretable as well as accurate. In this direction, we propose RADIANCE (Reliable AnD InterpretAble depressioN deteCtion transformErs). RADIANCE incorporates a novel FilterBank VIsion Transformer (FBViT) network, which provides the symptoms of depression as interpretable features. Additionally, we employ a novel loss function that handles the class imbalance in the datasets and incorporates a penalty term that addresses the hierarchy of misclassification errors. We also propose a reliability predictor based on low-level descriptors that provides a reliability score indicating the trustworthiness of the FBViT prediction. Furthermore, in contrast to conventional averaging and majority pooling, RADIANCE consolidates predictions from multiple clips of the input audio by weighting each prediction by its reliability score, which yields a more accurate overall prediction. RADIANCE outperforms state-of-the-art depression detection methods, achieving accuracies of 89.36%, 80.36%, and 94.44% on the DAIC-WOZ, E-DAIC, and CMDC datasets, respectively. Further, RADIANCE achieves MAE scores of 3.27 and 5.04 on the DAIC-WOZ and E-DAIC datasets, respectively.
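The abstract does not spell out how per-clip predictions are combined, so the following is only a minimal sketch of one plausible reading: a weighted average in which each clip's predicted depression probability is scaled by its reliability score. The function name `reliability_weighted_pool`, the use of probabilities in [0, 1], the non-negative reliability scores, and the fallback to a plain mean are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Sequence


def reliability_weighted_pool(
    probs: Sequence[float], reliabilities: Sequence[float]
) -> float:
    """Combine per-clip probabilities into one recording-level score.

    Hypothetical helper: each clip's probability is weighted by its
    reliability score, so clips judged more trustworthy contribute more.
    """
    if len(probs) == 0 or len(probs) != len(reliabilities):
        raise ValueError("probs and reliabilities must be non-empty and equal in length")
    if any(r < 0 for r in reliabilities):
        raise ValueError("reliability scores are assumed to be non-negative")

    total = sum(reliabilities)
    if total == 0:
        # Assumption: with no reliable clip, fall back to conventional averaging.
        return sum(probs) / len(probs)

    # Weighted mean: sum(p_i * r_i) / sum(r_i).
    return sum(p * r for p, r in zip(probs, reliabilities)) / total


if __name__ == "__main__":
    clip_probs = [0.9, 0.2, 0.7]         # illustrative per-clip outputs
    clip_reliability = [0.8, 0.1, 0.6]   # illustrative reliability scores
    print(reliability_weighted_pool(clip_probs, clip_reliability))
```

With equal reliability scores this reduces to conventional averaging, which makes the contrast drawn in the abstract concrete: the two schemes differ only when the reliability predictor assigns unequal trust to different clips.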
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.