A multimodal machine learning model for bipolar disorder mania classification: Insights from acoustic, linguistic, and visual cues

Kiruthiga Devi Murugavel, Parthasarathy R, Sandeep Kumar Mathivanan, Saravanan Srinivasan, Basu Dev Shivahare, Mohd Asif Shah

Intelligence-based medicine, Volume 11, Article 100223, published 2025-01-01. DOI: 10.1016/j.ibmed.2025.100223
https://www.sciencedirect.com/science/article/pii/S2666521225000262
Citations: 0
Abstract
Bipolar disorder is a mental health condition marked by mood fluctuations that range from manic to depressive states. Interviews with patients and information gathered from their families are essential steps in the diagnostic process, and automated approaches to treating bipolar disorder are also being explored. In mental health prevention and care, machine learning (ML) techniques are increasingly used to detect and treat diseases: by treating frequently analysed human behaviour patterns, identified symptoms, and risk factors as dataset parameters, predictions can be made that improve on traditional diagnostic methods. In this study, a multimodal fusion system was developed that takes auditory, linguistic, and visual patient recordings as the input to a three-stage mania classification decision system. Deep Denoising Autoencoders (DDAEs) are introduced, in particular for the audio-visual modality, to learn common representations across five modalities: acoustic characteristics, eye gaze, facial landmarks, head posture, and Facial Action Units (FAUs). The distributed representations and the temporal information within each recording session are then encoded into Fisher Vectors (FVs). Once the FVs and document embeddings are integrated, a Multi-Task Neural Network performs the classification while mitigating the overfitting caused by the limited size of the bipolar disorder dataset. The study thus introduces DDAEs for cross-modal representation learning and combines Fisher Vectors with Multi-Task Neural Networks, enhancing diagnostic accuracy while highlighting the benefits of multimodal fusion for mental health diagnostics.
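The core denoising-autoencoder mechanism behind the DDAEs can be sketched briefly: corrupt the input, encode it, and train the network to reconstruct the clean signal, so that the hidden code becomes robust to noise. Below is a minimal single-layer, tied-weight sketch in NumPy on toy data; the layer sizes, corruption level, and feature dimensions are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one behavioural modality (e.g. facial-landmark frames):
# 200 frames x 16 features. The paper's real dimensions are not given here.
X = rng.normal(size=(200, 16))
X_noisy = X + rng.normal(scale=0.3, size=X.shape)   # Gaussian corruption

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Single hidden layer, tied weights: encode the corrupted input, decode,
# and penalise the distance to the *clean* input.
n_in, n_hidden = X.shape[1], 8
W = rng.normal(scale=0.1, size=(n_in, n_hidden))
b_enc, b_dec = np.zeros(n_hidden), np.zeros(n_in)

def reconstruction_error():
    H = sigmoid(X_noisy @ W + b_enc)    # noise-robust hidden code
    X_hat = H @ W.T + b_dec             # linear decoder, tied weights
    return np.mean((X_hat - X) ** 2)

err_before = reconstruction_error()

lr = 0.05
for _ in range(300):
    H = sigmoid(X_noisy @ W + b_enc)
    X_hat = H @ W.T + b_dec
    diff = X_hat - X                    # grad of MSE w.r.t. X_hat (up to 2/N)
    dZ = (diff @ W) * H * (1 - H)       # backprop through decoder and sigmoid
    W -= lr * (X_noisy.T @ dZ + diff.T @ H) / len(X)  # both tied-weight paths
    b_enc -= lr * dZ.mean(axis=0)
    b_dec -= lr * diff.mean(axis=0)

err_after = reconstruction_error()      # should now be below err_before
```

In the full system, one such learned hidden code per modality would feed the downstream Fisher Vector encoding stage.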
Achieving an unweighted average recall of 64.8 %, a highest AUC-ROC of 0.85, and an inference time of 6.5 ms/sample, the system underscores the effectiveness of integrating multiple modalities in improving performance while advancing feature representation and model interpretability.
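The Fisher Vector encoding step described above can be illustrated as follows: frame-level features are soft-assigned to a Gaussian mixture model, and a whole recording session is summarised by normalised gradients of the GMM log-likelihood with respect to its means and variances, yielding a fixed-length vector regardless of session length. The NumPy sketch below uses a hand-specified (untrained) diagonal-covariance GMM; all dimensions and the toy data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-frame descriptors for one recording session (e.g. fused
# audio-visual features); the dimensionalities here are assumptions.
frames = rng.normal(size=(500, 10))
K, D = 4, frames.shape[1]

# Hand-specified diagonal-covariance GMM standing in for a trained
# "universal background model".
mu = rng.normal(size=(K, D))
var = np.ones((K, D))
w = np.full(K, 1.0 / K)

def fisher_vector(X):
    # Posteriors q[n, k] = p(component k | frame n) under the GMM.
    log_p = -0.5 * (((X[:, None, :] - mu) ** 2 / var)
                    + np.log(2 * np.pi * var)).sum(axis=2) + np.log(w)
    q = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    q /= q.sum(axis=1, keepdims=True)
    diff = (X[:, None, :] - mu) / np.sqrt(var)
    n = len(X)
    # Normalised gradients w.r.t. means and (diagonal) variances.
    g_mu = (q[:, :, None] * diff).sum(0) / (n * np.sqrt(w)[:, None])
    g_var = (q[:, :, None] * (diff ** 2 - 1)).sum(0) / (n * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))      # power normalisation
    return fv / np.linalg.norm(fv)              # L2 normalisation

fv = fisher_vector(frames)
# fv.shape == (2 * K * D,) == (80,) no matter how many frames the session has
```

The resulting fixed-length vectors can then be concatenated with document embeddings and passed to a downstream classifier, as the abstract describes for the Multi-Task Neural Network.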