Kun-Yi Huang, Chung-Hsien Wu, Ming-Hsiang Su, Chia-Hui Chou
2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), published 2017-12-01. DOI: https://doi.org/10.1109/APSIPA.2017.8282296
Mood disorder identification using deep bottleneck features of elicited speech
In the diagnosis of mental health disorders, a large proportion of patients with bipolar disorder (BD) are likely to be misdiagnosed with unipolar depression (UD) on initial presentation. Because speech is the most natural way to express emotion, this work tracks the emotion profile of elicited speech for short-term mood disorder identification. The Deep Scattering Spectrum (DSS) and Low-Level Descriptors (LLDs) of the elicited speech signals are extracted as speech features. A hierarchical spectral clustering (HSC) algorithm adapts the emotion database to the mood disorder database to alleviate the data bias problem. A denoising autoencoder then extracts bottleneck features from the DSS and LLDs for a better representation. Based on the bottleneck features, a long short-term memory (LSTM) network generates the time-varying emotion profile sequence. Finally, given the emotion profile sequence, an HMM-based identification and verification model determines the mood disorder. Elicited emotional speech data were collected from 15 BD patients, 15 UD patients, and 15 healthy controls for system training and evaluation, and five-fold cross-validation was used for evaluation. Experimental results show that the system using the bottleneck features achieved an identification accuracy of 73.33%, an improvement of 8.89% over the system without bottleneck features. Furthermore, the system with the verification mechanism outperformed the system without verification by 4.44%.
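The reported figures can be sanity-checked against the cohort size. The sketch below is an illustration only, not the authors' evaluation code. It assumes that accuracy is counted per subject over all 45 participants and that the stated improvements are absolute percentage-point differences; the abstract does not state either. The helper names (`subjects_for`, `as_percent`, `stratified_folds`) and the round-robin fold assignment are hypothetical.

```python
# Cohort sizes taken from the abstract: 15 BD, 15 UD, 15 healthy controls.
N_SUBJECTS = 15 + 15 + 15


def subjects_for(percent, total=N_SUBJECTS):
    """Whole number of subjects closest to `percent` of `total`."""
    return round(percent / 100 * total)


def as_percent(count, total=N_SUBJECTS):
    """Share of `total` represented by `count`, rounded to two decimals."""
    return round(100 * count / total, 2)


def stratified_folds(labels, k=5):
    """Assign subject indices to k folds, round-robin within each class.

    Assumption: the abstract only says "five-fold cross validation";
    class-balanced folds are one plausible way to set it up.
    """
    folds = [[] for _ in range(k)]
    seen = {}  # how many subjects of each class have been placed so far
    for idx, label in enumerate(labels):
        pos = seen.get(label, 0)
        folds[pos % k].append(idx)
        seen[label] = pos + 1
    return folds


# Under the per-subject assumption, each reported percentage maps to a
# whole number of subjects out of 45.
correct_with_bottleneck = subjects_for(73.33)
gain_from_bottleneck = subjects_for(8.89)
gain_from_verification = subjects_for(4.44)
baseline_accuracy = as_percent(correct_with_bottleneck - gain_from_bottleneck)

labels = ["BD"] * 15 + ["UD"] * 15 + ["HC"] * 15
folds = stratified_folds(labels, k=5)
```

Under these assumptions the percentages correspond to whole subject counts, which suggests the improvements are absolute rather than relative, and each of the five folds holds nine subjects, three per group.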