利用诱导言语的深度瓶颈特征识别情绪障碍

2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2017-12-01 DOI:10.1109/APSIPA.2017.8282296

Kun-Yi Huang, Chung-Hsien Wu, Ming-Hsiang Su, Chia-Hui Chou

{"title":"利用诱导言语的深度瓶颈特征识别情绪障碍","authors":"Kun-Yi Huang, Chung-Hsien Wu, Ming-Hsiang Su, Chia-Hui Chou","doi":"10.1109/APSIPA.2017.8282296","DOIUrl":null,"url":null,"abstract":"In the diagnosis of mental health disorder, a large portion of the Bipolar Disorder (BD) patients is likely to be misdiagnosed as Unipolar Depression (UD) on initial presentation. As speech is the most natural way to express emotion, this work focuses on tracking emotion profile of elicited speech for short-term mood disorder identification. In this work, the Deep Scattering Spectrum (DSS) and Low Level Descriptors (LLDs) of the elicited speech signals are extracted as the speech features. The hierarchical spectral clustering (HSC) algorithm is employed to adapt the emotion database to the mood disorder database to alleviate the data bias problem. The denoising autoencoder is then used to extract the bottleneck features of DSS and LLDs for better representation. Based on the bottleneck features, a long short term memory (LSTM) is applied to generate the time-varying emotion profile sequence. Finally, given the emotion profile sequence, the HMM-based identification and verification model is used to determine mood disorder. This work collected the elicited emotional speech data from 15 BDs, 15 UDs and 15 healthy controls for system training and evaluation. Five-fold cross validation was employed for evaluation. Experimental results show that the system using the bottleneck feature achieved an identification accuracy of 73.33%, improving by 8.89%, compared to that without bottleneck features. Furthermore, the system with verification mechanism, improving by 4.44%, outperformed that without verification.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Mood disorder identification using deep bottleneck features of elicited speech\",\"authors\":\"Kun-Yi Huang, Chung-Hsien Wu, Ming-Hsiang Su, Chia-Hui Chou\",\"doi\":\"10.1109/APSIPA.2017.8282296\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the diagnosis of mental health disorder, a large portion of the Bipolar Disorder (BD) patients is likely to be misdiagnosed as Unipolar Depression (UD) on initial presentation. As speech is the most natural way to express emotion, this work focuses on tracking emotion profile of elicited speech for short-term mood disorder identification. In this work, the Deep Scattering Spectrum (DSS) and Low Level Descriptors (LLDs) of the elicited speech signals are extracted as the speech features. The hierarchical spectral clustering (HSC) algorithm is employed to adapt the emotion database to the mood disorder database to alleviate the data bias problem. The denoising autoencoder is then used to extract the bottleneck features of DSS and LLDs for better representation. Based on the bottleneck features, a long short term memory (LSTM) is applied to generate the time-varying emotion profile sequence. Finally, given the emotion profile sequence, the HMM-based identification and verification model is used to determine mood disorder. This work collected the elicited emotional speech data from 15 BDs, 15 UDs and 15 healthy controls for system training and evaluation. Five-fold cross validation was employed for evaluation. Experimental results show that the system using the bottleneck feature achieved an identification accuracy of 73.33%, improving by 8.89%, compared to that without bottleneck features. Furthermore, the system with verification mechanism, improving by 4.44%, outperformed that without verification.\",\"PeriodicalId\":142091,\"journal\":{\"name\":\"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/APSIPA.2017.8282296\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APSIPA.2017.8282296","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

在精神健康障碍的诊断中，很大一部分双相情感障碍(BD)患者在最初表现时很可能被误诊为单极抑郁症(UD)。由于言语是最自然的情绪表达方式，本研究的重点是跟踪诱发言语的情绪特征，用于短期情绪障碍的识别。在这项工作中，提取语音信号的深散射谱(DSS)和低电平描述符(LLDs)作为语音特征。采用层次谱聚类(HSC)算法将情绪数据库与情绪障碍数据库相适应，以缓解数据偏差问题。然后使用去噪自编码器提取DSS和lld的瓶颈特征，以便更好地表示。基于瓶颈特征，采用长短期记忆(LSTM)方法生成时变情绪剖面序列。最后，在给定情绪轮廓序列的情况下，采用基于hmm的识别验证模型对情绪障碍进行识别。本工作收集了15名bd、15名ud和15名健康对照者的情感语音诱发数据，用于系统训练和评估。采用五重交叉验证进行评价。实验结果表明，使用瓶颈特征的系统识别准确率为73.33%，比不使用瓶颈特征的系统提高了8.89%。有验证机制的系统比无验证的系统性能提高了4.44%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Mood disorder identification using deep bottleneck features of elicited speech

In the diagnosis of mental health disorder, a large portion of the Bipolar Disorder (BD) patients is likely to be misdiagnosed as Unipolar Depression (UD) on initial presentation. As speech is the most natural way to express emotion, this work focuses on tracking emotion profile of elicited speech for short-term mood disorder identification. In this work, the Deep Scattering Spectrum (DSS) and Low Level Descriptors (LLDs) of the elicited speech signals are extracted as the speech features. The hierarchical spectral clustering (HSC) algorithm is employed to adapt the emotion database to the mood disorder database to alleviate the data bias problem. The denoising autoencoder is then used to extract the bottleneck features of DSS and LLDs for better representation. Based on the bottleneck features, a long short term memory (LSTM) is applied to generate the time-varying emotion profile sequence. Finally, given the emotion profile sequence, the HMM-based identification and verification model is used to determine mood disorder. This work collected the elicited emotional speech data from 15 BDs, 15 UDs and 15 healthy controls for system training and evaluation. Five-fold cross validation was employed for evaluation. Experimental results show that the system using the bottleneck feature achieved an identification accuracy of 73.33%, improving by 8.89%, compared to that without bottleneck features. Furthermore, the system with verification mechanism, improving by 4.44%, outperformed that without verification.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

自引率

0.00%

发文量