发音约束学习在语音情感识别中的应用。

IF 1.9 3区计算机科学 Q2 ACOUSTICS

Eurasip Journal on Audio Speech and Music Processing Pub Date : 2019-01-01 Epub Date: 2019-08-20 DOI:10.1186/s13636-019-0157-9

Mohit Shah, Ming Tu, Visar Berisha, Chaitali Chakrabarti, Andreas Spanias

{"title":"发音约束学习在语音情感识别中的应用。","authors":"Mohit Shah, Ming Tu, Visar Berisha, Chaitali Chakrabarti, Andreas Spanias","doi":"10.1186/s13636-019-0157-9","DOIUrl":null,"url":null,"abstract":"Speech emotion recognition methods combining articulatory information with acoustic features have been previously shown to improve recognition performance. Collection of articulatory data on a large scale may not be feasible in many scenarios, thus restricting the scope and applicability of such methods. In this paper, a discriminative learning method for emotion recognition using both articulatory and acoustic information is proposed. A traditional ℓ 1-regularized logistic regression cost function is extended to include additional constraints that enforce the model to reconstruct articulatory data. This leads to sparse and interpretable representations jointly optimized for both tasks simultaneously. Furthermore, the model only requires articulatory features during training; only speech features are required for inference on out-of-sample data. Experiments are conducted to evaluate emotion recognition performance over vowels /AA/,/AE/,/IY/,/UW/ and complete utterances. Incorporating articulatory information is shown to significantly improve the performance for valence-based classification. Results obtained for within-corpus and cross-corpus categorical emotion recognition indicate that the proposed method is more effective at distinguishing happiness from other emotions.","PeriodicalId":49202,"journal":{"name":"Eurasip Journal on Audio Speech and Music Processing","volume":"2019 ","pages":""},"PeriodicalIF":1.9000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13636-019-0157-9","citationCount":"9","resultStr":"{\"title\":\"Articulation constrained learning with application to speech emotion recognition.\",\"authors\":\"Mohit Shah, Ming Tu, Visar Berisha, Chaitali Chakrabarti, Andreas Spanias\",\"doi\":\"10.1186/s13636-019-0157-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Speech emotion recognition methods combining articulatory information with acoustic features have been previously shown to improve recognition performance. Collection of articulatory data on a large scale may not be feasible in many scenarios, thus restricting the scope and applicability of such methods. In this paper, a discriminative learning method for emotion recognition using both articulatory and acoustic information is proposed. A traditional ℓ 1-regularized logistic regression cost function is extended to include additional constraints that enforce the model to reconstruct articulatory data. This leads to sparse and interpretable representations jointly optimized for both tasks simultaneously. Furthermore, the model only requires articulatory features during training; only speech features are required for inference on out-of-sample data. Experiments are conducted to evaluate emotion recognition performance over vowels /AA/,/AE/,/IY/,/UW/ and complete utterances. Incorporating articulatory information is shown to significantly improve the performance for valence-based classification. Results obtained for within-corpus and cross-corpus categorical emotion recognition indicate that the proposed method is more effective at distinguishing happiness from other emotions.\",\"PeriodicalId\":49202,\"journal\":{\"name\":\"Eurasip Journal on Audio Speech and Music Processing\",\"volume\":\"2019 \",\"pages\":\"\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2019-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1186/s13636-019-0157-9\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Eurasip Journal on Audio Speech and Music Processing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1186/s13636-019-0157-9\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2019/8/20 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"ACOUSTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Eurasip Journal on Audio Speech and Music Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s13636-019-0157-9","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2019/8/20 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"ACOUSTICS","Score":null,"Total":0}

引用次数: 9

摘要

结合发音信息和声学特征的语音情感识别方法已被证明可以提高识别性能。在许多情况下，大规模收集发音数据可能是不可行的，从而限制了这些方法的范围和适用性。本文提出了一种基于语音和发音信息的情感识别判别学习方法。将传统的1-正则化逻辑回归代价函数扩展到包含附加约束，以强制模型重构铰接数据。这导致同时为两个任务联合优化稀疏和可解释的表示。此外，该模型在训练过程中只需要发音特征;对样本外数据的推断只需要语音特征。实验评估了对元音/AA/、/AE/、/IY/、/UW/和完整语音的情绪识别性能。结合发音信息可以显着提高基于值的分类的性能。在语料库内和跨语料库的分类情绪识别结果表明，该方法在区分快乐和其他情绪方面更有效。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Articulation constrained learning with application to speech emotion recognition.

查看原文本刊更多论文

Articulation constrained learning with application to speech emotion recognition.

Speech emotion recognition methods combining articulatory information with acoustic features have been previously shown to improve recognition performance. Collection of articulatory data on a large scale may not be feasible in many scenarios, thus restricting the scope and applicability of such methods. In this paper, a discriminative learning method for emotion recognition using both articulatory and acoustic information is proposed. A traditional ℓ ₁-regularized logistic regression cost function is extended to include additional constraints that enforce the model to reconstruct articulatory data. This leads to sparse and interpretable representations jointly optimized for both tasks simultaneously. Furthermore, the model only requires articulatory features during training; only speech features are required for inference on out-of-sample data. Experiments are conducted to evaluate emotion recognition performance over vowels /AA/,/AE/,/IY/,/UW/ and complete utterances. Incorporating articulatory information is shown to significantly improve the performance for valence-based classification. Results obtained for within-corpus and cross-corpus categorical emotion recognition indicate that the proposed method is more effective at distinguishing happiness from other emotions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Eurasip Journal on Audio Speech and Music Processing ACOUSTICS-ENGINEERING, ELECTRICAL & ELECTRONIC

CiteScore

4.10

自引率

4.20%

发文量

审稿时长

12 months

期刊介绍： The aim of “EURASIP Journal on Audio, Speech, and Music Processing” is to bring together researchers, scientists and engineers working on the theory and applications of the processing of various audio signals, with a specific focus on speech and music. EURASIP Journal on Audio, Speech, and Music Processing will be an interdisciplinary journal for the dissemination of all basic and applied aspects of speech communication and audio processes.