Analysis of emotional speech using an adaptive sinusoidal model
George P. Kafentzis, Theodora Yakoumaki, A. Mouchtaris, Y. Stylianou
2014 22nd European Signal Processing Conference (EUSIPCO), 2014-11-13. DOI: 10.5281/ZENODO.44181
Citations: 7
Abstract
Processing of emotional (or expressive) speech has gained attention in the speech community over recent years due to its numerous applications. In this paper, an adaptive sinusoidal model (aSM), dubbed the extended adaptive Quasi-Harmonic Model (eaQHM), is employed to analyze emotional speech into accurate, robust, continuous, time-varying parameters (amplitude, frequency, and phase). It is shown that these parameters can adequately and accurately represent emotional speech content. Using a well-known database of narrowband expressive speech (SUSAS), we show that very high Signal-to-Reconstruction-Error Ratio (SRER) values can be obtained compared to the standard sinusoidal model (SM). Formal listening tests on a smaller wideband speech database show that eaQHM outperforms the SM in terms of perceptual resynthesis quality. Finally, preliminary emotion classification tests show that the parameters obtained from the adaptive model lead to higher classification scores than the standard SM parameters.
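As a rough illustration of the two quantities central to the abstract, the sketch below resynthesizes a signal from time-varying sinusoidal parameters and computes the SRER. It is a minimal NumPy sketch under assumptions not stated in the abstract: the array layout of the parameters, the helper names `resynthesize` and `srer_db`, and the common SRER definition (20·log10 of the RMS of the original over the RMS of the reconstruction error) are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def resynthesize(amps: np.ndarray, phases: np.ndarray) -> np.ndarray:
    """Resynthesize speech from time-varying sinusoidal parameters.

    amps, phases: arrays of shape (K, N) holding the instantaneous amplitude
    and phase of K sinusoidal components at each of N samples, as produced by
    an aSM such as eaQHM (hypothetical layout, for illustration only).
    """
    return np.sum(amps * np.cos(phases), axis=0)

def srer_db(x: np.ndarray, x_hat: np.ndarray) -> float:
    """Signal-to-Reconstruction-Error Ratio in dB.

    Assumes the common definition SRER = 20*log10(std(x) / std(x - x_hat)),
    i.e., the RMS of the original signal over the RMS of the reconstruction
    error; the paper may use a slightly different normalization.
    """
    return 20.0 * np.log10(np.std(x) / np.std(x - x_hat))

# Hypothetical usage: x is an emotional speech utterance, (amps, phases) its
# eaQHM (or SM) analysis; a higher SRER indicates a more accurate reconstruction.
# x_hat = resynthesize(amps, phases)
# print(f"SRER = {srer_db(x, x_hat):.2f} dB")
```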