{"title":"Contribution of Glottal Waveform in Speech Emotion: A Comparative Pairwise Investigation","authors":"Zhongzhe Xiao, Ying Chen, Zhi Tao","doi":"10.1109/PIC.2018.8706134","DOIUrl":null,"url":null,"abstract":"In this work, we investigated the contribution of the glottal waveform in human vocal emotion expressing. Seven emotional states including moderate and intense versions of three emotional families as anger, joy, and sadness, plus a neutral state are considered, with speech samples in Mandarin Chinese. The glottal waveform extracted from speech samples of different emotion states are first analyzed in both time domain and frequency domain to discover their differences. Comparative emotion classifications are then taken out based on features extracted from two sources: original whole speech signal, or only glottal wave signal. Two sets of experiments are performed, as the generation of a performance-driven hierarchical classifier architecture, and pairwise classification on individual emotional states. The low difference between accuracies obtained from the two sources proved that a majority of emotional cues in speech could be conveyed through glottal waveform. The best distinguishable emotional pair by glottal waveform is intense anger against moderate sadness, with the accuracy up to 92.45%. It is also concluded in this work that glottal waveform represent better valence cues than arousal cues of emotion.","PeriodicalId":236106,"journal":{"name":"2018 IEEE International Conference on Progress in Informatics and Computing (PIC)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Progress in Informatics and Computing (PIC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PIC.2018.8706134","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
In this work, we investigated the contribution of the glottal waveform to the expression of human vocal emotion. Seven emotional states are considered: moderate and intense versions of three emotional families (anger, joy, and sadness), plus a neutral state, with speech samples in Mandarin Chinese. The glottal waveforms extracted from speech samples of the different emotional states are first analyzed in both the time domain and the frequency domain to discover their differences. Comparative emotion classifications are then carried out based on features extracted from two sources: the original whole speech signal, or the glottal wave signal alone. Two sets of experiments are performed: the generation of a performance-driven hierarchical classifier architecture, and pairwise classification of individual emotional states. The small difference between the accuracies obtained from the two sources shows that a majority of the emotional cues in speech can be conveyed through the glottal waveform. The emotional pair best distinguished by the glottal waveform is intense anger versus moderate sadness, with an accuracy of up to 92.45%. It is also concluded that the glottal waveform carries valence cues of emotion better than arousal cues.
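The abstract does not name the glottal extraction algorithm used. The sketch below illustrates one standard family of approaches, LPC inverse filtering: an all-pole vocal-tract model is estimated, the speech is inverse-filtered to remove formant resonances, and the residual (an approximation of the glottal flow derivative) is integrated. The function name, sample rate, and LPC order are illustrative assumptions, not the authors' pipeline.

```python
# Minimal sketch of glottal waveform estimation by LPC inverse filtering.
# Assumptions: librosa and scipy installed; in practice this would be done
# frame-wise on voiced segments rather than over a whole utterance.
import librosa
import scipy.signal

def estimate_glottal_wave(path, lpc_order=16):
    """Rough glottal waveform estimate via LPC inverse filtering (sketch)."""
    y, sr = librosa.load(path, sr=16000)          # mono speech at 16 kHz
    a = librosa.lpc(y, order=lpc_order)           # all-pole vocal-tract model
    residual = scipy.signal.lfilter(a, [1.0], y)  # inverse filter: removes formants
    # The residual approximates the glottal flow derivative; a leaky
    # integrator recovers a drift-free glottal flow estimate.
    glottal = scipy.signal.lfilter([1.0], [1.0, -0.99], residual)
    return glottal, sr
```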
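For the pairwise comparison itself, a hedged sketch is given below: the same classifier is trained on features from each of the two sources for one emotion pair (e.g., intense anger versus moderate sadness), and cross-validated accuracies are compared. The SVM choice, feature matrices, and labels are placeholders standing in for the paper's unspecified classifier and feature set.

```python
# Sketch: compare pairwise classification accuracy from two feature sources.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def pairwise_accuracy(X, y, folds=5):
    """Cross-validated accuracy of an SVM on a two-class emotion problem."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return cross_val_score(clf, X, y, cv=folds).mean()

# X_speech / X_glottal: hypothetical feature matrices extracted from the whole
# speech signal and from the glottal wave alone (shape [n_samples, n_features]);
# y: binary labels for the emotion pair under test. Random data stands in here.
rng = np.random.default_rng(0)
X_speech  = rng.normal(size=(60, 40))
X_glottal = rng.normal(size=(60, 40))
y = rng.integers(0, 2, size=60)

print("speech features :", pairwise_accuracy(X_speech, y))
print("glottal features:", pairwise_accuracy(X_glottal, y))
```

A small gap between the two printed accuracies, on real features, would mirror the paper's finding that most emotional cues survive in the glottal waveform alone.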