Towards improving Disfluency Detection from Speech using Shifted Delta Cepstral Coefficients
Utkarsh Mehrotra, Sparsh Garg, K. Gurugubelli, A. Vuppala
Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing, 2022-08-04
DOI: 10.1145/3549206.3549269 (https://doi.org/10.1145/3549206.3549269)
Abstract
In this paper, we investigate the use of shifted delta cepstral (SDC) coefficients for detecting disfluencies in two types of speech: stuttered speech and spontaneous lecture-mode speech. SDC features capture temporal variations in the speech signal across several frames. The UCLASS stuttered speech dataset and the IIITH-IED dataset are used to develop frame-level automatic disfluency detection systems for four types of disfluencies, and the effect of SDC features on the detection of each disfluency type is examined using MFCC and SFFCC cepstral representations. Overall, MFCC+SDC features give an absolute improvement of 2.98% for stuttered speech and 6.02% for spontaneous speech over static MFCC features, while SFFCC+SDC features give an absolute improvement of 4.62% for stuttered speech and 7.28% for spontaneous speech over static SFFCC features. These results show the importance of considering temporal variations for disfluency detection.
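
To illustrate how SDC features span several frames, the sketch below computes them from a matrix of static cepstra and stacks them onto the static features. The abstract does not state the SDC configuration, so the delta spread `d=1`, block shift `p=3`, and block count `k=7` are assumptions, as are the function name `shifted_delta_cepstra` and the edge-padding at utterance boundaries. It is not the authors' implementation.

```python
import numpy as np


def shifted_delta_cepstra(cepstra, d=1, p=3, k=7):
    """Compute SDC features from static cepstra of shape (T, N).

    For frame t and block i in 0..k-1, the delta is
        c(t + i*p + d) - c(t + i*p - d),
    and the k delta blocks are concatenated, giving shape (T, N*k).
    d, p and k are assumed values, not taken from the paper.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    num_frames = cepstra.shape[0]

    # Assumption: repeat the first/last frame so that every shifted
    # index stays inside the padded array.
    pad_before = d
    pad_after = (k - 1) * p + d
    padded = np.pad(cepstra, ((pad_before, pad_after), (0, 0)), mode="edge")

    blocks = []
    for i in range(k):
        # Original frame t lives at padded index t + pad_before.
        ahead_start = pad_before + i * p + d
        behind_start = pad_before + i * p - d
        ahead = padded[ahead_start:ahead_start + num_frames]
        behind = padded[behind_start:behind_start + num_frames]
        blocks.append(ahead - behind)

    return np.hstack(blocks)


# Toy input: 40 frames of 13 coefficients, each equal to the frame index.
static = np.tile(np.arange(40, dtype=float)[:, None], (1, 13))
sdc = shifted_delta_cepstra(static)

# Static + SDC stacking, analogous to the MFCC+SDC or SFFCC+SDC features.
combined = np.hstack([static, sdc])
```

With these assumed settings, each frame's SDC vector draws on frames from `t - 1` up to `t + 19`, which is the multi-frame temporal context that static cepstra alone do not carry.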