Towards improving Disfluency Detection from Speech using Shifted Delta Cepstral Coefficients

Utkarsh Mehrotra, Sparsh Garg, K. Gurugubelli, A. Vuppala
Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing. Published 2022-08-04. DOI: https://doi.org/10.1145/3549206.3549269

Abstract

In this paper, we investigate the use of shifted delta cepstral (SDC) coefficients for detecting disfluencies in two types of speech: stuttered speech and spontaneous lecture-mode speech. SDC features capture temporal variations in the speech signal across several frames. We use the UCLASS stuttered speech dataset and the IIITH-IED dataset to develop frame-level automatic disfluency detection systems for four types of disfluencies, and we examine the effect of SDC features on the detection of each disfluency type using MFCC and SFFCC cepstral representations. Overall, MFCC+SDC features give an absolute improvement of 2.98% for stuttered speech and 6.02% for spontaneous speech disfluency detection over static MFCC features, while SFFCC+SDC features give an absolute improvement of 4.62% for stutter disfluencies and 7.28% for spontaneous speech disfluencies over static SFFCC features. These results show the importance of considering temporal variations for disfluency detection.
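To make the idea of SDC features concrete, the sketch below follows the commonly used N-d-P-k formulation, in which each frame t receives k stacked delta vectors c(t + iP + d) - c(t + iP - d) for i = 0, ..., k-1. The abstract does not state the parameter values or the edge handling used in the paper, so the defaults here (d=1, P=3, k=7), the clamping at utterance boundaries, the function name `shifted_delta_cepstra`, and the reading of "MFCC+SDC" as frame-wise concatenation are all assumptions made for illustration.

```python
from typing import List, Sequence


def shifted_delta_cepstra(
    cepstra: Sequence[Sequence[float]],
    d: int = 1,
    p: int = 3,
    k: int = 7,
) -> List[List[float]]:
    """Return k stacked delta vectors per frame, taken p frames apart.

    cepstra holds one sequence of N cepstral coefficients per frame
    (for example MFCCs). Parameter defaults are illustrative only.
    """
    num_frames = len(cepstra)
    if num_frames == 0:
        return []

    def frame(index: int) -> Sequence[float]:
        # Assumption: indices past either end reuse the nearest edge frame.
        return cepstra[min(max(index, 0), num_frames - 1)]

    features = []
    for t in range(num_frames):
        stacked: List[float] = []
        for i in range(k):
            ahead = frame(t + i * p + d)
            behind = frame(t + i * p - d)
            # Delta over a span of 2*d frames, shifted i*p frames forward.
            stacked.extend(a - b for a, b in zip(ahead, behind))
        features.append(stacked)
    return features


# Toy input: 10 frames with 2 coefficients each, rising linearly in time.
toy = [[float(t), 2.0 * t] for t in range(10)]
sdc = shifted_delta_cepstra(toy, d=1, p=3, k=2)

# Assumed meaning of "static + SDC": append the SDC vector to each frame.
static_plus_sdc = [list(c) + s for c, s in zip(toy, sdc)]
```

With N coefficients per frame, each SDC vector has N*k values, so one frame summarizes changes over roughly (k-1)*P + 2d + 1 neighbouring frames. That wider temporal context is what a static cepstral vector lacks.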