Repetition Detection using Spectral Parameters and Multi tapering features
Drakshayini K B, Anusuya M A
Indian Journal of Computer Science and Engineering, published 2023-08-20
DOI: 10.21817/indjcse/2023/v14i4/231404068 (https://doi.org/10.21817/indjcse/2023/v14i4/231404068)
Abstract
Handling disfluent speech is a challenging task, and identifying and removing repetitions at the pre-processing stage is tedious. Many speech applications, such as speech-to-text alignment and voice-based interactive systems, face these hurdles when an automatic disfluent speech recognition system is designed. The task is difficult because a speaker may repeat words only partially or omit words in between. Spectral parameters such as energy, entropy, zero-crossing rate, and centroid are used to detect repetitions. Similarity scores between phonemes and syllables are computed using dynamic time warping (DTW) and polynomial curve fitting (PCF). Features of the reconstructed speech signal are extracted using the SWEC multi-tapering window in the MFCC procedure. The extracted features are modelled using an SVM, yielding a recognition accuracy of 85% and an automatic repetition detection accuracy of 78.04%.
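The abstract names the spectral parameters and DTW but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It computes per-frame energy, spectral entropy, zero-crossing rate, and spectral centroid with NumPy, and compares two feature sequences with a textbook DTW. The function names, the 400-sample frame, the 160-sample hop, the Hann window, and the Euclidean local cost are all illustrative choices that do not come from the paper. The PCF, multi-taper MFCC, and SVM stages are not shown.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def spectral_parameters(x, sr, frame_len=400, hop=160):
    """Per-frame [energy, spectral entropy, ZCR, spectral centroid].

    frame_len and hop are assumed values (25 ms / 10 ms at 16 kHz).
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    # Power spectrum of Hann-windowed frames.
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    total = power.sum(axis=1) + 1e-12
    # Spectral entropy: Shannon entropy of the normalised power spectrum.
    p = power / total[:, None]
    entropy = -np.sum(p * np.log2(p + 1e-12), axis=1)
    # Spectral centroid: power-weighted mean frequency in Hz.
    centroid = (power @ freqs) / total
    return np.column_stack([energy, entropy, zcr, centroid])

def dtw_distance(a, b):
    """Textbook O(n*m) dynamic time warping with Euclidean local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]
```

In use, two candidate segments of an utterance would each be passed through `spectral_parameters`, and a small `dtw_distance` between the resulting feature sequences would suggest that one segment repeats the other. Any decision threshold would have to be tuned on data; the abstract does not state one.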