Classification of Speech Signal based on Feature Fusion in Time and Frequency Domain

2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI). Pub Date: 2021-12-16. DOI: 10.1109/ISRITI54043.2021.9702870

Domy Kristomo, Fx Henry Nugroho

Citations: 0

Abstract

The design of a speech recognition system requires a reliable feature extraction process, because good features help to improve the classification rate. Classifying stop consonants remains a challenging task, since several factors influence classification accuracy, and words formed by stop consonant syllables have not been widely studied by previous local researchers. Feature fusion is one way to improve the performance of pattern recognition and classification systems. In this paper, we propose three fused feature sets for classifying speech signals of stop consonant words. Each set combines statistical features with one of the following: the Discrete Wavelet Transform (DWT) at the 7th decomposition level with the Daubechies2 wavelet, the Wavelet Packet Transform (WPT) at the 4th decomposition level with Daubechies2, or the Autoregressive Power Spectral Density (AR-PSD). According to the experimental results, the classification accuracies for WPT + Statistical, DWT + Statistical, and AR-PSD + Statistical were 94.72%, 92.22%, and 76.38%, respectively.
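
The abstract names the building blocks but not the exact feature definitions, so the Python sketch below is an illustration under stated assumptions, not the authors' implementation. It computes a 7-level Daubechies2 (db2) DWT and a 4-level db2 WPT in pure Python with periodic boundary extension (libraries such as PyWavelets default to other extension modes, so coefficients near the signal edges would differ). It assumes the wavelet feature is the relative energy of each subband, assumes the "Statistical" set is the mean, standard deviation, skewness and excess kurtosis of the raw signal, and fuses the two by simple concatenation. AR-PSD features are not covered, and every function name here is hypothetical.

```python
import math

# Daubechies2 (db2) scaling (low-pass) filter h[n] and the matching
# wavelet (high-pass) filter g[n] = (-1)^n * h[3 - n] (quadrature mirror).
_R2, _R3 = math.sqrt(2.0), math.sqrt(3.0)
H = [(1 + _R3) / (4 * _R2), (3 + _R3) / (4 * _R2),
     (3 - _R3) / (4 * _R2), (1 - _R3) / (4 * _R2)]
G = [(-1) ** n * H[3 - n] for n in range(4)]


def analysis_step(x):
    """One db2 filter-bank level with periodic extension (assumes even len(x)).

    Returns (approximation, detail), each of length len(x) // 2.
    """
    n = len(x)
    approx = [sum(H[i] * x[(2 * k + i) % n] for i in range(4)) for k in range(n // 2)]
    detail = [sum(G[i] * x[(2 * k + i) % n] for i in range(4)) for k in range(n // 2)]
    return approx, detail


def pad_to_multiple(x, m):
    """Zero-pad x so its length is a multiple of m (zeros add no energy)."""
    return list(x) + [0.0] * ((m - len(x) % m) % m)


def dwt_subbands(x, level=7):
    """Hypothetical helper: dyadic DWT returning [d1, ..., d_level, a_level]."""
    a = pad_to_multiple(x, 2 ** level)
    bands = []
    for _ in range(level):
        a, d = analysis_step(a)  # only the approximation is split further
        bands.append(d)
    bands.append(a)
    return bands


def wpt_subbands(x, level=4):
    """Hypothetical helper: wavelet packet transform splitting every node.

    Returns 2**level terminal nodes in natural (not frequency) order.
    """
    nodes = [pad_to_multiple(x, 2 ** level)]
    for _ in range(level):
        nxt = []
        for node in nodes:
            a, d = analysis_step(node)  # both branches are split further
            nxt.extend([a, d])
        nodes = nxt
    return nodes


def relative_energies(bands):
    """Energy of each subband divided by the total energy across subbands."""
    energies = [sum(c * c for c in b) for b in bands]
    total = sum(energies) or 1.0  # avoid dividing by zero for a silent signal
    return [e / total for e in energies]


def statistical_features(x):
    """Assumed statistical set: mean, std, skewness, excess kurtosis."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [mean, 0.0, 0.0, 0.0]
    skew = sum((v - mean) ** 3 for v in x) / (n * std ** 3)
    kurt = sum((v - mean) ** 4 for v in x) / (n * std ** 4) - 3.0
    return [mean, std, skew, kurt]


def fused_features(x, method="wpt"):
    """Feature-level fusion by concatenation: wavelet energies + statistics."""
    if method == "dwt":
        bands = dwt_subbands(x, level=7)
    elif method == "wpt":
        bands = wpt_subbands(x, level=4)
    else:
        raise ValueError("method must be 'dwt' or 'wpt'")
    return relative_energies(bands) + statistical_features(x)


if __name__ == "__main__":
    # Synthetic stand-in for a speech segment: a decaying 500 Hz burst at 8 kHz.
    fs = 8000
    sig = [math.exp(-t / 400) * math.sin(2 * math.pi * 500 * t / fs) for t in range(2048)]
    print(len(fused_features(sig, "dwt")))  # 8 subband energies + 4 statistics = 12
    print(len(fused_features(sig, "wpt")))  # 16 node energies + 4 statistics = 20
```

Because the periodized db2 transform is orthogonal, the subband energies sum to the energy of the (zero-padded) signal, which makes relative energies a convenient normalized feature vector. The fused vectors would then be passed to a classifier; the abstract does not say which one, so none is assumed here.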
