Classification of Speech Signal based on Feature Fusion in Time and Frequency Domain
Authors: Domy Kristomo, Fx Henry Nugroho
Published in: 2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), 2021-12-16
DOI: 10.1109/ISRITI54043.2021.9702870 (https://doi.org/10.1109/ISRITI54043.2021.9702870)
The design of a speech recognition system requires a reliable feature extraction process. Feature extraction plays an essential role, since good features help improve the classification rate. Classifying stop consonants remains a challenging task because several factors influence classification accuracy. Words formed by stop consonant syllables have not been widely studied by previous local researchers. Feature fusion is one way to improve the performance of a pattern recognition and classification system. In this paper, we propose three fused feature sets for classifying stop consonant word speech signals. Each set combines statistical features with one of the following: the Discrete Wavelet Transform (DWT) at 7th-level decomposition with Daubechies2, the Wavelet Packet Transform (WPT) at 4th-level decomposition with Daubechies2, or Autoregressive Power Spectral Density (AR-PSD). According to the experimental results, the classification accuracies for WPT + Statistical, DWT + Statistical, and AR-PSD + Statistical were 94.72%, 92.22%, and 76.38%, respectively.
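The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which quantities are computed from the wavelet coefficients, which statistics are used, or how the signal boundaries are handled. The sketch assumes sub-band energies as wavelet features, mean, standard deviation, minimum, and maximum as the statistical features, periodic signal extension, and fusion by simple concatenation. All function names are hypothetical, the input is a synthetic signal rather than speech, and the AR-PSD branch and the classifier are omitted.

```python
import math
import random

# Daubechies-2 (four-tap) orthonormal analysis filters.
_S3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)
LOW = [(1 + _S3) / _NORM, (3 + _S3) / _NORM, (3 - _S3) / _NORM, (1 - _S3) / _NORM]
HIGH = [LOW[3], -LOW[2], LOW[1], -LOW[0]]  # quadrature mirror of LOW


def dwt_step(x):
    """One analysis step with periodic extension (assumption); len(x) must be even."""
    n = len(x)
    approx, detail = [], []
    for i in range(0, n, 2):
        approx.append(sum(LOW[k] * x[(i + k) % n] for k in range(4)))
        detail.append(sum(HIGH[k] * x[(i + k) % n] for k in range(4)))
    return approx, detail


def dwt_bands(x, level):
    """DWT: split only the approximation branch, giving level + 1 sub-bands."""
    bands, approx = [], list(x)
    for _ in range(level):
        approx, detail = dwt_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def wpt_bands(x, level):
    """WPT: split every node at each level, giving 2**level sub-bands."""
    nodes = [list(x)]
    for _ in range(level):
        nodes = [half for node in nodes for half in dwt_step(node)]
    return nodes


def band_energies(bands):
    """Energy (sum of squared coefficients) per sub-band; an assumed feature choice."""
    return [sum(c * c for c in band) for band in bands]


def statistical_features(x):
    """Mean, population standard deviation, minimum, maximum; an assumed feature choice."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [mean, std, min(x), max(x)]


def fuse(*feature_sets):
    """Feature fusion by concatenation (assumption)."""
    return [v for fs in feature_sets for v in fs]


# Synthetic stand-in for a speech frame; the length is a multiple of 2**7.
random.seed(0)
signal = [math.sin(2 * math.pi * 5 * t / 1024) + 0.1 * random.gauss(0, 1)
          for t in range(1024)]

stats = statistical_features(signal)
dwt_stat = fuse(band_energies(dwt_bands(signal, 7)), stats)  # 8 energies + 4 statistics
wpt_stat = fuse(band_energies(wpt_bands(signal, 4)), stats)  # 16 energies + 4 statistics
print(len(dwt_stat), len(wpt_stat))
```

In this sketch the 7-level DWT yields 8 sub-bands and the 4-level WPT yields 16, so the fused vectors have 12 and 20 entries. A library such as PyWavelets would normally replace the hand-written filter bank; it is written out here only to keep the example dependency-free.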