Aunhel John M. Adoptante, A. M. Baes, John Carlo A. Catilo, Patrick Kendrex L. Lucero, Anton Louise P. De Ocampo, Alvin S. Alon, Rhowel M. Dellosa
Title: SPOKEN-DIGIT CLASSIFICATION USING ARTIFICIAL NEURAL NETWORK
Journal: ASEAN Engineering Journal
DOI: 10.11113/aej.v13.18388 (https://doi.org/10.11113/aej.v13.18388)
Publication date: 2023-02-28 (Journal Article)
Citations: 1
Abstract
Audio classification is one of the most popular applications of Artificial Neural Networks (ANNs). It is at the center of modern AI technology such as virtual assistants, automatic speech recognition, and text-to-speech applications. There have been studies on spoken-digit classification and its applications; however, to the best of the authors' knowledge, very few works on English spoken-digit recognition have implemented ANN classification. In this study, the authors used Mel-Frequency Cepstral Coefficient (MFCC) features of the audio recordings and an Artificial Neural Network (ANN) classifier to recognize the digit spoken by the speaker. The Audio MNIST dataset was used for training and testing, while the Free Spoken Digit Dataset was used as additional validation data. The model achieved an F1 score of 99.56% on the test data and an F1 score of 81.92% on the validation data.
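The pipeline the abstract describes, MFCC feature extraction followed by an ANN classifier over the ten digit classes, can be sketched roughly as below. This is a minimal NumPy-only illustration, not the authors' implementation: the filterbank size, number of coefficients, network width, and the synthetic input signal are all illustrative assumptions, and the weights are random rather than trained.

```python
# Hypothetical sketch of an MFCC -> ANN digit-classification pipeline.
# NumPy only; all sizes and the input signal are illustrative assumptions.
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    # Mel-spaced triangular filters over the positive FFT bins.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(mel(0), mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def mfcc(signal, sr=8000, n_fft=512, n_filters=26, n_coeffs=13):
    # Single-frame MFCC: windowed power spectrum -> mel energies -> log -> DCT-II.
    frame = signal[:n_fft] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2 / n_fft
    log_e = np.log(mel_filterbank(n_filters, n_fft, sr) @ power + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), (2 * n + 1) / (2 * n_filters)))
    return dct @ log_e

def ann_forward(x, w1, b1, w2, b2):
    # One hidden ReLU layer, softmax output over the 10 digit classes.
    h = np.maximum(0, w1 @ x + b1)
    z = w2 @ h + b2
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
t = np.arange(8000) / 8000
sig = np.sin(2 * np.pi * 440 * t)          # stand-in for a spoken-digit clip
feat = mfcc(sig)                            # 13 MFCCs for one frame
w1, b1 = rng.normal(size=(32, 13)), np.zeros(32)   # untrained, random weights
w2, b2 = rng.normal(size=(10, 32)), np.zeros(10)
probs = ann_forward(feat, w1, b1, w2, b2)
print(probs.shape)                          # (10,) — one probability per digit
```

In a real system the MFCCs would be computed over many overlapping frames and the network trained with backpropagation on labeled recordings such as those in Audio MNIST; this sketch only shows the shape of the forward computation.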