SPOKEN-DIGIT CLASSIFICATION USING ARTIFICIAL NEURAL NETWORK

Aunhel John M. Adoptante, A. M. Baes, John Carlo A. Catilo, Patrick Kendrex L. Lucero, Anton Louise P. De Ocampo, Alvin S. Alon, Rhowel M. Dellosa

ASEAN Engineering Journal (Q4, Earth and Planetary Sciences)
DOI: 10.11113/aej.v13.18388
Published: 2023-02-28
Citations: 1

Abstract

Audio classification is one of the most popular applications of Artificial Neural Networks. It sits at the center of modern AI technology such as virtual assistants, automatic speech recognition, and text-to-speech systems. There have been studies of spoken-digit classification and its applications; however, to the best of the authors' knowledge, very few works on English spoken-digit recognition using ANN classification have been published. In this study, the authors used Mel-Frequency Cepstral Coefficient (MFCC) features of the audio recordings and an Artificial Neural Network (ANN) classifier to recognize the digit spoken by the speaker. The Audio MNIST dataset was used for training and test data, while the Free-Spoken Digit Dataset served as additional validation data. The model achieved an F1 score of 99.56% on the test data and 81.92% on the validation data.
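As a rough illustration of the front end described above, the sketch below computes MFCC features from a waveform using only numpy. The frame size, hop, filter count, and sample rate are illustrative assumptions, not the authors' actual configuration; in practice a library such as librosa would typically be used.

```python
import numpy as np

def mfcc(signal, sr=8000, n_fft=256, hop=128, n_mels=20, n_mfcc=13):
    """Minimal MFCC: frame -> window -> power spectrum -> mel filterbank -> log -> DCT."""
    # Slice the signal into overlapping frames and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hanning(n_fft)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2          # (n_frames, n_fft//2 + 1)
    # Triangular mel filterbank, spaced evenly on the mel scale
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    log_mel = np.log(power @ fb.T + 1e-10)
    # DCT-II decorrelates the log-mel energies; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T                                    # (n_frames, n_mfcc)

# One second of a 440 Hz tone stands in for a spoken-digit recording
sr = 8000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440 * t), sr=sr)
print(feats.shape)  # (61, 13)
```

The resulting per-frame coefficient matrix (or a flattened/pooled version of it) is the kind of feature vector an ANN classifier would then map to one of the ten digit classes.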
Source journal: ASEAN Engineering Journal (Engineering, all)
CiteScore: 0.60 · Self-citation rate: 0.00% · Articles published: 75