Real-Time Indonesian Language Speech Recognition with MFCC Algorithms and Python-Based SVM

Wening Mustikarini, Risanuri Hidayat, Agus Bejo
IJITEE (International Journal of Information Technology and Electrical Engineering), published 2019-10-29. DOI: 10.22146/IJITEE.49426 (https://doi.org/10.22146/IJITEE.49426)
Citations: 4

Abstract

Automatic Speech Recognition (ASR) is a technology that uses machines to process and recognize the human voice. One way to increase the recognition rate is to use a model of the language to be recognized. This paper introduces a speech recognition application that recognizes the Indonesian words "atas" (up), "bawah" (down), "kanan" (right), and "kiri" (left). The research used 400 speech samples: 75 samples per word as training data and 25 samples per word as test data. The system was designed with 13 Mel Frequency Cepstral Coefficients (MFCC) as features and a Support Vector Machine (SVM) as the classifier. It was tested with linear and RBF kernels, various cost values, and three sample sizes (n = 25, 50, 75). The best average accuracy was obtained from the SVM with a linear kernel, a cost value of 100, and a data set consisting of 75 samples from each class. In the training phase, the system achieved an F1-score (the harmonic mean of precision and recall, which balances the two) of 80% for the word "atas", 86% for "bawah", 81% for "kanan", and 100% for "kiri". In the testing phase, which used 25 new samples per class, the F1-score was 76% for the "atas" class, 54% for the "bawah" class, 44% for the "kanan" class, and 100% for the "kiri" class.
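The per-class F1-scores reported above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `per_class_f1` and the eight example labels are assumptions made up for illustration, and the numbers it prints have no relation to the results in the paper. It only shows how precision, recall, and F1 are derived for each of the four word classes from true and predicted labels.

```python
from typing import Dict, List

# The four word classes named in the abstract.
LABELS = ["atas", "bawah", "kanan", "kiri"]


def per_class_f1(y_true: List[str], y_pred: List[str],
                 labels: List[str]) -> Dict[str, float]:
    """Return the F1-score of each class (hypothetical helper)."""
    scores = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        # Count true positives, false positives, and false negatives.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Made-up labels for illustration only; NOT the paper's data.
y_true = ["atas", "atas", "bawah", "bawah", "kanan", "kanan", "kiri", "kiri"]
y_pred = ["atas", "bawah", "bawah", "bawah", "kanan", "atas", "kiri", "kiri"]

scores = per_class_f1(y_true, y_pred, LABELS)
for label in LABELS:
    print(f"{label}: F1 = {scores[label]:.2f}")
```

Because F1 is a harmonic mean, a class scores high only when both precision and recall are high. This is why a class that is often confused with another, as "bawah" and "kanan" appear to be in the test results, can score far lower than a class that is rarely confused, such as "kiri".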