Real-Time Indonesian Language Speech Recognition with MFCC Algorithms and Python-Based SVM

IJITEE (International Journal of Information Technology and Electrical Engineering). Published: 2019-10-29. DOI: 10.22146/IJITEE.49426

Wening Mustikarini, Risanuri Hidayat, Agus Bejo


Citations: 4

Abstract

Automatic Speech Recognition (ASR) is a technology that uses machines to process and recognize human speech. One way to increase the recognition rate is to use a model of the specific language to be recognized. In this paper, a speech recognition application is introduced that recognizes the Indonesian words "atas" (up), "bawah" (down), "kanan" (right), and "kiri" (left). This research used 400 speech samples: 75 samples of each word for training and 25 samples of each word for testing. The system uses 13 Mel Frequency Cepstral Coefficients (MFCCs) as features and a Support Vector Machine (SVM) as the classifier. The system was tested with linear and RBF kernels, various cost values, and three sample sizes (n = 25, 50, 75). The best average accuracy was obtained from an SVM with a linear kernel, a cost value of 100, and a data set of 75 samples per class. During the training phase, the system achieved an F1-score (the harmonic mean of precision and recall) of 80% for the word "atas", 86% for "bawah", 81% for "kanan", and 100% for "kiri". In the testing phase, using 25 new samples per class, the F1-score was 76% for the "atas" class, 54% for "bawah", 44% for "kanan", and 100% for "kiri".
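The results above are reported as per-class F1-scores. For each word, precision P is the fraction of utterances predicted as that word that really are that word, recall R is the fraction of utterances of that word that were recognized, and F1 = 2PR / (P + R). The following is a minimal sketch of that computation using only the Python standard library; the function name `per_class_f1` and the label lists are hypothetical illustrations, not the authors' code or data. Libraries such as scikit-learn offer `classification_report` for the same per-class figures, but the abstract does not say which library the authors used.

```python
from collections import Counter

# The four command words classified in the paper.
CLASSES = ["atas", "bawah", "kanan", "kiri"]


def per_class_f1(y_true, y_pred, classes):
    """Return {class: (precision, recall, f1)}, each class scored one-vs-rest."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # the predicted word was claimed but is wrong
            fn[true] += 1  # the true word was missed
    scores = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


# Hypothetical toy labels (4 utterances per word), NOT the paper's data.
y_true = ["atas"] * 4 + ["bawah"] * 4 + ["kanan"] * 4 + ["kiri"] * 4
y_pred = (["atas", "atas", "atas", "bawah"]
          + ["bawah", "bawah", "kanan", "kanan"]
          + ["kanan", "kanan", "bawah", "atas"]
          + ["kiri"] * 4)

scores = per_class_f1(y_true, y_pred, CLASSES)
for c in CLASSES:
    p, r, f1 = scores[c]
    print(f"{c}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

With these toy labels the sketch prints F1 values of 0.75, 0.50, 0.50, and 1.00. Each misclassification counts twice, as a false negative for the true word and a false positive for the predicted word, so frequent confusion between two words lowers the F1-scores of both.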

