Alim Misbullah, Laina Farsiah, Nazaruddin, Furqan Hermawan
2022 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom), published 2022-06-16
DOI: 10.1109/CyberneticsCom55287.2022.9865318 (https://doi.org/10.1109/CyberneticsCom55287.2022.9865318)
Citations: 2
Voice-Zikr: A Speech Recognition System Implementation for Hands-Free Zikr Based on Deep Learning
Speech recognition is a branch of pattern recognition that has been widely deployed in commercial products. Well-known products that use speech recognition include Google Assistant, Apple Siri, and Alexa, which are accurate enough to produce output that meets user expectations. Deep learning has recently become one of the most common techniques for building models in speech recognition systems. Such a model learns, in its hidden layers, a mapping from audio frames as input features to phones as output labels. Zikr is a Muslim act of worship that can be performed at any time. Several tools and applications have been created to count zikr words as they are repeated aloud. In this research, a speech recognition system is implemented in an application called Voice-Zikr, which counts the zikr words spoken by the user. The speech recognition model is a time delay neural network (TDNN) with 5 hidden layers. The dataset was collected from speakers of different ages who read “Subhanallah”, “Alhamdulillah”, “Lailahaillallah”, and “Allahuakbar”. The model reaches a word error rate (WER) of 1.04% on recorded audio testing and works perfectly on microphone testing.
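
The abstract reports its result as a word error rate and describes an application that counts recognized zikr words, but it gives no implementation details. The sketch below is only an illustration of those two ideas, not the authors' code: it computes WER with the standard word-level edit distance and tallies the four phrases from a recognized transcript. The function names `word_error_rate` and `count_zikr`, the lowercase whitespace tokenization, and the sample transcripts are all assumptions made for this example.

```python
from collections import Counter

# The four zikr phrases named in the abstract, lowercased for matching.
ZIKR_WORDS = ("subhanallah", "alhamdulillah", "lailahaillallah", "allahuakbar")


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def count_zikr(transcript):
    """Count each known zikr word in a transcript, ignoring any other tokens."""
    counts = Counter(w for w in transcript.lower().split() if w in ZIKR_WORDS)
    return {word: counts[word] for word in ZIKR_WORDS}


if __name__ == "__main__":
    # Hypothetical transcripts, not data from the paper.
    reference = "Subhanallah Subhanallah Alhamdulillah Allahuakbar"
    recognized = "Subhanallah Alhamdulillah Alhamdulillah Allahuakbar"
    print(word_error_rate(reference, recognized))  # one substitution in four words
    print(count_zikr(recognized))
```

With the hypothetical transcripts above, one of the four reference words is substituted, so the WER is 0.25. A reported WER of 1.04% corresponds to roughly one word error per 96 reference words.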