An IoT-based Language Recognition System for Indigenous Languages using Integrated CNN and RNN

2023 3rd International Conference on Smart Data Intelligence (ICSMDI). Pub Date: 2023-03-01. DOI: 10.1109/ICSMDI57622.2023.00086

P. Cerna, Charisma S. Ututalum, R. S. Evangelista, Aldaruhz T. Darkis, Masnona Sabdani Asiri, Jehana A. Muallam-Darkis


Citations: 0


Abstract

Automatic Speech Recognition (ASR) aims to make communication between humans and computers more natural. The main aim of this study is to build a hardware-based ASR system on Raspberry Pi for the ancestral dialects of Indigenous People (IP), in particular Manobo, Mandaya, and B'laan. Jasper is an open-source toolkit for creating voice-activated, always-on applications. The researchers will record audio data from research participants located in Davao Occidental and Sarangani for B'laan, Agusan del Sur for Manobo, and Davao Oriental for Mandaya. A functional microphone and Raspberry Pi boards serve as the experiment's hardware: audio input is fine-tuned on a Raspberry Pi-powered device that records audio in waveform format, covering Mandaya, Manobo, and Malita words and phrases. The TensorFlow STFT technique will be used to analyze, generate, transform, and characterize audio signals, and JiWER will be used for similarity measures. The reported WER output is 98.53%, which the authors describe as an acceptable percentage for the number of datasets used.
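For readers unfamiliar with the metric, the word error rate (WER) mentioned in the abstract can be sketched as a word-level edit distance divided by the number of reference words. The snippet below is a minimal, dependency-free illustration and not the authors' evaluation code. The function name `word_error_rate` and the English sample sentences are hypothetical placeholders, and whitespace tokenization is an assumption. The JiWER package exposes the same metric as `jiwer.wer(reference, hypothesis)`.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical English placeholders, not data from the study.
    reference = "the quick brown fox"
    hypothesis = "the quick brown dog"
    print(word_error_rate(reference, hypothesis))  # one substitution over four words
```

WER is an error rate: 0.0 means the transcript matches the reference exactly, and the value can exceed 1.0 when the hypothesis contains many inserted words.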
