Voice command recognition in intelligent systems using deep neural networks

2019 IEEE 17th World Symposium on Applied Machine Intelligence and Informatics (SAMI). DOI: 10.1109/SAMI.2019.8782755

Artem Sokolov, A. Savchenko


Citations: 7

Abstract


Voice command recognition in intelligent systems using deep neural networks

In this article, we focus on isolated voice command recognition for autonomous man-machine and intelligent robotic systems. We propose a grammar model for a small set of test commands in which every state has a self-loop that returns blank symbols for noise and out-of-vocabulary words. In addition, a single arc connects the beginning and the end of the grammar in order to filter out unknown commands. As a result, the grammar is resistant to distortions and to unexpected words near or inside a command. We implemented the proposed approach using finite-state transducers in the Kaldi framework and evaluated it on self-recorded noisy data at various signal-to-noise ratios. We compared the recognition accuracy and average decision-making time of our approach with those of state-of-the-art continuous speech recognition engines based on language models. The experiments show that our approach achieves up to 60% higher accuracy than conventional offline speech recognition methods based on language models, and that it recognizes utterances 3 times faster than traditional continuous speech recognition algorithms.
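The grammar topology described in the abstract can be illustrated with a toy, word-level sketch in plain Python. This is not the authors' Kaldi implementation: the command set, the `<unk>` garbage label, and the function names `build_grammar`, `recognise`, and `to_openfst_text` are all assumptions made for illustration, and a real system would apply the self-loops during decoding over acoustic units rather than over already-recognized words. The sketch only shows how per-state self-loops absorb unexpected words and how a single start-to-end arc lets an utterance without any known command be accepted with an empty result.

```python
from collections import defaultdict

EPS = "<eps>"      # empty label, as conventionally written in OpenFst text format
GARBAGE = "<unk>"  # hypothetical label standing for noise and out-of-vocabulary input
START, FINAL = 0, 1


def build_grammar(commands):
    """Return a list of arcs (src, dst, input_label, output_label)."""
    arcs = [
        (START, START, GARBAGE, EPS),  # self-loop: garbage before a command maps to blank
        (FINAL, FINAL, GARBAGE, EPS),  # self-loop: garbage after a command maps to blank
        (START, FINAL, EPS, EPS),      # single bypass arc for utterances with no known command
    ]
    next_state = 2
    for command in commands:
        words = command.split()
        src = START
        for word in words[:-1]:
            dst, next_state = next_state, next_state + 1
            arcs.append((src, dst, word, EPS))
            arcs.append((dst, dst, GARBAGE, EPS))  # self-loop: garbage inside a command
            src = dst
        # The last word of a command emits the command name as output.
        arcs.append((src, FINAL, words[-1], "_".join(words)))
    return arcs


def _emit(out, olabel):
    return out if olabel == EPS else out + (olabel,)


def recognise(tokens, arcs):
    """Return the recognized command name, or None for an unknown utterance."""
    vocab = {ilabel for _, _, ilabel, _ in arcs} - {EPS, GARBAGE}
    by_src = defaultdict(list)
    for src, dst, ilabel, olabel in arcs:
        by_src[src].append((dst, ilabel, olabel))

    def step(configs, label):
        # Follow every arc whose input label matches, collecting outputs.
        return {(dst, _emit(out, olabel))
                for state, out in configs
                for dst, ilabel, olabel in by_src[state]
                if ilabel == label}

    def closure(configs):
        # Add everything reachable through epsilon-input arcs.
        seen = set(configs)
        frontier = seen
        while frontier:
            frontier = step(frontier, EPS) - seen
            seen |= frontier
        return seen

    configs = closure({(START, ())})
    for token in tokens:
        label = token if token in vocab else GARBAGE
        configs = closure(step(configs, label))
    outputs = {out for state, out in configs if state == FINAL}
    if len(outputs) == 1:
        (out,) = outputs
        if len(out) == 1:
            return out[0]
    return None


def to_openfst_text(arcs):
    """Render arcs as 'src dst ilabel olabel' lines plus a final-state line.

    The first line starts at state 0, which text-format readers treat as initial.
    """
    lines = [f"{s} {d} {i} {o}" for s, d, i, o in arcs]
    lines.append(str(FINAL))
    return "\n".join(lines)


COMMANDS = ["stop", "turn left", "turn right"]  # hypothetical command set
GRAMMAR = build_grammar(COMMANDS)
print(recognise("please turn um left now".split(), GRAMMAR))  # turn_left
print(recognise("hello there".split(), GRAMMAR))              # None
```

In this sketch an utterance made only of unknown words reaches the final state through the bypass arc with no output, so it is reported as `None` instead of being forced onto the nearest command. The text produced by `to_openfst_text` follows the arc-per-line layout of the OpenFst text format; compiling it would additionally require input and output symbol tables, which are omitted here.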
