Building a Production-Ready Keyword Detection System on a Real-World Audio
Eugene Zhmakin, Grach Mkrtchian
Automatic Control and Computer Sciences, Vol. 58, No. 4, pp. 454-458 (published 2024-08-28)
DOI: 10.3103/S0146411624700561 (https://link.springer.com/article/10.3103/S0146411624700561)
This paper deals with the problem of building a keyword spotting (KWS) system on real-world audio data. It describes the methods commonly used to build KWS systems, including deep learning models such as convolutional neural networks (CNNs) and transformers, and discusses Google Speech Commands, the mainstream dataset for training and testing KWS models. We conduct experiments on the Google Speech Commands dataset and propose a method for creating a KWS dataset that helps neural networks achieve better results when trained on relatively small amounts of data. We also introduce the idea of a hybrid KWS inference architecture that combines voice detection with a lightweight speech recognition framework in an attempt to improve both computational performance and accuracy. We conclude that KWS remains an important challenge in speech recognition and suggest that our method can improve the performance of KWS systems when training data is scarce. Future research could focus on improving how the models are evaluated and on the overall performance of KWS systems.
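The hybrid inference idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify either component. A simple energy-threshold voice activity detector stands in for the voice-detection stage, and a hypothetical `recognize` callable stands in for the lightweight speech recognition framework. The sketch shows the intended cost saving: the recognizer runs only on voiced segments, so silent audio costs no more than an energy computation per frame. The default `frame_len=160` assumes 10 ms frames at 16 kHz, and the `threshold` value is an arbitrary placeholder that would need tuning on real audio.

```python
from typing import Callable, List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def voiced_segments(samples: Sequence[float], frame_len: int,
                    threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges of consecutive frames whose
    energy is at or above `threshold` (a crude stand-in for a real VAD).
    A trailing partial frame is ignored."""
    segments: List[Tuple[int, int]] = []
    start = None
    n_frames = len(samples) // frame_len
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        voiced = frame_energy(frame) >= threshold
        if voiced and start is None:
            start = f * frame_len                  # speech begins
        elif not voiced and start is not None:
            segments.append((start, f * frame_len))  # speech ends
            start = None
    if start is not None:                           # speech runs to the end
        segments.append((start, n_frames * frame_len))
    return segments


def spot_keywords(samples: Sequence[float], keywords: Sequence[str],
                  recognize: Callable[[Sequence[float]], str],
                  frame_len: int = 160,
                  threshold: float = 1e-3) -> List[Tuple[int, int, str]]:
    """Hypothetical hybrid pipeline: gate audio with the VAD, then send
    only voiced segments to the (assumed) lightweight recognizer and
    report every keyword found in its transcript."""
    hits: List[Tuple[int, int, str]] = []
    for start, end in voiced_segments(samples, frame_len, threshold):
        words = recognize(samples[start:end]).lower().split()
        for kw in keywords:
            if kw.lower() in words:
                hits.append((start, end, kw))
    return hits
```

With a placeholder recognizer, a signal of silence, a short burst of "speech", and more silence yields a single recognizer call on the voiced span only. In practice, the energy gate would be replaced by a proper VAD model and the recognizer by whatever lightweight ASR the system uses.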
About the journal:
Automatic Control and Computer Sciences is a peer-reviewed journal that publishes articles on:
• Control systems, cyber-physical systems, real-time systems, robotics, smart sensors, embedded intelligence
• Network information technologies, information security, statistical methods of data processing, distributed artificial intelligence, complex systems modeling, knowledge representation, processing and management
• Signal and image processing, machine learning, machine perception, computer vision