Edoardo Bucheli-Susarrey, Miguel González-Mendoza, Oscar Herrera-Alcántara
Res. Comput. Sci., published 2019-12-31. DOI: https://doi.org/10.13053/rcs-148-7-26
Voice Command Detection with Compact Deep Learning Models (original title: Detección de comandos de voz con modelos compactos de aprendizaje profundo)
The Keyword Detection problem consists of locating a small vocabulary of words embedded in a stream of audio. Because Keyword Detection runs constantly in the background of many mobile devices, models must have a small memory footprint and a low computational cost. Using the Simple Speech Commands Detection data set, we present a comparative study of two types of layers. Hand-Engineered layers are built from audio feature extraction models based on the Fourier Transform and Mel Filterbanks. Learned layers come from the Deep Learning literature and include dense, recurrent and convolutional layers. Using the Deep Learning Pipeline, we organize these layers to solve the problem.
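To make the idea of a hand-engineered layer concrete, the following is a minimal sketch of a log-Mel feature extractor built from a short-time Fourier Transform and a triangular Mel filterbank. It is an illustration only, not the authors' implementation: the function names and every parameter value (16 kHz sample rate, 512-point FFT, hop of 160 samples, 40 Mel bands, the common `2595 * log10(1 + f / 700)` Mel formula) are assumptions that the abstract does not specify.

```python
import numpy as np


def hz_to_mel(f):
    # One common Mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Return triangular Mel filters with shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    # n_mels + 2 edge points, equally spaced on the Mel scale up to Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    # Centre frequency of every FFT bin.
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)

    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # Triangle: rises from left to center, falls from center to right.
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sample_rate=16000, n_fft=512, hop=160, n_mels=40):
    """Fixed (non-learned) front end: frames -> power spectrum -> log-Mel energies."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    # Power spectrum of each windowed frame, shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project onto the Mel filters, shape (n_frames, n_mels).
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    # Small constant avoids log(0) on silent frames.
    return np.log(mel_energy + 1e-6)


if __name__ == "__main__":
    # One second of a synthetic 1 kHz tone stands in for a spoken command.
    t = np.arange(16000) / 16000.0
    features = log_mel_spectrogram(np.sin(2.0 * np.pi * 1000.0 * t))
    print(features.shape)
```

With these assumed settings, one second of audio becomes a 97 x 40 feature matrix. In a pipeline like the one the abstract describes, such a fixed front end contributes no trainable parameters, so the memory footprint of the model is determined by the learned layers stacked on top of it.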