{"title":"用预并行卷积神经网络改进变压器语音识别系统","authors":"Qi Yue, Zhang Han, Jing Chu, Xiaokai Han, Peiwen Li, Xuhui Deng","doi":"10.1109/ICMA54519.2022.9855999","DOIUrl":null,"url":null,"abstract":"In recent years, both convolution neural network and Transformer neural network have high popularity in the field of deep learning. These two kinds of neural networks have their own characteristics and are widely used in the field of speech recognition. Convolution neural network is good at dealing with local feature information, and the core module of Transformer is self-attention mechanism, so it has a good control of global information. In this paper, we combine these two kinds of networks, give full play to their respective advantages, use convolution neural network to extract the feature information from the spectrogram, and then give it to the Transformer network for global processing, so as to achieve a good recognition effect. End-to-end neural network often has some problems such as slow training speed and difficulty in training. in order to solve this problem, the spectrogram is used as the input of the network to reduce the amount of information processing of the network. 
on the other hand, the techniques such as batch normalization, layer normalization and residual network are applied in the model to speed up the training of the model and prevent the occurrence of over-fitting phenomenon.","PeriodicalId":120073,"journal":{"name":"2022 IEEE International Conference on Mechatronics and Automation (ICMA)","volume":"07 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Speech recognition system of transformer improved by pre-parallel convolution Neural Network\",\"authors\":\"Qi Yue, Zhang Han, Jing Chu, Xiaokai Han, Peiwen Li, Xuhui Deng\",\"doi\":\"10.1109/ICMA54519.2022.9855999\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, both convolution neural network and Transformer neural network have high popularity in the field of deep learning. These two kinds of neural networks have their own characteristics and are widely used in the field of speech recognition. Convolution neural network is good at dealing with local feature information, and the core module of Transformer is self-attention mechanism, so it has a good control of global information. In this paper, we combine these two kinds of networks, give full play to their respective advantages, use convolution neural network to extract the feature information from the spectrogram, and then give it to the Transformer network for global processing, so as to achieve a good recognition effect. End-to-end neural network often has some problems such as slow training speed and difficulty in training. in order to solve this problem, the spectrogram is used as the input of the network to reduce the amount of information processing of the network. 
on the other hand, the techniques such as batch normalization, layer normalization and residual network are applied in the model to speed up the training of the model and prevent the occurrence of over-fitting phenomenon.\",\"PeriodicalId\":120073,\"journal\":{\"name\":\"2022 IEEE International Conference on Mechatronics and Automation (ICMA)\",\"volume\":\"07 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Mechatronics and Automation (ICMA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMA54519.2022.9855999\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Mechatronics and Automation (ICMA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMA54519.2022.9855999","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Speech recognition system of transformer improved by pre-parallel convolution Neural Network
In recent years, both convolutional neural networks (CNNs) and Transformer networks have become highly popular in deep learning. The two architectures have complementary strengths and are widely used in speech recognition: CNNs excel at capturing local feature information, while the Transformer's core module, the self-attention mechanism, gives it strong control over global information. In this paper, we combine the two networks to exploit their respective advantages: a CNN extracts feature information from the spectrogram, which is then passed to a Transformer for global processing, achieving good recognition performance. End-to-end neural networks often suffer from slow and difficult training. To mitigate this, the spectrogram is used as the network input, reducing the amount of information the network must process. In addition, techniques such as batch normalization, layer normalization, and residual connections are applied in the model to speed up training and prevent over-fitting.
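The pipeline the abstract describes (local feature extraction by convolution over a spectrogram, followed by global processing with self-attention) can be illustrated with a minimal NumPy sketch. All shapes, kernel sizes, and random weights below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Naive 2D 'valid' convolution over a spectrogram (time x frequency)."""
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # each output cell summarizes a local time-frequency patch
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def self_attention(X, seed=0):
    """Single-head scaled dot-product self-attention with random projections."""
    d = X.shape[1]
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)          # every frame attends to every frame
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over the sequence
    return weights @ V

# toy spectrogram: 20 time frames x 16 frequency bins
spec = np.random.default_rng(1).standard_normal((20, 16))
feats = conv2d_valid(spec, np.ones((3, 3)) / 9.0)  # CNN stage: local features
attended = self_attention(feats)                   # Transformer stage: global context
print(feats.shape, attended.shape)  # (18, 14) (18, 14)
```

The key contrast the paper exploits is visible here: the convolution output at each position depends only on a 3x3 neighborhood, while each row of the attention output is a weighted mix of every time frame in the sequence.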