{"title":"Speech recognition system of transformer improved by pre-parallel convolution Neural Network","authors":"Qi Yue, Zhang Han, Jing Chu, Xiaokai Han, Peiwen Li, Xuhui Deng","doi":"10.1109/ICMA54519.2022.9855999","DOIUrl":null,"url":null,"abstract":"In recent years, both convolution neural network and Transformer neural network have high popularity in the field of deep learning. These two kinds of neural networks have their own characteristics and are widely used in the field of speech recognition. Convolution neural network is good at dealing with local feature information, and the core module of Transformer is self-attention mechanism, so it has a good control of global information. In this paper, we combine these two kinds of networks, give full play to their respective advantages, use convolution neural network to extract the feature information from the spectrogram, and then give it to the Transformer network for global processing, so as to achieve a good recognition effect. End-to-end neural network often has some problems such as slow training speed and difficulty in training. in order to solve this problem, the spectrogram is used as the input of the network to reduce the amount of information processing of the network. on the other hand, the techniques such as batch normalization, layer normalization and residual network are applied in the model to speed up the training of the model and prevent the occurrence of over-fitting phenomenon.","PeriodicalId":120073,"journal":{"name":"2022 IEEE International Conference on Mechatronics and Automation (ICMA)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Mechatronics and Automation (ICMA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMA54519.2022.9855999","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
In recent years, both convolutional neural networks (CNNs) and Transformer networks have become highly popular in deep learning. The two architectures have complementary strengths and are both widely used in speech recognition: a convolutional network excels at capturing local feature information, while the Transformer's core module, the self-attention mechanism, gives it strong command of global context. In this paper, we combine the two networks to exploit their respective advantages: a convolutional neural network extracts feature information from the spectrogram, which is then passed to a Transformer network for global processing, achieving good recognition performance. End-to-end neural networks often suffer from slow and difficult training. To address this, the spectrogram is used as the network input, reducing the amount of information the network must process; in addition, techniques such as batch normalization, layer normalization, and residual connections are applied in the model to speed up training and prevent over-fitting.
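The abstract describes a pipeline in which a convolutional front-end extracts local features from the spectrogram before a self-attention encoder models global context. The sketch below is a minimal PyTorch illustration of that idea, not the authors' exact architecture: all layer counts, dimensions, and the 80-mel-bin input are my own assumptions, and positional encodings are omitted for brevity. Note that each `nn.TransformerEncoderLayer` already contains the residual connections and layer normalization the abstract mentions.

```python
# A minimal sketch (assumed architecture, not the paper's) of a CNN
# front-end feeding a Transformer encoder for spectrogram-based ASR.
import torch
import torch.nn as nn


class ConvFrontEnd(nn.Module):
    """Strided convolutions extract local features from the spectrogram;
    batch normalization after each convolution stabilizes training."""

    def __init__(self, out_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
        )
        # Project flattened (channel x frequency) features to the model
        # dimension. Assumes 80 mel bins, downsampled twice -> 20 bins.
        self.proj = nn.Linear(64 * 20, out_dim)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, mel_bins, time)
        x = self.conv(spec)  # (batch, 64, mel_bins/4, time/4)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)  # (batch, time/4, 64*f)
        return self.proj(x)  # (batch, time/4, out_dim)


class ConvTransformerASR(nn.Module):
    """CNN front-end followed by a Transformer encoder whose self-attention
    layers model global context across the whole utterance."""

    def __init__(self, vocab_size: int, d_model: int = 256):
        super().__init__()
        self.front_end = ConvFrontEnd(out_dim=d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=1024,
            dropout=0.1, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        # Per-frame logits, e.g. for a CTC loss (the paper's training
        # objective is not stated in the abstract).
        self.classifier = nn.Linear(d_model, vocab_size)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        x = self.front_end(spec)
        x = self.encoder(x)
        return self.classifier(x)


# Example: a batch of 4 spectrograms with 80 mel bins and 400 frames.
model = ConvTransformerASR(vocab_size=5000)
logits = model(torch.randn(4, 1, 80, 400))  # -> (4, 100, 5000)
```

Feeding precomputed spectrograms rather than raw waveforms, as the abstract proposes, shrinks the input the network must process; the strided convolutions then downsample the time axis further (here by 4x) before the quadratic-cost self-attention layers run.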