Research on Voice Wake Based on Depthwise Separable Convolutional Neural Network
R. Liu, Penghao Wang, Lin Zhou, Youxi Luo
2023 5th International Conference on Electronic Engineering and Informatics (EEI), published 2023-06-30
DOI: https://doi.org/10.1109/EEI59236.2023.10212965
Citations: 0
Abstract
To study voice wake-up, this paper builds a 12-layer depthwise separable convolutional neural network (DSCNN). After feature extraction, the network decides whether the wake word was spoken by performing binary classification on the feature spectrum. With "HelloMia" as the wake-up word, the training set contains 7982 positive speech samples labeled (1,0) and 1315 negative speech samples labeled (0,1). With a batch normalization (BN) layer introduced, the model converges at 0.3 epochs. Accuracy is 0.9994 on a test set of 10,000 positive samples and 0.9889 on a test set of 2362 negative samples, which corresponds to a wake-up rate of 99.94% and a false wake-up rate of only 1.11%. Compared with ordinary convolutional models, DSCNN greatly reduces the number of parameters and memory consumption without degrading convergence speed or training performance.
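As a rough illustration of why depthwise separable convolutions reduce the parameter count, the sketch below compares the weight count of one standard convolution layer with that of a depthwise convolution followed by a 1x1 pointwise convolution. The layer sizes (3x3 kernel, 64 input and 64 output channels) and the function names are assumptions chosen for illustration; the abstract does not give the paper's actual layer configuration.

```python
def standard_conv_params(k, c_in, c_out, bias=False):
    """Weight count of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=False):
    """Weight count of a depthwise k x k conv plus a 1x1 pointwise conv."""
    # Depthwise step: one k x k filter per input channel.
    depthwise = k * k * c_in + (c_in if bias else 0)
    # Pointwise step: 1x1 convolution that mixes channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes, not taken from the paper.
k, c_in, c_out = 3, 64, 64

std = standard_conv_params(k, c_in, c_out)
dsc = depthwise_separable_params(k, c_in, c_out)

# Without biases the ratio equals 1/c_out + 1/k**2.
ratio = dsc / std

print(f"standard conv:            {std} weights")
print(f"depthwise separable conv: {dsc} weights")
print(f"ratio:                    {ratio:.4f}")
```

For these assumed sizes the separable layer needs 4672 weights against 36864 for the standard layer, roughly one eighth. The saving grows with the number of output channels and with the kernel size.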