Sub-voice Detection and Recognition based on Hybrid Audio Segmentation and Deep Learning
Authors: Xiaolei Zhao, Chenyin Wang, Xibin Xu
Published in: Proceedings of the 2019 International Conference on Robotics, Intelligent Control and Artificial Intelligence, 2019-09-20
DOI: 10.1145/3366194.3366219 (https://doi.org/10.1145/3366194.3366219)
Sub-voice sounds (crying, laughter, sighs, etc.) carry a large amount of useful information about speakers and play an important auxiliary role in emotion recognition, behavior recognition, and physiological and psychological research. Correct detection and recognition of sub-voice is a prerequisite for such research and its applications. The proposed method is divided into two phases: sub-voice detection and sub-voice recognition. For detection, we use a high-efficiency hybrid audio segmentation algorithm based on likelihood ratio and model pre-judgment. After a sub-voice segment is detected, we extract its grayscale spectrogram and feed it into a PCANet network to extract features automatically. An SVM model then performs the recognition. Experimental results show that the proposed detection method reaches a detection accuracy of 94.2%, and that the proposed recognition method is 7.7% higher than the traditional recognition method based on hand-crafted statistical features.
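
The abstract does not spell out the hybrid segmentation algorithm, so the following is only a minimal sketch of the general likelihood-ratio idea it builds on, not the authors' method. It assumes a one-dimensional per-frame feature (for example, log-energy) and a single Gaussian per segment, and it scores each candidate split by comparing one Gaussian over the whole window against two Gaussians, one per side. The function names (`ml_variance`, `glr_score`, `best_change_point`), the `min_len` parameter, and the toy `energies` sequence are all hypothetical and chosen for illustration; the model pre-judgment step is not represented here.

```python
import math


def ml_variance(xs, floor=1e-8):
    """Maximum-likelihood (biased) variance, floored to avoid log(0)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return max(var, floor)


def glr_score(xs, t):
    """Log generalized likelihood ratio for a change at index t.

    Compares a single Gaussian fitted to xs against two Gaussians
    fitted separately to xs[:t] and xs[t:]. Larger means the two-segment
    model explains the data much better, i.e. a likely boundary.
    """
    left, right = xs[:t], xs[t:]
    return 0.5 * (
        len(xs) * math.log(ml_variance(xs))
        - len(left) * math.log(ml_variance(left))
        - len(right) * math.log(ml_variance(right))
    )


def best_change_point(xs, min_len=3):
    """Return (index, score) of the split with the highest GLR score.

    min_len (an assumed parameter) keeps both segments long enough
    for a variance estimate to be meaningful.
    """
    candidates = range(min_len, len(xs) - min_len + 1)
    score, t = max((glr_score(xs, t), t) for t in candidates)
    return t, score


# Toy feature sequence (made up): six quiet frames, then six loud frames.
energies = [0.10, 0.12, 0.09, 0.11, 0.10, 0.10,
            2.00, 2.10, 1.90, 2.05, 2.00, 1.95]

split, score = best_change_point(energies)
print(f"most likely boundary before frame {split}, GLR score {score:.2f}")
```

In a fuller pipeline, a boundary would typically be accepted only when the score exceeds a threshold, and the resulting segments would then go on to spectrogram extraction and classification; those stages and any threshold value are outside what the abstract specifies.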