Research on netizen sentiment recognition based on multimodal deep learning
Authors: Nan Jia, Tianhao Yao
Venue: International Conference on Algorithms, Microchips and Network Applications
Publication date: 2024-06-08
DOI: 10.1117/12.3032110 (https://doi.org/10.1117/12.3032110)
With the rapid spread of social media and the Internet, network security issues are becoming increasingly prominent. More and more people express their emotions and opinions online, and the ways in which netizens express emotion are increasingly diverse, which makes accurate analysis of those emotions particularly important. Traditional emotion recognition methods rely mainly on text analysis, but as online media diversify, text-only analysis can no longer meet practical needs. Exploring the application of multimodal deep learning to netizen emotion recognition has therefore become an inevitable choice for public security organs, and this paper explores that application. The study uses multimodal datasets of text and images and builds BERT and fine-tuned VGG-16 models to extract emotional features from the text and image modalities, respectively. A multi-head attention mechanism then combines the two modalities into a fusion model, and the study examines how to combine them to improve classification performance. The text modality reaches an accuracy of 0.70, the image modality 0.58, and the multimodal fusion model 0.73, which is 0.03 and 0.15 higher than the text and image modalities, respectively, demonstrating the soundness of the multimodal fusion model. The results can offer new ideas and methods for analysis and early warning by public security organs, and provide reference and inspiration for research in other fields.
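The fusion step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the fusion details, so the direction of attention (text tokens attending to image regions), the mean pooling, the concatenation, the feature sizes (768 for BERT-style token vectors, 512 for VGG-16-style region vectors), the three output classes, and every function and parameter name here are assumptions made for illustration. The weights are random, so the output shows only how the shapes flow, not a trained prediction.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(query, context, params, num_heads):
    """Scaled dot-product attention split across several heads.

    query:   (Lq, d_model) vectors that ask the questions (here: text tokens)
    context: (Lk, d_model) vectors that are attended to (here: image regions)
    """
    d_model = query.shape[-1]
    d_head = d_model // num_heads

    def split(x):
        # (L, d_model) -> (num_heads, L, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q = split(query @ params["Wq"])
    k = split(context @ params["Wk"])
    v = split(context @ params["Wv"])

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, Lq, Lk)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    heads = weights @ v                                  # (H, Lq, d_head)
    merged = heads.transpose(1, 0, 2).reshape(query.shape[0], d_model)
    return merged @ params["Wo"], weights


def fuse_and_classify(text_feats, image_feats, params, num_heads=4):
    """Hypothetical fusion head: project, cross-attend, pool, classify."""
    t = text_feats @ params["Wt"]    # (T, d_model), projected text features
    v = image_feats @ params["Wi"]   # (R, d_model), projected image features
    attended, weights = multi_head_attention(t, v, params, num_heads)
    # Assumption: mean-pool both streams and concatenate them.
    fused = np.concatenate([t.mean(axis=0), attended.mean(axis=0)])
    return softmax(fused @ params["Wc"]), weights


def init_params(rng, d_text, d_image, d_model, num_classes):
    """Random, untrained weights; for shape checking only."""
    def w(rows, cols):
        return rng.normal(0.0, 0.02, size=(rows, cols))

    return {
        "Wt": w(d_text, d_model), "Wi": w(d_image, d_model),
        "Wq": w(d_model, d_model), "Wk": w(d_model, d_model),
        "Wv": w(d_model, d_model), "Wo": w(d_model, d_model),
        "Wc": w(2 * d_model, num_classes),
    }


rng = np.random.default_rng(0)
params = init_params(rng, d_text=768, d_image=512, d_model=64, num_classes=3)
text_feats = rng.normal(size=(12, 768))   # stand-in for 12 BERT token vectors
image_feats = rng.normal(size=(49, 512))  # stand-in for a 7x7 VGG-16 feature map
probs, attn = fuse_and_classify(text_feats, image_feats, params)
print(probs.shape, attn.shape)  # (3,) (4, 12, 49)
```

In this sketch the text tokens act as queries and the image regions as keys and values, so each head learns a separate weighting of image regions per token. The reverse direction, or attention in both directions, would be an equally plausible reading of the abstract.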