Research on netizen sentiment recognition based on multimodal deep learning

International Conference on Algorithms, Microchips and Network Applications. Pub Date: 2024-06-08. DOI: 10.1117/12.3032110

Nan Jia, Tianhao Yao


Citations: 0

Abstract



With the rapid spread of social media and the Internet, network security issues are becoming increasingly prominent. More and more people express their emotions and opinions online, and the ways in which netizens express emotion are becoming more diverse, so accurate analysis of netizens' emotions is particularly important. Traditional emotion recognition methods are mainly based on text analysis, but as online media diversify, text-only analysis can no longer meet practical needs. Exploring the application of multimodal deep learning to netizen emotion recognition has therefore become an inevitable choice for public security organs. This study uses multimodal datasets of text and images, and constructs BERT and fine-tuned VGG-16 models to extract emotional features from the text and image modalities, respectively. A multi-head attention mechanism is then introduced to combine the two modalities into a fusion model, and the study explores how to combine them to improve classification performance. The text modality reaches an accuracy of 0.70, the image modality 0.58, and the multimodal fusion model 0.73, which is 0.03 and 0.15 higher than the text and image modalities, respectively, supporting the effectiveness of the multimodal fusion model. The approach can provide new ideas and methods for analysis and early warning by public security organs, and can serve as a reference for research in other fields.
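The abstract does not give the details of the fusion model, so the following is only a minimal sketch of what attention-based fusion of a text feature with image features can look like. Everything in it is an assumption made for illustration: the function name `multi_head_fuse`, the toy four-dimensional vectors standing in for BERT and VGG-16 features, the choice of the text feature as the query, and the replacement of the learned query/key/value/output projections of real multi-head attention with identity maps.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_heads(vector, num_heads):
    """Split one feature vector into num_heads equal-sized slices."""
    head_dim = len(vector) // num_heads
    return [vector[h * head_dim:(h + 1) * head_dim] for h in range(num_heads)]


def multi_head_fuse(text_vec, image_vecs, num_heads):
    """Let a text feature attend over image features, head by head.

    Hypothetical simplification: learned projections are omitted, so each
    head works directly on a slice of the raw features.
    Returns (fused_vector, per_head_attention_weights).
    """
    if len(text_vec) % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    query_heads = split_heads(text_vec, num_heads)
    image_heads = [split_heads(v, num_heads) for v in image_vecs]
    scale = math.sqrt(len(text_vec) // num_heads)

    attended, all_weights = [], []
    for h, query in enumerate(query_heads):
        keys = [img[h] for img in image_heads]
        # Scaled dot-product score between the text query and each image key.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        weights = softmax(scores)
        all_weights.append(weights)
        # Weighted sum of image values (values equal keys in this sketch).
        attended.extend(
            sum(w * key[d] for w, key in zip(weights, keys))
            for d in range(len(query))
        )
    # Concatenate the text feature with the attended image feature.
    return text_vec + attended, all_weights


# Toy stand-ins for a BERT sentence feature and two VGG-16 image features.
text_feature = [0.2, -0.1, 0.4, 0.3]
image_features = [[0.1, 0.0, 0.5, 0.2], [-0.3, 0.2, 0.1, 0.4]]
fused, weights = multi_head_fuse(text_feature, image_features, num_heads=2)
```

In a full model the fused vector would be passed to a classification layer, and the projections left out here would be trained together with the two encoders.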
