基于混合注意和CLIP模型的多模态融合敏感信息分类

IF 1.7 4区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Shuaina Huang, Zhiyong Zhang, Bin Song, Yueheng Mao
{"title":"基于混合注意和CLIP模型的多模态融合敏感信息分类","authors":"Shuaina Huang, Zhiyong Zhang, Bin Song, Yueheng Mao","doi":"10.3233/jifs-233508","DOIUrl":null,"url":null,"abstract":"Social network attackers leverage images and text to disseminate sensitive information associated with pornography, politics, and terrorism,causing adverse effects on society.The current sensitive information classification model does not focus on feature fusion between images and text, greatly reducing recognition accuracy.To address this problem, we propose an attentive cross-modal fusion model (ACMF), which utilizes mixed attention mechanism and the Contrastive Language-Image Pre-training model.Specifically, we employ a deep neural network with a mixed attention mechanism as a visual feature extractor. This allows us to progressively extract features at different levels. We combine these visual features with those obtained from a text feature extractor and incorporate image-text frequency domain information at various levels to enable fine-grained modeling. Additionally, we introduce a cyclic attention mechanism and integrate the Contrastive Language-Image Pre-training model to establish stronger connections between modalities, thereby enhancing classification performance.Experimental evaluations conducted on sensitive information datasets collected demonstrate the superiority of our method over other baseline models. The model achieves an accuracy rate of 91.4% and an F1-score of 0.9145. These results validate the effectiveness of the mixed attention mechanism in enhancing the utilization of important features. Furthermore, the effective fusion of text and image features significantly improves the classification ability of the deep neural network.","PeriodicalId":54795,"journal":{"name":"Journal of Intelligent & Fuzzy Systems","volume":"22 1","pages":"0"},"PeriodicalIF":1.7000,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multimodal fusion sensitive information classification based on mixed attention and CLIP model1\",\"authors\":\"Shuaina Huang, Zhiyong Zhang, Bin Song, Yueheng Mao\",\"doi\":\"10.3233/jifs-233508\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Social network attackers leverage images and text to disseminate sensitive information associated with pornography, politics, and terrorism,causing adverse effects on society.The current sensitive information classification model does not focus on feature fusion between images and text, greatly reducing recognition accuracy.To address this problem, we propose an attentive cross-modal fusion model (ACMF), which utilizes mixed attention mechanism and the Contrastive Language-Image Pre-training model.Specifically, we employ a deep neural network with a mixed attention mechanism as a visual feature extractor. This allows us to progressively extract features at different levels. We combine these visual features with those obtained from a text feature extractor and incorporate image-text frequency domain information at various levels to enable fine-grained modeling. Additionally, we introduce a cyclic attention mechanism and integrate the Contrastive Language-Image Pre-training model to establish stronger connections between modalities, thereby enhancing classification performance.Experimental evaluations conducted on sensitive information datasets collected demonstrate the superiority of our method over other baseline models. The model achieves an accuracy rate of 91.4% and an F1-score of 0.9145. These results validate the effectiveness of the mixed attention mechanism in enhancing the utilization of important features. Furthermore, the effective fusion of text and image features significantly improves the classification ability of the deep neural network.\",\"PeriodicalId\":54795,\"journal\":{\"name\":\"Journal of Intelligent & Fuzzy Systems\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2023-10-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Intelligent & Fuzzy Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3233/jifs-233508\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Intelligent & Fuzzy Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3233/jifs-233508","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

社交网络攻击者利用图像和文本传播与色情、政治和恐怖主义有关的敏感信息,对社会造成不良影响。目前的敏感信息分类模型不注重图像和文本之间的特征融合,大大降低了识别精度。为了解决这一问题,我们提出了一个注意跨模态融合模型(ACMF),该模型利用混合注意机制和对比语言-图像预训练模型。具体来说,我们采用了一个具有混合注意机制的深度神经网络作为视觉特征提取器。这允许我们逐步提取不同层次的特征。我们将这些视觉特征与从文本特征提取器获得的特征结合起来,并在不同级别合并图像-文本频域信息,以实现细粒度建模。此外,我们引入了循环注意机制,并整合了对比语言-图像预训练模型,以建立更强的模态之间的联系,从而提高分类性能。对收集的敏感信息数据集进行的实验评估表明,我们的方法优于其他基线模型。模型的准确率为91.4%,f1得分为0.9145。这些结果验证了混合注意机制在提高重要特征利用率方面的有效性。此外,文本和图像特征的有效融合显著提高了深度神经网络的分类能力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Multimodal fusion sensitive information classification based on mixed attention and CLIP model1
Social network attackers leverage images and text to disseminate sensitive information associated with pornography, politics, and terrorism,causing adverse effects on society.The current sensitive information classification model does not focus on feature fusion between images and text, greatly reducing recognition accuracy.To address this problem, we propose an attentive cross-modal fusion model (ACMF), which utilizes mixed attention mechanism and the Contrastive Language-Image Pre-training model.Specifically, we employ a deep neural network with a mixed attention mechanism as a visual feature extractor. This allows us to progressively extract features at different levels. We combine these visual features with those obtained from a text feature extractor and incorporate image-text frequency domain information at various levels to enable fine-grained modeling. Additionally, we introduce a cyclic attention mechanism and integrate the Contrastive Language-Image Pre-training model to establish stronger connections between modalities, thereby enhancing classification performance.Experimental evaluations conducted on sensitive information datasets collected demonstrate the superiority of our method over other baseline models. The model achieves an accuracy rate of 91.4% and an F1-score of 0.9145. These results validate the effectiveness of the mixed attention mechanism in enhancing the utilization of important features. Furthermore, the effective fusion of text and image features significantly improves the classification ability of the deep neural network.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Intelligent & Fuzzy Systems
Journal of Intelligent & Fuzzy Systems 工程技术-计算机:人工智能
CiteScore
3.40
自引率
10.00%
发文量
965
审稿时长
5.1 months
期刊介绍: The purpose of the Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology is to foster advancements of knowledge and help disseminate results concerning recent applications and case studies in the areas of fuzzy logic, intelligent systems, and web-based applications among working professionals and professionals in education and research, covering a broad cross-section of technical disciplines.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信