Consistency-Heterogeneity Balanced Fake News Detection via Cross-Modal Matching
Ying Guo;Bingxin Li;Kexin Zhen;Jie Liu;Gaolei Li;Qi Wang;Yong-Jin Liu
{"title":"基于跨模态匹配的一致性-异质性平衡假新闻检测","authors":"Ying Guo;Bingxin Li;Kexin Zhen;Jie Liu;Gaolei Li;Qi Wang;Yong-Jin Liu","doi":"10.1109/TAI.2025.3527921","DOIUrl":null,"url":null,"abstract":"Generating synthetic content through generative AI (GAI) presents considerable hurdles for current fake news detection methodologies. Many existing detection approaches concentrate on feature-based multimodal fusion, neglecting semantic relationships such as correlations and diversities. In this study, we introduce an innovative cross-modal matching-driven approach to reconcile semantic relevance (text–image consistency) and semantic gap (text–image heterogeneity) in multimodal fake news detection. Unlike the conventional paradigm of multimodal fusion followed by detection, our approach integrates textual modality, visual modality (images), and text embedded within images (auxiliary modality) to construct an end-to-end framework. This framework considers the relevance of contents across different modalities while simultaneously addressing the gap in structures, achieving a delicate balance between consistency and heterogeneity. Consistency is fostered by evaluating intermodality correlation via pairwise-similarity scores, while heterogeneity is addressed by employing cross-attention mechanisms to account for intermodality diversity. To achieve equilibrium between consistency and heterogeneity, we employ attention-guided enhanced modality interaction and similarity-based dynamic weight assignment to establish robust frameworks. Comparative experiments conducted on the Chinese Weibo dataset and the English Twitter dataset demonstrate the effectiveness of our approach, surpassing the state-of-the-art by 7% to 13%.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 7","pages":"1787-1796"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Consistency-Heterogenity Balanced Fake News Detection via Cross-Modal Matching\",\"authors\":\"Ying Guo;Bingxin Li;Kexin Zhen;Jie Liu;Gaolei Li;Qi Wang;Yong-Jin Liu\",\"doi\":\"10.1109/TAI.2025.3527921\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Generating synthetic content through generative AI (GAI) presents considerable hurdles for current fake news detection methodologies. Many existing detection approaches concentrate on feature-based multimodal fusion, neglecting semantic relationships such as correlations and diversities. In this study, we introduce an innovative cross-modal matching-driven approach to reconcile semantic relevance (text–image consistency) and semantic gap (text–image heterogeneity) in multimodal fake news detection. Unlike the conventional paradigm of multimodal fusion followed by detection, our approach integrates textual modality, visual modality (images), and text embedded within images (auxiliary modality) to construct an end-to-end framework. This framework considers the relevance of contents across different modalities while simultaneously addressing the gap in structures, achieving a delicate balance between consistency and heterogeneity. Consistency is fostered by evaluating intermodality correlation via pairwise-similarity scores, while heterogeneity is addressed by employing cross-attention mechanisms to account for intermodality diversity. 
To achieve equilibrium between consistency and heterogeneity, we employ attention-guided enhanced modality interaction and similarity-based dynamic weight assignment to establish robust frameworks. Comparative experiments conducted on the Chinese Weibo dataset and the English Twitter dataset demonstrate the effectiveness of our approach, surpassing the state-of-the-art by 7% to 13%.\",\"PeriodicalId\":73305,\"journal\":{\"name\":\"IEEE transactions on artificial intelligence\",\"volume\":\"6 7\",\"pages\":\"1787-1796\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-01-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on artificial intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10838616/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10838616/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Synthetic content produced by generative AI (GAI) poses considerable hurdles for current fake news detection methods. Many existing approaches concentrate on feature-based multimodal fusion and neglect semantic relationships between modalities, such as their correlations and differences. In this study, we introduce a cross-modal matching-driven approach that reconciles semantic relevance (text–image consistency) with the semantic gap (text–image heterogeneity) in multimodal fake news detection. Unlike the conventional paradigm of multimodal fusion followed by detection, our approach integrates the textual modality, the visual modality (images), and text embedded within images (an auxiliary modality) into an end-to-end framework. The framework considers the relevance of content across modalities while simultaneously addressing the structural gaps between them, striking a balance between consistency and heterogeneity. Consistency is captured by evaluating intermodality correlation through pairwise-similarity scores, while heterogeneity is handled by cross-attention mechanisms that account for intermodality diversity. To balance the two, we employ attention-guided enhanced modality interaction and similarity-based dynamic weight assignment. Comparative experiments on the Chinese Weibo dataset and the English Twitter dataset demonstrate the effectiveness of our approach, which surpasses the state of the art by 7% to 13%.
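
The abstract names three concrete mechanisms: pairwise-similarity scores for consistency, cross-attention for heterogeneity, and similarity-based dynamic weighting to balance the two. The PyTorch sketch below shows one plausible way these pieces could be wired together; the module layout, pooling scheme, feature dimensions, and exact fusion rule are illustrative assumptions, not the authors' published architecture.

# Minimal, self-contained sketch of a consistency/heterogeneity-balanced
# detector. All names and dimensions here are hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BalancedFakeNewsDetector(nn.Module):
    """Hypothetical sketch: consistency via pairwise cosine similarity,
    heterogeneity via cross-attention, fused by a similarity-driven weight."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Cross-attention blocks model inter-modality diversity (heterogeneity).
        self.text_to_image = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.het_proj = nn.Linear(2 * dim, dim)
        self.classifier = nn.Linear(dim, 2)  # real vs. fake logits

    def forward(self, text, image, ocr):
        # text/image/ocr: (batch, seq, dim) features from pretrained encoders;
        # ocr stands in for the text embedded within images (auxiliary modality).
        t, v, o = text.mean(1), image.mean(1), ocr.mean(1)  # pooled (batch, dim)

        # Consistency: pairwise-similarity scores across modality pairs.
        sim = (F.cosine_similarity(t, v, dim=-1)
               + F.cosine_similarity(t, o, dim=-1)
               + F.cosine_similarity(v, o, dim=-1)) / 3      # (batch,)

        # Consistency feature: element-wise text-image correlation.
        cons_feat = t * v                                     # (batch, dim)

        # Heterogeneity: each modality attends to the other (cross-attention).
        t2v, _ = self.text_to_image(text, image, image)       # text queries image
        v2t, _ = self.image_to_text(image, text, text)        # image queries text
        het_feat = self.het_proj(
            torch.cat([t2v.mean(1), v2t.mean(1)], dim=-1))    # (batch, dim)

        # Similarity-based dynamic weighting: well-matched pairs lean on the
        # consistency cue; mismatched pairs lean on the heterogeneity cue.
        w = torch.sigmoid(sim).unsqueeze(-1)                  # (batch, 1)
        fused = w * cons_feat + (1 - w) * het_feat
        return self.classifier(fused)                         # (batch, 2)

# Usage: model = BalancedFakeNewsDetector(); logits = model(
#     torch.randn(8, 32, 256), torch.randn(8, 49, 256), torch.randn(8, 16, 256))
# yields an (8, 2) logit tensor.

Weighting by the sigmoid of the mean similarity means that text-image pairs which agree semantically rely more on the correlation cue, while mismatched pairs, a common signature of fabricated posts, shift the decision toward the cross-attention features; this is one simple reading of "similarity-based dynamic weight assignment", not necessarily the paper's.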