{"title":"Enhancing video rumor detection through multimodal deep feature fusion with time-sync comments","authors":"Ming Yin , Wei Chen , Dan Zhu , Jijiao Jiang","doi":"10.1016/j.ipm.2024.103935","DOIUrl":null,"url":null,"abstract":"<div><div>Rumors in videos have a stronger propagation compared to traditional text or image rumors. Most current studies on video rumor detection often rely on combining user and video modal information while neglecting the internal multimodal aspects of the video and the relationship between user comments and local segment of the video. To address this problem, we propose a method called Time-Sync Comment Enhanced Multimodal Deep Feature Fusion Model (TSC-MDFFM). It introduces time-sync comments to enhance the propagation structure of videos on social networks, supplementing missing contextual or additional information in videos. Time-sync comments focus on expressing users' views on specific points in time in the video, which helps to obtain more valuable segments from videos with high density information. The time interval from one keyframe to the next in a video is defined as a local segment. We thoroughly described this segment using time-sync comments, video keyframes, and video subtitle texts. The local segment sequences are ordered based on the video timeline and assigned time information, then fused to create the local feature representation of the video. Subsequently, we fused the text features, video motion features, and visual features of video comments at the feature level to represent the global features of the video. This feature not only captures the overall propagation trend of video content, but also provides a deep understanding of the overall features of the video. Finally, we will integrate local and global features for video rumor classification, to combine the local and global information of the video. We created a dataset called TSC-VRD, which includes time-sync comments and encompasses all visible information in videos. Extensive experimental results have shown superior performance of our proposed model compared to existing methods on the TSC-VRD dataset.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"62 1","pages":"Article 103935"},"PeriodicalIF":7.4000,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Processing & Management","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0306457324002942","RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Rumors in videos propagate more strongly than traditional text or image rumors. Most current studies on video rumor detection rely on combining user and video modal information while neglecting the multimodal content within the video itself and the relationship between user comments and local segments of the video. To address this problem, we propose the Time-Sync Comment Enhanced Multimodal Deep Feature Fusion Model (TSC-MDFFM). It introduces time-sync comments to enhance the propagation structure of videos on social networks, supplementing contextual or additional information missing from the videos themselves. Because time-sync comments express users' views on specific points in time in a video, they help identify the most valuable, information-dense segments. We define the time interval from one keyframe to the next as a local segment and describe each segment with its time-sync comments, keyframe, and subtitle text. The local segment sequences are ordered along the video timeline, assigned time information, and fused into the local feature representation of the video. We then fuse the video's text features, motion features, and the visual features of video comments at the feature level to form its global representation, which captures both the overall propagation trend of the video content and a deep understanding of its overall characteristics. Finally, we integrate the local and global features for video rumor classification, combining the local and global information of the video. We also created a dataset called TSC-VRD, which includes time-sync comments and encompasses all visible information in the videos. Extensive experiments on TSC-VRD show that our proposed model outperforms existing methods.
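The abstract describes the two-branch fusion architecture only at a high level. Purely for illustration, the following is a minimal PyTorch sketch of how such a local/global fusion could be wired up; every name, dimension, and design choice here (the `TSCMDFFMSketch` class, 256-dimensional features, a GRU over the segment sequence, simple concatenation fusion) is our own assumption, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of the local/global fusion
# outlined in the abstract. All names and dimensions are assumed.
import torch
import torch.nn as nn

class TSCMDFFMSketch(nn.Module):
    def __init__(self, feat_dim=256, num_classes=2):
        super().__init__()
        # Local branch: fuse per-segment keyframe, subtitle, and
        # time-sync-comment features, then model the segment sequence
        # in timeline order.
        self.segment_fuse = nn.Linear(3 * feat_dim, feat_dim)
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        # Global branch: fuse video-level text, motion, and comment
        # visual features.
        self.global_fuse = nn.Linear(3 * feat_dim, feat_dim)
        # Classifier over the concatenated local + global representations.
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, keyframes, subtitles, tsc, text_g, motion_g, visual_g):
        # keyframes/subtitles/tsc: (batch, num_segments, feat_dim),
        # one row per local segment (keyframe-to-keyframe interval).
        local = torch.relu(self.segment_fuse(
            torch.cat([keyframes, subtitles, tsc], dim=-1)))
        _, h = self.temporal(local)      # final hidden state: (1, batch, feat_dim)
        local_repr = h.squeeze(0)
        # text_g/motion_g/visual_g: (batch, feat_dim) video-level features.
        global_repr = torch.relu(self.global_fuse(
            torch.cat([text_g, motion_g, visual_g], dim=-1)))
        return self.classifier(torch.cat([local_repr, global_repr], dim=-1))

# Smoke test with random features: 4 videos, 10 local segments each.
model = TSCMDFFMSketch()
seg = lambda: torch.randn(4, 10, 256)
vid = lambda: torch.randn(4, 256)
logits = model(seg(), seg(), seg(), vid(), vid(), vid())
print(logits.shape)  # torch.Size([4, 2])
```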
Journal Introduction:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology, marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.