{"title":"Self-Supervised Distilled Learning for Multi-modal Misinformation Identification","authors":"Michael Mu, S. Bhattacharjee, Junsong Yuan","doi":"10.1109/WACV56688.2023.00284","DOIUrl":null,"url":null,"abstract":"Rapid dissemination of misinformation is a major societal problem receiving increasing attention. Unlike Deep-fake, Out-of-Context misinformation, in which the unaltered unimode contents (e.g. image, text) of a multi-modal news sample are combined in an out-of-context manner to generate deception, requires limited technical expertise to create. Therefore, it is more prevalent a means to confuse readers. Most existing approaches extract features from its uni-mode counterparts to concatenate and train a model for the misinformation classification task. In this paper, we design a self-supervised feature representation learning strategy that aims to attain the multi-task objectives: (1) task-agnostic, which evaluates the intra- and inter-mode representational consistencies for improved alignments across related models; (2) task-specific, which estimates the category-specific multi-modal knowledge to enable the classifier to derive more discriminative predictive distributions. To compensate for the dearth of annotated data representing varied types of misinformation, the proposed Self-Supervised Distilled Learner (SSDL) utilizes a Teacher network to weakly guide a Student network to mimic a similar decision pattern as the teacher. The two-phased learning of SSDL can be summarized as: initial pretraining of the Student model using a combination of contrastive self-supervised task-agnostic objective and supervised task-specific adjustment in parallel; finetuning the Student model via self-supervised knowledge distillation blended with the supervised objective of decision alignment. In addition to the consistent out-performances over the existing baselines that demonstrate the feasibility of our approach, the explainability capacity of the proposed SSDL also helps users visualize the reasoning behind a specific prediction made by the model.","PeriodicalId":270631,"journal":{"name":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WACV56688.2023.00284","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 7
Abstract
Rapid dissemination of misinformation is a major societal problem receiving increasing attention. Unlike Deepfakes, Out-of-Context misinformation, in which the unaltered uni-modal contents (e.g., image, text) of a multi-modal news sample are combined in an out-of-context manner to deceive, requires limited technical expertise to create. It is therefore a more prevalent means of confusing readers. Most existing approaches extract features from the individual modalities, concatenate them, and train a model for the misinformation classification task. In this paper, we design a self-supervised feature representation learning strategy that aims to attain two objectives: (1) a task-agnostic objective, which evaluates the intra- and inter-mode representational consistencies for improved alignment across related models; and (2) a task-specific objective, which estimates category-specific multi-modal knowledge to enable the classifier to derive more discriminative predictive distributions. To compensate for the dearth of annotated data representing varied types of misinformation, the proposed Self-Supervised Distilled Learner (SSDL) utilizes a Teacher network to weakly guide a Student network to mimic the Teacher's decision pattern. The two-phase learning of SSDL can be summarized as: (1) pretraining the Student model with a contrastive self-supervised task-agnostic objective and a supervised task-specific objective in parallel; (2) finetuning the Student model via self-supervised knowledge distillation blended with a supervised decision-alignment objective. In addition to consistent improvements over existing baselines, which demonstrate the feasibility of our approach, the explainability capacity of the proposed SSDL helps users visualize the reasoning behind a specific prediction made by the model.
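The two-phase scheme described above can be made concrete with a small sketch. Below is a minimal, hypothetical PyTorch illustration of the two losses implied by the abstract: a phase-1 pretraining loss pairing a cross-modal contrastive (InfoNCE) term with a supervised classification term, and a phase-2 finetuning loss distilling the Teacher's softened predictions into the Student. All names (`DistilledMisinfoClassifier`, `info_nce`, `phase1_loss`, `phase2_loss`), feature dimensions, and weighting hyperparameters (`alpha`, `beta`, `tau`) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistilledMisinfoClassifier(nn.Module):
    """Toy two-branch network: projects image and text features into a
    shared space, then classifies their concatenation."""
    def __init__(self, img_dim=512, txt_dim=512, proj_dim=128, num_classes=2):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, proj_dim)   # image-branch projection
        self.txt_proj = nn.Linear(txt_dim, proj_dim)   # text-branch projection
        self.classifier = nn.Linear(2 * proj_dim, num_classes)

    def forward(self, img_feat, txt_feat):
        zi = F.normalize(self.img_proj(img_feat), dim=-1)
        zt = F.normalize(self.txt_proj(txt_feat), dim=-1)
        logits = self.classifier(torch.cat([zi, zt], dim=-1))
        return zi, zt, logits

def info_nce(zi, zt, temperature=0.07):
    """Cross-modal contrastive loss: matched image/text pairs on the
    diagonal are positives; all other in-batch pairs are negatives."""
    sim = zi @ zt.t() / temperature                    # (B, B) similarity matrix
    targets = torch.arange(zi.size(0), device=zi.device)
    # Symmetric image-to-text and text-to-image terms.
    return 0.5 * (F.cross_entropy(sim, targets) + F.cross_entropy(sim.t(), targets))

def phase1_loss(student, img, txt, labels, alpha=0.5):
    """Pretraining: contrastive task-agnostic term plus supervised
    task-specific term, optimized in parallel."""
    zi, zt, logits = student(img, txt)
    return alpha * info_nce(zi, zt) + (1 - alpha) * F.cross_entropy(logits, labels)

def phase2_loss(student, teacher, img, txt, labels, tau=2.0, beta=0.5):
    """Finetuning: distill the Teacher's softened decision pattern into
    the Student, blended with the supervised classification objective."""
    _, _, s_logits = student(img, txt)
    with torch.no_grad():                              # Teacher is frozen
        _, _, t_logits = teacher(img, txt)
    kd = F.kl_div(F.log_softmax(s_logits / tau, dim=-1),
                  F.softmax(t_logits / tau, dim=-1),
                  reduction="batchmean") * tau * tau
    return beta * kd + (1 - beta) * F.cross_entropy(s_logits, labels)
```

Scaling the distillation term by `tau * tau` follows the standard knowledge-distillation convention (Hinton et al.) so that gradient magnitudes remain comparable as the temperature changes; whether SSDL uses this exact weighting is an assumption here.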