Borrowing Human Senses: Comment-Aware Self-Training for Social Media Multimodal Classification

Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 5644-5656
Pub Date: 2023-03-27
DOI: 10.48550/arXiv.2303.15016

Chunpu Xu, Jing Li


Citations: 2

Abstract

Social media generates massive amounts of multimedia content every day in the form of paired images and text, creating a pressing need to automate vision-and-language understanding for various multimodal classification tasks. Compared with commonly studied visual-lingual data, social media posts tend to exhibit more implicit image-text relations. To better bridge the cross-modal semantics, we capture hinting features from user comments, which are retrieved by jointly leveraging visual and lingual similarity. The classification tasks are then explored via self-training in a teacher-student framework, motivated by the typically limited amount of labeled data in existing benchmarks. Extensive experiments are conducted on four multimodal social media benchmarks, covering image-text relation classification, sarcasm detection, sentiment classification, and hate speech detection. The results show that our method improves on previous state-of-the-art models, which do not employ comment modeling or self-training.
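
The two ideas named in the abstract, retrieving comments through joint visual and lingual similarity and self-training with a teacher model, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the two similarities are combined or how pseudo-labels are filtered, so the weighted sum (`alpha`), the confidence `threshold`, and the function names `retrieve_similar_posts` and `select_pseudo_labels` are hypothetical and are not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_similar_posts(query, bank, k=2, alpha=0.5):
    """Return indices of the k bank posts most similar to the query post.

    query is an (image_vector, text_vector) pair; bank is a list of such pairs.
    ASSUMPTION: the joint score is a weighted sum of the two cosine
    similarities; the abstract only says both similarities are used jointly.
    """
    scored = []
    for idx, (img_vec, txt_vec) in enumerate(bank):
        joint = alpha * cosine(query[0], img_vec) + (1 - alpha) * cosine(query[1], txt_vec)
        scored.append((joint, idx))
    # Highest score first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


def select_pseudo_labels(teacher_probs, threshold=0.9):
    """Keep unlabeled examples that the teacher predicts confidently.

    ASSUMPTION: confidence thresholding is a common self-training heuristic,
    used here only as a stand-in for whatever selection the paper applies.
    Returns a list of (example_index, predicted_label) pairs.
    """
    selected = []
    for idx, probs in enumerate(teacher_probs):
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((idx, label))
    return selected
```

In such a setup, the comments attached to the retrieved posts would serve as extra input for the query post, and the selected pseudo-labeled examples would be added to the student's training set.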
