A CLIP-based siamese approach for meme classification

Javier Huertas-Tato, Christos Koutlis, Symeon Papadopoulos, David Camacho, Ioannis Kompatsiaris

arXiv - CS - Multimedia · arXiv:2409.05772 · Published 2024-09-09
Abstract
Memes are an increasingly prevalent element of online discourse in social networks, especially among young audiences. They carry ideas and messages that range from humorous to hateful, and are widely consumed. Their potentially high impact calls for adequate means of moderating their use at large scale. In this work, we propose SimCLIP, a deep learning-based architecture for cross-modal understanding of memes, which leverages a pre-trained CLIP encoder to produce context-aware embeddings and a Siamese fusion technique to capture the interactions between text and image. We perform extensive experimentation on seven meme classification tasks across six datasets. We establish a new state of the art on Memotion7k with a 7.25% relative F1-score improvement, and achieve super-human performance on Harm-P with a 13.73% F1-score improvement. Our approach demonstrates the potential of compact meme classification models, enabling accurate and efficient meme monitoring. We share our code at https://github.com/jahuerta92/meme-classification-simclip
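The fusion idea described above — a shared ("Siamese") projection applied to both the CLIP image embedding and the CLIP text embedding, followed by a head that models their interactions — can be sketched in PyTorch. This is a minimal illustration, not the authors' exact design: the layer sizes, the choice of interaction features (concatenation, element-wise product, absolute difference), and the class count are all assumptions; the 512-dimensional inputs stand in for CLIP ViT-B/32 embeddings. The paper's actual architecture is in the linked repository.

```python
import torch
import torch.nn as nn

class SiameseFusionHead(nn.Module):
    """Illustrative Siamese fusion head for two modality embeddings.

    The same projection (shared weights) is applied to both the image
    and text embeddings; interaction features are then classified.
    All dimensions and the interaction scheme are assumptions, not the
    paper's exact configuration.
    """

    def __init__(self, embed_dim: int = 512, hidden_dim: int = 256,
                 num_classes: int = 3):
        super().__init__()
        # Shared ("Siamese") projection for both modalities.
        self.proj = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.ReLU())
        # Interaction features: [a, b, a*b, |a-b|] -> 4 * hidden_dim.
        self.classifier = nn.Linear(hidden_dim * 4, num_classes)

    def forward(self, img_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
        a = self.proj(img_emb)   # projected image embedding
        b = self.proj(txt_emb)   # projected text embedding (same weights)
        feats = torch.cat([a, b, a * b, (a - b).abs()], dim=-1)
        return self.classifier(feats)

# Random stand-ins for CLIP embeddings (batch of 4 memes).
img = torch.randn(4, 512)
txt = torch.randn(4, 512)
logits = SiameseFusionHead()(img, txt)
print(logits.shape)  # torch.Size([4, 3])
```

In practice the two embeddings would come from a frozen or fine-tuned CLIP encoder (e.g. `CLIPModel.get_image_features` / `get_text_features` in the `transformers` library), with this head trained on the downstream meme classification labels.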