Adaptive Clustering of Robust Semantic Representations for Adversarial Image Purification on Social Networks

International Conference on Web and Social Media. Pub Date: 2022-05-31. DOI: 10.1609/icwsm.v16i1.19350

S. Silva, Arun Das, A. Alaeddini, Peyman Najafirad


Citations: 0

Abstract

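Advances in Artificial Intelligence (AI) have made it possible to automate human-level visual search and perception tasks on the massive volume of images shared on social media every day. However, AI-based automated filters are highly susceptible to deliberate image attacks, which can cause cyberbullying, child sexual abuse material (CSAM), adult content, and deepfakes to be misclassified. One of the most effective defenses against such perturbations is adversarial training, but it comes at the cost of generalization to unseen attacks and transferability across models. In this article, we propose a robust defense against adversarial image attacks that is model-agnostic and generalizes to unseen adversaries. We begin with a baseline model, extract the latent representations for each class, and adaptively cluster the latent representations that share semantic similarity. Next, we obtain the distributions of these clustered latent representations along with their originating images. We then learn semantic reconstruction dictionaries (SRDs). We adversarially train a new model whose latent space is constrained to minimize the distance between the adversarial latent representation and the true cluster distribution. To purify an image, we decompose the input into low- and high-frequency components and reconstruct the high-frequency component from the best SRD learned on the clean dataset. The best SRD is selected according to the distance between the robust latent representation and the semantic cluster distributions. The output is a purified image with the perturbations removed. Evaluations on comprehensive datasets, including image benchmarks and social media images, demonstrate that the proposed purification approach protects AI-based image filters and considerably improves their accuracy on unlawful and harmful images that have been adversarially perturbed.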
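The clustering and SRD-selection steps can be illustrated with a small sketch. The abstract does not say which clustering algorithm or distribution family the authors use. This sketch assumes a simple threshold ("leader") clustering within each class, which lets the number of clusters per class adapt to the data, and it summarizes each cluster as a diagonal Gaussian. The helper names (`adaptive_cluster`, `nearest_cluster`), the `radius` threshold, and the variance floor are hypothetical. `nearest_cluster` stands in for choosing the best SRD by latent distance, here measured as a diagonal Mahalanobis distance.

```python
import math
from dataclasses import dataclass, field

Vector = list[float]


@dataclass
class Cluster:
    """One semantic cluster of latent vectors, summarized as a diagonal Gaussian."""
    label: int
    members: list[Vector] = field(default_factory=list)

    def mean(self) -> Vector:
        n = len(self.members)
        return [sum(col) / n for col in zip(*self.members)]

    def var(self, floor: float = 1e-2) -> Vector:
        # Per-dimension variance plus a floor (assumption) so singleton clusters stay usable.
        mu = self.mean()
        n = len(self.members)
        return [sum((x - m) ** 2 for x in col) / n + floor
                for col, m in zip(zip(*self.members), mu)]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adaptive_cluster(latents: list[Vector], labels: list[int], radius: float) -> list[Cluster]:
    """Hypothetical threshold ("leader") clustering within each class.

    A latent joins the closest same-class cluster if that cluster's mean lies
    within `radius`; otherwise it opens a new cluster, so the number of
    clusters per class is decided by the data rather than fixed in advance.
    """
    clusters: list[Cluster] = []
    for z, y in zip(latents, labels):
        same_class = [c for c in clusters if c.label == y]
        best = min(same_class, key=lambda c: euclidean(z, c.mean()), default=None)
        if best is not None and euclidean(z, best.mean()) <= radius:
            best.members.append(z)
        else:
            clusters.append(Cluster(label=y, members=[z]))
    return clusters


def mahalanobis_diag(z: Vector, c: Cluster) -> float:
    """Distance from z to a cluster under its diagonal-Gaussian summary."""
    return math.sqrt(sum((x - m) ** 2 / v for x, m, v in zip(z, c.mean(), c.var())))


def nearest_cluster(z: Vector, clusters: list[Cluster]) -> Cluster:
    """Pick the cluster (and hence its reconstruction dictionary) closest to z."""
    return min(clusters, key=lambda c: mahalanobis_diag(z, c))


# Toy 2-D "latents" for two classes.
latents = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9], [0.0, 0.2]]
labels = [0, 0, 0, 1, 1]
clusters = adaptive_cluster(latents, labels, radius=1.0)
print([(c.label, len(c.members)) for c in clusters])  # [(0, 2), (0, 1), (1, 1), (1, 1)]
```

Class 0 ends up with two clusters because its latents form two separate groups. A real pipeline would run this on encoder outputs and would likely use a learned or better-calibrated distance.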
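The constrained adversarial training objective can be written, under stated assumptions, as a classification loss on the adversarial input plus a penalty on how far that input's latent representation lies from its true semantic cluster. In this sketch the cluster is modeled as a diagonal Gaussian and the penalty is a squared Mahalanobis distance. The weight `lam` and the function names are hypothetical. The sketch does not show how the adversarial example is generated (for instance with PGD) or how the loss is back-propagated, since both depend on the deep-learning framework.

```python
import math

Vector = list[float]


def cross_entropy(logits: Vector, target: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def cluster_penalty(z_adv: Vector, mean: Vector, var: Vector) -> float:
    """Squared diagonal Mahalanobis distance between the adversarial latent
    and the clean cluster (diagonal Gaussian) of its true class."""
    return sum((z - m) ** 2 / v for z, m, v in zip(z_adv, mean, var))


def robust_loss(logits_adv: Vector, target: int, z_adv: Vector,
                mean: Vector, var: Vector, lam: float = 0.1) -> float:
    """Classification loss on the adversarial input plus a term pulling its
    latent representation toward the true semantic cluster (lam is assumed)."""
    return cross_entropy(logits_adv, target) + lam * cluster_penalty(z_adv, mean, var)


# An adversarial latent that drifted 2 units along a unit-variance axis.
loss = robust_loss([2.0, 0.0], target=0, z_adv=[2.0, 0.0],
                   mean=[0.0, 0.0], var=[1.0, 4.0], lam=0.5)
print(round(loss, 4))
```

Even when the classifier already predicts the right label, as in this example, the penalty keeps the loss large until the adversarial latent moves back toward the cluster.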
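The purification step can be sketched as follows. The abstract specifies none of these choices, so the sketch assumes a 3x3 box blur as the low-pass filter, treats the whole flattened image as a single "patch", and uses plain matching pursuit as the sparse coder. The dictionary is hand-built for illustration: a normalized high-frequency component of a clean toy image plus one extra atom. It only stands in for a learned SRD. The sketch keeps the low band as-is and rebuilds only the high band from dictionary atoms, so high-frequency content the dictionary cannot represent is largely discarded.

```python
import math

Image = list[list[float]]
Vector = list[float]


def box_blur(img: Image, k: int = 1) -> Image:
    """Low-pass filter: mean over a (2k+1) x (2k+1) window, clamping at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                      for di in range(-k, k + 1) for dj in range(-k, k + 1)]
            out[i][j] = sum(window) / len(window)
    return out


def split_frequencies(img: Image) -> tuple[Image, Image]:
    """Return (low, high) such that low + high == img."""
    low = box_blur(img)
    high = [[p - q for p, q in zip(row, low_row)] for row, low_row in zip(img, low)]
    return low, high


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def matching_pursuit(signal: Vector, atoms: list[Vector], n_atoms: int) -> Vector:
    """Greedy sparse approximation of `signal` using unit-norm `atoms`."""
    residual = list(signal)
    approx = [0.0] * len(signal)
    for _ in range(n_atoms):
        scores = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda idx: abs(scores[idx]))
        coef = scores[best]
        approx = [x + coef * a for x, a in zip(approx, atoms[best])]
        residual = [r - coef * a for r, a in zip(residual, atoms[best])]
    return approx


def purify(img: Image, dictionary: list[Vector], n_atoms: int = 2) -> Image:
    """Keep the low band; rebuild the high band from dictionary atoms only."""
    low, high = split_frequencies(img)
    w = len(img[0])
    flat_high = [x for row in high for x in row]
    recon = matching_pursuit(flat_high, dictionary, n_atoms)
    return [[low[i][j] + recon[i * w + j] for j in range(w)] for i in range(len(img))]


clean = [[0.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 0.0]]
_, clean_high = split_frequencies(clean)
# Hypothetical stand-in for a learned SRD.
dictionary = [normalize([x for row in clean_high for x in row]),
              normalize([1.0] + [0.0] * 8)]
# Checkerboard perturbation of +/-0.3.
noisy = [[p + (0.3 if (i + j) % 2 else -0.3) for j, p in enumerate(row)]
         for i, row in enumerate(clean)]
purified = purify(noisy, dictionary, n_atoms=1)
print([[round(x, 3) for x in row] for row in purified])
```

In this toy case `purified` is much closer to `clean` than `noisy` is. The checkerboard's high-frequency part cannot be expressed by the dictionary, so most of it is dropped. Learned SRDs would normally be applied patch by patch on real images.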



