SpliceMix: A Cross-Scale and Semantic Blending Augmentation Strategy for Multi-Label Image Classification

IF 9.7 · Zone 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Multimedia · Pub Date: 2025-01-28 · DOI: 10.1109/TMM.2025.3535387

Lei Wang, Yibing Zhan, Leilei Ma, Dapeng Tao, Liang Ding, Chen Gong

Vol. 27, pp. 3251-3265 · IEEE Xplore: https://ieeexplore.ieee.org/document/10856374/

Citations: 0

Abstract

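Recently, Mix-style data augmentation methods (e.g., Mixup and CutMix) have shown promising performance in various visual tasks. However, these methods are primarily designed for single-label images and ignore the considerable discrepancies between single- and multi-label images: a multi-label image involves multiple co-occurring categories and highly variable object scales. On the other hand, previous multi-label image classification (MLIC) methods tend to rely on elaborate models that incur expensive computation. In this article, we introduce a simple but effective augmentation strategy for multi-label image classification, namely SpliceMix. The “splice” in our method is two-fold: 1) each mixed image is a splice of several downsampled images arranged in a grid, so that the semantics of the images participating in the mix are blended without object deficiencies, alleviating co-occurrence bias; 2) we splice the mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image at different scales to contribute to training together. Furthermore, this splice within the SpliceMixed mini-batch enables interactions between the mixed images and the original regular images. We also provide a simple, non-parametric extension based on consistency learning (SpliceMix-CL) to show the potential of extending SpliceMix. Extensive experiments on various tasks demonstrate that using only SpliceMix with a baseline model (e.g., ResNet) achieves better performance than state-of-the-art methods. Moreover, the generalizability of SpliceMix is further validated by the improvements it brings to current MLIC methods when combined with them.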
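The two splices described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are single-channel H x W nested lists of floats, the grid is square (g x g) with H and W divisible by g, downsampling is average pooling, and the mixed image's label is assumed to be the element-wise union of the spliced images' multi-hot labels (plausible because each object is kept whole rather than cropped). All function names (`downsample`, `splice_grid`, `union_labels`, `splicemix_batch`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical simplification: a single-channel H x W image as nested lists.
Image = List[List[float]]
# Multi-hot label vector, one 0/1 entry per category.
Labels = List[int]


def downsample(img: Image, factor: int) -> Image:
    """Average-pool an H x W image by `factor` (assumes H, W divisible by factor)."""
    h, w = len(img), len(img[0])
    out: Image = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [img[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def splice_grid(images: Sequence[Image], g: int) -> Image:
    """First splice: downsample g*g images by g and tile them row-major into one H x W image."""
    assert len(images) == g * g, "need exactly g*g images for a g x g grid"
    cells = [downsample(img, g) for img in images]
    ch, cw = len(cells[0]), len(cells[0][0])
    mixed: Image = [[0.0] * (cw * g) for _ in range(ch * g)]
    for k, cell in enumerate(cells):
        r, c = divmod(k, g)  # grid row and column of the k-th cell
        for i in range(ch):
            for j in range(cw):
                mixed[r * ch + i][c * cw + j] = cell[i][j]
    return mixed


def union_labels(labels: Sequence[Labels]) -> Labels:
    """Assumed label rule: a category is present if any spliced image contains it."""
    return [max(column) for column in zip(*labels)]


def splicemix_batch(
    images: Sequence[Image], labels: Sequence[Labels], g: int, n_mixed: int, seed: int = 0
) -> Tuple[List[Image], List[Labels]]:
    """Second splice: keep the original mini-batch and append n_mixed grid-mixed images."""
    assert len(images) >= g * g, "mini-batch too small for a g x g grid"
    rng = random.Random(seed)
    batch_imgs, batch_labels = list(images), list(labels)
    for _ in range(n_mixed):
        idx = rng.sample(range(len(images)), g * g)  # pick g*g images without replacement
        batch_imgs.append(splice_grid([images[i] for i in idx], g))
        batch_labels.append(union_labels([labels[i] for i in idx]))
    return batch_imgs, batch_labels
```

In this sketch the original images stay in the batch while downsampled copies of them also appear inside the grid-mixed images, so a single update sees each sampled image at two scales, which is the cross-scale effect the abstract attributes to the SpliceMixed mini-batch.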




Source journal: IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)

CiteScore: 11.70
Self-citation rate: 11.00%
Articles published: 576
Review time: 5.5 months

Journal description: The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the fields of interest of its sponsors, providing comprehensive coverage of multimedia research.