MinoritySalMix and adaptive semantic weight compensation for long-tailed classification

IF 4.2 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing Pub Date : 2024-10-25 DOI:10.1016/j.imavis.2024.105307

Wu Zeng, Zheng-ying Xiao

Image and Vision Computing, Volume 152, Article 105307. https://www.sciencedirect.com/science/article/pii/S0262885624004128


Abstract

In real-world datasets, long-tailed distributions are widespread and often bias models towards majority-class samples at the expense of minority-class samples. We propose a strategy called MASW (MinoritySalMix and adaptive semantic weight compensation) to mitigate this problem. First, we propose a data augmentation method called MinoritySalMix (minority-saliency-mixing), which uses saliency detection to select salient regions from minority-class samples as cropping regions and pastes them into the same regions of majority-class samples to generate new samples, thereby increasing the number of images that contain important regions of minority-class samples. Second, to make the labels of the newly generated samples more consistent with their image content, we propose an adaptive semantic compensation factor. This factor provides more label compensation for minority samples depending on the cropped area, bringing the new label values closer to the content of the generated samples and improving model performance through more accurate labels. Finally, because some current re-sampling strategies lack flexibility in allocating class sampling weights and frequently require manual adjustment, we design an adaptive weight function and incorporate it into the re-sampling strategy to achieve better sampling. Experimental results on three long-tailed datasets show that our method effectively improves model performance and outperforms most advanced long-tailed methods. Furthermore, we extended MinoritySalMix to three balanced datasets, where it surpassed several advanced data augmentation techniques.
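The three components described in the abstract can be illustrated with a rough sketch. The abstract does not give the saliency detector, the form of the compensation factor, or the adaptive weight function, so everything below is an assumption made for illustration: `saliency_box` uses deviation from mean intensity as a crude stand-in for saliency detection, the `gamma` bonus in `minority_sal_mix` is a hypothetical compensation rule, and `class_sampling_weights` is the common power-smoothed inverse-frequency baseline rather than the paper's function. All function and parameter names are hypothetical.

```python
import numpy as np


def saliency_box(img, frac=0.5):
    """Box (y0, y1, x0, x1) of size frac*H x frac*W around the most salient pixel.

    Stand-in saliency (an assumption): absolute deviation from mean intensity.
    """
    gray = img.mean(axis=2) if img.ndim == 3 else img
    h, w = gray.shape
    sal = np.abs(gray - gray.mean())
    cy, cx = np.unravel_index(np.argmax(sal), sal.shape)
    bh, bw = max(1, int(h * frac)), max(1, int(w * frac))
    # Clamp the box so it stays fully inside the image.
    y0 = int(np.clip(cy - bh // 2, 0, h - bh))
    x0 = int(np.clip(cx - bw // 2, 0, w - bw))
    return y0, y0 + bh, x0, x0 + bw


def minority_sal_mix(minority_img, majority_img, y_min, y_maj,
                     num_classes, gamma=0.2, frac=0.5):
    """Paste the salient minority region into the same region of a majority image.

    Both images are assumed to have the same shape.
    """
    y0, y1, x0, x1 = saliency_box(minority_img, frac)
    mixed = majority_img.copy()
    mixed[y0:y1, x0:x1] = minority_img[y0:y1, x0:x1]

    h, w = minority_img.shape[:2]
    area = (y1 - y0) * (x1 - x0) / (h * w)
    # Hypothetical compensation: lift the minority label weight above the
    # raw pasted-area ratio, by a bonus that shrinks as the area grows.
    lam = min(1.0, area + gamma * (1.0 - area))

    label = np.zeros(num_classes)
    label[y_maj] += 1.0 - lam
    label[y_min] += lam
    return mixed, label


def class_sampling_weights(class_counts, power=0.5):
    """Inverse-frequency sampling weights, smoothed by `power` (a baseline).

    power=0 gives uniform class sampling; power=1 gives fully inverse-frequency.
    """
    counts = np.asarray(class_counts, dtype=float)
    w = (1.0 / counts) ** power
    return w / w.sum()
```

With `gamma=0`, the label reduces to the plain area-ratio label used by CutMix-style mixing; a positive `gamma` is one simple way to give the minority class more label mass than its pasted area alone would.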


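Source journal

Image and Vision Computing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.50

Self-citation rate: 8.50%

Articles published: 143

Review time: 7.8 months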

About the journal: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.