SIMSE: a contrastive learning method combining sample importance metric and semantic enhancement

IF 7.5 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications Pub Date : 2025-05-08 DOI:10.1016/j.eswa.2025.128045

Yangyang Gao, Zhi Zheng, Wenjun Huang, Xiaomin Lin

Volume 286, Article 128045. Source: https://www.sciencedirect.com/science/article/pii/S0957417425016665

Citations: 0


SIMSE: a contrastive learning method combining sample importance metric and semantic enhancement

In complex application scenarios, the lack of accurate labels for training samples has made contrastive learning a key focus of self-supervised learning. This approach effectively extracts meaningful representations from unlabeled data. A significant challenge lies in identifying high-value samples and exploring their semantic features to improve performance. Most existing self-supervised learning methods pay little attention to mining or augmenting valuable samples, relying mainly on contrastive learning between original samples. As a result, performance suffers when data are limited or sparse. To address this, we propose a contrastive learning method that combines sample importance measurement and semantic enhancement, overcoming the limitations of traditional methods that only use original samples. First, we generate additional valuable samples using interpolation and apply a similarity-based strategy to better distinguish positive and negative samples, refining the sample partitioning process. Second, we design a semantic enhancement mechanism to better capture and strengthen shared high-level semantic features between samples. Third, we introduce a new metric that evaluates sample value by measuring the oscillations that gradients cause in the learning model, thereby determining the importance of each sample. We also use confidence learning to identify and correct mislabeled samples. Extensive evaluations conducted on multiple benchmark datasets demonstrate that our method improves linear classification accuracy by 2.23%, 4.3%, 1.6%, 3.73%, and 4.69% on ImageNet-100, ImageNet-10, CIFAR-10, CIFAR-100, and STL-10, respectively. Additionally, it accelerates convergence by 1.5x and effectively detects mislabeled samples.
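The abstract does not give formulas for its first step, so the following is only a minimal sketch of the general idea, not the paper's actual method. It assumes a mixup-style linear interpolation between two sample vectors and a cosine-similarity threshold for splitting candidates into positives and negatives. The function names, the mixing weight `lam`, and the `threshold` value are all hypothetical choices made for illustration.

```python
import math


def interpolate(x_a, x_b, lam):
    """Mixup-style linear interpolation: lam * x_a + (1 - lam) * x_b (assumed form)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def partition_by_similarity(anchor, candidates, threshold):
    """Split candidates into (positives, negatives) by similarity to the anchor."""
    positives, negatives = [], []
    for c in candidates:
        if cosine_similarity(anchor, c) >= threshold:
            positives.append(c)
        else:
            negatives.append(c)
    return positives, negatives


# Toy two-dimensional "samples"; real inputs would be images or embeddings.
anchor = [1.0, 0.0]
other = [0.0, 1.0]

# Interpolated sample that stays close to the anchor (hypothetical lam = 0.75).
mixed = interpolate(anchor, other, lam=0.75)

# Hypothetical threshold of 0.5 on cosine similarity.
positives, negatives = partition_by_similarity(anchor, [mixed, other], threshold=0.5)
```

In this toy case the interpolated sample lands in the positive set and the orthogonal sample in the negative set. How the paper actually chooses interpolation weights and similarity criteria is not stated in the abstract.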


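Source journal

Expert Systems with Applications (Engineering Technology - Engineering: Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months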

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.