{"title":"SIMSE:一种结合样本重要性度量和语义增强的对比学习方法","authors":"Yangyang Gao , Zhi Zheng , Wenjun Huang , Xiaomin Lin","doi":"10.1016/j.eswa.2025.128045","DOIUrl":null,"url":null,"abstract":"<div><div>In complex application scenarios, the lack of accurate labels for training samples has positioned contrastive learning a key focus in self-supervised learning. This approach effectively extracts meaningful representations from unlabeled data. A significant challenge lies in identifying high-value samples and exploring their semantic features to improve performance. Most existing self-supervised learning methods do not focus much on mining or augmenting valuable samples, relying mainly on contrastive learning between original samples. As a result, performance suffers in cases with limited or sparse data. To address this, we propose a contrastive learning method that combines sample importance measurement and semantic enhancement, overcoming the limitations of traditional methods that only use original samples. First, we generate additional valuable samples using interpolation and apply a similarity-based strategy to better distinguish positive and negative samples, refining the sample partitioning process. Second, we design a semantic enhancement mechanism to better capture and strengthen shared high-level semantic features between samples. Third, we introduce a new metric to evaluate sample value by measuring oscillations in the learning model caused by gradients, determining the importance of each sample. We also use confidence learning to identify and correct mislabeled samples. Extensive evaluations conduct on multiple benchmark datasets demonstrate that our method improves linear classification accuracy by 2.23 %, 4.3 %, 1.6 %, 3.73 %, and 4.69 % on ImageNet-100, ImageNet-10, CIFAR-10, CIFAR-100, and STL-10, respectively. 
Additionally, accelerates convergence speed by 1.5x and effectively detects mislabeled samples.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"286 ","pages":"Article 128045"},"PeriodicalIF":7.5000,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SIMSE: a contrastive learning method combining sample importance metric and semantic enhancement\",\"authors\":\"Yangyang Gao , Zhi Zheng , Wenjun Huang , Xiaomin Lin\",\"doi\":\"10.1016/j.eswa.2025.128045\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In complex application scenarios, the lack of accurate labels for training samples has positioned contrastive learning a key focus in self-supervised learning. This approach effectively extracts meaningful representations from unlabeled data. A significant challenge lies in identifying high-value samples and exploring their semantic features to improve performance. Most existing self-supervised learning methods do not focus much on mining or augmenting valuable samples, relying mainly on contrastive learning between original samples. As a result, performance suffers in cases with limited or sparse data. To address this, we propose a contrastive learning method that combines sample importance measurement and semantic enhancement, overcoming the limitations of traditional methods that only use original samples. First, we generate additional valuable samples using interpolation and apply a similarity-based strategy to better distinguish positive and negative samples, refining the sample partitioning process. Second, we design a semantic enhancement mechanism to better capture and strengthen shared high-level semantic features between samples. 
Third, we introduce a new metric to evaluate sample value by measuring oscillations in the learning model caused by gradients, determining the importance of each sample. We also use confidence learning to identify and correct mislabeled samples. Extensive evaluations conduct on multiple benchmark datasets demonstrate that our method improves linear classification accuracy by 2.23 %, 4.3 %, 1.6 %, 3.73 %, and 4.69 % on ImageNet-100, ImageNet-10, CIFAR-10, CIFAR-100, and STL-10, respectively. Additionally, accelerates convergence speed by 1.5x and effectively detects mislabeled samples.</div></div>\",\"PeriodicalId\":50461,\"journal\":{\"name\":\"Expert Systems with Applications\",\"volume\":\"286 \",\"pages\":\"Article 128045\"},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2025-05-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Expert Systems with Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0957417425016665\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417425016665","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
SIMSE: a contrastive learning method combining sample importance metric and semantic enhancement
In complex application scenarios, the lack of accurate labels for training samples has positioned contrastive learning as a key focus in self-supervised learning. This approach effectively extracts meaningful representations from unlabeled data. A significant challenge lies in identifying high-value samples and exploring their semantic features to improve performance. Most existing self-supervised learning methods do not focus much on mining or augmenting valuable samples, relying mainly on contrastive learning between original samples. As a result, performance suffers in cases with limited or sparse data. To address this, we propose a contrastive learning method that combines sample importance measurement and semantic enhancement, overcoming the limitations of traditional methods that only use original samples. First, we generate additional valuable samples using interpolation and apply a similarity-based strategy to better distinguish positive and negative samples, refining the sample partitioning process. Second, we design a semantic enhancement mechanism to better capture and strengthen shared high-level semantic features between samples. Third, we introduce a new metric to evaluate sample value by measuring oscillations in the learning model caused by gradients, determining the importance of each sample. We also use confidence learning to identify and correct mislabeled samples. Extensive evaluations conducted on multiple benchmark datasets demonstrate that our method improves linear classification accuracy by 2.23%, 4.3%, 1.6%, 3.73%, and 4.69% on ImageNet-100, ImageNet-10, CIFAR-10, CIFAR-100, and STL-10, respectively. Additionally, it accelerates convergence by 1.5x and effectively detects mislabeled samples.
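The abstract does not give implementation details for the interpolation-based sample generation or the similarity-based positive/negative partitioning. As a loose illustration only (not the authors' actual method), a mixup-style linear interpolation and a cosine-similarity split could look like the following; the beta-distribution mixing and the 0.8 threshold are assumptions made for the sketch.

```python
import numpy as np

def interpolate_samples(x1, x2, alpha=0.5, rng=None):
    """Generate an additional sample by mixup-style linear interpolation.

    The paper only states that extra samples are generated "using
    interpolation"; blending with a Beta-distributed coefficient is an
    illustrative stand-in, not the published scheme.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    return lam * x1 + (1.0 - lam) * x2

def partition_by_similarity(anchor, candidates, threshold=0.8):
    """Split candidate embeddings into positive / negative indices by
    cosine similarity to the anchor (threshold value is an assumption)."""
    a = anchor / np.linalg.norm(anchor)
    pos, neg = [], []
    for i, c in enumerate(candidates):
        sim = float(a @ (c / np.linalg.norm(c)))
        (pos if sim >= threshold else neg).append(i)
    return pos, neg
```

In a contrastive setup, the interpolated samples would enlarge the pool of candidates before partitioning, which is one plausible reading of how the two steps combine.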
Journal Introduction:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.