{"title":"Distill & Contrast: A New Graph Self-Supervised Method With Approximating Nature Data Relationships","authors":"Dongxiao He;Jitao Zhao;Rui Guo;Zhiyong Feng;Cuiying Huo;Di Jin;Witold Pedrycz;Weixiong Zhang","doi":"10.1109/TKDE.2025.3554524","DOIUrl":null,"url":null,"abstract":"Contrastive Learning (CL) has emerged as a popular self-supervised representation learning paradigm that has been shown in many applications to perform similarly to traditional supervised learning methods. A key component of CL is mining the latent discriminative relationships between positive and negative samples and using them as self-supervised labels. We argue that this discriminative contrastive task is, in essence, similar to a classification task, and the “either positive or negative” hard label sampling strategies are arbitrary. To solve this problem, we explore ideas from data distillation, which considers probabilistic logit vectors as soft labels to transfer model knowledge. We attempt to abandon the classical hard sampling labels in CL and instead explore self-supervised soft labels. We adopt soft sampling labels that are extracted, without supervision, from the inherent relationships in data pairs to retain more information. We propose a new self-supervised graph learning method, Distill and Contrast (D&C), for learning representations that closely approximate natural data relationships. D&C extracts node similarities from the features and structures to derive soft sampling labels, which also eliminate noise in the data to increase robustness. Extensive experimental results on real-world datasets demonstrate the effectiveness of the proposed method.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3284-3297"},"PeriodicalIF":8.9000,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Knowledge and Data Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10938656/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Contrastive Learning (CL) has emerged as a popular self-supervised representation learning paradigm that has been shown in many applications to perform comparably to traditional supervised learning methods. A key component of CL is mining the latent discriminative relationships between positive and negative samples and using them as self-supervised labels. We argue that this discriminative contrastive task is, in essence, similar to a classification task, and that the "either positive or negative" hard-label sampling strategies are arbitrary. To solve this problem, we explore ideas from data distillation, which treats probabilistic logit vectors as soft labels for transferring model knowledge. We attempt to abandon the classical hard sampling labels in CL and instead explore self-supervised soft labels. We adopt soft sampling labels that are extracted, without supervision, from the inherent relationships in data pairs to retain more information. We propose a new self-supervised graph learning method, Distill and Contrast (D&C), for learning representations that closely approximate natural data relationships. D&C extracts node similarities from node features and graph structure to derive soft sampling labels, which also eliminate noise in the data to increase robustness. Extensive experimental results on real-world datasets demonstrate the effectiveness of the proposed method.
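To make the mechanism described in the abstract concrete, below is a minimal PyTorch-style sketch of the core idea: derive soft sampling labels from feature and structural similarity, then use them in place of hard positive/negative labels in a contrastive objective. This is not the authors' implementation; the function names, the blend weight `alpha`, and the temperature values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def soft_sampling_labels(x, adj, alpha=0.5, sharpen=0.1):
    """Hypothetical soft labels: blend cosine feature similarity with
    row-normalized structural similarity, per node, as the abstract describes."""
    xn = F.normalize(x, dim=1)
    feat_sim = xn @ xn.T                                   # feature-based similarity
    struct_sim = adj / adj.sum(dim=1, keepdim=True).clamp(min=1.0)  # structure-based similarity
    s = alpha * feat_sim + (1 - alpha) * struct_sim
    return F.softmax(s / sharpen, dim=1)                   # per-node probability distribution

def distill_contrast_loss(z1, z2, soft_labels, tau=0.5):
    """Soft-label contrastive loss: cross-entropy between the cross-view
    similarity distribution and the soft sampling labels, instead of
    one-hot 'either positive or negative' targets."""
    logits = F.normalize(z1, dim=1) @ F.normalize(z2, dim=1).T / tau
    log_probs = F.log_softmax(logits, dim=1)
    return -(soft_labels * log_probs).sum(dim=1).mean()
```

With hard labels, the target distribution would be the identity matrix (each node is its own sole positive); here the target spreads probability mass over similar nodes, which is how the soft labels retain more relational information.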
About the Journal
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management and their associated tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.