KDCS-PPI: Knowledge distillation with counterfactual sampling for Protein-Protein Interaction prediction
Bin Deng, Huifang Ma, Ruijia Zhang, Zhixin Li, Liang Chang
Expert Systems with Applications, volume 285, Article 127896. Published 2025-05-09. Impact factor: 7.5 (JCR Q1, Computer Science, Artificial Intelligence).
DOI: 10.1016/j.eswa.2025.127896
https://www.sciencedirect.com/science/article/pii/S0957417425015180
Citations: 0
Abstract
As the core of various biochemical reactions in life, Protein-Protein Interactions (PPIs) play a crucial role in maintaining the homeostasis of cellular functions, which makes accurate PPI prediction particularly important. Traditional wet-lab methods for identifying PPIs are time-consuming and costly. In contrast, PPI prediction methods based on Graph Neural Networks (GNNs) have shown promising performance and have become the predominant approach in recent years. GNNs rely on neighbor message aggregation, which can be computationally inefficient, whereas Multilayer Perceptrons (MLPs) stand out for their time efficiency because they do not require intricate handling of relational knowledge. However, MLPs often exhibit comparatively lower prediction accuracy. To combine the effectiveness of GNNs with the efficiency of MLPs, knowledge distillation techniques can be used to transfer the knowledge learned by GNNs to MLPs. During knowledge distillation, the transferred knowledge usually consists of node feature embeddings rather than relational knowledge about the interactions between proteins. Moreover, current methods frequently choose positive and negative samples for anchor nodes via random sampling, leading to suboptimal accuracy, especially for negative samples. To address these issues, we propose Knowledge Distillation with Counterfactual Sampling for Protein-Protein Interaction prediction (KDCS-PPI). Our method facilitates the transfer of diverse relational knowledge between proteins during the knowledge distillation process and uses a counterfactual sampling strategy to select more pertinent positive and negative examples. Extensive experiments on three datasets demonstrate that KDCS-PPI can be applied to large-scale PPI prediction tasks and achieves significant improvements in both effectiveness and computational efficiency compared to other benchmark methods. Our source code will be publicly available at https://github.com/bin-db/KDCS-PPI.
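The abstract does not give the loss functions, so the following is only a generic sketch of what distilling a GNN teacher into an MLP student can look like, not the KDCS-PPI objective itself. It assumes a standard temperature-softened KL term on the output logits plus a simple relational term that matches pairwise cosine similarities between protein embeddings. All function names, the temperature, and the 0.5 weighting are hypothetical, and the counterfactual sampling step is not sketched because the abstract does not specify it.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def soft_label_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL between softened teacher and student predictions, scaled by T^2."""
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return temperature ** 2 * sum(losses) / len(losses)


def cosine_matrix(embeddings):
    """Pairwise cosine similarity between embedding vectors."""
    # A zero vector gets norm 1.0 so the division below stays defined.
    norms = [math.sqrt(sum(x * x for x in e)) or 1.0 for e in embeddings]
    return [
        [sum(a * b for a, b in zip(ei, ej)) / (ni * nj)
         for ej, nj in zip(embeddings, norms)]
        for ei, ni in zip(embeddings, norms)
    ]


def relational_loss(teacher_emb, student_emb):
    """Mean squared difference between teacher and student similarity matrices."""
    t_sim = cosine_matrix(teacher_emb)
    s_sim = cosine_matrix(student_emb)
    n = len(t_sim)
    return sum(
        (t_sim[i][j] - s_sim[i][j]) ** 2 for i in range(n) for j in range(n)
    ) / (n * n)


if __name__ == "__main__":
    # Toy values: two protein pairs with binary interaction logits,
    # and three proteins with 2-dimensional embeddings (illustrative only).
    teacher_logits = [[2.0, -1.0], [0.5, 0.3]]
    student_logits = [[1.0, 0.0], [0.2, 0.4]]
    teacher_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    student_emb = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.9]]

    total = soft_label_loss(teacher_logits, student_logits) \
        + 0.5 * relational_loss(teacher_emb, student_emb)  # 0.5 is an assumed weight
    print(f"combined distillation loss: {total:.4f}")
```

Because the relational term compares similarity matrices rather than raw embeddings, the student is free to use a different embedding scale or dimension than the teacher, which is one common reason for distilling relations instead of features.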
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.