FedKD-CPI: Combining the federated knowledge distillation technique to accomplish synergistic compound-protein interaction prediction

IF 4.2, Zone 3 (Biology), JCR Q1 (Biochemical Research Methods)

Methods. Pub Date: 2025-02-01. DOI: 10.1016/j.ymeth.2024.12.014

Xuetao Wang, Qichang Zhao, Jianxin Wang

Methods, Volume 234, Pages 275-283. https://www.sciencedirect.com/science/article/pii/S1046202325000076


Abstract

Compound-protein interaction (CPI) prediction is critical in the early stages of drug discovery: it narrows the search space for candidate interactions and reduces the cost and time of traditional high-throughput screening. However, CPI-related data are usually distributed across institutions, and sharing them is restricted by data privacy and intellectual property concerns. A scheme that strengthens multi-institutional collaboration to improve prediction accuracy while protecting data privacy is therefore essential. To this end, we propose FedKD-CPI, the first framework based on federated knowledge distillation, to facilitate multi-party collaborative CPI prediction while ensuring data privacy and security. FedKD-CPI uses knowledge distillation to extract the updated knowledge of all client models and trains a model on the server to aggregate that knowledge, making effective use of the knowledge contained in both public and private data. We evaluate FedKD-CPI on three benchmark datasets and compare it with four baselines. The results show that FedKD-CPI performs very close to centralized learning and significantly better than localized learning. Furthermore, FedKD-CPI outperforms federated learning-based baselines on both independent and identically distributed (IID) and non-IID data. Overall, FedKD-CPI improves CPI prediction while ensuring data security and promoting collaboration among institutions to accelerate drug discovery.
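
The abstract describes the general flow of federated knowledge distillation: clients train on private data, and the server learns from what the client models know rather than from the private data itself. The sketch below illustrates that idea on toy data and is not the authors' implementation. Everything specific in it is an assumption made for illustration: the logistic-regression model, the synthetic two-feature "interaction" data, averaging client soft predictions on a shared unlabeled public set as the aggregation step, and clients restarting from the server weights each round. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    # Probability of the positive class; the last weight is the bias.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train(w, xs, targets, lr=0.5, epochs=200):
    # Gradient descent on cross-entropy. Targets may be hard labels (0/1)
    # or soft labels (probabilities), which is what distillation needs.
    w = list(w)
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(xs, targets):
            err = predict(w, x) - t
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def make_data(rng, n):
    # Toy stand-in for CPI pairs: two features, label 1 if their sum is positive.
    xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    ys = [1.0 if x[0] + x[1] > 0 else 0.0 for x in xs]
    return xs, ys


def federated_distillation(client_data, public_xs, rounds=3):
    dim = len(public_xs[0])
    server_w = [0.0] * (dim + 1)
    for _ in range(rounds):
        client_preds = []
        for xs, ys in client_data:
            # Assumption: each client starts from the current server model
            # and trains on its private data only.
            local_w = train(server_w, xs, ys)
            # Only predictions on the public set leave the client.
            client_preds.append([predict(local_w, x) for x in public_xs])
        # Assumption: aggregate knowledge by averaging client soft labels.
        soft = [
            sum(p[i] for p in client_preds) / len(client_preds)
            for i in range(len(public_xs))
        ]
        # The server model is distilled from the averaged soft labels.
        server_w = train(server_w, public_xs, soft)
    return server_w


def accuracy(w, xs, ys):
    hits = sum((predict(w, x) > 0.5) == (y == 1.0) for x, y in zip(xs, ys))
    return hits / len(xs)


rng = random.Random(0)
clients = [make_data(rng, 40) for _ in range(3)]   # private data, never pooled
public_xs, _ = make_data(rng, 60)                  # public data, labels unused
test_xs, test_ys = make_data(rng, 200)

server_w = federated_distillation(clients, public_xs)
server_acc = accuracy(server_w, test_xs, test_ys)
print(f"server accuracy on held-out toy data: {server_acc:.2f}")
```

In this sketch the server never sees private features or labels; it sees only client predictions on the public set, which is the privacy property the abstract emphasizes. A non-IID variant could be simulated by giving each client samples from a different region of the feature space.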


Source journal

Methods (Biology: Biochemical Research Methods)

CiteScore: 9.80

Self-citation rate: 2.10%

Articles published: 222

Review time: 11.3 weeks

About the journal: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.