LVPocket: integrated 3D global-local information to protein binding pockets prediction with transfer learning of protein structure classification

IF 7.1 2区化学 Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics Pub Date : 2024-07-07 DOI:10.1186/s13321-024-00871-8

Ruifeng Zhou, Jing Fan, Sishu Li, Wenjie Zeng, Yilun Chen, Xiaoshan Zheng, Hongyang Chen, Jun Liao

{"title":"LVPocket: integrated 3D global-local information to protein binding pockets prediction with transfer learning of protein structure classification","authors":"Ruifeng Zhou, Jing Fan, Sishu Li, Wenjie Zeng, Yilun Chen, Xiaoshan Zheng, Hongyang Chen, Jun Liao","doi":"10.1186/s13321-024-00871-8","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><p>Previous deep learning methods for predicting protein binding pockets mainly employed 3D convolution, yet an abundance of convolution operations may lead the model to excessively prioritize local information, thus overlooking global information. Moreover, it is essential for us to account for the influence of diverse protein folding structural classes. Because proteins classified differently structurally exhibit varying biological functions, whereas those within the same structural class share similar functional attributes.</p><h3>Results</h3><p>We proposed LVPocket, a novel method that synergistically captures both local and global information of protein structure through the integration of Transformer encoders, which help the model achieve better performance in binding pockets prediction. And then we tailored prediction models for data of four distinct structural classes of proteins using the transfer learning. The four fine-tuned models were trained on the baseline LVPocket model which was trained on the sc-PDB dataset. LVPocket exhibits superior performance on three independent datasets compared to current state-of-the-art methods. Additionally, the fine-tuned model outperforms the baseline model in terms of performance.</p><h3>Scientific contribution</h3><p>We present a novel model structure for predicting protein binding pockets that provides a solution for relying on extensive convolutional computation while neglecting global information about protein structures. Furthermore, we tackle the impact of different protein folding structures on binding pocket prediction tasks through the application of transfer learning methods.</p><h3>Graphical Abstract</h3>\n<div><figure><div><div><picture><source><img></source></picture></div></div></figure></div></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"16 1","pages":""},"PeriodicalIF":7.1000,"publicationDate":"2024-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00871-8","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cheminformatics","FirstCategoryId":"92","ListUrlMain":"https://link.springer.com/article/10.1186/s13321-024-00871-8","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}

引用次数: 0

Abstract

Background

Previous deep learning methods for predicting protein binding pockets mainly employed 3D convolution, yet an abundance of convolution operations may lead the model to excessively prioritize local information, thus overlooking global information. Moreover, it is essential for us to account for the influence of diverse protein folding structural classes. Because proteins classified differently structurally exhibit varying biological functions, whereas those within the same structural class share similar functional attributes.

Results

We proposed LVPocket, a novel method that synergistically captures both local and global information of protein structure through the integration of Transformer encoders, which help the model achieve better performance in binding pockets prediction. And then we tailored prediction models for data of four distinct structural classes of proteins using the transfer learning. The four fine-tuned models were trained on the baseline LVPocket model which was trained on the sc-PDB dataset. LVPocket exhibits superior performance on three independent datasets compared to current state-of-the-art methods. Additionally, the fine-tuned model outperforms the baseline model in terms of performance.

Scientific contribution

We present a novel model structure for predicting protein binding pockets that provides a solution for relying on extensive convolutional computation while neglecting global information about protein structures. Furthermore, we tackle the impact of different protein folding structures on binding pocket prediction tasks through the application of transfer learning methods.

Graphical Abstract

查看原文本刊更多论文

LVPocket：通过蛋白质结构分类的迁移学习，综合三维全局-局部信息预测蛋白质结合口袋。

背景：以往预测蛋白质结合口袋的深度学习方法主要采用三维卷积，然而大量的卷积操作可能会导致模型过度优先考虑局部信息，从而忽略全局信息。此外，我们还必须考虑到不同蛋白质折叠结构类别的影响。因为不同结构分类的蛋白质具有不同的生物功能，而同一结构分类的蛋白质则具有相似的功能属性：我们提出了 LVPocket 这种新方法，它通过整合 Transformer 编码器协同捕捉蛋白质结构的局部和全局信息，从而帮助模型在结合口袋预测中取得更好的性能。然后，我们利用迁移学习为四种不同结构类别的蛋白质数据定制了预测模型。这四个微调模型是在基线 LVPocket 模型的基础上训练的，而基线 LVPocket 模型是在 sc-PDB 数据集上训练的。与目前最先进的方法相比，LVPocket 在三个独立的数据集上表现出更优越的性能。此外，经过微调的模型在性能上也优于基线模型：我们提出了一种用于预测蛋白质结合口袋的新型模型结构，它为依赖大量卷积计算而忽略蛋白质结构的全局信息提供了一种解决方案。此外，我们还通过应用迁移学习方法解决了不同蛋白质折叠结构对结合口袋预测任务的影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of Cheminformatics CHEMISTRY, MULTIDISCIPLINARY-COMPUTER SCIENCE, INFORMATION SYSTEMS

CiteScore

14.10

自引率

7.00%

发文量

审稿时长

3 months

期刊介绍： Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.