PatchProt: hydrophobic patch prediction using protein foundation models.
{"title":"PatchProt:利用蛋白质基础模型预测疏水斑块。","authors":"Dea Gogishvili, Emmanuel Minois-Genin, Jan van Eck, Sanne Abeln","doi":"10.1093/bioadv/vbae154","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Hydrophobic patches on protein surfaces play important functional roles in protein-protein and protein-ligand interactions. Large hydrophobic surfaces are also involved in the progression of aggregation diseases. Predicting exposed hydrophobic patches from a protein sequence has shown to be a difficult task. Fine-tuning foundation models allows for adapting a model to the specific nuances of a new task using a much smaller dataset. Additionally, multitask deep learning offers a promising solution for addressing data gaps, simultaneously outperforming single-task methods.</p><p><strong>Results: </strong>In this study, we harnessed a recently released leading large language model Evolutionary Scale Models (ESM-2). Efficient fine-tuning of ESM-2 was achieved by leveraging a recently developed parameter-efficient fine-tuning method. This approach enabled comprehensive training of model layers without excessive parameters and without the need to include a computationally expensive multiple sequence analysis. We explored several related tasks, at local (residue) and global (protein) levels, to improve the representation of the model. As a result, our model, PatchProt, cannot only predict hydrophobic patch areas but also outperforms existing methods at predicting primary tasks, including secondary structure and surface accessibility predictions. Importantly, our analysis shows that including related local tasks can improve predictions on more difficult global tasks. This research sets a new standard for sequence-based protein property prediction and highlights the remarkable potential of fine-tuning foundation models enriching the model representation by training over related tasks.</p><p><strong>Availability and implementation: </strong>https://github.com/Deagogishvili/chapter-multi-task.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"4 1","pages":"vbae154"},"PeriodicalIF":2.4000,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525051/pdf/","citationCount":"0","resultStr":"{\"title\":\"PatchProt: hydrophobic patch prediction using protein foundation models.\",\"authors\":\"Dea Gogishvili, Emmanuel Minois-Genin, Jan van Eck, Sanne Abeln\",\"doi\":\"10.1093/bioadv/vbae154\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Motivation: </strong>Hydrophobic patches on protein surfaces play important functional roles in protein-protein and protein-ligand interactions. Large hydrophobic surfaces are also involved in the progression of aggregation diseases. Predicting exposed hydrophobic patches from a protein sequence has shown to be a difficult task. Fine-tuning foundation models allows for adapting a model to the specific nuances of a new task using a much smaller dataset. Additionally, multitask deep learning offers a promising solution for addressing data gaps, simultaneously outperforming single-task methods.</p><p><strong>Results: </strong>In this study, we harnessed a recently released leading large language model Evolutionary Scale Models (ESM-2). Efficient fine-tuning of ESM-2 was achieved by leveraging a recently developed parameter-efficient fine-tuning method. 
This approach enabled comprehensive training of model layers without excessive parameters and without the need to include a computationally expensive multiple sequence analysis. We explored several related tasks, at local (residue) and global (protein) levels, to improve the representation of the model. As a result, our model, PatchProt, cannot only predict hydrophobic patch areas but also outperforms existing methods at predicting primary tasks, including secondary structure and surface accessibility predictions. Importantly, our analysis shows that including related local tasks can improve predictions on more difficult global tasks. This research sets a new standard for sequence-based protein property prediction and highlights the remarkable potential of fine-tuning foundation models enriching the model representation by training over related tasks.</p><p><strong>Availability and implementation: </strong>https://github.com/Deagogishvili/chapter-multi-task.</p>\",\"PeriodicalId\":72368,\"journal\":{\"name\":\"Bioinformatics advances\",\"volume\":\"4 1\",\"pages\":\"vbae154\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-10-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525051/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics advances\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/bioadv/vbae154\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics advances","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioadv/vbae154","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
Dea Gogishvili, Emmanuel Minois-Genin, Jan van Eck, Sanne Abeln
Motivation: Hydrophobic patches on protein surfaces play important functional roles in protein-protein and protein-ligand interactions. Large hydrophobic surfaces are also involved in the progression of aggregation diseases. Predicting exposed hydrophobic patches from a protein sequence has proven to be a difficult task. Fine-tuning foundation models allows a model to be adapted to the specific nuances of a new task using a much smaller dataset. Additionally, multitask deep learning offers a promising way to address data gaps while outperforming single-task methods.
Results: In this study, we harnessed a recently released leading protein language model, Evolutionary Scale Models (ESM-2). Efficient fine-tuning of ESM-2 was achieved with a recently developed parameter-efficient fine-tuning method. This approach enabled comprehensive training of the model layers without an excessive number of trainable parameters and without the need for computationally expensive multiple sequence alignments. We explored several related tasks, at the local (residue) and global (protein) levels, to improve the representation of the model. As a result, our model, PatchProt, can not only predict hydrophobic patch areas but also outperforms existing methods on the primary tasks, including secondary structure and surface accessibility prediction. Importantly, our analysis shows that including related local tasks can improve predictions on more difficult global tasks. This research sets a new standard for sequence-based protein property prediction and highlights the remarkable potential of fine-tuning foundation models, enriching the model representation by training on related tasks.
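To make the setup concrete, the sketch below shows one plausible way to combine a parameter-efficient fine-tuning method (LoRA, via the peft library) with an ESM-2 backbone and two task heads: a local (per-residue) head and a global (per-protein) head. This is a minimal illustration, not the authors' implementation (which is available at the repository below); the checkpoint name, LoRA rank, and head dimensions are illustrative assumptions.

```python
# A minimal sketch, not the authors' implementation: an ESM-2 backbone with
# LoRA adapters (via the peft library) and two task heads. The checkpoint,
# LoRA rank, and head sizes are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, EsmModel
from peft import LoraConfig, get_peft_model

class MultiTaskSketch(nn.Module):
    def __init__(self, checkpoint="facebook/esm2_t12_35M_UR50D"):
        super().__init__()
        backbone = EsmModel.from_pretrained(checkpoint)
        # LoRA adapters on the attention projections: only a small number of
        # extra parameters are trained while the original weights stay frozen.
        lora_cfg = LoraConfig(r=8, lora_alpha=16,
                              target_modules=["query", "value"])
        self.backbone = get_peft_model(backbone, lora_cfg)
        hidden = backbone.config.hidden_size
        # Local head: one value per residue (e.g. relative surface accessibility).
        self.residue_head = nn.Linear(hidden, 1)
        # Global head: one value per protein (e.g. largest hydrophobic patch area).
        self.protein_head = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask):
        h = self.backbone(input_ids=input_ids,
                          attention_mask=attention_mask).last_hidden_state
        residue_pred = self.residue_head(h).squeeze(-1)        # (batch, length)
        # Mean-pool over valid positions for the protein-level task.
        # (In practice the special CLS/EOS tokens would also be masked out.)
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (h * mask).sum(dim=1) / mask.sum(dim=1)
        protein_pred = self.protein_head(pooled).squeeze(-1)   # (batch,)
        return residue_pred, protein_pred

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t12_35M_UR50D")
model = MultiTaskSketch()
batch = tokenizer(["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"], return_tensors="pt")
res_pred, prot_pred = model(batch["input_ids"], batch["attention_mask"])
```

In a multitask setting such as the one described above, the residue-level and protein-level losses would typically be combined as a weighted sum, so that gradients from the easier local tasks also shape the shared representation used for the harder global task.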
Availability and implementation: https://github.com/Deagogishvili/chapter-multi-task.