Sitetack：利用已知 PTM 改进 PTM 预测的深度学习模型。

Bioinformatics (Oxford, England) Pub Date : 2024-11-01 DOI:10.1093/bioinformatics/btae602

Clair S Gutierrez, Alia A Kassim, Benjamin D Gutierrez, Ronald T Raines

{"title":"Sitetack：利用已知 PTM 改进 PTM 预测的深度学习模型。","authors":"Clair S Gutierrez, Alia A Kassim, Benjamin D Gutierrez, Ronald T Raines","doi":"10.1093/bioinformatics/btae602","DOIUrl":null,"url":null,"abstract":"Motivation: Post-translational modifications (PTMs) increase the diversity of the proteome and are vital to organismal life and therapeutic strategies. Deep learning has been used to predict PTM locations. Still, limitations in datasets and their analyses compromise success.Results: We evaluated the use of known PTM sites in prediction via sequence-based deep learning algorithms. For each PTM, known locations of that PTM were encoded as a separate amino acid before sequences were encoded via word embedding and passed into a convolutional neural network that predicts the probability of that PTM at a given site. Without labeling known PTMs, our models are on par with others. With labeling, however, we improved significantly upon extant models. Moreover, knowing PTM locations can increase the predictability of a different PTM. Our findings highlight the importance of PTMs for the installation of additional PTMs. We anticipate that including known PTM locations will enhance the performance of other proteomic machine learning algorithms.Availability and implementation: Sitetack is available as a web tool at https://sitetack.net; the source code, representative datasets, instructions for local use, and select models are available at https://github.com/clair-gutierrez/sitetack.","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11552626/pdf/","citationCount":"0","resultStr":"{\"title\":\"Sitetack: a deep learning model that improves PTM prediction by using known PTMs.\",\"authors\":\"Clair S Gutierrez, Alia A Kassim, Benjamin D Gutierrez, Ronald T Raines\",\"doi\":\"10.1093/bioinformatics/btae602\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Motivation: Post-translational modifications (PTMs) increase the diversity of the proteome and are vital to organismal life and therapeutic strategies. Deep learning has been used to predict PTM locations. Still, limitations in datasets and their analyses compromise success.Results: We evaluated the use of known PTM sites in prediction via sequence-based deep learning algorithms. For each PTM, known locations of that PTM were encoded as a separate amino acid before sequences were encoded via word embedding and passed into a convolutional neural network that predicts the probability of that PTM at a given site. Without labeling known PTMs, our models are on par with others. With labeling, however, we improved significantly upon extant models. Moreover, knowing PTM locations can increase the predictability of a different PTM. Our findings highlight the importance of PTMs for the installation of additional PTMs. We anticipate that including known PTM locations will enhance the performance of other proteomic machine learning algorithms.Availability and implementation: Sitetack is available as a web tool at https://sitetack.net; the source code, representative datasets, instructions for local use, and select models are available at https://github.com/clair-gutierrez/sitetack.\",\"PeriodicalId\":93899,\"journal\":{\"name\":\"Bioinformatics (Oxford, England)\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11552626/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics (Oxford, England)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/bioinformatics/btae602\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics (Oxford, England)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btae602","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

动机翻译后修饰（PTM）增加了蛋白质组的多样性，对生物体生命和治疗策略至关重要。深度学习已被用于预测PTM位置。然而，数据集及其分析的局限性影响了成功率：我们评估了通过基于序列的深度学习算法预测已知 PTM 位点的使用情况。对于每个 PTM，在通过词嵌入对序列进行编码之前，先将该 PTM 的已知位置编码为单独的氨基酸，然后将其输入卷积神经网络，该网络可预测给定位置上该 PTM 的概率。在不标记已知 PTM 的情况下，我们的模型与其他模型相当。但是，在标注后，我们的模型比现有模型有了显著提高。此外，了解 PTM 的位置可以提高对不同 PTM 的预测能力。我们的发现凸显了 PTM 对于安装其他 PTM 的重要性。我们预计，加入已知的 PTM 位置将提高其他蛋白质组机器学习算法的性能：Sitetack 是一种网络工具，可在 https://sitetack.net 网站上获取；源代码、代表性数据集、本地使用说明和精选模型可在 https://github.com/clair-gutierrez/sitetack.Supplementary 信息网站上获取：补充数据可在 Bioinformatics online 上获取。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Sitetack: a deep learning model that improves PTM prediction by using known PTMs.

Motivation: Post-translational modifications (PTMs) increase the diversity of the proteome and are vital to organismal life and therapeutic strategies. Deep learning has been used to predict PTM locations. Still, limitations in datasets and their analyses compromise success.

Results: We evaluated the use of known PTM sites in prediction via sequence-based deep learning algorithms. For each PTM, known locations of that PTM were encoded as a separate amino acid before sequences were encoded via word embedding and passed into a convolutional neural network that predicts the probability of that PTM at a given site. Without labeling known PTMs, our models are on par with others. With labeling, however, we improved significantly upon extant models. Moreover, knowing PTM locations can increase the predictability of a different PTM. Our findings highlight the importance of PTMs for the installation of additional PTMs. We anticipate that including known PTM locations will enhance the performance of other proteomic machine learning algorithms.

Availability and implementation: Sitetack is available as a web tool at https://sitetack.net; the source code, representative datasets, instructions for local use, and select models are available at https://github.com/clair-gutierrez/sitetack.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Bioinformatics (Oxford, England)

自引率

0.00%

发文量