Recurrent neural network-based prediction of O-GlcNAcylation sites in mammalian proteins

IF 3.9; Zone 2 (Engineering & Technology); JCR Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Pedro Seber, Richard D. Braatz
Journal: Computers & Chemical Engineering, volume 189, Article 108818
Publication date: 2024-07-26
Publication type: Journal Article
DOI: 10.1016/j.compchemeng.2024.108818
URL: https://www.sciencedirect.com/science/article/pii/S0098135424002369
Citations: 0

Abstract

Recurrent neural network-based prediction of O-GlcNAcylation sites in mammalian proteins

O-GlcNAcylation has the potential to be an important target for therapeutics, but a motif or an algorithm to reliably predict O-GlcNAcylation sites is not available. Current predictive models are insufficient as they fail to generalize, and many are no longer available. This article constructs recurrent neural network models to predict O-GlcNAcylation sites based on protein sequences. Different datasets are evaluated separately and assessed in terms of strengths and issues. Within a given dataset, results are robust to changes in cross-validation and test data as determined by nested validation. The best model achieves an F1 score of 36% (more than 3.5-fold greater than the previous best model) and a Matthews Correlation Coefficient of 35% (more than 4.5-fold greater than the previous best model), and, for the F1 score, 7.6-fold higher than when not using any model. Shapley values are used to interpret the model’s predictions and provide biological insight into O-GlcNAcylation.
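
The two metrics quoted in the abstract, the F1 score and the Matthews Correlation Coefficient (MCC), can both be computed from confusion-matrix counts. The sketch below is only illustrative: the counts are hypothetical and not taken from the paper, and the "no model" baseline is assumed here to mean labeling every candidate site as positive, which may differ from the authors' definition.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the ratio is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews Correlation Coefficient; returns 0.0 when any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for an imbalanced site-prediction task (NOT from the paper).
tp, fp, tn, fn = 30, 50, 900, 20
model_f1 = f1_score(tp, fp, fn)
model_mcc = mcc(tp, fp, tn, fn)

# Assumed "no model" baseline: every candidate site is labeled positive,
# so all true positives are found and all negatives become false positives.
pos, neg = tp + fn, tn + fp
baseline_f1 = f1_score(pos, neg, 0)
baseline_mcc = mcc(pos, neg, 0, 0)

print(f"model:    F1={model_f1:.3f}, MCC={model_mcc:.3f}")
print(f"baseline: F1={baseline_f1:.3f}, MCC={baseline_mcc:.3f}")
print(f"F1 fold improvement over baseline: {model_f1 / baseline_f1:.1f}")
```

Under this assumed baseline, F1 reduces to 2p/(1+p), where p is the fraction of positive sites, so on a heavily imbalanced dataset even a modest absolute F1 can be a large fold improvement. MCC uses all four cells of the confusion matrix, which is why it is often reported alongside F1 for imbalanced problems.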

Source journal
Computers & Chemical Engineering (Engineering & Technology: Chemical Engineering)
CiteScore: 8.70
Self-citation rate: 14.00%
Articles published: 374
Review time: 70 days
About the journal: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.