AtSubP-2.0: An integrated web server for the annotation of Arabidopsis proteome subcellular localization using deep learning.

IF 3.9 · Zone 2 (Biology) · JCR Q1 GENETICS & HEREDITY
Plant Genome Pub Date : 2025-03-01 DOI:10.1002/tpg2.20536
Naveen Duhan, Rakesh Kaundal
Plant Genome 18(1): e20536. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11807733/pdf/

Abstract

The organization of subcellular components in a cell is critical to its function, and knowing where proteins localize supports the study of cellular processes, protein-protein interactions, potential drug targets, network analysis, and other systems biology mechanisms. Determining protein localization experimentally is time-consuming and expensive because it requires meticulous experimentation, validation, and data analysis; computational methods provide a quick and accurate alternative. Arabidopsis thaliana is a widely used model organism in plant biology: it facilitates experimentation, and findings from it often apply to other plants. Predicting the subcellular localization of its proteins can improve our understanding of cellular processes and has applications in crop improvement and biotechnology. We propose AtSubP-2.0, an extension of our previously developed and widely used AtSubP v1.0 tool for annotating the Arabidopsis proteome. AtSubP-2.0 employs a four-phase strategy for precise prediction of protein subcellular localization. The first phase differentiates between single and dual localization, with an accuracy of 97.66% in fivefold training/testing and 98.10% on independent data, and a Matthews correlation coefficient (MCC) of 0.88 in training and 0.90 on independent data. The second phase classifies singly localized proteins into 12 locations, with an accuracy of 98.37% in fivefold training/testing and 97.43% on independent data, and an MCC of 0.94 in training and 0.91 on independent data. The third phase categorizes dually localized proteins into nine classes, with an accuracy of 99.65% in fivefold training/testing and 98.16% on independent data, and an MCC of 0.92 in training and 0.87 on independent data. A fourth phase classifies the membrane-type proteins predicted in phase I into single-pass and multi-pass membrane proteins, with an accuracy of 98% in fivefold training/testing and 98.55% on independent data, and an MCC of 0.95 in training and 0.97 on independent data. A web-based prediction server has been implemented for community use and is freely available at https://kaabil.net/AtSubP2/, along with a standalone version. AtSubP-2.0 will help researchers better understand the organelle-specific functions, cellular processes, and regulatory mechanisms important for plant growth, development, and response to environmental stimuli.
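
The four-phase strategy described in the abstract can be read as a routing cascade. The sketch below is a minimal illustration of that routing only, not the authors' implementation: the classifier arguments, the `Prediction` type, the label strings, and the rule that phase four runs when the predicted location is a membrane-type label are all assumptions made for illustration. The trained models themselves are represented by plain placeholder functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set

# Hypothetical stand-in for a trained model: any function mapping a
# protein sequence to a class label.
Classifier = Callable[[str], str]


@dataclass
class Prediction:
    localization_type: str                # "single" or "dual" (phase 1)
    location: str                         # a single-location or dual-location class
    membrane_type: Optional[str] = None   # "single-pass"/"multi-pass", or None


def predict_localization(
    sequence: str,
    phase1: Classifier,
    phase2: Classifier,
    phase3: Classifier,
    phase4: Classifier,
    membrane_labels: Set[str],
) -> Prediction:
    """Route one sequence through the four phases (illustrative only)."""
    # Phase 1: single versus dual localization.
    loc_type = phase1(sequence)

    if loc_type == "single":
        # Phase 2: one of the single-location classes.
        location = phase2(sequence)
    elif loc_type == "dual":
        # Phase 3: one of the dual-location classes.
        location = phase3(sequence)
    else:
        raise ValueError(f"unexpected phase-1 label: {loc_type!r}")

    # Phase 4 (assumed trigger): only membrane-type predictions are
    # further split into single-pass and multi-pass.
    membrane_type = phase4(sequence) if location in membrane_labels else None
    return Prediction(loc_type, location, membrane_type)


if __name__ == "__main__":
    # Toy placeholder classifiers; real models would replace these.
    result = predict_localization(
        "MKVLAAGIVGLLLA",
        phase1=lambda seq: "single",
        phase2=lambda seq: "membrane",
        phase3=lambda seq: "dual-class-1",
        phase4=lambda seq: "multi-pass",
        membrane_labels={"membrane"},
    )
    print(result)
```

Keeping each phase as a separate callable means every stage can be trained and evaluated on its own, which matches the per-phase accuracy and MCC figures reported above.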

Source journal
Plant Genome (PLANT SCIENCES-GENETICS & HEREDITY)
CiteScore: 6.00
Self-citation rate: 4.80%
Articles published: 93
Review time: >12 weeks
About the journal: The Plant Genome publishes original research investigating all aspects of plant genomics. Technical breakthroughs reporting improvements in the efficiency and speed of acquiring and interpreting plant genomics data are welcome. The editorial board gives preference to novel reports that use innovative genomic applications to advance our understanding of plant biology and that may have applications to crop improvement. The journal also publishes invited review articles and perspectives that offer insight and commentary on recent advances in genomics and their potential for agronomic improvement.