AtSubP-2.0: An integrated web server for the annotation of Arabidopsis proteome subcellular localization using deep learning
Authors: Naveen Duhan, Rakesh Kaundal
Journal: Plant Genome, 18(1), e20536 (published 2025-03-01)
DOI: https://doi.org/10.1002/tpg2.20536
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11807733/pdf/
Impact factor: 3.9 (JCR Q1, Genetics & Heredity)
Citations: 0
Abstract
The organization of subcellular components is critical to cell function and underpins the study of cellular processes, protein-protein interactions, potential drug targets, network analysis, and other systems biology mechanisms. Determining protein localization experimentally is time-consuming and expensive because it requires meticulous experimentation, validation, and data analysis; computational methods provide a quick and accurate alternative. Arabidopsis thaliana is a valuable model organism in plant biology: it facilitates experimentation, and findings from it carry over to other plants. Predicting the subcellular localization of its proteins can improve our understanding of cellular processes and has applications in crop improvement and biotechnology. We propose AtSubP-2.0, an extension of our previously developed and widely used AtSubP v1.0 tool for annotating the Arabidopsis proteome. AtSubP-2.0 employs a four-phase strategy for precise prediction of protein subcellular localization. The first phase differentiates between single and dual localization with high accuracy (97.66% in fivefold training/testing, 98.10% on independent data) and a high Matthews correlation coefficient (0.88 training, 0.90 independent). The second phase classifies singly localized proteins into 12 locations with high accuracy (98.37% in fivefold training/testing, 97.43% on independent data) and a high Matthews correlation coefficient (0.94 training, 0.91 independent). The third phase categorizes dually localized proteins into nine classes with high accuracy (99.65% in fivefold training/testing, 98.16% on independent data) and a high Matthews correlation coefficient (0.92 training, 0.87 independent). A fourth phase classifies the membrane-type proteins predicted in phase I as single-pass or multi-pass membrane proteins with high accuracy (98% in fivefold training/testing, 98.55% on independent data) and a high Matthews correlation coefficient (0.95 training, 0.97 independent). A web-based prediction server has been implemented for community use and is freely available, together with a standalone version, at https://kaabil.net/AtSubP2/. AtSubP2 will help researchers better understand organelle-specific functions, cellular processes, and regulatory mechanisms important for plant growth, development, and response to environmental stimuli.
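The abstract reports two metrics for each phase: accuracy and the Matthews correlation coefficient (MCC). As a minimal sketch of how the two relate for a binary decision such as single versus dual localization, the snippet below computes both from a confusion matrix. The labels and predictions are made-up toy data, and the helper names (`confusion_counts`, `accuracy`, `mcc`) are hypothetical; none of this is taken from the AtSubP-2.0 code or datasets.

```python
import math


def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, TN, FN for one class treated as 'positive'."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, fp, tn, fn):
    """Binary Matthews correlation coefficient; 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy, imbalanced example (assumed data, not from the paper):
# 8 singly localized proteins and 2 dually localized ones.
y_true = ["single"] * 8 + ["dual"] * 2
y_pred = ["single"] * 7 + ["dual"] + ["dual", "single"]

counts = confusion_counts(y_true, y_pred, positive="dual")
print("TP, FP, TN, FN:", counts)
print("accuracy:", accuracy(*counts))
print("MCC:", mcc(*counts))
```

Because MCC uses all four cells of the confusion matrix, it can be much lower than accuracy when one class dominates, which is one reason to report both metrics side by side.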
About the journal:
The Plant Genome publishes original research investigating all aspects of plant genomics. Technical breakthroughs reporting improvements in the efficiency and speed of acquiring and interpreting plant genomics data are welcome. The editorial board gives preference to novel reports that use innovative genomic applications to advance our understanding of plant biology and that may have applications to crop improvement. The journal also publishes invited review articles and perspectives that offer insight and commentary on recent advances in genomics and their potential for agronomic improvement.