PredPSP: a novel computational tool to discover pathway-specific photosynthetic proteins in plants.

IF 3.9 | Zone 2 (Biology) | JCR Q2 (BIOCHEMISTRY & MOLECULAR BIOLOGY)
Prabina Kumar Meher, Upendra Kumar Pradhan, Padma Lochan Sethi, Sanchita Naha, Ajit Gupta, Rajender Parsad
Plant Molecular Biology, 114(5): 106. Journal Article, published 2024-09-24. DOI: 10.1007/s11103-024-01500-6 (https://doi.org/10.1007/s11103-024-01500-6)
Citations: 0

Abstract

Photosynthetic proteins play a crucial role in agricultural productivity by harnessing light energy for plant growth. Understanding these proteins, especially within the C3 and C4 pathways, holds promise for improving crops in challenging environments. Although computational models exist, a comprehensive computational framework specifically targeting plant photosynthetic proteins is lacking. The underutilization of plant datasets in computational algorithms accentuates this gap, which the present study aims to fill by introducing a novel sequence-based computational method for identifying these proteins. The study encompassed diverse plant species, ensuring comprehensive representation across the C3 and C4 pathways. Using six deep learning models and seven shallow learning algorithms, paired with six sequence-derived feature sets and followed by a feature selection strategy, this study developed a comprehensive model for predicting plant-specific photosynthetic proteins. In 5-fold cross-validation, LightGBM with 65 and 90 LGBM-VIM-selected features emerged as the best model for C3 (auROC: 91.78%, auPRC: 92.55%) and C4 (auROC: 99.05%, auPRC: 99.18%) plants, respectively. Validation on an independent dataset confirmed the robustness of the proposed model for both the C3 (auROC: 87.23%, auPRC: 88.40%) and C4 (auROC: 92.83%, auPRC: 92.29%) categories. Comparison with existing methods demonstrated the superiority of the proposed model in predicting plant-specific photosynthetic proteins. This study further established a free online prediction server, PredPSP (https://iasri-sg.icar.gov.in/predpsp/), to facilitate ongoing efforts to identify photosynthetic proteins in C3 and C4 plants. As the first study of its kind, this work offers valuable insights into predicting plant-specific photosynthetic proteins, with significant implications for plant biology.
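The abstract reports performance as auROC and auPRC under 5-fold cross-validation on sequence-derived features. The minimal sketch below illustrates those evaluation pieces in plain Python, under stated assumptions: amino acid composition stands in as one example of a sequence-derived feature (the abstract does not name its six feature sets), folds are stratified by class, and auPRC is estimated as average precision, one common estimator that may differ from the authors' computation. The LightGBM classifier and LGBM-VIM feature selection are not reproduced, and all function names are hypothetical.

```python
"""Illustrative sketch: sequence features, stratified k-fold splits, auROC and auPRC.

Assumptions (not taken from PredPSP): amino acid composition is only one example
of a sequence-derived feature, and auPRC is estimated as average precision.
All function names here are hypothetical.
"""
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq: str) -> list[float]:
    """Return the fraction of each standard residue; other letters are ignored."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def stratified_kfold(labels: list[int], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds, dealing each class out round-robin
    so every fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def auroc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.
    Tied scores receive their average rank, so a tie counts as half a win."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for t in range(start, end + 1):
            ranks[order[t]] = (start + end) / 2 + 1  # 1-based average rank
        start = end + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("auROC needs at least one positive and one negative")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels: list[int], scores: list[float]) -> float:
    """auPRC estimated as average precision: the mean of the precision values
    at the rank of each positive. Tied scores keep their input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("auPRC needs at least one positive")
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


if __name__ == "__main__":
    y = [1, 0, 1, 0]
    s = [0.9, 0.8, 0.7, 0.1]  # toy scores from any classifier
    print(auroc(y, s))                        # 0.75
    print(round(average_precision(y, s), 4))  # 0.8333
```

In a full 5-fold run, a model (LightGBM in the study) would be trained on four folds and these metrics computed on the held-out fold, then averaged; that training step is deliberately omitted here so the sketch stays dependency-free.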

Source journal
Plant Molecular Biology (Biology: Biochemistry & Molecular Biology)
Self-citation rate: 2.00%
Articles published: 95
Review time: 1.4 months
Journal description: Plant Molecular Biology is an international journal dedicated to rapid publication of original research articles in all areas of plant biology. The Editorial Board welcomes full-length manuscripts that address important biological problems of broad interest, including research in comparative genomics, functional genomics, proteomics, bioinformatics, computational biology, biochemical and regulatory networks, and biotechnology. Because space in the journal is limited, however, preference is given to publication of results that provide significant new insights into biological problems and that advance the understanding of structure, function, mechanisms, or regulation. Authors must ensure that results are of high quality and that manuscripts are written for a broad plant science audience.