Prediction of Plant Resistance Proteins Using Alignment-Based and Alignment-Free Approaches.

Impact factor 3.4 · Zone 4 (Biology) · JCR Q2 (Biochemical Research Methods)
Proteomics · Published: 2024-11-24 · DOI: 10.1002/pmic.202400261
Pushpendra Singh Gahlot, Shubham Choudhury, Nisha Bajiya, Nishant Kumar, Gajendra P S Raghava
Citations: 0

Abstract

Plant disease resistance (PDR) proteins are critical for recognizing plant pathogens. Predicting PDR proteins is therefore essential for understanding plant-pathogen interactions and for developing crop-protection strategies. This study proposes a hybrid model for predicting and designing PDR proteins against plant-invading pathogens. We first tried alignment-based approaches: the Basic Local Alignment Search Tool (BLAST) for similarity search and MERCI for motif search. These alignment-based approaches showed very low coverage (sensitivity). To overcome this limitation, we developed alignment-free, machine learning (ML)-based methods using compositional features of proteins. Our best ML model built on compositional features achieved a maximum area under the receiver operating characteristic curve (AUROC) of 0.91. Performance improved from an AUROC of 0.91 to 0.95 when we used evolutionary information instead of the raw protein sequence. Finally, we developed a hybrid (ensemble) model that combines our best ML model with BLAST; it achieved the highest AUROC, 0.98, on the validation dataset. All models were trained and tested on a training dataset and then evaluated on a separate validation dataset, in which no protein shares more than 40% similarity with any protein in the training dataset. To support the plant biology research community, we developed an online platform for predicting and designing plant resistance proteins, "PlantDRPpred" (https://webs.iiitd.edu.in/raghava/plantdrppred).
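The abstract does not say which compositional features were used. As a minimal sketch, assuming the two most common ones, amino acid composition (AAC, 20 values) and dipeptide composition (DPC, 400 values), each protein sequence can be turned into a fixed-length numeric vector that an ML classifier can consume. The function names `aac`, `dpc`, and `clean` are hypothetical and are not part of PlantDRPpred.

```python
# Hypothetical sketch of two common compositional features; the exact feature
# set used by PlantDRPpred is not stated in the abstract (assumption).
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def clean(sequence: str) -> str:
    """Upper-case the sequence and drop non-standard letters (e.g. X, B, Z)."""
    return "".join(res for res in sequence.upper() if res in AMINO_ACIDS)


def aac(sequence: str) -> list[float]:
    """Amino acid composition: percentage of each standard residue (20 values)."""
    seq = clean(sequence)
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(seq)
    return [100.0 * counts[aa] / len(seq) for aa in AMINO_ACIDS]


def dpc(sequence: str) -> list[float]:
    """Dipeptide composition: percentage of each adjacent residue pair (400 values)."""
    # Simplification: pairs are formed after non-standard residues are dropped.
    seq = clean(sequence)
    pairs = [seq[i : i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(pairs)
    return [100.0 * counts[dp] / len(pairs) for dp in DIPEPTIDES]


if __name__ == "__main__":
    features = aac("MKTAYIAKQR") + dpc("MKTAYIAKQR")
    print(len(features))  # 420
```

The resulting 420-dimensional vectors would then be paired with class labels (resistance vs. non-resistance) and passed to a classifier; which classifier the authors chose is not stated in the abstract.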

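The abstract states that the hybrid model combines the best ML model with BLAST, but it does not give the combination rule. One simple scheme, used here purely as an assumption, adds a fixed weight (hypothetically 0.5) to the ML score when the best BLAST hit against the training set is a known resistance protein, subtracts it when the hit is a non-resistance protein, and leaves the score unchanged when BLAST finds no hit. AUROC is computed with the Mann-Whitney formulation: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counting one half. The names `hybrid_score` and `auroc` and all example values are illustrative, not data or code from the study.

```python
# Hypothetical sketch: the combination rule and the 0.5 weight are assumptions,
# not the published PlantDRPpred method.
from typing import Optional, Sequence


def hybrid_score(ml_score: float, blast_label: Optional[int], weight: float = 0.5) -> float:
    """Adjust an ML probability with a BLAST vote.

    blast_label: 1 if the best BLAST hit (under some e-value cutoff) is a
    resistance protein, 0 if it is a non-resistance protein, None if no hit.
    """
    if blast_label == 1:
        return ml_score + weight
    if blast_label == 0:
        return ml_score - weight
    return ml_score


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values for illustration only.
    labels = [1, 1, 0, 0]
    ml_scores = [0.6, 0.4, 0.5, 0.2]
    blast_labels = [None, 1, None, 0]
    hybrid = [hybrid_score(s, b) for s, b in zip(ml_scores, blast_labels)]
    print(auroc(labels, ml_scores))  # 0.75
    print(auroc(labels, hybrid))     # 1.0
```

Because AUROC depends only on how the scores rank, the hybrid score does not need to stay within [0, 1]; the BLAST term simply pushes sequences with a confident homolog toward the correct end of the ranking, while sequences without a hit still rely on the ML score.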
Source journal: Proteomics
Category: Biology / Biochemical Research Methods
CiteScore: 6.30
Self-citation rate: 5.90%
Articles published: 193
Review time: 3 months
About the journal: PROTEOMICS is the premier international source for information on all aspects of applications and technologies, including software, in proteomics and other "omics". The journal's scope includes, but is not limited to, proteomics, genomics, transcriptomics, metabolomics, lipidomics, and systems biology approaches. Papers describing novel applications of proteomics and the integration of multi-omics data and approaches are especially welcome.