PlantLncBoost: key features for plant lncRNA identification and significant improvement in accuracy and generalization

Impact factor 8.3; Zone 1 (Biology); JCR Q1 (Plant Sciences)
New Phytologist Pub Date : 2025-05-28 DOI:10.1111/nph.70211
Xue‐Chan Tian, Shuai Nie, Douglas Domingues, Alexandre Rossi Paschoal, Li‐Bo Jiang, Jian‐Feng Mao
Citations: 0

Abstract

Long noncoding RNAs (lncRNAs) are critical regulators of numerous biological processes in plants. Nevertheless, their identification is challenging because of low sequence conservation across species. Existing computational methods for lncRNA identification often generalize poorly across diverse plant species, highlighting the need for more robust and versatile identification models.

Here, we present PlantLncBoost, a novel computational tool designed to improve generalization in plant lncRNA identification. By integrating advanced gradient boosting algorithms with comprehensive feature selection, our approach achieves both high accuracy and generalizability. We conducted an extensive analysis of 1662 features and identified three key features – ORF coverage, complex Fourier average, and atomic Fourier amplitude – that effectively distinguish lncRNAs from mRNAs.

We assessed the performance of PlantLncBoost using comprehensive datasets from 20 plant species. The model exhibited exceptional performance, with an accuracy of 96.63%, a sensitivity of 98.42%, and a specificity of 94.93%, significantly outperforming existing tools. Further analysis revealed that the selected features effectively capture the differences between lncRNAs and mRNAs across a variety of plant species.

PlantLncBoost represents a significant advancement in plant lncRNA identification. It is freely accessible on GitHub (https://github.com/xuechantian/PlantLncBoost) and has been integrated into a comprehensive analysis pipeline, Plant-LncRNA-pipeline v.2 (https://github.com/xuechantian/Plant-LncRNA-pipeline-v2).
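To make two of the quantities named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that ORF coverage means the length of the longest forward-strand ORF (ATG to the first in-frame stop codon, stop included) divided by transcript length; the paper may define it differently. It also assumes that lncRNA is the positive class when computing sensitivity and specificity. The function names `longest_orf_length`, `orf_coverage`, and `classification_metrics` are hypothetical and do not come from the PlantLncBoost repository.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length in nt of the longest ATG..stop ORF on the forward strand.

    Assumption: an ORF starts at ATG and ends at the first in-frame stop
    codon (stop included). ORFs without a stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first start codon
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # close the ORF and look for the next one
    return best


def orf_coverage(seq: str) -> float:
    """Fraction of the transcript covered by its longest ORF (assumed definition)."""
    return longest_orf_length(seq) / len(seq) if seq else 0.0


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    Assumption: positives are lncRNAs and negatives are mRNAs.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }
```

Under these assumptions, a protein-coding transcript would tend to have an ORF coverage close to 1, while a lncRNA would tend to have a much lower value, which is consistent with the abstract's claim that this feature helps separate the two classes.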
Source journal
New Phytologist (Biology: Plant Sciences)
Self-citation rate: 5.30%
Articles published: 728
About the journal: New Phytologist is an international electronic journal published 24 times a year. It is owned by the New Phytologist Foundation, a non-profit-making charitable organization dedicated to promoting plant science. The journal publishes excellent, novel, rigorous, and timely research and scholarship in plant science and its applications. Articles fall into five sections: Physiology & Development, Environment, Interaction, Evolution, and Transformative Plant Biotechnology. These sections cover topics including intracellular processes and global environmental change, and the journal encourages cross-disciplinary approaches. It recognizes the use of techniques from molecular and cell biology, functional genomics, modeling, and system-based approaches in plant science. New Phytologist is abstracted and indexed in Academic Search, AgBiotech News & Information, Agroforestry Abstracts, Biochemistry & Biophysics Citation Index, Botanical Pesticides, CAB Abstracts®, Environment Index, Global Health, Plant Breeding Abstracts, and others.