LncRAnalyzer: a robust workflow for long non-coding RNA discovery using RNA-Seq

Impact factor 5.7 · Tier 1 (Biology) · JCR Q1 (Plant Sciences)
Shinde Nikhil, Habeeb Shaik Mohideen, Raja Natesan Sella
Journal: The Plant Journal, volume 124 (1)
Published: 2025-10-16
DOI: 10.1111/tpj.70509 (https://onlinelibrary.wiley.com/doi/10.1111/tpj.70509)
Citations: 0

Abstract

Long non-coding RNAs (lncRNAs) are a major transcript category that lacks protein-coding capability and shows relatively low abundance and complex expression patterns. Distinguishing lncRNAs from protein-coding genes is a complex process involving multiple filtering steps. We developed an automated pipeline, LncRAnalyzer, featuring retrained models for 60 species. The workflow uses eight distinct approaches to reduce the likelihood of retaining protein-coding or partial protein-coding transcripts during lncRNA identification. We performed a 10-fold cross-validation comparing the retrained sorghum models and training sets against their standard counterparts and other approaches, using real-life RNA-Seq datasets and known lncRNA and CDS sequences of sorghum. The results showed that the sorghum models and training sets performed better. The pipeline output comprises UpSet plots illustrating the number of lncRNAs/NPCTs identified by each approach, the commonly identified lncRNAs and their classes, NPCTs, and expression count tables. A feature-level comparison and benchmarking analysis of LncRAnalyzer against four existing pipelines, namely LncPipe, LncEvo, lncRNA-Annotation, and Plant-LncPipe, demonstrated that LncRAnalyzer is more comprehensive, easier to implement, and more accurate in lncRNA prediction. The workflow also ascertains the origins of lncRNAs from various transposable elements (TEs) in plants using TE annotations from APTEdb [http://apte.cp.utfpr.edu.br/]. LncRAnalyzer is publicly available on GitLab [https://gitlab.com/nikhilshinde0909/LncRAnalyzer.git] for academic users.
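
The abstract mentions lncRNAs "commonly identified" by several approaches and UpSet plots counting what each approach finds. As a rough illustration of that idea only, the Python sketch below intersects per-approach call sets and tallies transcripts by the exact combination of approaches that flagged them. It is not LncRAnalyzer's code: the function names, the tool names (`tool_a` and so on), and the transcript IDs are all hypothetical, and the real pipeline may combine its eight approaches differently.

```python
from collections import Counter


def consensus(calls):
    """Return the transcript IDs called non-coding by every approach."""
    sets = list(calls.values())
    return set.intersection(*sets) if sets else set()


def upset_counts(calls):
    """Count transcripts per exact combination of approaches.

    These exclusive-intersection sizes are what the bars of an
    UpSet plot display.
    """
    membership = {}
    for tool, ids in calls.items():
        for transcript_id in ids:
            membership.setdefault(transcript_id, set()).add(tool)
    return Counter(frozenset(tools) for tools in membership.values())


# Hypothetical approaches and transcript IDs, for illustration only.
calls = {
    "tool_a": {"tx1", "tx2", "tx3"},
    "tool_b": {"tx1", "tx2", "tx4"},
    "tool_c": {"tx1", "tx3", "tx4"},
}

common = consensus(calls)
counts = upset_counts(calls)

for tools, n in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(tools), n)
```

Requiring agreement across all approaches is the strictest possible consensus; a looser rule, such as agreement among a majority, could be read off the same `membership` map.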

Source journal: The Plant Journal (Biology, Plant Sciences)
CiteScore: 13.10
Self-citation rate: 4.20%
Articles published: 415
Review time: 2.3 months
Journal description: Publishing the best original research papers in all key areas of modern plant biology from the world's leading laboratories, The Plant Journal provides a dynamic forum for this ever-growing international research community. Plant science research is now at the forefront of research in the biological sciences, with breakthroughs in our understanding of fundamental processes in plants matching those in other organisms. The impact of molecular genetics and the availability of model and crop species can be seen in all aspects of plant biology. For publication in The Plant Journal, the research must provide a highly significant new contribution to our understanding of plants and be of general interest to the plant science community.