LncRAnalyzer: a robust workflow for long non-coding RNA discovery using RNA-Seq

IF 5.7 | Zone 1 (Biology) | JCR Q1, Plant Sciences

The Plant Journal | Pub Date: 2025-10-16 | DOI: 10.1111/tpj.70509

Shinde Nikhil, Habeeb Shaik Mohideen, Raja Natesan Sella


Citations: 0

Abstract

Long non-coding RNA (lncRNA) is a major transcript category that lacks protein-coding capability, with relatively low abundance and complex expression patterns. Distinguishing lncRNAs from protein-coding genes is a complex process involving multiple filtering steps. We developed an automated pipeline named LncRAnalyzer featuring retrained models for 60 species. This workflow aims to reduce the likelihood of obtaining protein-coding or partial protein-coding transcripts during lncRNA identification by utilizing eight distinct approaches. We conducted a 10-fold cross-validation comparing the sorghum models and training sets with their standard counterparts and with other approaches, using real-life RNA-Seq datasets and known lncRNA and CDS sequences of sorghum. The results showed that the sorghum models and training sets outperformed the standard ones. The pipeline output comprises UpSet plots illustrating the number of lncRNAs/NPCTs identified by each approach, the commonly identified lncRNAs and their classes, NPCTs, and expression count tables. A feature-level comparison and benchmarking analysis of LncRAnalyzer against four existing pipelines, namely LncPipe, LncEvo, lncRNA-Annotation, and Plant-LncPipe, demonstrated that LncRAnalyzer is more comprehensive, easier to implement, and more accurate in lncRNA prediction. The workflow also ascertains lncRNA origins from various transposable elements (TEs) in plants using TE annotations from APTEdb [http://apte.cp.utfpr.edu.br/]. LncRAnalyzer is publicly available on GitLab [https://gitlab.com/nikhilshinde0909/LncRAnalyzer.git] for academic users.
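The abstract mentions lncRNAs that are "commonly identified" across several approaches and UpSet plots that count calls per approach. The sketch below illustrates that general idea only: it is not LncRAnalyzer's code, and the tool names (`tool_a`, `tool_b`, `tool_c`), the transcript IDs, and the rule that a consensus call requires agreement from every tool are all assumptions made for illustration.

```python
from collections import Counter


def consensus(calls: dict[str, set[str]]) -> set[str]:
    """Return transcript IDs that every tool called non-coding."""
    sets = list(calls.values())
    return set.intersection(*sets) if sets else set()


def upset_counts(calls: dict[str, set[str]]) -> Counter:
    """Count transcripts per exact combination of tools.

    These exclusive intersections are the bar heights an UpSet plot shows.
    """
    membership: dict[str, set[str]] = {}
    for tool, ids in calls.items():
        for transcript_id in ids:
            membership.setdefault(transcript_id, set()).add(tool)
    return Counter(frozenset(tools) for tools in membership.values())


# Hypothetical per-tool lncRNA calls (placeholder names, not real output).
calls = {
    "tool_a": {"tx1", "tx2", "tx3"},
    "tool_b": {"tx2", "tx3", "tx4"},
    "tool_c": {"tx3", "tx4", "tx5"},
}

common = consensus(calls)
counts = upset_counts(calls)
```

Requiring agreement from all tools is the strictest possible rule; a majority vote would keep more candidates at the cost of admitting more coding transcripts.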


Source journal

The Plant Journal (Biology: Plant Sciences)

CiteScore: 13.10

Self-citation rate: 4.20%

Articles published: 415

Review time: 2.3 months

About the journal: Publishing the best original research papers in all key areas of modern plant biology from the world's leading laboratories, The Plant Journal provides a dynamic forum for this ever-growing international research community. Plant science research is now at the forefront of research in the biological sciences, with breakthroughs in our understanding of fundamental processes in plants matching those in other organisms. The impact of molecular genetics and the availability of model and crop species can be seen in all aspects of plant biology. For publication in The Plant Journal, the research must provide a highly significant new contribution to our understanding of plants and be of general interest to the plant science community.