Accurate assembly of circular RNAs with TERRACE

IF 6.2 | Zone 2 (Biology) | Q1 (Biochemistry & Molecular Biology)

Genome Research | Pub Date: 2024-07-26 | DOI: 10.1101/gr.279106.124

Tasfia Zahin, Qian Shi, Xiaofei Carl Zang, Mingfu Shao

Citations: 0

Abstract

Circular RNA (circRNA) is a class of RNA molecules that form a closed loop with their 5' and 3' ends covalently bonded. circRNAs are known to be more stable than linear RNAs, have distinct properties and functions, and have proven to be promising biomarkers. Existing methods for assembling circRNAs rely heavily on annotated transcriptomes and therefore exhibit unsatisfactory accuracy when no high-quality transcriptome is available. We present TERRACE, a new algorithm for full-length assembly of circRNAs from paired-end total RNA-seq data. TERRACE uses the splice graph as the underlying data structure that organizes the splicing and coverage information. We transform the problem of assembling circRNAs into finding paths that "bridge" the three fragments in the splice graph induced by back-spliced reads. We adopt a definition for optimal bridging paths and a dynamic programming algorithm to calculate such optimal paths. TERRACE features an efficient algorithm to detect back-spliced reads missed by RNA-seq aligners, contributing to its much improved sensitivity. It also incorporates a new machine-learning approach trained to assign a confidence score to each assembled circRNA, which is shown to be superior to using abundance for scoring. On both simulations and biological datasets, TERRACE consistently outperforms existing methods by a large margin in sensitivity while maintaining better or comparable precision. In particular, when annotations are not provided, TERRACE assembles 123%-413% more correct circRNAs than state-of-the-art methods. TERRACE represents a major leap in assembling full-length circRNAs from RNA-seq data, and we expect it to be widely used in downstream research on circRNAs.
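
The abstract states that TERRACE bridges fragments in the splice graph with a dynamic programming algorithm, but it does not give the definition of an "optimal" bridging path. The sketch below is therefore only an illustration of the general idea, not TERRACE's implementation. It assumes a bottleneck objective (prefer the path whose weakest edge has the highest coverage), assumes that vertices are integers numbered in topological order, and uses a hypothetical function name, `widest_bridging_path`. Under these assumptions, each gap between two consecutive fragments would be one source-to-target query.

```python
from typing import Dict, List, Optional, Tuple


def widest_bridging_path(
    edges: Dict[Tuple[int, int], float], source: int, target: int
) -> Optional[Tuple[float, List[int]]]:
    """Return (bottleneck, path) for the source->target path whose weakest
    edge is as heavy as possible, or None if target is unreachable.

    Assumption: vertices are integers in topological order, so every
    edge (u, v) satisfies u < v. Edge weights stand in for read coverage.
    """
    best: Dict[int, float] = {source: float("inf")}  # best bottleneck so far
    prev: Dict[int, int] = {}                         # back-pointers
    # Visiting edges by increasing tail guarantees best[u] is final
    # before any edge leaving u is relaxed.
    for (u, v), weight in sorted(edges.items()):
        if u not in best:
            continue  # u is not reachable from source
        candidate = min(best[u], weight)
        if candidate > best.get(v, float("-inf")):
            best[v] = candidate
            prev[v] = u
    if target not in best:
        return None
    # Walk the back-pointers from target to source, then reverse.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]


# Toy splice graph: vertices 0..3, weights are made-up coverage values.
toy_edges = {(0, 1): 10, (0, 2): 5, (1, 2): 8, (1, 3): 2, (2, 3): 4}
print(widest_bridging_path(toy_edges, 0, 3))  # (4, [0, 1, 2, 3])
```

In the toy graph, the direct route 0 -> 1 -> 3 has a weakest edge of 2, while 0 -> 1 -> 2 -> 3 has a weakest edge of 4, so the longer route is preferred under the assumed objective. A different definition of optimality would change only the `candidate` computation.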

Source journal

Genome Research (Biology: Biochemistry & Molecular Biology)

CiteScore: 12.40
Self-citation rate: 1.40%
Articles published: 140
Review time: 6 months

Journal description: Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine. Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies. New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal's web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.