Full-length PacBio Amplicon Sequencing to Unveil RNA Editing Sites

IF 2.9 · Zone 3 (Biology) · Q3 · BIOCHEMICAL RESEARCH METHODS

Current Bioinformatics · Pub Date: 2023-08-03 · DOI: 10.2174/1574893618666230803112142

Xiao-lu Zhu, Ming-ling Liao, Ya-Jie Zhu, Yun‐wei Dong


Citations: 0

Abstract

RNA editing diversifies transcripts through post-transcriptional sequence changes. Currently, RNA editing sites are detected mostly by Sanger sequencing or second-generation sequencing. However, Sanger-based detection is limited by interfering background peaks when direct sequencing is used and by the number of clones when clone sequencing is used, while second-generation sequencing is constrained by its short read length. We aimed to design a pipeline that accurately detects RNA editing sites in full-length long-read amplicons, for studies that focus on a few specific genes of interest. We developed a novel high-throughput RNA editing site detection pipeline based on PacBio circular consensus sequencing (CCS), which combines high accuracy, high throughput, and long-read coverage. We tested the pipeline on cytosolic malate dehydrogenase in the hard-shelled mussel Mytilus coruscus and validated it with direct Sanger sequencing. PacBio CCS amplicon data from three mussels were first filtered by quality and then selected by open reading frame. After filtering, between 225 and 2047 sequences per mussel were used to identify RNA editing sites. By comparison with the corresponding genomic DNA sequences, we extracted 227-799 candidate RNA editing sites, excluding heterozygous sites. We then identified 7-11 final RNA editing sites (RESs) using a new error model designed specifically for RNA editing site detection. All resulting RNA editing sites agreed with the Sanger sequencing validation. We report a method with a near-zero error rate for identifying RNA editing sites in long-read amplicons using PacBio CCS sequencing.
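
The filtering and candidate-extraction steps described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`has_full_orf`, `candidate_sites`, `binomial_tail`), the Phred 30 mean-quality threshold, and the simplification that each read is already aligned to the genomic sequence with no indels. The abstract does not describe the authors' error model, so the binomial tail probability below is only a generic stand-in showing how a mismatch count could be compared against an assumed per-base error rate.

```python
from collections import Counter
from math import exp, lgamma, log, log1p

STOPS = {"TAA", "TAG", "TGA"}

def has_full_orf(seq):
    """True if seq is one complete ORF: ATG start, terminal stop, no internal stop."""
    if len(seq) % 3 or not seq.startswith("ATG") or seq[-3:] not in STOPS:
        return False
    return all(seq[i:i + 3] not in STOPS for i in range(0, len(seq) - 3, 3))

def candidate_sites(reads, genomic, het_positions=frozenset(), min_q=30.0):
    """Return {position: (genomic base, {alt base: count}, reads kept)}.

    reads: list of (sequence, per-base Phred qualities). Reads are assumed
    to be pre-aligned to `genomic` without indels (hypothetical simplification).
    """
    kept = [
        seq for seq, quals in reads
        if len(seq) == len(genomic)
        and sum(quals) / len(quals) >= min_q   # quality filter (assumed threshold)
        and has_full_orf(seq)                  # open-reading-frame selection
    ]
    sites = {}
    for pos, ref in enumerate(genomic):
        if pos in het_positions:               # skip known heterozygous sites
            continue
        counts = Counter(seq[pos] for seq in kept)
        alt = {base: n for base, n in counts.items() if base != ref}
        if alt:
            sites[pos] = (ref, alt, len(kept))
    return sites

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), 0 < p < 1, computed in log space."""
    log_p, log_q = log(p), log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += exp(log_term)
    return total

# Toy example: five reads against a 9-base genomic sequence.
genomic = "ATGAAATAA"
reads = [
    ("ATGAAATAA", [40] * 9),
    ("ATGGAATAA", [40] * 9),
    ("ATGGAATAA", [40] * 9),
    ("ATGAAATAA", [10] * 9),   # dropped: low mean quality
    ("ATGTAATAA", [40] * 9),   # dropped: internal stop codon
]
for pos, (ref, alt, depth) in candidate_sites(reads, genomic).items():
    for base, n in alt.items():
        # Probability of seeing >= n mismatches by chance at an assumed 0.1% error rate.
        print(pos, ref, base, n, depth, binomial_tail(n, depth, 0.001))
```

Working in log space with `lgamma` keeps the tail computation stable at read depths in the thousands, where binomial coefficients are too large to convert to floating point.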


Source journal

Current Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 6.60

Self-citation rate: 2.50%

Articles published: (not listed)

Review time: >12 weeks

Journal description: Current Bioinformatics aims to publish the latest and most outstanding developments in bioinformatics. Each issue contains timely in-depth reviews and mini-reviews, research papers, and guest-edited thematic issues written by leaders in the field, covering a wide range of topics in the integration of biology with computer and information science. The journal focuses on advances in computational molecular and structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.