{"title":"Improving the Flexibility of RNA-Seq Data Analysis Pipelines.","authors":"John H Phan, Po-Yen Wu, May D Wang","doi":"10.1109/GENSIPS.2012.6507729","DOIUrl":null,"url":null,"abstract":"<p><p>Accurate quantification of gene or isoform expression with RNA-Seq depends on complete knowledge of the transcriptome. Because a complete genomic annotation does not yet exist, novel isoform discovery is an important component of the RNA-Seq quantification process. Thus, a typical RNA-Seq pipeline includes a transcriptome mapping step to quantify known genes and isoforms, and a reference genome mapping step to discover new genes and isoforms. Several tools implement this approach, but are limited in that they force the use of a single mapping algorithm at both the transcriptome and reference genome mapping stages. The choice of mapping algorithm could affect quantification accuracy on a per-dataset basis. Thus, we describe a method that enables the merging of transcriptome and reference genome mapping stages provided that they conform to the standard SAM/BAM format. This procedure could potentially improve the accuracy of gene or isoform quantification by increasing flexibility when selecting RNA-Seq data analysis pipelines. We demonstrate an example of a flexible RNA-Seq pipeline by assessing its potential for novel isoform discovery and by validating its quantification performance using qRT-PCR.</p>","PeriodicalId":73289,"journal":{"name":"IEEE International Workshop on Genomic Signal Processing and Statistics : [proceedings]. IEEE International Workshop on Genomic Signal Processing and Statistics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/GENSIPS.2012.6507729","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Workshop on Genomic Signal Processing and Statistics : [proceedings]. IEEE International Workshop on Genomic Signal Processing and Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GENSIPS.2012.6507729","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Accurate quantification of gene or isoform expression with RNA-Seq depends on complete knowledge of the transcriptome. Because a complete genomic annotation does not yet exist, novel isoform discovery is an important component of the RNA-Seq quantification process. Thus, a typical RNA-Seq pipeline includes a transcriptome mapping step to quantify known genes and isoforms, and a reference genome mapping step to discover new genes and isoforms. Several tools implement this approach, but are limited in that they force the use of a single mapping algorithm at both the transcriptome and reference genome mapping stages. The choice of mapping algorithm could affect quantification accuracy on a per-dataset basis. Thus, we describe a method that enables the merging of transcriptome and reference genome mapping stages provided that they conform to the standard SAM/BAM format. This procedure could potentially improve the accuracy of gene or isoform quantification by increasing flexibility when selecting RNA-Seq data analysis pipelines. We demonstrate an example of a flexible RNA-Seq pipeline by assessing its potential for novel isoform discovery and by validating its quantification performance using qRT-PCR.