SNAPR: A Bioinformatics Pipeline for Efficient and Accurate RNA-Seq Alignment and Analysis
Andrew T. Magis; Cory C. Funk; Nathan D. Price
IEEE Life Sciences Letters, vol. 1, no. 2, pp. 22-25. Published 2015-08-28. DOI: 10.1109/LLS.2015.2465870
https://ieeexplore.ieee.org/document/7229277/
Converting raw RNA sequencing (RNA-seq) data into interpretable results can be circuitous and time-consuming, requiring multiple steps. We present SNAPR, an RNA-seq mapping algorithm that streamlines this process. The algorithm uses a hash table approach to take advantage of the availability and power of high-memory machines. SNAPR can be run on a single library or on thousands of libraries. It accepts compressed or uncompressed FASTQ and BAM files and, in a single step, outputs a sorted BAM file, individual read counts, and gene fusions, and identifies exogenous RNA species. SNAPR also filters reads natively by Phred score and is well suited to future sequencing platforms that generate longer reads. We show that data from hundreds of TCGA samples can be analyzed in a matter of hours while gene fusions and viral events are identified at the same time. Because the reference genome and transcriptome undergo periodic updates, and because integrating multiple data sets requires uniform parameters, there is a great need for a streamlined RNA-seq analysis process. We demonstrate that SNAPR meets this need efficiently and accurately.
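The abstract does not describe how the hash table is used, so the following is only a minimal sketch of the general idea behind hash-based read mapping, not SNAPR's implementation. It indexes every fixed-length seed of a toy reference in a Python dictionary, looks up the seeds of a read to find candidate positions, and verifies each candidate by counting mismatches. The function names, the seed length of 4, the mismatch limit, and the toy sequences are all assumptions made for illustration; a real aligner would also handle gaps, spliced reads, and reverse-complement strands.

```python
from collections import defaultdict


def build_seed_index(reference, seed_len):
    """Map each seed_len-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def count_mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read, reference, index, seed_len, max_mismatches=2):
    """Return (start, mismatches) for the best ungapped placement, or None.

    Hypothetical helper: candidate starts come from hash lookups of every
    seed in the read; each candidate is then checked base by base.
    """
    candidates = set()
    for offset in range(len(read) - seed_len + 1):
        seed = read[offset:offset + seed_len]
        for hit in index.get(seed, ()):
            start = hit - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)

    best = None
    for start in sorted(candidates):
        mismatches = count_mismatches(read, reference[start:start + len(read)])
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


# Toy data (illustrative only, not from the paper).
reference = "GATTACAGGCTTAACCGTAGCATCGGA"
seed_len = 4
index = build_seed_index(reference, seed_len)

exact_read = "CAGGCTTAACCG"     # copied from reference[5:17]
noisy_read = "CAGGCTTTACCG"     # same read with one substituted base
unmapped_read = "TTTTTTTTTTTT"  # shares no seed with the reference

print(align_read(exact_read, reference, index, seed_len))
print(align_read(noisy_read, reference, index, seed_len))
print(align_read(unmapped_read, reference, index, seed_len))
```

The index is built once and every lookup afterwards takes roughly constant time, which is the trade-off the abstract alludes to: a hash table over a full genome costs a lot of memory, but on a high-memory machine it makes seed lookup very fast.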