TIGERA: A New Tool for Illumina Gene Expression Reads Analysis
Xiaodong Bai, P. Grewal
2009 Ohio Collaborative Conference on Bioinformatics, 2009-06-15
DOI: 10.1109/OCCBIO.2009.14
Abstract
Next-generation sequencing platforms, including Illumina, 454, and SOLiD, are emerging as easier, faster, and cheaper alternatives to traditional sequencing platforms. Illumina digital gene expression (DGE) tag profiling allows comprehensive analysis of differentially expressed genes. Dedicated software is needed to handle the large volume of data generated by the Illumina Genome Analyzer. Here we report the design and implementation of a program for the analysis of differential gene expression based on Illumina data. The program, TIGERA (Tool for Illumina Gene Expression Reads Analysis), was written in Perl using newly implemented and preexisting algorithms and has a simple graphical user interface. Once the required inputs are provided, the program performs the following tasks automatically. The expression levels of high-quality Illumina tags are determined for each of the two groups of libraries and normalized to transcripts per million (TPM). The tags are mapped to the annotated reference sequences to identify uniquely mapped tags. The mapping results are validated using information generated by digital restriction enzyme digestion of the reference sequences. Depending on whether a tag matches a single reference sequence or multiple reference sequences after validation, the tags are grouped into three categories: one tag-one reference, one tag-one gene, and one tag-multiple genes. Tags in the first two categories are analyzed further to identify reference sequences that have unique expression levels or potential alternative splicing products. A Poisson mixture model is applied to analyze the differential expression of reference sequences with unique expression levels and of tags that do not match any reference sequence. The progress of the analysis is monitored and reported.
The analysis results are written to text files and also deposited in a MySQL database that can be browsed and searched in a web browser. To demonstrate the performance of the program, two biological replicates of DGE tag libraries from infective juveniles of the entomopathogenic nematode Heterorhabditis bacteriophora strains TT01 and GPS11 were sequenced on the Illumina platform.
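The abstract does not include TIGERA's Perl source, so the following is only a minimal Python sketch of two of the described steps: TPM normalization of tag counts and the three-way grouping of tags by their validated matches. It assumes that each tag's validated matches are already available as a set of reference IDs, that a hypothetical `ref_to_gene` lookup maps each reference to its gene, and that "one tag-one gene" means a tag matching several references that all belong to the same gene (for example, alternative transcripts). The function names, the toy data, and the extra `unmatched` group are illustrative assumptions, not TIGERA's own.

```python
from collections import Counter


def to_tpm(counts: Counter) -> dict:
    """Normalize raw tag counts to transcripts per million (TPM)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    # Each tag's share of the library, scaled so the library sums to 1,000,000.
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


def categorize(tag_hits: dict, ref_to_gene: dict) -> dict:
    """Group tags by how many validated references and genes they match.

    tag_hits:    tag -> set of reference IDs matched after validation (assumed input)
    ref_to_gene: reference ID -> gene ID (hypothetical annotation lookup)
    """
    groups = {
        "one_tag_one_reference": [],
        "one_tag_one_gene": [],
        "one_tag_multiple_genes": [],
        "unmatched": [],  # assumed extra group for tags with no reference match
    }
    for tag, refs in tag_hits.items():
        genes = {ref_to_gene[r] for r in refs}
        if not refs:
            groups["unmatched"].append(tag)
        elif len(refs) == 1:
            groups["one_tag_one_reference"].append(tag)
        elif len(genes) == 1:
            # Several references from one gene, e.g. alternative transcripts (assumed reading).
            groups["one_tag_one_gene"].append(tag)
        else:
            groups["one_tag_multiple_genes"].append(tag)
    return groups


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    library = Counter({"tagA": 30, "tagB": 10})
    print(to_tpm(library))  # {'tagA': 750000.0, 'tagB': 250000.0}

    hits = {"t1": {"refA"}, "t2": {"refB1", "refB2"}, "t3": {"refA", "refC"}, "t4": set()}
    ref_to_gene = {"refA": "g1", "refB1": "g2", "refB2": "g2", "refC": "g3"}
    print(categorize(hits, ref_to_gene))
```

Unmatched tags are kept in their own group in this sketch because, according to the abstract, tags that do not match any reference are passed to the differential-expression step together with references that have unique expression levels. The Poisson mixture model itself is not sketched here.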