TIGERA: A New Tool for Illumina Gene Expression Reads Analysis

2009 Ohio Collaborative Conference on Bioinformatics. Pub Date: 2009-06-15. DOI: 10.1109/OCCBIO.2009.14

Xiaodong Bai, P. Grewal

Abstract

Next-generation sequencing platforms, including Illumina, 454, and SOLiD, are emerging as easier, faster, and cheaper alternatives to traditional sequencing platforms. Illumina digital gene expression (DGE) tag profiling allows comprehensive analysis of differentially expressed genes in organisms. Computer programs are necessary to handle the large volume of data generated by the Illumina Genome Analyzer. Here we report the design and implementation of a program for the analysis of differential gene expression based on Illumina data. The program, TIGERA (Tool for Illumina Gene Expression Reads Analysis), was written in Perl using newly implemented and preexisting algorithms, and has a simple graphical user interface. Once the required inputs are provided, the program performs the following tasks automatically. The expression levels of high-quality Illumina tags for each of the two groups of libraries are determined and normalized as transcripts per million (TPM). The Illumina tags are mapped to the annotated reference sequences to identify uniquely mapped tags. The mapping results are validated using information generated by digital restriction enzyme digestion of the reference sequences. Based on whether the tags match unique or multiple reference sequences after validation, they are grouped into three categories: one tag-one reference, one tag-one gene, and one tag-multiple genes. The tags in the first two categories are analyzed further to determine which reference sequences have unique expression levels or potential alternative transcript splicing products. A Poisson mixture model is applied to analyze the differential expression of reference sequences with unique expression levels and of tags that do not match the reference sequences. The progress of the analysis is monitored and reported. The analysis results are presented as text files and also deposited in a MySQL database that can be visualized and searched in Internet browsers. To demonstrate the performance of the program, two biological replicates of the DGE tag libraries of the infective juveniles of the entomopathogenic nematode Heterorhabditis bacteriophora strains TT01 and GPS11 were sequenced using the Illumina platform.
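The normalization, digital digestion, and tag-grouping steps described in the abstract can be illustrated with a minimal Python sketch. This is not TIGERA's code (the program itself is written in Perl, and its internals are not given here). The function names, the `CATG` recognition site, the default tag length of 17, and the toy sequences are all assumptions chosen for illustration, and the normalization simply scales each tag count to a library total of one million.

```python
from collections import defaultdict


def tags_per_million(counts):
    """Scale raw tag counts so that the library sums to one million."""
    total = sum(counts.values())
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


def digest(seq, site="CATG", tag_len=17):
    """Digital digestion: collect the tag that follows each recognition site.

    The site and tag length are assumed values, not taken from the abstract.
    Tags cut short by the end of the sequence are dropped.
    """
    tags = set()
    pos = seq.find(site)
    while pos != -1:
        start = pos + len(site)
        tag = seq[start:start + tag_len]
        if len(tag) == tag_len:
            tags.add(tag)
        pos = seq.find(site, pos + 1)
    return tags


def classify(references, ref_to_gene, site="CATG", tag_len=17):
    """Group each virtual tag into one of the three categories in the abstract."""
    tag_to_refs = defaultdict(set)
    for ref_id, seq in references.items():
        for tag in digest(seq, site, tag_len):
            tag_to_refs[tag].add(ref_id)

    categories = {}
    for tag, refs in tag_to_refs.items():
        genes = {ref_to_gene[r] for r in refs}
        if len(refs) == 1:
            categories[tag] = "one tag-one reference"
        elif len(genes) == 1:
            categories[tag] = "one tag-one gene"
        else:
            categories[tag] = "one tag-multiple genes"
    return categories


# Toy data (hypothetical): two transcripts of gene g1 and one of gene g2.
references = {
    "g1.t1": "CATGAAAATCATGCCCC",
    "g1.t2": "CATGCCCCTCATGGGGG",
    "g2.t1": "TTCATGGGGGAA",
}
ref_to_gene = {"g1.t1": "g1", "g1.t2": "g1", "g2.t1": "g2"}

# A tag length of 4 keeps the toy sequences short enough to check by eye.
categories = classify(references, ref_to_gene, tag_len=4)
tpm = tags_per_million({"AAAA": 30, "CCCC": 50, "GGGG": 20})
```

In this sketch a tag found in a single reference falls in the first category, a tag shared only by references of the same gene falls in the second, and a tag shared across genes falls in the third. The Poisson mixture model step is left out because the abstract does not give enough detail to sketch it faithfully.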
