TIGERA: A New Tool for Illumina Gene Expression Reads Analysis

Xiaodong Bai, P. Grewal
2009 Ohio Collaborative Conference on Bioinformatics. Published 2009-06-15. DOI: 10.1109/OCCBIO.2009.14 (https://doi.org/10.1109/OCCBIO.2009.14)

Abstract

Next-generation sequencing platforms, including Illumina, 454, and SOLiD, are emerging as easier, faster, and cheaper alternatives to traditional sequencing platforms. Illumina digital gene expression (DGE) tag profiling allows comprehensive analysis of differentially expressed genes in organisms. Computer programs are necessary to handle the overwhelming amount of data generated by the Illumina Genome Analyzer. Here we report the design and implementation of a program for analyzing differential gene expression from Illumina data. The program, TIGERA (Tool for Illumina Gene Expression Reads Analysis), was written in Perl, combines newly implemented and preexisting algorithms, and has a simple graphical user interface. Once the required inputs are provided, the program performs the following tasks automatically. The expression levels of high-quality Illumina tags are determined for each of the two groups of libraries and normalized as transcripts per million (TPM). The tags are mapped to the annotated reference sequences to identify uniquely mapped tags. The mapping results are validated against information generated by digital restriction enzyme digestion of the reference sequences. Depending on whether a tag matches one or multiple reference sequences after validation, it is placed in one of three categories: one tag-one reference, one tag-one gene, or one tag-multiple genes. Tags in the first two categories are analyzed further to determine which reference sequences have unique expression levels and which have potential alternative transcript splicing products. A Poisson mixture model is applied to analyze the differential expression of the reference sequences with unique expression levels and of the tags that do not match any reference sequence. The progress of the analysis is monitored and reported. The results are written to text files and also deposited in a MySQL database that can be visualized and searched in a web browser. To demonstrate the performance of the program, two biological replicates of DGE tag libraries from infective juveniles of the entomopathogenic nematode Heterorhabditis bacteriophora, strains TT01 and GPS11, were sequenced on the Illumina platform.
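
The TPM normalization and the three-way tag grouping described in the abstract can be illustrated with a minimal sketch. This is not TIGERA's code: the program itself is written in Perl and its source is not shown here, so the Python below is only an illustration. The function names, the input dictionaries, and the "unmapped" label are hypothetical, and the rule used to separate "one tag-one gene" from "one tag-multiple genes" (whether all matched reference sequences belong to the same gene) is an assumed reading of the category names.

```python
def tpm(counts):
    """Scale the raw tag counts of one library to transcripts per million.

    counts: tag sequence -> raw count (hypothetical input format).
    """
    total = sum(counts.values())
    if total == 0:
        return {tag: 0.0 for tag in counts}
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


def classify(tag_hits, ref_to_gene):
    """Place each tag in one of the three categories named in the abstract.

    tag_hits: tag -> set of reference sequence IDs it matched after
        validation (hypothetical input format).
    ref_to_gene: reference sequence ID -> gene ID (hypothetical
        annotation table).
    Tags with no match get the label "unmapped", which is an addition
    of this sketch and not a category from the abstract.
    """
    categories = {}
    for tag, refs in tag_hits.items():
        if not refs:
            categories[tag] = "unmapped"
            continue
        genes = {ref_to_gene[ref] for ref in refs}
        if len(refs) == 1:
            categories[tag] = "one tag-one reference"
        elif len(genes) == 1:
            # Assumed reading: several references, all from the same gene.
            categories[tag] = "one tag-one gene"
        else:
            categories[tag] = "one tag-multiple genes"
    return categories


if __name__ == "__main__":
    # Made-up counts and mappings, for illustration only.
    library = {"tagA": 30, "tagB": 10, "tagC": 60}
    print(tpm(library))

    hits = {"tagA": {"ref1"}, "tagB": {"ref2", "ref3"}, "tagC": {"ref1", "ref4"}}
    genes = {"ref1": "g1", "ref2": "g2", "ref3": "g2", "ref4": "g3"}
    print(classify(hits, genes))
```

Normalizing each library separately before comparison keeps libraries of different sequencing depth on the same scale, which is why TPM values, and not raw counts, feed the later steps.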