Comparative Assessment of DNA Assemblers for Assembling Expressed Sequence Tags

2009 Ohio Collaborative Conference on Bioinformatics. Pub Date: 2009-06-15. DOI: 10.1109/OCCBIO.2009.19

X. Min, G. Butler, R. Storms, A. Tsang


Citations: 3

Abstract

Assembling expressed sequence tags (ESTs) is essential for removing redundancy and generating long virtual transcripts for EST annotation and gene finding. A number of assemblers are available, but detailed comparative assessments of their strengths and weaknesses are lacking. We compared three assemblers, Phrap, CAP3 and TIGR Assembler (TA), using Aspergillus niger and Phanerochaete chrysosporium EST data. Phrap assembled more ESTs into contigs than TA and CAP3. Among the contigs and singletons generated by the three assemblers, 67–90% were identical. The number of contigs and singletons assembled by Phrap provides an estimate of the maximum number of unique genes represented in the dataset, while the numbers generated by TA and CAP3 provide an approximate estimate of the number of unique transcripts, since both TA and CAP3 discriminate better between alternatively spliced transcripts. The error rate in contigs generated by Phrap was slightly higher than in contigs generated by TA or CAP3. Phrap is thus recommended for EST assembly aimed at generating a set of unisequences with minimum redundancy for estimating the unigene number, whereas TA or CAP3 should be used for EST assembly aimed at finding unique transcripts, i.e., for identifying alternative splicing.
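The abstract does not say how identical contigs and singletons were counted. As an illustration only, the following Python sketch shows one way such an overlap could be computed from two assemblers' FASTA outputs. It assumes that "identical" means an exact base-for-base match on either strand; the function names (`parse_fasta`, `canonical`, `identical_fraction`) and the toy sequences are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Optional, Set

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = ""
        elif current_id is not None:
            records[current_id] += line
    return records


def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of the sequence and its reverse complement."""
    seq = seq.upper()
    reverse_complement = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, reverse_complement)


def identical_fraction(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Fraction of unisequences in `a` that occur, base for base, in `b`.

    Assumption (not from the paper): a match on either strand counts as identical.
    """
    if not a:
        return 0.0
    keys_b: Set[str] = {canonical(s) for s in b.values()}
    shared = sum(1 for s in a.values() if canonical(s) in keys_b)
    return shared / len(a)


if __name__ == "__main__":
    # Toy data standing in for the contig/singleton output of two assemblers.
    assembler_a = parse_fasta([">c1", "ACGTAC", ">c2", "GGGTTT", ">s1", "TTTAAACC"])
    assembler_b = parse_fasta([">k1", "GTACGT", ">k2", "GGGTTA", ">k3", "TTTAAACC"])
    print(f"{identical_fraction(assembler_a, assembler_b):.2%} identical")
```

The measure is asymmetric: the denominator is the number of unisequences from the first assembler, so swapping the arguments can give a different value when the two outputs differ in size.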
