Computing highly specific and mismatch tolerant oligomers efficiently.

Proceedings. IEEE Computer Society Bioinformatics Conference. Pub Date: 2003-01-01

Tomoyuki Yamada, Shinichi Morishita

Volume 2, pages 316-25

Abstract

The sequencing of the genomes of a variety of species and the growing databases of expressed sequence tags (ESTs) and complementary DNAs (cDNAs) facilitate the design of highly specific oligomers for use as genomic markers, PCR primers, or DNA oligo microarrays. The first step in evaluating the specificity of short oligomers, about twenty units in length, is to determine the frequencies at which they occur. For oligomers longer than about fifty units, however, this approach is not effective, as they usually have a frequency of only 1. A more suitable procedure is to consider the mismatch tolerance of an oligomer, that is, the minimum number of mismatches that allows the oligomer to match a sub-sequence other than the target sequence anywhere in the genome or the EST database. Calculating the exact value of the mismatch tolerance, however, is computationally costly and impractical. We therefore studied the problem of checking whether an oligomer meets the constraint that its mismatch tolerance is no less than a given threshold. Here, we present an efficient dynamic programming algorithm that utilizes suffix and height arrays. We demonstrate the effectiveness of this algorithm by efficiently computing a dense list of oligo-markers applicable to the human genome. Experimental results show that the algorithm runs faster than the well-known Abrahamson's algorithm by orders of magnitude and is able to enumerate 63% to approximately 79% of qualified oligomers.
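To make the definitions in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes that mismatches are counted as Hamming distance between equal-length strings and that the target is a single occurrence at a known position. The names `hamming_capped`, `meets_threshold`, `suffix_array`, and `height_array` are hypothetical, and both the threshold check and the array construction are naive reference versions rather than the efficient dynamic programming method the paper describes.

```python
from typing import List


def hamming_capped(a: str, b: str, cap: int) -> int:
    """Count mismatches between two equal-length strings, stopping at `cap`."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches >= cap:
                break  # enough mismatches; no need to keep counting
    return mismatches


def meets_threshold(text: str, target_pos: int, length: int, threshold: int) -> bool:
    """Brute-force check (assumed Hamming model): True if every window other
    than the target differs from the oligomer in at least `threshold` positions."""
    oligo = text[target_pos:target_pos + length]
    for start in range(len(text) - length + 1):
        if start == target_pos:
            continue  # skip the target occurrence itself
        window = text[start:start + length]
        if hamming_capped(oligo, window, threshold) < threshold:
            return False
    return True


def suffix_array(text: str) -> List[int]:
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def height_array(text: str, sa: List[int]) -> List[int]:
    """height[k] = length of the common prefix of suffixes sa[k-1] and sa[k];
    height[0] is defined as 0."""
    heights = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = text[sa[k - 1]:], text[sa[k]:]
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        heights[k] = n
    return heights


if __name__ == "__main__":
    genome = "ACGTTTTTACGA"  # toy sequence, purely illustrative
    print(meets_threshold(genome, 0, 4, 1))  # tolerance of "ACGT" at 0 is >= 1
    print(meets_threshold(genome, 0, 4, 2))  # "ACGA" at 8 is one mismatch away
    sa = suffix_array("banana")
    print(sa, height_array("banana", sa))
```

The brute-force check costs time proportional to the sequence length times the oligomer length per candidate, which illustrates why a faster method is needed at genome scale. The suffix and height arrays are shown only to indicate what those structures contain.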
