An algorithm to reconstruct a target DNA sequence from its spectrum connected at a given level

Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings. Pub Date: 2003-03-10. DOI: 10.1109/BIBE.2003.1188947

Fang-Xiang Wu, W. Zhang, A. Kusalik



Abstract

To sequence a target DNA, it is first cleaved into many shorter overlapping fragments by chemical or physical techniques. The nucleotide sequence of each fragment is then determined (read) by established methods. The set of all read fragments that cover the target DNA sequence is called its spectrum. It is believed that the shortest superstring of a spectrum is the best candidate for the target DNA sequence. The general problem of finding the shortest superstring for an arbitrary set of strings is NP-hard. Fortunately, the biological instance of this problem is easier. Two read fragments, each consisting of several hundred letters and coming from consecutive locations on the target DNA sequence, are unlikely to overlap by only a few letters; typically, the overlap will be longer. Thus one may reasonably assume that two strings in the spectrum have significant overlap (connectivity) if they come from consecutive locations on the target DNA sequence. An important class of instances satisfying this assumption consists of those whose spectra come from DNA microarrays. This assumption allows us to claim and show the following: if the spectrum S of a target DNA sequence is substring-free and connected at level t, and the target DNA sequence has no repeats of size t or larger, then there exists an algorithm that reconstructs the target DNA sequence in linear time O(|S|) once an overlap graph of the spectrum has been built.
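The abstract does not spell out the algorithm itself, so the following is only a minimal Python sketch of the idea it describes, not the authors' method. It assumes the three stated conditions hold (substring-free spectrum, connectivity at level t, no repeats of size t or larger in the target). The function names `overlap`, `build_overlap_graph`, and `reconstruct` are hypothetical, and the graph is built here by naive pairwise comparison, which is not linear; only the final walk over the graph visits each fragment once.

```python
def overlap(a, b, t):
    """Longest suffix of a equal to a prefix of b, if at least t long; else 0.

    Assumes a substring-free spectrum, so a proper overlap is always
    shorter than both strings.
    """
    for k in range(min(len(a), len(b)) - 1, t - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_overlap_graph(spectrum, t):
    """Map each fragment to (successor, overlap length).

    The successor is taken to be the fragment with the largest overlap
    of at least t. This naive construction compares every pair.
    """
    succ = {}
    for a in spectrum:
        best, best_k = None, 0
        for b in spectrum:
            if a == b:
                continue
            k = overlap(a, b, t)
            if k > best_k:
                best, best_k = b, k
        if best is not None:
            succ[a] = (best, best_k)
    return succ


def reconstruct(spectrum, t):
    """Walk the successor chain from the unique fragment with no predecessor."""
    succ = build_overlap_graph(spectrum, t)
    has_pred = {b for b, _ in succ.values()}
    starts = [s for s in spectrum if s not in has_pred]
    if len(starts) != 1:
        raise ValueError("spectrum does not form a single chain at level t")

    cur = starts[0]
    pieces = [cur]
    visited = 1
    # At most len(spectrum) - 1 steps, which also guards against cycles
    # when the assumptions are violated.
    for _ in range(len(spectrum) - 1):
        if cur not in succ:
            break
        nxt, k = succ[cur]
        pieces.append(nxt[k:])  # append only the non-overlapping tail
        cur = nxt
        visited += 1
    if visited != len(spectrum):
        raise ValueError("not every fragment was used; assumptions violated")
    return "".join(pieces)


# Toy example: fragments of length 6 taken every 2 positions from a
# 12-letter target with no repeated 3-mers, given in shuffled order.
fragments = ["GTACCT", "ATGCGT", "ACCTGA", "GCGTAC"]
print(reconstruct(fragments, 3))
```

In the toy example consecutive fragments overlap by 4 letters, so the spectrum is connected at level t = 3, while non-consecutive fragments overlap by fewer than 3 letters and contribute no edges. Choosing the successor by maximum overlap relies on the substring-free condition: fragments ordered by start position are then also ordered by end position, so the nearest following fragment has the longest overlap.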


