An algorithm to reconstruct a target DNA sequence from its spectrum connected at a given level
Fang-Xiang Wu, W. Zhang, A. Kusalik
Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings. Published 2003-03-10.
DOI: 10.1109/BIBE.2003.1188947 (https://doi.org/10.1109/BIBE.2003.1188947)
Citation count: 0
Abstract
To sequence a target DNA, one first cleaves it into many shorter overlapping fragments by chemical or physical techniques. The nucleotide sequence of each fragment is then determined (read) by established methods. The set of all read fragments that cover the target DNA sequence is called its spectrum. The shortest superstring of a spectrum is generally believed to be the best candidate for the target DNA sequence. The general problem of finding the shortest superstring of an arbitrary set of strings is NP-hard. Fortunately, the biological instance of this problem is easier. Two read fragments, each several hundred letters long, that come from consecutive locations on the target DNA sequence are unlikely to overlap by only a few letters; typically, the overlap is longer. Thus one may reasonably assume that two strings in the spectrum have significant overlap (connectivity) if they come from consecutive locations on the target DNA sequence. An important class of instances satisfying this assumption consists of those whose spectra come from DNA microarrays. Under this assumption we claim and show the following: if the spectrum S of a target DNA sequence is substring-free and connected at level t, and the target DNA sequence has no repeats of size t or larger, then there exists an algorithm that reconstructs the target DNA sequence in linear time O(|S|) once an overlap graph of the spectrum has been built.
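The claim in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not spell out; it is one plausible reading under the stated assumptions (substring-free spectrum, connectivity at level t, no repeats of size t or larger in the target). The function names `overlap` and `reconstruct` and the toy spectrum are hypothetical. The overlap graph is built naively in quadratic time here; only the final walk over the graph is linear in the number of fragments.

```python
from __future__ import annotations


def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def reconstruct(spectrum: list[str], t: int) -> str:
    """Chain fragments along overlap edges of weight >= t (illustrative sketch).

    Assumes the spectrum is substring-free and connected at level t, and that
    the target has no repeat of size t or larger. Under those assumptions the
    heaviest outgoing edge of each fragment should lead to the next fragment
    along the target.
    """
    fragments = list(dict.fromkeys(spectrum))  # drop duplicates, keep order
    successor: dict[str, tuple[str, int]] = {}
    has_predecessor: set[str] = set()

    # Naive O(n^2) overlap-graph construction, keeping only edges >= t.
    for a in fragments:
        best, best_k = None, 0
        for b in fragments:
            if a == b:
                continue
            k = overlap(a, b)
            if k >= t:
                has_predecessor.add(b)
                if k > best_k:
                    best, best_k = b, k
        if best is not None:
            successor[a] = (best, best_k)

    starts = [f for f in fragments if f not in has_predecessor]
    if len(starts) != 1:
        raise ValueError("spectrum does not look connected at level t")

    # Linear walk: one step per fragment, appending the non-overlapping tail.
    current = starts[0]
    result = current
    visited = 1
    while current in successor and visited < len(fragments):
        nxt, k = successor[current]
        result += nxt[k:]
        current = nxt
        visited += 1

    if visited != len(fragments):
        raise ValueError("walk did not cover every fragment")
    return result


if __name__ == "__main__":
    # Toy spectrum cut from a 12-letter target with no repeated 3-mer.
    toy = ["GTACCT", "ATGCGT", "ACCTGA", "GCGTAC"]
    print(reconstruct(toy, t=3))
```

Taking the heaviest edge at each step is a design choice of this sketch: when coverage is high, a fragment may overlap several later fragments by at least t letters, and the largest overlap points to the nearest one.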