Finding higher order motifs under the Levenshtein measure
Authors: E. Adebiyi, Tinuke Dipe
Published in: Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference, 2003-08-11
DOI: 10.1109/CSB.2003.1227414 (https://doi.org/10.1109/CSB.2003.1227414)
We study the problem of finding higher order motifs under the Levenshtein measure, otherwise known as the edit distance. In the problem set-up, we are given $N$ sequences, each of average length $n$, over a finite alphabet $\Sigma$, together with thresholds $D$ and $q$. We are to find composite motifs that contain motifs of length $P$ (these motifs occur with at most $D$ differences) in $q$ distinct sequences, where $1 \le q \le N$. Two interesting but involved algorithms for finding higher order motifs under the edit distance were presented by Marsan and Sagot. Their second algorithm is much more complicated, and its complexity is asymptotically no better. Their first algorithm runs in $O(M \cdot N^2 n^{1 + \alpha \cdot p \cdot \mathrm{pow}(\epsilon)})$, where $p \ge 2$, $\alpha > 0$, $\mathrm{pow}(\epsilon)$ is a concave function that is less than 1, $\epsilon = D/P$, and $M$ is the expected number of all monad motifs. We present an alternative algorithmic approach, also for the edit distance, based on the concept described. The resulting algorithm is simpler and runs in $O(N^2 n^{1 + p \cdot \mathrm{pow}(\epsilon)})$ expected time.
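To make the occurrence condition concrete, here is a minimal Python sketch of the check for a single (monad) motif: does it occur with at most D edit operations in enough of the input sequences? It is not the algorithm of the paper or of Marsan and Sagot. It uses the textbook dynamic program for approximate string matching, and the function names, the parameters `max_diff` and `quorum`, and the reading of q as "at least q sequences" are all assumptions made for illustration.

```python
from typing import Sequence


def min_edit_distance_to_substring(motif: str, text: str) -> int:
    """Smallest edit distance between `motif` and any substring of `text`.

    Standard approximate-matching dynamic program: row 0 is all zeros so
    that a match may start at any position in `text` at no cost.
    """
    prev = [0] * (len(text) + 1)
    for i, m_char in enumerate(motif, start=1):
        curr = [i] + [0] * len(text)  # matching motif[:i] against nothing costs i
        for j, t_char in enumerate(text, start=1):
            curr[j] = min(
                prev[j] + 1,                      # delete m_char from the motif
                curr[j - 1] + 1,                  # insert t_char into the motif
                prev[j - 1] + (m_char != t_char)  # match or substitute
            )
        prev = curr
    # A match may end at any position in `text`.
    return min(prev)


def satisfies_quorum(motif: str, sequences: Sequence[str],
                     max_diff: int, quorum: int) -> bool:
    """True if `motif` occurs with at most `max_diff` differences in at
    least `quorum` distinct sequences (assumed reading of D and q)."""
    hits = sum(
        1 for seq in sequences
        if min_edit_distance_to_substring(motif, seq) <= max_diff
    )
    return hits >= quorum


if __name__ == "__main__":
    seqs = ["TTACGTTT", "TTAGTTT", "GGGG"]
    print(satisfies_quorum("ACGT", seqs, max_diff=1, quorum=2))
```

Each check costs time proportional to the motif length times the sequence length, so this only illustrates the definition. It does not approach the expected-time bounds quoted above, and it does not handle composite motifs.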