Application of consensus string matching in the diagnosis of allelic heterogeneity involving transposition mutation
Authors: F. Zohora, Mohammad Sohel Rahman
Journal: International Journal of Data Mining and Bioinformatics, vol. 13, no. 4, pp. 360-77
Published: 2015-10-01
DOI: https://doi.org/10.1504/IJDMB.2015.072756
This paper proposes an algorithm that, given two input DNA sequences, detects whether a common ancestor gene sequence exists under the non-overlapping transposition metric. Two cases are considered: fixed-length transposition and all-length transposition. In the fixed-length case, the algorithm runs in O(n^3) time, where n is the length of the input sequences. In the all-length case, the theoretical worst-case time complexity is proven to be O(n^4); in practice, however, the worst-case and average-case running times for all-length transposition are found to be O(n^3) and O(n^2), respectively. The work is motivated by the diagnosis of unknown genetic diseases that show allelic heterogeneity, in which a normal gene mutates in different ways, producing two different gene sequences that cause two different genetic diseases. The algorithm can also be useful in the study of breed-related heredity, to determine the genetic spread of a defective gene in a population.
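To make the fixed-length problem statement concrete, here is a minimal brute-force sketch. It is not the algorithm from the paper. It assumes that a transposition of length k swaps two adjacent blocks of k characters, and that "non-overlapping" means each position takes part in at most one swap. Under that assumption every set of swaps is its own inverse, so the candidate ancestors of a sequence are exactly the strings reachable from it. The function names are hypothetical, and the enumeration is exponential in n, so it is suitable only for toy inputs.

```python
from __future__ import annotations


def transposition_neighbors(s: str, k: int) -> set[str]:
    """Return every string obtainable from s by swapping any set of
    non-overlapping pairs of adjacent length-k blocks (assumed definition)."""
    n = len(s)
    results: set[str] = set()

    def extend(pos: int, parts: list[str]) -> None:
        if pos == n:
            results.add("".join(parts))
            return
        # Option 1: leave the character at pos where it is.
        extend(pos + 1, parts + [s[pos]])
        # Option 2: swap the blocks s[pos:pos+k] and s[pos+k:pos+2k].
        if pos + 2 * k <= n:
            extend(pos + 2 * k, parts + [s[pos + k:pos + 2 * k], s[pos:pos + k]])

    extend(0, [])
    return results


def common_ancestors(s1: str, s2: str, k: int) -> set[str]:
    """Return all strings from which both s1 and s2 can be derived by
    non-overlapping length-k transpositions (brute force)."""
    if len(s1) != len(s2):
        return set()
    return transposition_neighbors(s1, k) & transposition_neighbors(s2, k)


if __name__ == "__main__":
    # "ACGT" yields "CAGT" by swapping A and C, and "ACTG" by swapping G and T.
    print("ACGT" in common_ancestors("CAGT", "ACTG", 1))
```

A non-empty result from `common_ancestors` means a common ancestor exists under the assumed metric. The polynomial bounds quoted in the abstract belong to the authors' algorithm, not to this enumeration.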
About the journal:
Mining bioinformatics data is an emerging area at the intersection of bioinformatics and data mining. The objective of IJDMB is to facilitate collaboration between data mining researchers and bioinformaticians by presenting cutting-edge research topics and methodologies in data mining for bioinformatics. This perspective acknowledges the interdisciplinary nature of research in data mining and bioinformatics and provides a unified forum for researchers, practitioners, students, and policy makers to share the latest research and developments in this fast-growing multidisciplinary area.