{"title":"MOTIFSM: Cloudera Motif DNA Finding Algorithm","authors":"Tahani M. Allam","doi":"10.5815/ijitcs.2023.04.02","DOIUrl":null,"url":null,"PeriodicalId":130361,"journal":{"name":"International Journal of Information Technology and Computer Science","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Information Technology and Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5815/ijitcs.2023.04.02","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Many systems for studying gene function depend on DNA motifs. DNA motif finding generates many trials, which makes it complex. The regulation of gene expression is identified through Transcription Factor Binding Sites (TFBSs). Over the past decades, various algorithms have been proposed to obtain an accurate motif-finding tool. The major problems with these algorithms are execution time and memory size, both of which depend on the probabilistic approaches used. Our previous algorithm, EIMF, was recently proposed to overcome these problems by rearranging data. Because cloud computing involves many resources, mapping jobs to infinite computing resources is an NP-hard optimization problem. In this paper, we propose an Impala framework for running motif finding algorithms in single-user and multi-user settings on cloud computing. We also compare the Cloudera motif algorithm with the previous EIMF algorithm on three different motif groups. The results show that the Cloudera motif algorithm decreased both execution time and memory size in the experimental groups compared with EIMF. The proposed MOTIFSM algorithm, based on cloud computing, decreases execution time by approximately 70% and memory size by about 75% relative to the EIMF framework.
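
For readers unfamiliar with the task, the following is a minimal Python sketch of what "motif finding" means in its simplest exact-match form: it looks for the k-mer that occurs in the largest number of input sequences. It is not the EIMF or MOTIFSM algorithm and does not use Impala; the function name `most_shared_kmer` and the toy sequences are assumptions made only for illustration.

```python
from collections import Counter


def most_shared_kmer(sequences, k):
    """Return (kmer, support): the k-mer found in the most sequences.

    Illustrative brute-force baseline with exact matching only; real
    motif finders also tolerate mismatches and score candidates
    probabilistically.
    """
    support = Counter()
    for seq in sequences:
        # A set ensures each sequence votes at most once per k-mer.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    if not support:
        raise ValueError("no k-mers of the requested length in the input")
    # Highest support first; ties are broken alphabetically.
    return min(support.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy input: three short sequences sharing one 6-mer.
toy_sequences = ["ACGTTGACA", "TTGACAGGC", "GGTTGACAT"]
kmer, count = most_shared_kmer(toy_sequences, 6)
print(kmer, count)
```

Even this simple version enumerates every k-mer of every sequence, which hints at why execution time and memory size become the limiting factors on realistic inputs.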