{"title":"基于聚类的DNA序列De Novo Motif发现算法","authors":"Mohammad Haghir Ebrahim-Abadi, E. Fatemizadeh","doi":"10.1109/ICBME.2017.8430242","DOIUrl":null,"url":null,"abstract":"Motif discovery is a challenging problem in molecular biology and has been attracting researcher's attention for years. Different kind of data and computational methods have been used to unravel this problem, but there is still room for improvement. In this study, our goal was to develop a method with the ability to identify all the TFBS signals, including known and unknown, inside the input set of sequences. We developed a clustering method specialized as part of our algorithm which outperforms other existing clustering methods such as DNACLUST and CD-HIT-EST in clustering short sequences. A scoring system was needed to determine how much a cluster is close to being a real motif. Multiple features are calculated based on the contents of each cluster to determine the score of the cluster. These features contain a set of divergence measures, positional, and occurrence information. These scores are combined in a way that a trade-off between them determines the clusters situation. There is an option to compare the final results with the motif databases such as Jolma2013, and UniProbe using Tomtom motif comparison tool. Algorithm Evaluation has been performed on three datasets from ABS database.","PeriodicalId":116204,"journal":{"name":"2017 24th National and 2nd International Iranian Conference on Biomedical Engineering (ICBME)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Clustering-Based Algorithm for De Novo Motif Discovery in DNA Sequences\",\"authors\":\"Mohammad Haghir Ebrahim-Abadi, E. Fatemizadeh\",\"doi\":\"10.1109/ICBME.2017.8430242\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Motif discovery is a challenging problem in molecular biology and has been attracting researcher's attention for years. Different kind of data and computational methods have been used to unravel this problem, but there is still room for improvement. In this study, our goal was to develop a method with the ability to identify all the TFBS signals, including known and unknown, inside the input set of sequences. We developed a clustering method specialized as part of our algorithm which outperforms other existing clustering methods such as DNACLUST and CD-HIT-EST in clustering short sequences. A scoring system was needed to determine how much a cluster is close to being a real motif. Multiple features are calculated based on the contents of each cluster to determine the score of the cluster. These features contain a set of divergence measures, positional, and occurrence information. These scores are combined in a way that a trade-off between them determines the clusters situation. There is an option to compare the final results with the motif databases such as Jolma2013, and UniProbe using Tomtom motif comparison tool. Algorithm Evaluation has been performed on three datasets from ABS database.\",\"PeriodicalId\":116204,\"journal\":{\"name\":\"2017 24th National and 2nd International Iranian Conference on Biomedical Engineering (ICBME)\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 24th National and 2nd International Iranian Conference on Biomedical Engineering (ICBME)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICBME.2017.8430242\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 24th National and 2nd International Iranian Conference on Biomedical Engineering (ICBME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICBME.2017.8430242","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Clustering-Based Algorithm for De Novo Motif Discovery in DNA Sequences
Motif discovery is a challenging problem in molecular biology and has been attracting researcher's attention for years. Different kind of data and computational methods have been used to unravel this problem, but there is still room for improvement. In this study, our goal was to develop a method with the ability to identify all the TFBS signals, including known and unknown, inside the input set of sequences. We developed a clustering method specialized as part of our algorithm which outperforms other existing clustering methods such as DNACLUST and CD-HIT-EST in clustering short sequences. A scoring system was needed to determine how much a cluster is close to being a real motif. Multiple features are calculated based on the contents of each cluster to determine the score of the cluster. These features contain a set of divergence measures, positional, and occurrence information. These scores are combined in a way that a trade-off between them determines the clusters situation. There is an option to compare the final results with the motif databases such as Jolma2013, and UniProbe using Tomtom motif comparison tool. Algorithm Evaluation has been performed on three datasets from ABS database.