Identifying Functional Binding Motifs of Tumor Protein p53 Using Support Vector Machines
Amit U. Sinha, Mukta Phatak, Raj Bhatnagar, Anil G. Jegga
Sixth International Conference on Machine Learning and Applications (ICMLA 2007). DOI: 10.1109/ICMLA.2007.46
Citations: 11
Abstract
Identifying transcription factor binding sites in DNA sequences is a frequently performed task in bioinformatics. However, current search methods produce a large number of false positives because these motifs are short and degenerate. We propose an implicit model of cooperative binding of transcription factors. We hypothesize that the flanking regions of true binding sites differ in composition from the regions flanking false-positive matches. Using statistically significant motifs in the flanking regions of true binding sites as features, we design an SVM classifier for discriminating true binding sites from false positives. We demonstrate the effectiveness of our method on a data set of experimentally verified p53 binding sites. We obtained an overall accuracy of 80% on cross-validation and 76% on an independent test set. By analyzing the features, we identified known as well as potentially new binding partners of p53.
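
The feature-construction idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain k-mer counts as a stand-in for the "statistically significant motifs", and the function names (`flanks`, `kmer_counts`, `feature_vector`), the flank width, and the value of k are all hypothetical choices made for illustration. The resulting vectors are the kind of input an SVM classifier could be trained on; the training step itself is not shown.

```python
from collections import Counter
from itertools import product


def flanks(seq, start, end, width):
    """Return the upstream and downstream flanks of the site seq[start:end]."""
    left = seq[max(0, start - width):start]
    right = seq[end:end + width]
    return left, right


def kmer_counts(region, k):
    """Count overlapping k-mers in a region (empty if the region is shorter than k)."""
    return Counter(region[i:i + k] for i in range(len(region) - k + 1))


def feature_vector(seq, start, end, width=20, k=2):
    """Fixed-length vector of k-mer counts over both flanks.

    Assumption: raw k-mer counts replace the statistically selected
    motifs mentioned in the abstract. Entries follow lexicographic
    k-mer order over the alphabet ACGT.
    """
    left, right = flanks(seq, start, end, width)
    counts = kmer_counts(left, k) + kmer_counts(right, k)
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[m] for m in vocab]


# Toy example: a 4-bp candidate site flanked by 4 bp on each side.
vector = feature_vector("AAAACATGTTTT", start=4, end=8, width=4, k=2)
```

Each candidate site, whether a true binding site or a false positive, would yield one such vector, and the labelled vectors would then form the training set for the classifier.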