{"title":"eRFSVM:预测增强子的混合分类器--集成随机森林与支持向量机。","authors":"Fang Huang, Jiawei Shen, Qingli Guo, Yongyong Shi","doi":"10.1186/s41065-016-0012-2","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Enhancers are tissue specific distal regulation elements, playing vital roles in gene regulation and expression. The prediction and identification of enhancers are important but challenging issues for bioinformatics studies. Existing computational methods, mostly single classifiers, can only predict the transcriptional coactivator EP300 based enhancers and show low generalization performance.</p><p><strong>Results: </strong>We built a hybrid classifier called eRFSVM in this study, using random forests as a base classifier, and support vector machines as a main classifier. eRFSVM integrated two components as eRFSVM-ENCODE and eRFSVM-FANTOM5 with diverse features and labels. The base classifier trained datasets from a single tissue or cell with random forests. The main classifier made the final decision by support vector machines algorithm, with the predicting results of base classifiers as inputs. For eRFSVM-ENCODE, we trained datasets from cell lines including Gm12878, Hep, H1-hesc and Huvec, using ChIP-Seq datasets as features and EP300 based enhancers as labels. We tested eRFSVM-ENCODE on K562 dataset, and resulted in a predicting precision of 83.69 %, which was much better than existing classifiers. For eRFSVM-FANTOM5, with enhancers identified by RNA in FANTOM5 project as labels, the precision, recall, F-score and accuracy were 86.17 %, 36.06 %, 50.84 % and 93.38 % using eRFSVM, increasing 23.24 % (69.92 %), 97.05 % (18.30 %), 76.90 % (28.74 %), 4.69 % (89.20 %) than the existing algorithm, respectively.</p><p><strong>Conclusions: </strong>All these results demonstrated that eRFSVM was a better classifier in predicting both EP300 based and FAMTOM5 RNAs based enhancers.</p>","PeriodicalId":2,"journal":{"name":"ACS Applied Bio Materials","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2016-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5226099/pdf/","citationCount":"0","resultStr":"{\"title\":\"eRFSVM: a hybrid classifier to predict enhancers-integrating random forests with support vector machines.\",\"authors\":\"Fang Huang, Jiawei Shen, Qingli Guo, Yongyong Shi\",\"doi\":\"10.1186/s41065-016-0012-2\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Enhancers are tissue specific distal regulation elements, playing vital roles in gene regulation and expression. The prediction and identification of enhancers are important but challenging issues for bioinformatics studies. Existing computational methods, mostly single classifiers, can only predict the transcriptional coactivator EP300 based enhancers and show low generalization performance.</p><p><strong>Results: </strong>We built a hybrid classifier called eRFSVM in this study, using random forests as a base classifier, and support vector machines as a main classifier. eRFSVM integrated two components as eRFSVM-ENCODE and eRFSVM-FANTOM5 with diverse features and labels. The base classifier trained datasets from a single tissue or cell with random forests. The main classifier made the final decision by support vector machines algorithm, with the predicting results of base classifiers as inputs. For eRFSVM-ENCODE, we trained datasets from cell lines including Gm12878, Hep, H1-hesc and Huvec, using ChIP-Seq datasets as features and EP300 based enhancers as labels. We tested eRFSVM-ENCODE on K562 dataset, and resulted in a predicting precision of 83.69 %, which was much better than existing classifiers. For eRFSVM-FANTOM5, with enhancers identified by RNA in FANTOM5 project as labels, the precision, recall, F-score and accuracy were 86.17 %, 36.06 %, 50.84 % and 93.38 % using eRFSVM, increasing 23.24 % (69.92 %), 97.05 % (18.30 %), 76.90 % (28.74 %), 4.69 % (89.20 %) than the existing algorithm, respectively.</p><p><strong>Conclusions: </strong>All these results demonstrated that eRFSVM was a better classifier in predicting both EP300 based and FAMTOM5 RNAs based enhancers.</p>\",\"PeriodicalId\":2,\"journal\":{\"name\":\"ACS Applied Bio Materials\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2016-06-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5226099/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Bio Materials\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s41065-016-0012-2\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2016/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"MATERIALS SCIENCE, BIOMATERIALS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Bio Materials","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s41065-016-0012-2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2016/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATERIALS SCIENCE, BIOMATERIALS","Score":null,"Total":0}
eRFSVM: a hybrid classifier to predict enhancers-integrating random forests with support vector machines.
Background: Enhancers are tissue specific distal regulation elements, playing vital roles in gene regulation and expression. The prediction and identification of enhancers are important but challenging issues for bioinformatics studies. Existing computational methods, mostly single classifiers, can only predict the transcriptional coactivator EP300 based enhancers and show low generalization performance.
Results: We built a hybrid classifier called eRFSVM in this study, using random forests as a base classifier, and support vector machines as a main classifier. eRFSVM integrated two components as eRFSVM-ENCODE and eRFSVM-FANTOM5 with diverse features and labels. The base classifier trained datasets from a single tissue or cell with random forests. The main classifier made the final decision by support vector machines algorithm, with the predicting results of base classifiers as inputs. For eRFSVM-ENCODE, we trained datasets from cell lines including Gm12878, Hep, H1-hesc and Huvec, using ChIP-Seq datasets as features and EP300 based enhancers as labels. We tested eRFSVM-ENCODE on K562 dataset, and resulted in a predicting precision of 83.69 %, which was much better than existing classifiers. For eRFSVM-FANTOM5, with enhancers identified by RNA in FANTOM5 project as labels, the precision, recall, F-score and accuracy were 86.17 %, 36.06 %, 50.84 % and 93.38 % using eRFSVM, increasing 23.24 % (69.92 %), 97.05 % (18.30 %), 76.90 % (28.74 %), 4.69 % (89.20 %) than the existing algorithm, respectively.
Conclusions: All these results demonstrated that eRFSVM was a better classifier in predicting both EP300 based and FAMTOM5 RNAs based enhancers.