eRFSVM:预测增强子的混合分类器--集成随机森林与支持向量机。

IF 4.6 Q2 MATERIALS SCIENCE, BIOMATERIALS
ACS Applied Bio Materials Pub Date : 2016-06-30 eCollection Date: 2016-01-01 DOI:10.1186/s41065-016-0012-2
Fang Huang, Jiawei Shen, Qingli Guo, Yongyong Shi
{"title":"eRFSVM:预测增强子的混合分类器--集成随机森林与支持向量机。","authors":"Fang Huang, Jiawei Shen, Qingli Guo, Yongyong Shi","doi":"10.1186/s41065-016-0012-2","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Enhancers are tissue specific distal regulation elements, playing vital roles in gene regulation and expression. The prediction and identification of enhancers are important but challenging issues for bioinformatics studies. Existing computational methods, mostly single classifiers, can only predict the transcriptional coactivator EP300 based enhancers and show low generalization performance.</p><p><strong>Results: </strong>We built a hybrid classifier called eRFSVM in this study, using random forests as a base classifier, and support vector machines as a main classifier. eRFSVM integrated two components as eRFSVM-ENCODE and eRFSVM-FANTOM5 with diverse features and labels. The base classifier trained datasets from a single tissue or cell with random forests. The main classifier made the final decision by support vector machines algorithm, with the predicting results of base classifiers as inputs. For eRFSVM-ENCODE, we trained datasets from cell lines including Gm12878, Hep, H1-hesc and Huvec, using ChIP-Seq datasets as features and EP300 based enhancers as labels. We tested eRFSVM-ENCODE on K562 dataset, and resulted in a predicting precision of 83.69 %, which was much better than existing classifiers. For eRFSVM-FANTOM5, with enhancers identified by RNA in FANTOM5 project as labels, the precision, recall, F-score and accuracy were 86.17 %, 36.06 %, 50.84 % and 93.38 % using eRFSVM, increasing 23.24 % (69.92 %), 97.05 % (18.30 %), 76.90 % (28.74 %), 4.69 % (89.20 %) than the existing algorithm, respectively.</p><p><strong>Conclusions: </strong>All these results demonstrated that eRFSVM was a better classifier in predicting both EP300 based and FAMTOM5 RNAs based enhancers.</p>","PeriodicalId":2,"journal":{"name":"ACS Applied Bio Materials","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2016-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5226099/pdf/","citationCount":"0","resultStr":"{\"title\":\"eRFSVM: a hybrid classifier to predict enhancers-integrating random forests with support vector machines.\",\"authors\":\"Fang Huang, Jiawei Shen, Qingli Guo, Yongyong Shi\",\"doi\":\"10.1186/s41065-016-0012-2\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Enhancers are tissue specific distal regulation elements, playing vital roles in gene regulation and expression. The prediction and identification of enhancers are important but challenging issues for bioinformatics studies. Existing computational methods, mostly single classifiers, can only predict the transcriptional coactivator EP300 based enhancers and show low generalization performance.</p><p><strong>Results: </strong>We built a hybrid classifier called eRFSVM in this study, using random forests as a base classifier, and support vector machines as a main classifier. eRFSVM integrated two components as eRFSVM-ENCODE and eRFSVM-FANTOM5 with diverse features and labels. The base classifier trained datasets from a single tissue or cell with random forests. The main classifier made the final decision by support vector machines algorithm, with the predicting results of base classifiers as inputs. For eRFSVM-ENCODE, we trained datasets from cell lines including Gm12878, Hep, H1-hesc and Huvec, using ChIP-Seq datasets as features and EP300 based enhancers as labels. We tested eRFSVM-ENCODE on K562 dataset, and resulted in a predicting precision of 83.69 %, which was much better than existing classifiers. For eRFSVM-FANTOM5, with enhancers identified by RNA in FANTOM5 project as labels, the precision, recall, F-score and accuracy were 86.17 %, 36.06 %, 50.84 % and 93.38 % using eRFSVM, increasing 23.24 % (69.92 %), 97.05 % (18.30 %), 76.90 % (28.74 %), 4.69 % (89.20 %) than the existing algorithm, respectively.</p><p><strong>Conclusions: </strong>All these results demonstrated that eRFSVM was a better classifier in predicting both EP300 based and FAMTOM5 RNAs based enhancers.</p>\",\"PeriodicalId\":2,\"journal\":{\"name\":\"ACS Applied Bio Materials\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2016-06-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5226099/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Bio Materials\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s41065-016-0012-2\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2016/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"MATERIALS SCIENCE, BIOMATERIALS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Bio Materials","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s41065-016-0012-2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2016/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATERIALS SCIENCE, BIOMATERIALS","Score":null,"Total":0}
引用次数: 0

摘要

背景:增强子是组织特异性远端调控元件,在基因调控和表达中发挥着重要作用。增强子的预测和识别是生物信息学研究中重要但具有挑战性的问题。现有的计算方法大多是单一分类器,只能预测基于转录辅激活因子 EP300 的增强子,而且泛化性能较低:eRFSVM 集成了 eRFSVM-ENCODE 和 eRFSVM-FANTOM5 两个具有不同特征和标签的组件。基础分类器使用随机森林训练来自单一组织或细胞的数据集。主分类器以基础分类器的预测结果为输入,通过支持向量机算法做出最终决定。对于 eRFSVM-ENCODE,我们以 ChIP-Seq 数据集为特征,以基于 EP300 的增强子为标签,训练了来自 Gm12878、Hep、H1-hesc 和 Huvec 等细胞系的数据集。我们在 K562 数据集上测试了 eRFSVM-ENCODE,结果预测精度为 83.69%,远远优于现有的分类器。对于 eRFSVM-FANTOM5,以 FANTOM5 项目中 RNA 鉴定出的增强子为标签,eRFSVM 的精确度、召回率、F-score 和准确率分别为 86.17 %、36.06 %、50.84 % 和 93.38 %,比现有算法分别提高了 23.24 %(69.92 %)、97.05 %(18.30 %)、76.90 %(28.74 %)、4.69 %(89.20 %):所有这些结果表明,eRFSVM 在预测基于 EP300 和基于 FAMTOM5 RNAs 的增强子方面是一种更好的分类器。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
eRFSVM: a hybrid classifier to predict enhancers-integrating random forests with support vector machines.

Background: Enhancers are tissue specific distal regulation elements, playing vital roles in gene regulation and expression. The prediction and identification of enhancers are important but challenging issues for bioinformatics studies. Existing computational methods, mostly single classifiers, can only predict the transcriptional coactivator EP300 based enhancers and show low generalization performance.

Results: We built a hybrid classifier called eRFSVM in this study, using random forests as a base classifier, and support vector machines as a main classifier. eRFSVM integrated two components as eRFSVM-ENCODE and eRFSVM-FANTOM5 with diverse features and labels. The base classifier trained datasets from a single tissue or cell with random forests. The main classifier made the final decision by support vector machines algorithm, with the predicting results of base classifiers as inputs. For eRFSVM-ENCODE, we trained datasets from cell lines including Gm12878, Hep, H1-hesc and Huvec, using ChIP-Seq datasets as features and EP300 based enhancers as labels. We tested eRFSVM-ENCODE on K562 dataset, and resulted in a predicting precision of 83.69 %, which was much better than existing classifiers. For eRFSVM-FANTOM5, with enhancers identified by RNA in FANTOM5 project as labels, the precision, recall, F-score and accuracy were 86.17 %, 36.06 %, 50.84 % and 93.38 % using eRFSVM, increasing 23.24 % (69.92 %), 97.05 % (18.30 %), 76.90 % (28.74 %), 4.69 % (89.20 %) than the existing algorithm, respectively.

Conclusions: All these results demonstrated that eRFSVM was a better classifier in predicting both EP300 based and FAMTOM5 RNAs based enhancers.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
ACS Applied Bio Materials
ACS Applied Bio Materials Chemistry-Chemistry (all)
CiteScore
9.40
自引率
2.10%
发文量
464
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信