Using conditional inference forest to identify variable importance

N. Settouti, Mostafa EL HABIB DAHO, Amine Chikh
Journal: Int. J. Bioinform. Res. Appl.
Publication date: 2017-03-24
Publication type: Journal Article
DOI: 10.1504/IJBRA.2017.10003483 (https://doi.org/10.1504/IJBRA.2017.10003483)
Citations: 0

Abstract

Variable importance measures from Random Forests (RF) have received increasing attention as a means of variable selection in classification tasks. They are a convenient basis for variable selection in many applications, but they are not reliable when the candidate predictor variables differ in their scale of measurement or in their number of categories. In this paper, we implement a Random Forest built from Conditional Inference Trees (CIT), called a Conditional Inference Forest (CIF). In each tree of the forest, nodes are split so as to obtain a strong association, and the chi-square test statistic is used to measure that association. In addition to identifying the variables that improve classification accuracy, the methodology also clearly identifies the variables that are neutral to accuracy and those that interfere with correct classification. We are particularly interested in the CIF algorithm as a whole for the classification of large biological datasets. The algorithm is evaluated on its ability to select a reduced number of features while preserving a very satisfactory classification rate.
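The split rule described in the abstract, choosing the variable with the strongest chi-square association at each node, can be illustrated with a small sketch. This is not the authors' implementation: it assumes categorical predictors, compares variables by the p-value of a Pearson chi-square test, and uses hypothetical names (`chi2_statistic`, `chi2_sf`, `select_split_variable`) and an assumed significance level `alpha`. Comparing p-values rather than raw statistics is what keeps a variable with many categories from being favored merely because it offers more possible splits.

```python
import math
from collections import Counter


def chi2_statistic(x, y):
    """Pearson chi-square statistic and degrees of freedom for two categorical lists."""
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))
    stat = 0.0
    for r, n_r in row_totals.items():
        for c, n_c in col_totals.items():
            expected = n_r * n_c / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi2_sf(stat, dof):
    """Upper-tail probability of a chi-square distribution with integer dof >= 1."""
    h = stat / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-h) * sum_{i < dof/2} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd dof: erfc(sqrt(h)) plus a finite sum of half-integer power terms.
    total = math.erfc(math.sqrt(h))
    for i in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def select_split_variable(columns, y, alpha=0.05):
    """Return (name, p_value) of the predictor most associated with y.

    Returns (None, smallest_p) when no predictor reaches the assumed
    significance level alpha, which in this sketch means "do not split".
    """
    best_name, best_p = None, 1.0
    for name, x in columns.items():
        stat, dof = chi2_statistic(x, y)
        if dof == 0:  # constant predictor or constant response: nothing to test
            continue
        p = chi2_sf(stat, dof)
        if p < best_p:
            best_name, best_p = name, p
    if best_name is None or best_p > alpha:
        return None, best_p
    return best_name, best_p


# Toy data (made up for illustration): one informative and two uninformative predictors.
y = ["a"] * 10 + ["b"] * 10
columns = {
    "informative": ["lo"] * 10 + ["hi"] * 10,
    "binary_noise": ["u", "v"] * 10,
    "five_level_noise": [i % 5 for i in range(20)],
}
chosen, p_value = select_split_variable(columns, y)
print(chosen, p_value)
```

A full conditional inference tree would repeat this selection recursively in each child node and would typically adjust the p-values for multiple testing; both steps are omitted here to keep the sketch short.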