{"title":"利用条件推理林识别变量重要性","authors":"N. Settouti, Mostafa EL HABIB DAHO, Amine Chikh","doi":"10.1504/IJBRA.2017.10003483","DOIUrl":null,"url":null,"abstract":"Variable importance measure with Random Forests (RF) have received increased attention as a means of variable selection in classification tasks. The measure of variable importance in Random Forests is a smart way of variable selection in many applications, but is not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. In this paper, we have implemented Random Forest built from Conditional Inference Trees (CIT) that is called Conditional Inference Forest (CIF). In each tree in the forest of conditional inference, the division of the nodes is based on the way to have a good associativity. The chi-square test statistics is used to measure the association. In addition to identifying variables that improve the classification accuracy, the methodology also clearly identifies the variables that are neutral to the accuracy, and also those who interfere in the right classification. In this paper, we are particularly interested in the overall algorithm Conditional Inference Forest (CIF) for the classification of large biological data. The algorithm is evaluated on its ability to select a reduced number of features while preserving a very satisfactory classification rate.","PeriodicalId":434900,"journal":{"name":"Int. J. Bioinform. Res. Appl.","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Using conditional inference forest to identify variable importance\",\"authors\":\"N. 
Settouti, Mostafa EL HABIB DAHO, Amine Chikh\",\"doi\":\"10.1504/IJBRA.2017.10003483\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Variable importance measure with Random Forests (RF) have received increased attention as a means of variable selection in classification tasks. The measure of variable importance in Random Forests is a smart way of variable selection in many applications, but is not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. In this paper, we have implemented Random Forest built from Conditional Inference Trees (CIT) that is called Conditional Inference Forest (CIF). In each tree in the forest of conditional inference, the division of the nodes is based on the way to have a good associativity. The chi-square test statistics is used to measure the association. In addition to identifying variables that improve the classification accuracy, the methodology also clearly identifies the variables that are neutral to the accuracy, and also those who interfere in the right classification. In this paper, we are particularly interested in the overall algorithm Conditional Inference Forest (CIF) for the classification of large biological data. The algorithm is evaluated on its ability to select a reduced number of features while preserving a very satisfactory classification rate.\",\"PeriodicalId\":434900,\"journal\":{\"name\":\"Int. J. Bioinform. Res. Appl.\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-03-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Bioinform. Res. 
Appl.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1504/IJBRA.2017.10003483\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Bioinform. Res. Appl.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1504/IJBRA.2017.10003483","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Using conditional inference forest to identify variable importance
Variable importance measures with Random Forests (RF) have received increasing attention as a means of variable selection in classification tasks. The variable importance measure of Random Forests is an effective way of selecting variables in many applications, but it is not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. In this paper, we implement a Random Forest built from Conditional Inference Trees (CIT), called a Conditional Inference Forest (CIF). In each tree of the conditional inference forest, node splits are chosen according to the strength of association between a predictor and the response; the chi-square test statistic is used to measure this association. In addition to identifying variables that improve classification accuracy, the methodology also clearly identifies variables that are neutral with respect to accuracy, as well as those that interfere with correct classification. We are particularly interested in the overall Conditional Inference Forest (CIF) algorithm for the classification of large biological datasets. The algorithm is evaluated on its ability to select a reduced number of features while preserving a highly satisfactory classification rate.
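The split criterion described above — at each node, pick the predictor most significantly associated with the class label via a chi-square test — can be illustrated with a minimal sketch. This is not the authors' implementation (conditional inference trees as in Hothorn et al.'s R `party`/`partykit` packages use permutation-test framework with multiplicity adjustment); it is a simplified, hypothetical version that discretises each continuous predictor into quantile bins and applies `scipy.stats.chi2_contingency`:

```python
# Simplified sketch of conditional-inference-style split-variable selection:
# choose the predictor whose (binned) values are most significantly
# associated with the class label under a chi-square independence test.
import numpy as np
from scipy.stats import chi2_contingency

def select_split_variable(X, y, n_bins=3):
    """Return (column index, p-value) of the predictor in X most strongly
    associated with integer class labels y; None if no test is valid."""
    n_classes = int(y.max()) + 1
    best_var, best_p = None, 1.0
    for j in range(X.shape[1]):
        # Discretise the continuous predictor into quantile bins so a
        # contingency-table test applies.
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        binned = np.digitize(X[:, j], edges)
        table = np.zeros((n_bins, n_classes))
        for b, c in zip(binned, y):
            table[b, c] += 1
        # Drop empty rows/columns so the chi-square test is well defined.
        table = table[table.sum(axis=1) > 0][:, table.sum(axis=0) > 0]
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue
        _, p, _, _ = chi2_contingency(table)
        if p < best_p:
            best_var, best_p = j, p
    return best_var, best_p

# Toy data: only column 0 carries class information.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 300)
X = rng.normal(size=(300, 4))
X[:, 0] += 2.0 * y  # informative predictor, shifted by class
var, p = select_split_variable(X, y)
```

On this toy data the informative column is selected with a very small p-value, while the three noise columns are effectively ignored. Selecting splits by test significance, rather than by raw impurity reduction, is what removes the bias towards predictors with many categories or fine measurement scales that the abstract criticises in standard Random Forests.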