A Comparative Study of Random Forest and Double Random Forest Models from View Points of Their Interpretability

Adlina Khairunnisa, K. Notodiputro, B. Sartono
Scientific Journal of Informatics. Published 2024-02-29. DOI: https://doi.org/10.15294/sji.v11i1.48721

Abstract

Purpose: This study compares two tree ensembles, Random Forest (RF) and Double Random Forest (DRF), from the viewpoint of model interpretability. Both models have strong predictive performance, but their inner workings are not human-understandable. Model interpretability is required to explain the relationship between the predictors and the response. We apply association rules to simplify the essence of the models.

Methods: This study compares the interpretability of RF and DRF using association rules. Each decision tree in each model is converted into if-then rules by following the paths from the root node to the leaf nodes. The data were chosen to represent underfitting cases, because other researchers have shown that DRF can overcome the underfitting problem faced by RF. A simulation study was conducted to evaluate the rules extracted from RF and DRF. The rules extracted from both models are compared in terms of interpretability, based on their support and confidence values. Association rules are also applied to identify the characteristics of poor people who are working in Yogyakarta.

Result: The simulation results revealed that DRF outperformed RF in interpretability, especially when modelling underfit data. In addition, using empirical data, we were able to characterize the profile of poor people who are working in Yogyakarta based on the most frequent rules.

Novelty: Research on interpretable DRF is still rare, especially interpretation using association rules. Previous studies focused only on interpreting the random forest model using association rules. In this study, the rules extracted from the random forest and double random forest models are compared based on the quality of the extracted rules.
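The rule-extraction step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the feature names `age` and `income`, the thresholds, and the four-row dataset are all hypothetical, and a single hand-built tree stands in for the trees of a trained RF or DRF. Support and confidence follow the usual association-rule definitions (support is the fraction of all rows that satisfy the conditions and carry the predicted label; confidence is the fraction of rows satisfying the conditions that carry the predicted label). The paper may define them differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Binary tree node (hypothetical structure); a leaf has `label` set."""
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # taken when row[feature] <= threshold
    right: Optional["Node"] = None   # taken when row[feature] > threshold
    label: Optional[int] = None


Condition = Tuple[str, str, float]   # (feature, operator, threshold)
Rule = Tuple[List[Condition], int]   # (conditions, predicted label)


def extract_rules(node: Node, path: Tuple[Condition, ...] = ()) -> List[Rule]:
    """Turn every root-to-leaf path into one if-then rule."""
    if node.label is not None:
        return [(list(path), node.label)]
    go_left = path + ((node.feature, "<=", node.threshold),)
    go_right = path + ((node.feature, ">", node.threshold),)
    return extract_rules(node.left, go_left) + extract_rules(node.right, go_right)


def matches(conditions: List[Condition], row: Dict[str, float]) -> bool:
    """Check whether a data row satisfies all conditions of a rule."""
    return all(
        row[f] <= t if op == "<=" else row[f] > t
        for f, op, t in conditions
    )


def support_confidence(rule: Rule, rows, labels) -> Tuple[float, float]:
    """Support = P(conditions and label); confidence = P(label | conditions)."""
    conditions, predicted = rule
    covered = [y for x, y in zip(rows, labels) if matches(conditions, x)]
    correct = sum(1 for y in covered if y == predicted)
    support = correct / len(rows)
    confidence = correct / len(covered) if covered else 0.0
    return support, confidence


# Hypothetical toy tree: split on age, then on income.
tree = Node(
    feature="age", threshold=30.0,
    left=Node(label=1),
    right=Node(
        feature="income", threshold=50.0,
        left=Node(label=1),
        right=Node(label=0),
    ),
)

# Hypothetical toy data, not taken from the study.
rows = [
    {"age": 25, "income": 40},
    {"age": 28, "income": 70},
    {"age": 45, "income": 30},
    {"age": 50, "income": 80},
]
labels = [1, 0, 1, 0]

rules = extract_rules(tree)
for conditions, predicted in rules:
    s, c = support_confidence((conditions, predicted), rows, labels)
    text = " AND ".join(f"{f} {op} {t}" for f, op, t in conditions)
    print(f"IF {text} THEN class={predicted} (support={s:.2f}, confidence={c:.2f})")
```

For a forest, the same extraction would be run on every tree and the resulting rules pooled, after which the most frequent rules and their support and confidence values can be compared across models.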