Random-Splitting Random Forest with Multiple Mixed-Data Covariates

JCR quartile: Q4 (Medicine)
Mohammad Fayaz, Alireza Abadi, Soheila Khodakarim
Journal of Biostatistics and Epidemiology. Published 2023-10-31. DOI: 10.18502/jbe.v9i1.13974 (https://doi.org/10.18502/jbe.v9i1.13974)

Abstract

Introduction: Bagging (BG) and random forest (RF) are well-known supervised statistical learning methods based on classification and regression trees. Both can handle different types of responses, such as categorical and continuous ones. Many statistical applications involve curves, time series, functional data, or other observations that are related to one another through their domain. In the literature, RF methods have been extended to several cases in which functional data appear as covariates or responses. Among these extensions, random-splitting summarizes functional data into multiple related summary statistics, such as the average.

Methods: This article extends the random-splitting method and introduces the mixed-data BG (MD-BG) and mixed-data RF (MD-RF) algorithms for multiple functional and non-functional (mixed or hybrid) covariates. It also computes a variable importance plot (VIP) for each covariate.

Results: The main difference between MD-BG and MD-RF lies in how covariates are chosen: in MD-BG all covariates remain in the model, whereas MD-RF uses a random sample of covariates. MD-RF helps to unmask the most important parts of the functional covariates and the most important non-functional covariates.

Conclusion: We apply our methods to two datasets, DTI and Tecator, and compare their performance for continuous and categorical responses using the R package "RSRF" that we developed, which is available on GitHub.
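
To make the ideas in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation and not the API of the "RSRF" R package. It rests on three assumptions that the abstract does not confirm: each functional covariate is observed on a discrete grid, random-splitting partitions that grid into contiguous intervals at randomly drawn cut points, and each interval is summarized by its mean. All function names (`random_split_points`, `summarize_curve`, `build_features`, `candidate_covariates`) are hypothetical. The last helper shows the stated difference between the two algorithms: a bagging-style learner keeps every covariate, while a random-forest-style learner draws a random subset.

```python
import random
from statistics import mean


def random_split_points(n_points, n_splits, rng):
    """Draw distinct interior cut positions and return interval boundaries.

    Assumption: random-splitting means cutting the observation grid
    (indices 0..n_points-1) into n_splits + 1 contiguous intervals.
    """
    cuts = sorted(rng.sample(range(1, n_points), n_splits))
    return [0] + cuts + [n_points]


def summarize_curve(curve, bounds):
    """Replace one discretized curve by the mean of each interval."""
    return [mean(curve[a:b]) for a, b in zip(bounds[:-1], bounds[1:])]


def build_features(curves, scalars, bounds):
    """Join interval means (functional part) with scalar covariates."""
    return [summarize_curve(c, bounds) + list(s) for c, s in zip(curves, scalars)]


def candidate_covariates(n_features, mtry, rng):
    """Bagging-style (mtry is None): all columns. RF-style: random subset."""
    if mtry is None:
        return list(range(n_features))
    return sorted(rng.sample(range(n_features), mtry))


rng = random.Random(0)

# Toy data: 4 curves observed on 10 grid points, plus one scalar covariate each.
curves = [[float(i + j) for j in range(10)] for i in range(4)]
scalars = [(0.5,), (1.5,), (2.5,), (3.5,)]

bounds = random_split_points(n_points=10, n_splits=2, rng=rng)
X = build_features(curves, scalars, bounds)

bg_cols = candidate_covariates(len(X[0]), None, rng)  # MD-BG-like: keep all
rf_cols = candidate_covariates(len(X[0]), 2, rng)     # MD-RF-like: sample 2

print("interval boundaries:", bounds)
print("first feature row:", X[0])
print("bagging candidates:", bg_cols)
print("random-forest candidates:", rf_cols)
```

In a full ensemble one would presumably redraw the cut points for each tree, so that importance scores attached to the interval summaries point to the regions of the curve that matter most. That reading of "the most important parts of functional covariates" is an interpretation of the abstract, not a detail it confirms.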