Generalized Estimating Equations Boosting (GEEB) machine for correlated data

IF 8.6 2区 计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS
{"title":"Generalized Estimating Equations Boosting (GEEB) machine for correlated data","authors":"","doi":"10.1186/s40537-023-00875-5","DOIUrl":null,"url":null,"abstract":"<h3>Abstract</h3> <p>Rapid development in data science enables machine learning and artificial intelligence to be the most popular research tools across various disciplines. While numerous articles have shown decent predictive ability, little research has examined the impact of complex correlated data. We aim to develop a more accurate model under repeated measures or hierarchical data structures. Therefore, this study proposes a novel algorithm, the Generalized Estimating Equations Boosting (GEEB) machine, to integrate the gradient boosting technique into the benchmark statistical approach that deals with the correlated data, the generalized Estimating Equations (GEE). Unlike the previous gradient boosting utilizing all input features, we randomly select some input features when building the model to reduce predictive errors. The simulation study evaluates the predictive performance of the GEEB, GEE, eXtreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM) across several hierarchical structures with different sample sizes. Results suggest that the new strategy GEEB outperforms the GEE and demonstrates superior predictive accuracy than the SVM and XGBoost in most situations. An application to a real-world dataset, the Forest Fire Data, also revealed that the GEEB reduced mean squared errors by 4.5% to 25% compared to GEE, XGBoost, and SVM. This research also provides a freely available R function that could implement the GEEB machine effortlessly for longitudinal or hierarchical data.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"29 1 1","pages":""},"PeriodicalIF":8.6000,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Big Data","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s40537-023-00875-5","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Rapid development in data science enables machine learning and artificial intelligence to be the most popular research tools across various disciplines. While numerous articles have shown decent predictive ability, little research has examined the impact of complex correlated data. We aim to develop a more accurate model under repeated measures or hierarchical data structures. Therefore, this study proposes a novel algorithm, the Generalized Estimating Equations Boosting (GEEB) machine, to integrate the gradient boosting technique into the benchmark statistical approach that deals with the correlated data, the generalized Estimating Equations (GEE). Unlike the previous gradient boosting utilizing all input features, we randomly select some input features when building the model to reduce predictive errors. The simulation study evaluates the predictive performance of the GEEB, GEE, eXtreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM) across several hierarchical structures with different sample sizes. Results suggest that the new strategy GEEB outperforms the GEE and demonstrates superior predictive accuracy than the SVM and XGBoost in most situations. An application to a real-world dataset, the Forest Fire Data, also revealed that the GEEB reduced mean squared errors by 4.5% to 25% compared to GEE, XGBoost, and SVM. This research also provides a freely available R function that could implement the GEEB machine effortlessly for longitudinal or hierarchical data.

用于相关数据的广义估计方程提升(GEEB)机器
摘要 数据科学的快速发展使机器学习和人工智能成为各学科最流行的研究工具。虽然有许多文章显示了良好的预测能力,但很少有研究探讨复杂相关数据的影响。我们的目标是在重复测量或分层数据结构下开发出更精确的模型。因此,本研究提出了一种新算法--广义估计方程提升(GEEB)机,将梯度提升技术整合到处理相关数据的基准统计方法--广义估计方程(GEE)中。与以往利用所有输入特征的梯度提升不同,我们在建立模型时随机选择了一些输入特征,以减少预测误差。模拟研究评估了 GEEB、GEE、eXtreme Gradient Boosting (XGBoost) 和支持向量机 (SVM) 在不同样本量的多个分层结构中的预测性能。结果表明,新策略 GEEB 的表现优于 GEE,并且在大多数情况下都比 SVM 和 XGBoost 表现出更高的预测准确性。在实际数据集森林火灾数据中的应用也表明,与 GEE、XGBoost 和 SVM 相比,GEEB 将均方误差减少了 4.5% 到 25%。这项研究还提供了一个免费的 R 函数,可以毫不费力地为纵向或分层数据实现 GEEB 机器。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Big Data
Journal of Big Data Computer Science-Information Systems
CiteScore
17.80
自引率
3.70%
发文量
105
审稿时长
13 weeks
期刊介绍: The Journal of Big Data publishes high-quality, scholarly research papers, methodologies, and case studies covering a broad spectrum of topics, from big data analytics to data-intensive computing and all applications of big data research. It addresses challenges facing big data today and in the future, including data capture and storage, search, sharing, analytics, technologies, visualization, architectures, data mining, machine learning, cloud computing, distributed systems, and scalable storage. The journal serves as a seminal source of innovative material for academic researchers and practitioners alike.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信