FHBF: Federated hybrid boosted forests with dropout rates for supervised learning tasks across highly imbalanced clinical datasets

IF 6.7 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Vasileios C. Pezoulas, Fanis Kalatzis, Themis P. Exarchos, Andreas Goules, Athanasios G. Tzioufas, Dimitrios I. Fotiadis
DOI: 10.1016/j.patter.2023.100893
Journal: Patterns
Publication date: 2024-01-12
Publication type: Journal Article
Citations: 0

Abstract


Although several studies have deployed gradient boosting trees (GBT) as a robust classifier for federated learning tasks (federated GBT [FGBT]), even with dropout rates (federated gradient boosting trees with dropout rate [FDART]), none of them have investigated the overfitting effects of FGBT across heterogeneous and highly imbalanced datasets within federated environments nor the effect of dropouts in the loss function. In this work, we present the federated hybrid boosted forests (FHBF) algorithm, which incorporates a hybrid weight update approach to overcome ill-posed problems that arise from overfitting effects during the training across highly imbalanced datasets in the cloud. Eight case studies were conducted to stress the performance of FHBF against existing algorithms toward the development of robust AI models for lymphoma development across 18 European federated databases. Our results highlight the robustness of FHBF, yielding an average loss of 0.527 compared with FGBT (0.611) and FDART (0.584) with increased classification performance (0.938 sensitivity, 0.732 specificity).
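The abstract does not give FHBF's update rules, but it builds on the DART idea of applying a dropout rate to boosting: in each round, a random subset of the already-fitted trees is temporarily dropped before the residuals are computed, and the new tree is rescaled so the ensemble's total magnitude is preserved. As background only — this is a minimal single-machine sketch with depth-1 trees on a toy 1D problem, not the authors' federated algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(x, residual):
    """Fit a depth-1 regression tree (a stump) to the current residuals."""
    best = (np.inf, 0.5, 0.0, 0.0)  # (sse, threshold, left value, right value)
    for t in np.quantile(x, np.linspace(0.1, 0.9, 9)):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, thr, lv, rv = best
    return lambda q: np.where(q <= thr, lv, rv)

def dart_boost(x, y, n_trees=50, drop_rate=0.2):
    """Gradient boosting with DART-style dropout: each round, a random
    subset of existing trees is dropped before computing residuals, and
    the new tree's weight is normalized against the dropped set."""
    ensemble = []  # list of (weight, tree) pairs
    for _ in range(n_trees):
        drop = {i for i in range(len(ensemble)) if rng.random() < drop_rate}
        # Residuals are computed against the ensemble *minus* the dropped trees.
        pred = sum(w * t(x) for i, (w, t) in enumerate(ensemble) if i not in drop)
        tree = fit_stump(x, y - pred)
        k = len(drop)
        # DART normalization: shrink dropped trees by k/(k+1) and weight the
        # new tree by 1/(k+1) so the ensemble's total magnitude is preserved.
        for i in drop:
            w, t = ensemble[i]
            ensemble[i] = (w * k / (k + 1), t)
        ensemble.append((1.0 / (k + 1), tree))
    return lambda q: sum(w * t(q) for w, t in ensemble)

# Toy 1D problem: recover a step function despite per-round tree dropout.
x = np.linspace(0.0, 1.0, 200)
y = (x > 0.5).astype(float)
model = dart_boost(x, y)
mse = float(np.mean((model(x) - y) ** 2))
```

The sketch only illustrates the dropout-and-rescale mechanic; FDART and FHBF apply it within a federated setting, where trees are trained across distributed clinical silos and (per the abstract) FHBF additionally hybridizes the weight update and accounts for dropouts in the loss to curb overfitting on highly imbalanced data.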

Source journal: Patterns (Decision Sciences, all)
CiteScore: 10.60
Self-citation rate: 4.60%
Articles per year: 153
Review time: 19 weeks