High-dimensional Iterative Causal Forest (hdiCF) for Subgroup Identification Using Health Care Claims Data.

Impact factor 5.0 | Zone 2, Medicine | JCR Q1, Public, Environmental & Occupational Health
Tiansheng Wang, Virginia Pate, Richard Wyss, John B Buse, Michael R Kosorok, Til Stürmer
Journal: American Journal of Epidemiology. Published: 2025-06-11. Publication type: Journal Article. DOI: 10.1093/aje/kwaf127 (https://doi.org/10.1093/aje/kwaf127)
Citations: 0

Abstract

We compared a novel high-dimensional covariate specification (1 ordinal variable per code with up to 4 levels: zero occurrences, once, sporadic, or frequent) with the standard high-dimensional propensity score (hdPS) specification (up to 3 binary variables per code) for detecting heterogeneous treatment effects (HTE). Using the iterative causal forest (iCF) subgrouping algorithm, we analyzed a new-user cohort of 8,075 initiators of sodium-glucose cotransporter-2 inhibitors and 7,313 initiators of glucagon-like peptide-1 receptor agonists, drawn from a 20% random Medicare sample (2015-2019), with ≥1 year of Parts A/B/D enrollment and without severe renal disease. We extracted the top 200 prevalent codes across diagnoses, procedures, and prescriptions during the 1-year baseline period. Subgroup-specific conditional average treatment effects (CATEs) were estimated as 2-year adjusted risk differences (aRDs) for hospitalized heart failure using inverse probability of treatment weighting. The overall population had an aRD of -0.4% (95% CI: -1.1%, 0.2%). With our high-dimensional specification, iCF identified patients with ≥2 loop diuretic prescriptions (aRD: -2.6%, 95% CI: -5.0%, -0.2%) as the subgroup with the largest CATE. In contrast, with the hdPS specification, it identified patients with chronic kidney disease (aRD: -1.7%, 95% CI: -3.6%, 0.2%). Across various sensitivity analyses, our high-dimensional approach more accurately identified the expected subgroups with HTE, in line with prior clinical evidence.
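
The contrast between the two covariate specifications can be illustrated with a minimal Python sketch. It is not the authors' implementation. The hdPS side uses the conventional three indicators per code (occurred at least once, at or above the median count, at or above the 75th percentile count, with cutoffs taken among patients who have the code). The abstract does not state the cutoffs behind the ordinal variable, so the sketch assumes the same cutoffs are reused and that the ordinal level is simply the number of indicators that are switched on. All function names and the nearest-rank percentile rule are hypothetical choices made for illustration.

```python
def nearest_rank_percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted list (pct is an integer, 1 to 100)."""
    rank = (len(sorted_values) * pct + 99) // 100  # integer ceiling of n * pct / 100
    return sorted_values[rank - 1]


def code_cutoffs(counts):
    """Median and 75th percentile of one code's counts among patients who have the code."""
    positive = sorted(c for c in counts if c > 0)
    if not positive:
        raise ValueError("code never occurs in the cohort")
    return nearest_rank_percentile(positive, 50), nearest_rank_percentile(positive, 75)


def hdps_binary(count, median_cut, p75_cut):
    """hdPS-style indicators: (>= once, >= median 'sporadic', >= 75th percentile 'frequent')."""
    return (int(count >= 1), int(count >= median_cut), int(count >= p75_cut))


def ordinal_level(count, median_cut, p75_cut):
    """One ordinal covariate per code: 0 = zero, 1 = once, 2 = sporadic, 3 = frequent.

    ASSUMPTION (not stated in the abstract): the levels reuse the hdPS cutoffs,
    so the ordinal value equals the number of hdPS indicators that are 1.
    When cutoffs coincide, fewer than four distinct levels appear.
    """
    return sum(hdps_binary(count, median_cut, p75_cut))


if __name__ == "__main__":
    # Hypothetical baseline counts of one code (e.g., a prescription) for 8 patients.
    counts = [0, 0, 1, 1, 2, 3, 5, 8]
    median_cut, p75_cut = code_cutoffs(counts)
    for c in counts:
        print(c, hdps_binary(c, median_cut, p75_cut), ordinal_level(c, median_cut, p75_cut))
```

Under these assumptions, each code contributes one column instead of up to three. A tree-based learner such as a causal forest can then split a single variable at any threshold (for example, "≥2 prescriptions") rather than choosing among three correlated binary columns.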

Source journal: American Journal of Epidemiology (Medicine: Public, Environmental & Occupational Health)
CiteScore: 7.40
Self-citation rate: 4.00%
Articles published: 221
Review time: 3-6 weeks
About the journal: The American Journal of Epidemiology is the oldest and one of the premier epidemiologic journals devoted to the publication of empirical research findings, opinion pieces, and methodological developments in the field of epidemiologic research. It is a peer-reviewed journal aimed at both fellow epidemiologists and those who use epidemiologic data, including public health workers and clinicians.