Use of Machine Learning to Compare Disease Risk Scores and Propensity Scores Across Complex Confounding Scenarios: A Simulation Study.

Impact factor 2.4; Zone 4 (Medicine); JCR Q3, Pharmacology & Pharmacy
Yuchen Guo, Victoria Y Strauss, Sara Khalid, Daniel Prieto-Alhambra
Pharmacoepidemiology and Drug Safety, volume 34, issue 6, e70165. Journal Article. Published 2025-06-01. DOI: 10.1002/pds.70165 (https://doi.org/10.1002/pds.70165). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12130674/pdf/

Abstract

Purpose: The surge of COVID-19 treatments in the second quarter of 2020 was characterized by low treatment prevalence and high outcome risk. Motivated by this setting, we conducted a simulation study comparing disease risk scores (DRS) and propensity scores (PS) across scenarios with different treatment prevalences and outcome risks.

Method: Four methods were used to estimate PS and DRS: logistic regression (the reference method), least absolute shrinkage and selection operator (LASSO), multilayer perceptron (MLP), and XGBoost. Monte Carlo simulations generated data across 25 scenarios varying in treatment prevalence, outcome risk, data complexity, and sample size. Average treatment effects were estimated after matching. Relative bias and average absolute standardized mean difference (ASMD) were reported.
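To make the workflow in the Method paragraph concrete, the following is a minimal sketch of a single simulated scenario, not the authors' implementation. Everything specific in it is an assumption made for illustration: the data-generating coefficients, the plain gradient-descent logistic regression (standing in for the reference method only), fitting the DRS on untreated subjects, greedy 1:1 nearest-neighbour matching without a caliper, and a risk difference among the treated as the effect measure. The function names (`simulate`, `fit_logistic`, `match_pairs`, `asmd`, `run_scenario`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, rng, treat_intercept=-3.0, outcome_intercept=-1.0, effect=0.7):
    """Hypothetical linear scenario: two confounders, binary treatment and outcome."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        t = int(rng.random() < sigmoid(treat_intercept + 0.8 * x[0] + 0.5 * x[1]))
        base = outcome_intercept + 0.6 * x[0] + 0.6 * x[1]
        p0, p1 = sigmoid(base), sigmoid(base + effect)  # potential-outcome risks
        y = int(rng.random() < (p1 if t else p0))
        data.append({"x": x, "t": t, "y": y, "effect": p1 - p0})
    return data


def fit_logistic(X, y, lr=1.0, epochs=200):
    """Gradient-descent logistic regression; returns [intercept, coef_1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, x):
    return sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x)))


def match_pairs(data, score):
    """Greedy 1:1 nearest-neighbour matching on a score, without replacement.
    Assumes there are fewer treated than untreated subjects."""
    controls = [i for i, d in enumerate(data) if d["t"] == 0]
    used, pairs = set(), []
    for i, d in enumerate(data):
        if d["t"] == 1:
            j = min((c for c in controls if c not in used),
                    key=lambda c: abs(score[c] - score[i]))
            used.add(j)
            pairs.append((i, j))
    return pairs


def asmd(data, group_a, group_b):
    """Average absolute standardized mean difference over all covariates."""
    k = len(data[0]["x"])
    total = 0.0
    for j in range(k):
        a = [data[i]["x"][j] for i in group_a]
        b = [data[i]["x"][j] for i in group_b]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
        vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
        total += abs(ma - mb) / math.sqrt((va + vb) / 2)
    return total / k


def run_scenario(n=2000, seed=1):
    rng = random.Random(seed)
    data = simulate(n, rng)
    X = [d["x"] for d in data]
    treated = [i for i, d in enumerate(data) if d["t"] == 1]
    untreated = [i for i, d in enumerate(data) if d["t"] == 0]
    # PS: P(treatment | covariates), fitted on all subjects.
    w_ps = fit_logistic(X, [d["t"] for d in data])
    # DRS: P(outcome | covariates), fitted on untreated subjects only (assumption).
    w_drs = fit_logistic([X[i] for i in untreated], [data[i]["y"] for i in untreated])
    # True effect among the treated, known here because the data are simulated.
    true_att = sum(data[i]["effect"] for i in treated) / len(treated)
    out = {"asmd_before": asmd(data, treated, untreated)}
    for name, w in (("ps", w_ps), ("drs", w_drs)):
        score = [predict(w, x) for x in X]
        pairs = match_pairs(data, score)
        est = sum(data[i]["y"] - data[j]["y"] for i, j in pairs) / len(pairs)
        out[name] = {
            "relative_bias": (est - true_att) / true_att,
            "asmd": asmd(data, [i for i, _ in pairs], [j for _, j in pairs]),
        }
    return out


if __name__ == "__main__":
    print(run_scenario())
```

A full Monte Carlo study would repeat `run_scenario` over many seeds and scenario settings and average the relative bias. A single run like this one mainly shows how the two scores enter the same matching and evaluation steps.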

Result: Estimation bias increased as treatment prevalence decreased. DRS showed lower bias than PS when treatment prevalence was below 0.1, especially in nonlinear data. However, DRS did not outperform PS in linear or small-sample data. PS had comparable or lower bias than DRS when treatment prevalence was between 0.1 and 0.5. The three machine learning (ML) methods performed similarly, with LASSO and XGBoost outperforming the reference method in some nonlinear scenarios. ASMD results indicated that DRS was less affected than PS by decreasing treatment prevalence.

Conclusion: With nonlinear data, DRS reduced bias compared with PS in scenarios with low treatment prevalence, whereas PS was preferable when treatment prevalence was greater than 0.1, regardless of outcome risk. ML methods can outperform logistic regression for PS and DRS estimation. Both decreasing the sample size and adding nonlinearity and nonadditivity to the data increased bias for all methods tested.

Source journal metrics: CiteScore 4.80; self-citation rate 7.70%; articles published: 173; review time: 3 months.
Journal description: The aim of Pharmacoepidemiology and Drug Safety is to provide an international forum for the communication and evaluation of data, methods and opinion in the discipline of pharmacoepidemiology. The Journal publishes peer-reviewed reports of original research, invited reviews and a variety of guest editorials and commentaries embracing scientific, medical, statistical, legal and economic aspects of pharmacoepidemiology and post-marketing surveillance of drug safety. Appropriate material in these categories may also be considered for publication as a Brief Report. Particular areas of interest include: design, analysis, results, and interpretation of studies looking at the benefit or safety of specific pharmaceuticals, biologics, or medical devices, including studies in pharmacovigilance, postmarketing surveillance, pharmacoeconomics, patient safety, molecular pharmacoepidemiology, or any other study within the broad field of pharmacoepidemiology; comparative effectiveness research relating to pharmaceuticals, biologics, and medical devices. Comparative effectiveness research is the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition, as these methods are truly used in the real world; methodologic contributions of relevance to pharmacoepidemiology, whether original contributions, reviews of existing methods, or tutorials for how to apply the methods of pharmacoepidemiology; assessments of harm versus benefit in drug therapy; patterns of drug utilization; relationships between pharmacoepidemiology and the formulation and interpretation of regulatory guidelines; evaluations of risk management plans and programmes relating to pharmaceuticals, biologics and medical devices.