REFINE2: A simplified simulation tool to help epidemiologists evaluate the suitability and sensitivity of effect estimation within user-specified data.
Authors: Xiang Meng, Jonathan Y Huang
American Journal of Epidemiology, published 2025-09-03. DOI: 10.1093/aje/kwaf195 (https://doi.org/10.1093/aje/kwaf195)
Epidemiologists have access to various methods to reduce bias and improve statistical efficiency in effect estimation, from standard multivariable regression to state-of-the-art doubly robust efficient estimators paired with highly flexible, data-adaptive algorithms ("machine learning"). However, due to numerous assumptions and trade-offs, epidemiologists face practical difficulties in recognizing which method, if any, may be suitable for their specific data and hypotheses. Importantly, relative advantages are necessarily context-specific (data structure, algorithms, model misspecification), limiting the utility of universal guidance. Evaluating performance through real-data-based simulations is useful but out of reach for many epidemiologists. We present a user-friendly, offline Shiny app, REFINE2 (Realistic Evaluations of Finite sample INference using Efficient Estimators), that enables analysts to input their own data and quickly compare the performance of different algorithms within their data context in estimating a prespecified average treatment effect (ATE). REFINE2 automates plasmode simulation of a plausible target ATE given observed covariates and then examines bias and confidence interval coverage (relative to this target) given user-specified models. We present an extensive case study to illustrate how REFINE2 can be used to guide analyses within epidemiologists' own data under three typical scenarios: residual confounding; spurious covariates; and misspecified effect modification. As expected, the apparent best method differed across scenarios and was suboptimal under residual confounding. REFINE2 may help epidemiologists not only choose among imperfect models, but also better understand common underappreciated problems, such as finite sample bias when using machine learning.
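The plasmode idea described in the abstract can be illustrated with a minimal sketch. This is not REFINE2's implementation (REFINE2 is a Shiny app, and its internals are not described here); it is an illustrative Python example under several assumptions: a linear outcome-generating model, ordinary least squares as the only estimator, Wald-type 95% intervals, and a synthetic data set standing in for the analyst's own data. The function names `ols` and `plasmode_eval` are hypothetical. The sketch keeps the observed covariates and treatment, imposes a known target ATE on simulated outcomes, and then reports bias and confidence interval coverage relative to that target.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and model-based standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


def plasmode_eval(W, A, Y, target_ate, adjust_cols, n_sims=200, seed=0):
    """Hypothetical helper: plasmode-style evaluation of a regression estimator.

    W: (n, k) observed covariates, A: (n,) binary treatment, Y: (n,) outcome.
    adjust_cols: indices of the covariates the analyst's model adjusts for.
    """
    rng = np.random.default_rng(seed)
    n = len(Y)

    # Step 1: fit an outcome-generating model to the real data (assumed linear).
    X_full = np.column_stack([np.ones(n), A, W])
    beta, _ = ols(X_full, Y)
    resid_sd = np.std(Y - X_full @ beta, ddof=X_full.shape[1])

    # Step 2: impose a known treatment effect, which becomes the target ATE.
    beta_sim = beta.copy()
    beta_sim[1] = target_ate

    estimates, covered = [], []
    for _ in range(n_sims):
        # Step 3: resample rows so the observed covariate-treatment
        # structure is preserved, then simulate new outcomes.
        idx = rng.integers(0, n, size=n)
        y_sim = X_full[idx] @ beta_sim + rng.normal(0.0, resid_sd, size=n)

        # Step 4: apply the analyst-specified model to the simulated data.
        X_fit = np.column_stack([np.ones(n), A[idx], W[idx][:, adjust_cols]])
        b, se = ols(X_fit, y_sim)
        lower, upper = b[1] - 1.96 * se[1], b[1] + 1.96 * se[1]
        estimates.append(b[1])
        covered.append(lower <= target_ate <= upper)

    return {
        "bias": float(np.mean(estimates) - target_ate),
        "coverage": float(np.mean(covered)),
    }


# Synthetic stand-in for user-supplied data (an assumption for illustration).
rng = np.random.default_rng(42)
n = 500
W = rng.normal(size=(n, 2))
A = rng.binomial(1, 1.0 / (1.0 + np.exp(-1.5 * W[:, 0]))).astype(float)
Y = 0.5 * A + 2.0 * W[:, 0] + W[:, 1] + rng.normal(size=n)

# Model adjusting for both covariates versus one omitting the confounder W[:, 0].
full = plasmode_eval(W, A, Y, target_ate=1.0, adjust_cols=[0, 1])
omitted = plasmode_eval(W, A, Y, target_ate=1.0, adjust_cols=[1])
print("all covariates:    ", full)
print("confounder omitted:", omitted)
```

Because the target ATE is set by the simulation, bias and coverage can be computed exactly against it, which is what makes the comparison between candidate models possible. In this sketch the model that omits the confounder mimics the residual confounding scenario; a fuller version would swap in other estimators (for example, doubly robust ones) at Step 4.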
About the journal:
The American Journal of Epidemiology is the oldest and one of the premier epidemiologic journals devoted to the publication of empirical research findings, opinion pieces, and methodological developments in the field of epidemiologic research.
It is a peer-reviewed journal aimed at both fellow epidemiologists and those who use epidemiologic data, including public health workers and clinicians.