Ritoban Kundu, Xu Shi, Jean Morrison, Jessica Barrett, Bhramar Mukherjee
{"title":"A framework for understanding selection bias in real-world healthcare data.","authors":"Ritoban Kundu, Xu Shi, Jean Morrison, Jessica Barrett, Bhramar Mukherjee","doi":"10.1093/jrsssa/qnae039","DOIUrl":"10.1093/jrsssa/qnae039","url":null,"abstract":"Using administrative patient-care data such as Electronic Health Records (EHR) and medical/pharmaceutical claims for population-based scientific research has become increasingly common. With vast sample sizes leading to very small standard errors, researchers need to pay more attention to potential biases in the estimates of association parameters of interest, specifically to biases that do not diminish with increasing sample size. Of these multiple sources of biases, in this paper, we focus on understanding selection bias. We present an analytic framework using directed acyclic graphs for guiding applied researchers to dissect how different sources of selection bias may affect estimates of the association between a binary outcome and an exposure (continuous or categorical) of interest. We consider four easy-to-implement weighting approaches to reduce selection bias with accompanying variance formulae. We demonstrate through a simulation study when they can rescue us in practice with analysis of real-world data. We compare these methods using a data example where our goal is to estimate the well-known association of cancer and biological sex, using EHR from a longitudinal biorepository at the University of Michigan Healthcare system. We provide annotated R codes to implement these weighted methods with associated inference.","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"187 3","pages":"606-635"},"PeriodicalIF":1.5,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11393555/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142299713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dr Arun Chind’s contribution to the Discussion of “A system of population estimates compiled from administrative data only” by Dunne and Zhang","authors":"A. Chind","doi":"10.1093/jrsssa/qnad119","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad119","url":null,"abstract":"","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"9 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79677215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Psychometrics of Standard Setting","authors":"Andrew Mcculloch","doi":"10.1093/jrsssa/qnad108","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad108","url":null,"abstract":"","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"6 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78062263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measurement Models for Psychological Attributes","authors":"Andrew Mcculloch","doi":"10.1093/jrsssa/qnad107","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad107","url":null,"abstract":"","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"32 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75777254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Science Ethics: Concepts, Techniques and Cautionary Tales","authors":"R. Reese","doi":"10.1093/jrsssa/qnad111","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad111","url":null,"abstract":"","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"146 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83108647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Big Data and Social Science Data Science Methods and Tools for Research and Practice","authors":"V. Kalyani","doi":"10.1093/jrsssa/qnad109","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad109","url":null,"abstract":"","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"6 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82363865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sarah Henry and Katie O’Farrell’s contribution to the Discussion of 'A system of population estimates compiled from administrative data only' by John Dunne and Li-Chun Zhang","authors":"Sarah Henry, K. O’Farrell","doi":"10.1093/jrsssa/qnad095","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad095","url":null,"abstract":"","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"14 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90714864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biostatistics Decoded","authors":"Mukesh Srivastava","doi":"10.1093/jrsssa/qnad093","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad093","url":null,"abstract":"Description: Study design and statistical methodology are two important concerns for the clinical researcher. This book sets out to address both issues in a clear and concise manner. The presentation of statistical theory starts from basic concepts, such as the properties of means and variances, the properties of the Normal distribution and the Central Limit Theorem, and leads to more advanced topics such as maximum likelihood estimation, inverse variance and stepwise regression, as well as time-to-event and event-count methods. Furthermore, this book explores sampling methods, study design and statistical methods and is organized according to the areas of application of each of the statistical methods and the corresponding study designs. Illustrations, working examples, computer simulations and geometrical approaches, rather than mathematical expressions and formulae, are used throughout the book to explain every statistical method. Biostatisticians and researchers in the medical and pharmaceutical industry who need guidance on the design and analysis of medical research will find this book useful, as will graduate students of statistics and mathematics with an interest in biostatistics. Biostatistics Decoded: - Provides clear explanations of key statistical concepts with a firm emphasis on practical aspects of design and analysis of medical research. - Features worked examples to illustrate each statistical method using computer simulations and geometrical approaches, rather than mathematical expressions and formulae. - Explores the main types of clinical research studies, such as descriptive, analytical and experimental studies. - Addresses advanced modeling techniques such as interaction analysis and encoding by reference and polynomial regression.","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"66 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74724939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneity in the US gender wage gap","authors":"Philipp Bach, V. Chernozhukov, M. Spindler","doi":"10.1093/jrsssa/qnad091","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad091","url":null,"abstract":"As a measure of gender inequality, the gender wage gap has come to play an important role both in academic research and the public debate. In 2016, the majority of full-time employed women in the United States earned significantly less than comparable men. The extent to which women were affected by gender inequality in earnings, however, depended greatly on socio-economic characteristics, such as marital status or educational attainment. In this paper, we analyse data from the 2016 American Community Survey using a high-dimensional wage regression and applying double lasso to quantify heterogeneity in the gender wage gap. We find that the wage gap varied substantially across women and that the magnitude of the gap varied primarily by marital status, having children at home, race, occupation, industry, and educational attainment. These insights are helpful in designing policies that can reduce discrimination and unequal pay more effectively.","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"8 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89258581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A celebration of 50 years of the Cox model in memory of Sir David Cox","authors":"A. .. Lawrance","doi":"10.1093/jrsssa/qnad087","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad087","url":null,"abstract":"","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"53 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75895971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}