Eric A Bai, Botao Ju, Madeleine Beckner, Jerome P Reiter, M Giovanna Merli, Ted Mouw
{"title":"Studying Chinese immigrants' spatial distribution in the Raleigh-Durham area by linking survey and commercial data using romanized names.","authors":"Eric A Bai, Botao Ju, Madeleine Beckner, Jerome P Reiter, M Giovanna Merli, Ted Mouw","doi":"10.1093/jrsssa/qnae107","DOIUrl":"10.1093/jrsssa/qnae107","url":null,"abstract":"<p><p>Many population surveys do not provide information on respondents' residential addresses, instead offering coarse geographies like zip code or higher aggregations. However, fine resolution geography can be beneficial for characterizing neighbourhoods, especially for relatively rare populations such as immigrants. One way to obtain such information is to link survey records to records in auxiliary databases that include residential addresses by matching on variables common to both files. We present an approach based on probabilistic record linkage that enables matching survey participants in the Chinese Immigrants in Raleigh-Durham Study to records from InfoUSA, an information provider of residential records. The two files use different Chinese name romanization practices, which we address through a novel and generalizable strategy for constructing records' pairwise comparison vectors for romanized names. Using a fully Bayesian record linkage model, we characterize the geospatial distribution of Chinese immigrants in the Raleigh-Durham area of North Carolina.</p>","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"188 1","pages":"84-97"},"PeriodicalIF":1.5,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11728054/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142985303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparison of some existing and novel methods for integrating historical models to improve estimation of coefficients in logistic regression.","authors":"Philip S Boonstra, Pedro Orozco Del Pino","doi":"10.1093/jrsssa/qnae093","DOIUrl":"10.1093/jrsssa/qnae093","url":null,"abstract":"<p><p>Model integration refers to the process of incorporating a fitted historical model into the estimation of a current study to increase statistical efficiency. Integration can be challenging when the current model includes new covariates, leading to potential model misspecification. We present and evaluate seven existing and novel model integration techniques, which employ both likelihood constraints and Bayesian informative priors. Using a simulation study of logistic regression, we quantify how efficiency-assessed by bias and variance-changes with the sample sizes of both historical and current studies and in response to violations to transportability assumptions. We also apply these methods to a case study in which the goal is to use novel predictors to update a risk prediction model for in-hospital mortality among pediatric extracorporeal membrane oxygenation patients. Our simulation study and case study suggest that (i) when historical sample size is small, accounting for this statistical uncertainty is more efficient; (ii) all methods lose efficiency when there exist differences between the historical and current data-generating mechanisms; (iii) additional shrinkage to zero can improve efficiency in higher-dimensional settings but at the cost of bias in estimation.</p>","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"188 1","pages":"46-67"},"PeriodicalIF":1.5,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11728056/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142985253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Paul N Zivich, Jessie K Edwards, Bonnie E Shook-Sa, Eric T Lofgren, Justin Lessler, Stephen R Cole
{"title":"Synthesis estimators for transportability with positivity violations by a continuous covariate.","authors":"Paul N Zivich, Jessie K Edwards, Bonnie E Shook-Sa, Eric T Lofgren, Justin Lessler, Stephen R Cole","doi":"10.1093/jrsssa/qnae084","DOIUrl":"10.1093/jrsssa/qnae084","url":null,"abstract":"<p><p>Studies intended to estimate the effect of a treatment, like randomized trials, may not be sampled from the desired target population. To correct for this discrepancy, estimates can be transported to the target population. Methods for transporting between populations are often premised on a positivity assumption, such that all relevant covariate patterns in one population are also present in the other. However, eligibility criteria, particularly in the case of trials, can result in violations of positivity when transporting to external populations. To address nonpositivity, a synthesis of statistical and mathematical models can be considered. This approach integrates multiple data sources (e.g. trials, observational, pharmacokinetic studies) to estimate treatment effects, leveraging mathematical models to handle positivity violations. This approach was previously demonstrated for positivity violations by a single binary covariate. Here, we extend the synthesis approach for positivity violations with a continuous covariate. For estimation, two novel augmented inverse probability weighting estimators are proposed. Both estimators are contrasted with other common approaches for addressing nonpositivity. Empirical performance is compared via Monte Carlo simulation. Finally, the competing approaches are illustrated with an example in the context of two-drug vs. one-drug antiretroviral therapy on CD4 T cell counts among women with HIV.</p>","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"188 1","pages":"158-180"},"PeriodicalIF":1.5,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11728055/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142985305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lingxiao Wang, Yan Li, Barry I Graubard, Hormuzd A Katki
{"title":"Data-integration with pseudoweights and survey-calibration: application to developing US-representative lung cancer risk models for use in screening.","authors":"Lingxiao Wang, Yan Li, Barry I Graubard, Hormuzd A Katki","doi":"10.1093/jrsssa/qnae059","DOIUrl":"10.1093/jrsssa/qnae059","url":null,"abstract":"<p><p>Accurate cancer risk estimation is crucial to clinical decision-making, such as identifying high-risk people for screening. However, most existing cancer risk models incorporate data from epidemiologic studies, which usually cannot represent the target population. While population-based health surveys are ideal for making inference to the target population, they typically do not collect time-to-cancer incidence data. Instead, time-to-cancer specific mortality is often readily available on surveys via linkage to vital statistics. We develop calibrated pseudoweighting methods that integrate individual-level data from a cohort and a survey, and summary statistics of cancer incidence from national cancer registries. By leveraging individual-level cancer mortality data in the survey, the proposed methods impute time-to-cancer incidence for survey sample individuals and use survey calibration with auxiliary variables of influence functions generated from Cox regression to improve robustness and efficiency of the inverse-propensity pseudoweighting method in estimating pure risks. We develop a lung cancer incidence pure risk model from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial using our proposed methods by integrating data from the National Health Interview Survey and cancer registries.</p>","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"188 1","pages":"119-139"},"PeriodicalIF":1.5,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11728053/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142985289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ritoban Kundu, Xu Shi, Jean Morrison, Jessica Barrett, Bhramar Mukherjee
{"title":"A framework for understanding selection bias in real-world healthcare data.","authors":"Ritoban Kundu, Xu Shi, Jean Morrison, Jessica Barrett, Bhramar Mukherjee","doi":"10.1093/jrsssa/qnae039","DOIUrl":"10.1093/jrsssa/qnae039","url":null,"abstract":"<p><p>Using administrative patient-care data such as Electronic Health Records (EHR) and medical/pharmaceutical claims for population-based scientific research has become increasingly common. With vast sample sizes leading to very small standard errors, researchers need to pay more attention to potential biases in the estimates of association parameters of interest, specifically to biases that do not diminish with increasing sample size. Of these multiple sources of biases, in this paper, we focus on understanding selection bias. We present an analytic framework using directed acyclic graphs for guiding applied researchers to dissect how different sources of selection bias may affect estimates of the association between a binary outcome and an exposure (continuous or categorical) of interest. We consider four easy-to-implement weighting approaches to reduce selection bias with accompanying variance formulae. We demonstrate through a simulation study when they can rescue us in practice with analysis of real-world data. We compare these methods using a data example where our goal is to estimate the well-known association of cancer and biological sex, using EHR from a longitudinal biorepository at the University of Michigan Healthcare system. We provide annotated R codes to implement these weighted methods with associated inference.</p>","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"187 3","pages":"606-635"},"PeriodicalIF":1.5,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11393555/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142299713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dr Arun Chind’s contribution to the Discussion of “A system of population estimates compiled from administrative data only” by Dunne and Zhang","authors":"A. Chind","doi":"10.1093/jrsssa/qnad119","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad119","url":null,"abstract":"","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"9 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79677215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Psychometrics of Standard Setting","authors":"Andrew Mcculloch","doi":"10.1093/jrsssa/qnad108","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad108","url":null,"abstract":"","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"6 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78062263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measurement Models for Psychological Attributes","authors":"Andrew Mcculloch","doi":"10.1093/jrsssa/qnad107","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad107","url":null,"abstract":"","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"32 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75777254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Science Ethics: Concepts, Techniques and Cautionary Tales","authors":"R. Reese","doi":"10.1093/jrsssa/qnad111","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad111","url":null,"abstract":"","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"146 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83108647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Big Data and Social Science Data Science Methods and Tools for Research and Practice","authors":"V. Kalyani","doi":"10.1093/jrsssa/qnad109","DOIUrl":"https://doi.org/10.1093/jrsssa/qnad109","url":null,"abstract":"","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"6 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2023-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82363865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}