{"title":"Improving Survey Inference Using Administrative Records Without Releasing Individual-Level Continuous Data.","authors":"Sharifa Z Williams, Jungang Zou, Yutao Liu, Yajuan Si, Sandro Galea, Qixuan Chen","doi":"10.1002/sim.10270","DOIUrl":"10.1002/sim.10270","url":null,"abstract":"<p><p>Probability surveys are challenged by increasing nonresponse rates, resulting in biased statistical inference. Auxiliary information about populations can be used to reduce bias in estimation. Continuous auxiliary variables in administrative records are often discretized before release to the public to avoid confidentiality breaches. This may weaken the utility of the administrative records in improving survey estimates, particularly when there is a strong relationship between continuous auxiliary information and the survey outcome. In this paper, we propose a two-step strategy. In the first step, statistical agencies use the confidential continuous auxiliary data in the population to estimate the response propensity score of the survey sample, which is then included in a modified population dataset for data users. In the second step, data users who do not have access to confidential continuous auxiliary data conduct predictive survey inference by including discretized continuous variables and the propensity score as predictors using splines in a Bayesian model. We show by simulation that the proposed method performs well, yielding more efficient estimates of population means with 95% credible intervals providing better coverage than alternative approaches. We illustrate the proposed method using the Ohio Army National Guard Mental Health Initiative (OHARNG-MHI). 
The methods developed in this work are readily available in the R package AuxSurvey.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5803-5813"},"PeriodicalIF":1.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639655/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142669198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Estimating Equations for Survival Data With Dependent Censoring.","authors":"Lili Yu, Liang Liu","doi":"10.1002/sim.10296","DOIUrl":"10.1002/sim.10296","url":null,"abstract":"<p><p>Independent censoring is usually assumed in survival data analysis. However, dependent censoring, where the survival time is dependent on the censoring time, is often seen in real data applications. In this project, we model the vector of survival time and censoring time marginally through semiparametric heteroscedastic accelerated failure time models and model their association by the vector of errors in the model. We show that this semiparametric model is identified, and the generalized estimating equation approach is extended to estimate the parameters in this model. It is shown that the estimators of the model parameters are consistent and asymptotically normal. Simulation studies are conducted to compare it with the estimation method under a parametric model. A real dataset from a prostate cancer study is used for illustration of the new proposed method.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5983-5995"},"PeriodicalIF":1.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142772456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regression Trees With Fused Leaves.","authors":"Xiaogang Su, Lei Liu, Lili Liu, Ruiwen Zhou, Guoqiao Wang, Elise Dusseldorp, Tianni Zhou","doi":"10.1002/sim.10272","DOIUrl":"10.1002/sim.10272","url":null,"abstract":"<p><p>We propose a novel regression tree method named \"TreeFuL,\" an abbreviation for 'Tree with Fused Leaves.' TreeFuL innovatively combines recursive partitioning with fused regularization, offering a distinct approach to the conventional pruning method. One of TreeFuL's noteworthy advantages is its capacity for cross-validated amalgamation of non-neighboring terminal nodes. This is facilitated by a leaf coloring scheme that supports tree shearing and node amalgamation. As a result, TreeFuL facilitates the development of more parsimonious tree models without compromising predictive accuracy. The refined model offers enhanced interpretability, making it particularly well-suited for biomedical applications of decision trees, such as disease diagnosis and prognosis. We demonstrate the practical advantages of our proposed method through simulation studies and an analysis of data collected in an obesity study.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5872-5884"},"PeriodicalIF":1.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11771769/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142682916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Nonparametric Model for Heterogeneous Treatment Effects With Zero-Inflated Data.","authors":"Chanmin Kim, Yisheng Li, Ting Xu, Zhongxing Liao","doi":"10.1002/sim.10266","DOIUrl":"10.1002/sim.10266","url":null,"abstract":"<p><p>One goal of precision medicine is to develop effective treatments for patients by tailoring to their individual demographic, clinical, and/or genetic characteristics. To achieve this goal, statistical models must be developed that can identify and evaluate potentially heterogeneous treatment effects in a robust manner. The oft-cited existing methods for assessing treatment effect heterogeneity are based upon parametric models with interactions or conditioning on covariate values, the performance of which is sensitive to the omission of important covariates and/or the choice of their values. We propose a new Bayesian nonparametric (BNP) method for estimating heterogeneous causal effects in studies with zero-inflated outcome data, which arise commonly in health-related studies. We employ the enriched Dirichlet process (EDP) mixture in our BNP approach, establishing a connection between an outcome DP mixture and a covariate DP mixture. This enables us to estimate posterior distributions concurrently, facilitating flexible inference regarding individual causal effects. We show in a set of simulation studies that the proposed method outperforms two other BNP methods in terms of bias and mean squared error (MSE) of the conditional average treatment effect estimates. In particular, the proposed model has the advantage of appropriately reflecting uncertainty in regions where the overlap condition is violated compared to other competing models. 
We apply the proposed method to a study of the relationship between heart radiation dose parameters and the blood level of high-sensitivity cardiac troponin T (hs-cTnT) to examine if the effect of a high mean heart radiation dose on hs-cTnT varies by baseline characteristics.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5968-5982"},"PeriodicalIF":1.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142751737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unlocking Cognitive Analysis Potential in Alzheimer's Disease Clinical Trials: Investigating Hierarchical Linear Models for Analyzing Novel Measurement Burst Design Data.","authors":"Guoqiao Wang, Jason Hassenstab, Yan Li, Andrew J Aschenbrenner, Eric M McDade, Jorge Llibre-Guerra, Randall J Bateman, Chengjie Xiong","doi":"10.1002/sim.10292","DOIUrl":"10.1002/sim.10292","url":null,"abstract":"<p><p>Measurement burst designs typically administer brief cognitive tests four times per day for 1 week, resulting in a maximum of 28 data points per week per test, every 6 months. In Alzheimer's disease clinical trials, measurement burst designs hold great promise for boosting statistical power by collecting a huge amount of data. However, appropriate methods for analyzing these complex datasets have not been well investigated. Furthermore, the large amount of burst design data also poses tremendous challenges for traditional computational procedures such as the SAS procedures Mixed or Nlmixed. We propose to analyze burst design data using novel hierarchical linear mixed effects models or hierarchical mixed models for repeated measures. Through simulations and real-world data applications using the novel SAS procedure Hpmixed, we demonstrate these hierarchical models' efficiency over traditional models. 
Our sample simulation and analysis code can serve as a catalyst to facilitate the methodology development for burst design data.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5898-5910"},"PeriodicalIF":1.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142717271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hierarchical Bayesian Model for Estimating Age-Specific COVID-19 Infection Fatality Rates in Developing Countries.","authors":"Sierra Pugh, Andrew T Levin, Gideon Meyerowitz-Katz, Satej Soman, Nana Owusu-Boaitey, Anthony B Zwi, Anup Malani, Ander Wilson, Bailey K Fosdick","doi":"10.1002/sim.10259","DOIUrl":"10.1002/sim.10259","url":null,"abstract":"<p><p>The COVID-19 infection fatality rate (IFR) is the proportion of individuals infected with SARS-CoV-2 who subsequently die. As COVID-19 disproportionately affects older individuals, age-specific IFR estimates are imperative to facilitate comparisons of the impact of COVID-19 between locations and prioritize distribution of scarce resources. However, a coherent method is lacking for synthesizing available data to create estimates of IFR and seroprevalence that vary continuously with age and adequately reflect uncertainties inherent in the underlying data. In this article, we introduce a novel Bayesian hierarchical model to estimate IFR as a continuous function of age that acknowledges heterogeneity in population age structure across locations and accounts for uncertainty in the estimates due to seroprevalence sampling variability and imperfect serology test assays. Our approach simultaneously models test assay characteristics, serology, and death data, where the serology and death data are often available only for binned age groups. Information is shared across locations through hierarchical modeling to improve estimation of the parameters with limited data. 
Modeling data from 26 developing country locations during the first year of the COVID-19 pandemic, we found seroprevalence did not change dramatically with age, and the IFR at age 60 was above the high-income country estimate for most locations.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5667-5680"},"PeriodicalIF":1.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142627999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Risk Assessment of Time-to-Event Targets With Adaptive Information Transfer.","authors":"Jie Ding, Jialiang Li, Ping Xie, Xiaoguang Wang","doi":"10.1002/sim.10290","DOIUrl":"10.1002/sim.10290","url":null,"abstract":"<p><p>Using informative sources to enhance statistical analysis in target studies has become an increasingly popular research topic. However, cohorts with time-to-event outcomes have not received sufficient attention, and external studies often encounter issues of incomparability due to population heterogeneity and unmeasured risk factors. To improve individualized risk assessments, we propose a novel methodology that adaptively borrows information from multiple incomparable sources. By extracting aggregate statistics through transitional models applied to both the external sources and the target population, we incorporate this information efficiently using the control variate technique. This approach eliminates the need to load individual-level records from sources directly, resulting in low computational complexity and strong privacy protection. Asymptotically, our estimators of both relative and baseline risks are more efficient than traditional results, and the power of covariate effects testing is much enhanced. We demonstrate the practical performance of our method via extensive simulations and a real case study.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"6026-6041"},"PeriodicalIF":1.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142772452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"<ArticleTitle xmlns:ns0=\"http://www.w3.org/1998/Math/MathML\"><ns0:math> <ns0:semantics> <ns0:mrow> <ns0:msub><ns0:mrow><ns0:mi>ℓ</ns0:mi></ns0:mrow> <ns0:mrow><ns0:mn>1</ns0:mn></ns0:mrow> </ns0:msub> </ns0:mrow> <ns0:annotation>$$ {\\ell}_1 $$</ns0:annotation></ns0:semantics> </ns0:math> -Penalized Multinomial Regression: Estimation, Inference, and Prediction, With an Application to Risk Factor Identification for Different Dementia Subtypes.","authors":"Ye Tian, Henry Rusinek, Arjun V Masurkar, Yang Feng","doi":"10.1002/sim.10263","DOIUrl":"10.1002/sim.10263","url":null,"abstract":"<p><p>High-dimensional multinomial regression models are very useful in practice but have received less research attention than logistic regression models, especially from the perspective of statistical inference. In this work, we analyze the estimation and prediction error of the contrast-based <math> <semantics> <mrow> <msub><mrow><mi>ℓ</mi></mrow> <mrow><mn>1</mn></mrow> </msub> </mrow> <annotation>$$ {\\ell}_1 $$</annotation></semantics> </math> -penalized multinomial regression model and extend the debiasing method to the multinomial case, providing a valid confidence interval for each coefficient and <math> <semantics><mrow><mi>p</mi></mrow> <annotation>$$ p $$</annotation></semantics> </math> value of the individual hypothesis test. We also examine cases of model misspecification and non-identically distributed data to demonstrate the robustness of our method when some assumptions are violated. We apply the debiasing method to identify important predictors in the progression into dementia of different subtypes. 
Results from extensive simulations show the superiority of the debiasing method compared to other inference methods.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5711-5747"},"PeriodicalIF":1.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142627865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Inference for Counting Processes Under Shape Heterogeneity.","authors":"Ying Sheng, Yifei Sun","doi":"10.1002/sim.10280","DOIUrl":"10.1002/sim.10280","url":null,"abstract":"<p><p>Proportional rate models are among the most popular methods for analyzing recurrent event data. Although providing a straightforward rate-ratio interpretation of covariate effects, the proportional rate assumption implies that covariates do not modify the shape of the rate function. When the proportionality assumption fails to hold, we propose to characterize covariate effects on the rate function through two types of parameters: the shape parameters and the size parameters. The former allows the covariates to flexibly affect the shape of the rate function, and the latter retains the interpretability of covariate effects on the magnitude of the rate function. To overcome the challenges in simultaneously estimating the two sets of parameters, we propose a conditional pseudolikelihood approach to eliminate the size parameters in shape estimation, followed by an event count projection approach for size estimation. The proposed estimators are asymptotically normal with a root- <math> <semantics><mrow><mi>n</mi></mrow> <annotation>$$ n $$</annotation></semantics> </math> convergence rate. 
Simulation studies and an analysis of recurrent hospitalizations using SEER-Medicare data are conducted to illustrate the proposed methods.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5849-5861"},"PeriodicalIF":1.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142676818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Skewness-Corrected Confidence Intervals for Predictive Values in Enrichment Studies.","authors":"Dadong Zhang, Jingye Wang, Suqin Cai, Johan Surtihadi","doi":"10.1002/sim.10283","DOIUrl":"10.1002/sim.10283","url":null,"abstract":"<p><p>The positive predictive value (PPV) and negative predictive value (NPV) can be expressed as functions of disease prevalence ( <math> <semantics><mrow><mi>ρ</mi></mrow> <annotation>$$ \\rho $$</annotation></semantics> </math> ) and the ratios of two binomial proportions ( <math> <semantics><mrow><mi>ϕ</mi></mrow> <annotation>$$ \\phi $$</annotation></semantics> </math> ), where <math> <semantics> <mrow><msub><mi>ϕ</mi> <mi>ppv</mi></msub> <mo>=</mo> <mfrac><mrow><mn>1</mn> <mo>-</mo> <mtext>specificity</mtext></mrow> <mtext>sensitivity</mtext></mfrac> </mrow> <annotation>$$ {\\phi}_{ppv}=\\frac{1- specificity}{sensitivity} $$</annotation></semantics> </math> and <math> <semantics> <mrow><msub><mi>ϕ</mi> <mi>npv</mi></msub> <mo>=</mo> <mfrac><mrow><mn>1</mn> <mo>-</mo> <mtext>sensitivity</mtext></mrow> <mtext>specificity</mtext></mfrac> </mrow> <annotation>$$ {\\phi}_{npv}=\\frac{1- sensitivity}{specificity} $$</annotation></semantics> </math> . In prospective studies, where the proportion of subjects with the disease in the study cohort is an unbiased estimate of the disease prevalence, the confidence intervals (CIs) of PPV and NPV can be estimated using established methods for a single proportion. However, in enrichment studies, such as case-control studies, where the proportion of diseased subjects significantly differs from disease prevalence, estimating CIs for PPV and NPV remains a challenge in terms of skewness and overall coverage, especially under extreme conditions (e.g., <math> <semantics><mrow><mi>NPV</mi> <mo>=</mo> <mn>1</mn></mrow> <annotation>$$ \\mathrm{NPV}=1 $$</annotation></semantics> </math> ). 
In this article, we extend the method adopted by Li, where CIs for PPV and NPV were derived from those of <math> <semantics><mrow><mi>ϕ</mi></mrow> <annotation>$$ \\phi $$</annotation></semantics> </math> . We explore additional CI methods for <math> <semantics><mrow><mi>ϕ</mi></mrow> <annotation>$$ \\phi $$</annotation></semantics> </math> , including those by Gart & Nam (GN), MoverJ, and Walter, and convert them into the corresponding CIs for PPV and NPV. Through simulations, we compare these methods with the established CI methods (Fieller, Pepe, and Delta) in terms of skewness and overall coverage. While no method proves universally optimal, the GN and MoverJ methods generally emerge as recommended choices.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5862-5871"},"PeriodicalIF":1.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142682841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}