Yi Shi, Michael T Eadon, Yao Chen, Anna Sun, Yuedi Yang, Chienwei Chiang, Macarius Donneyong, Jing Su, Pengyue Zhang
{"title":"A Precision Mixture Risk Model to Identify Adverse Drug Events in Subpopulations Using a Case-Crossover Design.","authors":"Yi Shi, Michael T Eadon, Yao Chen, Anna Sun, Yuedi Yang, Chienwei Chiang, Macarius Donneyong, Jing Su, Pengyue Zhang","doi":"10.1002/sim.10216","DOIUrl":"10.1002/sim.10216","url":null,"abstract":"<p><p>Despite the success of pharmacovigilance studies in detecting signals of adverse drug events (ADEs) from real-world data, the risks of ADEs in subpopulations warrant increased scrutiny to prevent them in vulnerable individuals. Recently, the case-crossover design has been implemented to leverage large-scale administrative claims data for ADE detection, while controlling for both observed confounding effects and short-term fixed unobserved confounding effects. Additionally, as the case-crossover design only includes cases, subpopulations can be conveniently derived. In this manuscript, we propose a precision mixture risk model (PMRM) to identify ADE signals from subpopulations under the case-crossover design. The proposed model is able to identify signals from all ADE-subpopulation-drug combinations, while controlling for false discovery rate (FDR) and confounding effects. We applied the PMRM to an administrative claims dataset. We identified ADE signals in subpopulations defined by demographic variables, comorbidities, and detailed diagnosis codes. Interestingly, certain drugs were associated with a higher risk of ADE only in subpopulations, while these drugs had a neutral association with ADE in the general population. Additionally, the PMRM could control FDR at a desired level and had a higher probability of detecting true ADE signals than the widely used McNemar's test. 
In conclusion, the PMRM is able to identify subpopulation-specific ADE signals from a tremendous number of ADE-subpopulation-drug combinations, while controlling for both FDR and confounding effects.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5088-5099"},"PeriodicalIF":1.8,"publicationDate":"2024-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142295957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent Archetypes of the Spatial Patterns of Cancer.","authors":"Thaís Pacheco Menezes, Marcos Oliveira Prates, Renato Assunção, Mônica Silva Monteiro De Castro","doi":"10.1002/sim.10232","DOIUrl":"10.1002/sim.10232","url":null,"abstract":"<p><p>The cancer atlas edited by several countries is the main resource for the analysis of the geographic variation of cancer risk. Correlating the observed spatial patterns with known or hypothesized risk factors is time-consuming work for epidemiologists who need to deal with each cancer separately, breaking down the patterns according to sex and race. The recent literature has proposed studying more than one cancer simultaneously, looking for common spatial risk factors. However, this previous work has two constraints: it considers only a very small number of cancers (2-4), and only cancers previously known to share risk factors. In this article, we propose an exploratory method to search for latent spatial risk factors of a large number of supposedly unrelated cancers. The method is based on the singular value decomposition and nonnegative matrix factorization; it is computationally efficient, scaling easily with the number of regions and cancers. We carried out a simulation study to evaluate the method's performance and applied it to cancer atlases from the USA, England, France, Australia, Spain, and Brazil. We conclude that with very few latent maps, which can represent a reduction of up to 90% of atlas maps, most of the spatial variability is conserved. 
By concentrating on the epidemiological analysis of these few latent maps a substantial amount of work is saved and, at the same time, high-level explanations affecting many cancers simultaneously can be reached.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5115-5137"},"PeriodicalIF":1.8,"publicationDate":"2024-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11583956/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142372925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Causal Inference Over a Subpopulation: The Effect of Malaria Vaccine in Women During Pregnancy.","authors":"Zonghui Hu, Dean Follmann","doi":"10.1002/sim.10228","DOIUrl":"10.1002/sim.10228","url":null,"abstract":"<p><p>Preventing malaria during pregnancy is of critical importance, yet there are no approved malaria vaccines for pregnant women due to lack of efficacy results within this population. Conducting a randomized trial in pregnant women throughout the entire duration of pregnancy is impractical. Instead, a randomized trial was conducted among women of childbearing potential (WOCBP), and some participants became pregnant during the 2-year study. We explore a statistical method for estimating vaccine effect within the target subpopulation-women who can naturally become pregnant, namely, women who can become pregnant under a placebo condition-within the causal inference framework. Two vaccine effect estimators are employed to effectively utilize baseline characteristics and account for the fact that certain baseline characteristics were only available from pregnant participants. The first estimator considers all participants but can only utilize baseline variables collected from the entire participant pool. In contrast, the second estimator, which includes only pregnant participants, utilizes all available baseline information. 
Both estimators are evaluated numerically through simulation studies and applied to the WOCBP trial to assess vaccine effect against pregnancy malaria.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5193-5202"},"PeriodicalIF":1.8,"publicationDate":"2024-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11583954/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142393471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yushuf Sharker, Zaynab Diallo, Wasiur R KhudaBukhsh, Eben Kenah
{"title":"Pairwise Accelerated Failure Time Regression Models for Infectious Disease Transmission in Close-Contact Groups With External Sources of Infection.","authors":"Yushuf Sharker, Zaynab Diallo, Wasiur R KhudaBukhsh, Eben Kenah","doi":"10.1002/sim.10226","DOIUrl":"10.1002/sim.10226","url":null,"abstract":"<p><p>Many important questions in infectious disease epidemiology involve associations between covariates (e.g., age or vaccination status) and infectiousness or susceptibility. Because disease transmission produces dependent outcomes, these questions are difficult or impossible to address using standard regression models from biostatistics. Pairwise survival analysis handles dependent outcomes by calculating likelihoods in terms of contact interval distributions in ordered pairs of individuals. The contact interval in the ordered pair $ij$ is the time from the onset of infectiousness in $i$ to infectious contact from $i$ to $j$, where an infectious contact is sufficient to infect $j$ if they are susceptible. 
Here, we introduce a pairwise accelerated failure time regression model for infectious disease transmission that allows the rate parameter of the contact interval distribution to depend on individual-level infectiousness covariates for $i$, individual-level susceptibility covariates for $j$, and pair-level covariates (e.g., type of relationship). This model can simultaneously handle internal infections (caused by transmission between individuals under observation) and external infections (caused by environmental or community sources of infection). We show that this model produces consistent and asymptotically normal parameter estimates. In a simulation study, we evaluate bias and confidence interval coverage probabilities, explore the role of epidemiologic study design, and investigate the effects of model misspecification. We use this regression model to analyze household data from Los Angeles County during the 2009 influenza A (H1N1) pandemic, where we find that the ability to account for external sources of infection increases the statistical power to estimate the effect of antiviral prophylaxis.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5138-5154"},"PeriodicalIF":1.8,"publicationDate":"2024-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11583957/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142372926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jingwei Lu, Grace Y Yi, Denis Rustand, Patrick Parfrey, Laurent Briollais, Yun-Hee Choi
{"title":"Trivariate Joint Modeling for Family Data with Longitudinal Counts, Recurrent Events and a Terminal Event with Application to Lynch Syndrome.","authors":"Jingwei Lu, Grace Y Yi, Denis Rustand, Patrick Parfrey, Laurent Briollais, Yun-Hee Choi","doi":"10.1002/sim.10210","DOIUrl":"10.1002/sim.10210","url":null,"abstract":"<p><p>Trivariate joint modeling for longitudinal count data, recurrent events, and a terminal event for family data has attracted increasing interest in medical studies. For example, families with Lynch syndrome (LS) are at high risk of developing colorectal cancer (CRC), where the number of polyps and the frequency of colonoscopy screening visits are highly associated with the risk of CRC among individuals and families. To assess how screening visits influence polyp detection, which in turn influences time to CRC, we propose a clustered trivariate joint model. The proposed model accommodates longitudinal count data that are zero-inflated and over-dispersed and invokes individual-specific and family-specific random effects to account for dependence among individuals and families. We formulate our proposed model as a latent Gaussian model to use the Bayesian estimation approach with the integrated nested Laplace approximation algorithm and evaluate its performance using simulation studies. Our trivariate joint model is applied to a series of 18 families from Newfoundland, with the occurrence of CRC taken as the terminal event, the colonoscopy screening visits as recurrent events, and the number of polyps detected at each visit as zero-inflated count data with overdispersion. We showed that our trivariate model fits better than alternative bivariate models and that the cluster effects should not be ignored when analyzing family data. 
Finally, the proposed model enables us to quantify heterogeneity across families and individuals in polyp detection and CRC risk, thus helping to identify individuals and families who would benefit from more intensive screening visits.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"5000-5022"},"PeriodicalIF":1.8,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142295958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corentin Ségalas, Catherine Helmer, Robin Genuer, Cécile Proust-Lima
{"title":"Functional Principal Component Analysis as an Alternative to Mixed-Effect Models for Describing Sparse Repeated Measures in Presence of Missing Data.","authors":"Corentin Ségalas, Catherine Helmer, Robin Genuer, Cécile Proust-Lima","doi":"10.1002/sim.10214","DOIUrl":"10.1002/sim.10214","url":null,"abstract":"<p><p>Analyzing longitudinal data in health studies is challenging due to sparse and error-prone measurements, strong within-individual correlation, missing data and various trajectory shapes. While mixed-effect models (MM) effectively address these challenges, they remain parametric models and may incur computational costs. In contrast, functional principal component analysis (FPCA) is a non-parametric approach developed for regular and dense functional data that flexibly describes temporal trajectories at a potentially lower computational cost. This article presents an empirical simulation study evaluating the behavior of FPCA with sparse and error-prone repeated measures and its robustness under different missing data schemes in comparison with MM. The results show that FPCA is well-suited in the presence of missing-at-random data caused by dropout, except in scenarios involving the most frequent and systematic dropout. Like MM, FPCA fails under a missing-not-at-random mechanism. The FPCA was applied to describe the trajectories of four cognitive functions before clinical dementia and contrast them with those of matched controls in a case-control study nested in a population-based aging cohort. 
The average cognitive declines of future dementia cases showed a sudden divergence from those of their matched controls with a sharp acceleration 5 to 2.5 years prior to diagnosis.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"4899-4912"},"PeriodicalIF":1.8,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142154969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selection of number of clusters and warping penalty in clustering functional electrocardiogram.","authors":"Wei Yang, Harold I Feldman, Wensheng Guo","doi":"10.1002/sim.10192","DOIUrl":"10.1002/sim.10192","url":null,"abstract":"<p><p>Clustering functional data aims to identify unique functional patterns in the entire domain, but this can be challenging due to phase variability that distorts the observed patterns. Curve registration can be used to remove this variability, but determining the appropriate level of warping flexibility can be complicated. Curve registration also requires a target to which a functional object is aligned, typically the cross-sectional mean of functional objects within the same cluster. However, this mean is unknown prior to clustering. Furthermore, there is a trade-off between flexible warping and the number of resulting clusters. Removing more phase variability through curve registration can lead to fewer remaining variations in the functional data, resulting in a smaller number of clusters. Thus, the optimal number of clusters and warping flexibility cannot be uniquely identified. We propose to use external information to solve the identification issue. We define a cross validated Kullback-Leibler information criterion to select the number of clusters and the warping penalty. The criterion is derived from the predictive classification likelihood considering the joint distribution of both the functional data and external variable and penalizes the uncertainty in the cluster membership. We evaluate our method through simulation and apply it to electrocardiographic data collected in the Chronic Renal Insufficiency Cohort study. 
We identify two distinct clusters of electrocardiogram (ECG) profiles, with the second cluster exhibiting ST segment depression, an indication of cardiac ischemia, compared to the normal ECG profiles in the first cluster.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"4913-4927"},"PeriodicalIF":1.8,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11499710/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142154970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jonathan C Moyer, Fan Li, Andrea J Cook, Patrick J Heagerty, Sherri L Pals, Elizabeth L Turner, Rui Wang, Yunji Zhou, Qilu Yu, Xueqi Wang, David M Murray
{"title":"Evaluating analytic models for individually randomized group treatment trials with complex clustering in nested and crossed designs.","authors":"Jonathan C Moyer, Fan Li, Andrea J Cook, Patrick J Heagerty, Sherri L Pals, Elizabeth L Turner, Rui Wang, Yunji Zhou, Qilu Yu, Xueqi Wang, David M Murray","doi":"10.1002/sim.10206","DOIUrl":"10.1002/sim.10206","url":null,"abstract":"<p><p>Many individually randomized group treatment (IRGT) trials randomly assign individuals to study arms but deliver treatments via shared agents, such as therapists, surgeons, or trainers. Post-randomization interactions induce correlations in outcome measures between participants sharing the same agent. Agents can be nested in or crossed with trial arm, and participants may interact with a single agent or with multiple agents. These complications have led to ambiguity in choice of models but there have been no systematic efforts to identify appropriate analytic models for these study designs. To address this gap, we undertook a simulation study to examine the performance of candidate analytic models in the presence of complex clustering arising from multiple membership, single membership, and single agent settings, in both nested and crossed designs and for a continuous outcome. With nested designs, substantial type I error rate inflation was observed when analytic models did not account for multiple membership and when analytic model weights characterizing the association with multiple agents did not match the data generating mechanism. 
Conversely, analytic models for crossed designs generally maintained nominal type I error rates unless there was notable imbalance in the number of participants that interact with each agent.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"4796-4818"},"PeriodicalIF":1.8,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142120561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Dimensional Overdispersed Generalized Factor Model With Application to Single-Cell Sequencing Data Analysis.","authors":"Jinyu Nie, Zhilong Qin, Wei Liu","doi":"10.1002/sim.10213","DOIUrl":"10.1002/sim.10213","url":null,"abstract":"<p><p>The current high-dimensional linear factor models fail to account for the different types of variables, while high-dimensional nonlinear factor models often overlook the overdispersion present in mixed-type data. However, overdispersion is prevalent in practical applications, particularly in fields like biomedical and genomics studies. To address this practical demand, we propose an overdispersed generalized factor model (OverGFM) for performing high-dimensional nonlinear factor analysis on overdispersed mixed-type data. Our approach incorporates an additional error term to capture the overdispersion that cannot be accounted for by factors alone. However, this introduces significant computational challenges due to the involvement of two high-dimensional latent random matrices in the nonlinear model. To overcome these challenges, we propose a novel variational EM algorithm that integrates Laplace and Taylor approximations. This algorithm provides iterative explicit solutions for the complex variational parameters and is proven to possess excellent convergence properties. We also develop a criterion based on the singular value ratio to determine the optimal number of factors. Numerical results demonstrate the effectiveness of this criterion. Through comprehensive simulation studies, we show that OverGFM outperforms state-of-the-art methods in terms of estimation accuracy and computational efficiency. Furthermore, we demonstrate the practical merit of our method through its application to two datasets from genomics. 
To facilitate its usage, we have integrated the implementation of OverGFM into the R package GFM.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"4836-4849"},"PeriodicalIF":1.8,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142141100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wenyi Lin, Jingjing Zou, Chongzhi Di, Cheryl L Rock, Loki Natarajan
{"title":"Multilevel Longitudinal Functional Principal Component Model.","authors":"Wenyi Lin, Jingjing Zou, Chongzhi Di, Cheryl L Rock, Loki Natarajan","doi":"10.1002/sim.10207","DOIUrl":"10.1002/sim.10207","url":null,"abstract":"<p><p>Sensor devices, such as accelerometers, are widely used for measuring physical activity (PA). These devices provide outputs at fine granularity (e.g., 10-100 Hz or minute-level), which, while providing rich data on activity patterns, also pose computational challenges with multilevel densely sampled data, resulting in PA records that are measured continuously across multiple days and visits. On the other hand, a scalar health outcome (e.g., BMI) is usually observed only at the individual or visit level. This leads to a discrepancy in numbers of nested levels between the predictors (PA) and outcomes, raising analytic challenges. To address this issue, we proposed a multilevel longitudinal functional principal component analysis (mLFPCA) model to directly model multilevel functional PA inputs in a longitudinal study, and then implemented a longitudinal functional principal component regression (FPCR) to explore the association between PA and obesity-related health outcomes. 
Additionally, we conducted a comprehensive simulation study to examine the impact of imbalanced multilevel data on both mLFPCA and FPCR performance and offer guidelines for selecting optimal methods.</p>","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"4781-4795"},"PeriodicalIF":1.8,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142126751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}