{"title":"Test Statistics and Statistical Inference for Data With Informative Cluster Sizes","authors":"Soyoung Kim, Michael J. Martens, Kwang Woo Ahn","doi":"10.1002/bimj.70021","DOIUrl":"10.1002/bimj.70021","url":null,"abstract":"<div><p>In biomedical studies, investigators often encounter clustered data. The cluster sizes are said to be informative if the outcome depends on the cluster size. Ignoring informative cluster sizes in the analysis leads to biased parameter estimation in marginal and mixed-effect regression models. Several methods to analyze data with informative cluster sizes have been proposed; however, methods to test the informativeness of the cluster sizes are limited, particularly for the marginal model. In this paper, we propose a score test and a Wald test to examine the informativeness of the cluster sizes for a generalized linear model, a Cox model, and a proportional subdistribution hazards model. Statistical inference can be conducted through weighted estimating equations. The simulation results show that both tests control Type I error rates well, but the score test has higher power than the Wald test for right-censored data, while the power of the Wald test is generally higher than that of the score test for the binary outcome. 
We apply the Wald and score tests to hematopoietic cell transplant data and compare regression analysis results with/without adjusting for informative cluster sizes.</p></div>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Best Subset Solution Path for Linear Dimension Reduction Models Using Continuous Optimization","authors":"Benoit Liquet, Sarat Moka, Samuel Muller","doi":"10.1002/bimj.70015","DOIUrl":"10.1002/bimj.70015","url":null,"abstract":"<div><p>The selection of best variables is a challenging problem in supervised and unsupervised learning, especially in high-dimensional contexts where the number of variables is usually much larger than the number of observations. In this paper, we focus on two multivariate statistical methods: principal component analysis and partial least squares. Both approaches are popular linear dimension-reduction methods with numerous applications in several fields, including genomics, biology, environmental science, and engineering. In particular, these approaches build principal components, new variables that are combinations of all the original variables. A main drawback of principal components is the difficulty of interpreting them when the number of variables is large. To define principal components from the most relevant variables, we propose to cast the best subset solution path method into the principal component analysis and partial least squares frameworks. We offer a new alternative by exploiting a continuous optimization algorithm for the best subset solution path. Empirical studies show the efficacy of our approach for providing the best subset solution path. The use of our algorithm is further illustrated through the analysis of two real data sets. 
The first data set is analyzed using principal component analysis, while the analysis of the second data set is based on the partial least squares framework.</p></div>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Goodness-of-Fit Testing for a Regression Model With a Doubly Truncated Response","authors":"Jacobo de Uña-Álvarez","doi":"10.1002/bimj.70022","DOIUrl":"10.1002/bimj.70022","url":null,"abstract":"<p>In survival analysis and epidemiology, among other fields, interval sampling is often employed. With interval sampling, the individuals undergoing the event of interest within a calendar time interval are recruited. This results in doubly truncated event times. Double truncation, which may appear with other sampling designs too, induces a selection bias, so ordinary statistical methods are generally inconsistent. In this paper, we introduce goodness-of-fit procedures for a regression model when the response variable is doubly truncated. With this purpose, a marked empirical process based on weighted residuals is constructed and its weak convergence is established. Kolmogorov–Smirnov– and Cramér–von Mises–type tests are consequently derived from such core process, and a bootstrap approximation for their practical implementation is given. The performance of the proposed tests is investigated through simulations. An application to model selection for AIDS incubation time as depending on age at infection is provided.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70022","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adjusted Inference for Multiple Testing Procedure in Group-Sequential Designs","authors":"Yujie Zhao, Qi Liu, Linda Z. Sun, Keaven M. Anderson","doi":"10.1002/bimj.70020","DOIUrl":"10.1002/bimj.70020","url":null,"abstract":"<div><p>Adjustment of statistical significance levels for repeated analysis in group-sequential trials has been understood for some time. Adjustment accounting for testing multiple hypotheses is also well understood. There is limited research on simultaneously adjusting for both multiple hypothesis testing and repeated analyses of one or more hypotheses. We address this gap by proposing <i>adjusted-sequential p-values</i> that reject when they are less than or equal to the family-wise Type I error rate (FWER). We also propose sequential <span></span><math><semantics><mi>p</mi><annotation>$p$</annotation></semantics></math>-values for intersection hypotheses to compute adjusted-sequential <span></span><math><semantics><mi>p</mi><annotation>$p$</annotation></semantics></math>-values for elementary hypotheses. We demonstrate the application using weighted Bonferroni tests and weighted parametric tests for inference on each elementary hypothesis tested.</p></div>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Interactions in High-Dimensional Data Using Cross Leverage Scores","authors":"Sven Teschke, Katja Ickstadt, Alexander Munteanu","doi":"10.1002/bimj.70014","DOIUrl":"https://doi.org/10.1002/bimj.70014","url":null,"abstract":"<p>We develop a variable selection method for interactions in regression models on large data in the context of genetics. The method is intended for investigating the influence of single-nucleotide polymorphisms (SNPs) and their interactions on health outcomes, which is a <span></span><math><semantics><mrow><mi>p</mi><mo>≫</mo><mi>n</mi></mrow><annotation>$p\\gg n$</annotation></semantics></math> problem. We introduce cross leverage scores (CLSs) to detect interactions of variables while maintaining interpretability. Using this method, it is not necessary to consider every possible interaction between variables individually, which would be very time-consuming even for moderate numbers of variables. Instead, we calculate the CLS for each variable and obtain a measure of importance for this variable. Calculating the scores remains time-consuming for large data sets. The key idea for scaling to large data is to divide the data into smaller random batches or consecutive windows of variables. This avoids complex and time-consuming computations on high-dimensional matrices by performing the computations only for small subsets of the data, which is less costly. We compare these methods to provable approximations of CLS based on sketching, which aims at summarizing data succinctly. In a simulation study, we show that the CLSs are directly linked to the importance of a variable in the sense of an interaction effect. We further show that the approximation approaches are appropriate for performing the calculations efficiently on arbitrarily large data while preserving the interaction detection effect of the CLS. This underlines their scalability to genome-wide data. 
In addition, we evaluate the methods on real data from the HapMap project.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"66 8","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70014","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142749303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model Selection for Ordinary Differential Equations: A Statistical Testing Approach","authors":"Itai Dattner, Shota Gugushvili, Oleksandr Laskorunskyi","doi":"10.1002/bimj.70013","DOIUrl":"10.1002/bimj.70013","url":null,"abstract":"<p>Ordinary differential equations (ODEs) are foundational tools in modeling intricate dynamics across a gamut of scientific disciplines. Yet, a possibility to represent a single phenomenon through multiple ODE models, driven by different understandings of nuances in internal mechanisms or abstraction levels, presents a model selection challenge. This study introduces a testing-based approach for ODE model selection amidst statistical noise. Rooted in the model misspecification framework, we adapt classical statistical paradigms (Vuong and Hotelling) to the ODE context, allowing for the comparison and ranking of diverse causal explanations without the constraints of nested models. Our simulation studies numerically investigate the statistical properties of the test, demonstrating its attainment of the nominal size and power across various settings. Real-world data examples further underscore the algorithm's applicability in practice. 
To foster accessibility and encourage real-world applications, we provide a user-friendly Python implementation of our model selection algorithm, bridging theoretical advancements with hands-on tools for the scientific community.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"66 8","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70013","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"τ-Inflated Beta Regression Model for Estimating τ-Restricted Means and Event-Free Probabilities for Censored Time-to-Event Data","authors":"Yizhuo Wang, Susan Murray","doi":"10.1002/bimj.70009","DOIUrl":"10.1002/bimj.70009","url":null,"abstract":"<p>In this research, we propose analysis of <span></span><math><semantics><mi>τ</mi><annotation>$\\tau$</annotation></semantics></math>-restricted censored time-to-event data via a <span></span><math><semantics><mi>τ</mi><annotation>$\\tau$</annotation></semantics></math>-inflated beta regression (<span></span><math><semantics><mi>τ</mi><annotation>$\\tau$</annotation></semantics></math>-IBR) model. The outcome of interest is <span></span><math><semantics><mrow><mi>min</mi><mo>(</mo><mi>τ</mi><mo>,</mo><mi>T</mi><mo>)</mo></mrow><annotation>${\\rm min}(\\tau,T)$</annotation></semantics></math>, where <span></span><math><semantics><mi>T</mi><annotation>$T$</annotation></semantics></math> and <span></span><math><semantics><mi>τ</mi><annotation>$\\tau$</annotation></semantics></math> are the time-to-event and follow-up duration, respectively. Our analysis goals include estimation and inference related to <span></span><math><semantics><mi>τ</mi><annotation>$\\tau$</annotation></semantics></math>-restricted mean survival time (<span></span><math><semantics><mi>τ</mi><annotation>$\\tau$</annotation></semantics></math>-RMST) values and event-free probabilities at <span></span><math><semantics><mi>τ</mi><annotation>$\\tau$</annotation></semantics></math> that address the censored nature of the data. 
In this setting, it is common to observe many individuals with <span></span><math><semantics><mrow><mi>min</mi><mo>(</mo><mi>τ</mi><mo>,</mo><mi>T</mi><mo>)</mo><mo>=</mo><mi>τ</mi></mrow><annotation>${\\rm min}(\\tau,T)=\\tau$</annotation></semantics></math>, a point mass that is typically overlooked in <span></span><math><semantics><mi>τ</mi><annotation>$\\tau$</annotation></semantics></math>-restricted event-time analyses. Our proposed <span></span><math><semantics><mi>τ</mi><annotation>$\\tau$</annotation></semantics></","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"66 8","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70009","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142741342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}