{"title":"Post-Estimation Shrinkage in Full and Selected Linear Regression Models in Low-Dimensional Data Revisited","authors":"Edwin Kipruto, Willi Sauerbrei","doi":"10.1002/bimj.202300368","DOIUrl":"https://doi.org/10.1002/bimj.202300368","url":null,"abstract":"<p>The fit of a regression model to new data is often worse due to overfitting. Analysts use variable selection techniques to develop parsimonious regression models, which may introduce bias into regression estimates. Shrinkage methods have been proposed to mitigate overfitting and reduce bias in estimates. Post-estimation shrinkage is an alternative to penalized methods. This study evaluates effectiveness of post-estimation shrinkage in improving prediction performance of full and selected models. Through a simulation study, results were compared with ordinary least squares (OLS) and ridge in full models, and best subset selection (BSS) and lasso in selected models. We focused on prediction errors and the number of selected variables. Additionally, we proposed a modified version of the parameter-wise shrinkage (PWS) approach named non-negative PWS (NPWS) to address weaknesses of PWS. Results showed that no method was superior in all scenarios. In full models, NPWS outperformed global shrinkage, whereas PWS was inferior to OLS. In low correlation with moderate-to-high signal-to-noise ratio (SNR), NPWS outperformed ridge, but ridge performed best in small sample sizes, high correlation, and low SNR. In selected models, all post-estimation shrinkage performed similarly, with global shrinkage slightly inferior. Lasso outperformed BSS and post-estimation shrinkage in small sample sizes, low SNR, and high correlation but was inferior when the opposite was true. Our study suggests that, with sufficient information, NPWS is more effective than global shrinkage in improving prediction accuracy of models. 
However, in high correlation, small sample sizes, and low SNR, penalized methods generally outperform post-estimation shrinkage methods.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"66 7","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300368","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142324583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional Data Analysis: An Introduction and Recent Developments","authors":"Jan Gertheiss, David Rügamer, Bernard X. W. Liew, Sonja Greven","doi":"10.1002/bimj.202300363","DOIUrl":"https://doi.org/10.1002/bimj.202300363","url":null,"abstract":"<p>Functional data analysis (FDA) is a statistical framework that allows for the analysis of curves, images, or functions on higher dimensional domains. The goals of FDA, such as descriptive analyses, classification, and regression, are generally the same as for statistical analyses of scalar-valued or multivariate data, but FDA brings additional challenges due to the high- and infinite dimensionality of observations and parameters, respectively. This paper provides an introduction to FDA, including a description of the most common statistical analysis techniques, their respective software implementations, and some recent developments in the field. The paper covers fundamental concepts such as descriptives and outliers, smoothing, amplitude and phase variation, and functional principal component analysis. It also discusses functional regression, statistical inference with functional data, functional classification and clustering, and machine learning approaches for functional data analysis. The methods discussed in this paper are widely applicable in fields such as medicine, biophysics, neuroscience, and chemistry and are increasingly relevant due to the widespread use of technologies that allow for the collection of functional data. Sparse functional data methods are also relevant for longitudinal data analysis. All presented methods are demonstrated using available software in R by analyzing a dataset on human motion and motor control. 
To facilitate the understanding of the methods, their implementation, and hands-on application, the code for these practical examples is made available through a code and data supplement and on GitHub.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"66 7","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300363","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142324584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stakeholders' Perspectives on Current Issues in Data Monitoring Committees","authors":"Michael J. Cartwright, Tim Friede, David Lawrence, Emma May, Tobias Mütze, Kit Roes","doi":"10.1002/bimj.202300384","DOIUrl":"10.1002/bimj.202300384","url":null,"abstract":"<div><p>Data Monitoring Committees (DMCs) are groups of experts that review accumulating data from one or more ongoing clinical studies and advise the Sponsor regarding the continuing safety of study subjects along with the continuing validity and scientific merit of the study. Although DMCs are widely used, considerable variability exists in their conduct. This paper offers recommendations, derived from sessions given at the 2023 Central European Network International Biometric and Statisticians in the Pharmaceutical Industry Conferences' and the authors' experiences. We focus on four topics that are part of the DMC process and where there is unclarity and inconsistency in current practices: (1) Communication with the DMC: We reflect on the importance of effective, proper communication channels between the DMC and relevant stakeholders to foster collaboration and exchange of critical information while retaining study integrity throughout. (2) Open sessions: We discuss the benefits of incorporating open sessions in DMC meetings to enhance transparency, inclusivity, and the consideration of diverse perspectives, as well as pitfalls of open sessions. (3) Access to efficacy data: We highlight the need for appropriate access to efficacy data by DMCs and discuss how to implement this in practice and how to address potential concerns regarding multiplicity. (4) Interactive data displays: We outline the utilization of interactive data displays to facilitate a more intuitive understanding of study results by the DMC. 
By addressing these topics, we aim to provide comprehensive practical recommendations that bridge the gap between current practices and optimal DMC functionality.</p></div>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"66 7","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142301496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Network-Constrain Weibull AFT Model for Biomarkers Discovery","authors":"Claudia Angelini, Daniela De Canditiis, Italia De Feis, Antonella Iuliano","doi":"10.1002/bimj.202300272","DOIUrl":"10.1002/bimj.202300272","url":null,"abstract":"<p>We propose AFTNet, a novel network-constraint survival analysis method based on the Weibull accelerated failure time (AFT) model solved by a penalized likelihood approach for variable selection and estimation. When using the log-linear representation, the inference problem becomes a structured sparse regression problem for which we explicitly incorporate the correlation patterns among predictors using a double penalty that promotes both sparsity and grouping effect. Moreover, we establish the theoretical consistency for the AFTNet estimator and present an efficient iterative computational algorithm based on the proximal gradient descent method. Finally, we evaluate AFTNet performance both on synthetic and real data examples.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"66 7","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300272","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142301494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multivariate Scalar on Multidimensional Distribution Regression With Application to Modeling the Association Between Physical Activity and Cognitive Functions","authors":"Rahul Ghosal, Marcos Matabuena","doi":"10.1002/bimj.202400042","DOIUrl":"10.1002/bimj.202400042","url":null,"abstract":"<p>We develop a new method for multivariate scalar on multidimensional distribution regression. Traditional approaches typically analyze isolated univariate scalar outcomes or consider unidimensional distributional representations as predictors. However, these approaches are suboptimal because (i) they fail to utilize the dependence between the distributional predictors and (ii) neglect the correlation structure of the response. To overcome these limitations, we propose a multivariate distributional analysis framework that harnesses the power of multivariate density functions and multitask learning. We develop a computationally efficient semiparametric estimation method for modeling the effect of the latent joint density on the multivariate response of interest. Additionally, we introduce a new conformal prediction algorithm for quantifying the uncertainty of our multivariate predictions based on subject characteristics and individualized distributional predictors, providing valuable insights into the conditional distribution of the response. We validate the effectiveness of our proposed method through comprehensive numerical simulations, clearly demonstrating its superior performance compared to traditional methods. The application of the proposed method is demonstrated on triaxial accelerometer data from the National Health and Nutrition Examination Survey 2011–2014 for modeling the association between cognitive scores across various domains and distributional representation of physical activity among the older adult population. 
Our results highlight the advantages of the proposed approach, emphasizing the significance of incorporating multidimensional distributional information in the triaxial accelerometer data.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"66 7","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202400042","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142301495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the Heterogeneity of “Study Twins”","authors":"Christian Röver, Tim Friede","doi":"10.1002/bimj.202300387","DOIUrl":"10.1002/bimj.202300387","url":null,"abstract":"<p>Meta-analyses are commonly performed based on random-effects models, while in certain cases one might also argue in favor of a common-effect model. One such case may be given by the example of two “study twins” that are performed according to a common (or at least very similar) protocol. Here we investigate the particular case of meta-analysis of a pair of studies, for example, summarizing the results of two confirmatory clinical trials in phase III of a clinical development program. Thereby, we focus on the question of to what extent homogeneity or heterogeneity may be discernible and include an empirical investigation of published (“twin”) pairs of studies. A pair of estimates from two studies only provide very little evidence of homogeneity or heterogeneity of effects, and ad hoc decision criteria may often be misleading.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"66 6","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300387","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"False Discovery Rate Control for Lesion-Symptom Mapping With Heterogeneous Data via Weighted p-Values","authors":"Siyu Zheng, Alexander C. McLain, Joshua Habiger, Christopher Rorden, Julius Fridriksson","doi":"10.1002/bimj.202300198","DOIUrl":"10.1002/bimj.202300198","url":null,"abstract":"<p>Lesion-symptom mapping studies provide insight into what areas of the brain are involved in different aspects of cognition. This is commonly done via behavioral testing in patients with a naturally occurring brain injury or lesions (e.g., strokes or brain tumors). This results in high-dimensional observational data where lesion status (present/absent) is nonuniformly distributed, with some voxels having lesions in very few (or no) subjects. In this situation, mass univariate hypothesis tests have severe power heterogeneity where many tests are known a priori to have little to no power. Recent advancements in multiple testing methodologies allow researchers to weigh hypotheses according to side information (e.g., information on power heterogeneity). In this paper, we propose the use of <i>p</i>-value weighting for voxel-based lesion-symptom mapping studies. The weights are created using the distribution of lesion status and spatial information to estimate different non-null prior probabilities for each hypothesis test through some common approaches. We provide a <i>monotone minimum weight</i> criterion, which requires minimum a priori power information. Our methods are demonstrated on dependent simulated data and an aphasia study investigating which regions of the brain are associated with the severity of language impairment among stroke survivors. The results demonstrate that the proposed methods have robust error control and can increase power. 
Further, we showcase how weights can be used to identify regions that are inconclusive due to lack of power.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"66 6","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202300198","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142005983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random Survival Forests With Competing Events: A Subdistribution-Based Imputation Approach","authors":"Charlotte Behning, Alexander Bigerl, Marvin N. Wright, Peggy Sekula, Moritz Berger, Matthias Schmid","doi":"10.1002/bimj.202400014","DOIUrl":"10.1002/bimj.202400014","url":null,"abstract":"<p>Random survival forests (RSF) can be applied to many time-to-event research questions and are particularly useful in situations where the relationship between the independent variables and the event of interest is rather complex. However, in many clinical settings, the occurrence of the event of interest is affected by competing events, which means that a patient can experience an outcome other than the event of interest. Neglecting the competing event (i.e., regarding competing events as censoring) will typically result in biased estimates of the cumulative incidence function (CIF). A popular approach for competing events is Fine and Gray's subdistribution hazard model, which directly estimates the CIF by fitting a single-event model defined on a subdistribution timescale. Here, we integrate concepts from the subdistribution hazard modeling approach into the RSF. We develop several imputation strategies that use weights as in a discrete-time subdistribution hazard model to impute censoring times in cases where a competing event is observed. Our simulations show that the CIF is well estimated if the imputation already takes place outside the forest on the overall dataset. Especially in settings with a low rate of the event of interest or a high censoring rate, competing events must not be neglected, that is, treated as censoring. 
When applied to a real-world epidemiological dataset on chronic kidney disease, the imputation approach resulted in highly plausible predictor–response relationships and CIF estimates of renal events.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"66 6","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202400014","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142005984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semiparametric Additive Modeling of the Restricted Mean Survival Time","authors":"Yuan Zhang, Douglas E. Schaubel","doi":"10.1002/bimj.202200371","DOIUrl":"10.1002/bimj.202200371","url":null,"abstract":"<div><p>Analysis of the restricted mean survival time (RMST) has become increasingly common in biomedical studies during the last decade as a means of estimating treatment or covariate effects on survival. Advantages of RMST over the hazard ratio (HR) include increased interpretability and lack of reliance on the often tenuous proportional hazards assumption. Some authors have argued that RMST regression should generally be the frontline analysis as opposed to methods based on counting process increments. However, in order for the use of the RMST to be more mainstream, it is necessary to broaden the range of data structures to which pertinent methods can be applied. In this report, we address this issue from two angles. First, most of existing methodological development for directly modeling RMST has focused on multiplicative models. An additive model may be preferred due to goodness of fit and/or parameter interpretation. Second, many settings encountered nowadays feature high-dimensional categorical (nuisance) covariates, for which parameter estimation is best avoided. Motivated by these considerations, we propose stratified additive models for direct RMST analysis. The proposed methods feature additive covariate effects. Moreover, nuisance factors can be factored out of the estimation, akin to stratification in Cox regression, such that focus can be appropriately awarded to the parameters of chief interest. Large-sample properties of the proposed estimators are derived, and a simulation study is performed to assess finite-sample performance. In addition, we provide techniques for evaluating a fitted model with respect to risk discrimination and predictive accuracy. 
The proposed methods are then applied to liver transplant data to estimate the effects of donor characteristics on posttransplant survival time.</p></div>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"66 6","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141989558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}