{"title":"Paired count regressions for modeling the number of doctor consultations and non-prescribed drugs intake.","authors":"Jussiane Nader Gonçalves, Wagner Barreto-Souza, Hernando Ombao","doi":"10.1177/09622802251345332","DOIUrl":"https://doi.org/10.1177/09622802251345332","url":null,"abstract":"<p><p>There are sundry practical situations in which paired count variables are correlated, thus requiring a joint estimation method. In this article, we introduce a flexible class of bivariate mixed Poisson regression models, which settle into an exponential-family (EF) distributed component for unobserved heterogeneity. The proposed bivariate mixed Poisson models deal with the phenomenon of overdispersion, typical of count data, and have flexibility in terms of the correlation structure. Thus, this novel class of models has a distinct advantage over the most widely used models because it captures both positive and negative correlations in the count data. Under the bivariate mixed Poisson model, inference of the parameters is conducted through the maximum likelihood method. Monte Carlo studies on assessing the finite-sample performance of the estimators of the parameters are presented. Furthermore, we employ a likelihood ratio statistic for testing the significance of certain sources of correlation and evaluate its performance via simulation studies. Moreover, model adequacy is addressed by using simulated envelopes for residual analysis, and also a randomized probability integral transformation for calibration model control. 
The proposed bivariate mixed Poisson model is considered for analyzing a healthcare dataset from the Australian Health Survey, where our aim is to study the association between the number of consultations with a doctor and the number of non-prescribed drugs taken.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251345332"},"PeriodicalIF":1.6,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144175031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Influence function-based empirical likelihood for area under the receiver operating characteristic curve in presence of covariates.","authors":"Baoying Yang, Xinjie Hu, Gengsheng Qin","doi":"10.1177/09622802251345343","DOIUrl":"https://doi.org/10.1177/09622802251345343","url":null,"abstract":"<p><p>In receiver operating characteristic (ROC) analysis, the area under the ROC curve (AUC) is a popular one-number summary of the discriminatory accuracy of a diagnostic test. AUC measures the overall diagnostic accuracy of a test but fails to account for the effect of covariates when covariates are present and associated with the test results. Adjustment for covariate effects can greatly improve the diagnostic accuracy of a test. In this paper, using information provided by the influence function, empirical likelihood (EL) methods are proposed for inferences of AUC in presence of covariates. For parameters in the AUC regression model, it is shown that the asymptotic distribution of the influence function-based empirical log-likelihood ratio statistic is a standard chi-square distribution. Hence, confidence regions for the regression parameters can be obtained without any variance estimation. Simulation studies are conducted to compare the finite sample performances of the proposed EL based methods with the existing normal approximation (NA) based method in the AUC regression. Simulation results indicate that the bootstrap-calibrated influence function-based empirical likelihood (BIFEL) confidence region outperforms the NA-based confidence region in terms of coverage probability. We also propose an interval estimation method for the covariate-adjusted AUC based on the BIFEL confidence region. 
Finally, we illustrate the recommended method with a real prostate-specific antigen data example.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251345343"},"PeriodicalIF":1.6,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144175030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A shared frailty regression model for clustered survival data.","authors":"Gilbert Kiprotich, Diego Ignacio Gallardo, Pedro Luiz Ramos, Thomas Augustin","doi":"10.1177/09622802251338984","DOIUrl":"https://doi.org/10.1177/09622802251338984","url":null,"abstract":"<p><p>In this article, we propose a new frailty model based on a mixture of inverse Gaussian distributions for multivariate lifetimes. This approach provides an advantage over previous models, as the weights are directly determined through parameterization of the mixture, removing the need for arbitrary guesswork in the weighting process. Moreover, the closed-form Laplace transform of the model facilitates the quantification of Kendall's tau measure of dependence. The frailty model's parametric and flexible parametric variants are examined. For parameter estimation, the expectation-maximization technique is employed, taking advantage of the hierarchical representation of the frailty distribution, providing a simpler and more stable method than directly maximizing the observed log-likelihood function. The performance of the estimators is assessed numerically using Monte Carlo simulations. We apply our methodology to two medical datasets on cancer. The results indicate the benefits of the proposed model over existing frailty models in the literature. 
The implementation of the procedure is added to the R package extrafrail.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251338984"},"PeriodicalIF":1.6,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144175027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Alternatives to default shrinkage methods can improve prediction accuracy, calibration, and coverage: A methods comparison study.","authors":"Mark A van de Wiel, Gwenaël Gr Leday, Martijn W Heymans, Erik W van Zwet, Ailko H Zwinderman, Jeroen Hoogland","doi":"10.1177/09622802251338440","DOIUrl":"https://doi.org/10.1177/09622802251338440","url":null,"abstract":"<p><p>While shrinkage is essential in high-dimensional settings, its use for low-dimensional regression-based prediction has been debated. It reduces variance, often leading to improved prediction accuracy. However, it also inevitably introduces bias, which may harm two other measures of predictive performance: calibration and coverage of confidence intervals. Here, the latter evaluates whether the amount of uncertainty is correctly quantified. Much of the criticism stems from the usage of standard shrinkage methods, such as lasso and ridge with a single, cross-validated penalty. Our aim is to show that readily available alternatives may improve predictive performance, in terms of accuracy, calibration or coverage. We study linear and logistic regression. For linear regression, we use small sample splits of a large, fairly typical epidemiological data set to illustrate that usage of differential ridge penalties for covariate groups may enhance prediction accuracy, while calibration and coverage benefit from additional shrinkage of the penalties. Bayesian hierarchical modeling facilitates the latter, including local shrinkage. In the logistic regression setting, we apply an external simulation to illustrate that local shrinkage may improve calibration with respect to global shrinkage, while providing better prediction accuracy than other solutions, like Firth's correction. The potential benefits of the alternative shrinkage methods are easily accessible via example implementations in R, including the estimation of multiple penalties. 
A synthetic copy of the large data set is shared for reproducibility.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251338440"},"PeriodicalIF":1.6,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144175028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian inference for nonlinear mixed-effects location scale and interval-censoring cure-survival models: An application to pregnancy miscarriage.","authors":"Danilo Alvares, Cristian Meza, Rolando De la Cruz","doi":"10.1177/09622802251345485","DOIUrl":"https://doi.org/10.1177/09622802251345485","url":null,"abstract":"<p><p>Motivated by a pregnancy miscarriage study, we propose a Bayesian joint model for longitudinal and time-to-event outcomes that takes into account different complexities of the problem. In particular, the longitudinal process is modeled by means of a nonlinear specification with subject-specific error variance. In addition, the exact time of fetal death is unknown, and a subgroup of women is not susceptible to miscarriage. Hence, we model the survival process via a mixture cure model for interval-censored data. Finally, both processes are linked through the subject-specific longitudinal mean and variance. A simulation study is conducted in order to validate our joint model. In the real application, we use individual weighted and Cox-Snell residuals to assess the goodness-of-fit of our proposal versus a joint model that shares only the subject-specific longitudinal mean (standard approach). In addition, the leave-one-out cross-validation criterion is applied to compare the predictive ability of both models.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251345485"},"PeriodicalIF":1.6,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144175029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantification of the influence of risk factors with application to cardiovascular diseases in subjects with type 1 diabetes.","authors":"Ornella Moro, Inger Torhild Gram, Maja-Lisa Løchen, Marit B Veierød, Ana Maria Wägner, Giovanni Sebastiani","doi":"10.1177/09622802251327680","DOIUrl":"https://doi.org/10.1177/09622802251327680","url":null,"abstract":"<p><p>Future occurrence of a disease can be highly influenced by some specific risk factors. This work presents a comprehensive approach to quantify the event probability as a function of each separate risk factor by means of a parametric model. The proposed methodology is mainly described and applied here in the case of a linear model, but the non-linear case is also addressed. To improve estimation accuracy, three distinct methods are developed and their results are integrated. One of them is Bayesian, based on a non-informative prior. Each of the other two uses aggregation of sample elements based on their factor values, which is optimized by means of a different specific criterion. For one of these two, optimization is performed by simulated annealing. The methodology presented is applicable across various diseases, but here we quantify the risk for cardiovascular diseases in subjects with type 1 diabetes. The results obtained combining the three different methods show accurate estimates of cardiovascular risk variation rates for the factors considered. Furthermore, the detection of a biological activation phenomenon for one of the factors is also illustrated. 
To quantify the performances of the proposed methodology and to compare them with those from a known method used for this type of models, a large simulation study is done, whose results are illustrated here.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251327680"},"PeriodicalIF":1.6,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144111965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A model-free phase I/II dose optimization design for immunotherapy trials.","authors":"Yingjie Qiu, Mengyi Lu, Yan Han, Wenxian Zhou, Yi Zhao, Leng Han, Yong Zang","doi":"10.1177/09622802251340246","DOIUrl":"https://doi.org/10.1177/09622802251340246","url":null,"abstract":"<p><p>We present a model-free phase I/II clinical trial design, referred to as the UFO design, to optimize the dose of immunotherapy by jointly modeling toxicity, efficacy, and immune response outcomes. Instead of relying on complex parametric modeling approaches, we propose a model-free approach that uses the inherent correlations among different types of outcomes in immunotherapy and the constrained dose-outcome order to facilitate information sharing across different doses. This approach ensures the efficiency and transparency of the UFO design to be implemented in clinical practice. The UFO design is also extended to accommodate the delayed outcomes. It demonstrates favorable operating characteristics through simulation studies. The R Shiny app for simulation and trial implementation using the UFO design is also provided at iusccc.shinyapps.io/smartdesign.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251340246"},"PeriodicalIF":1.6,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144080692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reproducible feature selection in heterogeneous multicenter datasets via sign-consistency criteria.","authors":"Xun Zhao, Yalu Ping","doi":"10.1177/09622802251338375","DOIUrl":"https://doi.org/10.1177/09622802251338375","url":null,"abstract":"<p><p>The identification of risk features associated with disease plays a crucial role in biomedical fields. These features are often used to provide evidence for clinical decision-making. However, in the presence of between-center heterogeneity, covariate effects across data centers may exhibit inconsistent directions, making feature selection challenging. In this work, we propose a novel framework to select reproducible risk features whose underlying effects are consistent across different centers. We quantify the feature reproducibility based on the sign-consistency criterion, which provides an acceptable level of heterogeneity in effect sizes and ensures the reasonable similarity of reproducible signals. Compared with the existing feature selection methods, our proposed method effectively protects data privacy and does not rely on the assumption of data homogeneity. Extensive simulations demonstrated that the proposed method has greater power than existing methods do. 
We apply the proposed approach to analyze data from the China Health and Retirement Longitudinal Study (CHARLS) and identify nine important risk factors that show reproducible associations with depression.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251338375"},"PeriodicalIF":1.6,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144080762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the sample size requirements of tree-based ensemble machine learning techniques for clinical risk prediction.","authors":"Oya Kalaycıoğlu, Menelaos Pavlou, Serhat E Akhanlı, Mark A de Belder, Gareth Ambler, Rumana Z Omar","doi":"10.1177/09622802251338983","DOIUrl":"https://doi.org/10.1177/09622802251338983","url":null,"abstract":"<p><p>Machine learning techniques (MLTs) are increasingly being used to develop clinical risk prediction models for binary health outcomes but the sample size requirements for developing and validating such models remain unclear. This study investigates whether sample size guidelines that target mean absolute prediction error (MAPE) for logistic regression models can be applied to tree-based ensemble MLTs (bagging, random forests, and boosting). Simulations based on two large cardiovascular datasets were used to evaluate the performance of MLTs in terms of MAPE, calibration, the <i>C</i>-statistic and Brier score, across six data-generating mechanisms (DGMs) and varying sample sizes. When the DGM and analysis model matched, boosting required a sample size 2-3 times larger than recommended; random forests and bagging did not achieve the target MAPE even with a 12-fold increase. For a neutral DGM that did not match any of the analysis models, logistic regression with only main effects and boosting resulted in target MAPE values with a 12-fold increase in the recommended sample size. 
For external validation, our simulations showed that sample size guidelines to achieve a target precision of the estimated <i>C</i>-statistic were suitable, and thus may be used to inform sample size calculations for MLTs.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251338983"},"PeriodicalIF":1.6,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144080758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Permutation-based global rank test with adaptive weights for multiple primary endpoints.","authors":"Satoshi Yoshida, Yusuke Yamaguchi, Kazushi Maruo, Masahiko Gosho","doi":"10.1177/09622802251334886","DOIUrl":"https://doi.org/10.1177/09622802251334886","url":null,"abstract":"<p><p>Multiple efficacy endpoints are investigated in clinical trials, and selecting the appropriate primary endpoints is key to the study's success. The global test is an analysis approach that can handle multiple endpoints without multiplicity adjustment. This test, which aggregates the statistics from multiple primary endpoints into a single statistic using weights for the statistical comparison, has been gaining increasing attention. A key consideration in the global test is determination of the weights. In this study, we propose a novel global rank test in which the weights for each endpoint are estimated based on the current study data to maximize the test statistic, and the permutation test is applied to control the type I error rate. Simulation studies conducted to compare the proposed test with other global tests show that the proposed test can control the type I error rate at the nominal level, regardless of the number of primary endpoints and correlations between endpoints. 
Additionally, the proposed test offers higher statistical power when the efficacy is considerably different between endpoints or when endpoints are moderately correlated, such as when the correlation coefficient is greater than or equal to 0.5.</p>","PeriodicalId":22038,"journal":{"name":"Statistical Methods in Medical Research","volume":" ","pages":"9622802251334886"},"PeriodicalIF":1.6,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144080760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}