{"title":"Optimal design for matched pair cluster randomized trials with heterogeneous correlations and costs.","authors":"Arpan Singh","doi":"10.1080/02664763.2025.2537126","DOIUrl":"https://doi.org/10.1080/02664763.2025.2537126","url":null,"abstract":"<p><p>Conducting studies using cluster randomized trials (CRTs) is often a costly endeavor. When budget constraints are present, it becomes crucial to design CRTs optimally under the given cost limitations. A matched pair CRT is a trial in which clusters are paired based on similar baseline characteristics, and one cluster from each pair is randomly assigned to the intervention while the other serves as control. Commonly, CRT designs assume equal intra-class correlation and sampling costs across clusters, but this assumption is rarely met in practice due to various factors. This article proposes optimal subject allocations within clusters for matched pair CRTs. The allocation is derived by minimizing the variance of the treatment effect estimator under very general conditions, assuming heterogeneity in intra-class correlation, matching correlation, and sampling costs across clusters. The proposed design proves to be more efficient than the commonly used balanced design. To address the dependency on unknown parameters, min-max and pseudo-Bayesian optimal designs are also explored. Numerical examples based on real-world data are provided to supplement the theoretical findings.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 5","pages":"778-797"},"PeriodicalIF":1.1,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13045187/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147623085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online monitoring and early detection of influenza outbreaks using exponentially weighted spatial lasso: a case study in China during 2014-2020.","authors":"Shoumi Sarkar, Yuhang Zhou, Yang Yang, Peihua Qiu","doi":"10.1080/02664763.2025.2534915","DOIUrl":"https://doi.org/10.1080/02664763.2025.2534915","url":null,"abstract":"<p><p>Influenza poses a persistent public health threat in China, with substantial impacts on health and the economy, especially during seasonal epidemics and emerging outbreaks. Seasonality, local clustering, and serial correlation inherent in influenza data introduce spatio-temporal complexities that traditional statistical process control (SPC) methods cannot adequately capture. This study introduces a novel nonparametric framework for real-time influenza monitoring across 300+ Chinese cities from 2014 to 2020. Reference periods are selected to establish baseline incidence patterns and fit a nonparametric spatio-temporal model to estimate mean and covariance structures. These estimates enable the setting of dynamic outbreak thresholds. Next, exponentially weighted spatial LASSO (EWSL) charting statistics are computed for the monitoring period, prioritizing recent observations and detecting subtle mean shifts in small, clustered regions - well-suited to influenza's progression dynamics. Charting statistics exceeding control limits trigger timely outbreak warnings. Results demonstrate that our method consistently outperforms alternative methods, and existing literature corroborates that its early signals correspond to actual outbreaks - including those for H7N9 strains, influenza A and B viruses, and the initial spread of COVID-19. These findings highlight the potential of our approach as an effective epidemic monitoring tool, addressing complex spatio-temporal patterns and supporting timely, data-driven public health interventions.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 5","pages":"914-936"},"PeriodicalIF":1.1,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13045203/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147623148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Addressing outliers in mixed-effects logistic regression: a more robust modeling approach.","authors":"Divan A Burger, Sean van der Merwe, Emmanuel Lesaffre","doi":"10.1080/02664763.2025.2538076","DOIUrl":"https://doi.org/10.1080/02664763.2025.2538076","url":null,"abstract":"<p><p>This study introduces an outlier-robust model for analyzing hierarchically structured bounded count data within a Bayesian framework, utilizing a logistic regression approach implemented in JAGS. Our model incorporates a <i>t</i>-distributed latent variable to address overdispersion and outliers, improving robustness compared to conventional models such as the beta-binomial, binomial-logit-normal, and standard binomial models. Notably, our model targets a pseudo-median that differs from the true discrete median by less than one count; this closed-form quantity provides a robust and interpretable measure of central tendency. For comparability between all models, we additionally make predictions based on the mean proportion; however, this involves an integration step for the <i>t</i>-distributed nuisance parameter. While limited literature specifically addresses outliers in mixed models for bounded count data, this research fills that gap. The practical utility of the model is demonstrated using a longitudinal medication adherence dataset, where patient behavior often results in abrupt changes and outliers within individual trajectories. A simulation study demonstrates the binomial-logit-<i>t</i> model's strong performance, with comparison statistics favoring it among the four evaluated models. An additional data contamination simulation confirms its robustness against outliers. Our robust approach maintains the integrity of the dataset, effectively handling outliers to provide more accurate and reliable parameter estimates.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 5","pages":"832-854"},"PeriodicalIF":1.1,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13045181/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147623031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A mixed Bell regression model for overdispersed medical count data.","authors":"Naiara C A Dos Santos, Jorge L Bazán, Artur J Lemonte","doi":"10.1080/02664763.2025.2538084","DOIUrl":"https://doi.org/10.1080/02664763.2025.2538084","url":null,"abstract":"<p><p>In this article, we consider the discrete Bell distribution to introduce a new mixed-effects regression model that may be an interesting alternative to traditional mixed-effects models for count response variables. The new regression model can be applied in several areas including health data. We consider the frequentist and Bayesian approaches to perform inferences in this class of mixed regression models. We provide Monte Carlo simulation experiments to verify the performance of these approaches in estimating the mixed Bell regression model parameters. The simulation results are quite promising and indicate that these approaches are effective in doing that. We also consider model comparison criteria based on the frequentist and Bayesian approaches and simulations are considered to verify the performance of these criteria. Two empirical applications to real data of the proposed mixed-effects model are provided, and comparisons with the Poisson mixed-effects model, as well as the Poisson inverse Gaussian mixed-effects model, are made. The real data applications confirm that the proposed mixed-effects Bell regression model can be an interesting alternative in the modeling of count response variables.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 5","pages":"855-873"},"PeriodicalIF":1.1,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13045179/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147623028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal group sizes for testing group mean differences using the Bayes factor.","authors":"Mirjam Moerbeek","doi":"10.1080/02664763.2025.2534898","DOIUrl":"https://doi.org/10.1080/02664763.2025.2534898","url":null,"abstract":"<p><p>Determination of group sizes is an important issue when planning a study that aims to compare mean outcomes across groups. Using equal group sizes is not the best choice in the case of heterogeneous costs and/or variances. Conventional optimal design methodology has shown that groups with higher variance and lower costs should include more subjects. However, these results are based on the framework of null hypothesis significance testing, which has received severe criticism over the past decades. The Bayesian approach to hypothesis testing has been proposed as an alternative and uses the Bayes factor to quantify the support of a hypothesis given the data. Group sizes that maximize the Bayes factor are determined, and it is shown how these optimal group sizes depend on the variances, costs and group means. Furthermore, it is shown to what degree the Bayes factor becomes smaller while using conventional optimal design methodology or equal group sizes. The optimal design methodology is illustrated using examples on multidisciplinary pain management and psychological status and asthma outcomes. A Shiny app has been made available to facilitate the use of the optimal design methodology.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 4","pages":"710-728"},"PeriodicalIF":1.1,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981267/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147468089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing the impact of neighborhood structures in Bayesian disease mapping.","authors":"Minh Hanh Nguyen, Thomas Neyens, Andrew B Lawson, Christel Faes","doi":"10.1080/02664763.2025.2533479","DOIUrl":"https://doi.org/10.1080/02664763.2025.2533479","url":null,"abstract":"<p><p>In Bayesian disease mapping, defining the neighborhood structure is crucial when fitting the conditional auto-regressive model. Yet, there has been little assessment of how different structures affect the model performance in case of fine-scale data. This paper explores this gap. In a case study examining COVID-19 pandemic effects, 2020 mortality is contrasted with pre-pandemic rates in small areas in Limburg (Belgium). Data are modeled using BYM and BYM2, with three broadening queen-neighborhood structures up to the fifth-order neighbors and two weight schemes. A simulation study assesses model performance in reproducing the pairwise spatial correlation at different neighbor orders. Models are compared regarding WAIC, goodness-of-fit, parameter estimates, and computation time. Results show that the order-based weight matrix performs better than the binary matrix. The simple first-order neighborhood structure shows comparable performance to larger higher-order structures while requiring much less computation time. The BYM model is more impacted by the choice of the neighborhood as compared to the BYM2 model. Our findings suggest minimal advantages in employing higher-order neighborhood matrices. In conclusion, our study indicates that opting for a simple first-order neighborhood structure is a pragmatic and suitable choice when applying a conditional auto-regressive model to fine-scale data in Bayesian disease mapping.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 3","pages":"537-553"},"PeriodicalIF":1.1,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12954808/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147354953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian biomarker effect estimate for combining data from multiple biomarker studies.","authors":"Zhiwei Rong, Jiali Song, Fengyu Sun, Chenbo Zhang, Lan Mi, Yuqin Song, Yan Hou","doi":"10.1080/02664763.2025.2528362","DOIUrl":"https://doi.org/10.1080/02664763.2025.2528362","url":null,"abstract":"<p><p>Pooling data from multiple studies enhances statistical power and precision for quantifying biomarker-disease associations. However, inter-study variability in biomarker measurements exists, requiring calibration to a reference assay to standardize biomarker data across contributing studies before pooling. In this study, we develop a novel Bayesian Biomarker Pooling (BBP) method to aggregate biomarker data from multiple study sources, which considers the reference measurements of biospecimens that have not been re-assayed as unobservable latent variables. We establish a two-level model of studies and biospecimens to delineate the relationships among reference measurements, local measurements, and outcomes. Furthermore, we compare the proposed BBP method with several prevalent methodologies: the internalized method, the full calibration method, the two-stage method, the naïve method, and the x-only method. Our results demonstrate that the BBP method outperforms the other methods. This advantage is particularly pronounced in scenarios involving high noise and strong effects. As an illustrative example, we apply these methods in a pooling analysis to evaluate the association between Human Epidermal Growth Factor Receptor 2 (HER2) gene expression levels and breast cancer risk. The full package is available online at https://github.com/luyiyun/bayesian_biomark_pooling.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 4","pages":"614-632"},"PeriodicalIF":1.1,"publicationDate":"2025-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981262/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147468167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ridge-penalized Zero-Inflated Probit Bell model for multicollinearity in count data.","authors":"Essoham Ali, Adewale F Lukman","doi":"10.1080/02664763.2025.2530551","DOIUrl":"https://doi.org/10.1080/02664763.2025.2530551","url":null,"abstract":"<p><p>This article develops a ridge estimator for the Zero-Inflated Probit Bell (ZIPBell) regression model. The ZIPBell model adapts the Zero-Inflated Bell (ZIBell) model originally proposed by Lemonte et al. (2019) by employing a probit link function for the zero-inflation component. Our contribution lies in incorporating ridge penalization into this framework, providing a methodology that stabilizes parameter estimates by reducing variance and mitigating multicollinearity effects without excluding correlated predictors. A numerical study and an empirical application illustrate the robustness of this approach across varying levels of multicollinearity and data sparsity, offering a reliable tool for analyzing complex count data with structural zeros and correlated predictors.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 4","pages":"633-658"},"PeriodicalIF":1.1,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981273/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147468118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient spline orthogonal basis for representation of density functions.","authors":"Jana Burkotová, Ivana Pavlů, Hiba Nassar, Jitka Machalová, Karel Hron","doi":"10.1080/02664763.2025.2532621","DOIUrl":"https://doi.org/10.1080/02664763.2025.2532621","url":null,"abstract":"<p><p>Probability density functions form a specific class of functional data objects with intrinsic properties of scale invariance and relative scale characterized by the unit integral constraint. The Bayes spaces methodology respects their specific nature, and the centred log-ratio transformation enables processing such functional data in the standard Lebesgue space of square-integrable functions. As the data representing densities are frequently observed in their discrete form, the focus has been on their spline representation. Therefore, the crucial step in the approximation is to construct a proper spline basis reflecting their specific properties. Since the centred log-ratio transformation forms a subspace of functions with a zero integral constraint, the standard <i>B</i>-spline basis is no longer suitable. Recently, a new spline basis incorporating this zero integral property, called <math><mi>Z</mi> <mspace></mspace> <mi>B</mi></math> -splines, was developed. However, this basis does not possess the orthogonal property, which is beneficial from the computational and application points of view. In this paper, we describe an efficient method for constructing an orthogonal <math><mi>Z</mi> <mspace></mspace> <mi>B</mi></math> -splines basis, called <math><mi>Z</mi> <mspace></mspace> <mi>B</mi></math> -splinets. The advantages of the <math><mi>Z</mi> <mspace></mspace> <mi>B</mi></math> -splinet approach are, foremost, computational efficiency and locality of basis supports, which is desirable for data interpretability, e.g. in the context of functional principal component analysis. The proposed approach is demonstrated on two empirical datasets.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 4","pages":"673-709"},"PeriodicalIF":1.1,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12985404/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147468120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An improved LDA dimension reduction algorithm for multivariate time series classification.","authors":"Hongyu Zhou, Yunling Kang, Guidong Liu, Guoqiao You","doi":"10.1080/02664763.2025.2530580","DOIUrl":"https://doi.org/10.1080/02664763.2025.2530580","url":null,"abstract":"<p><p>In recent years, multivariate time series (MTS) classification has gradually become a research hotspot. However, due to the high-dimensional nature of MTS, directly classifying them often leads to suboptimal results. As a result, existing methods typically apply dimension reduction to the MTS dataset before classification. But the traditional MTS dimension reduction methods often lead to significant information redundancy or loss when dealing with unequal-length MTS datasets. To minimize information loss, this paper proposes a novel extraction method that helps transform unequal-length MTS datasets into equal-length MTS datasets. Furthermore, since existing dimension reduction methods ignore the fact that different MTS may have the same feature points at different time moments, this paper proposes a supervised dimension reduction method based on Linear Discriminant Analysis (LDA). This method aims to find the projection plane at each time point that minimizes the within-class scatter and maximizes the between-class scatter, thereby improving the effectiveness of dimension reduction. Experiments were conducted on 16 publicly available datasets. The results show that the proposed method effectively enhances classification performance after dimension reduction, achieving good experimental results.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 4","pages":"659-672"},"PeriodicalIF":1.1,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981257/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147468179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}