{"title":"On the robustness of truncated negative binomial regression model: application to field epidemiology.","authors":"Jiwoong Yu, Chanhee Kim, Jaeseong Oh, An-Shun Tai, Woojoo Lee","doi":"10.1080/02664763.2025.2545890","DOIUrl":"https://doi.org/10.1080/02664763.2025.2545890","url":null,"abstract":"<p><p>Truncated count data are often obtained from field investigations of individuals with health-related symptoms, conducted to identify the possible causes of food-borne outbreaks quickly and accurately. This study establishes two robustness properties of the truncated negative binomial (TNB) model. First, by characterizing the whole set of models that lead to the same likelihood function as the TNB model, we find that, in practice, the TNB model gives reliable inference for the regression coefficients even when zero inflation is allowed, although the regression coefficients must be interpreted carefully. Second, the TNB model can be derived from a Poisson distribution with a gamma-distributed random intercept; however, this distributional assumption for the random intercept is difficult to justify. We find that the TNB model provides inference for the slope parameters that is robust to misspecification of the random-effect distribution. Supported by analytic justification, our numerical study shows that the empirical coverage based on the TNB model is close to its nominal level, even when the random-effect distribution is misspecified. The TNB model is applied to analyze truncated count data from a food-borne outbreak that occurred in South Korea.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 6","pages":"1056-1074"},"PeriodicalIF":1.1,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13134751/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147815590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Change-point analysis using two-sample empirical likelihood method with applications to climatology.","authors":"Svetlana Aniskevich, Reinis Alksnis, Janis Valeinis, Lidija Dame","doi":"10.1080/02664763.2025.2543051","DOIUrl":"https://doi.org/10.1080/02664763.2025.2543051","url":null,"abstract":"<p><p>Change-point detection in time series analysis is the problem of identifying time points at which the properties of the data change. In this paper, we deal with detecting shifts in mean values for weakly dependent data. This covers a broad range of real-world problems, since real data may have a dependence structure that violates the assumptions of some popular statistical tests. For change-point detection, we establish the two-sample blockwise empirical likelihood for the difference between two sample means and propose its use. We recommend producing adjusted <i>p</i>-value graphs, which not only show statistical significance but also allow the location of the change-point to be detected graphically and numerically. In a simulation study, we compare the two-sample empirical likelihood method with several classical change-point detection methods and show its advantages for weakly dependent observations. Using historical wind speed observations from Latvia, we demonstrate the applicability of the proposed method to real data. The method has been implemented in the R package <i>EL</i>, which handles various two-sample problems.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 6","pages":"1029-1055"},"PeriodicalIF":1.1,"publicationDate":"2025-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13134753/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147815596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Order-of-addition experiments for sequential adjacency relationship problems.","authors":"Xinran Zhang, Ruonan Zheng, Min-Qian Liu, Jian-Feng Yang","doi":"10.1080/02664763.2025.2547801","DOIUrl":"https://doi.org/10.1080/02664763.2025.2547801","url":null,"abstract":"<p><p>Order-of-addition (OofA) experiments are widely used in diverse fields, such as industry and pharmacy. The two most commonly employed models are the pairwise ordering model and the component-position model. However, in certain experimental problems, the response depends only on the adjacency relationship (AR) between components, rather than on their absolute or relative positions. This is referred to as the AR problem. Among the different types of AR problems, spatial AR and sequential AR are frequently discussed, yet research on sequential AR remains rather limited. In this paper, we introduce OofAM, a method grounded in OofA experiments for tackling the sequential AR problem. The proposed method comprises a novel model, designs with certain theoretical properties, and analytical techniques for inferring the optimal orders. As a first attempt to apply OofA experiments to the sequential AR problem, OofAM is both straightforward and information-efficient. Moreover, case studies demonstrate that the proposed method outperforms other methods in terms of efficiency, especially for large-scale problems.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 6","pages":"1075-1097"},"PeriodicalIF":1.1,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13148098/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147838318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel two-sample joint unified hybrid censoring scheme with the application of insulating fluid data.","authors":"Subhankar Dutta, Deepak Prajapati, Debasis Kundu","doi":"10.1080/02664763.2025.2542423","DOIUrl":"https://doi.org/10.1080/02664763.2025.2542423","url":null,"abstract":"<p><p>In life testing, various censoring schemes are employed, primarily Type-I and Type-II censoring schemes and their modified forms. Most life-testing experiments are based on a single sample. When conducting comparative life tests of products from different production lines within the same facility, a joint censoring scheme is quite useful. In this article, a novel joint unified hybrid censoring scheme is proposed for two populations. Assuming that the lifetimes of both populations follow Weibull distributions, we provide the maximum likelihood estimators of the unknown parameters. Asymptotic confidence intervals for the parameters are constructed using the observed Fisher information matrix. Further, Bayes estimates are derived using informative gamma priors under symmetric and asymmetric loss functions and are computed with the Markov chain Monte Carlo method. The results indicate that the Bayes estimates perform very satisfactorily and outperform the other estimators. The expected test time is compared with that of another censoring scheme, and the proposed joint censoring scheme performs well. Finally, a real insulating fluid data set is analyzed to demonstrate the utility of the presented techniques.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 6","pages":"978-1003"},"PeriodicalIF":1.1,"publicationDate":"2025-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13134754/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147815521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A data-analytics framework for exploring regression associations in multivariate categorical data of firefighters' PTSD.","authors":"Saebom Jeon, Daeyoung Kim","doi":"10.1080/02664763.2025.2543043","DOIUrl":"https://doi.org/10.1080/02664763.2025.2543043","url":null,"abstract":"<p><p>We propose a data-analytics framework for exploratory research that aims at a comprehensive understanding of potential associations in survey data on firefighters' PTSD (Post-Traumatic Stress Disorder). The primary focus is to obtain insights, in a comprehensive and integrated manner, into the joint, marginal and conditional regression associations between an ordinal response variable, firefighters' PTSD, and a set of categorical risk factors. To achieve this goal, the proposed framework incorporates two established data-driven methodologies: the recently developed, non-model-based regression association measure SCCRAM (Scaled Checkerboard Copula Regression Association Measure) and resampling (bootstrap/permutation) methods. The former facilitates the identification of subsets of risk factors that more effectively account for the overall regression association with PTSD, while also elucidating the roles of the relevant risk factors in both marginal and conditional aspects. The latter provides valuable information on uncertainty and statistical significance, as well as on potential biases and the credibility of the estimated regression associations in multi-dimensional contingency tables, which are often sparse or imbalanced. Using the proposed approach, we find that disorder/mental-health-related factors have a more substantial association with PTSD, and that the relationship between demographic/job-related factors and PTSD becomes more pronounced after accounting for the disorder/mental health factors.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 6","pages":"1004-1028"},"PeriodicalIF":1.1,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13148093/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147838108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cardinality-based sparse singular value decomposition for similarity matrices.","authors":"Joseph Boccardo, William Tanberg, Jeffrey C Miecznikowski","doi":"10.1080/02664763.2025.2537120","DOIUrl":"https://doi.org/10.1080/02664763.2025.2537120","url":null,"abstract":"<p><p>Sparse decomposition methods have been studied in the context of principal component analysis (PCA). Many of these methods control the number of non-zero elements of the eigenvector by tuning one or more regularization parameters. Other approaches allow the cardinality to be chosen directly to create sparse eigenvectors. Because PCA is not applicable in all settings, such as the analysis of a cross-correlation matrix, we extend cardinality-based PCA to cardinality-based singular value decomposition (SVD). Our method allows the user to specify the desired cardinality of the left and right singular vectors of any continuous data matrix independently. This yields sparse singular vectors consisting of the most influential variables. In addition, we extend our method from a rank-1 SVD approximation to SVD approximations of rank greater than 1, producing left and right matrices of sparse singular vectors.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 5","pages":"751-777"},"PeriodicalIF":1.1,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13045202/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147622996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated identification of autocorrelated control chart patterns utilizing developed convolutional neural networks.","authors":"Soheila Nazari, Fatemeh Sogandi","doi":"10.1080/02664763.2025.2542412","DOIUrl":"https://doi.org/10.1080/02664763.2025.2542412","url":null,"abstract":"<p><p>Control Chart Pattern Recognition (CCPR) plays a crucial role in maintaining product quality. This paper presents recognition models that use a custom convolutional network alongside pretrained VGG19, MobileNet, and LeNet networks to identify Control Chart Patterns (CCPs) in autocorrelated processes. The proposed architectures extract features from the input data automatically, in contrast to conventional methods that require manual feature engineering. This study addresses the challenge of training deep networks with insufficient training data by employing transfer learning to fine-tune VGG19, MobileNet, and LeNet models for a new CCPR task. A comparison between the pretrained networks and the extended convolutional network (a 2D CNN that does not use transfer learning) indicates that the pretrained networks attain higher recognition accuracy with less training data. In addition, the pretrained VGG19 outperforms a 1D CNN and conventional machine learning techniques, highlighting the advantages of transfer learning. Using pretrained models also sidesteps the intricate design of the feature extraction component of deep networks with numerous hyperparameters. Comparative experimental findings and a real-world case study show that the proposed method has significant potential for identifying CCPs.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 6","pages":"959-977"},"PeriodicalIF":1.1,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13134752/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147815592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inference under balanced joint progressive type-II censoring scheme.","authors":"Kundan Singh, Chandrakant Lodhi, Yogesh Mani Tripathi, Liang Wang","doi":"10.1080/02664763.2025.2537130","DOIUrl":"https://doi.org/10.1080/02664763.2025.2537130","url":null,"abstract":"<p><p>In this paper, we consider a balanced joint progressive type-II censoring scheme and develop inference procedures for populations exhibiting bathtub-shaped hazard rates. Such models are appropriate for phenomena that exhibit non-monotone failure patterns. We focus on inference for the model parameters, taking the Chen distribution as the lifetime model. Point and interval estimation are considered using maximum likelihood and Bayesian methods. The existence and uniqueness of the maximum likelihood estimators are established. Furthermore, asymptotic confidence intervals and bootstrap-based intervals are constructed for the model parameters. Bayesian estimates and the corresponding highest posterior density intervals are obtained using an importance sampling technique under general prior assumptions. The performance of the Bayesian estimators is assessed and compared with that of the classical estimators through extensive Monte Carlo simulation studies. A real data example is also presented to demonstrate the practical applicability of the proposed methods.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 5","pages":"798-831"},"PeriodicalIF":1.1,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13045186/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147623150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unveiling topic dependencies through a multilevel topic model: a hierarchical approach to enhanced interpretability.","authors":"Youngsun Kim, Hwan Chung, Saebom Jeon","doi":"10.1080/02664763.2025.2540380","DOIUrl":"https://doi.org/10.1080/02664763.2025.2540380","url":null,"abstract":"<p><p>Topic modeling discovers key themes in unstructured text data by identifying the distributions of topics and words in documents, thereby revealing hidden dimensions. Latent Dirichlet allocation is a widely used generative probabilistic topic model, but it cannot capture dependence between topics. In general, the topics within a document are primarily influenced by its overarching theme, which naturally interrelates them. It is therefore essential to uncover such relationships between topics. To this end, this study proposes a multilevel topic model (MTM) that uncovers hidden topic dependence in a corpus through a multilevel latent structure. The MTM allows word-based topic proportions to vary across the higher-level latent structure. The parameters are estimated with a modified EM algorithm that uses an upward-downward approach to reduce computational complexity. Empirical studies applying the MTM to corpora were conducted, and the resulting hierarchy of the MTM was interpreted. These analyses demonstrate that the proposed MTM outperforms latent Dirichlet allocation in terms of systematic interpretability.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 5","pages":"894-913"},"PeriodicalIF":1.1,"publicationDate":"2025-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13045173/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147623157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting statistical interactions in immune receptor data: a comparative study.","authors":"Thomas Minotto, Ingrid Hobæk Haff, Enrico Riccardi, Geir K Sandve","doi":"10.1080/02664763.2025.2533483","DOIUrl":"https://doi.org/10.1080/02664763.2025.2533483","url":null,"abstract":"<p><p>Statistical interactions are part of numerous data-generating processes, and several methods have been developed to detect them. Here we study immune receptors binding to antigens, where advanced machine learning techniques have proved useful for binding prediction, suggesting significant interactions within amino acid chains. We reviewed detection methods based on logistic lasso, logic regression, random forests and neural networks. We compared their detection performance on simulated immune data and examined how it is affected by the order of the interactions, their strength relative to the main effects, their frequency of occurrence and the size of the data. Interactions were implanted as amino acid motifs that determined the binding status of sequences through a logistic regression model. The results show that pairwise interactions could be retrieved from datasets of just 1000 sequences, and that detection was optimal at an implantation rate of around 20 percent. For higher-order interactions, the best performance was obtained by logic regression and random forest-based methods. The neural network-based method had by far the lowest running time (several orders of magnitude lower), followed by the lasso-based methods. We applied the methods to an experimental dataset and identified several pairwise interactions as well as a three-way interaction, which enhanced the accuracy of the prediction models.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 4","pages":"729-750"},"PeriodicalIF":1.1,"publicationDate":"2025-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981274/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147468086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}