{"title":"A sequential feature selection approach to change point detection in mean-shift change point models","authors":"","doi":"10.1007/s00362-024-01548-y","DOIUrl":"https://doi.org/10.1007/s00362-024-01548-y","url":null,"abstract":"<h3>Abstract</h3> <p>Change point detection is an important area of scientific research and has applications in a wide range of fields. In this paper, we propose a sequential change point detection (SCPD) procedure for mean-shift change point models. Unlike classical feature selection based approaches, the SCPD method detects change points in the order of the conditional change sizes and makes full use of the information from the change points already identified. The extended Bayesian information criterion (EBIC) is employed as the stopping rule in the SCPD procedure. We investigate the theoretical properties of the procedure and compare its performance with that of other methods in the literature. It is established that the SCPD procedure has the property of detection consistency. Simulation studies and real data analyses demonstrate that the SCPD procedure has the edge over the other methods in terms of detection accuracy and robustness.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"33 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypothesis testing for varying coefficient models in tail index regression","authors":"Koki Momoki, Takuma Yoshida","doi":"10.1007/s00362-024-01538-0","DOIUrl":"https://doi.org/10.1007/s00362-024-01538-0","url":null,"abstract":"<p>This study examines the varying coefficient model in tail index regression. The varying coefficient model is an efficient semiparametric model that avoids the curse of dimensionality when many covariates are included in the model. It is useful in mean, quantile, and other regressions, and tail index regression is no exception. However, although the varying coefficient model is flexible, leaner and simpler models are preferred in applications. Therefore, it is important to evaluate whether the estimated coefficient function varies significantly with covariates. If the effect of the non-linearity of the model is weak, the varying coefficient structure reduces to a simpler model, such as a constant or zero. Accordingly, hypothesis tests for model assessment in the varying coefficient model have been discussed in mean and quantile regression. However, no such results exist for tail index regression. In this study, we investigate the asymptotic properties of an estimator and provide a hypothesis testing method for varying coefficient models in tail index regression.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"41 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimum contrast for the first-order intensity estimation of spatial and spatio-temporal point processes","authors":"Nicoletta D’Angelo, Giada Adelfio","doi":"10.1007/s00362-024-01541-5","DOIUrl":"https://doi.org/10.1007/s00362-024-01541-5","url":null,"abstract":"<p>In this paper, we harness a result in point process theory, specifically the expectation of the weighted <i>K</i>-function, where the weighting is done by the true first-order intensity function. This theoretical result can be employed as an estimation method to derive parameter estimates for a particular model assumed for the data. The underlying motivation is to avoid the difficulties of handling and maximizing complex likelihoods in point process models. The exploited result makes our method theoretically applicable to any model specification. In this paper, we restrict our study to Poisson models, whose likelihood is the basis for many more complex point process models. In this context, our proposed method can estimate the vector of local parameters that correspond to the points within the analyzed point pattern without introducing any additional complexity compared to global estimation. We illustrate the method through simulation studies for both purely spatial and spatio-temporal point processes and show complex scenarios based on the Poisson model through the analysis of two real datasets concerning environmental problems.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"20 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140883224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The resampling method via representative points","authors":"Long-Hao Xu, Yinan Li, Kai-Tai Fang","doi":"10.1007/s00362-024-01536-2","DOIUrl":"https://doi.org/10.1007/s00362-024-01536-2","url":null,"abstract":"<p>The bootstrap method relies on resampling from the empirical distribution to provide inferences about the population with a distribution <i>F</i>. The empirical distribution serves as an approximation to the population. It is possible, however, to resample from another approximating distribution of <i>F</i> to conduct simulation-based inferences. In this paper, we utilize representative points to form an alternative approximating distribution of <i>F</i> for resampling. Representative points that minimize the mean squared error from <i>F</i> have been widely applied to numerical integration, simulation, and problems of grouping, quantization, and classification. The method of resampling via representative points can be used to estimate the sampling distribution of a statistic of interest. A basic theory for the proposed method is established. We prove the convergence of higher-order moments of the new approximating distribution of <i>F</i>, and establish the consistency of the sampling distribution approximation for the sample mean and sample variance under the Kolmogorov and Mallows–Wasserstein metrics. Numerical studies show that the proposed resampling method improves on the nonparametric bootstrap in terms of confidence intervals for the mean and variance.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"84 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An heuristic scree plot criterion for the number of factors","authors":"","doi":"10.1007/s00362-023-01517-x","DOIUrl":"https://doi.org/10.1007/s00362-023-01517-x","url":null,"abstract":"<h3>Abstract</h3> <p>Cattell’s (Multivar Behav Res 1:245–276, 1966) heuristic determines the number of factors as the elbow point between ‘steep’ and ‘not steep’ in the scree plot. In contrast, an elbow is by definition absent in points on a hyperbola with corresponding equisized surfaces. We formalize this heuristic and propose a criterion to determine the number of factors by comparing surfaces under the scree plot. Monte Carlo simulations show that our proposed criterion has better finite-sample properties than benchmarks in the dynamic factor model literature.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"44 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A semi-orthogonal nonnegative matrix tri-factorization algorithm for overlapping community detection","authors":"Zhaoyang Li, Yuehan Yang","doi":"10.1007/s00362-024-01537-1","DOIUrl":"https://doi.org/10.1007/s00362-024-01537-1","url":null,"abstract":"<p>In this paper, we focus on overlapping community detection and propose an efficient semi-orthogonal nonnegative matrix tri-factorization (semi-ONMTF) algorithm. This method factorizes a matrix <i>X</i> into an orthogonal matrix <i>U</i>, a nonnegative matrix <i>B</i>, and the transposed matrix <i>U</i><sup>T</sup>. We use the Cayley transformation to maintain strict orthogonality of <i>U</i>, so that each iterate stays on the Stiefel manifold. This algorithm is computationally efficient because the solutions of <i>U</i> and <i>B</i> are simplified into a matrix-wise update algorithm. Applying this method, we detect overlapping communities using the belonging coefficient vector and analyse associations between communities using the unweighted network of communities. Simulations and applications show that the proposed method has wide applicability. In a real data example, we apply the semi-ONMTF to a stock data set and construct a directed association network of companies. Based on the modularity for directed and overlapping communities, we obtain five overlapping communities, 17 overlapping nodes, and five outlier nodes in the network. We also discuss the associations between communities, providing insights into overlapping community detection on the stock market network.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"395 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical simulations with LR random fuzzy numbers","authors":"Abbas Parchami, Przemyslaw Grzegorzewski, Maciej Romaniuk","doi":"10.1007/s00362-024-01533-5","DOIUrl":"https://doi.org/10.1007/s00362-024-01533-5","url":null,"abstract":"<p>Computer simulations are a powerful tool in many fields of research. This also applies to the broadly understood analysis of experimental data, which are frequently burdened with multiple imperfections. Often the underlying imprecision or vagueness can be suitably described in terms of fuzzy numbers, which also make it possible to capture subjectivity. On the other hand, because experimental data are random, the tools used to describe them must take their statistical nature into account. In this way, we arrive at random fuzzy numbers, which model fuzzy data and are also solidly formalized within a probabilistic setting. In this contribution, we introduce the so-called LR random fuzzy numbers that can be used in various Monte-Carlo simulations on fuzzy data. The proposed method of generating fuzzy numbers with membership functions given by probability densities is both simple and rich, well-grounded mathematically, and has a high application potential.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"23 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140070462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimax weight learning for absorbing MDPs","authors":"Fengying Li, Yuqiang Li, Xianyi Wu","doi":"10.1007/s00362-023-01491-4","DOIUrl":"https://doi.org/10.1007/s00362-023-01491-4","url":null,"abstract":"<p>Reinforcement learning policy evaluation problems are often modeled as finite or discounted/averaged infinite-horizon Markov Decision Processes (MDPs). In this paper, we study undiscounted off-policy evaluation for absorbing MDPs. Given a dataset consisting of i.i.d. episodes under a given truncation level, we propose an algorithm (referred to as MWLA in the text) to directly estimate the expected return via the importance ratio of the state-action occupancy measure. A Mean Square Error (MSE) bound for the MWLA method is provided, and the dependence of the statistical errors on the data size and the truncation level is analyzed. The performance of the algorithm is illustrated by means of computational experiments in an episodic taxi environment.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"43 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140045510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Homogeneity tests and interval estimations of risk differences for stratified bilateral and unilateral correlated data","authors":"Shuyi Liang, Kai-Tai Fang, Xin-Wei Huang, Yijing Xin, Chang-Xing Ma","doi":"10.1007/s00362-024-01532-6","DOIUrl":"https://doi.org/10.1007/s00362-024-01532-6","url":null,"abstract":"<p>In clinical trials studying paired parts of a subject with binary outcomes, measurements are expected to be collected bilaterally. However, there are cases where subjects contribute measurements for only one part. By utilizing combined data, it is possible to gain additional information compared to using bilateral or unilateral data alone. With the combined data, this article investigates homogeneity tests of risk differences in the presence of stratification effects and proposes interval estimations of a common risk difference if stratification does not introduce underlying dissimilarities. Under Dallal’s model (Biometrics 44:253–257, 1988), we propose three test statistics and evaluate their type I error control and power. Confidence intervals of a common risk difference with satisfactory coverage probabilities and interval lengths are constructed. Our simulation results show that the score test is the most robust and the profile likelihood confidence interval outperforms the other methods proposed. Data from a study of acute otitis media are used to illustrate our proposed procedures.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"55 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140033154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Welch’s t test is more sensitive to real world violations of distributional assumptions than student’s t test but logistic regression is more robust than either","authors":"David Curtis","doi":"10.1007/s00362-024-01531-7","DOIUrl":"https://doi.org/10.1007/s00362-024-01531-7","url":null,"abstract":"<p>It has previously been pointed out that Student’s <i>t</i> test, which assumes that samples are drawn from populations with equal standard deviations, can have an inflated Type I error rate if this assumption is violated. Hence it has been recommended that Welch’s <i>t</i> test should be preferred. In the context of carrying out gene-wise weighted burden tests for detecting association of rare variants with psoriasis, we observe that Welch’s test performs unsatisfactorily. We show that if the assumption of normality is violated and observations follow a Poisson distribution, then with unequal sample sizes Welch’s <i>t</i> test has an inflated Type I error rate, is systematically biased and is prone to produce extremely low <i>p</i> values. We argue that such data can arise in a variety of real world situations and believe that researchers should be aware of this issue. Student’s <i>t</i> test performs much better in this scenario, but a likelihood ratio test based on logistic regression models performs better still, and we suggest that this might generally be a preferable method to test for a difference in distributions between two samples.</p><p>This research has been conducted using the UK Biobank Resource.</p>","PeriodicalId":51166,"journal":{"name":"Statistical Papers","volume":"239 ","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140037982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}