{"title":"Ranking handball teams from statistical strength estimation","authors":"Florian Felice","doi":"10.1007/s00180-024-01522-0","DOIUrl":"https://doi.org/10.1007/s00180-024-01522-0","url":null,"abstract":"<p>In this work, we present a methodology to estimate the strength of handball teams. We propose the use of the Conway-Maxwell-Poisson distribution to model the number of goals scored by a team as a flexible discrete distribution which can handle situations of non equi-dispersion. From its parameters, we derive a mathematical formula to determine the strength of a team. We propose a ranking based on the estimated strengths to compare teams across different championships. Applied to female handball club data from European competitions over the 2022/2023 season, we show that our new proposed ranking can have an echo in real sports events and is linked to recent results from European competitions.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"24 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141532487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypothesis testing in Cox models when continuous covariates are dichotomized: bias analysis and bootstrap-based test","authors":"Hyunman Sim, Sungjeong Lee, Bo-Hyung Kim, Eun Shin, Woojoo Lee","doi":"10.1007/s00180-024-01520-2","DOIUrl":"https://doi.org/10.1007/s00180-024-01520-2","url":null,"abstract":"<p>Hypothesis testing for the regression coefficient associated with a dichotomized continuous covariate in a Cox proportional hazards model has been considered in clinical research. Although most existing testing methods do not allow covariates, except for a dichotomized continuous covariate, they have generally been applied. Through an analytic bias analysis and a numerical study, we show that the current practice is not free from an inflated type I error and a loss of power. To overcome this limitation, we develop a bootstrap-based test that allows additional covariates and dichotomizes two-dimensional covariates into a binary variable. In addition, we develop an efficient algorithm to speed up the calculation of the proposed test statistic. Our numerical study demonstrates that the proposed bootstrap-based test maintains the type I error well at the nominal level and exhibits higher power than other methods, as well as that the proposed efficient algorithm reduces computational costs.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"28 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trend of high dimensional time series estimation using low-rank matrix factorization: heuristics and numerical experiments via the TrendTM package","authors":"Emilie Lebarbier, Nicolas Marie, Amélie Rosier","doi":"10.1007/s00180-024-01519-9","DOIUrl":"https://doi.org/10.1007/s00180-024-01519-9","url":null,"abstract":"<p>This article focuses on the practical issue of a recent theoretical method proposed for trend estimation in high dimensional time series. This method falls within the scope of the low-rank matrix factorization methods in which the temporal structure is taken into account. It consists of minimizing a penalized criterion, theoretically efficient but which depends on two constants to be chosen in practice. We propose a two-step strategy to solve this question based on two different known heuristics. The performance and a comparison of the strategies are studied through an important simulation study in various scenarios. In order to make the estimation method with the best strategy available to the community, we implemented the method in an R package <span>TrendTM</span> which is presented and used here. Finally, we give a geometric interpretation of the results by linking it to PCA and use the results to solve a high-dimensional curve clustering problem. The package is available on CRAN.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"3 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141530231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some aspects of nonlinear dimensionality reduction","authors":"Liwen Wang, Yongda Wang, Shifeng Xiong, Jiankui Yang","doi":"10.1007/s00180-024-01514-0","DOIUrl":"https://doi.org/10.1007/s00180-024-01514-0","url":null,"abstract":"<p>In this paper we discuss nonlinear dimensionality reduction within the framework of principal curves. We formulate dimensionality reduction as problems of estimating principal subspaces for both noiseless and noisy cases, and propose the corresponding iterative algorithms that modify existing principal curve algorithms. An R squared criterion is introduced to estimate the dimension of the principal subspace. In addition, we present new regression and density estimation strategies based on our dimensionality reduction algorithms. Theoretical analyses and numerical experiments show the effectiveness of the proposed methods.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"202 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Double truncation method for controlling local false discovery rate in case of spiky null","authors":"Shinjune Kim, Youngjae Oh, Johan Lim, DoHwan Park, Erin M. Green, Mark L. Ramos, Jaesik Jeong","doi":"10.1007/s00180-024-01510-4","DOIUrl":"https://doi.org/10.1007/s00180-024-01510-4","url":null,"abstract":"<p>Many multiple test procedures, which control the false discovery rate, have been developed to identify some cases (e.g. genes) showing statistically significant difference between two different groups. However, a common issue encountered in some practical data sets is the presence of highly spiky null distributions. Existing methods struggle to control type I error in such cases due to the “inflated false positives,” but this problem has not been addressed in previous literature. Our team recently encountered this issue while analyzing SET4 gene deletion data and proposed modeling the null distribution using a scale mixture normal distribution. However, the use of this approach is limited due to strong assumptions on the spiky peak. In this paper, we present a novel multiple test procedure that can be applied to any type of spiky peak data, including situations with no spiky peak or with one or two spiky peaks. Our approach involves truncating the central statistics around 0, which primarily contribute to the null spike, as well as the two tails that may be contaminated by alternative distributions. We refer to this method as the “double truncation method.” After applying double truncation, we estimate the null density using the doubly truncated maximum likelihood estimator. We demonstrate numerically that our proposed method effectively controls the false discovery rate at the desired level using simulated data. Furthermore, we apply our method to two real data sets, namely the SET protein data and peony data.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"25 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymptotic properties of kernel density and hazard rate function estimators with censored widely orthant dependent data","authors":"Yi Wu, Wei Wang, Wei Yu, Xuejun Wang","doi":"10.1007/s00180-024-01509-x","DOIUrl":"https://doi.org/10.1007/s00180-024-01509-x","url":null,"abstract":"<p>Kernel estimators of density function and hazard rate function are very important in nonparametric statistics. The paper aims to investigate the uniformly strong representations and the rates of uniformly strong consistency for kernel smoothing density and hazard rate function estimation with censored widely orthant dependent data based on the Kaplan–Meier estimator. Under some mild conditions, the rates of the remainder term and strong consistency are shown to be <span>\\(O\\big (\\sqrt{\\log (ng(n))/\\big (nb_{n}^{2}\\big )}\\big )~a.s.\\)</span> and <span>\\(O\\big (\\sqrt{\\log (ng(n))/\\big (nb_{n}^{2}\\big )}\\big )+O\\big (b_{n}^{2}\\big )~a.s.\\)</span>, respectively, where <i>g</i>(<i>n</i>) are the dominating coefficients of widely orthant dependent random variables. Some numerical simulations and a real data analysis are also presented to confirm the theoretical results based on finite sample performances.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"128 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expectile regression averaging method for probabilistic forecasting of electricity prices","authors":"Joanna Janczura","doi":"10.1007/s00180-024-01508-y","DOIUrl":"https://doi.org/10.1007/s00180-024-01508-y","url":null,"abstract":"<p>In this paper we propose a new method for probabilistic forecasting of electricity prices. It is based on averaging point forecasts from different models combined with expectile regression. We show that deriving the predicted distribution in terms of expectiles, might be in some cases advantageous to the commonly used quantiles. We apply the proposed method to the day-ahead electricity prices from the German market and compare its accuracy with the Quantile Regression Averaging method and quantile- as well as expectile-based historical simulation. The obtained results indicate that using the expectile regression improves the accuracy of the probabilistic forecasts of electricity prices, but a variance stabilizing transformation should be applied prior to modelling.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"28 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141165757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Projection predictive variable selection for discrete response families with finite support","authors":"Frank Weber, Änne Glass, Aki Vehtari","doi":"10.1007/s00180-024-01506-0","DOIUrl":"https://doi.org/10.1007/s00180-024-01506-0","url":null,"abstract":"<p>The projection predictive variable selection is a decision-theoretically justified Bayesian variable selection approach achieving an outstanding trade-off between predictive performance and sparsity. Its projection problem is not easy to solve in general because it is based on the Kullback–Leibler divergence from a restricted posterior predictive distribution of the so-called reference model to the parameter-conditional predictive distribution of a candidate model. Previous work showed how this projection problem can be solved for response families employed in generalized linear models and how an approximate latent-space approach can be used for many other response families. Here, we present an exact projection method for all response families with discrete and finite support, called the augmented-data projection. A simulation study for an ordinal response family shows that the proposed method performs better than or similarly to the previously proposed approximate latent-space projection. The cost of the slightly better performance of the augmented-data projection is a substantial increase in runtime. Thus, if the augmented-data projection’s runtime is too high, we recommend the latent projection in the early phase of the model-building workflow and the augmented-data projection for final results. The ordinal response family from our simulation study is supported by both projection methods, but we also include a real-world cancer subtyping example with a nominal response family, a case that is not supported by the latent projection.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"42 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141165753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient regression analyses with zero-augmented models based on ranking","authors":"Deborah Kanda, Jingjing Yin, Xinyan Zhang, Hani Samawi","doi":"10.1007/s00180-024-01503-3","DOIUrl":"https://doi.org/10.1007/s00180-024-01503-3","url":null,"abstract":"<p>Several zero-augmented models exist for estimation involving outcomes with large numbers of zero. Two of such models for handling count endpoints are zero-inflated and hurdle regression models. In this article, we apply the extreme ranked set sampling (ERSS) scheme in estimation using zero-inflated and hurdle regression models. We provide theoretical derivations showing superiority of ERSS compared to simple random sampling (SRS) using these zero-augmented models. A simulation study is also conducted to compare the efficiency of ERSS to SRS and lastly, we illustrate applications with real data sets.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"5 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140935059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exact and approximate computation of the scatter halfspace depth","authors":"Xiaohui Liu, Yuzi Liu, Petra Laketa, Stanislav Nagy, Yuting Chen","doi":"10.1007/s00180-024-01500-6","DOIUrl":"https://doi.org/10.1007/s00180-024-01500-6","url":null,"abstract":"<p>The scatter halfspace depth (<b>sHD</b>) is an extension of the location halfspace (also called Tukey) depth that is applicable in the nonparametric analysis of scatter. Using <b>sHD</b>, it is possible to define minimax optimal robust scatter estimators for multivariate data. The problem of exact computation of <b>sHD</b> for data of dimension <span>\\(d \\ge 2\\)</span> has, however, not been addressed in the literature. We develop an exact algorithm for the computation of <b>sHD</b> in any dimension <i>d</i> and implement it efficiently for any dimension <span>\\(d \\ge 1\\)</span>. Since the exact computation of <b>sHD</b> is slow especially for higher dimensions, we also propose two fast approximate algorithms. All our programs are freely available in the <span>R</span> package <span>scatterdepth</span>.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"43 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140942041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}