{"title":"Model-based clustering via new parsimonious mixtures of heavy-tailed distributions","authors":"Salvatore D. Tomarchio, Luca Bagnato, Antonio Punzo","doi":"10.1007/s10182-021-00430-8","DOIUrl":"10.1007/s10182-021-00430-8","url":null,"abstract":"<div><p>Two families of parsimonious mixture models are introduced for model-based clustering. They are based on two multivariate distributions, the shifted exponential normal and the tail-inflated normal, recently introduced in the literature as heavy-tailed generalizations of the multivariate normal. Parsimony is attained by the eigen-decomposition of the component scale matrices, as well as by the imposition of a constraint on the tailedness parameters. Identifiability conditions are also provided. Two variants of the expectation-maximization algorithm are presented for maximum likelihood parameter estimation. Parameter recovery and clustering performance are investigated via a simulation study. Comparisons with the unconstrained mixture models are obtained as a by-product. A further simulated analysis is conducted to assess how sensitive our models and some well-established parsimonious competitors are to their own generative scheme. Lastly, our models and the competing models are evaluated in terms of fitting and clustering on three real datasets.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50025982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diagnostic checking of multiple imputation models","authors":"Yang Zhao","doi":"10.1007/s10182-021-00429-1","DOIUrl":"10.1007/s10182-021-00429-1","url":null,"abstract":"<div><p>Model checking in multiple imputation (MI, Rubin in Multiple imputation for nonresponse in surveys, Wiley, New York, 1987) becomes increasingly important with the recent developments in MI and its widespread use in statistical analysis with missing data (e.g. van Buuren et al. in J Stat Comput Simul 76(12):1049–1064, 2006; van Buuren and Groothuis-Oudshoorn in J Stat Soft 45(3):1–67, 2011; Chen et al. in Biometrics 67:799–809, 2011; Nguyen et al. in Emerg Themes Epidemiol 14(8):1–12, 2017). The currently recommended posterior predictive checking method (He and Zaslavsky in Stat Med 31:1–18, 2012; Nguyen et al. in Biom J 4:676–694, 2015) is less effective when the proportion of missing values increases and its produced posterior predictive <i>p</i> value is not supported by a null distribution as a standard <i>p</i> value (Meng in Annu Stat 22:1142–1160, 1994). This research develops a new diagnostic method for checking MI models and proposes a test statistic with a standard <i>p</i> value. The new diagnostic checking method is effective and flexible. It does not depend on the proportion of missing values and can deal with data sets with arbitrary nonmonotone missing data patterns. We examine the performance of the proposed method in a simulation study and illustrate the method in a study of coronary disease and associated factors.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2022-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-021-00429-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50050052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A spatial randomness test based on the box-counting dimension","authors":"Yolanda Caballero, Ramón Giraldo, Jorge Mateu","doi":"10.1007/s10182-021-00434-4","DOIUrl":"10.1007/s10182-021-00434-4","url":null,"abstract":"<div><p>Statistical modelling of a spatial point pattern often begins by testing the hypothesis of spatial randomness. Classical tests are based on quadrat counts and distance-based methods. Alternatively, we propose a new statistical test of spatial randomness based on the fractal dimension, calculated through the box-counting method, providing an inferential perspective in contrast to the more common descriptive use of this method. We also develop a graphical test based on the log–log plot used to calculate the box-counting dimension. We evaluate the performance of our methodology by conducting a simulation study and analysing a COVID-19 dataset. The results reinforce the good performance of the method, which arises as an alternative to the more classical distance-based strategies.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2022-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-021-00434-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39809141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An integrated local depth measure","authors":"Lucas Fernandez-Piana, Marcela Svarc","doi":"10.1007/s10182-021-00424-6","DOIUrl":"10.1007/s10182-021-00424-6","url":null,"abstract":"<div><p>We introduce the Integrated Dual Local Depth, which is a local depth measure for data in a Banach space based on the use of one-dimensional projections. The properties of a depth measure are analyzed under this setting and a proper definition of local symmetry is given. Moreover, strong consistency results are obtained for the local depth and for local depth regions. Finally, applications to descriptive data analysis and classification are presented, with a special focus on multivariate functional data, where we obtain very promising results.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2022-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-021-00424-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50010270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the causal treatment effect estimation with propensity scores by the bootstrap","authors":"Maeregu W. Arisido, Fulvia Mecatti, Paola Rebora","doi":"10.1007/s10182-021-00427-3","DOIUrl":"10.1007/s10182-021-00427-3","url":null,"abstract":"<div><p>When observational studies are used to establish the causal effects of treatments, the estimated effect is affected by treatment selection bias. The inverse propensity score weight (IPSW) is often used to deal with such bias. However, IPSW requires strong assumptions, and their misspecification and strategies to correct it have rarely been studied. We present a bootstrap bias correction of IPSW (BC-IPSW) to improve the performance of the propensity score in dealing with treatment selection bias when the ignorability and overlap assumptions fail. The approach was motivated by a real observational study to explore the potential of anticoagulant treatment for reducing mortality in patients with end-stage renal disease. The benefit of the treatment in enhancing survival was demonstrated; the suggested BC-IPSW method indicated a statistically significant reduction in mortality for patients receiving the treatment. Using extensive simulations, we show that BC-IPSW substantially reduces the bias due to the misspecification of the ignorability and overlap assumptions. Further, we show that IPSW is still useful to account for the lack of treatment randomization, but its advantages are strictly tied to the satisfaction of ignorability, indicating that the existence of relevant though unmeasured or unused covariates can worsen the selection bias.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2021-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-021-00427-3.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46055305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of sports injuries in football: a recurrent time-to-event approach using regularized Cox models","authors":"Lore Zumeta-Olaskoaga, Maximilian Weigert, Jon Larruskain, Eder Bikandi, Igor Setuain, Josean Lekue, Helmut Küchenhoff, Dae-Jin Lee","doi":"10.1007/s10182-021-00428-2","DOIUrl":"10.1007/s10182-021-00428-2","url":null,"abstract":"<div><p>Data-based methods and statistical models receive special attention in the study of sports injuries to gain an in-depth understanding of their risk factors and mechanisms. The objective of this work is to evaluate the use of shared frailty Cox models for the prediction of sports injuries, and to compare their performance with different sets of variables selected by several regularized variable selection approaches. The study is motivated by specific characteristics commonly found in sports injury data, which usually include a small sample size and an even smaller number of injuries, coupled with a large number of potentially influential variables. Hence, we conduct a simulation study to address these statistical challenges and to explore regularized Cox model strategies together with shared frailty models in different controlled situations. We show that predictive performance greatly improves as more player observations are available. Methods that result in sparse models and favour interpretability, e.g. Best Subset Selection and Boosting, are preferred when the sample size is small. We include a real case study of injuries among female football players at a Spanish football club.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2021-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-021-00428-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46564718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Small area estimation of socioeconomic indicators for sampled and unsampled domains","authors":"Jan Pablo Burgard, Domingo Morales, Anna-Lena Wölwer","doi":"10.1007/s10182-021-00426-4","DOIUrl":"10.1007/s10182-021-00426-4","url":null,"abstract":"<div><p>Socioeconomic indicators play a crucial role in monitoring political actions over time and across regions. Income-based indicators such as the median income of sub-populations can provide information on the impact of measures, e.g., on poverty reduction. Regional information is usually published on an aggregated level. Due to small sample sizes, these regional aggregates are often associated with large standard errors or are missing if the region is unsampled or the estimate is simply not published. For example, if the median income of Hispanic or Latino Americans from the American Community Survey is of interest, some county-year combinations are not available. Therefore, a comparison of different counties or time points is not always possible. We propose a new predictor based on small area estimation techniques for aggregated data and bivariate modeling. This predictor provides empirical best predictions for the partially unavailable county-year combinations. We provide an analytical approximation to the mean squared error. The theoretical findings are backed up by a large-scale simulation study. Finally, we return to the problem of obtaining the county-year estimates for the median income of Hispanic or Latino Americans and externally validate the estimates.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2021-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-021-00426-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50038566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introducing LASSO-type penalisation to generalised joint regression modelling for count data","authors":"Hendrik van der Wurp, Andreas Groll","doi":"10.1007/s10182-021-00425-5","DOIUrl":"10.1007/s10182-021-00425-5","url":null,"abstract":"<div><p>In this work, we propose an extension of the versatile joint regression framework for bivariate count responses of the <span>R</span> package <span>GJRM</span> by Marra and Radice (R package version 0.2-3, 2020) by incorporating an (adaptive) LASSO-type penalty. The underlying estimation algorithm is based on a quadratic approximation of the penalty. The method enables variable selection and the corresponding estimates guarantee shrinkage and sparsity. Hence, this approach is particularly useful in high-dimensional count response settings. The proposal’s empirical performance is investigated in a simulation study and an application to FIFA World Cup football data.</p></div>","PeriodicalId":55446,"journal":{"name":"Asta-Advances in Statistical Analysis","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2021-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10182-021-00425-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44427263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}