Stats, Pub Date: 2023-01-04, DOI: 10.3390/stats6010007
Magda Monteiro, M. Costa
{"title":"Change Point Detection by State Space Modeling of Long-Term Air Temperature Series in Europe","authors":"Magda Monteiro, M. Costa","doi":"10.3390/stats6010007","DOIUrl":"https://doi.org/10.3390/stats6010007","url":null,"abstract":"This work presents a statistical analysis of monthly average temperature time series in several European cities using a state space approach, which considers models with a deterministic seasonal component and a stochastic trend. Temperature rise rates in Europe seem to have increased in the last decades when compared with longer periods. Therefore, change point detection methods, both parametric and non-parametric, were applied to the standardized residuals of the state space models (or some other related component) in order to identify these possible changes in the monthly temperature rise rates. All of the methods used identified at least one change point in each of the temperature time series, particularly in the late 1980s or early 1990s. The differences in the average temperature trend are more evident in Eastern European cities than in Western Europe. The smoother-based t-test framework proposed in this work showed an advantage over the other methods, precisely because it considers the time correlation present in time series. Moreover, this framework focuses the change point detection on the stochastic trend component.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43119824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2023-01-01, DOI: 10.3390/stats6010006
Isa Muqattash, Jiaqiao Hu
{"title":"An ϵ-Greedy Multiarmed Bandit Approach to Markov Decision Processes","authors":"Isa Muqattash, Jiaqiao Hu","doi":"10.3390/stats6010006","DOIUrl":"https://doi.org/10.3390/stats6010006","url":null,"abstract":"We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of the ϵ-greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimate of the “reward-to-go” value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations against the SysAdmin benchmark problem with 2^10 states. The results show that REGA and RASA achieved similar performance. Moreover, REGA and RASA empirically outperformed an implementation of the algorithm that uses the “original” ϵ-greedy algorithm that commonly appears in the literature.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45930650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2022-12-29, DOI: 10.3390/stats6010005
V. Timiryanova, Dina Krasnoselskaya, N. Kuzminykh
{"title":"Applying the Multilevel Approach in Estimation of Income Population Differences","authors":"V. Timiryanova, Dina Krasnoselskaya, N. Kuzminykh","doi":"10.3390/stats6010005","DOIUrl":"https://doi.org/10.3390/stats6010005","url":null,"abstract":"Income inequality remains one of the most burning issues discussed in the world. The difficulty of the problem arises from its multiple manifestations at regional and local levels and unique patterns within countries. This paper employs a multilevel approach to identify factors that influence income and wage inequalities at regional and municipal scales in Russia. We carried out the study on data from 2017 municipalities of 75 Russian regions from 2015 to 2019. A Hierarchical Linear Model with Cross-Classified Random Effects (HLMHCM) allowed us to establish that most of the total variance in population income and average wages is accounted for at the regional scale. Our analysis revealed different variances of income per capita and average wage; we disclosed the reasons for these disparities. We also found a mixed relationship between income inequality and social transfers. These variables influence income growth but change the relationship between income and labour productivity. Our study underlined that the impacts of shares of employees in agriculture and manufacturing should be considered together with labour productivity in these industries.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43768845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2022-12-27, DOI: 10.3390/stats6010003
D. Politis, P. Tarassenko, V. Vasiliev
{"title":"Estimating Smoothness and Optimal Bandwidth for Probability Density Functions","authors":"D. Politis, P. Tarassenko, V. Vasiliev","doi":"10.3390/stats6010003","DOIUrl":"https://doi.org/10.3390/stats6010003","url":null,"abstract":"The properties of non-parametric kernel estimators for probability density functions from two special classes are investigated. Each class is parametrized by a distribution smoothness parameter. One of the classes was introduced by Rosenblatt; the other is introduced in this paper. For the case of a known smoothness parameter, the rates of mean square convergence of optimal (with respect to the bandwidth) density estimators are found. For the case of an unknown smoothness parameter, an estimation procedure for the parameter is developed and its almost sure convergence is proved. The convergence rates in the almost sure sense of these estimators are obtained. Adaptive estimators of densities from the given class on the basis of the constructed smoothness parameter estimators are presented. It is shown in examples how parameters of the adaptive density estimation procedures can be chosen. Non-asymptotic and asymptotic properties of these estimators are investigated. Specifically, the upper bounds for the mean square error of the adaptive density estimators for a fixed sample size are found and their strong consistency is proved. The convergence of these estimators in the almost sure sense is established. Simulation results illustrate the realization of the asymptotic behavior when the sample size grows large.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48907693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2022-12-24, DOI: 10.3390/stats6010002
Pedro Chaim, M. Laurini
{"title":"Data Cloning Estimation and Identification of a Medium-Scale DSGE Model","authors":"Pedro Chaim, M. Laurini","doi":"10.3390/stats6010002","DOIUrl":"https://doi.org/10.3390/stats6010002","url":null,"abstract":"We apply the data cloning method to estimate a medium-scale dynamic stochastic general equilibrium model. The data cloning algorithm is a numerical method that employs replicas of the original sample to approximate the maximum likelihood estimator as the limit of Bayesian simulation-based estimators. We also analyze the identification properties of the model. We measure the individual identification strength of each parameter by observing the posterior volatility of data cloning estimates and assess the identification problem globally through the maximum eigenvalue of the posterior data cloning covariance matrix. Our results corroborate existing evidence suggesting that the DSGE model of Smets and Wouters is only poorly identified. The model displays weak global identification properties, and many of its parameters seem locally ill-identified.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43653798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2022-12-22, DOI: 10.3390/stats6010001
Chathurangi H. Pathiravasan, B. Bhattacharya
{"title":"A Semiparametric Tilt Optimality Model","authors":"Chathurangi H. Pathiravasan, B. Bhattacharya","doi":"10.3390/stats6010001","DOIUrl":"https://doi.org/10.3390/stats6010001","url":null,"abstract":"Practitioners often face the situation of comparing any set of k distributions, which may follow neither normality nor equality of variances. We propose a semiparametric model to compare those distributions using an exponential tilt method. This extends the classical analysis of variance models when all distributions are unknown by relaxing its assumptions. The proposed model is optimal when one of the distributions is known. Large-sample estimates of the model parameters are derived, and the hypotheses for the equality of the distributions are tested for one-at-a-time and simultaneous comparison cases. Real data examples from NASA meteorology experiments and social credit card limits are analyzed to illustrate our approach. The proposed approach is shown to be preferable in a simulated power comparison with existing parametric and nonparametric methods.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46743794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2022-12-16, DOI: 10.3390/stats5040081
D. Griffith, R. Plant
{"title":"Statistical Analysis in the Presence of Spatial Autocorrelation: Selected Sampling Strategy Effects","authors":"D. Griffith, R. Plant","doi":"10.3390/stats5040081","DOIUrl":"https://doi.org/10.3390/stats5040081","url":null,"abstract":"Fundamental to most classical data collection sampling theory development is the random drawings assumption requiring that each targeted population member has a known sample selection (i.e., inclusion) probability. Frequently, however, unrestricted random sampling of spatially autocorrelated data is impractical and/or inefficient. Instead, randomly choosing a population subset accounts for its exhibited spatial pattern by utilizing a grid, which often provides improved parameter estimates, such as the geographic landscape mean, at least via its precision. Unfortunately, spatial autocorrelation latent in these data can produce a questionable mean and/or standard error estimate because each sampled population member contains information about its nearby members, a data feature explicitly acknowledged in model-based inference, but ignored in design-based inference. This autocorrelation effect prompted the development of formulae for calculating an effective sample size (i.e., the equivalent number of sample selections from a geographically randomly distributed population that would yield the same sampling error) estimate. Some researchers recently challenged this and other aspects of spatial statistics as being incorrect/invalid/misleading. This paper seeks to address this category of misconceptions, demonstrating that the effective geographic sample size is a valid and useful concept regardless of the inferential basis invoked. Its spatial statistical methodology builds upon the preceding ingredients.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48024679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2022-12-14, DOI: 10.3390/stats5040080
S. Dutta
{"title":"Robust Testing of Paired Outcomes Incorporating Covariate Effects in Clustered Data with Informative Cluster Size","authors":"S. Dutta","doi":"10.3390/stats5040080","DOIUrl":"https://doi.org/10.3390/stats5040080","url":null,"abstract":"Paired outcomes are common in correlated clustered data where the main aim is to compare the distributions of the outcomes in a pair. In such clustered paired data, informative cluster sizes can occur when the number of pairs in a cluster (i.e., a cluster size) is correlated with the paired outcomes or the paired differences. There have been some attempts to develop robust rank-based tests for comparing paired outcomes in such complex clustered data. Most of these existing rank tests developed for paired outcomes in clustered data compare the marginal distributions in a pair and ignore any covariate effect on the outcomes. However, when potentially important covariate data are available in observational studies, ignoring these covariate effects on the outcomes can result in flawed inference. In this article, using rank-based weighted estimating equations, we propose a robust procedure for covariate-effect-adjusted comparison of paired outcomes in clustered data that can also address the issue of informative cluster size. Through simulated scenarios and real-life neuroimaging data, we demonstrate the importance of considering covariate effects during paired testing and the robust performance of our proposed method in covariate-adjusted paired comparisons in complex clustered data settings.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47985565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2022-12-13, DOI: 10.3390/stats5040079
Bruno Mathis
{"title":"Extracting Proceedings Data from Court Cases with Machine Learning","authors":"Bruno Mathis","doi":"10.3390/stats5040079","DOIUrl":"https://doi.org/10.3390/stats5040079","url":null,"abstract":"France is rolling out an open data program for all court cases, but with few metadata attached. Reusers will have to use named-entity recognition (NER) within the text body of the case to extract any value from it. Any court case may include up to 26 variables, or labels, that are related to the proceeding, regardless of the case substance. These labels are of different syntactic types: some of them are rare; others are ubiquitous. This experiment compares different algorithms, namely CRF, SpaCy, Flair and DeLFT, to extract proceedings data and uses the learning model assessment capabilities of Kairntech, an NLP platform. It shows that an NER model can apply to this large and diverse set of labels and extract data of high quality. We achieved an 87.5% F1 measure with Flair trained on more than 27,000 manual annotations. Quality may yet be improved by combining NER models by data type.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43284543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2022-12-07, DOI: 10.3390/stats5040078
C. Caroni
{"title":"Regression Models for Lifetime Data: An Overview","authors":"C. Caroni","doi":"10.3390/stats5040078","DOIUrl":"https://doi.org/10.3390/stats5040078","url":null,"abstract":"Two methods dominate the regression analysis of time-to-event data: the accelerated failure time model and the proportional hazards model. Broadly speaking, these predominate in reliability modelling and biomedical applications, respectively. However, many other methods have been proposed, including proportional odds, proportional mean residual life and several other “proportional” models. This paper presents an overview of the field and the concept behind each of these ideas. Multi-parameter modelling is also discussed, in which (in contrast to, say, the proportional hazards model) more than one parameter of the lifetime distribution may depend on covariates. This includes first hitting time (or threshold) regression based on an underlying latent stochastic process. Many of the methods that have been proposed have seen little or no practical use. Lack of user-friendly software is certainly a factor in this. Diagnostic methods are also lacking for most methods.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43006947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}