Stats, Pub Date: 2024-07-23, DOI: 10.3390/stats7030047
Evangelia Georgakopoulou, T. Tsapanos, A. Makrides, Emmanuel Scordilis, Alex Karagrigoriou, Alexandra Papadopoulou, Vassilios Karastathis
{"title":"Seismic Evaluation Based on Poisson Hidden Markov Models—The Case of Central and South America","authors":"Evangelia Georgakopoulou, T. Tsapanos, A. Makrides, Emmanuel Scordilis, Alex Karagrigoriou, Alexandra Papadopoulou, Vassilios Karastathis","doi":"10.3390/stats7030047","DOIUrl":"https://doi.org/10.3390/stats7030047","url":null,"abstract":"A study of earthquake seismicity is undertaken over the areas of Central and South America, the tectonics of which are of great interest. The whole territory is divided into 10 seismic zones based on some seismotectonic characteristics, as in previously published studies. The earthquakes used in the present study are extracted from the catalogs of the International Seismological Center, cover the period of 1900–2021, and are restricted to shallow depths (≤60 km) and a magnitude M≥4.5. Fore- and aftershocks are removed according to Reasenberg’s technique. The paper confines itself to the evaluation of earthquake occurrence probabilities in the seismic zones covering parts of Central and South America, and we implement the hidden Markov model (HMM) and apply the EM algorithm.","PeriodicalId":93142,"journal":{"name":"Stats","volume":null,"pages":null},"PeriodicalIF":0.9,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141812571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2024-07-22, DOI: 10.3390/stats7030046
Anas Eisa Abdelkreem Mohammed, H. Mwambi, B. Omolo
{"title":"Time-Varying Correlations between JSE.JO Stock Market and Its Partners Using Symmetric and Asymmetric Dynamic Conditional Correlation Models","authors":"Anas Eisa Abdelkreem Mohammed, H. Mwambi, B. Omolo","doi":"10.3390/stats7030046","DOIUrl":"https://doi.org/10.3390/stats7030046","url":null,"abstract":"The extent of correlation or co-movement among the returns of developed and emerging stock markets remains pivotal for efficiently diversifying global portfolios. This correlation is prone to variation over time as a consequence of escalating economic interdependence fostered by international trade and financial markets. In this study, the time-varying correlation and co-movement between the JSE.JO stock market of South Africa and its developed and developing stock market partners are analyzed. The dynamic conditional correlation–exponential generalized autoregressive conditional heteroscedasticity (DCC-EGARCH) methodology is employed with different multivariate distributions to explore the time-varying correlation and volatilities between the JSE.JO stock market and its partners. Based on the conditional correlation results, the JSE.JO stock market is integrated and co-moves with its partners, and the conditional correlation for all markets exhibits time-variant behavior. The conditional volatility results show that the JSE.JO stock market behaves differently from other markets, especially after 2015, indicating a positive sign for investors to diversify between the JSE.JO and its partners. 
The highest value of conditional volatility for markets was in 2020 during the COVID-19 pandemic, representing the riskiest period that investors should avoid due to the lack of diversification opportunities during crises.","PeriodicalId":93142,"journal":{"name":"Stats","volume":null,"pages":null},"PeriodicalIF":0.9,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141817805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2024-07-20, DOI: 10.3390/stats7030045
P. Pramanik, Edward L. Boone, R. Ghanam
{"title":"Parametric Estimation in Fractional Stochastic Differential Equation","authors":"P. Pramanik, Edward L. Boone, R. Ghanam","doi":"10.3390/stats7030045","DOIUrl":"https://doi.org/10.3390/stats7030045","url":null,"abstract":"Fractional Stochastic Differential Equations are becoming more popular in the literature as they can model phenomena in financial data that typical Stochastic Differential Equations models cannot. In the formulation considered here, the Hurst parameter, H, controls the Fraction of Differentiation, which needs to be estimated from the data. Fortunately, the covariance structure among observations in time is easily expressed in terms of the Hurst parameter, which means that a likelihood is easily defined. This work derives the Maximum Likelihood Estimator for H and shows that it is biased and not consistent. Simulated data used to understand the bias of the estimator are also used to create an empirical bias correction function, and a bias-corrected estimator is proposed and studied. Via simulation, the bias-corrected estimator is shown to be minimally biased, and its simulation-based standard error is computed and then used to create a 95% confidence interval for H. A simulation study shows that the 95% confidence intervals have decent coverage probabilities for large n. This method is then applied to the S&P500 and VIX data before and after the 2008 financial crisis.","PeriodicalId":93142,"journal":{"name":"Stats","volume":null,"pages":null},"PeriodicalIF":0.9,"publicationDate":"2024-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141819759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2024-07-18, DOI: 10.3390/stats7030044
Hyemin Han
{"title":"Bayesian Model Averaging and Regularized Regression as Methods for Data-Driven Model Exploration, with Practical Considerations","authors":"Hyemin Han","doi":"10.3390/stats7030044","DOIUrl":"https://doi.org/10.3390/stats7030044","url":null,"abstract":"Methodological experts suggest that psychological and educational researchers should employ appropriate methods for data-driven model exploration, such as Bayesian Model Averaging and regularized regression, instead of conventional hypothesis-driven testing, if they want to explore the best prediction model. I intend to discuss practical considerations regarding data-driven methods for end-user researchers without sufficient expertise in quantitative methods. I tested three data-driven methods, i.e., Bayesian Model Averaging, LASSO as a form of regularized regression, and stepwise regression, with datasets in psychology and education. I compared their performance in terms of cross-validity indicating robustness against overfitting across different conditions. I employed functionalities widely available via R with default settings to provide information relevant to end users without advanced statistical knowledge. The results demonstrated that LASSO showed the best performance and Bayesian Model Averaging outperformed stepwise regression when there were many candidate predictors to explore. 
Based on these findings, I discussed how to use the data-driven model exploration methods appropriately across different situations from laypeople’s perspectives.","PeriodicalId":93142,"journal":{"name":"Stats","volume":null,"pages":null},"PeriodicalIF":0.9,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141825846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2024-07-16, DOI: 10.3390/stats7030043
Manuel Salas‐Velasco
{"title":"Transitioning from the University to the Workplace: A Duration Model with Grouped Data","authors":"Manuel Salas‐Velasco","doi":"10.3390/stats7030043","DOIUrl":"https://doi.org/10.3390/stats7030043","url":null,"abstract":"Labor market surveys usually measure unemployment duration in time intervals. In these cases, traditional duration models such as Cox regression and parametric survival models are not suitable for studying the duration of unemployment spells. To deal with this issue, we use Han and Hausman’s ordered logit model for grouped durations, which has more flexibility than standard specifications. In particular, its flexibility arises from the fact that we do not need to specify any functional form for the baseline hazard function; it also circumvents problems associated with heterogeneity. The focus of interest is on the first unemployment duration of higher education graduates. The analysis is accomplished by using a large dataset from a graduate survey of Spanish university graduates. The results show that the university-to-work transition of higher education graduates is significantly associated with the graduate’s age, participation in internship programs, field of study, type of university, and gender. Specifically, graduates who participated in internship programs, engineering graduates, and graduates from private universities experience a smooth transition.","PeriodicalId":93142,"journal":{"name":"Stats","volume":null,"pages":null},"PeriodicalIF":0.9,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141642195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2024-07-12, DOI: 10.3390/stats7030041
Arantxa Ortega-Leon, Arnaud Gucciardi, A. Segado-Arenas, Isabel Benavente-Fernández, Daniel Urda, Ignacio Turias
{"title":"Neurodevelopmental Impairments Prediction in Premature Infants Based on Clinical Data and Machine Learning Techniques","authors":"Arantxa Ortega-Leon, Arnaud Gucciardi, A. Segado-Arenas, Isabel Benavente-Fernández, Daniel Urda, Ignacio Turias","doi":"10.3390/stats7030041","DOIUrl":"https://doi.org/10.3390/stats7030041","url":null,"abstract":"Preterm infants are prone to NeuroDevelopmental Impairment (NDI). Some previous works have identified clinical variables that can be potential predictors of NDI. However, machine learning (ML)-based models still present low predictive capabilities when addressing this problem. This work attempts to evaluate the application of ML techniques to predict NDI using clinical data from a cohort of very preterm infants recruited at birth and assessed at 2 years of age. Six different classification models were assessed, using all features, clinician-selected features, and mutual information feature selection. The best results were obtained by ML models trained using mutual information-selected features and employing oversampling, for cognitive and motor impairment prediction, while for language impairment prediction the best setting was clinician-selected features. Although the performance indicators in this local cohort are consistent with similar previous works, they are still rather poor. This is a clear indication that, in order to obtain better performance rates, further analysis and methods should be considered, and other types of data should be taken into account together with the clinical variables.","PeriodicalId":93142,"journal":{"name":"Stats","volume":null,"pages":null},"PeriodicalIF":0.9,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141653085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2024-07-05, DOI: 10.3390/stats7030042
M. Lamboni
{"title":"Optimal Estimators of Cross-Partial Derivatives and Surrogates of Functions","authors":"M. Lamboni","doi":"10.3390/stats7030042","DOIUrl":"https://doi.org/10.3390/stats7030042","url":null,"abstract":"Computing cross-partial derivatives using fewer model runs is relevant in modeling, such as stochastic approximation, derivative-based ANOVA, exploring complex models, and active subspaces. This paper introduces surrogates of all the cross-partial derivatives of functions by evaluating such functions at N randomized points and using a set of L constraints. Randomized points rely on independent, central, and symmetric variables. The associated estimators, based on NL model runs, reach the optimal rates of convergence (i.e., O(N^{-1})), and the biases of our approximations do not suffer from the curse of dimensionality for a wide class of functions. Such results are used for (i) computing the main and upper bounds of sensitivity indices, and (ii) deriving emulators of simulators or surrogates of functions thanks to the derivative-based ANOVA. Simulations are presented to show the accuracy of our emulators and estimators of sensitivity indices. The plug-in estimates of indices using the U-statistics of one sample are numerically much more stable.","PeriodicalId":93142,"journal":{"name":"Stats","volume":null,"pages":null},"PeriodicalIF":0.9,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141673227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2024-07-01, DOI: 10.3390/stats7030040
M. Chalikias, Georgios X. Papageorgiou, Dimitrios P. Zarogiannis
{"title":"Estimator Comparison for the Prediction of Election Results","authors":"M. Chalikias, Georgios X. Papageorgiou, Dimitrios P. Zarogiannis","doi":"10.3390/stats7030040","DOIUrl":"https://doi.org/10.3390/stats7030040","url":null,"abstract":"Cluster randomized experiments and estimator comparisons are well-documented topics. In this paper, using the datasets of the popular vote in the presidential elections of the United States of America (2012, 2016, 2020), we evaluate the properties (SE, MSE) of three cluster sampling estimators: Ratio estimator, Horvitz–Thompson estimator and the linear regression estimator. While both the Ratio and Horvitz–Thompson estimators are widely used in cluster analysis, we propose a linear regression estimator defined for unequal cluster sizes, which, in many scenarios, performs better than the other two. The main objective of this paper is twofold. Firstly, to indicate which estimator is most suited for predicting the outcome of the popular vote in the United States of America. We do so by applying the single-stage cluster sampling technique to our data. In the first partition, we use the 50 states plus the District of Columbia as primary sampling units, whereas in the second one, we use 3112 counties instead. Secondly, based on the results of the aforementioned procedure, we estimate the number of clusters in a sample for a set standard error while also considering the diminishing returns from increasing the number of clusters in the sample. The linear regression estimator is best in the majority of the examined cases. 
This type of comparison can also be used for the estimation of any other country’s elections if prior voting results are available.","PeriodicalId":93142,"journal":{"name":"Stats","volume":null,"pages":null},"PeriodicalIF":0.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141698281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2024-05-20, DOI: 10.3390/stats7020030
C. R. Fidelis, E. M. Ortega, G. Cordeiro
{"title":"Residual Analysis for Poisson-Exponentiated Weibull Regression Models with Cure Fraction","authors":"C. R. Fidelis, E. M. Ortega, G. Cordeiro","doi":"10.3390/stats7020030","DOIUrl":"https://doi.org/10.3390/stats7020030","url":null,"abstract":"The use of cure-rate survival models has grown in recent years. Even so, proposals for assessing the goodness of fit of these models have been infrequent. However, residual analysis can be used to check the adequacy of a fitted regression model. In this context, we provide Cox–Snell residuals for Poisson-exponentiated Weibull regression with cure fraction. We developed several simulations under different scenarios for studying the distributions of these residuals. They were applied to a melanoma dataset for illustrative purposes.","PeriodicalId":93142,"journal":{"name":"Stats","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141119452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stats, Pub Date: 2024-05-15, DOI: 10.3390/stats7020029
J. C. W. Rayner, G. C. Livingston
{"title":"Testing for Level–Degree Interaction Effects in Two-Factor Fixed-Effects ANOVA When the Levels of Only One Factor Are Ordered","authors":"J. C. W. Rayner, G. C. Livingston","doi":"10.3390/stats7020029","DOIUrl":"https://doi.org/10.3390/stats7020029","url":null,"abstract":"In testing for main effects, the use of orthogonal contrasts for balanced designs with the factor levels not ordered is well known. Here, we consider two-factor fixed-effects ANOVA with the levels of one factor ordered and one not ordered. The objective is to extend the idea of decomposing the main effect to decomposing the interaction. This is achieved by defining level–degree coefficients and testing if they are zero using permutation testing. These tests give clear insights into what may be causing a significant interaction, even for the unbalanced model.","PeriodicalId":93142,"journal":{"name":"Stats","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140974920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}