{"title":"Selecting Key Features of Online Behaviour on South African Informative Websites Prior to Unsupervised Machine Learning","authors":"Judah Soobramoney, R. Chifurira, T. Zewotir","doi":"10.19139/soic-2310-5070-1139","DOIUrl":"https://doi.org/10.19139/soic-2310-5070-1139","url":null,"abstract":"The main aim of the study was to explore feature selection for online web data prior to unsupervised machine learning. At the time of writing, no literature could be found reporting the use of feature selection in this context. Features were selected by inspecting the variability of, and association between, features. The variability of numeric features was quantified using the variance, mean absolute difference and dispersion ratio metrics, whilst the coefficient of unalikeability was employed for categorical features. To quantify association, correlation matrices were used for numeric features, chi-squared independence tests for categorical features and box-and-whisker plots for mixed features. The main findings showed that the variance, mean absolute difference, dispersion ratio and coefficient of unalikeability metrics successfully highlighted features with very low variability within the observed data, whilst the correlation matrix, chi-squared test for independence and box-and-whisker plots highlighted possible redundancy, natural relationships and insightful relationships between the features, thereby suggesting features to be considered for omission prior to unsupervised modelling. 
The proposed methods and findings can be applied to various other applications of feature selection and exploration.","PeriodicalId":131002,"journal":{"name":"Statistics, Optimization & Information Computing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122902367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feasible Stein-Type and Preliminary Test Estimations in the System Regression Model","authors":"M. Norouzirad, M. Arashi, F. Marques, Naushad A. Mamod Khan","doi":"10.19139/soic-2310-5070-1589","DOIUrl":"https://doi.org/10.19139/soic-2310-5070-1589","url":null,"abstract":"In a system of regression models, finding a feasible shrinkage is demanding since the covariance structure is unknown and cannot be ignored. On the other hand, specifying sub-space restrictions for adequate shrinkage is vital. This study proposes feasible shrinkage estimation strategies where the sub-space restriction is obtained from LASSO. Therefore, some feasible LASSO-based Stein-type estimators are introduced, and their asymptotic performance is studied. Extensive Monte Carlo simulation and a real-data experiment support the superior performance of the proposed estimators compared to the feasible generalized least-squares estimator.","PeriodicalId":131002,"journal":{"name":"Statistics, Optimization & Information Computing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116626378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discrete Inverted Nadarajah-Haghighi Distribution: Properties and Classical Estimation with Application to Complete and Censored data","authors":"B. Singh, Ravindra Pratap Singh, Amit Singh Nayal, Abhishek Tyagi","doi":"10.19139/soic-2310-5070-1365","DOIUrl":"https://doi.org/10.19139/soic-2310-5070-1365","url":null,"abstract":"In this article, we develop a discrete version of the continuous inverted Nadarajah-Haghighi distribution and call it the discrete inverted Nadarajah-Haghighi distribution. The proposed model is flexible enough to model not only over-dispersed and positively skewed data but also upside-down bathtub-shaped, decreasing failure rate, and randomly right-censored data. We derive some important statistical properties of the proposed model, such as the quantile, median, moments, skewness, kurtosis, index of dispersion, entropy, expected inactivity time function, stress-strength reliability, and order statistics. We estimate the model parameters by the method of maximum likelihood under complete and censored data. An algorithm to generate randomly right-censored data from the proposed model is also presented. Extensive simulation studies are presented to assess the behavior of the estimators with complete and censored data. 
Finally, two complete and two censored data sets are used to illustrate the utility of the proposed model.","PeriodicalId":131002,"journal":{"name":"Statistics, Optimization & Information Computing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116704396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The k-nearest Neighbor Classification of Histogram- and Trapezoid-Valued Data","authors":"M. Razmkhah, Fathimah al-Ma’shumah, S. Effati","doi":"10.19139/soic-2310-5070-1451","DOIUrl":"https://doi.org/10.19139/soic-2310-5070-1451","url":null,"abstract":"A histogram-valued observation is a specific type of symbolic object that represents its value by a list of bins (intervals) along with their corresponding relative frequencies or probabilities. In the literature, the raw data in the bins of all histogram-valued data have been assumed to be uniformly distributed. A new representation of such observations is proposed in this paper by assuming that the raw data in each bin are linearly distributed; such observations are called trapezoid-valued data. Moreover, new definitions of union and intersection between trapezoid-valued observations are given. This study proposes the k-nearest neighbor technique for classifying histogram-valued data using various dissimilarity measures. Further, the limiting behaviors of the computational complexities of the dissimilarity measures used are compared. Some simulations are carried out to study the performance of the proposed procedures. Also, the results are applied to three real data sets. Finally, some conclusions are stated.","PeriodicalId":131002,"journal":{"name":"Statistics, Optimization & Information Computing","volume":"198 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124420951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discrete Bilal Distribution in the Presence of Right-Censored Data and a Cure Fraction","authors":"Bruno Caparroz Lopes de Freitas, J. Achcar, Marcos Vinicius de Oliveira Peres, E. Martinez","doi":"10.19139/soic-2310-5070-1414","DOIUrl":"https://doi.org/10.19139/soic-2310-5070-1414","url":null,"abstract":"The statistical literature presents many continuous probability distributions with only one parameter, which are extensively used in the analysis of lifetime data, such as the exponential, the Lindley, and the Rayleigh distributions. Alternatively, the use of discretized versions of these distributions can provide a better fit for the data in many applications. As the novelty of this study, we present inferences for the one-parameter discrete Bilal (DB) distribution introduced by Altun et al. (2020) in the presence of right-censored data and a cure fraction. We assume standard maximum likelihood methods based on asymptotic normality of the maximum likelihood estimators and also a Bayesian approach based on MCMC (Markov Chain Monte Carlo) simulation methods to get inferences for the parameters of the DB distribution. The use of the proposed model was illustrated with three examples considering real medical lifetime data sets. From these applications, we concluded that the proposed model based on the DB distribution has good performance even with the inclusion of a cure fraction in comparison to other existing discrete models, such as the DsFx-I, Lindley, Rayleigh, and Burr-Hatke probability distributions. Moreover, the model can be easily implemented in standard existing software, such as R. Under a Bayesian approach, we assumed a gamma prior distribution for the parameter of the DB distribution. We also provided a brief sensitivity analysis assuming the half-normal distribution in place of the gamma distribution for the parameter of the DB distribution. 
From the obtained results of this study, we can conclude that the proposed methodology can be very useful for researchers dealing with medical discrete lifetime data in the presence of right-censored data and cure fraction.","PeriodicalId":131002,"journal":{"name":"Statistics, Optimization & Information Computing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131797440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Balakrishnan-Alpha-Beta-Skew-Laplace Distribution: Properties and Applications","authors":"Sricharan Shah, P. Hazarika, S. Chakraborty, M. Alizadeh","doi":"10.19139/soic-2310-5070-1247","DOIUrl":"https://doi.org/10.19139/soic-2310-5070-1247","url":null,"abstract":"In this paper, a new form of the alpha-beta-skew-Laplace distribution is proposed under the Balakrishnan [3] mechanism, and some of its related distributions are investigated. The moments, distributional properties and some extensions of the proposed distribution are also studied. Finally, the suitability and appropriateness of the proposed distribution are tested by conducting a data fitting experiment and comparing the values of the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) with those of some other related distributions. The likelihood ratio test is used for discriminating between the nested models.","PeriodicalId":131002,"journal":{"name":"Statistics, Optimization & Information Computing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115527226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Economic Dispatch of Electrical Power in South Africa: An Application to the Northern Cape Province","authors":"Thakhani Ravele, C. Sigauke, L. Jhamba","doi":"10.19139/soic-2310-5070-1057","DOIUrl":"https://doi.org/10.19139/soic-2310-5070-1057","url":null,"abstract":"Power utility companies rely on forecasts of electricity demand for their operations. This study presents an application of linear quantile regression, non-linear quantile regression, and additive quantile regression models for forecasting extreme electricity demand at peak hours such as 18:00, 19:00, 20:00 and 21:00 using Northern Cape data for the period 01 January 2000 to 31 March 2014. The selection of variables was done using the least absolute shrinkage and selection operator. Additive quantile regression models were found to be the best fitting models for hours 18:00 and 19:00, whereas linear quantile regression models were found to be the best fitting models for hours 20:00 and 21:00. Out-of-sample forecasts for seven days (01 to 07 April 2014) were used to solve the unit commitment problem using mixed-integer programming. The unit commitment problem results showed that it is less costly to use all the generating units, such as hydroelectric, wind power, concentrated solar power and solar photovoltaic. The main contribution of this study is in the development of models for forecasting hourly extreme peak electricity demand. 
These results could be useful to system operators in the energy sector who have to maintain the minimum cost by scheduling and dispatching electricity during peak hours when the grid is constrained due to peak load demand.","PeriodicalId":131002,"journal":{"name":"Statistics, Optimization & Information Computing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132488828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Best Linear Unbiased Estimation and Prediction of Record Values Based on Kumaraswamy Distributed Data","authors":"R. A. Aldallal","doi":"10.19139/soic-2310-5070-1397","DOIUrl":"https://doi.org/10.19139/soic-2310-5070-1397","url":null,"abstract":"To predict a future upper record value based on Kumaraswamy distributed data, explicit expressions for single and product moments have been established, along with some enhanced expressions that make them easier to apply in mathematical software. The best linear unbiased estimator approach for estimating the parameters and predicting future record values has been considered, and some important tables have been created to help in the calculations. Two illustrative examples, based on a simulation study and a real-life data set, are provided to assess the performance of the introduced results.","PeriodicalId":131002,"journal":{"name":"Statistics, Optimization & Information Computing","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122423193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multiobjective Optimization Approach to Pulmonary Rehabilitation Effectiveness in COPD","authors":"Jorge Cabral, V. Afreixo, Cristiana J. Silva, A. Tavares, A. Marques","doi":"10.19139/soic-2310-5070-1505","DOIUrl":"https://doi.org/10.19139/soic-2310-5070-1505","url":null,"abstract":"Chronic obstructive pulmonary disease (COPD) is a common disease that accounts for a significant individual and societal burden. Pulmonary rehabilitation (PR) is a key management strategy, but it is highly inaccessible, which makes prioritisation necessary. This study aimed to determine and optimize predictive models of PR outcomes and build a tool to help healthcare professionals in their clinical decision-making about PR prioritisation. Data from patients who performed a 12-week community-based PR programme were analysed. Exercise capacity with the six-minute walk test distance (6MWD), isometric quadriceps muscle strength with handheld dynamometry (QMS) and dyspnoea with the modified Medical Research Council dyspnoea scale (mMRC) were assessed before and after PR. Multiple linear regression models were determined based on the Akaike information criterion and a cross-validation method. The resultant multiobjective problem was solved using the Nondominated Sorting Genetic Algorithm-II. The R Shiny package was used to create a web-based user interface. Data from 95 patients with COPD (median age of 69 years, 19 female and generally overweight) resulted in linear predictive models for the post-pre difference of the 6MWD, QMS and mMRC with cross-validation R2 of 0.49, 0.53 and 0.51, respectively. 6MWD and mMRC were common statistically significant predictors. Pareto front patients were obese ex-smoker women who were not on long-term oxygen therapy and who performed PR. 
The distance to the Pareto front along with the estimates given by our models are easily obtained using the designed R Shiny interface and may help healthcare professionals decide on the prioritisation to PR programmes.","PeriodicalId":131002,"journal":{"name":"Statistics, Optimization & Information Computing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128313270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-parametric Multivariate Kernel Regression Estimation to Describe Cognitive Processes and Mental Representations","authors":"S. Slama, Y. Slaoui, Gwendoline Le Du, C. Perret","doi":"10.19139/soic-2310-5070-1507","DOIUrl":"https://doi.org/10.19139/soic-2310-5070-1507","url":null,"abstract":"In this research paper, we put forward a non-parametric multivariate recursive kernel regression estimator under missing data, using the propensity score approach, in order to describe written word production. Our main objective is to explore the cognitive processes and mental representations mobilized when a human being prepares to write a word, following the idea developed in Perret and Olive (2019). We investigate the asymptotic properties of the proposed recursive estimator and compare them to those of the well-known Nadaraya-Watson regression estimator. We calculate the bias and the variance of the proposed estimator, which depend on the choice of some parameters such as the stepsize and the bandwidth. We examine some data-driven procedures to select these parameters. We then demonstrate that, under some optimal choices of these parameters, the MSE (Mean Squared Error) of the proposed estimator can be smaller than the one obtained by using the Nadaraya-Watson regression estimator. The proposed estimator is then applied to the behavioral data to classify participants into groups. 
This classification may stand for a departure point to tackle written behavior variations.","PeriodicalId":131002,"journal":{"name":"Statistics, Optimization & Information Computing","volume":"196 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131815104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}