Stats | Pub Date: 2022-12-16 | DOI: 10.3390/stats5040081
D. Griffith, R. Plant
{"title":"Statistical Analysis in the Presence of Spatial Autocorrelation: Selected Sampling Strategy Effects","authors":"D. Griffith, R. Plant","doi":"10.3390/stats5040081","DOIUrl":"https://doi.org/10.3390/stats5040081","url":null,"abstract":"Fundamental to most classical data collection sampling theory development is the random drawings assumption requiring that each targeted population member has a known sample selection (i.e., inclusion) probability. Frequently, however, unrestricted random sampling of spatially autocorrelated data is impractical and/or inefficient. Instead, randomly choosing a population subset accounts for its exhibited spatial pattern by utilizing a grid, which often provides improved parameter estimates, such as the geographic landscape mean, at least via its precision. Unfortunately, spatial autocorrelation latent in these data can produce a questionable mean and/or standard error estimate because each sampled population member contains information about its nearby members, a data feature explicitly acknowledged in model-based inference, but ignored in design-based inference. This autocorrelation effect prompted the development of formulae for calculating an effective sample size (i.e., the equivalent number of sample selections from a geographically randomly distributed population that would yield the same sampling error) estimate. Some researchers recently challenged this and other aspects of spatial statistics as being incorrect/invalid/misleading. This paper seeks to address this category of misconceptions, demonstrating that the effective geographic sample size is a valid and useful concept regardless of the inferential basis invoked. Its spatial statistical methodology builds upon the preceding ingredients.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48024679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
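The effective sample size defined in this abstract can be illustrated with a generic textbook construction. This may not be the authors' own formula. Suppose n equal-variance observations have correlation matrix R. Then Var(ȳ) = σ²·(1ᵀR1)/n². The equivalent number of independent draws giving the same sampling error is therefore n_eff = n²/(1ᵀR1). The sketch below is illustrative only: the exponential correlation function `exp_corr`, its `range_param`, and the 10 × 10 unit grid are all hypothetical choices.

```python
import math


def exp_corr(d, range_param):
    """Hypothetical exponential correlation model: rho(d) = exp(-d / range_param)."""
    return math.exp(-d / range_param)


def effective_sample_size(points, range_param):
    """n_eff = n^2 / (1' R 1), the n for which Var(mean) = sigma^2 / n_eff."""
    n = len(points)
    total = 0.0  # accumulates 1' R 1, the sum of all pairwise correlations
    for x1, y1 in points:
        for x2, y2 in points:
            total += exp_corr(math.hypot(x1 - x2, y1 - y2), range_param)
    return n * n / total


# A 10 x 10 sampling grid with unit spacing (hypothetical design)
grid = [(i, j) for i in range(10) for j in range(10)]
for r in (0.01, 1.0, 3.0):
    print(f"range={r}: n={len(grid)}, n_eff={effective_sample_size(grid, r):.1f}")
```

A larger `range_param` raises the pairwise correlations, which pushes n_eff below the nominal 100 grid points. This is the information overlap between neighbouring sample sites that the abstract describes.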
Stats | Pub Date: 2022-12-14 | DOI: 10.3390/stats5040080
S. Dutta
{"title":"Robust Testing of Paired Outcomes Incorporating Covariate Effects in Clustered Data with Informative Cluster Size","authors":"S. Dutta","doi":"10.3390/stats5040080","DOIUrl":"https://doi.org/10.3390/stats5040080","url":null,"abstract":"Paired outcomes are common in correlated clustered data where the main aim is to compare the distributions of the outcomes in a pair. In such clustered paired data, informative cluster sizes can occur when the number of pairs in a cluster (i.e., the cluster size) is correlated with the paired outcomes or the paired differences. There have been some attempts to develop robust rank-based tests for comparing paired outcomes in such complex clustered data. Most of these existing rank tests developed for paired outcomes in clustered data compare the marginal distributions in a pair and ignore any covariate effect on the outcomes. However, when potentially important covariate data are available in observational studies, ignoring these covariate effects on the outcomes can result in flawed inference. In this article, using rank-based weighted estimating equations, we propose a robust procedure for covariate-adjusted comparison of paired outcomes in clustered data that can also address the issue of informative cluster size. Through simulated scenarios and real-life neuroimaging data, we demonstrate the importance of considering covariate effects during paired testing and the robust performance of our proposed method in covariate-adjusted paired comparisons in complex clustered data settings.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47985565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
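Informative cluster size means that pooling all pairs lets large clusters dominate the comparison. One common device against this, shown here only to illustrate the weighting idea, gives each pair in cluster i the weight 1/n_i, so that every cluster contributes equally. This is not the author's rank-based estimating-equation procedure and includes no covariate adjustment. The function names and the cluster data are hypothetical.

```python
def cluster_weighted_positive_rate(clusters):
    """Estimate P(D > 0) for within-pair differences D, weighting each pair by 1/n_i."""
    # Per-cluster proportion of positive differences, then an unweighted mean over clusters
    rates = [sum(d > 0 for d in diffs) / len(diffs) for diffs in clusters]
    return sum(rates) / len(rates)


def pooled_positive_rate(clusters):
    """Naive estimate that pools all pairs, so large clusters dominate."""
    all_diffs = [d for diffs in clusters for d in diffs]
    return sum(d > 0 for d in all_diffs) / len(all_diffs)


# Hypothetical paired differences: one large cluster with uniformly positive differences
clusters = [[0.4, 1.1, 0.7, 0.2, 0.9], [-0.3], [-0.5], [0.1]]
print(cluster_weighted_positive_rate(clusters), pooled_positive_rate(clusters))
```

The two estimates diverge because the size of the largest cluster is tied to its outcomes. A robust procedure for informative cluster size has to correct for exactly this kind of dependence.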
Stats | Pub Date: 2022-12-13 | DOI: 10.3390/stats5040079
Bruno Mathis
{"title":"Extracting Proceedings Data from Court Cases with Machine Learning","authors":"Bruno Mathis","doi":"10.3390/stats5040079","DOIUrl":"https://doi.org/10.3390/stats5040079","url":null,"abstract":"France is rolling out an open data program for all court cases, but with little metadata attached. Reusers will have to use named-entity recognition (NER) within the text body of the case to extract any value from it. Any court case may include up to 26 variables, or labels, that are related to the proceeding, regardless of the case substance. These labels are of different syntactic types: some of them are rare; others are ubiquitous. This experiment compares different algorithms, namely CRF, SpaCy, Flair and DeLFT, to extract proceedings data and uses the learning model assessment capabilities of Kairntech, an NLP platform. It shows that an NER model can apply to this large and diverse set of labels and extract data of high quality. We achieved an 87.5% F1 measure with Flair trained on more than 27,000 manual annotations. Quality may yet be improved by combining NER models by data type.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43284543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
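The F1 measure reported here is the harmonic mean of precision and recall. The sketch below computes it at the entity level, assuming exact-match (start, end, label) spans. Whether the study or the Kairntech platform scores partial matches differently is not stated, so treat the matching rule as an assumption. The label names and offsets are hypothetical.

```python
def entity_f1(gold, predicted):
    """Entity-level precision, recall and F1 for exact-match (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # spans predicted with exactly the right offsets and label
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one decision: one missed span, one span with wrong offsets
gold = [(0, 12, "COURT"), (30, 40, "DATE"), (55, 70, "JUDGE"), (80, 92, "CASE_NUMBER")]
pred = [(0, 12, "COURT"), (30, 40, "DATE"), (55, 68, "JUDGE")]
print(entity_f1(gold, pred))
```

Under exact matching, the truncated JUDGE span counts as both a false positive and a false negative. This is one reason label types with fuzzy boundaries tend to score lower.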
Stats | Pub Date: 2022-12-07 | DOI: 10.3390/stats5040078
C. Caroni
{"title":"Regression Models for Lifetime Data: An Overview","authors":"C. Caroni","doi":"10.3390/stats5040078","DOIUrl":"https://doi.org/10.3390/stats5040078","url":null,"abstract":"Two methods dominate the regression analysis of time-to-event data: the accelerated failure time model and the proportional hazards model. Broadly speaking, these predominate in reliability modelling and biomedical applications, respectively. However, many other methods have been proposed, including proportional odds, proportional mean residual life and several other “proportional” models. This paper presents an overview of the field and the concept behind each of these ideas. Multi-parameter modelling is also discussed, in which (in contrast to, say, the proportional hazards model) more than one parameter of the lifetime distribution may depend on covariates. This includes first hitting time (or threshold) regression based on an underlying latent stochastic process. Many of the methods that have been proposed have seen little or no practical use. Lack of user-friendly software is certainly a factor in this. Diagnostic methods are also lacking for most methods.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43006947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
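The two dominant models named in this overview coincide for one important family, which is a standard result. For a Weibull baseline, the accelerated failure time (AFT) model is also a proportional hazards (PH) model. Here λ, k and γ denote the Weibull scale, the Weibull shape and the AFT coefficient.

```latex
\begin{aligned}
\text{PH:}\quad & h(t \mid x) = h_0(t)\, e^{\beta x}
  \;\Longleftrightarrow\; S(t \mid x) = S_0(t)^{\exp(\beta x)} \\
\text{AFT:}\quad & S(t \mid x) = S_0\!\left(t\, e^{-\gamma x}\right) \\
\text{Weibull, } S_0(t) = e^{-(\lambda t)^k}:\quad
  & S(t \mid x) = \exp\!\left\{-(\lambda t)^k e^{-k\gamma x}\right\}
  = S_0(t)^{\exp(-k\gamma x)}
\end{aligned}
```

The last line has the PH form with β = -kγ. For other baselines, such as the log-normal or log-logistic, the two models differ.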
Stats | Pub Date: 2022-12-04 | DOI: 10.3390/stats5040077
M. Ichino
{"title":"The Lookup Table Regression Model for Histogram-Valued Symbolic Data","authors":"M. Ichino","doi":"10.3390/stats5040077","DOIUrl":"https://doi.org/10.3390/stats5040077","url":null,"abstract":"This paper presents the Lookup Table Regression Model (LTRM) for histogram-valued symbolic data. We first transform the given symbolic data into a numerical data table by the quantile method. Then, under the selected response variable, we apply the Monotone Blocks Segmentation (MBS) to the obtained numerical data table. If the selected response variable and some remaining explanatory variable(s) form a monotone structure, the MBS generates a Lookup Table composed of interval values. For a given object, we search for the nearest value of an explanatory variable, and the corresponding value of the response variable becomes the estimated value. If the response variable and the explanatory variable(s) covary but follow a non-monotonic structure, we need to divide the given data into several monotone substructures. For this purpose, we apply hierarchical conceptual clustering to the given data, and we obtain Multiple Lookup Tables by applying the MBS to each of the substructures. We show the usefulness of the proposed method by using an artificial data set and real data sets.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44834828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
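The estimation step described here (find the nearest table value of the explanatory variable, return the paired response) can be sketched as a nearest-neighbour search over a sorted table. The sketch makes several simplifying assumptions. The explanatory column holds point values, whereas the paper's table is built from intervals. The MBS step that produces the table is not shown. The table contents and the function name are hypothetical.

```python
from bisect import bisect_left


def lookup_estimate(table_x, table_y, x):
    """Return the response entry paired with the table value of the explanatory
    variable nearest to x. table_x must be sorted ascending (monotone table)."""
    i = bisect_left(table_x, x)
    if i == 0:
        return table_y[0]
    if i == len(table_x):
        return table_y[-1]
    before, after = table_x[i - 1], table_x[i]
    # choose the closer neighbour; ties go to the lower entry
    return table_y[i] if after - x < x - before else table_y[i - 1]


# Hypothetical monotone lookup table: explanatory values -> response intervals
table_x = [1.0, 2.0, 4.0]
table_y = [(9.0, 11.0), (18.0, 22.0), (35.0, 45.0)]
print(lookup_estimate(table_x, table_y, 2.9))
```

Binary search works because a monotone block structure keeps the table sorted. For non-monotone data, the same lookup would be applied separately to each of the Multiple Lookup Tables.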
Stats | Pub Date: 2022-12-02 | DOI: 10.3390/stats5040076
Tingting Zhou, M. Elliott, R. Little
{"title":"Addressing Disparities in the Propensity Score Distributions for Treatment Comparisons from Observational Studies","authors":"Tingting Zhou, M. Elliott, R. Little","doi":"10.3390/stats5040076","DOIUrl":"https://doi.org/10.3390/stats5040076","url":null,"abstract":"Propensity score (PS) based methods, such as matching, stratification, regression adjustment, simple and augmented inverse probability weighting, are popular for controlling for observed confounders in observational studies of causal effects. More recently, we proposed penalized spline of propensity prediction (PENCOMP), which multiply-imputes outcomes for unassigned treatments using a regression model that includes a penalized spline of the estimated selection probability and other covariates. For PS methods to work reliably, there should be sufficient overlap in the propensity score distributions between treatment groups. Limited overlap can result in fewer subjects being matched or in extreme weights causing numerical instability and bias in causal estimation. The problem of limited overlap suggests (a) defining alternative estimands that restrict inferences to subpopulations where all treatments have the potential to be assigned, and (b) excluding or down-weighting sample cases where the propensity to receive one of the compared treatments is close to zero. We compared PENCOMP and other PS methods for estimation of alternative causal estimands when limited overlap occurs. Simulations suggest that, when there are extreme weights, PENCOMP tends to outperform the weighted estimators for ATE and performs similarly to the weighted estimators for alternative estimands. We illustrate PENCOMP in two applications: the effect of antiretroviral treatments on CD4 counts using the Multicenter AIDS cohort study (MACS) and whether right heart catheterization (RHC) is a beneficial treatment for critically ill patients.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47634803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
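The extreme-weight problem and the two remedies listed in this abstract can be made concrete with standard weighting schemes. The remedies are (a) restricting the estimand and (b) excluding or down-weighting units with propensity near zero. The sketch shows three of them:

- inverse probability weights for the ATE;
- so-called overlap weights, which target the overlap population and stay bounded in [0, 1];
- trimming with the widely used 0.1/0.9 rule of thumb.

None of this is PENCOMP itself, and the propensity scores are hypothetical.

```python
def ate_ipw_weights(ps, treated):
    """Inverse probability weights for the ATE: 1/e for treated, 1/(1 - e) for controls."""
    return [1 / e if t else 1 / (1 - e) for e, t in zip(ps, treated)]


def overlap_weights(ps, treated):
    """Overlap weights: 1 - e for treated, e for controls; always bounded in [0, 1]."""
    return [1 - e if t else e for e, t in zip(ps, treated)]


def trim_by_propensity(ps, treated, lower=0.1, upper=0.9):
    """Keep only units whose estimated propensity lies in [lower, upper]."""
    return [(e, t) for e, t in zip(ps, treated) if lower <= e <= upper]


# Hypothetical estimated propensity scores and treatment indicators
ps = [0.02, 0.30, 0.50, 0.70, 0.98]
treated = [1, 0, 1, 1, 0]
print(ate_ipw_weights(ps, treated))
print(overlap_weights(ps, treated))
print(trim_by_propensity(ps, treated))
```

The two units near the edges of the propensity range receive ATE weights of about 50, while their overlap weights stay at most 1. Trimming changes the estimand to the middle subpopulation rather than fixing the weights.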
Stats | Pub Date: 2022-12-01 | DOI: 10.3390/stats5040075
L. Al-Labadi, Yifan Cheng, Forough Fazeli-Asl, Kyuson Lim, Ya-Fang Weng
{"title":"A Bayesian One-Sample Test for Proportion","authors":"L. Al-Labadi, Yifan Cheng, Forough Fazeli-Asl, Kyuson Lim, Ya-Fang Weng","doi":"10.3390/stats5040075","DOIUrl":"https://doi.org/10.3390/stats5040075","url":null,"abstract":"This paper deals with a new Bayesian approach to the one-sample test for proportion. More specifically, let x=(x1,…,xn) be an independent random sample of size n from a Bernoulli distribution with an unknown parameter θ. For a fixed value θ0, the goal is to test the null hypothesis H0:θ=θ0 against all possible alternatives. The proposed approach uses the well-known formula for the Kullback–Leibler divergence between two binomial distributions chosen in a certain way. The change in this distance from a priori to a posteriori is then assessed through the relative belief ratio (a measure of evidence). Some theoretical properties of the method are developed. Examples and simulation results are included.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48572391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
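The two ingredients named in this abstract can be sketched separately, but the sketch is a simplification and not the authors' exact test. The first ingredient is the closed-form Kullback–Leibler divergence between binomials, which is n times the Bernoulli divergence. The second is the relative belief ratio, posterior probability divided by prior probability, with values above 1 counting as evidence for H0. The paper compares distributions of the distance. Here, by assumption, we compute the relative belief ratio of the single event {KL(θ0 ‖ θ) < eps} by Monte Carlo. The Beta(1, 1) prior, the divergence direction, `eps` and the sample sizes are all hypothetical choices.

```python
import math
import random


def kl_bernoulli(p, q):
    """KL(Ber(p) || Ber(q)); KL between Bin(n, p) and Bin(n, q) is n times this value."""
    q = min(max(q, 1e-12), 1 - 1e-12)  # guard against log(0) for extreme draws

    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)

    return term(p, q) + term(1 - p, 1 - q)


def relative_belief_near_null(successes, n, theta0, a=1.0, b=1.0,
                              eps=0.005, draws=20000, seed=1):
    """Monte Carlo relative belief ratio of the event {KL(theta0 || theta) < eps}
    under a Beta(a, b) prior: RB = posterior probability / prior probability."""
    rng = random.Random(seed)

    def prob_near(alpha, beta):
        hits = sum(kl_bernoulli(theta0, rng.betavariate(alpha, beta)) < eps
                   for _ in range(draws))
        return hits / draws

    prior = prob_near(a, b)
    posterior = prob_near(a + successes, b + n - successes)  # conjugate Beta update
    return posterior / prior if prior > 0 else float("inf")


print(relative_belief_near_null(50, 100, 0.5))  # data consistent with theta0
print(relative_belief_near_null(90, 100, 0.5))  # data far from theta0
```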
Stats | Pub Date: 2022-11-29 | DOI: 10.3390/stats5040074
Lili Yu, Yichuan Zhao
{"title":"A Bootstrap Method for a Multiple-Imputation Variance Estimator in Survey Sampling","authors":"Lili Yu, Yichuan Zhao","doi":"10.3390/stats5040074","DOIUrl":"https://doi.org/10.3390/stats5040074","url":null,"abstract":"Rubin’s variance estimator of the multiple imputation estimator for a domain mean is not asymptotically unbiased. Kim et al. derived the closed-form bias for Rubin’s variance estimator. In addition, they proposed an asymptotically unbiased variance estimator for the multiple imputation estimator when the imputed values can be written as a linear function of the observed values. However, this requires the assumption that the covariance of the imputed values in the same imputed dataset is twice that in the different imputed datasets. In this study, we propose a bootstrap variance estimator that does not require this assumption. Both theoretical arguments and simulation studies show that it is unbiased and asymptotically valid. The new method was applied to the Hox pupil popularity data for illustration.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41728934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
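For reference, this is the estimator the abstract says is biased for a domain mean. Rubin's combining rules give the total variance T = W̄ + (1 + 1/m)·B. Here W̄ is the average within-imputation variance and B is the between-imputation variance of the m point estimates. The sketch reproduces only these standard rules. The authors' bootstrap variance estimator is not shown, and the input numbers are hypothetical.

```python
from statistics import mean, variance


def rubin_combine(estimates, within_variances):
    """Rubin's combining rules for m imputed-data estimates of the same quantity."""
    m = len(estimates)
    qbar = mean(estimates)          # combined point estimate
    w = mean(within_variances)      # average within-imputation variance
    b = variance(estimates)         # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b         # Rubin's total variance estimator
    return qbar, t


print(rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```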
Stats | Pub Date: 2022-11-28 | DOI: 10.3390/stats5040073
I. Tsolas
{"title":"Assessing Regional Entrepreneurship: A Bootstrapping Approach in Data Envelopment Analysis","authors":"I. Tsolas","doi":"10.3390/stats5040073","DOIUrl":"https://doi.org/10.3390/stats5040073","url":null,"abstract":"The aim of the present paper is to demonstrate the viability of using data envelopment analysis (DEA) in a regional context to evaluate entrepreneurial activities. DEA was used to assess regional entrepreneurship in Greece using individual measures of entrepreneurship as inputs and employment rates as outputs. In addition to point estimates, a bootstrap algorithm was used to produce bias-corrected metrics. In light of the study's results, the Greek regions perform differently in terms of converting entrepreneurial activity into job creation. Moreover, there is some evidence that unemployment may be a driver of entrepreneurship and thus negatively affects DEA-based inefficiency. The derived indicators can serve as diagnostic tools and can also be used for the design of various interventions at the regional level.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44335683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
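To show what a DEA point estimate measures, consider the simplest case: one input (an entrepreneurship measure) and one output (an employment rate) under constant returns to scale. There, each unit's efficiency reduces to its output/input ratio divided by the best ratio observed. The sketch assumes that single-input, single-output case with hypothetical regional figures. The paper's multi-input model requires a linear program, and its bootstrap bias correction is not reproduced here.

```python
def crs_efficiency(inputs, outputs):
    """Single-input, single-output CRS DEA: each unit's output/input ratio
    divided by the best ratio in the sample (frontier units score 1.0)."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical regions: entrepreneurship input and employment output
entrepreneurship = [2.0, 4.0, 5.0]
employment = [1.0, 1.0, 2.0]
print(crs_efficiency(entrepreneurship, employment))
```

Because the frontier is estimated from the same sample, these raw scores are biased upward. That bias is what the abstract's bootstrap correction is meant to address.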
Stats | Pub Date: 2022-11-23 | DOI: 10.3390/stats5040072
P. N. Rathie, L. Ozelim
{"title":"On the Relation between Lambert W-Function and Generalized Hypergeometric Functions","authors":"P. N. Rathie, L. Ozelim","doi":"10.3390/stats5040072","DOIUrl":"https://doi.org/10.3390/stats5040072","url":null,"abstract":"In the theory of special functions, finding correlations between different types of functions is of great interest as unifying results, especially when considering issues such as analytic continuation. In the present paper, the relation between Lambert W-function and generalized hypergeometric functions is discussed. It will be shown that it is possible to link these functions by following two different strategies, namely, by means of the direct and inverse Mellin transform of Lambert W-function and by solving the trinomial equation originally studied by Lambert and Euler. The new results can be used both to numerically evaluate Lambert W-function and to study its analytic structure.","PeriodicalId":93142,"journal":{"name":"Stats","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41910380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
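As a baseline for the numerical evaluation the abstract mentions, the principal branch W0 (the solution of w·e^w = x) can be computed in two standard ways. One is Newton's method. The other is the classical Taylor series W0(x) = Σ (-n)^(n-1)/n! · xⁿ, which converges for |x| < 1/e and is itself a hypergeometric-type series. This sketch uses only these textbook methods. It does not implement the paper's Mellin-transform or trinomial-equation representations, and it restricts Newton's method to x ≥ 0 by assumption.

```python
import math


def lambert_w_newton(x, tol=1e-15, max_iter=100):
    """Principal branch W0(x) for x >= 0 by Newton's method on f(w) = w*e^w - x."""
    w = math.log1p(x)  # log(1 + x) >= W0(x) for x >= 0, so iterates decrease monotonically
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1))  # f(w) / f'(w)
        w -= step
        if abs(step) <= tol * (1 + abs(w)):
            break
    return w


def lambert_w_series(x, terms=80):
    """Taylor series W0(x) = sum_{n>=1} (-n)^(n-1) / n! * x^n, valid for |x| < 1/e."""
    return sum((-n) ** (n - 1) / math.factorial(n) * x ** n for n in range(1, terms + 1))


print(lambert_w_newton(1.0))  # the omega constant, W0(1)
print(lambert_w_series(0.2), lambert_w_newton(0.2))
```

The series is only usable inside its radius of convergence 1/e, which is one reason analytic continuation matters for this function.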