{"title":"From object detection to text detection and recognition: A brief evolution history of optical character recognition","authors":"Haifeng Wang, Chang Pan, Xiao Guo, Chun Ji, Ke Deng","doi":"10.1002/wics.1547","DOIUrl":"https://doi.org/10.1002/wics.1547","url":null,"abstract":"Text detection and recognition, which is also known as optical character recognition (OCR), is an active research area under rapid development with many exciting applications. Deep‐learning‐based methods represent the state of the art in this area. However, these methods are largely deterministic: they give a deterministic output for each input. For both statisticians and general users, methods supporting uncertainty inference are of great appeal, leaving rich research opportunities to incorporate statistical models and methods with the established deep‐learning‐based approaches. In this paper, we provide a comprehensive review of the evolution history of research development on OCR with discussions on the statistical insights behind these developments and potential directions to enhance the current methods with statistical approaches. We hope this article can serve as a useful guidebook for statisticians who are seeking a path toward cutting‐edge research in this exciting area.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2021-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1547","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46185325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ordinal regression: A review and a taxonomy of models","authors":"G. Tutz","doi":"10.1002/wics.1545","DOIUrl":"https://doi.org/10.1002/wics.1545","url":null,"abstract":"Ordinal models can be seen as being composed from simpler, in particular binary models. This view on ordinal models makes it possible to derive a taxonomy of models that includes basic ordinal regression models, models with more complex parameterizations, the class of hierarchically structured models, and the more recently developed finite mixture models. The structured overview that is given covers existing models and shows how models can be extended to account for further effects of explanatory variables. Particular attention is given to the modeling of additional heterogeneity as, for example, dispersion effects. The modeling is embedded into the framework of response styles and the exact meaning of heterogeneity terms in ordinal models is investigated. It is shown that the meaning of terms is crucially determined by the type of model that is used. Moreover, it is demonstrated how models with a complex category‐specific effect structure can be simplified to obtain simpler models that fit sufficiently well. The fitting of models is illustrated by use of a real data set, and a short overview of existing software is given.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2021-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1545","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48668266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Gibbs sampler","authors":"Taeyoung Park, Seunghan Lee","doi":"10.1002/wics.1546","DOIUrl":"https://doi.org/10.1002/wics.1546","url":null,"abstract":"The Gibbs sampler is a simple but very powerful algorithm used to simulate from a complex high‐dimensional distribution. It is particularly useful in Bayesian analysis when a complex Bayesian model involves a number of model parameters and the conditional posterior distribution of each component given the others can be derived as a standard distribution. In the presence of a strong correlation structure among components, however, the Gibbs sampler can be criticized for its slow convergence. Here we discuss several algorithmic strategies such as blocking, collapsing, and partial collapsing that are available for improving the convergence characteristics of the Gibbs sampler.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2021-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1546","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42829044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stata","authors":"R. Gutierrez","doi":"10.1002/wics.116","DOIUrl":"https://doi.org/10.1002/wics.116","url":null,"abstract":"Stata is general‐purpose statistical software. Currently in version 11, Stata is known for its wide range of statistical routines, ease of data management, and publication‐quality graphics. Stata is available on virtually all computing platforms, including Windows, Macintosh, and most varieties of Unix/Linux. It is designed to run on both 32‐bit and 64‐bit architectures and operating systems. Stata possesses both a command‐line interface and a point‐and‐click menu interface, with a one‐to‐one correspondence between the two. Stata appeals to researchers from a wide range of fields, with concentrations in the health sciences and in economics. Statistically, Stata strengths are in the areas of panel/longitudinal data, survival analysis, and the analysis of data from complex surveys. Users can program their own routines using a mixture of Stata's own interpretive language and the compiled matrix‐programming language Mata, included with all Stata installations. Stata is offered in three flavors: Stata/IC, a standard version adequate for most purposes; Stata/SE, an expanded version for use with larger (wider) datasets; and Stata/MP, a version with specialized code designed to make use of multiple cores/processors and run faster on systems that have them. WIREs Comp Stat 2010 2 728–733 DOI: 10.1002/wics.116","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.116","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"51205391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sampling James R. Thompson's inspired nonparametric portfolio approaches","authors":"J. Dobelman","doi":"10.1002/wics.1542","DOIUrl":"https://doi.org/10.1002/wics.1542","url":null,"abstract":"Asset or security returns are an example of phenomena whose distributions still cannot be convincingly modeled in a parametric framework. James R. (Jim) Thompson (1938–2017) used a variety of nonparametric approaches to develop workable investing solutions in such an environment. We review his ground breaking exploration of the veracity of the capital asset pricing model (CAPM), and several nonparametric approaches to portfolio formulation including the Simugram™, variants of his Max‐Median rule, and Tukey weightings.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2020-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1542","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41726452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Item response theory and its applications in educational measurement Part II: Theory and practices of test equating in item response theory","authors":"Kazuki Hori, Hirotaka Fukuhara, Tsuyoshi Yamada","doi":"10.1002/wics.1543","DOIUrl":"https://doi.org/10.1002/wics.1543","url":null,"abstract":"Item response theory (IRT) is a class of latent variable models, which are used to develop educational and psychological tests (e.g., standardized tests, personality tests, tests for licensure and certification). We offer readers comprehensive overviews of the theory and applications of IRT through two articles. While Part 1 of the review discusses topics such as foundations of educational measurement, IRT models, item parameter estimation, and applications of IRT with R, this Part 2 reviews areas of test scores based on IRT. The primary focus is on presenting various topics with respect to test equating such as equating designs, IRT‐based equating methods, anchor stability check methods, and impact data analysis that psychometricians would deal with for a large‐scale standardized assessment in practice. These analyses are illustrated in the Example section using data from Kolen and Brennan (2014). We also cover the foundation of IRT, IRT‐based person ability parameter estimation methods, and scaling and scale score.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2020-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1543","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47834229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero‐inflated modeling part II: Zero‐inflated models for complex data structures","authors":"D. S. Young, Eric Roemmele, Xuan Shi","doi":"10.1002/wics.1540","DOIUrl":"https://doi.org/10.1002/wics.1540","url":null,"abstract":"The prequel to this review provided an extensive treatment of classic zero‐inflated count regression models where a univariate discrete distribution is used for the count regression component of the model. The treatment of zero inflation beyond the classic univariate count regression setting has seen a substantial increase in recent years. This second review paper surveys some of this recent literature and focuses on important developments in handling zero inflation for correlated count settings, discrete time series models, spatial models, and multivariate models. We discuss some of the available computational tools for performing estimation in these settings, while again highlighting the diverse data problems that have been addressed using these methods.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2020-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1540","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49314739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero‐inflated modeling part I: Traditional zero‐inflated count regression models, their applications, and computational tools","authors":"D. S. Young, Eric Roemmele, Peng Yeh","doi":"10.1002/wics.1541","DOIUrl":"https://doi.org/10.1002/wics.1541","url":null,"abstract":"Count regression models maintain a steadfast presence in modern applied statistics as highlighted by their usage in diverse areas like biometry, ecology, and insurance. However, a common practical problem with observed count data is the presence of excess zeros relative to the assumed count distribution. The seminal work of Lambert (1992) was one of the first articles to thoroughly treat the problem of zero‐inflated count data in the presence of covariates. Since then, a vast literature has emerged regarding zero‐inflated count regression models. In this first of two review articles, we survey some of the classic and contemporary literature on parametric zero‐inflated count regression models, with emphasis on the utility of different univariate discrete distributions. We highlight some of the primary computational tools available for estimating and assessing the adequacy of these models. We concurrently emphasize the diverse data problems to which these models have been applied.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2020-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1541","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41347907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}