{"title":"Python and R for the Modern Data Scientist","authors":"C. Lortie","doi":"10.18637/jss.v103.b02","DOIUrl":"https://doi.org/10.18637/jss.v103.b02","url":null,"abstract":"Computation in many fields including those that use statistical software is increasingly driven by needs that can be addressed in many programming ecosystems. In projects that require statistical analyses, both R and Python comprise two frequent resources. In ecology, R is the most frequently used (Lai, Lortie, Muenchen, Yang, and Ma 2019). In bioinformatic gene set analyses, R is also more frequently used in peer-reviewed publications, but Python is still an important statistical resource depending on the specific project (Xie, Jauhari, and Mora 2021). Python outcompetes other languages in use for machine learning and some forms of factor analyses (Hao and Ho 2019; Persson and Khojasteh 2021; Raschka, Patterson, and Nolet 2020). However, the relative frequency that a tool is used for statistical analyses is only one metric of importance and not necessarily a proxy for its merit or its capacity to support innovation and efficiency in analyses for practitioners (Zhao, Yan, and Li 2018). It is thus critical that we explore contrasts of at least these two common software languages that support statistics because data scientists can become isolated or polarized within their specific competencies, ideologies, and workflows. A high-level discussion of strengths and weaknesses specific to data endeavors with statistics is germane to both decisions on specific projects and on competency development as a scientist.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"52 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72785050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ParMA: Parallelized Bayesian Model Averaging for Generalized Linear Models","authors":"R. Lucchetti, Luca Pedini","doi":"10.18637/jss.v104.i02","DOIUrl":"https://doi.org/10.18637/jss.v104.i02","url":null,"abstract":"This paper describes the gretl function package ParMA , which provides Bayesian model averaging (BMA) in generalized linear models. In order to overcome the lack of analytical specification for many of the models covered, the package features an implementation of the reversible jump Markov chain Monte Carlo technique, following the original idea by Green (1995), as a flexible tool to model several specifications. Particular attention is devoted to computational aspects such as the automatization of the model building procedure and the parallelization of the sampling scheme.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"104 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87323845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"synthACS: Spatial Microsimulation Modeling with Synthetic American Community Survey Data","authors":"Alex P. Whitworth","doi":"10.18637/jss.v104.i07","DOIUrl":"https://doi.org/10.18637/jss.v104.i07","url":null,"abstract":"synthACS is an R package that provides flexible tools for building synthetic micro-datasets based on American Community Survey (ACS) base tables, allows data-extensibility and enables users to conduct spatial microsimulation modeling (SMSM) via simulated annealing. To our knowledge, it is the first R package to provide broadly applicable tools for SMSM with ACS data as well as the first SMSM implementation that uses unequal probability sampling in the simulated annealing algorithm. In this paper, we contextualize these developments within the SMSM literature, provide a hands-on user-guide to package synthACS , present a case study of SMSM related to population dynamics, and note areas for future research.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"100 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76993187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"rags2ridges: A One-Stop-ℓ2-Shop for ","authors":"Carel F. W. Peeters, A. E. Bilgrau, W. V. van Wieringen","doi":"10.18637/jss.v102.i04","DOIUrl":"https://doi.org/10.18637/jss.v102.i04","url":null,"abstract":"","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"1 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67679193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Doing Meta-Analysis with R - A Hands-On Guide","authors":"C. Lortie","doi":"10.18637/jss.v102.b02","DOIUrl":"https://doi.org/10.18637/jss.v102.b02","url":null,"abstract":"Scientific synthesis is a diverse field of contemporary science. Syntheses advance knowledge in many domains and can include data compilation, theory syntheses, methods contrasts, and systematic reviews with meta-analyses through an integrated and big-picture view of evidence (Halpern et al. 2020). All these knowledge tools are typically strongly supported by statistical software including the open-source programming language R. Within this environment, there are nearly 100 packages to support meta-analyses each with different functions and specific capabilities (Lortie and Filazzola 2020). Meta-analyses are defined in most domains as the calculation of effect sizes or a weighted relative strength of evidence from a set of studies or trials to then subsequently examine high-level statistical patterns and variance (Gurevitch, Koricheva, Nakagawa, and Stewart 2018). They are increasingly used in many fields of science to examine consilience in hypotheses (Lortie 2014) and have been proposed as the gold or even platinum standard of evidence when there is statistical agreement in the efficacy of an intervention across studies (Stegenga 2011). Consequently, there is a critical need for accessible, pragmatic publications, resources, and texts that enable scientists with varying levels of expertise to engage in scientific syntheses using meta-analysis.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"39 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77664859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"plot3logit: Ternary Plots for Interpreting Trinomial Regression Models","authors":"F. Santi, M. M. Dickson, G. Espa, D. Giuliani","doi":"10.18637/jss.v103.c01","DOIUrl":"https://doi.org/10.18637/jss.v103.c01","url":null,"abstract":"This paper presents the R package plot3logit which enables the covariate effects of trinomial regression models to be represented graphically by means of a ternary plot. The aim of the plot is helping the interpretation of regression coefficients in terms of the effects that a change in values of regressors has on the probability distribution of the dependent variable. Such changes may involve either a single regressor, or a group of them (composite changes), and the package permits both cases to be handled in a user-friendly way. Moreover, plot3logit can compute and draw confidence regions of the effects of covariate changes and enables multiple changes and profiles to be represented and compared jointly. Upstream and downstream compatibility makes the package able to work with other R packages or applications other than R .","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"382 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82501290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Identification and Forecasting of Structural Unobserved Components Models with UComp","authors":"D. J. Pedregal","doi":"10.18637/jss.v103.i09","DOIUrl":"https://doi.org/10.18637/jss.v103.i09","url":null,"abstract":"UComp is a powerful library for building unobserved components models, useful for forecasting and other important operations, such as de-trending, cycle analysis, seasonal adjustment, signal extraction, etc. One of the most outstanding features that makes UComp unique among its class of related software implementations is that models may be built automatically by identification algorithms (three versions are available). These algorithms select the best model among many possible combinations. Another relevant feature is that it is coded in C++ , opening the door to link it to different popular and widely used environments, like R , MATLAB , Octave , Python , etc. The implemented models for the components are more general than the usual ones in the field of unobserved components modeling, including different types of trend, cycle, seasonal and irregular components, input variables and outlier detection. The automatic character of the algorithms required the development of many complementary algorithms to control performance and make it applicable to as many different time series as possible. The library is open source and available in different formats in public repositories. The performance of the library is illustrated working on real data in several varied examples.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"77 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88563192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[RETRACTED ARTICLE] irtplay: An R Package for Unidimensional Item Response Theory Modeling","authors":"Hwanggyu Lim, C. Wells","doi":"10.18637/jss.v103.i12","DOIUrl":"https://doi.org/10.18637/jss.v103.i12","url":null,"abstract":"","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"215 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75684716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regularized Ordinal Regression and the ordinalNet R Package","authors":"Michael J Wurm, Paul J Rathouz, Bret M Hanlon","doi":"10.18637/jss.v099.i06","DOIUrl":"https://doi.org/10.18637/jss.v099.i06","url":null,"abstract":"Regularization techniques such as the lasso (Tibshirani 1996) and elastic net (Zou and Hastie 2005) can be used to improve regression model coefficient estimation and prediction accuracy, as well as to perform variable selection. Ordinal regression models are widely used in applications where the use of regularization could be beneficial; however, these models are not included in many popular software packages for regularized regression. We propose a coordinate descent algorithm to fit a broad class of ordinal regression models with an elastic net penalty. Furthermore, we demonstrate that each model in this class generalizes to a more flexible form that can be used to model either ordered or unordered categorical response data. We call this the elementwise link multinomial-ordinal (ELMO) class, and it includes widely used models such as multinomial logistic regression (which also has an ordinal form) and ordinal logistic regression (which also has an unordered multinomial form). We introduce an elastic net penalty class that applies to either model form, and additionally, this penalty can be used to shrink a non-ordinal model toward its ordinal counterpart. Finally, we introduce the R package ordinalNet, which implements the algorithm for this model class.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"99 6","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8432594/pdf/nihms-1018361.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39408264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Analysis of Sample Selection Models through the R Package ssmrob","authors":"Mikhail Zhelonkin, E. Ronchetti","doi":"10.18637/jss.v099.i04","DOIUrl":"https://doi.org/10.18637/jss.v099.i04","url":null,"abstract":"The aim of this paper is to describe the implementation and to provide a tutorial for the R package ssmrob, which is developed for robust estimation and inference in sample selection and endogenous treatment models. The sample selectivity issue occurs in practice in various fields, when a non-random sample of a population is observed, i.e., when observations are present according to some selection rule. It is well known that the classical estimators introduced by Heckman (1979) are very sensitive to small deviations from the distributional assumptions (typically the normality assumption on the error terms). Zhelonkin, Genton, and Ronchetti (2016) investigated the robustness properties of these estimators and proposed robust alternatives to the estimator and the corresponding test. We briefly discuss the robust approach and demonstrate its performance in practice by providing several empirical examples. The package can be used both to produce a complete robust statistical analysis of these models which complements the classical one and as a set of useful tools for exploratory data analysis. Specifically, robust estimators and standard errors of the coefficients of both the selection and the regression equations are provided together with a robust test of selectivity. The package therefore provides additional useful information to practitioners in different fields of applications by enhancing their statistical analysis of these models.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"40 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90273129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}