{"title":"Robust Mediation Analysis: The R Package robmed","authors":"A. Alfons, N. Ateş, P. Groenen","doi":"10.18637/jss.v103.i13","DOIUrl":"https://doi.org/10.18637/jss.v103.i13","url":null,"abstract":"Mediation analysis is one of the most widely used statistical techniques in the social, behavioral, and medical sciences. Mediation models allow to study how an independent variable affects a dependent variable indirectly through one or more intervening variables, which are called mediators. The analysis is often carried out via a series of linear regressions, in which case the indirect effects can be computed as products of coefficients from those regressions. Statistical significance of the indirect effects is typically assessed via a bootstrap test based on ordinary least-squares estimates. However, this test is sensitive to outliers or other deviations from normality assumptions, which poses a serious threat to empirical testing of theory about mediation mechanisms. The R package robmed implements a robust procedure for mediation analysis based on the fast-and-robust bootstrap methodology for robust regression estimators, which yields reliable results even when the data deviate from the usual normality assumptions. Various other procedures for mediation analysis are included in package robmed as well. 
Moreover, robmed introduces a new formula interface that allows to specify mediation models with a single formula, and provides various plots for diagnostics or visual representation of the results.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"40 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74051967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Clustering with Contiguity Constraint in R","authors":"G. Guénard, P. Legendre","doi":"10.18637/jss.v103.i07","DOIUrl":"https://doi.org/10.18637/jss.v103.i07","url":null,"abstract":"This article presents a new implementation of hierarchical clustering for the R language that allows one to apply spatial or temporal contiguity constraints during the clustering process. The need for contiguity constraint arises, for instance, when one wants to partition a map into different domains of similar physical conditions, identify discontinuities in time series, group regional administrative units with respect to their performance, and so on. To increase computation efficiency, we programmed the core functions in plain C . The result is a new R function, constr.hclust , which is distributed in package adespatial . The program implements the general agglomerative hierarchical clustering algorithm described by Lance and Williams (1966; 1967), with the particularity of allowing only clusters that are contiguous in geographic space or along time to fuse at any given step. Contiguity can be defined with respect to space or time. Information about spatial contiguity is provided by a connection network among sites, with edges describing the links between connected sites. Clustering with a temporal contiguity constraint is also known as chronological clustering. Information on temporal contiguity can be implicitly provided as the rank positions of observations in the time series. The implementation was mirrored on that found in the hierarchical clustering function hclust of the standard R package stats ( R Core Team 2022). We transcribed that function from Fortran to C and added the functionality to apply constraints when running the function. The implementation is efficient. 
It is limited mainly by input/output access as massive amounts of memory are potentially needed to store copies of the dissimilarity matrix and update its elements when analyzing large problems. We provided R computer code for plotting results for numbers of clusters.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"2013 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87740464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"stringi: Fast and Portable Character String Processing in R","authors":"M. Gagolewski","doi":"10.18637/jss.v103.i02","DOIUrl":"https://doi.org/10.18637/jss.v103.i02","url":null,"abstract":"Effective processing of character strings is required at various stages of data analysis pipelines: from data cleansing and preparation, through information extraction, to report generation. Pattern searching, string collation and sorting, normalization, transliteration, and formatting are ubiquitous in text mining, natural language processing, and bioinformatics. This paper discusses and demonstrates how and why stringi, a mature R package for fast and portable handling of string data based on ICU (International Components for Unicode), should be included in each statistician’s or data scientist’s repertoire to complement their numerical computing and data wrangling skills.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"1 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90189254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Base R (2nd Edition)","authors":"James E. Helmreich","doi":"10.18637/jss.v103.b01","DOIUrl":"https://doi.org/10.18637/jss.v103.b01","url":null,"abstract":"","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"47 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90818707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"bbl: Boltzmann Bayes Learner for High-Dimensional Inference with Discrete Predictors in R","authors":"J. Woo, Jinhua Wang","doi":"10.18637/jss.v101.i05","DOIUrl":"https://doi.org/10.18637/jss.v101.i05","url":null,"abstract":"","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"22 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74074917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TransModel: An R Package for Linear Transformation Model with Censored Data","authors":"Jie Zhou, Jiajia Zhang, Wenbin Lu","doi":"10.18637/jss.v101.i09","DOIUrl":"https://doi.org/10.18637/jss.v101.i09","url":null,"abstract":"Linear transformation models, including the proportional hazards model and proportional odds model, under right censoring were discussed by Chen, Jin, and Ying (2002). The asymptotic variance of the estimator they proposed has a closed form and can be obtained easily by plug-in rules, which improves the computational efficiency. We develop an R package TransModel based on Chen’s approach. The detailed usage of the package is discussed, and the function is applied to the Veterans’ Administration lung cancer data.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"1 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74541724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spbsampling: An R Package for Spatially Balanced Sampling","authors":"Francesco Pantalone, R. Benedetti, Federica Pierismoni","doi":"10.18637/jss.v103.c02","DOIUrl":"https://doi.org/10.18637/jss.v103.c02","url":null,"abstract":"The basic idea underpinning the theory of spatially balanced sampling is that units closer to each other provide less information about a target of inference than units farther apart. Therefore, it should be desirable to select a sample well spread over the population of interest, or a spatially balanced sample . This situation is easily understood in, among many others, environmental, geological, biological, and agricultural surveys, where usually the main feature of the population is to be geo-referenced. Since traditional sampling designs generally do not exploit the spatial features and since it is desirable to take into account the information regarding spatial dependence, several sampling designs have been developed in order to achieve this objective. In this paper, we present the R package Spbsampling , which provides functions in order to perform three specific sampling designs that pursue the aforementioned purpose. In particular, these sampling designs achieve spatially balanced samples using a summary index of the distance matrix. 
In this sense, the applicability of the package is much wider, as a distance matrix can be defined for units according to variables different than geographical coordinates.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"26 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80110451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feller-Pareto and Related Distributions: Numerical Implementation and Actuarial Applications","authors":"Christophe Dutang, Vincent Goulet, Nicholas Langevin","doi":"10.18637/jss.v103.i06","DOIUrl":"https://doi.org/10.18637/jss.v103.i06","url":null,"abstract":"Actuaries model insurance claim amounts using heavy tailed probability distributions. They routinely need to evaluate quantities related to these distributions such as quantiles in the far right tail, moments or limited moments. Furthermore, actuaries often resort to simulation to solve otherwise untractable risk evaluation problems. The paper discusses our implementation of support functions for the Feller-Pareto distribution for the R package actuar . The Feller-Pareto defines a large family of heavy tailed distributions encompassing the transformed beta family and many variants of the Pareto distribution.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"42 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90748725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"covsim: An R Package for Simulating Non-Normal Data for Structural Equation Models Using Copulas","authors":"Steffen Grønneberg, Njål Foldnes, Katerina M. Marcoulides","doi":"10.18637/jss.v102.i03","DOIUrl":"https://doi.org/10.18637/jss.v102.i03","url":null,"abstract":"In factor analysis and structural equation modeling non-normal data simulation is traditionally performed by specifying univariate skewness and kurtosis together with the target covariance matrix. However, this leaves little control over the univariate distributions and the multivariate copula of the simulated vector. In this paper we explain how a more flexible simulation method called vine-to-anything (VITA) may be obtained from copula-based techniques, as implemented in a new R package, covsim . VITA is based on the concept of a regular vine, where bivariate copulas are coupled together into a full multivariate copula. We illustrate how to simulate continuous and ordinal data for covariance modeling, and how to use the new package discnorm to test for underlying normality in ordinal data. An introduction to copula and vine simulation is provided in the appendix.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"18 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74784007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The poolr Package for Combining Independent and Dependent p Values","authors":"Ozan Cinar, W. Viechtbauer","doi":"10.18637/jss.v101.i01","DOIUrl":"https://doi.org/10.18637/jss.v101.i01","url":null,"abstract":"The poolr package provides an implementation of a variety of methods for pooling (i.e., combining) p values, including Fisher’s method, Stouffer’s method, the inverse chisquare method, the binomial test, the Bonferroni method, and Tippett’s method. More importantly, the methods can be adjusted to account for dependence among the tests from which the p values have been derived assuming multivariate normality among the test statistics. All methods can be adjusted based on an estimate of the effective number of tests or by using an empirically-derived null distribution based on pseudo replicates that mimics a proper permutation test. For the Fisher, Stouffer, and inverse chi-square methods, the test statistics can also be directly generalized to account for dependence, leading to Brown’s method, Strube’s method, and the generalized inverse chi-square method. In this paper, we describe the various methods, discuss their implementation in the package, illustrate their use based on several examples, and compare the poolr package with several other packages that can be used to combine p values.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"18 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90338614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}