{"title":"A Bayesian Approach for Model-Based Clustering of Several Binary Dissimilarity Matrices: The dmbc Package in R","authors":"S. Venturini, R. Piccarreta","doi":"10.18637/jss.v100.i16","DOIUrl":"https://doi.org/10.18637/jss.v100.i16","url":null,"abstract":"We introduce the new package dmbc that implements a Bayesian algorithm for clustering a set of binary dissimilarity matrices within a model-based framework. Specifically, we consider the case when S matrices are available, each describing the dissimilarities among the same n objects, possibly expressed by S subjects (judges), or measured under different experimental conditions, or with reference to different characteristics of the objects themselves. In particular, we focus on binary dissimilarities, taking values 0 or 1 depending on whether or not two objects are deemed as dissimilar. We are interested in analyzing such data using multidimensional scaling (MDS). Differently from standard MDS algorithms, our goal is to cluster the dissimilarity matrices and, simultaneously, to extract an MDS configuration specific for each cluster. To this end, we develop a fully Bayesian three-way MDS approach, where the elements of each dissimilarity matrix are modeled as a mixture of Bernoulli random vectors. The parameter estimates and the MDS configurations are derived using a hybrid Metropolis-Gibbs Markov Chain Monte Carlo algorithm. We also propose a BIC-like criterion for jointly selecting the optimal number of clusters and latent space dimensions. We illustrate our approach referring both to synthetic data and to a publicly available data set taken from the literature. For the sake of efficiency, the core computations in the package are implemented in C/C++. The package also allows the simulation of multiple chains through the support of the parallel package.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"32 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76029407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Science in Stata 16: Frames, Lasso, and Python Integration","authors":"A. T. Ho, Kim P. Huynh, David T. Jacho-Chávez, Diego Rojas-Baez","doi":"10.18637/jss.v098.s01","DOIUrl":"https://doi.org/10.18637/jss.v098.s01","url":null,"abstract":"Stata (StataCorp 2019) is one of the most widely used software packages for data analysis, statistics, and model fitting by economists, public policy researchers, and epidemiologists, among others. Stata’s recent release of version 16 in June 2019 includes an up-to-date methodological library and a user-friendly version of various cutting-edge techniques. In the newest release, Stata has implemented several changes and additions (see https://www.stata.com/new-in-stata/) that include lasso, multiple data sets in memory, meta-analysis, choice models, Python integration, Bayes-multiple chains, panel-data extended regression models, sample-size analysis for confidence intervals, panel-data mixed logit, nonlinear dynamic stochastic general equilibrium (DSGE) models, and numerical integration. This review covers the most salient innovations in Stata 16. It is the first release that brings along an implementation of machine-learning tools. The three innovations we consider in this review are: (1) multiple data sets in memory, (2) lasso for causal inference, and (3) Python integration. The following three sections describe each of these innovations. The last section presents the final thoughts and conclusions of our review.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"3 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80646023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"subtee: An R Package for Subgroup Treatment Effect Estimation in Clinical Trials","authors":"Nicolás M Ballarini, Marius Thomas, G. Rosenkranz, B. Bornkamp","doi":"10.18637/jss.v099.i14","DOIUrl":"https://doi.org/10.18637/jss.v099.i14","url":null,"abstract":"The investigation of subgroups is an integral part of randomized clinical trials. Exploration of treatment effect heterogeneity is typically performed by covariate-adjusted analyses including treatment-by-covariate interactions. Several statistical techniques, such as model averaging and bagging, were proposed recently to address the problem of selection bias in treatment effect estimates for subgroups. In this paper, we describe the subtee R package for subgroup treatment effect estimation. The package can be used for all commonly encountered types of outcomes in clinical trials (continuous, binary, survival, count). We also provide additional functions to build the subgroup variables to be used and to plot the results using forest plots. The functions are demonstrated using data from a clinical trial investigating a treatment for prostate cancer with a survival endpoint.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"5 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82936817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IncDTW: An R Package for Incremental Calculation of Dynamic Time Warping","authors":"Maximilian Leodolter, C. Plant, Norbert Brändle","doi":"10.18637/jss.v099.i09","DOIUrl":"https://doi.org/10.18637/jss.v099.i09","url":null,"abstract":"","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"57 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78058745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BNPmix: An R Package for Bayesian Nonparametric Modeling via Pitman-Yor Mixtures","authors":"R. Corradin, A. Canale, Bernardo Nipoti","doi":"10.18637/jss.v100.i15","DOIUrl":"https://doi.org/10.18637/jss.v100.i15","url":null,"abstract":"This introduction to the R package BNPmix is currently in press in the Journal of Statistical Software. BNPmix is an R package for Bayesian nonparametric multivariate density estimation, clustering, and regression, using Pitman-Yor mixture models, a flexible and robust generalization of the popular class of Dirichlet process mixture models. A variety of model specifications and state-of-the-art posterior samplers are implemented. In order to achieve computational efficiency, all sampling methods are written in C++ and seamlessly integrated into R by means of the Rcpp and RcppArmadillo packages. BNPmix exploits the ggplot2 capabilities and implements a series of generic functions to plot and print summaries of posterior densities and induced clustering of the data.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"79 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83762513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"dynamichazard: Dynamic Hazard Models Using State Space Models","authors":"Benjamin Christoffersen","doi":"10.18637/jss.v099.i07","DOIUrl":"https://doi.org/10.18637/jss.v099.i07","url":null,"abstract":"","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"1 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85183686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible Scan Statistics for Detecting Spatial Disease Clusters: The rflexscan R Package","authors":"Takahiro Otani, Kunihiko Takahashi","doi":"10.18637/jss.v099.i13","DOIUrl":"https://doi.org/10.18637/jss.v099.i13","url":null,"abstract":"The spatial scan statistic is commonly used to detect spatial disease clusters in epidemiological studies. Among the various types of scan statistics, the flexible scan statistic proposed by Tango and Takahashi (2005) is one of the most promising methods to detect arbitrarily-shaped clusters. In this paper, we introduce a new R package, rflexscan (Otani and Takahashi 2021), that provides efficient and easy-to-use methods for analyses of spatial count data using the flexible spatial scan statistic. The package is designed for any of the following interrelated purposes: to evaluate whether reported spatial disease clusters are statistically significant, to test whether a disease is randomly distributed over space, and to perform geographical surveillance of disease to detect areas of significantly high rates. The functionality of the package is demonstrated through an application to a public-domain small-area cancer incidence dataset in New York State, USA, between 2005 and 2009.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"13 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81783097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NScluster: An R Package for Maximum Palm Likelihood Estimation for Cluster Point Process Models Using OpenMP","authors":"U. Tanaka, Masami Saga, Junji Nakano","doi":"10.18637/jss.v098.i06","DOIUrl":"https://doi.org/10.18637/jss.v098.i06","url":null,"abstract":"NScluster is an R package used for simulation and parameter estimation for Neyman-Scott cluster point process models and their extensions. For parameter estimation, NScluster uses the maximum Palm likelihood estimation procedure. As some estimation procedures proposed herein require heavy calculation, NScluster can use parallel computation via OpenMP and achieve significant speedup in some cases. In this paper, we discuss results obtained using a laptop PC and a shared memory supercomputer. In addition, we examine the performance characteristics of parallel computation via OpenMP.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"29 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85885022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"dalmatian: A Package for Fitting Double Hierarchical Linear Models in R via JAGS and nimble","authors":"S. Bonner, Hanjoe Kim, D. Westneat, A. Mutzel, Jonathan Wright, Matthew R. Schofield","doi":"10.18637/jss.v100.i10","DOIUrl":"https://doi.org/10.18637/jss.v100.i10","url":null,"abstract":"Traditional regression models, including generalized linear mixed models, focus on understanding the deterministic factors that affect the mean of a response variable. Many biological studies seek to understand non-deterministic patterns in the variance or dispersion of a phenotypic or ecological response variable. We describe a new R package, dalmatian, that provides methods for fitting double hierarchical generalized linear models incorporating fixed and random predictors of both the mean and variance. Models are fit via Markov chain Monte Carlo sampling implemented in either JAGS or nimble and the package provides simple functions for monitoring the sampler and summarizing the results. We illustrate these functions through an application to data on food delivery by breeding pied flycatchers (Ficedula hypoleuca). Our intent is that this package makes it easier for practitioners to implement these models without having to learn the intricacies of Markov chain Monte Carlo methods.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"96 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74155214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"D-STEM v2: A Software for Modeling Functional Spatio-Temporal Data","authors":"Yaqiong Wang, Francesco Finazzi, Alessandro Fassò","doi":"10.18637/jss.v099.i10","DOIUrl":"https://doi.org/10.18637/jss.v099.i10","url":null,"abstract":"Functional spatio-temporal data naturally arise in many environmental and climate applications where data are collected in a three-dimensional space over time. The MATLAB D-STEM v1 software package was first introduced for modelling multivariate space-time data and has been recently extended to D-STEM v2 to handle functional data indexed across space and over time. This paper introduces the new modelling capabilities of D-STEM v2 as well as the complexity reduction techniques required when dealing with large data sets. Model estimation, validation and dynamic kriging are demonstrated in two case studies, one related to ground-level air quality data in Beijing, China, and the other one related to atmospheric profile data collected globally through radio sounding.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"83 11","pages":""},"PeriodicalIF":5.8,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138494701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}