R Journal | Pub Date: 2023-12-01 | Epub Date: 2024-04-10 | DOI: 10.32614/rj-2023-086
Hillary M Heiling, Naim U Rashid, Quefeng Li, Joseph G Ibrahim
{"title":"glmmPen: High Dimensional Penalized Generalized Linear Mixed Models.","authors":"Hillary M Heiling, Naim U Rashid, Quefeng Li, Joseph G Ibrahim","doi":"10.32614/rj-2023-086","DOIUrl":"10.32614/rj-2023-086","url":null,"abstract":"Generalized linear mixed models (GLMMs) are widely used in research for their ability to model correlated outcomes with non-Gaussian conditional distributions. The proper selection of fixed and random effects is a critical part of the modeling process, where model misspecification may lead to significant bias. However, the joint selection of fixed and random effects has historically been limited to lower dimensional GLMMs, largely due to the use of criterion-based model selection strategies. Here we present the R package glmmPen, one of the first to select fixed and random effects in higher dimension using a penalized GLMM modeling framework. Model parameters are estimated using a Monte Carlo expectation conditional minimization (MCECM) algorithm, which leverages Stan and RcppArmadillo for increased computational efficiency. Our package supports the Binomial, Gaussian, and Poisson families and multiple penalty functions. In this manuscript we discuss the modeling procedure, estimation scheme, and software implementation through application to a pancreatic cancer subtyping study. Simulation results show our method has good performance in selecting both the fixed and random effects in high dimensional GLMMs.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":"15 4","pages":"106-128"},"PeriodicalIF":2.3,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11138212/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141181494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R Journal | Pub Date: 2023-12-01 | Epub Date: 2024-04-10 | DOI: 10.32614/rj-2023-081
Christopher R Bilder, Brianna D Hitt, Brad J Biggerstaff, Joshua M Tebbs, Christopher S McMahan
{"title":"binGroup2: Statistical Tools for Infection Identification via Group Testing.","authors":"Christopher R Bilder, Brianna D Hitt, Brad J Biggerstaff, Joshua M Tebbs, Christopher S McMahan","doi":"10.32614/rj-2023-081","DOIUrl":"10.32614/rj-2023-081","url":null,"abstract":"Group testing is the process of testing items as an amalgamation, rather than separately, to determine the binary status for each item. Its use was especially important during the COVID-19 pandemic through testing specimens for SARS-CoV-2. The adoption of group testing for this and many other applications is because members of a negative testing group can be declared negative with potentially only one test. This subsequently leads to significant increases in laboratory testing capacity. Whenever a group testing algorithm is put into practice, it is critical for laboratories to understand the algorithm's operating characteristics, such as the expected number of tests. Our paper presents the binGroup2 package that provides the statistical tools for this purpose. This R package is the first to address the identification aspect of group testing for a wide variety of algorithms. We illustrate its use through COVID-19 and chlamydia/gonorrhea applications of group testing.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":"15 4","pages":"21-36"},"PeriodicalIF":2.1,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11139028/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141181492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R Journal | Pub Date: 2023-11-09 | DOI: 10.32614/rj-2023-049
Rosaria Lombardo, Michel van de Velden, Eric J. Beh
{"title":"Three-Way Correspondence Analysis in R","authors":"Rosaria Lombardo, Michel van de Velden, Eric J. Beh","doi":"10.32614/rj-2023-049","DOIUrl":"https://doi.org/10.32614/rj-2023-049","url":null,"abstract":"Three-way correspondence analysis is a suitable multivariate method for visualising the association in three-way categorical data, modelling the global dependence, or reducing dimensionality. This paper provides a description of an R package for performing three-way correspondence analysis: CA3variants. The functions in this package allow the analyst to perform several variations of this analysis, depending on the research question being posed and/or the properties underlying the data. Users can opt for the classical (symmetrical) approach or the non-symmetric variant - the latter is particularly useful if one of the three categorical variables is treated as a response variable. In addition, to perform the necessary three-way decompositions, a Tucker3 and a trivariate moment decomposition (using orthogonal polynomials) can be utilized. The Tucker3 method of decomposition can be used when one or more of the categorical variables is nominal while for ordinal variables the trivariate moment decomposition can be used. The package also provides a function that can be used to choose the model dimensionality.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":" 30","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135293173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R Journal | Pub Date: 2023-11-08 | DOI: 10.32614/rj-2023-040
J. A. F. Torvisco, R. Benítez, M. R. Arias, J. Cabello Sánchez
{"title":"nlstac: Non-Gradient Separable Nonlinear Least Squares Fitting","authors":"J. A. F. Torvisco, R. Benítez, M. R. Arias, J. Cabello Sánchez","doi":"10.32614/rj-2023-040","DOIUrl":"https://doi.org/10.32614/rj-2023-040","url":null,"abstract":"A new package for nonlinear least squares fitting is introduced in this paper. This package implements a recently developed algorithm that, for certain types of nonlinear curve fitting, reduces the number of nonlinear parameters to be fitted. One notable feature of this method is the absence of initialization which is typically necessary for nonlinear fitting gradient-based algorithms. Instead, just some bounds for the nonlinear parameters are required. Even though convergence for this method is guaranteed for exponential decay using the max-norm, the algorithm exhibits remarkable robustness, and its use has been extended to a wide range of functions using the Euclidean norm. Furthermore, this data-fitting package can also serve as a valuable resource for providing accurate initial parameters to other algorithms that rely on them.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":"55 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135431319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Workflow for Estimating and Visualising Excess Mortality During the COVID-19 Pandemic","authors":"Garyfallos Konstantinoudis, Virgilio Gómez-Rubio, Michela Cameletti, Monica Pirani, Gianluca Baio, Marta Blangiardo","doi":"10.32614/rj-2023-055","DOIUrl":"https://doi.org/10.32614/rj-2023-055","url":null,"abstract":"Estimates of COVID-19-related deaths underestimate the pandemic burden on mortality because they suffer from completeness and accuracy issues. Excess mortality is a popular alternative, as it compares the observed number of deaths versus the number that would be expected if the pandemic did not occur. The expected number of deaths depends on population trends, temperature, and spatio-temporal patterns. In addition to this, high geographical resolution is required to examine within country trends and the effectiveness of the different public health policies. In this tutorial, we propose a workflow using R for estimating and visualising excess mortality at high geographical resolution. We show a case study estimating excess deaths during 2020 in Italy. The proposed workflow is fast to implement and allows for combining different models and presenting aggregated results based on factors such as age, sex, and spatial location. This makes it a particularly powerful and appealing workflow for online monitoring of the pandemic burden and timely policy making.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":"55 s63","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135431320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R Journal | Pub Date: 2023-11-08 | DOI: 10.32614/rj-2023-050
Mauricio Sarrias
{"title":"Estimating Heteroskedastic and Instrumental Variable Models for Binary Outcome Variables in R","authors":"Mauricio Sarrias","doi":"10.32614/rj-2023-050","DOIUrl":"https://doi.org/10.32614/rj-2023-050","url":null,"abstract":"The objective of this article is to introduce the package Rchoice which provides functionality for estimating heteroskedastic and instrumental variable models for binary outcomes, with emphasis on the calculation of the average marginal effects. To do so, I introduce two new functions of the Rchoice package using widely known applied examples. I also show how users can generate publication-ready tables of regression model estimates.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":"55 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135431321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R Journal | Pub Date: 2023-11-01 | DOI: 10.32614/rj-2023-056
L.H. Vanegas, L.M. Rondón, G.A. Paula
{"title":"Generalized Estimating Equations using the new R package glmtoolbox","authors":"L.H. Vanegas, L.M. Rondón, G.A. Paula","doi":"10.32614/rj-2023-056","DOIUrl":"https://doi.org/10.32614/rj-2023-056","url":null,"abstract":"This paper introduces a very comprehensive implementation, available in the new `R` package `glmtoolbox`, of a very flexible statistical tool known as Generalized Estimating Equations (GEE), which analyzes cluster correlated data utilizing marginal models. As well as providing more built-in structures for the working correlation matrix than other GEE implementations in `R`, this GEE implementation also allows the user to: $(1)$ compute several estimates of the variance-covariance matrix of the estimators of the parameters of interest; $(2)$ compute several criteria to assist the selection of the structure for the working-correlation matrix; $(3)$ compare nested models using the Wald test as well as the generalized score test; $(4)$ assess the goodness-of-fit of the model using Pearson-, deviance- and Mahalanobis-type residuals; $(5)$ perform sensitivity analysis using the global influence approach (that is, dfbeta statistic and Cook's distance) as well as the local influence approach; $(6)$ use several criteria to perform variable selection using a hybrid stepwise procedure; $(7)$ fit models with nonlinear predictors; $(8)$ handle dropout-type missing data under MAR rather than MCAR assumption by using observation-specific or cluster-specific weighted methods. The capabilities of this GEE implementation are illustrated by analyzing four real datasets obtained from longitudinal studies.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":"107 5-6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135714472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R Journal | Pub Date: 2023-11-01 | DOI: 10.32614/rj-2023-052
Casper Hart, Earo Wang
{"title":"Taking the Scenic Route: Interactive and Performant Tour Animations","authors":"Casper Hart, Earo Wang","doi":"10.32614/rj-2023-052","DOIUrl":"https://doi.org/10.32614/rj-2023-052","url":null,"abstract":"The tour provides a useful vehicle for exploring high dimensional datasets. It works by combining a sequence of projections---the tour path---into an animation---the display method. Current display implementations in R are limited in their interactivity and portability, and give poor performance and jerky animations even for small datasets. We take a detour into web technologies, such as Three.js and WebGL, to support smooth and performant tour visualisations. The R package detourr implements a set of display tools that allow for rich interactions (including orbit controls, scrubbing, and brushing) and smooth animations for large datasets. It provides a declarative R interface which is accessible to new users, and it supports linked views using crosstalk and shiny. The resulting animations are portable across a wide range of browsers and devices. We also extend the radial transformation of the Sage Tour (@laa2021burning) to 3 or more dimensions with an implementation in 3D, and provide a simplified implementation of the Slice Tour (@laa2020slice).","PeriodicalId":51285,"journal":{"name":"R Journal","volume":"107 3-4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135714473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R Journal | Pub Date: 2023-11-01 | DOI: 10.32614/rj-2023-041
Ezequiel Toum, Pierre Pitte
{"title":"hydrotoolbox, a Package for Hydrometeorological Data Management","authors":"Ezequiel Toum, Pierre Pitte","doi":"10.32614/rj-2023-041","DOIUrl":"https://doi.org/10.32614/rj-2023-041","url":null,"abstract":"The hydrometeorological data provided by federal agencies, research groups and private companies tend to be heterogeneous: records are kept in different formats, quality control processes are not standardized and may even vary within a given agency, variables are not always recorded with the same temporal resolution, and there are data gaps and incorrectly recorded values. Once these problems are dealt with, it is useful to have tools to safely store and manipulate the series, providing temporal aggregation, interactive visualization for analysis, static graphics to publish and/or communicate results, techniques to correct and/or modify the series, among others. Here we introduce a package written in the R language using object-oriented programming and designed to accomplish these objectives, giving the user a general framework for working with any kind of hydrometeorological series. We present the package design, its strengths and limitations, and show its application to two real cases.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":"107 1-2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135714474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R Journal | Pub Date: 2023-11-01 | DOI: 10.32614/rj-2023-043
Bastien Chassagnol, Antoine Bichat, Cheïma Boudjeniba, Pierre-Henri Wuillemin, Mickaël Guedj, David Gohel, Gregory Nuel, Etienne Becht
{"title":"Gaussian Mixture Models in R","authors":"Bastien Chassagnol, Antoine Bichat, Cheïma Boudjeniba, Pierre-Henri Wuillemin, Mickaël Guedj, David Gohel, Gregory Nuel, Etienne Becht","doi":"10.32614/rj-2023-043","DOIUrl":"https://doi.org/10.32614/rj-2023-043","url":null,"abstract":"Gaussian mixture models (GMMs) are widely used for modelling stochastic problems. Indeed, a wide diversity of packages have been developed in R. However, no recent review describing the main features offered by these packages and comparing their performances has been performed. In this article, we first introduce GMMs and the EM algorithm used to retrieve the parameters of the model and analyse the main features implemented among seven of the most widely used R packages. We then empirically compare their statistical and computational performances in relation with the choice of the initialisation algorithm and the complexity of the mixture. We demonstrate that the best estimation with well-separated components or with a small number of components with distinguishable modes is obtained with REBMIX initialisation, implemented in the [rebmix](https://CRAN.R-project.org/package=rebmix) package, while the best estimation with highly overlapping components is obtained with *k*-means or random initialisation. Importantly, we show that implementation details in the EM algorithm yield differences in the parameters' estimation. Especially, packages [mixtools](https://CRAN.R-project.org/package=mixtools) (Young et al. 2020) and [Rmixmod](https://CRAN.R-project.org/package=Rmixmod) (Langrognet et al. 2021) estimate the parameters of the mixture with smaller bias, while the RMSE and variability of the estimates are smaller with packages [bgmm](https://CRAN.R-project.org/package=bgmm) (Ewa Szczurek 2021), [EMCluster](https://CRAN.R-project.org/package=EMCluster) (W.-C. Chen and Maitra 2022), [GMKMcharlie](https://CRAN.R-project.org/package=GMKMcharlie) (Liu 2021), [flexmix](https://CRAN.R-project.org/package=flexmix) (Gruen and Leisch 2022) and [mclust](https://CRAN.R-project.org/package=mclust) (Fraley, Raftery, and Scrucca 2022). The comparison of these packages provides R users with useful recommendations for improving the computational and statistical performance of their clustering and for identifying common deficiencies. Additionally, we propose several improvements in the development of a future, unified mixture model package.","PeriodicalId":51285,"journal":{"name":"R Journal","volume":"102 1-2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135714326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}