{"title":"Exact Posterior Mean and Covariance for Generalized Linear Mixed Models","authors":"Tonglin Zhang","doi":"arxiv-2409.09310","DOIUrl":"https://doi.org/arxiv-2409.09310","url":null,"abstract":"A novel method is proposed for the exact posterior mean and covariance of the random effects given the response in a generalized linear mixed model (GLMM) when the response does not follow normal. The research solves a long-standing problem in Bayesian statistics when an intractable integral appears in the posterior distribution. It is well-known that the posterior distribution of the random effects given the response in a GLMM when the response does not follow normal contains intractable integrals. Previous methods rely on Monte Carlo simulations for the posterior distributions. They do not provide the exact posterior mean and covariance of the random effects given the response. The special integral computation (SIC) method is proposed to overcome the difficulty. The SIC method does not use the posterior distribution in the computation. It devises an optimization problem to reach the task. An advantage is that the computation of the posterior distribution is unnecessary. The proposed SIC avoids the main difficulty in Bayesian analysis when intractable integrals appear in the posterior distribution.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint spatial modeling of mean and non-homogeneous variance combining semiparametric SAR and GAMLSS models for hedonic prices","authors":"J. D. Toloza-Delgado, O. O. Melo, N. A. Cruz","doi":"arxiv-2409.08912","DOIUrl":"https://doi.org/arxiv-2409.08912","url":null,"abstract":"In the context of spatial econometrics, it is very useful to have methodologies that allow modeling the spatial dependence of the observed variables and obtaining more precise predictions of both the mean and the variability of the response variable, something very useful in territorial planning and public policies. This paper proposes a new methodology that jointly models the mean and the variance. Also, it allows to model the spatial dependence of the dependent variable as a function of covariates and to model the semiparametric effects in both models. The algorithms developed are based on generalized additive models that allow the inclusion of non-parametric terms in both the mean and the variance, maintaining the traditional theoretical framework of spatial regression. The theoretical developments of the estimation of this model are carried out, obtaining desirable statistical properties in the estimators. A simulation study is developed to verify that the proposed method has a remarkable predictive capacity in terms of the mean square error and shows a notable improvement in the estimation of the spatial autoregressive parameter, compared to other traditional methods and some recent developments. The model is also tested on data from the construction of a hedonic price model for the city of Bogota, highlighting as the main result the ability to model the variability of housing prices, and the wealth in the analysis obtained.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi forests: Variable importance for multi-class outcomes","authors":"Roman Hornung (Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Munich, Germany; Munich Center for Machine Learning), Alexander Hapfelmeier (Institute of AI and Informatics in Medicine, TUM School of Medicine and Health, Technical University of Munich, Munich, Germany)","doi":"arxiv-2409.08925","DOIUrl":"https://doi.org/arxiv-2409.08925","url":null,"abstract":"In prediction tasks with multi-class outcomes, identifying covariates specifically associated with one or more outcome classes can be important. Conventional variable importance measures (VIMs) from random forests (RFs), like permutation and Gini importance, focus on overall predictive performance or node purity, without differentiating between the classes. Therefore, they can be expected to fail to distinguish class-associated covariates from covariates that only distinguish between groups of classes. We introduce a VIM called multi-class VIM, tailored for identifying exclusively class-associated covariates, via a novel RF variant called multi forests (MuFs). The trees in MuFs use both multi-way and binary splitting. The multi-way splits generate child nodes for each class, using a split criterion that evaluates how well these nodes represent their respective classes. This setup forms the basis of the multi-class VIM, which measures the discriminatory ability of the splits performed in the respective covariates with regard to this split criterion. Alongside the multi-class VIM, we introduce a second VIM, the discriminatory VIM. This measure, based on the binary splits, assesses the strength of the general influence of the covariates, irrespective of their class-associatedness. Simulation studies demonstrate that the multi-class VIM specifically ranks class-associated covariates highly, unlike conventional VIMs which also rank other types of covariates highly. Analyses of 121 datasets reveal that MuFs often have slightly lower predictive performance compared to conventional RFs. This is, however, not a limiting factor given the algorithm's primary purpose of calculating the multi-class VIM.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tracing the impacts of Mount Pinatubo eruption on global climate using spatially-varying changepoint detection","authors":"Samantha Shi-Jun, Lyndsay Shand, Bo Li","doi":"arxiv-2409.08908","DOIUrl":"https://doi.org/arxiv-2409.08908","url":null,"abstract":"Significant events such as volcanic eruptions can have global and long lasting impacts on climate. These global impacts, however, are not uniform across space and time. Understanding how the Mt. Pinatubo eruption affects global and regional climate is of great interest for predicting impact on climate due to similar events. We propose a Bayesian framework to simultaneously detect and estimate spatially-varying temporal changepoints for regional climate impacts. Our approach takes into account the diffusing nature of the changes caused by the volcanic eruption and leverages spatial correlation. We illustrate our method on simulated datasets and compare it with an existing changepoint detection method. Finally, we apply our method on monthly stratospheric aerosol optical depth and surface temperature data from 1985 to 1995 to detect and estimate changepoints following the 1991 Mt. Pinatubo eruption.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"74 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cubature-based uncertainty estimation for nonlinear regression models","authors":"Martin Bubel, Jochen Schmid, Maximilian Carmesin, Volodymyr Kozachynskyi, Erik Esche, Michael Bortz","doi":"arxiv-2409.08756","DOIUrl":"https://doi.org/arxiv-2409.08756","url":null,"abstract":"Calibrating model parameters to measured data by minimizing loss functions is an important step in obtaining realistic predictions from model-based approaches, e.g., for process optimization. This is applicable to both knowledge-driven and data-driven model setups. Due to measurement errors, the calibrated model parameters also carry uncertainty. In this contribution, we use cubature formulas based on sparse grids to calculate the variance of the regression results. The number of cubature points is close to the theoretical minimum required for a given level of exactness. We present exact benchmark results, which we also compare to other cubatures. This scheme is then applied to estimate the prediction uncertainty of the NRTL model, calibrated to observations from different experimental designs.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Angular Co-variance using intrinsic geometry of torus: Non-parametric change points detection in meteorological data","authors":"Surojit Biswas, Buddhananda Banerjee, Arnab Kumar Laha","doi":"arxiv-2409.08838","DOIUrl":"https://doi.org/arxiv-2409.08838","url":null,"abstract":"In many temporal datasets, the parameters of the underlying distribution may change abruptly at unknown times. Detecting these changepoints is crucial for numerous applications. While this problem has been extensively studied for linear data, there has been remarkably less research on bivariate angular data. For the first time, we address the changepoint problem for the mean direction of toroidal and spherical data, which are types of bivariate angular data. By leveraging the intrinsic geometry of a curved torus, we introduce the concept of the \"square\" of an angle. This leads us to define the \"curved dispersion matrix\" for bivariate angular random variables, analogous to the dispersion matrix for bivariate linear random variables. Using this analogous measure of the \"Mahalanobis distance,\" we develop two new non-parametric tests to identify changes in the mean direction parameters for toroidal and spherical distributions. We derive the limiting distributions of the test statistics and evaluate their power surface and contours through extensive simulations. We also apply the proposed methods to detect changes in mean direction for hourly wind-wave direction measurements and the path of the cyclonic storm \"Biporjoy,\" which occurred between 6th and 19th June 2023 over the Arabian Sea, western coast of India.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of distributions for risks based on the first moment and c-statistic","authors":"Mohsen Sadatsafavi, Tae Yoon Lee, John Petkau","doi":"arxiv-2409.09178","DOIUrl":"https://doi.org/arxiv-2409.09178","url":null,"abstract":"We show that for any family of distributions with support on [0,1] with strictly monotonic cumulative distribution function (CDF) that has no jumps and is quantile-identifiable (i.e., any two distinct quantiles identify the distribution), knowing the first moment and c-statistic is enough to identify the distribution. The derivations motivate numerical algorithms for mapping a given pair of expected value and c-statistic to the parameters of specified two-parameter distributions for probabilities. We implemented these algorithms in R and in a simulation study evaluated their numerical accuracy for common families of distributions for risks (beta, logit-normal, and probit-normal). An area of application for these developments is in risk prediction modeling (e.g., sample size calculations and Value of Information analysis), where one might need to estimate the parameters of the distribution of predicted risks from the reported summary statistics.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"59 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Change point analysis with irregular signals","authors":"Tobias Kley, Yuhan Philip Liu, Hongyuan Cao, Wei Biao Wu","doi":"arxiv-2409.08863","DOIUrl":"https://doi.org/arxiv-2409.08863","url":null,"abstract":"This paper considers the problem of testing and estimation of change point where signals after the change point can be highly irregular, which departs from the existing literature that assumes signals after the change point to be piece-wise constant or vary smoothly. A two-step approach is proposed to effectively estimate the location of the change point. The first step consists of a preliminary estimation of the change point that allows us to obtain unknown parameters for the second step. In the second step we use a new procedure to determine the position of the change point. We show that, under suitable conditions, the desirable $\\mathcal{O}_P(1)$ rate of convergence of the estimated change point can be obtained. We apply our method to analyze the Baidu search index of COVID-19 related symptoms and find 8 December 2019 to be the starting date of the COVID-19 pandemic.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regression-based proximal causal inference for right-censored time-to-event data","authors":"Kendrick Li, George C. Linderman, Xu Shi, Eric J. Tchetgen Tchetgen","doi":"arxiv-2409.08924","DOIUrl":"https://doi.org/arxiv-2409.08924","url":null,"abstract":"Unmeasured confounding is one of the major concerns in causal inference from observational data. Proximal causal inference (PCI) is an emerging methodological framework to detect and potentially account for confounding bias by carefully leveraging a pair of negative control exposure (NCE) and outcome (NCO) variables, also known as treatment and outcome confounding proxies. Although regression-based PCI is well developed for binary and continuous outcomes, analogous PCI regression methods for right-censored time-to-event outcomes are currently lacking. In this paper, we propose a novel two-stage regression PCI approach for right-censored survival data under an additive hazard structural model. We provide theoretical justification for the proposed approach tailored to different types of NCOs, including continuous, count, and right-censored time-to-event variables. We illustrate the approach with an evaluation of the effectiveness of right heart catheterization among critically ill patients using data from the SUPPORT study. Our method is implemented in the open-access R package 'pci2s'.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The underreported death toll of wars: a probabilistic reassessment from a structured expert elicitation","authors":"Paola Vesco, David Randahl, Håvard Hegre, Stina Högbladh, Mert Can Yilmaz","doi":"arxiv-2409.08779","DOIUrl":"https://doi.org/arxiv-2409.08779","url":null,"abstract":"Event datasets including those provided by Uppsala Conflict Data Program (UCDP) are based on reports from the media and international organizations, and are likely to suffer from reporting bias. Since the UCDP has strict inclusion criteria, they most likely under-estimate conflict-related deaths, but we do not know by how much. Here, we provide a generalizable, cross-national measure of uncertainty around UCDP reported fatalities that is more robust and realistic than UCDP's documented low and high estimates, and make available a dataset and R package accounting for the measurement uncertainty. We use a structured expert elicitation combined with statistical modelling to derive a distribution of plausible number of fatalities given the number of battle-related deaths and the type of violence documented by the UCDP. The results can help scholars understand the extent of bias affecting their empirical analyses of organized violence and contribute to improve the accuracy of conflict forecasting systems.","PeriodicalId":501425,"journal":{"name":"arXiv - STAT - Methodology","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142256419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}