Zero-inflated multivariate tobit regression modeling

IF 0.8 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY

Journal of Statistical Planning and Inference Pub Date : 2024-09-03 DOI:10.1016/j.jspi.2024.106229

Becky Tang , Henry A. Frye , John A. Silander Jr. , Alan E. Gelfand

Volume 236, Article 106229. Full text: https://www.sciencedirect.com/science/article/pii/S0378375824000867

Citations: 0

Abstract

A frequent challenge encountered in real-world applications is data having a high proportion of zeros. Focusing on ecological abundance data, much attention has been given to zero-inflated count data. Models for non-negative continuous abundance data with an excess of zeros are rarely discussed. Work presented here considers the creation of a point mass at zero through a left-censoring approach or through a hurdle approach. We incorporate both mechanisms to capture the analog of zero-inflation for count data. Additionally, primary attention has been given to univariate zero-inflated modeling (e.g., single species), whereas data often arise jointly (e.g., a collection of species). With multivariate abundance data, a key issue is to capture dependence among the species at a site, both in terms of positive abundance as well as absence. Therefore, our contribution is a model for multivariate zero-inflated continuous data that are non-negative. Working in a Bayesian framework, we discuss the issue of separating the two sources of zeros and offer model comparison metrics for multivariate zero-inflated data. In an application, we model the total biomass for five tree species obtained from plots established in the Forest Inventory Analysis database in the Northeast region of the United States.
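The two zero-generating mechanisms named in the abstract, left-censoring and a hurdle, can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' specification: it assumes a probit-style hurdle driven by a latent Gaussian "presence" vector and a tobit layer driven by a second latent Gaussian "abundance" vector, with covariance matrices inducing dependence among species at a site. The function name `simulate_zi_mv_tobit`, all parameter names, and the numeric values are hypothetical.

```python
import numpy as np


def simulate_zi_mv_tobit(n_sites, mean_abund, cov_abund, mean_pres, cov_pres, seed=0):
    """Simulate non-negative abundances with two sources of zeros.

    Hypothetical sketch: a latent Gaussian 'presence' vector acts as a hurdle
    (a species is absent when its component is <= 0), and a second latent
    Gaussian 'abundance' vector is left-censored at zero (tobit).
    """
    rng = np.random.default_rng(seed)
    # Hurdle layer: correlated latent presence variables, one per species.
    w = rng.multivariate_normal(mean_pres, cov_pres, size=n_sites)
    present = w > 0
    # Tobit layer: correlated latent abundances, censored below at zero.
    z = rng.multivariate_normal(mean_abund, cov_abund, size=n_sites)
    censored = np.maximum(z, 0.0)
    # Observed abundance is zero if the species is absent or censored.
    y = np.where(present, censored, 0.0)
    hurdle_zero = ~present
    censor_zero = present & (z <= 0)
    return y, hurdle_zero, censor_zero


# Two species with positively correlated latent layers (illustrative values).
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
y, hurdle_zero, censor_zero = simulate_zi_mv_tobit(
    n_sites=1000,
    mean_abund=np.array([0.5, 1.0]),
    cov_abund=cov,
    mean_pres=np.array([0.0, 0.5]),
    cov_pres=cov,
)
print("share of zeros per species:", (y == 0).mean(axis=0))
print("share due to the hurdle:", hurdle_zero.mean(axis=0))
print("share due to censoring:", censor_zero.mean(axis=0))
```

In the simulation the origin of each zero is known because the latent variables are retained. In observed data only `y` is available, which is why separating the two sources of zeros is treated as an inferential issue in the abstract.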




Source journal

Journal of Statistical Planning and Inference (Mathematics: Statistics & Probability)

CiteScore

2.10

Self-citation rate

11.10%

Articles published

Review time

3-6 weeks

Journal description: The Journal of Statistical Planning and Inference offers itself as a multifaceted and all-inclusive bridge between classical aspects of statistics and probability, and the emerging interdisciplinary aspects that have a potential of revolutionizing the subject. While we maintain our traditional strength in statistical inference, design, classical probability, and large sample methods, we also have a far more inclusive and broadened scope to keep up with the new problems that confront us as statisticians, mathematicians, and scientists. We publish high-quality articles in all branches of statistics, probability, discrete mathematics, machine learning, and bioinformatics. We also especially welcome well-written and up-to-date review articles on fundamental themes of statistics, probability, machine learning, and general biostatistics. Thoughtful letters to the editors, interesting problems in need of a solution, and short notes carrying an element of elegance or beauty are equally welcome.