A Common Atoms Model for the Bayesian Nonparametric Analysis of Nested Data.

IF 3 | Region 1 (Mathematics) | Q1 STATISTICS & PROBABILITY

Journal of the American Statistical Association Pub Date : 2023-01-01 Epub Date: 2021-07-14 DOI:10.1080/01621459.2021.1933499

Francesco Denti, Federico Camerlenghi, Michele Guindani, Antonietta Mira

Volume 118, Issue 541, pages 405-416. Publication type: Journal Article.

Citations: 16

Abstract

The use of large datasets for targeted therapeutic interventions requires new ways to characterize the heterogeneity observed across subgroups of a specific population. In particular, models for partially exchangeable data are needed for inference on nested datasets, where the observations are assumed to be organized in different units and some sharing of information is required to learn distinctive features of the units. In this manuscript, we propose a nested common atoms model (CAM) that is particularly suited for the analysis of nested datasets where the distributions of the units are expected to differ only over a small fraction of the observations sampled from each unit. The proposed CAM allows a two-layered clustering at the distributional and observational level and is amenable to scalable posterior inference through the use of a computationally efficient nested slice sampler algorithm. We further discuss how to extend the proposed modeling framework to handle discrete measurements, and we conduct posterior inference on a real microbiome dataset from a diet swap study to investigate how the alterations in intestinal microbiota composition are associated with different eating habits. We further investigate the performance of our model in capturing true distributional structures in the population by means of a simulation study.
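The abstract describes the two-layered clustering only in words. The following is a minimal sketch of how data with that structure could be simulated. It is not the authors' code and not their nested slice sampler. It assumes a truncated stick-breaking construction, a Gaussian kernel, and truncation levels `K` and `L`; none of these are stated in the abstract, and every function and parameter name is hypothetical.

```python
import numpy as np


def stick_breaking(concentration, size, rng):
    """Truncated stick-breaking weights, renormalised to sum to 1 (assumed construction)."""
    betas = rng.beta(1.0, concentration, size=size)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining
    return weights / weights.sum()


def simulate_cam(n_units, n_obs, alpha=1.0, beta=1.0, K=10, L=15, seed=0):
    """Simulate nested data with shared ("common") atoms across unit-level distributions.

    K and L are truncation levels chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    # One set of atoms shared by every candidate distribution (assumed Gaussian base measure).
    atoms = rng.normal(0.0, 5.0, size=L)
    # Weights over the K candidate distributions (distributional layer).
    pi = stick_breaking(alpha, K, rng)
    # Each candidate distribution has its own weights over the same L atoms.
    omega = np.vstack([stick_breaking(beta, L, rng) for _ in range(K)])
    # Distributional clustering: each unit picks one candidate distribution.
    unit_cluster = rng.choice(K, size=n_units, p=pi)
    # Observational clustering: each observation picks one of the common atoms.
    obs_cluster = np.vstack(
        [rng.choice(L, size=n_obs, p=omega[k]) for k in unit_cluster]
    )
    # Observations scatter around their atom (assumed unit-variance Gaussian kernel).
    y = rng.normal(atoms[obs_cluster], 1.0)
    return unit_cluster, obs_cluster, y


unit_cluster, obs_cluster, y = simulate_cam(n_units=4, n_obs=20)
```

Units that share a value in `unit_cluster` share a whole distribution, while observations from different units can still share an atom through `obs_cluster`. This is one way to read "clustering at the distributional and observational level".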

Source journal

Journal of the American Statistical Association (Mathematics: Statistics & Probability)

CiteScore

7.50

Self-citation rate

8.10%

Articles published

168

Review time

12 months

About the journal: Established in 1888 and published quarterly in March, June, September, and December, the Journal of the American Statistical Association (JASA) has long been considered the premier journal of statistical science. Articles focus on statistical applications, theory, and methods in economic, social, physical, engineering, and health sciences. Important books contributing to statistical advancement are reviewed in JASA. JASA is indexed in Current Index to Statistics and MathSci Online and reviewed in Mathematical Reviews. JASA is abstracted by Access Company and is indexed and abstracted in the SRM Database of Social Research Methodology.