A New Dirichlet-Multinomial Mixture Regression Model for the Analysis of Microbiome Data
Authors: Roberto Ascari, Sonia Migliorati, Andrea Ongaro
Journal: Statistics in Medicine, 44(18-19), e70220 (Journal Article), published 2025-08-01
DOI: 10.1002/sim.70220 (https://doi.org/10.1002/sim.70220)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12330782/pdf/
Journal metrics: impact factor 1.8; JCR Q3 (Mathematical & Computational Biology)
Motivated by the challenges in analyzing gut microbiome and metagenomic data, this paper introduces a novel mixture distribution for multivariate counts and a regression model built upon it. The flexibility and interpretability of the proposed distribution accommodate both negative and positive dependence among taxa and are accompanied by numerous theoretical properties, including explicit expressions for inter- and intraclass correlations, thereby providing a powerful tool for understanding complex microbiome interactions. Furthermore, the regression model based on this distribution facilitates the clear identification and interpretation of relationships between taxa and covariates by modeling the marginal mean of the multivariate response (i.e., taxa counts). Inference is performed using a tailored Hamiltonian Monte Carlo estimation method combined with a spike-and-slab variable selection procedure. Extensive simulation studies and an application to a human gut microbiome dataset highlight the proposed model's substantial improvements over competing models in terms of fit, interpretability, and predictive performance.
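The dependence claim can be illustrated with a small simulation. The sketch below is not the distribution proposed in the paper, whose exact form is not given in this abstract. It assumes a plain Dirichlet-multinomial and a hypothetical two-component mixture of Dirichlet-multinomials, with illustrative parameter values chosen here. A single Dirichlet-multinomial forces negative correlation between any two taxa counts, whereas mixing over components can make two taxa rise and fall together.

```python
import random
from collections import Counter


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gammas."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_dm(n_reads, alpha, rng):
    """Draw one count vector from a Dirichlet-multinomial with n_reads trials."""
    probs = sample_dirichlet(alpha, rng)
    draws = rng.choices(range(len(alpha)), weights=probs, k=n_reads)
    counts = Counter(draws)
    return [counts.get(j, 0) for j in range(len(alpha))]


def sample_dm_mixture(n_reads, weights, alphas, rng):
    """Pick a component at random, then draw from that component's DM.

    Hypothetical illustration only; not the authors' mixture construction.
    """
    comp = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return sample_dm(n_reads, alphas[comp], rng)


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)   # fixed seed so the run is reproducible
n_reads = 200            # assumed sequencing depth per sample
n_samples = 2000         # assumed number of simulated samples

# Single Dirichlet-multinomial over three taxa (illustrative parameters).
single = [sample_dm(n_reads, [5.0, 5.0, 5.0], rng) for _ in range(n_samples)]

# Two-component mixture: taxa 0 and 1 are abundant together in one component
# and rare together in the other (illustrative parameters).
mixture = [
    sample_dm_mixture(
        n_reads, [0.5, 0.5], [[20.0, 20.0, 2.0], [2.0, 2.0, 40.0]], rng
    )
    for _ in range(n_samples)
]

corr_single = pearson([s[0] for s in single], [s[1] for s in single])
corr_mixture = pearson([s[0] for s in mixture], [s[1] for s in mixture])

print(f"taxa 0 vs 1, single DM:  {corr_single:+.2f}")
print(f"taxa 0 vs 1, DM mixture: {corr_mixture:+.2f}")
```

In this toy setup the first correlation is negative and the second is positive, which is the kind of flexibility the abstract attributes to the proposed distribution.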
About the journal:
The journal aims to influence practice in medicine and its associated sciences through the publication of papers on statistical and other quantitative methods. Papers will explain new methods and demonstrate their application, preferably through a substantive, real, motivating example or a comprehensive evaluation based on an illustrative example. Alternatively, papers will report on case studies where creative use or technical generalizations of established methodology are directed towards a substantive application. Reviews of, and tutorials on, general topics relevant to the application of statistics to medicine will also be published. The main criteria for publication are appropriateness of the statistical methods to a particular medical problem and clarity of exposition. Papers with primarily mathematical content will be excluded. The journal aims to enhance communication between statisticians, clinicians and medical researchers.