Multi-way overlapping clustering by Bayesian tensor decomposition
Zhuofan Wang, Fangting Zhou, Kejun He, Yang Ni
Statistics and Its Interface, published 2024-02-01. DOI: https://doi.org/10.4310/23-sii790
The development of modern sequencing technologies provides great opportunities to measure gene expression of multiple tissues from different individuals. The three-way variation across genes, tissues, and individuals makes statistical inference a challenging task. In this paper, we propose a Bayesian multi-way clustering approach to cluster genes, tissues, and individuals simultaneously. The proposed model adaptively trichotomizes the observed data into three latent categories and uses a Bayesian hierarchical construction to further decompose the latent variables into lower-dimensional features, which can be interpreted as overlapping clusters. With a Bayesian nonparametric prior, i.e., the Indian buffet process, our method determines the number of clusters automatically. The utility of our approach is demonstrated through simulation studies and an application to the Genotype-Tissue Expression (GTEx) RNA-seq data. The clustering result reveals some interesting findings about depression-related genes in the human brain, which are also consistent with biological domain knowledge. The detailed algorithm and some numerical results are available in the online Supplementary Material at https://intlpress.com/site/pub/files/supp/sii/2024/0017/0002/sii-2024-0017-0002-s001.pdf.
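As a rough illustration of how an Indian buffet process prior can leave the number of clusters open while allowing overlap, the sketch below draws a binary membership matrix from the standard IBP generative scheme. It is not the authors' model or posterior sampler: the function names `sample_poisson` and `sample_ibp`, the concentration parameter `alpha`, and the seed handling are assumptions made for this example only.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_items, alpha, seed=0):
    """Sample a binary item-by-cluster matrix from the IBP prior (illustrative only)."""
    rng = random.Random(seed)
    memberships = []    # memberships[i] = set of cluster indices joined by item i
    cluster_sizes = []  # cluster_sizes[k] = number of earlier items in cluster k
    for i in range(1, num_items + 1):
        chosen = set()
        # Join each existing cluster with probability proportional to its popularity.
        for k, size in enumerate(cluster_sizes):
            if rng.random() < size / i:
                chosen.add(k)
        # Open a Poisson(alpha / i) number of brand-new clusters.
        for _ in range(sample_poisson(alpha / i, rng)):
            cluster_sizes.append(0)
            chosen.add(len(cluster_sizes) - 1)
        for k in chosen:
            cluster_sizes[k] += 1
        memberships.append(chosen)
    num_clusters = len(cluster_sizes)
    return [[1 if k in row else 0 for k in range(num_clusters)] for row in memberships]


if __name__ == "__main__":
    for row in sample_ibp(num_items=8, alpha=2.0, seed=1):
        print(row)
```

Each row may contain several ones, so an item can belong to more than one cluster, and the number of columns is itself random rather than fixed in advance. In a three-way setting one such matrix could, in principle, be drawn for each of the gene, tissue, and individual modes, though how the paper couples them is described only in the article and its supplement.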
About the journal:
Exploring the interface between the field of statistics and other disciplines, including but not limited to: biomedical sciences, geosciences, computer sciences, engineering, and social and behavioral sciences. Publishes high-quality articles in broad areas of statistical science, emphasizing substantive problems, sound statistical models and methods, clear and efficient computational algorithms, and insightful discussions of the motivating problems.