High-dimensional covariate-augmented overdispersed poisson factor model.

IF 1.4 4区 数学 Q3 BIOLOGY
Biometrics Pub Date : 2024-03-27 DOI:10.1093/biomtc/ujae031
Wei Liu, Qingzhi Zhong
{"title":"High-dimensional covariate-augmented overdispersed poisson factor model.","authors":"Wei Liu, Qingzhi Zhong","doi":"10.1093/biomtc/ujae031","DOIUrl":null,"url":null,"abstract":"<p><p>The current Poisson factor models often assume that the factors are unknown, which overlooks the explanatory potential of certain observable covariates. This study focuses on high dimensional settings, where the number of the count response variables and/or covariates can diverge as the sample size increases. A covariate-augmented overdispersed Poisson factor model is proposed to jointly perform a high-dimensional Poisson factor analysis and estimate a large coefficient matrix for overdispersed count data. A group of identifiability conditions is provided to theoretically guarantee computational identifiability. We incorporate the interdependence of both response variables and covariates by imposing a low-rank constraint on the large coefficient matrix. To address the computation challenges posed by nonlinearity, two high-dimensional latent matrices, and the low-rank constraint, we propose a novel variational estimation scheme that combines Laplace and Taylor approximations. We also develop a criterion based on a singular value ratio to determine the number of factors and the rank of the coefficient matrix. Comprehensive simulation studies demonstrate that the proposed method outperforms the state-of-the-art methods in estimation accuracy and computational efficiency. The practical merit of our method is demonstrated by an application to the CITE-seq dataset. A flexible implementation of our proposed method is available in the R package COAP.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":null,"pages":null},"PeriodicalIF":1.4000,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biometrics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/biomtc/ujae031","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

The current Poisson factor models often assume that the factors are unknown, which overlooks the explanatory potential of certain observable covariates. This study focuses on high dimensional settings, where the number of the count response variables and/or covariates can diverge as the sample size increases. A covariate-augmented overdispersed Poisson factor model is proposed to jointly perform a high-dimensional Poisson factor analysis and estimate a large coefficient matrix for overdispersed count data. A group of identifiability conditions is provided to theoretically guarantee computational identifiability. We incorporate the interdependence of both response variables and covariates by imposing a low-rank constraint on the large coefficient matrix. To address the computation challenges posed by nonlinearity, two high-dimensional latent matrices, and the low-rank constraint, we propose a novel variational estimation scheme that combines Laplace and Taylor approximations. We also develop a criterion based on a singular value ratio to determine the number of factors and the rank of the coefficient matrix. Comprehensive simulation studies demonstrate that the proposed method outperforms the state-of-the-art methods in estimation accuracy and computational efficiency. The practical merit of our method is demonstrated by an application to the CITE-seq dataset. A flexible implementation of our proposed method is available in the R package COAP.

高维协变量增强过分散泊松因子模型。
目前的泊松因子模型通常假设因子是未知的,这就忽略了某些可观测协变量的解释潜力。本研究侧重于高维环境,在这种环境中,计数响应变量和/或协变量的数量会随着样本量的增加而分散。本研究提出了一种协变量增强的过度分散泊松因子模型,以联合执行高维泊松因子分析,并估计过度分散计数数据的大系数矩阵。我们提供了一组可识别性条件,从理论上保证了计算的可识别性。我们通过对大系数矩阵施加低秩约束,将响应变量和协变量的相互依赖性纳入其中。为了解决非线性、两个高维潜矩阵和低阶约束带来的计算挑战,我们提出了一种结合拉普拉斯和泰勒近似的新型变分估计方案。我们还开发了一种基于奇异值比率的标准,用于确定因子的数量和系数矩阵的秩。综合模拟研究表明,所提出的方法在估计精度和计算效率方面都优于最先进的方法。我们的方法在 CITE-seq 数据集上的应用证明了它的实用价值。我们提出的方法可在 R 软件包 COAP 中灵活实现。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Biometrics
Biometrics 生物-生物学
CiteScore
2.70
自引率
5.30%
发文量
178
审稿时长
4-8 weeks
期刊介绍: The International Biometric Society is an international society promoting the development and application of statistical and mathematical theory and methods in the biosciences, including agriculture, biomedical science and public health, ecology, environmental sciences, forestry, and allied disciplines. The Society welcomes as members statisticians, mathematicians, biological scientists, and others devoted to interdisciplinary efforts in advancing the collection and interpretation of information in the biosciences. The Society sponsors the biennial International Biometric Conference, held in sites throughout the world; through its National Groups and Regions, it also Society sponsors regional and local meetings.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信