Dynamical Gaussian Process Latent Variable Model for Representation Learning from Longitudinal Data

Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference. Pub Date: 2020-10-19. DOI: 10.1145/3412815.3416894

Thanh Le, Vasant G Honavar

Citations: 5

Abstract

Many real-world applications involve longitudinal data, consisting of observations of several variables, where different subsets of variables are sampled at irregularly spaced time points. We introduce the Longitudinal Gaussian Process Latent Variable Model (L-GPLVM), a variant of the Gaussian Process Latent Variable Model, for learning compact representations of such data. L-GPLVM overcomes a key limitation of the Dynamic Gaussian Process Latent Variable Model and its variants, which rely on the assumption that the data are fully observed at all of the sampled time points. We describe an effective approach to learning the parameters of L-GPLVM from sparse observations: the dynamical model is coupled with a Multitask Gaussian Process model that samples the missing observations at each step of the gradient-based optimization of the variational lower bound. We further show the advantage of the Sparse Process Convolution framework in learning the latent representation of sparsely and irregularly sampled longitudinal data with minimal computational overhead relative to a standard Latent Variable Model. Experiments with synthetic data, as well as with variants of MOCAP data with varying degrees of sparsity of observations, show that L-GPLVM substantially and consistently outperforms the state-of-the-art alternatives in recovering the missing observations, even when the available data exhibit a high degree of sparsity. The compact representations of irregularly sampled and sparse longitudinal data can be used to perform a variety of machine learning tasks, including clustering, classification, and regression.
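To make the idea of recovering missing observations at irregular time points more concrete, here is a minimal sketch in Python with NumPy. It is not the L-GPLVM and not the authors' Multitask Gaussian Process: it is plain single-output Gaussian process regression with a squared-exponential kernel, and the function names (`rbf_kernel`, `gp_impute`), the hyperparameter values, and the toy sine-wave data are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(t1, t2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of time points."""
    diff = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_impute(t_obs, y_obs, t_missing, noise=1e-2, lengthscale=1.0):
    """Posterior mean and variance of a zero-mean GP at unobserved times.

    Hypothetical helper for illustration only; hyperparameters are fixed
    by hand rather than learned from a variational bound.
    """
    K = rbf_kernel(t_obs, t_obs, lengthscale) + noise * np.eye(len(t_obs))
    K_s = rbf_kernel(t_missing, t_obs, lengthscale)
    K_ss = rbf_kernel(t_missing, t_missing, lengthscale)

    # Cholesky factorisation for a numerically stable solve of K^{-1} y.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s @ alpha

    # Posterior variance: prior variance minus the part explained by the data.
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var


# Toy longitudinal series: one variable observed at irregular time points.
rng = np.random.default_rng(0)
t_obs = np.sort(rng.uniform(0.0, 10.0, size=15))
y_obs = np.sin(t_obs)

# Time points where this variable was not sampled.
t_missing = np.array([2.5, 5.0, 7.5])
mean, var = gp_impute(t_obs, y_obs, t_missing)
```

The posterior variance gives a rough sense of how uncertain each recovered value is. A multitask model of the kind the abstract describes would additionally share information across variables, which this single-output sketch does not attempt.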
