Chao Shang, Aaron Palmer, Jiangwen Sun, Ko-Shin Chen, Jin Lu, Jinbo Bi
Proceedings : ... IEEE International Conference on Big Data, vol. 2017, pp. 766-775. Published 2017-01-01; Epub 2018-01-15.
DOI: https://doi.org/10.1109/BigData.2017.8257992
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813842/pdf/nihms918595.pdf
Citations: 0
VIGAN: Missing View Imputation with Generative Adversarial Networks
Abstract
In an era when big data are becoming the norm, the concern is shifting from the quantity of the data to their quality and completeness. In many disciplines, data are collected from heterogeneous sources, resulting in multi-view or multi-modal datasets. Missing data are challenging to handle in multi-view data analysis. In particular, when certain samples lack an entire view of data, the missing view problem arises. Classic multiple imputation and matrix completion methods are hardly effective here, because the affected view offers no information on which to base imputation for such samples. The commonly used simple method of removing samples with a missing view can dramatically reduce sample size, thus diminishing the statistical power of subsequent analysis. In this paper, we propose a novel approach for view imputation via generative adversarial networks (GANs), which we name VIGAN. This approach first treats each view as a separate domain and identifies domain-to-domain mappings via a GAN using randomly sampled data from each view, and then employs a multi-modal denoising autoencoder (DAE) to reconstruct the missing view from the GAN outputs based on paired data across the views. By optimizing the GAN and DAE jointly, our model integrates knowledge of domain mappings and view correspondences to effectively recover the missing view. Empirical results on benchmark datasets validate the VIGAN approach by comparison against the state of the art. An evaluation of VIGAN in a genetic study of substance use disorders further demonstrates the effectiveness and usability of this approach in life science.
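The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: training is omitted entirely, and the names `split_by_availability`, `impute_missing_views`, `g12`, `g21`, and `refine` are hypothetical. The two mapping functions stand in for trained GAN generators between the views, and `refine` stands in for the multi-modal DAE; here they are replaced by toy arithmetic so the example runs with the standard library alone.

```python
from typing import Callable, List, Optional, Tuple

Vec = List[float]
# A sample is (view 1, view 2); None marks an entirely missing view.
Sample = Tuple[Optional[Vec], Optional[Vec]]


def split_by_availability(samples: List[Sample]):
    """Partition samples into fully paired ones and those with a single view."""
    paired = [(a, b) for a, b in samples if a is not None and b is not None]
    only_v1 = [a for a, b in samples if a is not None and b is None]
    only_v2 = [b for a, b in samples if a is None and b is not None]
    return paired, only_v1, only_v2


def impute_missing_views(
    samples: List[Sample],
    g12: Callable[[Vec], Vec],
    g21: Callable[[Vec], Vec],
    refine: Callable[[Vec, Vec], Tuple[Vec, Vec]],
) -> List[Tuple[Vec, Vec]]:
    """Fill in a missing view: map from the observed view, then refine the pair."""
    completed = []
    for v1, v2 in samples:
        if v1 is None and v2 is None:
            continue  # no observed view to map from
        if v2 is None:
            _, v2 = refine(v1, g12(v1))  # keep observed view 1, take refined view 2
        elif v1 is None:
            v1, _ = refine(g21(v2), v2)  # keep observed view 2, take refined view 1
        completed.append((v1, v2))
    return completed


# Toy stand-ins (assumptions): view 2 is simply twice view 1.
g12 = lambda x: [2.0 * a for a in x]   # placeholder for a view 1 -> view 2 generator
g21 = lambda y: [b / 2.0 for b in y]   # placeholder for a view 2 -> view 1 generator
refine = lambda a, b: (a, b)           # placeholder for the DAE (identity here)

samples: List[Sample] = [
    ([1.0, 2.0], [2.0, 4.0]),  # both views observed
    ([3.0, 1.0], None),        # view 2 missing
    (None, [8.0, 6.0]),        # view 1 missing
]

paired, only_v1, only_v2 = split_by_availability(samples)
completed = impute_missing_views(samples, g12, g21, refine)

print(len(paired), "of", len(samples), "samples survive listwise deletion")
print(len(completed), "samples are available after imputation")
```

In this toy run, removing samples with a missing view would leave only the paired subset, whereas imputation retains every sample that has at least one observed view. In a real setting the paired subset would supply the cross-view correspondences and all observed views would supply the per-domain data for learning the mappings.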