{"title":"VAE-Info-cGAN:通过结合像素级和特征级地理空间条件输入生成合成图像","authors":"Xuerong Xiao, Swetava Ganguli, Vipul Pandey","doi":"10.1145/3423457.3429361","DOIUrl":null,"url":null,"abstract":"Training robust supervised deep learning models for many geospatial applications of computer vision is difficult due to dearth of class-balanced and diverse training data. Conversely, obtaining enough training data for many applications is financially prohibitive or may be infeasible, especially when the application involves modeling rare or extreme events. Synthetically generating data (and labels) using a generative model that can sample from a target distribution and exploit the multi-scale nature of images can be an inexpensive solution to address scarcity of labeled data. Towards this goal, we present a deep conditional generative model, called VAE-Info-cGAN, that combines a Variational Autoencoder (VAE) with a conditional Information Maximizing Generative Adversarial Network (InfoGAN), for synthesizing semantically rich images simultaneously conditioned on a pixel-level condition (PLC) and a macroscopic feature-level condition (FLC). Dimensionally, the PLC can only vary in the channel dimension from the synthesized image and is meant to be a task-specific input. The FLC is modeled as an attribute vector, a, in the latent space of the generated image which controls the contributions of various characteristic attributes germane to the target distribution. During generation, a is sampled from U[0, 1], while it is learned directly from the ground truth during training. An interpretation of a to systematically generate synthetic images by varying a chosen binary macroscopic feature is explored by training a linear binary classifier in the latent space. 
Experiments on a GPS trajectories dataset show that the proposed model can accurately generate various forms of spatio-temporal aggregates across different geographic locations while conditioned only on a raster representation of the road network. The primary intended application of the VAE-Info-cGAN is synthetic data (and label) generation for targeted data augmentation for computer vision-based modeling of problems relevant to geospatial analysis and remote sensing.","PeriodicalId":129055,"journal":{"name":"Proceedings of the 13th ACM SIGSPATIAL International Workshop on Computational Transportation Science","volume":"218 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"VAE-Info-cGAN: generating synthetic images by combining pixel-level and feature-level geospatial conditional inputs\",\"authors\":\"Xuerong Xiao, Swetava Ganguli, Vipul Pandey\",\"doi\":\"10.1145/3423457.3429361\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Training robust supervised deep learning models for many geospatial applications of computer vision is difficult due to dearth of class-balanced and diverse training data. Conversely, obtaining enough training data for many applications is financially prohibitive or may be infeasible, especially when the application involves modeling rare or extreme events. Synthetically generating data (and labels) using a generative model that can sample from a target distribution and exploit the multi-scale nature of images can be an inexpensive solution to address scarcity of labeled data. 
Towards this goal, we present a deep conditional generative model, called VAE-Info-cGAN, that combines a Variational Autoencoder (VAE) with a conditional Information Maximizing Generative Adversarial Network (InfoGAN), for synthesizing semantically rich images simultaneously conditioned on a pixel-level condition (PLC) and a macroscopic feature-level condition (FLC). Dimensionally, the PLC can only vary in the channel dimension from the synthesized image and is meant to be a task-specific input. The FLC is modeled as an attribute vector, a, in the latent space of the generated image which controls the contributions of various characteristic attributes germane to the target distribution. During generation, a is sampled from U[0, 1], while it is learned directly from the ground truth during training. An interpretation of a to systematically generate synthetic images by varying a chosen binary macroscopic feature is explored by training a linear binary classifier in the latent space. Experiments on a GPS trajectories dataset show that the proposed model can accurately generate various forms of spatio-temporal aggregates across different geographic locations while conditioned only on a raster representation of the road network. 
The primary intended application of the VAE-Info-cGAN is synthetic data (and label) generation for targeted data augmentation for computer vision-based modeling of problems relevant to geospatial analysis and remote sensing.\",\"PeriodicalId\":129055,\"journal\":{\"name\":\"Proceedings of the 13th ACM SIGSPATIAL International Workshop on Computational Transportation Science\",\"volume\":\"218 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 13th ACM SIGSPATIAL International Workshop on Computational Transportation Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3423457.3429361\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 13th ACM SIGSPATIAL International Workshop on Computational Transportation Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3423457.3429361","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9
Abstract
VAE-Info-cGAN: generating synthetic images by combining pixel-level and feature-level geospatial conditional inputs
Training robust supervised deep learning models for many geospatial applications of computer vision is difficult due to the dearth of class-balanced and diverse training data. Moreover, obtaining enough training data for many applications is financially prohibitive or may be infeasible, especially when the application involves modeling rare or extreme events. Synthetically generating data (and labels) using a generative model that can sample from a target distribution and exploit the multi-scale nature of images can be an inexpensive solution to address the scarcity of labeled data. Towards this goal, we present a deep conditional generative model, called VAE-Info-cGAN, that combines a Variational Autoencoder (VAE) with a conditional Information Maximizing Generative Adversarial Network (InfoGAN) for synthesizing semantically rich images simultaneously conditioned on a pixel-level condition (PLC) and a macroscopic feature-level condition (FLC). Dimensionally, the PLC can differ from the synthesized image only in the channel dimension and is meant to be a task-specific input. The FLC is modeled as an attribute vector, a, in the latent space of the generated image, which controls the contributions of the various characteristic attributes germane to the target distribution. During training, a is learned directly from the ground truth, while during generation it is sampled from U[0, 1]. An interpretation of a that systematically generates synthetic images by varying a chosen binary macroscopic feature is explored by training a linear binary classifier in the latent space. Experiments on a GPS trajectory dataset show that the proposed model can accurately generate various forms of spatio-temporal aggregates across different geographic locations while conditioned only on a raster representation of the road network.
The primary intended application of the VAE-Info-cGAN is synthetic data (and label) generation for targeted data augmentation for computer vision-based modeling of problems relevant to geospatial analysis and remote sensing.
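The abstract does not specify how the two conditioning inputs enter the generator; a common pattern for dual conditioning of this kind is to concatenate the pixel-level condition channel-wise with a spatial noise tensor, and to tile the attribute vector into constant planes over the image grid. The sketch below illustrates that pattern only; all shapes, channel counts, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_flc(k, rng):
    """Sample a feature-level condition (attribute vector) a ~ U[0, 1]^k,
    mirroring how the abstract says a is sampled at generation time."""
    return rng.uniform(0.0, 1.0, size=k)

def build_generator_input(plc, z, a):
    """Combine the pixel-level condition (PLC) with noise channel-wise.

    plc: (C_plc, H, W) raster condition, e.g. a road-network raster.
    z:   (C_z, H, W) spatial noise on the same grid.
    a:   (k,) attribute vector, tiled to (k, H, W) so every spatial
         location sees the same macroscopic condition.
    Per the abstract, the PLC may differ from the synthesized image
    only in the channel dimension, so H and W must match.
    """
    _, h, w = plc.shape
    a_planes = np.broadcast_to(a[:, None, None], (a.shape[0], h, w))
    return np.concatenate([plc, z, a_planes], axis=0)

# Hypothetical shapes for illustration only.
plc = rng.random((1, 64, 64))   # single-channel road-network raster
z = rng.random((8, 64, 64))     # spatial noise
a = sample_flc(4, rng)          # 4 macroscopic attributes

x = build_generator_input(plc, z, a)
print(x.shape)  # (13, 64, 64): 1 PLC + 8 noise + 4 attribute planes
```

The channel-wise tiling keeps the PLC spatially aligned with the output, which is what lets the model condition on a raster road network while the attribute planes carry location-independent, macroscopic information.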
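The abstract also mentions interpreting a by training a linear binary classifier in the latent space. One standard way to use such a classifier is to traverse a latent code along the unit normal of its decision hyperplane, which flips the chosen binary feature. The toy sketch below demonstrates that idea on synthetic data; the 4-dimensional attribute space, labels, and step size are all invented for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy latent codes: a hypothetical 4-dim attribute space where dimension 0
# correlates with a binary macroscopic feature (label 1 iff a[0] > 0.5).
A = rng.uniform(0.0, 1.0, size=(200, 4))
y = (A[:, 0] > 0.5).astype(float)

# Fit a linear (logistic) classifier in the latent space by gradient descent.
w = np.zeros(4)
b = 0.0
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(A @ w + b)))   # sigmoid predictions
    w -= lr * (A.T @ (p - y)) / len(y)       # gradient of logistic loss
    b -= lr * (p - y).mean()

# The unit normal of the decision hyperplane is a direction along which
# moving a latent code varies the chosen binary feature.
n = w / np.linalg.norm(w)

a_off = np.array([0.2, 0.5, 0.5, 0.5])        # feature "off"
a_on = np.clip(a_off + 0.6 * n, 0.0, 1.0)     # traverse toward feature "on"

p_off = 1.0 / (1.0 + np.exp(-(a_off @ w + b)))
p_on = 1.0 / (1.0 + np.exp(-(a_on @ w + b)))
```

Feeding the traversed vector a_on back through the generator would then, under this interpretation, yield a synthetic image with the macroscopic feature toggled on while other attributes are largely preserved.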