Automatic Virtual 3D City Generation for Synthetic Data Collection
Bingyu Shen, Boyang Li, W. Scheirer
2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW)
DOI: 10.1109/WACVW52041.2021.00022
Citations: 4
Abstract
Computer vision has achieved strong results with the rapid development of new deep neural network techniques. Object detection in the wild is a core task in computer vision and already has many successful real-world applications. However, deep neural networks for object detection usually consist of hundreds, and sometimes even thousands, of layers. Training such networks is challenging, and training data has a fundamental impact on model performance. Because data collection and annotation are expensive and labor-intensive, many data augmentation methods have been proposed to generate synthetic data for neural network training. Most of these methods manipulate 2D images. In contrast, in this paper we leverage the realistic visual effects of 3D environments and propose a new way of generating synthetic data for computer vision tasks involving city scenes. Specifically, we describe a pipeline that generates a 3D city model from a 2D image depicting the layout of a city. The pipeline also accepts optional parameters to further customize the output 3D city model. Using our pipeline, a virtual 3D city model with high-quality textures can be generated within seconds, and the output is an object ready to render. The generated models can help people with limited 3D development experience create high-quality city scenes for different needs. As examples, we show the use of generated 3D city models as the synthetic data source for a scene text detection task and a traffic sign detection task. Both qualitative and quantitative results show that the generated virtual city is a good match to real-world data and can potentially benefit other computer vision tasks with similar contexts.
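The abstract does not specify the layout image format or how geometry is built from it. As a purely illustrative sketch of the general idea of lifting a 2D layout into a renderable 3D object, the code below assumes a character grid in place of a layout image ('#' marks a building cell), extrudes each building cell into a box with a randomly sampled height, and exports the result as a Wavefront OBJ string. The legend, the function names layout_to_buildings and to_obj, and the parameters cell_size, min_height, max_height and seed are hypothetical stand-ins for the pipeline's "optional parameters"; this is not the authors' method, which also applies high-quality textures.

```python
import random
from dataclasses import dataclass

# Hypothetical layout legend (an assumption, not the paper's input format):
# '#' marks a building footprint cell; any other character is road/open space.
BUILDING_CHAR = "#"


@dataclass
class Building:
    """One footprint cell extruded into an axis-aligned box (y is up)."""
    x0: float
    z0: float
    size: float
    height: float


def layout_to_buildings(layout, cell_size=10.0, min_height=10.0,
                        max_height=60.0, seed=None):
    """Turn a 2D layout grid into a list of extruded building boxes.

    cell_size, min_height, max_height and seed are hypothetical stand-ins
    for the pipeline's optional customization parameters.
    """
    rng = random.Random(seed)  # a fixed seed makes the generated city reproducible
    buildings = []
    for row, line in enumerate(layout):
        for col, ch in enumerate(line):
            if ch == BUILDING_CHAR:
                height = rng.uniform(min_height, max_height)
                buildings.append(Building(col * cell_size, row * cell_size,
                                          cell_size, height))
    return buildings


def to_obj(buildings):
    """Serialize the boxes as an untextured Wavefront OBJ string."""
    lines = []
    for i, b in enumerate(buildings):
        x0, x1 = b.x0, b.x0 + b.size
        z0, z1 = b.z0, b.z0 + b.size
        h = b.height
        # 8 corners: bottom ring (y = 0), then top ring (y = h).
        for x, y, z in [(x0, 0, z0), (x1, 0, z0), (x1, 0, z1), (x0, 0, z1),
                        (x0, h, z0), (x1, h, z0), (x1, h, z1), (x0, h, z1)]:
            lines.append(f"v {x} {y} {z}")
        o = 8 * i  # OBJ vertex indices are 1-based and global to the file
        # Six quads, wound counter-clockwise as seen from outside the box.
        for face in [(1, 2, 3, 4), (5, 8, 7, 6), (1, 5, 6, 2),
                     (4, 3, 7, 8), (1, 4, 8, 5), (2, 6, 7, 3)]:
            lines.append("f " + " ".join(str(o + k) for k in face))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    layout = ["##..#",
              ".....",
              "#..##"]
    city = layout_to_buildings(layout, seed=0)
    print(to_obj(city)[:80])
```

The resulting OBJ text can be written to a file and opened in a standard 3D viewer or renderer. Random heights under a fixed seed are one simple way to let a single layout yield many reproducible city variants; a real pipeline would likely also merge adjacent cells into larger footprints and attach textures.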