{"title":"Joint height estimation and semantic labeling of monocular aerial images with CNNS","authors":"Shivangi Srivastava, M. Volpi, D. Tuia","doi":"10.1109/IGARSS.2017.8128167","DOIUrl":null,"url":null,"abstract":"We aim to jointly estimate height and semantically label monocular aerial images. These two tasks are traditionally addressed separately in remote sensing, despite their strong correlation. Therefore, a model learning both height and classes jointly seems advantageous and so, we propose a multitask Convolutional Neural Network (CNN) architecture with two losses: one performing semantic labeling, and another predicting normalized Digital Surface Model (nDSM) from the pixel values. Since the nDSM/height information is used only in the second loss, there is no need to have a nDSM map at test time, and the model can estimate height automatically on new images. We test our proposed method on a set of sub-decimeter resolution images and show that our model equals the performances of two separate models, but at the cost of a single one.","PeriodicalId":6466,"journal":{"name":"2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)","volume":"47 1","pages":"5173-5176"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"59","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IGARSS.2017.8128167","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 59
Abstract
We aim to jointly estimate height and semantically label monocular aerial images. These two tasks are traditionally addressed separately in remote sensing, despite their strong correlation. Therefore, a model learning both height and classes jointly seems advantageous and so, we propose a multitask Convolutional Neural Network (CNN) architecture with two losses: one performing semantic labeling, and another predicting normalized Digital Surface Model (nDSM) from the pixel values. Since the nDSM/height information is used only in the second loss, there is no need to have a nDSM map at test time, and the model can estimate height automatically on new images. We test our proposed method on a set of sub-decimeter resolution images and show that our model equals the performances of two separate models, but at the cost of a single one.