R. Schnürer, A. Cengiz Öztireli, M. Heitzler, R. Sieber, L. Hurni
{"title":"图形地图中人物的实例分割、身体部位解析与姿态估计","authors":"R. Schnürer, A. Cengiz Öztireli, M. Heitzler, R. Sieber, L. Hurni","doi":"10.1080/23729333.2021.1949087","DOIUrl":null,"url":null,"abstract":"ABSTRACT In recent years, convolutional neural networks (CNNs) have been applied successfully to recognise persons, their body parts and pose keypoints in photos and videos. The transfer of these techniques to artificially created images is rather unexplored, though challenging since these images are drawn in different styles, body proportions, and levels of abstraction. In this work, we study these problems on the basis of pictorial maps where we identify included human figures with two consecutive CNNs: We first segment individual figures with Mask R-CNN, and then parse their body parts and estimate their poses simultaneously with four different UNet++ versions. We train the CNNs with a mixture of real persons and synthetic figures and compare the results with manually annotated test datasets consisting of pictorial figures. By varying the training datasets and the CNN configurations, we were able to improve the original Mask R-CNN model and we achieved moderately satisfying results with the UNet++ versions. The extracted figures may be used for animation and storytelling and may be relevant for the analysis of historic and contemporary maps.","PeriodicalId":36401,"journal":{"name":"International Journal of Cartography","volume":"111 3S 1","pages":"291 - 307"},"PeriodicalIF":0.4000,"publicationDate":"2021-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Instance Segmentation, Body Part Parsing, and Pose Estimation of Human Figures in Pictorial Maps\",\"authors\":\"R. Schnürer, A. Cengiz Öztireli, M. Heitzler, R. Sieber, L. Hurni\",\"doi\":\"10.1080/23729333.2021.1949087\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ABSTRACT In recent years, convolutional neural networks (CNNs) have been applied successfully to recognise persons, their body parts and pose keypoints in photos and videos. The transfer of these techniques to artificially created images is rather unexplored, though challenging since these images are drawn in different styles, body proportions, and levels of abstraction. In this work, we study these problems on the basis of pictorial maps where we identify included human figures with two consecutive CNNs: We first segment individual figures with Mask R-CNN, and then parse their body parts and estimate their poses simultaneously with four different UNet++ versions. We train the CNNs with a mixture of real persons and synthetic figures and compare the results with manually annotated test datasets consisting of pictorial figures. By varying the training datasets and the CNN configurations, we were able to improve the original Mask R-CNN model and we achieved moderately satisfying results with the UNet++ versions. The extracted figures may be used for animation and storytelling and may be relevant for the analysis of historic and contemporary maps.\",\"PeriodicalId\":36401,\"journal\":{\"name\":\"International Journal of Cartography\",\"volume\":\"111 3S 1\",\"pages\":\"291 - 307\"},\"PeriodicalIF\":0.4000,\"publicationDate\":\"2021-08-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Cartography\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1080/23729333.2021.1949087\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Cartography","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/23729333.2021.1949087","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Instance Segmentation, Body Part Parsing, and Pose Estimation of Human Figures in Pictorial Maps
ABSTRACT In recent years, convolutional neural networks (CNNs) have been applied successfully to recognise persons, their body parts and pose keypoints in photos and videos. The transfer of these techniques to artificially created images is rather unexplored, though challenging since these images are drawn in different styles, body proportions, and levels of abstraction. In this work, we study these problems on the basis of pictorial maps where we identify included human figures with two consecutive CNNs: We first segment individual figures with Mask R-CNN, and then parse their body parts and estimate their poses simultaneously with four different UNet++ versions. We train the CNNs with a mixture of real persons and synthetic figures and compare the results with manually annotated test datasets consisting of pictorial figures. By varying the training datasets and the CNN configurations, we were able to improve the original Mask R-CNN model and we achieved moderately satisfying results with the UNet++ versions. The extracted figures may be used for animation and storytelling and may be relevant for the analysis of historic and contemporary maps.