Transferring Pose and Augmenting Background Variation for Deep Human Image Parsing

Proceedings. Pacific Conference on Computer Graphics and Applications. Pub Date: 2017-01-01. DOI: 10.2312/PG.20171317

Takazumi Kikuchi, Yuki Endo, Yoshihiro Kanamori, Taisuke Hashimoto, J. Mitani

Volume: 24 1. Pages: 7-12. URL: https://doi.org/10.2312/PG.20171317

Citations: 0

Abstract

Human parsing is the fundamental task of estimating semantic parts in a human image, such as face, arm, leg, hat, and dress. Recent deep-learning-based methods have achieved significant improvements, but collecting training datasets with pixel-wise annotations is labor-intensive. In this paper, we propose two solutions to cope with limited datasets. First, to handle various poses, we incorporate a pose estimation network into an end-to-end human parsing network in order to transfer common features across the domains. The pose estimation network can be trained using rich datasets and feeds valuable features to the human parsing network. Second, to handle complicated backgrounds, we automatically increase the variation of background images by replacing the original backgrounds of human images with those obtained from large-scale scenery image datasets. While each of the two solutions is versatile and beneficial to human parsing, their combination yields further improvement.

CCS Concepts: Computing methodologies → Image segmentation; Image processing
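The background replacement described in the abstract can be illustrated with a small compositing routine. This is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes the pixel-wise annotation uses label 0 for background, that the scenery image has already been resized to the human image's size, and that a hard (non-blended) mask is acceptable. The function name `replace_background` and its parameters are hypothetical.

```python
import numpy as np


def replace_background(image, labels, scenery, background_label=0):
    """Paste the annotated person onto a new background image.

    image:   (H, W, 3) uint8 human image
    labels:  (H, W) integer part labels from the pixel-wise annotation
    scenery: (H, W, 3) uint8 scenery image, assumed pre-resized to (H, W)
    background_label: label value assumed to mark background pixels
    """
    if image.shape != scenery.shape or image.shape[:2] != labels.shape:
        raise ValueError("image, labels and scenery must share height and width")

    # Every pixel that is not background belongs to some human part.
    foreground = labels != background_label

    # Keep person pixels from the original image and take the rest from the
    # scenery image; the mask is broadcast over the color channels.
    return np.where(foreground[..., None], image, scenery)


if __name__ == "__main__":
    # Tiny synthetic example: a 2x2 image with two labeled person pixels.
    image = np.full((2, 2, 3), 200, dtype=np.uint8)
    labels = np.array([[0, 1], [2, 0]])
    scenery = np.full((2, 2, 3), 10, dtype=np.uint8)
    print(replace_background(image, labels, scenery)[..., 0])
```

Because only background pixels change, the original label map can be reused unmodified for each composited image, so one annotated human image can yield many training pairs without additional annotation.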
