M. H. A. Pratama, Willy Anugrah Cahyadi, Fiky Yosef Suratman
{"title":"使用Pix2Pix进行基于图像的虚拟试穿的人类解析","authors":"M. H. A. Pratama, Willy Anugrah Cahyadi, Fiky Yosef Suratman","doi":"10.1109/IoTaIS56727.2022.9975927","DOIUrl":null,"url":null,"abstract":"Image-based virtual try-on is a method that can let people try on clothes virtually. One of the challenges in image-based virtual try-on is segmentation. The segmentation needed in the virtual try-on implementation is the one that can divide humans into several objects based on their body parts such as hair, face, neck, hands, upper body, and lower body. This type of segmentation is called human parsing. There are several human parsing methods and datasets that have achieved great results. Unfortunately, some limitations make the method unsuitable in an image-based virtual try-on model. We proposed human parsing using the Pix2Pix model with the VITON dataset. Our model yields an average accuracy of 89.76%, an average F1-score of 86.80%, and an average IoU of 76.79%. These satisfactory results allow our model to be used in upcoming image-based virtual try-on research.","PeriodicalId":138894,"journal":{"name":"2022 IEEE International Conference on Internet of Things and Intelligence Systems (IoTaIS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Human Parsing for Image-Based Virtual Try-On Using Pix2Pix\",\"authors\":\"M. H. A. Pratama, Willy Anugrah Cahyadi, Fiky Yosef Suratman\",\"doi\":\"10.1109/IoTaIS56727.2022.9975927\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Image-based virtual try-on is a method that can let people try on clothes virtually. One of the challenges in image-based virtual try-on is segmentation. The segmentation needed in the virtual try-on implementation is the one that can divide humans into several objects based on their body parts such as hair, face, neck, hands, upper body, and lower body. This type of segmentation is called human parsing. There are several human parsing methods and datasets that have achieved great results. Unfortunately, some limitations make the method unsuitable in an image-based virtual try-on model. We proposed human parsing using the Pix2Pix model with the VITON dataset. Our model yields an average accuracy of 89.76%, an average F1-score of 86.80%, and an average IoU of 76.79%. 
These satisfactory results allow our model to be used in upcoming image-based virtual try-on research.\",\"PeriodicalId\":138894,\"journal\":{\"name\":\"2022 IEEE International Conference on Internet of Things and Intelligence Systems (IoTaIS)\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Internet of Things and Intelligence Systems (IoTaIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IoTaIS56727.2022.9975927\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Internet of Things and Intelligence Systems (IoTaIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IoTaIS56727.2022.9975927","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Human Parsing for Image-Based Virtual Try-On Using Pix2Pix
Image-based virtual try-on is a method that lets people try on clothes virtually. One of the challenges in image-based virtual try-on is segmentation: the implementation requires a segmentation that divides a person into several regions according to body parts, such as hair, face, neck, hands, upper body, and lower body. This type of segmentation is called human parsing. Several human parsing methods and datasets have achieved strong results; unfortunately, certain limitations make those methods unsuitable for an image-based virtual try-on model. We propose human parsing using the Pix2Pix model with the VITON dataset. Our model yields an average accuracy of 89.76%, an average F1-score of 86.80%, and an average IoU of 76.79%. These satisfactory results allow our model to be used in upcoming image-based virtual try-on research.
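The reported figures are standard per-class segmentation metrics. As a minimal sketch (not taken from the paper), pixel accuracy, F1-score, and IoU can be computed from predicted and ground-truth parsing maps as below; the class count, label IDs, and map shapes are illustrative assumptions rather than the authors' exact evaluation setup.

```python
# Minimal sketch (illustrative, not the paper's code): mean per-class
# pixel accuracy, F1-score, and IoU for integer-labeled parsing maps.
import numpy as np

def parsing_metrics(pred, gt, num_classes):
    """pred, gt: integer label maps of shape (H, W)."""
    accs, f1s, ious = [], [], []
    for c in range(num_classes):
        p = (pred == c)
        g = (gt == c)
        if not p.any() and not g.any():
            continue  # class absent from both maps; skip it
        tp = np.logical_and(p, g).sum()
        fp = np.logical_and(p, ~g).sum()
        fn = np.logical_and(~p, g).sum()
        tn = np.logical_and(~p, ~g).sum()
        accs.append((tp + tn) / (tp + tn + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
        ious.append(tp / (tp + fp + fn))
    return np.mean(accs), np.mean(f1s), np.mean(ious)

# Usage with random 256x192 maps and 7 hypothetical body-part classes
rng = np.random.default_rng(0)
pred = rng.integers(0, 7, size=(256, 192))
gt = rng.integers(0, 7, size=(256, 192))
acc, f1, iou = parsing_metrics(pred, gt, num_classes=7)
print(f"accuracy={acc:.4f}  F1={f1:.4f}  IoU={iou:.4f}")
```

Averaging over only the classes present in either map avoids undefined ratios for empty classes; other averaging conventions exist, and the paper does not specify which one it uses.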