ROM-Pose: restoring occluded mask image for 2D human pose estimation
Authors: Yunju Lee, Jihie Kim
Journal: PeerJ Computer Science, volume 11, e2843
Published: 2025-05-02
DOI: 10.7717/peerj-cs.2843 (https://doi.org/10.7717/peerj-cs.2843)
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12192664/pdf/
Citations: 0
Abstract
Human pose estimation (HPE) estimates human poses by detecting key points in images. HPE methods include top-down and bottom-up approaches. The top-down approach is a two-stage process that first locates each person with a bounding box and then detects key points within that box, whereas the bottom-up approach detects individual key points directly and groups them to estimate the overall pose. In this article, we address inaccurate bounding box detection in the top-down method under occlusion. The detected bounding boxes serve as input for the HPE model and therefore affect the accuracy of pose estimation. Occlusion occurs when part of the target's body is obscured by another person or an object, and it hinders the model's ability to detect complete bounding boxes. Consequently, the bounding boxes omit the occluded parts, which are then excluded from the input to the HPE model. To mitigate this issue, we introduce Restoring Occluded Mask Image for 2D Human Pose Estimation (ROM-Pose), which comprises a restoration model and an HPE model. The restoration model is designed to delineate the boundary between the target's grayscale mask (occluded image) and the blocker's grayscale mask (occludee image) using the specially created Whole Common Objects in Context (COCO) dataset. After identifying the boundary, the restoration model restores the occluded image. The restored image is then overlaid onto the RGB image for use in the HPE model. Because the input now carries information about the occluded parts, the bounding box includes these areas during detection, which improves the HPE model's ability to recognize them. ROM-Pose achieved a 1.6% improvement in average precision (AP) compared to the baseline.
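The overlay step and its effect on the bounding box can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the restored mask is blended into the RGB image, so the alpha blend below is an assumption, and the restored mask is hand-written here in place of the output of a learned restoration model. The names `mask_bbox`, `overlay_mask`, and `alpha` are hypothetical.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]             # grayscale mask, 0 = background
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]          # RGB image as rows of (r, g, b)
Box = Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max), inclusive


def mask_bbox(mask: Mask) -> Optional[Box]:
    """Tightest box around all non-zero mask pixels, or None if the mask is empty."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v > 0]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def overlay_mask(image: Image, mask: Mask, alpha: float = 0.5) -> Image:
    """Alpha-blend the grayscale mask into the image wherever the mask is non-zero.

    Assumption: a plain alpha blend stands in for the paper's overlay step.
    """
    blended = []
    for img_row, mask_row in zip(image, mask):
        new_row = []
        for pixel, m in zip(img_row, mask_row):
            if m > 0:
                pixel = tuple(round((1 - alpha) * c + alpha * m) for c in pixel)
            new_row.append(pixel)
        blended.append(new_row)
    return blended


# Toy 4x4 scene: the lower half of the target is hidden by a blocker.
visible_mask = [
    [0, 200, 200, 0],
    [0, 200, 200, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hand-written stand-in for the restoration model's output (full target mask).
restored_mask = [[0, 200, 200, 0] for _ in range(4)]
image = [[(100, 100, 100)] * 4 for _ in range(4)]

box_before = mask_bbox(visible_mask)    # covers only the visible rows
box_after = mask_bbox(restored_mask)    # also covers the restored rows
model_input = overlay_mask(image, restored_mask)
print(box_before, box_after)
```

In this toy case the box derived from the restored mask extends over the rows that were hidden, which mirrors the abstract's claim that adding information about the occluded parts to the input lets the bounding box include them.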
About the journal:
PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.