Foreground-Driven Contrastive Learning for Unsupervised Human Keypoint Detection

IF 2.4 | Zone 4, Engineering Technology | Q3 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Photonics Journal Pub Date : 2025-03-07 DOI:10.1109/JPHOT.2025.3567754

Shuxian Li;Hui Luo;Zhengwei Miao;Zhixing Wang;Qiliang Bao;Jianlin Zhang

IEEE Photonics Journal, Volume 17, Issue 3, Pages 1-14. Full text: https://ieeexplore.ieee.org/document/10989733/

Citations: 0

Abstract


Foreground-Driven Contrastive Learning for Unsupervised Human Keypoint Detection

Human keypoint detection has significant value in computer vision tasks such as human-machine interaction. Recently, unsupervised human keypoint detection has become prevalent due to concerns about data privacy. Most existing methods are based on a reconstruction process that extracts appearance and pose information from transformed image pairs and spatially aligns them to obtain a reconstructed image for detection. However, these methods suffer from an issue because they reconstruct the entire image, which can easily lead to some keypoints being assigned to the background region. In this work, we believe that focusing on independent reconstruction and detection of the foreground region can mitigate the above issue. To this end, we propose a novel unsupervised human keypoint detection scheme to achieve reliable detection, which focuses on reconstructing and detecting keypoints in the foreground. Specifically, we first use a segmentor to separate the foreground and background of the image, for reconstruction and detection to be done only on the foreground region. Considering that keypoints vary due to changes in appearance and pose, we then introduce the contrastive loss to expand the feature space and enhance the network's robustness. Depending on the insertion position of the segmentor, we differentiate the proposed scheme into two versions: the effective version and the efficient version. Experimental results on popular datasets show that the proposed method exhibits superior performance. Specifically, on the BBC Pose dataset, the effective version achieves a $\mathbf{7.0\%}$ performance improvement. The efficient version leads to a $\mathbf{5.7\%}$ performance enhancement without sacrificing the inference speed.
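The two ingredients named in the abstract, reconstruction restricted to the foreground and a contrastive loss, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the loss forms, so the masked mean squared error and the InfoNCE-style contrastive loss below are assumptions, the foreground mask is assumed to come from an external segmentor, and the function names `masked_reconstruction_loss` and `info_nce_loss` are hypothetical.

```python
import numpy as np


def masked_reconstruction_loss(image, reconstruction, fg_mask, eps=1e-8):
    """Mean squared error averaged over foreground pixels only (assumed loss form).

    image, reconstruction: float arrays of shape (H, W, C)
    fg_mask: array of shape (H, W), 1 for foreground and 0 for background
    """
    mask = fg_mask[..., None].astype(image.dtype)   # broadcast mask over channels
    sq_err = (image - reconstruction) ** 2 * mask   # background errors are zeroed
    n_fg = mask.sum() * image.shape[-1]             # number of foreground values
    return sq_err.sum() / (n_fg + eps)


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumption; the abstract only says
    "contrastive loss"). Row i of `positives` is the positive for row i of
    `anchors`; all other rows in the batch act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                         # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((8, 8, 3))
    mask = np.zeros((8, 8))
    mask[2:6, 2:6] = 1                    # hypothetical segmentor output
    recon = img.copy()
    recon[mask == 0] += 1.0               # errors only in the background
    print(masked_reconstruction_loss(img, recon, mask))
    feats = rng.normal(size=(4, 16))
    print(info_nce_loss(feats, feats + 0.01 * rng.normal(size=(4, 16))))
```

Because the error is multiplied by the mask before averaging, a reconstruction that is wrong only in the background incurs no loss, which is the behaviour the abstract motivates: the network has no incentive to spend keypoints on explaining background pixels.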


Source journal

IEEE Photonics Journal ENGINEERING, ELECTRICAL & ELECTRONIC-OPTICS

CiteScore: 4.50

Self-citation rate: 8.30%

Articles published: 489

Review time: 1.4 months

About the journal: Breakthroughs in the generation of light and in its control and utilization have given rise to the field of Photonics, a rapidly expanding area of science and technology with major technological and economic impact. Photonics integrates quantum electronics and optics to accelerate progress in the generation of novel photon sources and in their utilization in emerging applications at the micro and nano scales spanning from the far-infrared/THz to the x-ray region of the electromagnetic spectrum. IEEE Photonics Journal is an online-only journal dedicated to the rapid disclosure of top-quality peer-reviewed research at the forefront of all areas of photonics. Contributions addressing issues ranging from fundamental understanding to emerging technologies and applications are within the scope of the Journal. The Journal includes topics in: Photon sources from far infrared to X-rays, Photonics materials and engineered photonic structures, Integrated optics and optoelectronics, Ultrafast, attosecond, high field and short wavelength photonics, Biophotonics, including DNA photonics, Nanophotonics, Magnetophotonics, Fundamentals of light propagation and interaction; nonlinear effects, Optical data storage, Fiber optics and optical communications devices, systems, and technologies, Micro Opto Electro Mechanical Systems (MOEMS), Microwave photonics, Optical Sensors.