Self-supervised keypoint detection based on affine transformation
Na Ying, Xuewei Zhang, Miao Hu, Xinyu Lin, Kairui Yin, Jian Zhao
Journal of The Franklin Institute-engineering and Applied Mathematics, Volume 362, Issue 8, Article 107648, published 2025-04-11
DOI: 10.1016/j.jfranklin.2025.107648
Source: https://www.sciencedirect.com/science/article/pii/S0016003225001425
Journal metrics: impact factor 3.7, JCR Q2 (Automation & Control Systems)
Citations: 0
Abstract
Self-supervised learning has emerged as a powerful way to reduce the cost of labeling data for network training. A key challenge in self-supervised keypoint detection, however, is ensuring that the detected keypoints carry human-interpretable semantic meaning. This paper addresses that challenge with a novel self-supervised keypoint detection algorithm that generates semantically meaningful human keypoints while maintaining detection accuracy. Unlike existing self-supervised techniques, the proposed approach reformulates human keypoint detection as the affine transformation of a predefined keypoint template. Specifically, a semantically annotated human keypoint template is defined in advance, and an affine transformation matrix is learned from extracted human pose features. Applying this matrix to the template yields keypoints that are both accurate and semantically aligned with the corresponding human pose. In addition, a margin loss is introduced to stabilize the affine transformations across image scales and make performance more robust. Experiments on the Human3.6M and DeepFashion datasets show that the algorithm achieves an average detection error of 2.78 on Human3.6M, only 0.02 higher than the baseline method, AutoLink. On DeepFashion, it achieves a keypoint detection accuracy of 65%, 1 percentage point below AutoLink. Importantly, unlike other methods, the proposed algorithm guarantees that every generated keypoint is semantically interpretable, a significant advantage in human-centered applications.
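The central idea, producing keypoints by applying an affine matrix to a semantically labeled template, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration only: the template names and coordinates, the normalized coordinate frame, the function name `apply_affine`, and the hand-written 2×3 matrix. In the described method the matrix is learned from extracted pose features, which this sketch does not model. The abstract also does not say whether a single matrix covers the whole template or several are used (for example, one per body part); this sketch applies one global matrix.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]

# Hypothetical semantic template: labels and coordinates are illustrative,
# expressed in a normalized frame centered on the pelvis.
TEMPLATE: List[Tuple[str, Point]] = [
    ("head", (0.0, -0.8)),
    ("neck", (0.0, -0.6)),
    ("left_shoulder", (-0.2, -0.55)),
    ("right_shoulder", (0.2, -0.55)),
    ("pelvis", (0.0, 0.0)),
    ("left_ankle", (-0.15, 0.9)),
    ("right_ankle", (0.15, 0.9)),
]


def apply_affine(A: Affine, template: List[Tuple[str, Point]]) -> List[Tuple[str, Point]]:
    """Map each template point (x, y) to A @ [x, y, 1]^T.

    The semantic label travels with each point, so every output keypoint
    is named by construction.
    """
    (a11, a12, tx), (a21, a22, ty) = A
    return [
        (name, (a11 * x + a12 * y + tx, a21 * x + a22 * y + ty))
        for name, (x, y) in template
    ]


if __name__ == "__main__":
    # Hand-picked matrix (assumption): scale by 2, then translate to (64, 64).
    A: Affine = ((2.0, 0.0, 64.0), (0.0, 2.0, 64.0))
    for name, (u, v) in apply_affine(A, TEMPLATE):
        print(f"{name:>15}: ({u:.2f}, {v:.2f})")
```

Because each output point inherits its label from the template, the semantic meaning of every keypoint is fixed in advance rather than discovered after training. Note that one global 2×3 matrix can only scale, rotate, shear, and translate the template as a whole, so a sketch meant to follow articulated poses would need a richer parameterization than the one shown here.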
About the journal:
The Journal of The Franklin Institute has an established reputation for publishing high-quality papers in engineering and applied mathematics. Its current focus is on control systems, complex networks and dynamic systems, signal processing, and communications, as well as their applications. All submitted papers are peer-reviewed. The journal publishes substantive original research papers and research review papers. Papers and special focus issues are judged on their potential lasting value, which has been and continues to be the strength of the Journal of The Franklin Institute.