Hierarchical language description knowledge base for LLM-based human pose estimation
Authors: Wenjie Chen, Xuemei Xie
Journal: Neurocomputing, vol. 642, Article 130336
Published: 2025-05-13
DOI: 10.1016/j.neucom.2025.130336
URL: https://www.sciencedirect.com/science/article/pii/S0925231225010082
Publication type: Journal Article
Impact factor: 5.5 (JCR Q1, Computer Science, Artificial Intelligence)
Open access: no
Citations: 0
Abstract
Language plays an important role in human communication and knowledge representation, and human cognition involves numerous top-down and bottom-up processes that rely on different levels of knowledge guidance. To align with human cognition, hierarchical language descriptions can serve as different levels of knowledge guidance in human pose estimation, an approach that existing studies lack. We propose HLanD-Pose, a Hierarchical Language Description Knowledge Base for Human Pose Estimation using a Large Language Model (LLM). It describes human posture from the whole to the components and models the poses within a scene to construct a hierarchical knowledge base. When visual information activates the relevant knowledge, the matched hierarchical language description of the current human pose guides the keypoint localization task. The reasoning and language comprehension abilities of large language models allow human poses in images to be understood effectively, which helps to recognize and accurately locate the target keypoints. Experiments show that our method performs strongly on standard keypoint localization benchmarks. Moreover, the hierarchical language description and external knowledge base improve the model’s ability to understand the human body in scene-specific datasets, and the method generalizes well in cross-dataset keypoint localization.
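The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: a knowledge base whose entries describe a pose from the whole body down to its components, a lookup that a visual feature "activates", and a coarse-to-fine text prompt built from the matched entry. Every name here (`PoseEntry`, `retrieve`, `build_prompt`), the cosine-similarity matching, the toy three-dimensional embeddings, and the prompt wording are assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class PoseEntry:
    """Hypothetical knowledge-base record: a pose described from whole to parts."""
    scene: str              # scene-level context
    whole_body: str         # whole-body description
    parts: dict[str, str]   # component-level descriptions, keyed by body part
    embedding: list[float]  # toy stand-in for a learned feature vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(kb: list[PoseEntry], visual_feature: list[float]) -> PoseEntry:
    """Return the entry whose embedding is most similar to the visual feature.

    Assumption: nearest-neighbour matching stands in for "knowledge activation".
    """
    return max(kb, key=lambda entry: cosine(entry.embedding, visual_feature))


def build_prompt(entry: PoseEntry, keypoint: str) -> str:
    """Assemble the matched description, coarse to fine, into an LLM text prompt."""
    lines = [f"Scene: {entry.scene}", f"Whole body: {entry.whole_body}"]
    lines += [f"{part}: {desc}" for part, desc in entry.parts.items()]
    lines.append(f"Question: where is the {keypoint}?")
    return "\n".join(lines)


# Two made-up entries; a real knowledge base would be built from annotated data.
kb = [
    PoseEntry("yoga studio", "standing on one leg with arms raised",
              {"arms": "both arms extended overhead",
               "legs": "left foot resting on right thigh"},
              [0.9, 0.1, 0.0]),
    PoseEntry("running track", "sprinting forward with torso leaning ahead",
              {"arms": "elbows bent and swinging",
               "legs": "right knee driven forward"},
              [0.1, 0.9, 0.2]),
]

match = retrieve(kb, [0.2, 0.8, 0.1])
prompt = build_prompt(match, "left wrist")
print(prompt)
```

In this sketch the prompt would be passed to a language model together with the image features; how HLanD-Pose actually couples the description with the LLM and decodes keypoint coordinates is not stated in the abstract.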
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.