Hierarchical language description knowledge base for LLM-based human pose estimation

IF 5.5; CAS Zone 2 (Computer Science); JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Wenjie Chen, Xuemei Xie
Neurocomputing, Volume 642, Article 130336. Journal article, published 2025-05-13.
DOI: 10.1016/j.neucom.2025.130336
https://www.sciencedirect.com/science/article/pii/S0925231225010082
Citations: 0

Abstract

Language plays an important role in human communication and knowledge representation, and human cognition involves numerous top-down and bottom-up processes that rely on different levels of knowledge guidance. To align with human cognition, hierarchical language descriptions can serve as different levels of knowledge guidance in human pose estimation, which existing studies lack. We propose HLanD-Pose, a Hierarchical Language Description Knowledge Base for Human Pose Estimation using a Large Language Model (LLM). It describes human posture from the whole body down to its components and models the poses within a scene to construct a hierarchical knowledge base. When visual information activates the relevant knowledge, the matched hierarchical language description of the current human pose serves as a guide for the keypoint localization task. Leveraging the reasoning and language comprehension abilities of large language models, the method can effectively understand human poses in images, which helps it recognize and accurately locate the target keypoints. Experiments show strong performance on standard keypoint localization benchmarks. Moreover, the hierarchical language description and external knowledge base improve the model's understanding of the human body in scene-specific datasets and demonstrate strong generalization in cross-dataset keypoint localization.
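The abstract describes relevant knowledge being "activated by visual information" so that a matched hierarchical description can guide keypoint localization. Below is a minimal, hypothetical sketch of such a retrieval step. It assumes that each description has a precomputed embedding in the same space as an image feature (for example, from a CLIP-style encoder) and that matching is done per level by cosine similarity. The names `HierarchicalKB` and `build_prompt`, the level labels, and the toy vectors are illustrative assumptions. This is not the HLanD-Pose implementation, which the abstract does not detail.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One language description in the (hypothetical) knowledge base."""
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class HierarchicalKB:
    """Toy knowledge base: descriptions grouped by level (scene, whole body, part)."""
    levels: dict[str, list[Entry]] = field(default_factory=dict)

    def add(self, level: str, text: str, embedding: list[float]) -> None:
        self.levels.setdefault(level, []).append(Entry(text, embedding))

    def match(self, visual_feature: list[float]) -> dict[str, str]:
        """Pick the most similar description at each level for one visual feature."""
        return {
            level: max(entries, key=lambda e: cosine(visual_feature, e.embedding)).text
            for level, entries in self.levels.items()
            if entries
        }


def build_prompt(matches: dict[str, str], level_order: list[str]) -> str:
    """Assemble matched descriptions, coarse to fine, into a guidance prompt for an LLM."""
    lines = [f"{level}: {matches[level]}" for level in level_order if level in matches]
    return "Pose context:\n" + "\n".join(lines) + "\nLocate the target keypoints."


# Usage with toy 3-D embeddings (assumed values, for illustration only).
kb = HierarchicalKB()
kb.add("scene", "A person playing basketball on a court.", [1.0, 0.0, 0.0])
kb.add("scene", "A person sitting at a desk.", [0.0, 1.0, 0.0])
kb.add("whole_body", "Standing upright with one arm raised.", [0.9, 0.1, 0.0])
kb.add("whole_body", "Seated with the torso leaning forward.", [0.1, 0.9, 0.0])
kb.add("part", "Right elbow bent, right wrist above the head.", [0.8, 0.0, 0.2])
kb.add("part", "Knees bent at roughly ninety degrees.", [0.0, 0.8, 0.2])

matches = kb.match([0.95, 0.05, 0.1])
prompt = build_prompt(matches, ["scene", "whole_body", "part"])
print(prompt)
```

Ordering the levels from scene to whole body to part mirrors the abstract's "from the whole to the components" description. How HLanD-Pose actually fuses the retrieved text with visual features is not stated in the abstract.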
Source journal: Neurocomputing (Engineering & Technology; Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.