SUMediPose: A 2D-3D pose estimation dataset

Chris-Mari Schreuder, Oloff Bergh, Lizé Steyn, Rensu P. Theart

Data in Brief (Q3, Multidisciplinary Sciences), Volume 60, Article 111579, published 22 April 2025. DOI: 10.1016/j.dib.2025.111579

Abstract

Biomechanical movement analysis is crucial in medical and sports contexts, yet the technology remains expensive and inaccessible to many. Recent advancements in machine learning and computer vision, particularly in Pose Estimation (PE), offer promising alternatives. PE models detect key points on the human body to estimate its pose in either 2D or 3D space, enabling markerless motion capture. This approach facilitates more natural and flexible movement tracking without the need for physical markers. However, markerless systems generally lack the accuracy of marker-based methods and require extensive annotated training data, whose keypoint annotations are often anatomically inaccurate. Additionally, current 3D pose estimation techniques face practical challenges, including complex hardware setups, intricate camera calibrations, and a shortage of reliable ground truth 2D-3D datasets.
To address these challenges, we introduce a multimodal dataset comprising 3,444 recordings, 2,896,943 image frames, and 3,804,413 corresponding 3D and 2D marker-based motion capture keypoint coordinates. The dataset includes 28 participants performing eight strength and conditioning actions at three different speeds, with full image and keypoint data available for 26 participants, while two participants have only keypoint data without accompanying image data. Video and image data were captured using a custom-developed multi-RGB-camera system, while the marker-based 3D data was acquired using the Vicon system and subsequently projected into each camera’s internal coordinate system, represented in both 3D space and 2D image space. The multi-RGB-camera system consists of six cameras arranged in a circular formation around the subject, offering a full 360° view of the scene from the same height and resulting in a diverse set of viewing angles. The recording setup was designed to allow both capture systems to record participants' movements simultaneously, synchronizing the data to provide ground truth 3D data, which was then back-projected to generate 2D-pixel keypoint data for each corresponding image frame. This design enables the dataset to support both 2D and 3D pose estimation tasks. To ensure anatomical accuracy, a professional placed an extensive array of markers on each participant, adhering to industry standards.
The dataset also includes all intrinsic and extrinsic camera parameters, as well as origin axis data, necessary for performing any 3D or 2D projections. This allows the dataset to be adjusted and tailored to meet specific research or application needs.
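As a rough illustration of how the provided intrinsic and extrinsic camera parameters can be used to reproduce 2D pixel keypoints from the 3D marker data, the sketch below implements a standard pinhole projection in Python/NumPy. The function name, variable names, distortion model, and example values are placeholders for illustration only; they are not the dataset's actual file format or calibration convention.

```python
import numpy as np

def project_points(points_3d, R, t, K, dist=None):
    """Project Nx3 world-frame keypoints into 2D pixel coordinates.

    points_3d : (N, 3) array of 3D keypoints in the world/origin frame.
    R, t      : extrinsic rotation (3x3) and translation (3,) mapping
                world coordinates into the camera frame.
    K         : intrinsic matrix (3x3) with focal lengths and principal point.
    dist      : optional radial distortion coefficients (k1, k2); ignored if None.
    """
    # Transform into the camera coordinate frame: X_cam = R @ X_world + t
    pts_cam = points_3d @ R.T + t

    # Perspective division onto the normalized image plane
    x = pts_cam[:, 0] / pts_cam[:, 2]
    y = pts_cam[:, 1] / pts_cam[:, 2]

    # Optional radial distortion (simple two-coefficient model, an assumption)
    if dist is not None:
        k1, k2 = dist
        r2 = x**2 + y**2
        scale = 1 + k1 * r2 + k2 * r2**2
        x, y = x * scale, y * scale

    # Apply intrinsics to obtain pixel coordinates (u, v)
    u = K[0, 0] * x + K[0, 2]
    v = K[1, 1] * y + K[1, 2]
    return np.stack([u, v], axis=1)


# Example usage with placeholder calibration values (units and conventions
# in the released parameter files may differ).
K = np.array([[1400.0, 0.0, 960.0],
              [0.0, 1400.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 3000.0])  # e.g. camera ~3 m from the origin, in mm
keypoints_3d = np.array([[100.0, -250.0, 50.0],
                         [0.0, 0.0, 0.0]])
print(project_points(keypoints_3d, R, t, K))
```

When full distortion coefficients are available, OpenCV's cv2.projectPoints performs the equivalent operation and may be preferable in practice.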

