Evaluation metrics systematization for 2D human pose analysis models

S. Antoshchuk, Anastasiia A. Breskina
Journal: Herald of Advanced Information Technology
Published: 2023-04-10
DOI: https://doi.org/10.15276/hait.06.2023.2

Abstract

This paper describes the systematization of evaluation metrics for 2D human pose analysis models. Detecting, tracking, and recognizing human actions are among the most popular tasks solved with machine learning (ML) methods across a range of practical applications. Many different metrics exist, each evaluating a model from a particular point of view, and each specific task is evaluated with its own set of metrics. However, as the literature analysis shows, the vast number of metric definitions, together with the use of different terms and multiple representations of the same ideas, makes it difficult to interpret and compare ML models and methods for detecting, tracking, and recognizing human actions. The purpose of this work is to analyze the metrics used to evaluate methods for processing 2D human poses in video, in order to facilitate an informed choice of metrics. To improve the objectivity of evaluating empirical results for existing and newly developed methods and models for detecting, tracking, and recognizing human actions, the existing metrics are systematized into subgroups according to the task they evaluate. Four classes of evaluation metrics are introduced: classification metrics, keypoint detection metrics, object tracking metrics, and general metrics. Classification metrics are based on quality evaluation and on matching predicted bounding boxes against ground truths. Keypoint detection metrics focus on the quality of the detected joints of the human body skeleton. Tracking metrics evaluate object detection in each frame and the correctness of the determined trajectory. General metrics are not tied to any specific 2D human pose analysis task.
A prototype application based on the proposed systematization was developed; its purpose is to help data scientists formalize the choice of metrics for evaluating models, depending on the ML problem being solved and the application area. To evaluate and demonstrate the metrics suggested by this application, the Faster R-CNN, SSD, and YOLOv3 object detection models were analyzed and compared within the 2D human pose analysis application area. The analysis showed that Faster R-CNN and YOLOv3 give the most accurate responses, although both have the disadvantage of a high false positive rate. The implementation also showed that metrics based on true negative values are uninformative when working with bounding boxes, because of the specifics of the application area and the inability to calculate true negatives on image data.
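
To make the bounding-box case concrete, the following is a minimal sketch of how predicted boxes might be matched against ground truths and turned into classification metrics. It is not the authors' prototype: the intersection-over-union (IoU) criterion, the 0.5 threshold, the greedy matching order, and all function names are assumptions chosen for illustration.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max); this layout is an assumption.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box, clamped at zero for degenerate boxes."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def match_counts(preds: List[Box], truths: List[Box],
                 threshold: float = 0.5) -> Tuple[int, int, int]:
    """Greedily match predictions to ground truths; return (TP, FP, FN)."""
    unmatched = list(range(len(truths)))
    tp = 0
    for pred in preds:
        best_j, best_iou = None, threshold
        for j in unmatched:
            value = iou(pred, truths[j])
            if value >= best_iou:
                best_j, best_iou = j, value
        if best_j is not None:
            unmatched.remove(best_j)  # each ground truth is matched once
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(unmatched)    # ground truths that no prediction matched
    return tp, fp, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Metrics that need only TP, FP and FN (no true negatives)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(1, 1, 10, 10), (50, 50, 60, 60)]
    print(precision_recall_f1(*match_counts(preds, truths)))
```

The sketch returns only true positives, false positives, and false negatives. A true negative would be a region where no box was predicted and none exists, and an image has no finite set of such regions to count, which is consistent with the abstract's remark that metrics built on true negatives are uninformative for bounding boxes.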