An efficient pedestrian attributes recognition system under challenging conditions

Machine Graphics and Vision. Pub Date: 2023-08-21. DOI: 10.22630/mgv.2023.32.2.1

Ha X. Nguyen, Dong N. Hoang, Tuan A. Tran, Tuan M. Dang

Citations: 0

An efficient pedestrian attributes recognition system under challenging conditions

This work introduces an efficient pedestrian attribute recognition system. The system is built on a novel processing pipeline that combines the best-performing attribute extraction model with an efficient attribute filtering algorithm based on human pose keypoints. The attribute extraction models are developed from several state-of-the-art deep networks (ResNet50, Swin Transformer, and ConvNeXt) via transfer learning: pre-trained models of these networks are fine-tuned on the Ensemble Pedestrian Attribute Recognition (EPAR) dataset. Several optimization techniques, including the AdamW optimizer (Adam with decoupled weight decay regularization), Random Erasing (RE), and weighted loss functions, are adopted to address data imbalance and challenging conditions such as partially visible or occluded bodies. Experiments are performed on EPAR, which contains 26,993 images of 1,477 person IDs, most of them captured in challenging conditions. The results show that ConvNeXt-v2-B outperforms the other networks: its mean accuracy (mA) reaches 85.57%, and its other metrics are also the highest. Adding AdamW or RE improves accuracy by 1-2%. The new loss functions mitigate the data imbalance, improving the accuracy of under-represented attributes by up to 14% in the best case. Most notably, when the attribute filtering algorithm is applied, the results improve dramatically and mA reaches 94.85%. Combining a state-of-the-art attribute extraction model and these optimization techniques with a large, diverse dataset and attribute filtering is therefore a promising approach with high potential for practical applications.
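To make the reported metric and the idea of a weighted loss concrete, the following is a minimal sketch in plain Python. It assumes the label-based definition of mean accuracy (mA) commonly used in pedestrian attribute recognition: for each attribute, average the recall on positives and the recall on negatives, then average over attributes. The weighting scheme in `weighted_bce`, which uses exp(1 - r) for positives and exp(r) for negatives with r the positive ratio of the attribute, is one common choice and is an assumption here; the abstract does not state which weighted loss the authors use. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_accuracy(y_true: Sequence[Sequence[int]],
                  y_pred: Sequence[Sequence[int]]) -> float:
    """Label-based mA: per attribute, average positive and negative recall."""
    num_attrs = len(y_true[0])
    per_attr = []
    for j in range(num_attrs):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 0)
        pos = sum(1 for t in y_true if t[j] == 1)
        neg = len(y_true) - pos
        # Convention chosen for this sketch: an empty class contributes 0.
        pos_recall = tp / pos if pos else 0.0
        neg_recall = tn / neg if neg else 0.0
        per_attr.append(0.5 * (pos_recall + neg_recall))
    return sum(per_attr) / num_attrs


def positive_ratios(y_true: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of samples in which each attribute is present."""
    n = len(y_true)
    return [sum(row[j] for row in y_true) / n for j in range(len(y_true[0]))]


def weighted_bce(y_true: Sequence[Sequence[int]],
                 probs: Sequence[Sequence[float]],
                 ratios: Sequence[float],
                 eps: float = 1e-7) -> float:
    """Binary cross-entropy with per-attribute weights (assumed scheme)."""
    total = 0.0
    for t_row, p_row in zip(y_true, probs):
        for t, p, r in zip(t_row, p_row, ratios):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            # Rare positives (small r) receive the larger weight exp(1 - r);
            # negatives of the same attribute receive exp(r).
            w = math.exp(1.0 - r) if t == 1 else math.exp(r)
            total -= w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)  # average over samples


if __name__ == "__main__":
    labels = [[1, 0], [1, 0], [0, 1], [0, 0]]
    preds = [[1, 0], [0, 0], [0, 1], [0, 1]]
    print("mA:", mean_accuracy(labels, preds))
    print("positive ratios:", positive_ratios(labels))
```

Because mA averages the two recalls per attribute, a model that always predicts the majority class scores only 0.5 on that attribute, which is why the metric suits imbalanced attribute data. In a deep learning framework the same weights would typically be applied inside the training loss rather than computed in a Python loop.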

Source journal: Machine Graphics and Vision (Computer Science: Computer Graphics and Computer-Aided Design)

CiteScore: 0.40

Self-citation rate: 0.00%

About the journal: Machine GRAPHICS & VISION (MGV) is a refereed international journal, published quarterly. It provides a forum for scientific exchange and an authoritative source of information on pictorial information exchange between computers and their environment in general, including applications of visual and graphical computer systems. The journal concentrates on theoretical and computational models underlying computer-generated, analysed, or otherwise processed imagery, in particular:
- image processing
- scene analysis, modeling, and understanding
- machine vision
- pattern matching and pattern recognition
- image synthesis, including three-dimensional imaging and solid modeling