{"title":"Adaptive Multi-Task Learning for Multi-PAR in Real World","authors":"Haoyun Sun;Hongwei Zhao;Weishan Zhang;Liang Xu;Hongqing Guan","doi":"10.1109/JRFID.2024.3371881","DOIUrl":null,"url":null,"abstract":"Multi-pedestrian attribute recognition (Multi-PAR) is a vital task for smart city surveillance applications, which requires identifying various attributes of multiple pedestrians in a single image. However, most existing methods are limited by the complex backgrounds and the time-consuming pedestrian detection preprocessing work in real-world scenarios, and cannot achieve satisfactory accuracy and efficiency. In this paper, we present a novel end-to-end solution, named Adaptive Multi-Task Network (AMTN), which jointly performs multiple tasks and leverages an adaptive feature re-extraction (AFRE) module to optimize them. Specially, We integrate pedestrian detection into AMTN to perform PAR preprocessing, and incorporate a person re-identification (ReID) task branch to track pedestrians in video streams, thereby selecting the clearest video frames for analysis instead of every video frame to improve analysis efficiency and recognition accuracy. Moreover, we design a dynamic weight fitting loss (DWFL) function to prevent gradient explosions and balance tasks during training. We conduct extensive experiments to evaluate the accuracy and efficiency of our approach, and compare it with the state-of-the-art methods. The experimental results demonstrate that our method outperforms other state-of-the-art algorithms, achieving 1.5%-4.9% improvement in accuracy on Multi-PAR. The experiments also show that the AMTN can greatly improve the efficiency of preprocessing by saving the computation of feature extraction through basic features sharing. Compared with the state-of-the-art detection algorithm Yolov5s, it can improve the efficiency by 42%.","PeriodicalId":73291,"journal":{"name":"IEEE journal of radio frequency identification","volume":"8 ","pages":"357-366"},"PeriodicalIF":2.3000,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE journal of radio frequency identification","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10454582/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract
Multi-pedestrian attribute recognition (Multi-PAR) is a vital task for smart city surveillance applications, requiring the identification of various attributes of multiple pedestrians in a single image. However, most existing methods are limited by complex backgrounds and time-consuming pedestrian detection preprocessing in real-world scenarios, and cannot achieve satisfactory accuracy and efficiency. In this paper, we present a novel end-to-end solution, named Adaptive Multi-Task Network (AMTN), which jointly performs multiple tasks and leverages an adaptive feature re-extraction (AFRE) module to optimize them. Specifically, we integrate pedestrian detection into AMTN to perform PAR preprocessing, and incorporate a person re-identification (ReID) task branch to track pedestrians in video streams, selecting only the clearest video frames for analysis rather than every frame, which improves analysis efficiency and recognition accuracy. Moreover, we design a dynamic weight fitting loss (DWFL) function to prevent gradient explosion and balance the tasks during training. We conduct extensive experiments to evaluate the accuracy and efficiency of our approach and compare it with state-of-the-art methods. The experimental results demonstrate that our method outperforms other state-of-the-art algorithms, achieving a 1.5%-4.9% improvement in accuracy on Multi-PAR. The experiments also show that AMTN greatly improves preprocessing efficiency by sharing basic features across tasks, saving redundant feature-extraction computation. Compared with the state-of-the-art detection algorithm YOLOv5s, it improves efficiency by 42%.
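The abstract does not give the DWFL formulation, but dynamically weighted multi-task losses of this kind are commonly built from learnable per-task weights. The following is a minimal, hypothetical PyTorch sketch using homoscedastic uncertainty weighting (Kendall et al., 2018), not the paper's actual DWFL; the class name `DynamicWeightedLoss`, the `num_tasks` parameter, and the weighting scheme are all assumptions for illustration.

```python
import torch
import torch.nn as nn


class DynamicWeightedLoss(nn.Module):
    """Hypothetical dynamic multi-task weighting, NOT the paper's DWFL.

    Each task loss L_i is scaled by a learnable precision exp(-s_i),
    with s_i added back as a regularizer so the weights cannot
    collapse to zero. This keeps any one task (detection, PAR, ReID)
    from dominating the gradients of the shared backbone.
    """

    def __init__(self, num_tasks: int):
        super().__init__()
        # One learnable log-variance per task branch.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses: list[torch.Tensor]) -> torch.Tensor:
        total = torch.zeros((), device=task_losses[0].device)
        for s, loss in zip(self.log_vars, task_losses):
            # exp(-s) down-weights high-variance tasks; +s penalizes
            # driving all weights toward zero.
            total = total + torch.exp(-s) * loss + s
        return total


# Usage sketch: combine per-branch losses into one training objective.
criterion = DynamicWeightedLoss(num_tasks=3)
det_loss, par_loss, reid_loss = (torch.rand(()) for _ in range(3))
total_loss = criterion([det_loss, par_loss, reid_loss])
```

Under this kind of scheme the effective task weights are fitted jointly with the network, which is the balancing behavior the DWFL is described as providing during training.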