{"title":"DTHN:双变压器头端到端人员搜索网络","authors":"Cheng Feng, Dezhi Han, Chongqing Chen","doi":"10.32604/cmc.2023.042765","DOIUrl":null,"url":null,"abstract":"Person search mainly consists of two submissions, namely Person Detection and Person Re-identification (re-ID). Existing approaches are primarily based on Faster R-CNN and Convolutional Neural Network (CNN) (e.g., ResNet). While these structures may detect high-quality bounding boxes, they seem to degrade the performance of re-ID. To address this issue, this paper proposes a Dual-Transformer Head Network (DTHN) for end-to-end person search, which contains two independent Transformer heads, a box head for detecting the bounding box and extracting efficient bounding box feature, and a re-ID head for capturing high-quality re-ID features for the re-ID task. Specifically, after the image goes through the ResNet backbone network to extract features, the Region Proposal Network (RPN) proposes possible bounding boxes. The box head then extracts more efficient features within these bounding boxes for detection. Following this, the re-ID head computes the occluded attention of the features in these bounding boxes and distinguishes them from other persons or backgrounds. Extensive experiments on two widely used benchmark datasets, CUHK-SYSU and PRW, achieve state-of-the-art performance levels, 94.9 mAP and 95.3 top-1 scores on the CUHK-SYSU dataset, and 51.6 mAP and 87.6 top-1 scores on the PRW dataset, which demonstrates the advantages of this paper’s approach. The efficiency comparison also shows our method is highly efficient in both time and space.","PeriodicalId":93535,"journal":{"name":"Computers, materials & continua","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DTHN: Dual-Transformer Head End-to-End Person Search Network\",\"authors\":\"Cheng Feng, Dezhi Han, Chongqing Chen\",\"doi\":\"10.32604/cmc.2023.042765\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Person search mainly consists of two submissions, namely Person Detection and Person Re-identification (re-ID). Existing approaches are primarily based on Faster R-CNN and Convolutional Neural Network (CNN) (e.g., ResNet). While these structures may detect high-quality bounding boxes, they seem to degrade the performance of re-ID. To address this issue, this paper proposes a Dual-Transformer Head Network (DTHN) for end-to-end person search, which contains two independent Transformer heads, a box head for detecting the bounding box and extracting efficient bounding box feature, and a re-ID head for capturing high-quality re-ID features for the re-ID task. Specifically, after the image goes through the ResNet backbone network to extract features, the Region Proposal Network (RPN) proposes possible bounding boxes. The box head then extracts more efficient features within these bounding boxes for detection. Following this, the re-ID head computes the occluded attention of the features in these bounding boxes and distinguishes them from other persons or backgrounds. Extensive experiments on two widely used benchmark datasets, CUHK-SYSU and PRW, achieve state-of-the-art performance levels, 94.9 mAP and 95.3 top-1 scores on the CUHK-SYSU dataset, and 51.6 mAP and 87.6 top-1 scores on the PRW dataset, which demonstrates the advantages of this paper’s approach. 
The efficiency comparison also shows our method is highly efficient in both time and space.\",\"PeriodicalId\":93535,\"journal\":{\"name\":\"Computers, materials & continua\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers, materials & continua\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.32604/cmc.2023.042765\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers, materials & continua","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.32604/cmc.2023.042765","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
DTHN: Dual-Transformer Head End-to-End Person Search Network
Person search mainly consists of two sub-tasks, namely person detection and person re-identification (re-ID). Existing approaches are primarily based on Faster R-CNN and Convolutional Neural Networks (CNNs) such as ResNet. While these structures can detect high-quality bounding boxes, they tend to degrade re-ID performance. To address this issue, this paper proposes a Dual-Transformer Head Network (DTHN) for end-to-end person search, which contains two independent Transformer heads: a box head for detecting the bounding box and extracting efficient bounding-box features, and a re-ID head for capturing high-quality re-ID features for the re-ID task. Specifically, after the ResNet backbone extracts features from the image, the Region Proposal Network (RPN) proposes candidate bounding boxes. The box head then extracts more efficient features within these bounding boxes for detection. Following this, the re-ID head computes occluded attention over the features within these bounding boxes and distinguishes the target person from other persons and the background. Extensive experiments on two widely used benchmark datasets, CUHK-SYSU and PRW, show that the proposed method achieves state-of-the-art performance: 94.9 mAP and 95.3 top-1 on CUHK-SYSU, and 51.6 mAP and 87.6 top-1 on PRW, which demonstrates the advantages of this paper's approach. The efficiency comparison also shows that our method is highly efficient in both time and space.
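To make the described pipeline concrete, below is a minimal, hypothetical PyTorch sketch of the dual-head idea: a shared set of RoI features is passed through two independent Transformer heads, one producing detection outputs and one producing an identity embedding. The layer sizes, the stubbed-out backbone/RPN (RoI-aligned features are assumed to be given), and the use of a plain self-attention encoder in place of the paper's occluded attention are all assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a dual-Transformer-head design (not the authors' code).
import torch
import torch.nn as nn


class DualTransformerHeads(nn.Module):
    def __init__(self, feat_dim=256, nhead=8, num_layers=2):
        super().__init__()
        # Box head: refines proposal features for person/background
        # classification and bounding-box regression.
        box_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=nhead, batch_first=True)
        self.box_head = nn.TransformerEncoder(box_layer, num_layers=num_layers)
        self.cls_fc = nn.Linear(feat_dim, 2)   # person vs. background logits
        self.reg_fc = nn.Linear(feat_dim, 4)   # box regression deltas
        # Re-ID head: an independent Transformer producing an identity
        # embedding (standard self-attention here, standing in for the
        # paper's occluded attention).
        reid_layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=nhead, batch_first=True)
        self.reid_head = nn.TransformerEncoder(reid_layer, num_layers=num_layers)
        self.embed_fc = nn.Linear(feat_dim, feat_dim)

    def forward(self, roi_feats):
        # roi_feats: (num_proposals, num_tokens, feat_dim), e.g. a flattened
        # 7x7 RoI-aligned map per RPN proposal from the ResNet backbone.
        box_tokens = self.box_head(roi_feats)
        pooled = box_tokens.mean(dim=1)                 # pool tokens per proposal
        cls_logits = self.cls_fc(pooled)
        box_deltas = self.reg_fc(pooled)

        reid_tokens = self.reid_head(roi_feats)
        embeddings = nn.functional.normalize(self.embed_fc(reid_tokens.mean(dim=1)), dim=-1)
        return cls_logits, box_deltas, embeddings


# Usage sketch: 8 proposals, each a 7x7 RoI map with 256 channels.
model = DualTransformerHeads()
rois = torch.randn(8, 49, 256)
cls_logits, box_deltas, embeddings = model(rois)
print(cls_logits.shape, box_deltas.shape, embeddings.shape)
```

The point of the two separate heads is that the detection branch and the re-ID branch do not have to share one feature representation, which is the tension the abstract attributes to Faster R-CNN-style designs.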