Person Attribute Recognition using Hybrid Transformers for Surveillance Scenarios
S. Abhilash, Venu Madhav Nookala
2022 International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), published 2022-10-14
DOI: 10.1109/DISCOVER55800.2022.9974664 (https://doi.org/10.1109/DISCOVER55800.2022.9974664)
Citations: 1
Abstract
Person attribute recognition has become an active research topic and has drawn extensive attention in video surveillance. Localizing the image regions that correspond to a person's attributes is an important and challenging task. Existing methods rely primarily on convolutional neural networks to localize the regions related to each attribute. In this paper, we adopt a Co-scale Conv-Attentional Image Transformer (CoaT) to identify the most discriminative attributes and regions at multiple levels. The model is built from serial and parallel blocks. Each serial block consists of a conv-attention module followed by a feed-forward network. The parallel blocks use two strategies: attention with feature interpolation and direct cross-layer attention. Our results indicate that hybrid transformers, which combine convolution with attention, outperform pure transformers. Extensive experiments on four person attribute datasets (RAPv2, RAPv1, PETA, and PA100K) show that the proposed hybrid method outperforms existing methods.
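The following minimal NumPy sketch makes the serial and parallel building blocks more concrete. It is written under stated assumptions and is not the authors' implementation.

The conv-attention follows the published CoaT design as we understand it. It uses factorized attention: softmax(K) is taken over tokens, so a C x C context replaces the N x N attention map. It adds a convolutional relative position encoding, Q ⊙ DepthwiseConv(V).

Everything else is assumed for illustration:
- a single attention head with random weights;
- layer norm without learnable parameters;
- ReLU instead of GELU in the feed-forward network;
- nearest-neighbour resizing in place of smoother (e.g. bilinear) interpolation for cross-scale fusion;
- a mean-pooled sigmoid head for multi-label attribute prediction.

All function names and sizes are hypothetical. The direct cross-layer attention strategy is not sketched.

```python
import numpy as np

# Hypothetical helper names; an illustrative sketch, not the paper's code.


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable softmax along one axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Per-token normalization without learnable scale/shift (simplification).
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def depthwise_conv3x3(feat: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    # feat: (H, W, C), kernels: (3, 3, C); zero padding keeps H and W unchanged.
    h, w, _ = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(feat)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernels[dy, dx]
    return out


def conv_attention(x: np.ndarray, h: int, w: int, p: dict) -> np.ndarray:
    # x: (N, C) tokens laid out row-major on an h x w grid, N = h * w.
    n, c = x.shape
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    # Factorized attention: softmax(K) over tokens gives a (C, C) context, so cost is linear in N.
    factor_att = (q / np.sqrt(c)) @ (softmax(k, axis=0).T @ v)
    # Convolutional relative position encoding: Q multiplied elementwise by DepthwiseConv(V).
    crpe = q * depthwise_conv3x3(v.reshape(h, w, c), p["crpe"]).reshape(n, c)
    return (factor_att + crpe) @ p["wo"]


def serial_block(x: np.ndarray, h: int, w: int, p: dict) -> np.ndarray:
    n, c = x.shape
    # Convolutional position encoding: depthwise conv on the token map, added residually.
    x = x + depthwise_conv3x3(x.reshape(h, w, c), p["cpe"]).reshape(n, c)
    x = x + conv_attention(layer_norm(x), h, w, p)
    # Feed-forward network (ReLU here for brevity; an assumption).
    hidden = np.maximum(layer_norm(x) @ p["w1"], 0.0)
    return x + hidden @ p["w2"]


def resize_nearest(feat: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    # Nearest-neighbour resize of an (H, W, C) map to (out_h, out_w, C).
    h, w, _ = feat.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[rows][:, cols]


def fuse_scales(feats: list) -> list:
    # Cross-scale fusion: resize every scale to each target size and sum.
    return [sum(resize_nearest(f, t.shape[0], t.shape[1]) for f in feats) for t in feats]


def attribute_head(x: np.ndarray, w_cls: np.ndarray) -> np.ndarray:
    # Assumed multi-label output: mean-pool tokens, then one sigmoid per attribute.
    logits = x.mean(axis=0) @ w_cls
    return 1.0 / (1.0 + np.exp(-logits))


def init_params(c: int, hidden: int, rng: np.random.Generator) -> dict:
    # Small random weights; a trained model would learn these.
    shapes = {"wq": (c, c), "wk": (c, c), "wv": (c, c), "wo": (c, c),
              "crpe": (3, 3, c), "cpe": (3, 3, c), "w1": (c, hidden), "w2": (hidden, c)}
    return {name: rng.normal(0.0, 0.02, shape) for name, shape in shapes.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, num_attrs = 32, 26  # hypothetical channel width and attribute count
    fine = serial_block(rng.normal(size=(64, c)), 8, 8, init_params(c, 4 * c, rng))
    coarse = serial_block(rng.normal(size=(16, c)), 4, 4, init_params(c, 4 * c, rng))
    fused_fine, fused_coarse = fuse_scales([fine.reshape(8, 8, c), coarse.reshape(4, 4, c)])
    probs = attribute_head(fused_coarse.reshape(-1, c), rng.normal(0.0, 0.02, (c, num_attrs)))
    print(probs.shape)  # (26,)
```

Running the script prints (26,), one probability per hypothetical attribute. The sigmoid head treats attributes as independent binary labels. This is a common choice for pedestrian attribute datasets, but it is an assumption here, not a detail given in the abstract.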