Towards Accurate Crowd Counting Via Smoothed Dilated Convolutions and Transformer
Xin Zeng, Huake Wang, Gaoyi Zhu, Yunpeng Wu
2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI), 2023-05-26
DOI: 10.1109/CCAI57533.2023.10201260 (https://doi.org/10.1109/CCAI57533.2023.10201260)
Density-based methods have shown promising results on crowd counting. Many existing methods extract multi-scale features with dilated convolutions, but dilated convolutions are prone to gridding artifacts. In this work, we propose to remove the gridding artifacts with a smoothed dilated residual block (SDRB). The smoothed dilation technique adds separable and shared convolutions that introduce dependencies among feature maps. Moreover, we present a residual contextual transformer block (RCTB) for multi-scale feature generation. The RCTB enables the localization and recognition of people at the pixel level. Extensive experiments corroborate the prediction accuracy and generalization capability of the model, which achieves superior performance on three public real-world benchmarks: JHU-CROWD++, ShanghaiTech, and FDST.
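The gridding problem and the effect of a smoothing step can be illustrated with a minimal one-dimensional sketch in plain Python. It is not the authors' SDRB. The single-channel setting, the fixed averaging weights, the smoothing window of 2r - 1 taps for dilation rate r, and all function names are assumptions made for illustration; the block described in the abstract is presumably two-dimensional and learned, with the same smoothing kernel shared across channels.

```python
def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation)."""
    center = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def smooth(x, dilation):
    """Hypothetical smoothing step: average over a (2 * dilation - 1)-tap window.

    In a multi-channel setting the same kernel would be applied to every
    channel independently (separable and shared). Fixed weights are an
    assumption here; a real block would learn them.
    """
    size = 2 * dilation - 1
    return dilated_conv1d(x, [1.0 / size] * size, dilation=1)


def receptive_inputs(fn, n, position):
    """Input indices that influence fn(x)[position], probed with unit impulses."""
    hits = []
    for i in range(n):
        impulse = [0.0] * n
        impulse[i] = 1.0
        if fn(impulse)[position] != 0.0:
            hits.append(i)
    return hits


n, r = 11, 2
kernel = [1.0, 1.0, 1.0]


def plain(x):
    return dilated_conv1d(x, kernel, r)


def smoothed(x):
    return dilated_conv1d(smooth(x, r), kernel, r)


print(receptive_inputs(plain, n, 5))     # [3, 5, 7]
print(receptive_inputs(smoothed, n, 5))  # [2, 3, 4, 5, 6, 7, 8]
```

With dilation rate 2, the plain dilated convolution at position 5 reads only inputs 3, 5, and 7 and skips their neighbours, which is the source of the gridding pattern. After the smoothing step, the same output depends on every input from 2 through 8, so adjacent positions share information again.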