SPCANet: congested crowd counting via strip pooling combined attention network

IF 3.5 · Zone 4 (Computer Science) · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

PeerJ Computer Science Pub Date : 2024-09-18 DOI:10.7717/peerj-cs.2273

Zhongyuan Yuan

Citations: 0

Abstract

Crowd counting estimates the number and distribution of people in crowded places and is an important research direction in object counting. It is widely used in public place management, crowd behavior analysis, and similar scenarios, which demonstrates its practical value. Crowd-counting technology has developed rapidly in recent years. In highly crowded and noisy scenes, however, the accuracy of most models is still seriously degraded by perspective distortion, dense occlusion, and inconsistent crowd distribution. Perspective distortion causes people to appear at different sizes and shapes in the image, while dense occlusion and inconsistent crowd distribution leave parts of the crowd incompletely captured. As a result, the model captures spatial information imperfectly. To address these problems, we propose a strip pooling combined attention network (SPCANet) built on normed-deformable convolution (NDConv). Introducing strip pooling lets us model long-distance dependencies more efficiently. In contrast to traditional square-kernel pooling, strip pooling uses long, narrow kernels (1×N or N×1) to cope with dense crowds, mutual occlusion, and overlap. SPCANet also introduces efficient channel attention (ECA), a mechanism that learns channel attention through a local cross-channel interaction strategy. This module generates channel attention with a fast 1D convolution, reducing model complexity while improving performance as much as possible. In extensive experiments on four mainstream datasets, Shanghai Tech Part A, Shanghai Tech Part B, UCF-QNRF, and UCF CC 50, SPCANet reaches a mean absolute error (MAE) of 60.9, 7.3, 90.8, and 161.1, respectively, improving on the baseline and validating its effectiveness. Meanwhile, mean squared error (MSE) decreases by 5.7% on average over the four datasets, indicating greatly improved robustness.
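
As a rough illustration of the two mechanisms named in the abstract, the minimal Python sketch below shows strip pooling over a single 2D feature map and an ECA-style channel weighting. It is not the author's implementation. The function names `strip_pool` and `eca_weights`, the additive fusion of the two strips, and the fixed (rather than learned) 1D kernel are all assumptions made for illustration; the actual network would operate on multi-channel tensors with learned convolutions.

```python
import math


def strip_pool(feature):
    """Simplified strip pooling on one H x W map (assumed form, no learned layers).

    A 1xN strip averages each row; an Nx1 strip averages each column.
    Both strips are broadcast back to H x W and fused by addition.
    """
    h, w = len(feature), len(feature[0])
    row_means = [sum(row) / w for row in feature]
    col_means = [sum(feature[i][j] for i in range(h)) / h for j in range(w)]
    return [[row_means[i] + col_means[j] for j in range(w)] for i in range(h)]


def eca_weights(channel_means, kernel):
    """ECA-style channel weights from per-channel global averages.

    Applies a zero-padded 1D convolution across neighbouring channels
    (local cross-channel interaction), then a sigmoid. The kernel is
    assumed to have odd length; in a real model it would be learned.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for c in range(len(channel_means)):
        s = sum(kernel[t] * padded[c + t] for t in range(k))
        weights.append(1.0 / (1.0 + math.exp(-s)))
    return weights


# Hypothetical toy inputs, chosen only to make the arithmetic easy to follow.
feature = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0]]
pooled = strip_pool(feature)
# pooled == [[4.5, 5.5, 6.5], [7.5, 8.5, 9.5]]

# An identity kernel reduces the weights to a sigmoid of each channel mean.
weights = eca_weights([0.0, 1.0, 2.0], kernel=[0.0, 1.0, 0.0])
```

Each output position of `strip_pool` combines statistics from its entire row and column, which is how a long, narrow kernel reaches distant context that a small square kernel would miss. `eca_weights` touches only `k` neighbouring channels per output, which is the source of the low complexity the abstract attributes to the 1D convolution.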

Source journal

PeerJ Computer Science (Computer Science: General Computer Science)

CiteScore: 6.10

Self-citation rate: 5.30%

Articles published: 332

Review time: 10 weeks

About the journal: PeerJ Computer Science is an open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.