A neighbor-aware feature enhancement network for crowd counting

IF 4.2 · Zone 3 (Computer Science) · JCR Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing · Pub Date: 2025-05-03 · DOI: 10.1016/j.imavis.2025.105578

Lin Wang, Jie Li, Chun Qi, Xuan Wu, Runrun Zou, Fengping Wang, Pan Wang

Volume 159, Article 105578

Citations: 0

Abstract





Deep neural networks have made significant progress in crowd counting in recent years. However, many networks still struggle to represent crowd features effectively because they do not sufficiently exploit inter-channel and inter-pixel relationships. To overcome these limitations, we propose the Neighbor-Aware Feature Enhancement Network (NAFENet), a novel architecture that strengthens feature representation by fully leveraging both channel and pixel dependencies. Specifically, we introduce two modules to model channel dependencies: the Across Channel Attention Module (ACAM) and the Channel Residual Module (CRM). ACAM computes a relevance map that quantifies the influence of adjacent channels on the current channel and uses it to extract useful information that enriches the feature representation. CRM, in turn, learns residual maps between adjacent channels to capture both their correlations and their differences, giving the network a deeper understanding of the image content. In addition, we embed a Spatial Correlation Module (SCM) in NAFENet that models long-range dependencies between pixels in neighboring rows, so that long continuous structures can be analyzed more effectively. Experimental results on six challenging datasets show that the proposed method performs strongly against state-of-the-art models. A complexity analysis further shows that our model is more efficient, requiring less time and fewer computational resources than the compared approaches.
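The abstract describes ACAM, CRM and SCM only at a high level. The Python sketch below is one hypothetical reading of their data flow, not the authors' NAFENet code. Assumptions: the feature map is a plain nested list `feat[channel][row][col]`; no weights are learned; ACAM's relevance map is taken to be a sigmoid of pixel-wise products with the two adjacent channels; CRM's learned residuals are replaced by raw adjacent-channel differences; and SCM is approximated by dot-product attention from each pixel to every pixel in the rows directly above and below it. All function names are illustrative.

```python
import math
from typing import List

# Hypothetical layout: feat[channel][row][col], a plain-list stand-in for a C x H x W tensor.
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def across_channel_attention(feat: FeatureMap) -> FeatureMap:
    """ACAM-style step (assumption): a per-pixel relevance map between each channel
    and its adjacent channels weights how much neighbour information is added."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    out = []
    for c in range(C):
        neighbours = [n for n in (c - 1, c + 1) if 0 <= n < C]
        chan = []
        for i in range(H):
            row = []
            for j in range(W):
                x = feat[c][i][j]
                # Relevance of neighbour n at (i, j): sigmoid of the pixel-wise product.
                extra = sum(sigmoid(x * feat[n][i][j]) * feat[n][i][j] for n in neighbours)
                row.append(x + (extra / len(neighbours) if neighbours else 0.0))
            chan.append(row)
        out.append(chan)
    return out


def channel_residuals(feat: FeatureMap) -> FeatureMap:
    """CRM-style step (assumption): residual maps feat[c + 1] - feat[c] between
    adjacent channels. The paper's CRM learns these; here they are computed directly."""
    return [
        [[nxt - cur for cur, nxt in zip(row_c, row_n)] for row_c, row_n in zip(feat[c], feat[c + 1])]
        for c in range(len(feat) - 1)
    ]


def _pixel_vector(feat: FeatureMap, i: int, j: int) -> List[float]:
    # All channel values at pixel (i, j).
    return [chan[i][j] for chan in feat]


def spatial_correlation(feat: FeatureMap) -> FeatureMap:
    """SCM-style step (assumption): each pixel attends to every pixel in the rows
    directly above and below it (softmax over dot products of channel vectors),
    so a dependency can span the full width of the neighbouring rows."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        keys = [(r, k) for r in (i - 1, i + 1) if 0 <= r < H for k in range(W)]
        for j in range(W):
            q = _pixel_vector(feat, i, j)
            if not keys:  # a single-row map has no neighbouring rows
                for c in range(C):
                    out[c][i][j] = q[c]
                continue
            scores = [sum(a * b for a, b in zip(q, _pixel_vector(feat, r, k))) for r, k in keys]
            m = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(s - m) for s in scores]
            z = sum(weights)
            for c in range(C):
                context = sum(w * feat[c][r][k] for w, (r, k) in zip(weights, keys)) / z
                out[c][i][j] = q[c] + context  # residual connection
    return out


if __name__ == "__main__":
    # Toy 3-channel, 3 x 4 map with made-up values, only to show the output shapes.
    feat = [[[(c + 1) * (i + j) / 10 for j in range(4)] for i in range(3)] for c in range(3)]
    print(len(across_channel_attention(feat)), len(channel_residuals(feat)), len(spatial_correlation(feat)))  # 3 2 3
```

ACAM and SCM keep the C x H x W shape, while the residual step yields C - 1 maps, one per adjacent channel pair. A practical implementation would work on batched tensors with learned projections; the nested loops here only make the neighbourhood structure explicit.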


Source journal

Image and Vision Computing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.50

Self-citation rate: 8.50%

Articles published: 143

Review time: 7.8 months

Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.