Progressive self-supervised learning: A pre-training method for crowd counting

IF 3.9; Zone 3, Computer Science; JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters; Pub Date: 2025-02-01; DOI: 10.1016/j.patrec.2024.12.007

Yao Gu, Zhe Zheng, Yingna Wu, Guangping Xie, Na Ni

Pattern Recognition Letters, Volume 188, Pages 148-154; publisher page: https://www.sciencedirect.com/science/article/pii/S0167865524003623

Citations: 0

Abstract

Crowd counting has substantial social significance, and deep learning methods are increasingly seen as powerful tools for advancing the field. Many approaches have sought to improve model performance by transferring knowledge from ImageNet, using its classification weights to initialize models. However, these pre-training weights are suboptimal for crowd counting, a dense prediction task that differs significantly from image classification. To address this limitation, we introduce a progressive self-supervised learning approach that generates more suitable pre-training weights from a large collection of density-related images. We gathered 173k images using custom-designed prompts and implemented a two-stage learning process that refines the feature representations of image patches with similar densities. In the first stage, the mutual information between overlapping patches within the same image is maximized. In the second stage, a combination of global and local losses is used to enhance feature similarity, where the local loss compares patches from different images with comparable densities. Our pre-training approach reduced the Mean Absolute Error (MAE) by 7.5%, 17.6%, and 28.7% on the ShanghaiTech Part A, ShanghaiTech Part B, and UCF_QNRF datasets, respectively. Furthermore, when these pre-training weights were used to initialize existing models, such as CSRNet for density map regression and DM-Count for point supervision, their performance improved significantly.
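The abstract states the stage-1 goal (maximizing mutual information between overlapping patches of the same image) but not the estimator used. A common choice for this is the InfoNCE contrastive loss, and the minimal sketch below assumes it, together with a cosine-similarity critic and a temperature of 0.1; none of these choices are confirmed by the abstract. The `stage2_loss` function and its `local_weight` parameter are hypothetical illustrations of "a combination of global and local losses", and `mae` shows the evaluation metric the results are reported in. Plain Python lists stand in for feature tensors so the example stays self-contained.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE loss (assumed estimator, not confirmed by the paper).

    anchors[i] and positives[i] form a positive pair, e.g. features of two
    overlapping crops of the same image; every other positives[j] in the
    batch acts as a negative for anchors[i]. Minimizing this loss maximizes
    a lower bound on the mutual information between the paired views.
    """
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


def stage2_loss(global_a: Sequence[Vector], global_b: Sequence[Vector],
                local_a: Sequence[Vector], local_b: Sequence[Vector],
                local_weight: float = 1.0, temperature: float = 0.1) -> float:
    """Hypothetical stage-2 objective: a global contrastive term plus a
    weighted local term whose positive pairs are patches from *different*
    images with comparable density."""
    return (info_nce(global_a, global_b, temperature)
            + local_weight * info_nce(local_a, local_b, temperature))


def mae(predicted: Sequence[float], ground_truth: Sequence[float]) -> float:
    """Mean Absolute Error between predicted and ground-truth crowd counts."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need equally sized, non-empty sequences")
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    positives = [[0.9, 0.1], [0.1, 0.9]]
    print(info_nce(anchors, positives))  # near 0: matched pairs are most similar
    print(mae([10, 20], [12, 17]))       # 2.5
```

With matched pairs the loss approaches zero, and if every feature is identical it equals log(batch size), the point where the model cannot tell positives from negatives. In a real pre-training pipeline the lists would be replaced by batched tensors and the loop by a single cross-entropy call over a similarity matrix.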

Source Journal

Pattern Recognition Letters (Engineering & Technology; Computer Science: Artificial Intelligence)

CiteScore: 12.40
Self-citation rate: 5.90%
Articles published: 287
Review time: 9.1 months

Journal description: Pattern Recognition Letters aims at rapid publication of concise articles of a broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.