Yao Gu, Zhe Zheng, Yingna Wu, Guangping Xie, Na Ni
Title: Progressive self-supervised learning: A pre-training method for crowd counting
Journal: Pattern Recognition Letters, Volume 188, Pages 148-154
Publication date: 2025-02-01 (Journal Article)
DOI: 10.1016/j.patrec.2024.12.007
URL: https://www.sciencedirect.com/science/article/pii/S0167865524003623
Impact factor: 3.9; JCR: Q2 (Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
Crowd counting has substantial social significance, and deep learning methods are increasingly seen as powerful tools for advancing the field. Many approaches have sought to improve model performance by transferring knowledge from ImageNet, using its classification weights to initialize models. However, these pre-training weights are suboptimal for crowd counting, which is a dense prediction task that differs significantly from image classification. To address this limitation, we introduce a progressive self-supervised learning approach designed to generate more suitable pre-training weights from a large collection of density-related images. We gathered 173k images using custom-designed prompts and implemented a two-stage learning process to refine the feature representations of image patches with similar densities. In the first stage, mutual information between overlapping patches within the same image is maximized. In the second stage, a combination of global and local losses is used to enhance feature similarity, with the local loss comparing patches of comparable density drawn from different images. This pre-training approach reduced the Mean Absolute Error (MAE) by 7.5%, 17.6%, and 28.7% on the ShanghaiTech Part A, ShanghaiTech Part B, and UCF_QNRF datasets, respectively. Furthermore, when these pre-training weights were used to initialize existing models, such as CSRNet for density map regression and DM-Count for point supervision, performance improved significantly.
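To make the first-stage objective concrete, the sketch below shows one common way to maximize mutual information between paired features: the InfoNCE contrastive loss, whose minimization raises a lower bound on mutual information (log N minus the loss, for N pairs). The abstract does not say which estimator the authors use, so InfoNCE, the function names, and the temperature value are all assumptions made for illustration, not the paper's implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a feature vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def info_nce(anchors: Sequence[Vector],
             positives: Sequence[Vector],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over N feature pairs.

    Assumption (not from the paper): anchors[i] and positives[i] are features
    of two overlapping patches cut from image i, and the positives from the
    other images in the batch serve as negatives.
    """
    if not anchors or len(anchors) != len(positives):
        raise ValueError("need the same non-zero number of anchors and positives")
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]

    total = 0.0
    for i, anchor in enumerate(a):
        # Similarity of this anchor to every candidate, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching patch (index i) as the correct class.
        total += log_denominator - logits[i]
    return total / len(a)


if __name__ == "__main__":
    # Toy 2-D features: matched pairs point the same way, mismatched pairs do not.
    feats = [[1.0, 0.0], [0.0, 1.0]]
    print("matched   :", info_nce(feats, feats))
    print("mismatched:", info_nce(feats, feats[::-1]))
```

The loss is small when each patch feature is closest to its overlapping partner and large otherwise, which is the behaviour a mutual-information objective between overlapping patches would reward. In practice the features would come from the network being pre-trained, and the loss would be computed in an automatic-differentiation framework rather than in plain Python.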
About the journal:
Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.