Ao Zhang, Xin Deng, Baoying Liu, Weiwei Zhang, Jun Guo, Linrui Xie
Journal of Electronic Imaging, Journal Article, published 2024-07-01. DOI: 10.1117/1.jei.33.4.043016 (https://doi.org/10.1117/1.jei.33.4.043016)
Scale separation: video crowd counting with different density maps
Most crowd counting methods predict the count by integrating a density map, but their performance degrades when crowd density varies. Existing methods primarily employ a multi-scale architecture to mitigate this issue. However, few approaches consider scale and temporal information together. We propose a scale-divided architecture for video crowd counting. First, density maps of different Gaussian scales are employed to retain information at various scales, accommodating scale changes in images. Second, we observe that the spatiotemporal network places greater emphasis on individual locations, which prompts us to aggregate temporal information at a specific scale. This design enables the temporal model to acquire more spatial information and alleviates occlusion issues. Experimental results on various public datasets demonstrate the superior performance of our proposed method.
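As an illustration of what "density maps of different Gaussian scales" can mean in practice, the following minimal sketch builds several density maps from the same point annotations, one per Gaussian standard deviation. It is not the authors' implementation: the function names `density_map` and `multi_scale_density_maps`, the sigma values, and the sample head positions are all assumptions made for the example, and the abstract does not state which kernel widths the paper uses. The sketch shows the two properties the abstract relies on: every map integrates to the same person count, while a narrower kernel keeps individual locations sharper.

```python
import numpy as np


def density_map(points, shape, sigma):
    """Place one unit-mass Gaussian per annotated head position (x, y)."""
    height, width = shape
    ys, xs = np.mgrid[0:height, 0:width]
    density = np.zeros(shape, dtype=np.float64)
    for px, py in points:
        blob = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2.0 * sigma ** 2))
        # Renormalise so each person contributes exactly 1 to the integral,
        # even when the kernel is truncated by the image border.
        density += blob / blob.sum()
    return density


def multi_scale_density_maps(points, shape, sigmas=(1.0, 2.0, 4.0)):
    """Hypothetical helper: one density map per Gaussian scale."""
    return {sigma: density_map(points, shape, sigma) for sigma in sigmas}


# Three made-up head annotations in a 48 x 64 frame (assumed values).
points = [(10.0, 12.0), (30.5, 8.0), (31.0, 9.0)]
maps = multi_scale_density_maps(points, (48, 64))

for sigma, d in maps.items():
    # The integral equals the number of people at every scale;
    # only the spatial sharpness (peak height) changes.
    print(f"sigma={sigma}: count={d.sum():.3f}, peak={d.max():.4f}")
```

A small sigma yields sharp peaks that localize individuals, which is the kind of map a location-focused temporal branch might consume, whereas a large sigma yields a smoother map that tolerates scale variation.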
About the journal:
The Journal of Electronic Imaging publishes peer-reviewed papers in all technology areas that make up the field of electronic imaging and are normally considered in the design, engineering, and applications of electronic imaging systems.