Scale-aware Gaussian mixture loss for crowd localization transformers
Authors: Alabi Mehzabin Anisha, Sriram Chellappan
Journal: High-Confidence Computing, volume 5, issue 3, Article 100296 (impact factor 3.0; JCR Q2, Computer Science, Information Systems)
Published: 2024-12-10
DOI: 10.1016/j.hcc.2024.100296
URL: https://www.sciencedirect.com/science/article/pii/S2667295224000990
Citations: 0
Abstract
A fundamental problem in crowd localization with computer vision stems from intrinsic scale shifts. Scale shifts occur when crowd density within an image is uneven and chaotic, a common feature of dense crowds. Crowd density is lower at locations near the camera than at locations farther away. Consequently, the number of pixels representing a person changes significantly across an image, depending on the camera’s position. Existing crowd localization methods do not handle scale shifts effectively, which results in relatively poor performance on dense crowd images. In this paper, we explicitly address this challenge. Our method, called Gaussian Loss Transformers (GLT), directly incorporates scale variation in crowds by adapting the loss functions to handle it in the end-to-end training pipeline. To inform the model about scale variation within the crowd, we use a Gaussian mixture model (GMM) to pre-process the ground truths into non-overlapping clusters. Each cluster's information is then used as a weighting factor when computing the localization loss for that cluster. Extensive experiments on state-of-the-art datasets and computer vision models show that our method improves localization performance on dense crowd images. We also analyze the effect of multiple parameters of our technique and report their impact on crowd localization performance.
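The two steps named in the abstract (GMM clustering of the ground-truth points, then a per-cluster weight on the localization loss) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state how the weight is derived from the cluster, so the inverse-spread weighting in `cluster_weights` is an assumption made here for illustration only. The isotropic covariance, the hand-picked initial means, the L1 point distance standing in for the localization loss, and the index-aligned predictions (matching assumed already done) are likewise hypothetical choices, as are all function names.

```python
import math


def fit_spherical_gmm(points, init_means, n_iter=30):
    """EM for a 2-D Gaussian mixture with one isotropic variance per component.

    Returns hard cluster labels (non-overlapping clusters) and the variances.
    """
    n, k = len(points), len(init_means)
    means = [tuple(m) for m in init_means]
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    # Start every component at the overall isotropic variance of the data.
    var0 = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points) / (2 * n)
    variances = [max(var0, 1e-6)] * k
    priors = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x, y in points:
            logp = []
            for j in range(k):
                d2 = (x - means[j][0]) ** 2 + (y - means[j][1]) ** 2
                logp.append(math.log(priors[j])
                            - math.log(2 * math.pi * variances[j])
                            - d2 / (2 * variances[j]))
            top = max(logp)  # subtract the max for numerical stability
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate mean, variance, and prior of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            var = sum(r[j] * ((p[0] - mx) ** 2 + (p[1] - my) ** 2)
                      for r, p in zip(resp, points)) / (2 * nj)
            means[j] = (mx, my)
            variances[j] = max(var, 1e-6)
            priors[j] = max(nj / n, 1e-12)
    # Hard assignment: each point goes to its most responsible component.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, variances


def cluster_weights(variances):
    """ASSUMED weighting: inverse standard deviation, rescaled to average 1.

    Tighter (denser, smaller-scale) clusters receive larger weights.
    """
    inv = [1.0 / math.sqrt(v) for v in variances]
    mean_inv = sum(inv) / len(inv)
    return [w / mean_inv for w in inv]


def weighted_l1_loss(preds, targets, labels, weights):
    """Mean L1 point distance, each term scaled by its cluster weight."""
    total = 0.0
    for (px, py), (tx, ty), c in zip(preds, targets, labels):
        total += weights[c] * (abs(px - tx) + abs(py - ty))
    return total / len(targets)


# Toy ground truth: a dense far-away group and a sparse near-camera group.
far = [(10 + 2 * i, 10 + 2 * j) for i in range(4) for j in range(4)]
near = [(100 + 20 * i, 200 + 20 * j) for i in range(3) for j in range(2)]
gt = far + near

labels, variances = fit_spherical_gmm(gt, init_means=[gt[0], gt[-1]])
weights = cluster_weights(variances)
preds = [(x + 1.0, y) for x, y in gt]  # every prediction is off by 1 px in x
print(labels)
print(weights)
print(weighted_l1_loss(preds, gt, labels, weights))
```

In this toy setup the same one-pixel error costs more in the dense, far-away cluster than in the sparse, near-camera one, which is the intuition behind a scale-aware loss. The actual weighting rule, covariance model, and loss terms used by GLT should be taken from the paper itself.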