Scale-aware Gaussian mixture loss for crowd localization transformers

Impact factor: 3.0 | JCR: Q2 | Category: Computer Science, Information Systems

High-Confidence Computing | Pub Date: 2024-12-10 | DOI: 10.1016/j.hcc.2024.100296

Alabi Mehzabin Anisha, Sriram Chellappan

Volume 5, Issue 3, Article 100296. https://www.sciencedirect.com/science/article/pii/S2667295224000990

Abstract

A fundamental problem in crowd localization with computer vision stems from intrinsic scale shifts. Scale shifts occur when crowd density within an image is uneven and chaotic, a feature common in dense crowds. Crowd density is lower at locations nearer to the camera than at locations farther away. Consequently, the number of pixels representing a person changes significantly across locations in an image, depending on the camera’s position. Existing crowd localization methods do not handle scale shifts effectively, resulting in relatively poor performance on dense crowd images. In this paper, we explicitly address this challenge. Our method, called Gaussian Loss Transformers (GLT), directly incorporates scale variations in crowds by adapting the loss functions to handle them in the end-to-end training pipeline. To inform the model about the scale variations within the crowd, we use a Gaussian mixture model (GMM) to pre-process the ground truths into non-overlapping clusters. Each cluster's information is then used as a weighting factor when computing the localization loss for that cluster. Extensive experiments on state-of-the-art datasets and computer vision models show that our method improves localization performance on dense crowd images. We also analyze the effect of multiple parameters of our technique and report their impact on crowd localization performance.
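
The two steps named in the abstract (GMM clustering of the ground truths, then a per-cluster weight on the localization loss) can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not specify the clustering feature, the weighting formula, or the loss. Everything below is an assumption made for illustration: the nearest-neighbour distance between annotated points is used as a stand-in for person scale, a small 1-D EM routine stands in for a library GMM, the weight is the inverse of the cluster's mean scale (normalised to average 1 over all points), and the loss is an L1 distance between predictions that are assumed to be already matched one-to-one with the ground truth. All function names are hypothetical.

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each annotated point to its closest neighbour (needs >= 2 points).
    ASSUMPTION: used here as a proxy for person scale."""
    return [
        min(math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i)
        for i, (xi, yi) in enumerate(points)
    ]

def fit_gmm_1d(values, k=2, iters=100, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with EM; returns (component means, hard labels)."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: spread the means across the data range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    variances = [max(((hi - lo) / (2 * k)) ** 2, var_floor)] * k
    mix = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step, computed in the log domain to avoid underflow.
        resp = []
        for v in values:
            logp = [math.log(max(w, 1e-300)) - 0.5 * math.log(2 * math.pi * s)
                    - (v - m) ** 2 / (2 * s)
                    for w, m, s in zip(mix, means, variances)]
            top = max(logp)
            p = [math.exp(l - top) for l in logp]
            total = sum(p)
            resp.append([x / total for x in p])
        # M-step: update mixing weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            mix[j] = nj / len(values)
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            variances[j] = max(
                sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj,
                var_floor)
    # Hard assignment gives non-overlapping clusters.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, labels

def cluster_weights(means, labels):
    """ASSUMPTION: weight is inversely proportional to the cluster's mean scale,
    normalised so that the average weight over all points is 1."""
    raw = [1.0 / max(m, 1e-6) for m in means]
    norm = sum(raw[c] for c in labels) / len(labels)
    return [r / norm for r in raw]

def scale_weighted_l1(pred, gt, labels, weights):
    """Mean L1 localization error, each point scaled by its cluster weight.
    ASSUMPTION: pred[i] is already matched to gt[i]."""
    total = sum(weights[c] * (abs(px - gx) + abs(py - gy))
                for (px, py), (gx, gy), c in zip(pred, gt, labels))
    return total / len(gt)

# Toy image: four tightly packed (far) heads and three widely spaced (near) heads.
gt = [(0, 0), (5, 0), (10, 0), (15, 0), (0, 200), (40, 200), (80, 200)]
scales = nearest_neighbor_distances(gt)
means, labels = fit_gmm_1d(scales, k=2)
weights = cluster_weights(means, labels)
pred = [(x + 1.0, y) for x, y in gt]  # every prediction is off by one pixel
print(labels, weights, scale_weighted_l1(pred, gt, labels, weights))
```

Under these assumptions, a one-pixel error on a small, distant head contributes more to the loss than the same error on a large, nearby head. A library GMM and a weighting scheme taken from the paper itself would replace the hypothetical pieces in practice.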

Source journal

High-Confidence Computing

CiteScore

4.70

Self-citation rate

0.00%

Number of articles published