Attention in Crowd Counting Using the Transformer and Density Map to Improve Counting Result
P. Do. 2021 8th NAFOSTED Conference on Information and Computer Science (NICS), published 2021-12-21. DOI: 10.1109/NICS54270.2021.9701500
With the rapid development of convolutional neural networks (CNNs), most crowd counting methods use a CNN to estimate a density map and then infer the count from it. However, these methods are limited by restricted receptive fields, background noise, and other factors. The Transformer, which emerged in natural language processing, can also be applied to crowd counting. Because the Transformer models global context, it helps address the problem of limited receptive fields. In addition, its attention mechanism lets the model focus on regions where people are concentrated, which helps suppress background noise. In this paper, we propose a crowd counting model combining a Transformer and a Density map (TDCrowd) to estimate the number of people in a crowd. Because it uses a Transformer, TDCrowd can still be trained without information about the locations of people in the crowd, using only the count. Experiments on three datasets, ShanghaiTech, UCF-QNRF, and JHU-Crowd++, show that TDCrowd outperforms both regression-based methods (which need only the count) and density-map-based methods (which need both the count and the locations of people).
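The abstract relies on three mechanisms: a density map whose sum gives the count, count-only (weak) supervision, and attention that gives every position a global view of the image. The sketch below illustrates each one in plain Python under stated assumptions. It is not TDCrowd's architecture. The fixed-width Gaussian kernel and its `sigma`, the L1 count loss, and the single-head attention with identity query/key/value projections are all simplifying assumptions; the abstract specifies none of them. All function names are hypothetical.

```python
import math

def gaussian_density_map(points, height, width, sigma=1.5):
    """Hypothetical helper: build a density map from head annotations (row, col).

    Assumption: a fixed-width Gaussian per person, normalized over the grid so
    each person contributes exactly 1 and the map sums to the number of people.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for pr, pc in points:
        kernel = [[math.exp(-((r - pr) ** 2 + (c - pc) ** 2) / (2 * sigma ** 2))
                   for c in range(width)] for r in range(height)]
        total = sum(sum(row) for row in kernel)
        for r in range(height):
            for c in range(width):
                dmap[r][c] += kernel[r][c] / total
    return dmap

def count_from_density(dmap):
    """Density-map methods infer the count by summing (integrating) the map."""
    return sum(sum(row) for row in dmap)

def count_l1_loss(pred_count, gt_count):
    """Count-only supervision (assumed L1 form): needs the true count, not head locations."""
    return abs(pred_count - gt_count)

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections.

    Each output token is a softmax-weighted mix of all input tokens, so its
    receptive field spans every patch of the image in a single layer.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

# Usage: three annotated heads on an 8x8 grid give a map that sums to 3.
dmap = gaussian_density_map([(2, 3), (5, 5), (6, 1)], 8, 8)
print(round(count_from_density(dmap), 6))  # 3.0
```

Normalizing each kernel over the discrete grid, rather than relying on the continuous Gaussian integrating to 1, keeps the count exact even for heads near the image border. In a count-only setting, only `count_l1_loss` would see a label. The density map would then be an internal prediction rather than a supervised target.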