{"title":"现实世界目标分布估计的远程监督强化定位","authors":"Haojie Guo , Junyu Gao , Yuan Yuan","doi":"10.1016/j.patcog.2025.112385","DOIUrl":null,"url":null,"abstract":"<div><div>Predicting the distribution of objects in the real world from monocular images is a challenging task due to the disparity between object distributions in perspective images and reality. Many researchers focus on predicting object distributions by converting perspective images into Bird’s-Eye View (BEV) images. In scenarios where camera parameter information is unavailable, the prediction of vanishing lines becomes critical for performing inverse perspective transformations. However, accurately predicting vanishing lines necessitates accounting for variations in object size, which cannot be effectively captured through simple regression models. Therefore, this paper proposes a size variation-aware method, utilizing expert knowledge from object detection to build a reinforcement learning framework for predicting vanishing lines in traffic scenes. Specifically, this method leverages size information from trained detectors to convert perspective images into BEV images without the need for additional camera intrinsic parameters. First, we design a novel reward mechanism that utilizes prior knowledge of scale differences between similar objects in perspective images, allowing the network to automatically update and learn specific vanishing line positions. Second, we propose a fast inverse perspective transformation method, which accelerates the training speed of the proposed approach. To evaluate the effectiveness of the method, experiments are conducted on two traffic flow datasets. The experimental results demonstrate that the proposed algorithm accurately predicts vanishing line positions and successfully transforms perspective images into BEV images. Furthermore, the proposed algorithm performs competitively with directly supervised methods. The code is available at: <span><span>https://github.com/HotChieh/DDRL.</span><svg><path></path></svg></span></div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112385"},"PeriodicalIF":7.6000,"publicationDate":"2025-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Distantly supervised reinforcement localization for real-world object distribution estimation\",\"authors\":\"Haojie Guo , Junyu Gao , Yuan Yuan\",\"doi\":\"10.1016/j.patcog.2025.112385\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Predicting the distribution of objects in the real world from monocular images is a challenging task due to the disparity between object distributions in perspective images and reality. Many researchers focus on predicting object distributions by converting perspective images into Bird’s-Eye View (BEV) images. In scenarios where camera parameter information is unavailable, the prediction of vanishing lines becomes critical for performing inverse perspective transformations. However, accurately predicting vanishing lines necessitates accounting for variations in object size, which cannot be effectively captured through simple regression models. Therefore, this paper proposes a size variation-aware method, utilizing expert knowledge from object detection to build a reinforcement learning framework for predicting vanishing lines in traffic scenes. 
Specifically, this method leverages size information from trained detectors to convert perspective images into BEV images without the need for additional camera intrinsic parameters. First, we design a novel reward mechanism that utilizes prior knowledge of scale differences between similar objects in perspective images, allowing the network to automatically update and learn specific vanishing line positions. Second, we propose a fast inverse perspective transformation method, which accelerates the training speed of the proposed approach. To evaluate the effectiveness of the method, experiments are conducted on two traffic flow datasets. The experimental results demonstrate that the proposed algorithm accurately predicts vanishing line positions and successfully transforms perspective images into BEV images. Furthermore, the proposed algorithm performs competitively with directly supervised methods. The code is available at: <span><span>https://github.com/HotChieh/DDRL.</span><svg><path></path></svg></span></div></div>\",\"PeriodicalId\":49713,\"journal\":{\"name\":\"Pattern Recognition\",\"volume\":\"172 \",\"pages\":\"Article 112385\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-08-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0031320325010465\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325010465","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Distantly supervised reinforcement localization for real-world object distribution estimation
Predicting the distribution of objects in the real world from monocular images is a challenging task due to the disparity between object distributions in perspective images and reality. Many researchers focus on predicting object distributions by converting perspective images into Bird’s-Eye View (BEV) images. In scenarios where camera parameter information is unavailable, the prediction of vanishing lines becomes critical for performing inverse perspective transformations. However, accurately predicting vanishing lines necessitates accounting for variations in object size, which cannot be effectively captured through simple regression models. Therefore, this paper proposes a size variation-aware method, utilizing expert knowledge from object detection to build a reinforcement learning framework for predicting vanishing lines in traffic scenes. Specifically, this method leverages size information from trained detectors to convert perspective images into BEV images without the need for additional camera intrinsic parameters. First, we design a novel reward mechanism that utilizes prior knowledge of scale differences between similar objects in perspective images, allowing the network to automatically update and learn specific vanishing line positions. Second, we propose a fast inverse perspective transformation method, which accelerates the training speed of the proposed approach. To evaluate the effectiveness of the method, experiments are conducted on two traffic flow datasets. The experimental results demonstrate that the proposed algorithm accurately predicts vanishing line positions and successfully transforms perspective images into BEV images. Furthermore, the proposed algorithm performs competitively with directly supervised methods. The code is available at: https://github.com/HotChieh/DDRL.
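The abstract describes the inverse perspective transformation and the scale-based reward only at a high level; the authors' actual implementation lives in the linked DDRL repository. As a rough illustration of the underlying idea, the following Python sketch shows how a predicted vanishing-line (horizon) row alone can drive a flat-ground inverse perspective warp, together with a toy scale-consistency score of the kind such a reward could build on. The function names (`ipm_from_horizon`, `scale_consistency_reward`), the flat-ground pinhole assumption, and the specific scaling choices are assumptions made here for illustration, not the paper's method.

```python
import numpy as np
import cv2


def ipm_from_horizon(img, y_h, bev_h=512, bev_w=512, margin=10):
    """Warp a perspective image to a rough bird's-eye view using only a
    horizon (vanishing-line) row y_h, under a flat-ground pinhole model:
    depth Z is proportional to 1 / (y - y_h) and lateral offset X to
    (x - cx) * Z. Scales are arbitrary because no intrinsics are used.
    """
    H, W = img.shape[:2]
    cx = W / 2.0
    # depth range: from the bottom image row up to `margin` rows below the horizon
    z_near = 1.0 / (H - 1 - y_h)
    z_far = 1.0 / margin
    x_max = (W / 2.0) * z_near  # half-width of ground visible at the bottom row

    # ground-plane coordinates for each BEV cell (far scene at the top of the BEV)
    zs = np.linspace(z_far, z_near, bev_h)
    xs = np.linspace(-x_max, x_max, bev_w)
    X, Z = np.meshgrid(xs, zs)

    # back-project each ground point into the perspective image and resample
    map_y = (y_h + 1.0 / Z).astype(np.float32)
    map_x = (cx + X / Z).astype(np.float32)
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)


def scale_consistency_reward(bev_boxes):
    """Toy proxy for a scale-difference reward (not the paper's formulation):
    after a correct warp, same-class objects should cover similar BEV areas,
    so a smaller spread of normalized areas suggests a better horizon guess.
    """
    areas = np.array([(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in bev_boxes],
                     dtype=np.float64)
    if areas.size < 2:
        return 0.0
    return -float(np.std(areas / (areas.mean() + 1e-6)))
```

Under this reading, a policy that proposes candidate horizon rows would be rewarded when warped, same-class detections come out nearly uniform in size, which matches the intuition stated in the abstract; the paper's actual reward mechanism and training loop should be taken from the DDRL repository.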
Journal introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.