{"title":"现实世界目标分布估计的远程监督强化定位","authors":"Haojie Guo , Junyu Gao , Yuan Yuan","doi":"10.1016/j.patcog.2025.112385","DOIUrl":null,"url":null,"abstract":"<div><div>Predicting the distribution of objects in the real world from monocular images is a challenging task due to the disparity between object distributions in perspective images and reality. Many researchers focus on predicting object distributions by converting perspective images into Bird’s-Eye View (BEV) images. In scenarios where camera parameter information is unavailable, the prediction of vanishing lines becomes critical for performing inverse perspective transformations. However, accurately predicting vanishing lines necessitates accounting for variations in object size, which cannot be effectively captured through simple regression models. Therefore, this paper proposes a size variation-aware method, utilizing expert knowledge from object detection to build a reinforcement learning framework for predicting vanishing lines in traffic scenes. Specifically, this method leverages size information from trained detectors to convert perspective images into BEV images without the need for additional camera intrinsic parameters. First, we design a novel reward mechanism that utilizes prior knowledge of scale differences between similar objects in perspective images, allowing the network to automatically update and learn specific vanishing line positions. Second, we propose a fast inverse perspective transformation method, which accelerates the training speed of the proposed approach. To evaluate the effectiveness of the method, experiments are conducted on two traffic flow datasets. The experimental results demonstrate that the proposed algorithm accurately predicts vanishing line positions and successfully transforms perspective images into BEV images. Furthermore, the proposed algorithm performs competitively with directly supervised methods. The code is available at: <span><span>https://github.com/HotChieh/DDRL.</span><svg><path></path></svg></span></div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112385"},"PeriodicalIF":7.6000,"publicationDate":"2025-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Distantly supervised reinforcement localization for real-world object distribution estimation\",\"authors\":\"Haojie Guo , Junyu Gao , Yuan Yuan\",\"doi\":\"10.1016/j.patcog.2025.112385\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Predicting the distribution of objects in the real world from monocular images is a challenging task due to the disparity between object distributions in perspective images and reality. Many researchers focus on predicting object distributions by converting perspective images into Bird’s-Eye View (BEV) images. In scenarios where camera parameter information is unavailable, the prediction of vanishing lines becomes critical for performing inverse perspective transformations. However, accurately predicting vanishing lines necessitates accounting for variations in object size, which cannot be effectively captured through simple regression models. Therefore, this paper proposes a size variation-aware method, utilizing expert knowledge from object detection to build a reinforcement learning framework for predicting vanishing lines in traffic scenes. 
Specifically, this method leverages size information from trained detectors to convert perspective images into BEV images without the need for additional camera intrinsic parameters. First, we design a novel reward mechanism that utilizes prior knowledge of scale differences between similar objects in perspective images, allowing the network to automatically update and learn specific vanishing line positions. Second, we propose a fast inverse perspective transformation method, which accelerates the training speed of the proposed approach. To evaluate the effectiveness of the method, experiments are conducted on two traffic flow datasets. The experimental results demonstrate that the proposed algorithm accurately predicts vanishing line positions and successfully transforms perspective images into BEV images. Furthermore, the proposed algorithm performs competitively with directly supervised methods. The code is available at: <span><span>https://github.com/HotChieh/DDRL.</span><svg><path></path></svg></span></div></div>\",\"PeriodicalId\":49713,\"journal\":{\"name\":\"Pattern Recognition\",\"volume\":\"172 \",\"pages\":\"Article 112385\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-08-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0031320325010465\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325010465","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Distantly supervised reinforcement localization for real-world object distribution estimation
Predicting the distribution of objects in the real world from monocular images is a challenging task due to the disparity between object distributions in perspective images and reality. Many researchers focus on predicting object distributions by converting perspective images into Bird’s-Eye View (BEV) images. In scenarios where camera parameter information is unavailable, the prediction of vanishing lines becomes critical for performing inverse perspective transformations. However, accurately predicting vanishing lines necessitates accounting for variations in object size, which cannot be effectively captured through simple regression models. Therefore, this paper proposes a size variation-aware method, utilizing expert knowledge from object detection to build a reinforcement learning framework for predicting vanishing lines in traffic scenes. Specifically, this method leverages size information from trained detectors to convert perspective images into BEV images without the need for additional camera intrinsic parameters. First, we design a novel reward mechanism that utilizes prior knowledge of scale differences between similar objects in perspective images, allowing the network to automatically update and learn specific vanishing line positions. Second, we propose a fast inverse perspective transformation method, which accelerates the training speed of the proposed approach. To evaluate the effectiveness of the method, experiments are conducted on two traffic flow datasets. The experimental results demonstrate that the proposed algorithm accurately predicts vanishing line positions and successfully transforms perspective images into BEV images. Furthermore, the proposed algorithm performs competitively with directly supervised methods. The code is available at: https://github.com/HotChieh/DDRL.
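The abstract describes the inverse perspective transformation and the scale-based reward only at a high level; the authors' actual implementation lives in the linked DDRL repository. As a rough illustration of the underlying idea, the following Python sketch shows how a predicted vanishing-line (horizon) row alone can drive a flat-ground inverse perspective warp, together with a toy scale-consistency score of the kind such a reward could build on. The function names (`ipm_from_horizon`, `scale_consistency_reward`), the flat-ground pinhole assumption, and the specific scaling choices are assumptions made here for illustration, not the paper's method.

```python
import numpy as np
import cv2


def ipm_from_horizon(img, y_h, bev_h=512, bev_w=512, margin=10):
    """Warp a perspective image to a rough bird's-eye view using only a
    horizon (vanishing-line) row y_h, under a flat-ground pinhole model:
    depth Z is proportional to 1 / (y - y_h) and lateral offset X to
    (x - cx) * Z. Scales are arbitrary because no intrinsics are used.
    """
    H, W = img.shape[:2]
    cx = W / 2.0
    # depth range: from the bottom image row up to `margin` rows below the horizon
    z_near = 1.0 / (H - 1 - y_h)
    z_far = 1.0 / margin
    x_max = (W / 2.0) * z_near  # half-width of ground visible at the bottom row

    # ground-plane coordinates for each BEV cell (far scene at the top of the BEV)
    zs = np.linspace(z_far, z_near, bev_h)
    xs = np.linspace(-x_max, x_max, bev_w)
    X, Z = np.meshgrid(xs, zs)

    # back-project each ground point into the perspective image and resample
    map_y = (y_h + 1.0 / Z).astype(np.float32)
    map_x = (cx + X / Z).astype(np.float32)
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)


def scale_consistency_reward(bev_boxes):
    """Toy proxy for a scale-difference reward (not the paper's formulation):
    after a correct warp, same-class objects should cover similar BEV areas,
    so a smaller spread of normalized areas suggests a better horizon guess.
    """
    areas = np.array([(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in bev_boxes],
                     dtype=np.float64)
    if areas.size < 2:
        return 0.0
    return -float(np.std(areas / (areas.mean() + 1e-6)))
```

Under this reading, a policy that proposes candidate horizon rows would be rewarded when warped, same-class detections come out nearly uniform in size, which matches the intuition stated in the abstract; the paper's actual reward mechanism and training loop should be taken from the DDRL repository.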
Journal introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.