{"title":"Complexity of mental geometry for 3D pose perception","authors":"Crystal Guo, Akihito Maruya, Qasim Zaidi","doi":"10.1016/j.visres.2024.108438","DOIUrl":null,"url":null,"abstract":"<div><p>Biological visual systems rely on pose estimation of 3D objects to navigate and interact with their environment, but the neural mechanisms and computations for inferring 3D poses from 2D retinal images are only partially understood, especially where stereo information is missing. We previously presented evidence that humans infer the poses of 3D objects lying centered on the ground by using the geometrical back-transform from retinal images to viewer-centered world coordinates. This model explained the almost veridical estimation of poses in real scenes and the illusory rotation of poses in obliquely viewed pictures, which includes the “pointing out of the picture” phenomenon. Here we test this model for more varied configurations and find that it needs to be augmented. Five observers estimated poses of sloped, elevated, or off-center 3D sticks in each of 16 different poses displayed on a monitor in frontal and oblique views. Pose estimates in scenes and pictures showed remarkable accuracy and agreement between observers, but with a systematic fronto-parallel bias for oblique poses similar to the ground condition. The retinal projection of the pose of an object sloped wrt the ground depends on the slope. We show that observers’ estimates can be explained by the back-transform derived for close to the correct slope. The back-transform explanation also applies to obliquely viewed pictures and to off-center objects and elevated objects, making it more likely that observers use internalized perspective geometry to make 3D pose inferences while actively incorporating inferences about other aspects of object placement.</p></div>","PeriodicalId":23670,"journal":{"name":"Vision Research","volume":"222 ","pages":"Article 108438"},"PeriodicalIF":1.5000,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Vision Research","FirstCategoryId":"102","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0042698924000828","RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"NEUROSCIENCES","Score":null,"Total":0}
Citations: 0
Abstract
Biological visual systems rely on pose estimation of 3D objects to navigate and interact with their environment, but the neural mechanisms and computations for inferring 3D poses from 2D retinal images are only partially understood, especially where stereo information is missing. We previously presented evidence that humans infer the poses of 3D objects lying centered on the ground by using the geometrical back-transform from retinal images to viewer-centered world coordinates. This model explained the almost veridical estimation of poses in real scenes and the illusory rotation of poses in obliquely viewed pictures, which includes the "pointing out of the picture" phenomenon. Here we test this model on more varied configurations and find that it needs to be augmented. Five observers estimated poses of sloped, elevated, or off-center 3D sticks, each shown in 16 different poses on a monitor in frontal and oblique views. Pose estimates in scenes and pictures showed remarkable accuracy and agreement between observers, but with a systematic fronto-parallel bias for oblique poses, similar to the ground condition. The retinal projection of the pose of an object sloped with respect to the ground depends on the slope. We show that observers' estimates can be explained by the back-transform derived for a slope close to the correct one. The back-transform explanation also applies to obliquely viewed pictures, off-center objects, and elevated objects, making it more likely that observers use internalized perspective geometry to make 3D pose inferences while actively incorporating inferences about other aspects of object placement.
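The core computation the abstract describes, inverting perspective projection to recover a 3D pose angle from a 2D retinal orientation, can be illustrated with a minimal geometric sketch. For a thin stick at fixation on the ground plane viewed from camera elevation φ, an orthographic approximation gives tan θ = tan Ω · sin φ, where Ω is the 3D pose angle and θ the projected image orientation; the back-transform is then Ω = arctan(tan θ / sin φ). The code below is an illustrative reconstruction under these stated conventions, not the authors' implementation; the function names, axis conventions, and the slope extension (which, under this tilt convention, amounts to replacing φ with φ + σ) are assumptions.

```python
import numpy as np

def project_pose(omega, phi, slope=0.0):
    """Image orientation (rad) of a thin stick at 3D pose angle `omega`,
    lying on a support plane of slope `slope`, viewed from camera
    elevation `phi`. Orthographic approximation for a small stick at
    fixation. World axes: x fronto-parallel, y into the scene, z up."""
    # Stick direction on the support plane (plane tilted about the x-axis).
    d = np.array([np.cos(omega),
                  np.sin(omega) * np.cos(slope),
                  np.sin(omega) * np.sin(slope)])
    # Camera image-plane basis for elevation phi (looking down at fixation).
    right = np.array([1.0, 0.0, 0.0])
    up = np.array([0.0, np.sin(phi), np.cos(phi)])
    # Orientation of the projected segment in the image plane.
    return np.arctan2(d @ up, d @ right)

def back_transform(theta, phi):
    """Closed-form back-transform for the ground plane:
    tan(theta) = tan(omega) * sin(phi)  =>  omega = atan(tan(theta)/sin(phi)).
    Under the tilt convention above, a plane of slope sigma satisfies
    tan(theta) = tan(omega) * sin(phi + sigma), so passing phi + sigma
    here inverts the sloped case as well."""
    return np.arctan2(np.tan(theta), np.sin(phi))

# Sanity check: a 45-deg pose on the ground seen from 30 deg elevation.
phi = np.radians(30)
omega = np.radians(45)
theta = project_pose(omega, phi)               # foreshortened image orientation
print(np.degrees(theta))                       # ~26.6 deg: rotated toward fronto-parallel
print(np.degrees(back_transform(theta, phi)))  # recovers ~45 deg
```

The foreshortening in this sketch also shows why applying a back-transform for the wrong slope produces systematic errors: if observers invert the projection assuming a slope near, but not exactly at, the true one, their pose estimates will be close to veridical with a residual bias, consistent with the fronto-parallel bias reported in the abstract.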
About the journal
Vision Research is a journal devoted to the functional aspects of human, vertebrate, and invertebrate vision, and publishes experimental and observational studies, reviews, and theoretical and computational analyses. Vision Research also publishes clinical studies relevant to normal visual function and basic research relevant to visual dysfunction or its clinical investigation. Functional aspects of vision are interpreted broadly, ranging from molecular and cellular function to perception and behavior. Detailed descriptions are encouraged, but enough introductory background should be included for non-specialists. Theoretical and computational papers should give a sense of order to the facts or point to new verifiable observations. Papers dealing with questions in the history of vision science should stress the development of ideas in the field.