{"title":"Complexity of mental geometry for 3D pose perception","authors":"Crystal Guo, Akihito Maruya, Qasim Zaidi","doi":"10.1016/j.visres.2024.108438","DOIUrl":null,"url":null,"abstract":"<div><p>Biological visual systems rely on pose estimation of 3D objects to navigate and interact with their environment, but the neural mechanisms and computations for inferring 3D poses from 2D retinal images are only partially understood, especially where stereo information is missing. We previously presented evidence that humans infer the poses of 3D objects lying centered on the ground by using the geometrical back-transform from retinal images to viewer-centered world coordinates. This model explained the almost veridical estimation of poses in real scenes and the illusory rotation of poses in obliquely viewed pictures, which includes the “pointing out of the picture” phenomenon. Here we test this model for more varied configurations and find that it needs to be augmented. Five observers estimated poses of sloped, elevated, or off-center 3D sticks in each of 16 different poses displayed on a monitor in frontal and oblique views. Pose estimates in scenes and pictures showed remarkable accuracy and agreement between observers, but with a systematic fronto-parallel bias for oblique poses similar to the ground condition. The retinal projection of the pose of an object sloped wrt the ground depends on the slope. We show that observers’ estimates can be explained by the back-transform derived for close to the correct slope. The back-transform explanation also applies to obliquely viewed pictures and to off-center objects and elevated objects, making it more likely that observers use internalized perspective geometry to make 3D pose inferences while actively incorporating inferences about other aspects of object placement.</p></div>","PeriodicalId":23670,"journal":{"name":"Vision Research","volume":"222 ","pages":"Article 108438"},"PeriodicalIF":1.5000,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Vision Research","FirstCategoryId":"102","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0042698924000828","RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"NEUROSCIENCES","Score":null,"Total":0}
Citations: 0
Abstract
Biological visual systems rely on pose estimation of 3D objects to navigate and interact with their environment, but the neural mechanisms and computations for inferring 3D poses from 2D retinal images are only partially understood, especially where stereo information is missing. We previously presented evidence that humans infer the poses of 3D objects lying centered on the ground by using the geometrical back-transform from retinal images to viewer-centered world coordinates. This model explained the almost veridical estimation of poses in real scenes and the illusory rotation of poses in obliquely viewed pictures, which includes the "pointing out of the picture" phenomenon. Here we test this model on more varied configurations and find that it needs to be augmented. Five observers estimated poses of sloped, elevated, or off-center 3D sticks, each shown in 16 different poses on a monitor in frontal and oblique views. Pose estimates in scenes and pictures showed remarkable accuracy and agreement between observers, but with a systematic fronto-parallel bias for oblique poses, similar to the ground condition. The retinal projection of the pose of an object sloped with respect to the ground depends on the slope. We show that observers' estimates can be explained by the back-transform derived for a slope close to the correct one. The back-transform explanation also applies to obliquely viewed pictures, off-center objects, and elevated objects, making it more likely that observers use internalized perspective geometry to make 3D pose inferences while actively incorporating inferences about other aspects of object placement.
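The core computation the abstract describes, inverting perspective projection to recover a 3D pose angle from a 2D retinal orientation, can be illustrated with a minimal geometric sketch. For a thin stick at fixation on the ground plane viewed from camera elevation φ, an orthographic approximation gives tan θ = tan Ω · sin φ, where Ω is the 3D pose angle and θ the projected image orientation; the back-transform is then Ω = arctan(tan θ / sin φ). The code below is an illustrative reconstruction under these stated conventions, not the authors' implementation; the function names, axis conventions, and the slope extension (which, under this tilt convention, amounts to replacing φ with φ + σ) are assumptions.

```python
import numpy as np

def project_pose(omega, phi, slope=0.0):
    """Image orientation (rad) of a thin stick at 3D pose angle `omega`,
    lying on a support plane of slope `slope`, viewed from camera
    elevation `phi`. Orthographic approximation for a small stick at
    fixation. World axes: x fronto-parallel, y into the scene, z up."""
    # Stick direction on the support plane (plane tilted about the x-axis).
    d = np.array([np.cos(omega),
                  np.sin(omega) * np.cos(slope),
                  np.sin(omega) * np.sin(slope)])
    # Camera image-plane basis for elevation phi (looking down at fixation).
    right = np.array([1.0, 0.0, 0.0])
    up = np.array([0.0, np.sin(phi), np.cos(phi)])
    # Orientation of the projected segment in the image plane.
    return np.arctan2(d @ up, d @ right)

def back_transform(theta, phi):
    """Closed-form back-transform for the ground plane:
    tan(theta) = tan(omega) * sin(phi)  =>  omega = atan(tan(theta)/sin(phi)).
    Under the tilt convention above, a plane of slope sigma satisfies
    tan(theta) = tan(omega) * sin(phi + sigma), so passing phi + sigma
    here inverts the sloped case as well."""
    return np.arctan2(np.tan(theta), np.sin(phi))

# Sanity check: a 45-deg pose on the ground seen from 30 deg elevation.
phi = np.radians(30)
omega = np.radians(45)
theta = project_pose(omega, phi)               # foreshortened image orientation
print(np.degrees(theta))                       # ~26.6 deg: rotated toward fronto-parallel
print(np.degrees(back_transform(theta, phi)))  # recovers ~45 deg
```

The foreshortening in this sketch also shows why applying a back-transform for the wrong slope produces systematic errors: if observers invert the projection assuming a slope near, but not exactly at, the true one, their pose estimates will be close to veridical with a residual bias, consistent with the fronto-parallel bias reported in the abstract.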
About the journal
Vision Research is a journal devoted to the functional aspects of human, vertebrate, and invertebrate vision, and publishes experimental and observational studies, reviews, and theoretical and computational analyses. Vision Research also publishes clinical studies relevant to normal visual function and basic research relevant to visual dysfunction or its clinical investigation. Functional aspects of vision are interpreted broadly, ranging from molecular and cellular function to perception and behavior. Detailed descriptions are encouraged, but enough introductory background should be included for non-specialists. Theoretical and computational papers should give a sense of order to the facts or point to new verifiable observations. Papers dealing with questions in the history of vision science should stress the development of ideas in the field.