Machine learning modelling for multi-order human visual motion processing

IF 18.8 | Zone 1: Computer Science | Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Nature Machine Intelligence Pub Date : 2025-07-15 DOI:10.1038/s42256-025-01068-w

Zitang Sun, Yen-Ju Chen, Yung-Hao Yang, Yuan Li, Shin’ya Nishida


Citations: 0

Abstract


Visual motion perception is a key function for agents interacting with their environment. Although recent advances in optical flow estimation using deep neural networks have surpassed human-level accuracy, a notable disparity remains. In addition to limitations in luminance-based first-order motion perception, humans can perceive motions in higher-order features—an ability lacking in conventional optical flow models that rely on intensity conservation law. To address this, we propose a dual-pathway model that mimics the cortical V1-MT motion processing pathway. It uses a trainable motion energy sensor bank and a recurrent graph network to process luminance-based motion and incorporates an additional sensing pathway with nonlinear preprocessing using a multilayer 3D CNN block to capture higher-order motion signals. We hypothesize that higher-order mechanisms are critical for estimating robust object motion in natural environments that contain complex optical fluctuations, for example, highlights on glossy surfaces. By training on motion datasets with varying material properties of moving objects, our dual-pathway model naturally developed the capacity to perceive multi-order motion as humans do. The resulting model effectively aligns with biological systems while generalizing both luminance-based and higher-order motion phenomena in natural scenes.
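The intensity conservation law mentioned in the abstract can be made concrete with a small sketch. It is not the authors' model. It only illustrates the first-order constraint that conventional optical flow methods build on: if luminance is conserved along a motion path, then in one spatial dimension I_x * v + I_t = 0, and the velocity v can be estimated by least squares from image derivatives. The stimulus (a drifting sinusoidal grating), the function names and all parameter values below are assumptions chosen for illustration.

```python
import numpy as np


def make_drifting_grating(velocity, n_x=128, n_t=16, wavelength=32.0):
    """Return I[t, x] = sin(k * (x - velocity * t)).

    This is a luminance-defined (first-order) motion stimulus;
    velocity is in pixels per frame. All defaults are illustrative.
    """
    k = 2.0 * np.pi / wavelength
    x = np.arange(n_x)[None, :]
    t = np.arange(n_t)[:, None]
    return np.sin(k * (x - velocity * t))


def estimate_velocity(frames):
    """Least-squares solution of I_x * v + I_t = 0 over interior samples."""
    # np.gradient returns one derivative array per axis: time first, then space.
    i_t, i_x = np.gradient(frames)
    # Drop border samples, where np.gradient uses one-sided differences.
    i_t = i_t[1:-1, 1:-1]
    i_x = i_x[1:-1, 1:-1]
    return -np.sum(i_x * i_t) / np.sum(i_x ** 2)


true_v = 0.5
est_v = estimate_velocity(make_drifting_grating(true_v))
print(f"true velocity {true_v}, estimated {est_v:.3f}")
```

An estimator of this kind works only when motion is carried by luminance itself. Stimuli whose motion is defined by higher-order features, such as a moving contrast envelope, need not satisfy the constraint, which is the gap the abstract's second pathway is meant to address.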


Source journal

Nature Machine Intelligence Multiple-

CiteScore: 36.90

Self-citation rate: 2.10%

Articles published: 127

About the journal: Nature Machine Intelligence is a distinguished publication that presents original research and reviews on various topics in machine learning, robotics, and AI. Our focus extends beyond these fields, exploring their profound impact on other scientific disciplines, as well as societal and industrial aspects. We recognize limitless possibilities wherein machine intelligence can augment human capabilities and knowledge in domains like scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. Simultaneously, we acknowledge the emergence of ethical, social, and legal concerns due to the rapid pace of advancements. To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects. Similar to all Nature-branded journals, Nature Machine Intelligence operates under the guidance of a team of skilled editors. We adhere to a fair and rigorous peer-review process, ensuring high standards of copy-editing and production, swift publication, and editorial independence.