What Do Visual Neural Networks Learn?
Authors: Daniella Har-Shalom, Yair Weiss
Journal: Annual Review of Vision Science, pp. 591-610 (Journal Article)
Publication date: 2025-09-01 (Epub 2025-07-29)
DOI: https://doi.org/10.1146/annurev-vision-110323-112903
Journal metrics: impact factor 5.5; JCR Q1 (Neurosciences)

Abstract:
Over the past decade, artificial neural networks trained to classify images downloaded from the internet have achieved astounding, almost superhuman performance and have been suggested as possible models for human vision. In this article, we review experimental evidence from multiple studies elucidating the classification strategy learned by successful visual neural networks (VNNs) and how this strategy may be related to human vision as well as previous approaches to computer vision. The studies we review evaluate the performance of VNNs on carefully designed tasks that are meant to tease out the cues they use. The use of this method shows that VNNs are often fooled by image changes to which human object recognition is largely invariant (e.g., the change of a few pixels in the image or a change of the background or illumination), and, conversely, that the networks can be invariant to very large image manipulations that disrupt human performance (e.g., randomly permuting the patches of an image). Taken together, the evidence suggests that these networks have learned relatively low-level cues that are extremely effective at classifying internet images but are ineffective at classifying many other images that humans can classify effortlessly.
About the journal:
The Annual Review of Vision Science reviews progress in the visual sciences, a cross-cutting set of disciplines which intersect psychology, neuroscience, computer science, cell biology and genetics, and clinical medicine. The journal covers a broad range of topics and techniques, including optics, retina, central visual processing, visual perception, eye movements, visual development, vision models, computer vision, and the mechanisms of visual disease, dysfunction, and sight restoration. The study of vision is central to progress in many areas of science, and this new journal will explore and expose the connections that link it to biology, behavior, computation, engineering, and medicine.