{"title":"The Next Generation of Neural Networks","authors":"Geoffrey E. Hinton","doi":"10.1145/3397271.3402425","DOIUrl":null,"url":null,"abstract":"The most important unsolved problem with artificial neural networks is how to do unsupervised learning as effectively as the brain. There are currently two main approaches to unsupervised learning. In the first approach, exemplified by BERT and Variational Autoencoders, a deep neural network is used to reconstruct its input. This is problematic for images because the deepest layers of the network need to encode the fine details of the image. An alternative approach, introduced by Becker and Hinton in 1992, is to train two copies of a deep neural network to produce output vectors that have high mutual information when given two different crops of the same image as their inputs. This approach was designed to allow the representations to be untethered from irrelevant details of the input. The method of optimizing mutual information used by Becker and Hinton was flawed (for a subtle reason that I will explain) so Pacannaro and Hinton (2001) replaced it by a discriminative objective in which one vector representation must select a corresponding vector representation from among many alternatives. With faster hardware, contrastive learning of representations has recently become very popular and is proving to be very effective, but it suffers from a major flaw: To learn pairs of representation vectors that have N bits of mutual information we need to contrast the correct corresponding vector with about 2N incorrect alternatives. I will describe a novel and effective way of dealing with this limitation. I will also show that this leads to a simple way of implementing perceptual learning in cortex.","PeriodicalId":252050,"journal":{"name":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3397271.3402425","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
The most important unsolved problem with artificial neural networks is how to do unsupervised learning as effectively as the brain. There are currently two main approaches to unsupervised learning. In the first approach, exemplified by BERT and Variational Autoencoders, a deep neural network is used to reconstruct its input. This is problematic for images because the deepest layers of the network need to encode the fine details of the image. An alternative approach, introduced by Becker and Hinton in 1992, is to train two copies of a deep neural network to produce output vectors that have high mutual information when given two different crops of the same image as their inputs. This approach was designed to allow the representations to be untethered from irrelevant details of the input. The method of optimizing mutual information used by Becker and Hinton was flawed (for a subtle reason that I will explain) so Pacannaro and Hinton (2001) replaced it by a discriminative objective in which one vector representation must select a corresponding vector representation from among many alternatives. With faster hardware, contrastive learning of representations has recently become very popular and is proving to be very effective, but it suffers from a major flaw: To learn pairs of representation vectors that have N bits of mutual information we need to contrast the correct corresponding vector with about 2N incorrect alternatives. I will describe a novel and effective way of dealing with this limitation. I will also show that this leads to a simple way of implementing perceptual learning in cortex.