Dense Sample Deep Learning

IF 2.7 | Zone 4, Computer Science | Q3, Computer Science, Artificial Intelligence

Neural Computation, vol. 36, no. 6, pp. 1228-1244 | Pub Date: 2024-03-10 | DOI: 10.1162/neco_a_01666

Stephen José Hanson;Vivek Yadav;Catherine Hanson

Citations: 0

Abstract

Deep learning (DL), a variant of the neural network algorithms originally proposed in the 1980s (Rumelhart et al., 1986), has made surprising progress in artificial intelligence (AI), in applications ranging from language translation and protein folding (Jumper et al., 2021) to autonomous cars and, more recently, human-like language models (chatbots). All of these seemed intractable until very recently. Despite the growing use of DL networks, little is understood about the learning mechanisms and representations that make these networks effective across such a diverse range of applications. Part of the answer must be the huge scale of the architecture and, of course, the large scale of the data, since not much else has changed since 1986. But the nature of deep learned representations remains largely unknown. Unfortunately, training sets with millions or billions of tokens have unknown combinatorics, and networks with millions or billions of hidden units cannot easily be visualized, nor can their mechanisms be easily revealed. In this letter, we explore these challenges with a large (1.24 million weights) VGG DL network in a novel high-density sample task (five unique tokens with more than 500 exemplars per token), which allows us to follow the emergence of category structure and feature construction more carefully. We use various visualization methods to follow the emergence of the classification and the development of the coupling of feature detectors and structures that provide a type of graphical bootstrapping. From these results, we harvest some basic observations of the learning dynamics of DL and propose a new theory of complex feature construction.

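Source journal: Neural Computation (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore: 6.30

Self-citation rate: 3.40%

Publication volume:

Review time: 3.0 months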

Journal description: Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.