InfoPro: Locally Supervised Deep Learning by Maximizing Information Propagation

IF 9.3 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision Pub Date : 2024-12-11 DOI:10.1007/s11263-024-02296-0

Yulin Wang, Zanlin Ni, Yifan Pu, Cai Zhou, Jixuan Ying, Shiji Song, Gao Huang

{"title":"InfoPro: Locally Supervised Deep Learning by Maximizing Information Propagation","authors":"Yulin Wang, Zanlin Ni, Yifan Pu, Cai Zhou, Jixuan Ying, Shiji Song, Gao Huang","doi":"10.1007/s11263-024-02296-0","DOIUrl":null,"url":null,"abstract":"<p>End-to-end (E2E) training has become the <i>de-facto</i> standard for training modern deep networks, e.g., ConvNets and vision Transformers (ViTs). Typically, a global error signal is generated at the end of a model and back-propagated layer-by-layer to update the parameters. This paper shows that the reliance on back-propagating global errors may not be necessary for deep learning. More precisely, deep networks with a competitive or even better performance can be obtained by purely leveraging locally supervised learning, i.e., splitting a network into gradient-isolated modules and training them with local supervision signals. However, such an extension is non-trivial. Our experimental and theoretical analysis demonstrates that simply training local modules with an E2E objective tends to be short-sighted, collapsing task-relevant information at early layers, and hurting the performance of the full model. To avoid this issue, we propose an information propagation (InfoPro) loss, which encourages local modules to preserve as much useful information as possible, while progressively discarding task-irrelevant information. As InfoPro loss is difficult to compute in its original form, we derive a feasible upper bound as a surrogate optimization objective, yielding a simple but effective algorithm. We evaluate InfoPro extensively with ConvNets and ViTs, based on twelve computer vision benchmarks organized into five tasks (i.e., image/video recognition, semantic/instance segmentation, and object detection). InfoPro exhibits superior efficiency over E2E training in terms of GPU memory footprints, convergence speed, and training data scale. Moreover, InfoPro enables the effective training of more parameter- and computation-efficient models (e.g., much deeper networks), which suffer from inferior performance when trained in E2E. Code: https://github.com/blackfeather-wang/InfoPro-Pytorch.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"113 1","pages":""},"PeriodicalIF":9.3000,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Vision","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11263-024-02296-0","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

End-to-end (E2E) training has become the de-facto standard for training modern deep networks, e.g., ConvNets and vision Transformers (ViTs). Typically, a global error signal is generated at the end of a model and back-propagated layer-by-layer to update the parameters. This paper shows that the reliance on back-propagating global errors may not be necessary for deep learning. More precisely, deep networks with a competitive or even better performance can be obtained by purely leveraging locally supervised learning, i.e., splitting a network into gradient-isolated modules and training them with local supervision signals. However, such an extension is non-trivial. Our experimental and theoretical analysis demonstrates that simply training local modules with an E2E objective tends to be short-sighted, collapsing task-relevant information at early layers, and hurting the performance of the full model. To avoid this issue, we propose an information propagation (InfoPro) loss, which encourages local modules to preserve as much useful information as possible, while progressively discarding task-irrelevant information. As InfoPro loss is difficult to compute in its original form, we derive a feasible upper bound as a surrogate optimization objective, yielding a simple but effective algorithm. We evaluate InfoPro extensively with ConvNets and ViTs, based on twelve computer vision benchmarks organized into five tasks (i.e., image/video recognition, semantic/instance segmentation, and object detection). InfoPro exhibits superior efficiency over E2E training in terms of GPU memory footprints, convergence speed, and training data scale. Moreover, InfoPro enables the effective training of more parameter- and computation-efficient models (e.g., much deeper networks), which suffer from inferior performance when trained in E2E. Code: https://github.com/blackfeather-wang/InfoPro-Pytorch.

查看原文本刊更多论文

最大化信息传播的局部监督深度学习

端到端（E2E）训练已经成为训练现代深度网络的事实上的标准，例如ConvNets和视觉变压器（vit）。通常，在模型结束时生成一个全局错误信号，并逐层反向传播以更新参数。本文表明，对反向传播的全局误差的依赖可能不是深度学习所必需的。更准确地说，可以通过纯粹利用局部监督学习来获得具有竞争力甚至更好性能的深度网络，即将网络分成梯度隔离的模块并用局部监督信号进行训练。然而，这样的扩展是不平凡的。我们的实验和理论分析表明，简单地用E2E目标训练局部模块往往是短视的，会在早期层崩溃任务相关信息，并损害整个模型的性能。为了避免这个问题，我们提出了信息传播（InfoPro）损失，它鼓励本地模块保留尽可能多的有用信息，同时逐步丢弃与任务无关的信息。由于InfoPro损失在原始形式下难以计算，我们推导了可行的上界作为替代优化目标，得到了一个简单而有效的算法。我们使用ConvNets和ViTs对InfoPro进行了广泛的评估，基于12个计算机视觉基准，分为5个任务（即图像/视频识别、语义/实例分割和目标检测）。InfoPro在GPU内存占用、收敛速度和训练数据规模方面表现出优于E2E训练的效率。此外，InfoPro能够有效地训练更多参数和计算效率高的模型（例如，更深层的网络），这些模型在E2E中训练时性能较差。代码:https://github.com/blackfeather-wang/InfoPro-Pytorch。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

International Journal of Computer Vision 工程技术-计算机：人工智能

CiteScore

29.80

自引率

2.10%

发文量

163

审稿时长

6 months

期刊介绍： The International Journal of Computer Vision (IJCV) serves as a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually and presents high-quality, original contributions to the science and engineering of computer vision. The journal encompasses various types of articles to cater to different research outputs. Regular articles, which span up to 25 journal pages, focus on significant technical advancements that are of broad interest to the field. These articles showcase substantial progress in computer vision. Short articles, limited to 10 pages, offer a swift publication path for novel research outcomes. They provide a quicker means for sharing new findings with the computer vision community. Survey articles, comprising up to 30 pages, offer critical evaluations of the current state of the art in computer vision or offer tutorial presentations of relevant topics. These articles provide comprehensive and insightful overviews of specific subject areas. In addition to technical articles, the journal also includes book reviews, position papers, and editorials by prominent scientific figures. These contributions serve to complement the technical content and provide valuable perspectives. The journal encourages authors to include supplementary material online, such as images, video sequences, data sets, and software. This additional material enhances the understanding and reproducibility of the published research. Overall, the International Journal of Computer Vision is a comprehensive publication that caters to researchers in this rapidly growing field. It covers a range of article types, offers additional online resources, and facilitates the dissemination of impactful research.