Understanding encoder–decoder structures in machine learning using information measures
Jorge F. Silva, Victor Faraggi, Camilo Ramirez, Alvaro Egaña, Eduardo Pavez
Signal Processing, vol. 234, Article 109983 (published 2025-03-09). DOI: 10.1016/j.sigpro.2025.109983
Cited by: 0
Abstract
We present a theory of representation learning to model and understand the role of encoder–decoder design in machine learning (ML) from an information-theoretic angle. We use two main information concepts, information sufficiency (IS) and mutual information loss, to represent predictive structures in machine learning. Our first main result provides a functional expression that characterizes the class of probabilistic models consistent with an IS encoder–decoder latent predictive structure. This result formally justifies the encoder–decoder forward stages that many modern ML architectures adopt to learn latent (compressed) representations for classification. To illustrate IS as a realistic and relevant model assumption, we revisit some known ML concepts and present some interesting new examples: invariant, robust, sparse, and digital models. Furthermore, our IS characterization allows us to tackle the fundamental question of how much performance, measured by the cross-entropy risk, could be lost when a given encoder–decoder architecture is adopted in a learning setting. Here, our second main result shows that a mutual information loss quantifies the lack of expressiveness attributed to the choice of a (biased) encoder–decoder ML design. Finally, we address the problem of universal cross-entropy learning with an encoder–decoder design, establishing necessary and sufficient conditions to meet this requirement. In all these results, Shannon's information measures offer new interpretations and explanations for representation learning.
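As a reading aid, here is a minimal sketch of the two central quantities in standard Shannon notation; the symbols $X$ (input), $Y$ (label), and $U = \eta(X)$ (latent representation produced by the encoder $\eta$) are our shorthand and may differ from the paper's own notation.

Information sufficiency (IS): the encoder preserves all of the predictive information that the input carries about the label,
$$I(X;Y) = I(U;Y), \qquad U = \eta(X).$$

Mutual information loss of a (possibly biased) encoder–decoder design:
$$\ell(\eta) \;=\; I(X;Y) - I(\eta(X);Y) \;\geq\; 0,$$
which is nonnegative by the data processing inequality and vanishes exactly when $\eta$ is information sufficient. On this reading, the abstract's second main result says that $\ell(\eta)$ quantifies the price, in cross-entropy risk, of committing to a given encoder–decoder design.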
About the journal:
Signal Processing covers all aspects of the theory and practice of signal processing. It features original research work, tutorial and review articles, and accounts of practical developments. It is intended for the rapid dissemination of knowledge and experience to engineers and scientists working in research, development, or the practical application of signal processing.
Subject areas covered by the journal include: Signal Theory; Stochastic Processes; Detection and Estimation; Spectral Analysis; Filtering; Signal Processing Systems; Software Developments; Image Processing; Pattern Recognition; Optical Signal Processing; Digital Signal Processing; Multi-dimensional Signal Processing; Communication Signal Processing; Biomedical Signal Processing; Geophysical and Astrophysical Signal Processing; Earth Resources Signal Processing; Acoustic and Vibration Signal Processing; Data Processing; Remote Sensing; Signal Processing Technology; Radar Signal Processing; Sonar Signal Processing; Industrial Applications; New Applications.