数据效率、降维与广义对称信息瓶颈

IF 2.7 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation Pub Date : 2024-06-07 DOI:10.1162/neco_a_01667

K. Michael Martini;Ilya Nemenman

{"title":"数据效率、降维与广义对称信息瓶颈","authors":"K. Michael Martini;Ilya Nemenman","doi":"10.1162/neco_a_01667","DOIUrl":null,"url":null,"abstract":"The symmetric information bottleneck (SIB), an extension of the more familiar information bottleneck, is a dimensionality-reduction technique that simultaneously compresses two random variables to preserve information between their compressed versions. We introduce the generalized symmetric information bottleneck (GSIB), which explores different functional forms of the cost of such simultaneous reduction. We then explore the data set size requirements of such simultaneous compression. We do this by deriving bounds and root-mean-squared estimates of statistical fluctuations of the involved loss functions. We show that in typical situations, the simultaneous GSIB compression requires qualitatively less data to achieve the same errors compared to compressing variables one at a time. We suggest that this is an example of a more general principle that simultaneous compression is more data efficient than independent compression of each of the input variables.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 7","pages":"1353-1379"},"PeriodicalIF":2.7000,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Data Efficiency, Dimensionality Reduction, and the Generalized Symmetric Information Bottleneck\",\"authors\":\"K. Michael Martini;Ilya Nemenman\",\"doi\":\"10.1162/neco_a_01667\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The symmetric information bottleneck (SIB), an extension of the more familiar information bottleneck, is a dimensionality-reduction technique that simultaneously compresses two random variables to preserve information between their compressed versions. We introduce the generalized symmetric information bottleneck (GSIB), which explores different functional forms of the cost of such simultaneous reduction. We then explore the data set size requirements of such simultaneous compression. We do this by deriving bounds and root-mean-squared estimates of statistical fluctuations of the involved loss functions. We show that in typical situations, the simultaneous GSIB compression requires qualitatively less data to achieve the same errors compared to compressing variables one at a time. We suggest that this is an example of a more general principle that simultaneous compression is more data efficient than independent compression of each of the input variables.\",\"PeriodicalId\":54731,\"journal\":{\"name\":\"Neural Computation\",\"volume\":\"36 7\",\"pages\":\"1353-1379\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2024-06-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Computation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10661261/\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10661261/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

对称信息瓶颈（SIB）是人们更熟悉的信息瓶颈的扩展，是一种同时压缩两个随机变量以保留其压缩版本之间信息的降维技术。我们引入了广义对称信息瓶颈（GSIB），探讨了这种同步缩减成本的不同函数形式。然后，我们探讨了这种同步压缩对数据集大小的要求。为此，我们推导出了相关损失函数统计波动的边界和均方根估计值。我们发现，在典型情况下，与逐个压缩变量相比，同时压缩 GSIB 所需的数据要少得多，才能达到相同的误差。我们认为，这是一个更普遍原理的例子，即同时压缩比独立压缩每个输入变量更有效率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Data Efficiency, Dimensionality Reduction, and the Generalized Symmetric Information Bottleneck

The symmetric information bottleneck (SIB), an extension of the more familiar information bottleneck, is a dimensionality-reduction technique that simultaneously compresses two random variables to preserve information between their compressed versions. We introduce the generalized symmetric information bottleneck (GSIB), which explores different functional forms of the cost of such simultaneous reduction. We then explore the data set size requirements of such simultaneous compression. We do this by deriving bounds and root-mean-squared estimates of statistical fluctuations of the involved loss functions. We show that in typical situations, the simultaneous GSIB compression requires qualitatively less data to achieve the same errors compared to compressing variables one at a time. We suggest that this is an example of a more general principle that simultaneous compression is more data efficient than independent compression of each of the input variables.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Neural Computation 工程技术-计算机：人工智能

CiteScore

6.30

自引率

3.40%

发文量

审稿时长

3.0 months

期刊介绍： Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.