Statistically Valid Information Bottleneck via Multiple Hypothesis Testing
Amirmohammad Farzaneh, Osvaldo Simeone
arXiv - MATH - Information Theory, published 2024-09-11
DOI: https://doi.org/arxiv-2409.07325
Citations: 0
Abstract
The information bottleneck (IB) problem is a widely studied framework in
machine learning for extracting compressed features that are informative for
downstream tasks. However, current approaches to solving the IB problem rely on
heuristic hyperparameter tuning, offering no guarantee that the learned
features satisfy the information-theoretic constraints. In this work, we introduce
a statistically valid solution to this problem, referred to as IB via multiple
hypothesis testing (IB-MHT), which ensures that the learned features meet the
IB constraints with high probability, regardless of the size of the available
dataset. The proposed methodology builds on Pareto testing and learn-then-test
(LTT), and it wraps around existing IB solvers to provide statistical
guarantees on the IB constraints. We demonstrate the performance of IB-MHT on
the classical and deterministic IB formulations, showing that IB-MHT
outperforms conventional methods in terms of statistical robustness
and reliability.
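To make the learn-then-test (LTT) idea that IB-MHT builds on concrete, the sketch below shows a generic LTT-style calibration loop: each candidate hyperparameter is treated as a hypothesis ("this candidate violates the constraint"), a p-value is computed from held-out calibration data via a Hoeffding bound, and candidates are accepted only if the hypothesis is rejected under a Bonferroni correction, giving a family-wise guarantee. This is a minimal, hypothetical illustration with assumed function names (`hoeffding_pvalue`, `learn_then_test`, `risk_fn`), not the authors' implementation, and it omits the Pareto-testing refinement used in the paper.

```python
import math

def hoeffding_pvalue(emp_risk, alpha, n):
    """One-sided Hoeffding p-value for H0: true risk >= alpha,
    given the empirical risk over n calibration losses in [0, 1]."""
    if emp_risk >= alpha:
        return 1.0  # observation is not evidence against H0
    return math.exp(-2.0 * n * (alpha - emp_risk) ** 2)

def learn_then_test(candidates, risk_fn, alpha, delta):
    """Return the candidates whose constraint violation is rejected
    at family-wise error level delta (Bonferroni over candidates).

    risk_fn(lam) -> (empirical_risk, n_calibration_samples)."""
    threshold = delta / len(candidates)  # Bonferroni correction
    valid = []
    for lam in candidates:
        emp_risk, n = risk_fn(lam)
        if hoeffding_pvalue(emp_risk, alpha, n) <= threshold:
            valid.append(lam)  # constraint holds w.p. >= 1 - delta
    return valid

# Toy usage: two candidates, one clearly feasible, one clearly not.
def risk_fn(lam):
    return (0.05, 1000) if lam == 0.0 else (0.30, 1000)

selected = learn_then_test([0.0, 1.0], risk_fn, alpha=0.2, delta=0.05)
```

Any candidate returned by `learn_then_test` satisfies the risk constraint with probability at least `1 - delta`, regardless of the calibration set size; a small or noisy calibration set simply causes fewer candidates to pass, which matches the abstract's claim of validity "regardless of the size of the available dataset."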