Minimal Algorithmic Information Loss Methods for Dimension Reduction, Feature Selection and Network Sparsification

IF 8.1 | Zone 1, Computer Science | COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences | Pub Date: 2025-07-24 | DOI: 10.1016/j.ins.2025.122520

Hector Zenil, Narsis A. Kiani, Alyssa Adams, Felipe S. Abrahão, Antonio Rueda-Toicen, Allan A. Zea, Luan Ozelim, Jesper Tegnér

Volume 720, Article 122520 | Journal Article | https://www.sciencedirect.com/science/article/pii/S0020025525006528

Citations: 0

Abstract

We present a novel, domain-agnostic, model-independent, unsupervised, and universally applicable Machine Learning approach for dimensionality reduction based on the principles of algorithmic complexity. Specifically, but without loss of generality, we focus on addressing the challenge of reducing certain dimensionality aspects, such as the number of edges in a network, while retaining essential features of interest. These features include preserving crucial network properties like degree distribution, clustering coefficient, edge betweenness, and degree and eigenvector centralities but can also go beyond edges to nodes and weights for network pruning and trimming. Our approach outperforms classical statistical Machine Learning techniques and state-of-the-art dimensionality reduction algorithms by preserving a greater number of data features that statistical algorithms would miss, particularly nonlinear patterns stemming from deterministic recursive processes that may look statistically random but are not. Moreover, previous approaches heavily rely on a priori feature selection, which requires constant supervision. Our findings demonstrate the effectiveness of the algorithms in overcoming some of these limitations while maintaining a time-efficient computational profile. Our approach not only matches, but also exceeds, the performance of established and state-of-the-art dimensionality reduction algorithms. We extend the applicability of our method to lossy compression tasks involving images and any multi-dimensional data. This highlights the versatility and broad utility of the approach in multiple domains.
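
The edge-reduction idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how algorithmic complexity is estimated, so the code below assumes zlib compressed length as a crude, computable proxy, and the function names (`adjacency_bits`, `complexity_proxy`, `sparsify`) are hypothetical. It greedily deletes the edge whose removal changes the proxy the least until a target edge count is reached.

```python
import zlib
from itertools import combinations


def adjacency_bits(n, edges):
    """Serialize the upper triangle of an undirected adjacency matrix as bytes."""
    edge_set = {frozenset(e) for e in edges}
    bits = "".join(
        "1" if frozenset((i, j)) in edge_set else "0"
        for i, j in combinations(range(n), 2)
    )
    return bits.encode("ascii")


def complexity_proxy(n, edges):
    """Compressed length in bytes. ASSUMPTION: zlib stands in for an
    algorithmic-complexity estimator, which the abstract does not specify."""
    return len(zlib.compress(adjacency_bits(n, edges), 9))


def sparsify(n, edges, target_edges):
    """Greedily delete the edge whose removal changes the proxy the least."""
    # Normalize to sorted, de-duplicated (u, v) pairs with u < v.
    edges = sorted({tuple(sorted(e)) for e in edges})
    while len(edges) > target_edges:
        base = complexity_proxy(n, edges)

        def loss(e):
            # Information loss of deleting e: |C(G) - C(G without e)|.
            rest = [f for f in edges if f != e]
            return abs(base - complexity_proxy(n, rest))

        # Ties are broken by edge order so the result is deterministic.
        best = min(edges, key=lambda e: (loss(e), e))
        edges.remove(best)
    return edges


if __name__ == "__main__":
    # A 10-node ring with two chords, reduced from 12 edges to 8.
    ring = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7)]
    print(sparsify(10, ring, 8))
```

A general-purpose compressor only captures statistical regularities, which is exactly the limitation the abstract attributes to statistical methods, so this sketch shows the shape of the greedy procedure rather than the quality of results the paper reports.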

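Source journal

Information Sciences (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 14.00

Self-citation rate: 17.30%

Articles published: 1322

Review time: 10.4 months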

About the journal: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.