Gauge-Optimal Approximate Learning for Small Data Classification

IF 2.1 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation Pub Date : 2024-03-10 DOI:10.1162/neco_a_01664

Edoardo Vecchi;Davide Bassetti;Fabio Graziato;Lukáš Pospíšil;Illia Horenko

Citations: 0

Abstract

Gauge-Optimal Approximate Learning for Small Data Classification

Small data learning problems are characterized by a significant discrepancy between the limited number of response variable observations and the large feature space dimension. In this setting, the common learning tools struggle to identify the features important for the classification task from those that bear no relevant information and cannot derive an appropriate learning rule that allows discriminating among different classes. As a potential solution to this problem, here we exploit the idea of reducing and rotating the feature space in a lower-dimensional gauge and propose the gauge-optimal approximate learning (GOAL) algorithm, which provides an analytically tractable joint solution to the dimension reduction, feature segmentation, and classification problems for small data learning problems. We prove that the optimal solution of the GOAL algorithm consists in piecewise-linear functions in the Euclidean space and that it can be approximated through a monotonically convergent algorithm that presents—under the assumption of a discrete segmentation of the feature space—a closed-form solution for each optimization substep and an overall linear iteration cost scaling. The GOAL algorithm has been compared to other state-of-the-art machine learning tools on both synthetic data and challenging real-world applications from climate science and bioinformatics (i.e., prediction of the El Niño Southern Oscillation and inference of epigenetically induced gene-activity networks from limited experimental data). The experimental results show that the proposed algorithm outperforms the reported best competitors for these problems in both learning performance and computational cost.
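
The phrase "monotonically convergent algorithm with a closed-form solution for each optimization substep" can be illustrated with a much simpler toy problem. The sketch below is not the GOAL algorithm and omits its classification term entirely. It assumes a hypothetical objective, the reconstruction error sum_t ||x_t - R c_{gamma_t}||^2, where R is a D x d matrix with orthonormal columns (a reduced, rotated coordinate system), c_k are K centroids in the reduced space, and gamma is a discrete segmentation. The names `alternating_fit`, `objective`, `R`, `C`, and `gamma` are illustrative choices, not taken from the paper.

```python
import numpy as np


def objective(X, R, C, gamma):
    """Sum of squared reconstruction errors ||x_t - R c_{gamma_t}||^2."""
    return float(np.sum((X - C[gamma] @ R.T) ** 2))


def alternating_fit(X, d=2, K=3, n_iter=20, seed=0):
    """Toy alternating minimization over (gamma, C, R); assumes T >= K and D >= d."""
    rng = np.random.default_rng(seed)
    T, D = X.shape
    # Random D x d matrix with orthonormal columns as the starting rotation.
    R, _ = np.linalg.qr(rng.standard_normal((D, d)))
    # Start the K centroids at randomly chosen projected data points.
    C = (X @ R)[rng.choice(T, size=K, replace=False)]
    gamma = np.zeros(T, dtype=int)
    history = []
    for _ in range(n_iter):
        # Segmentation step: assign each point to its nearest lifted centroid.
        centers = C @ R.T  # K x D
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        gamma = dists.argmin(axis=1)
        # Centroid step: with R^T R = I, the optimum is the mean of the
        # projected members; an empty segment keeps its previous centroid.
        Y = X @ R
        for k in range(K):
            members = gamma == k
            if members.any():
                C[k] = Y[members].mean(axis=0)
        # Rotation step: orthogonal Procrustes, solved by a thin SVD.
        U, _, Vt = np.linalg.svd(X.T @ C[gamma], full_matrices=False)
        R = U @ Vt
        history.append(objective(X, R, C, gamma))
    return R, C, gamma, history
```

Each of the three substeps minimizes the toy objective exactly over its own block of variables while the other two are held fixed, so the recorded `history` cannot increase from one iteration to the next (up to floating-point error). That is the sense in which an alternating scheme of this kind is monotone; the guarantees proved in the paper concern the GOAL functional itself.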

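Source journal: Neural Computation (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 6.30

Self-citation rate: 3.40%

Articles published:

Review time: 3.0 months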

About the journal: Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.