{"title":"论小数据区中噪声时间序列的熵学习","authors":"Davide Bassetti, Lukáš Pospíšil, Illia Horenko","doi":"10.3390/e26070553","DOIUrl":null,"url":null,"abstract":"In this work, we present a novel methodology for performing the supervised classification of time-ordered noisy data; we call this methodology Entropic Sparse Probabilistic Approximation with Markov regularization (eSPA-Markov). It is an extension of entropic learning methodologies, allowing the simultaneous learning of segmentation patterns, entropy-optimal feature space discretizations, and Bayesian classification rules. We prove the conditions for the existence and uniqueness of the learning problem solution and propose a one-shot numerical learning algorithm that—in the leading order—scales linearly in dimension. We show how this technique can be used for the computationally scalable identification of persistent (metastable) regime affiliations and regime switches from high-dimensional non-stationary and noisy time series, i.e., when the size of the data statistics is small compared to their dimensionality and when the noise variance is larger than the variance in the signal. We demonstrate its performance on a set of toy learning problems, comparing eSPA-Markov to state-of-the-art techniques, including deep learning and random forests. We show how this technique can be used for the analysis of noisy time series from DNA and RNA Nanopore sequencing.","PeriodicalId":11694,"journal":{"name":"Entropy","volume":null,"pages":null},"PeriodicalIF":2.1000,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On Entropic Learning from Noisy Time Series in the Small Data Regime\",\"authors\":\"Davide Bassetti, Lukáš Pospíšil, Illia Horenko\",\"doi\":\"10.3390/e26070553\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this work, we present a novel methodology for performing the supervised classification of time-ordered noisy data; we call this methodology Entropic Sparse Probabilistic Approximation with Markov regularization (eSPA-Markov). It is an extension of entropic learning methodologies, allowing the simultaneous learning of segmentation patterns, entropy-optimal feature space discretizations, and Bayesian classification rules. We prove the conditions for the existence and uniqueness of the learning problem solution and propose a one-shot numerical learning algorithm that—in the leading order—scales linearly in dimension. We show how this technique can be used for the computationally scalable identification of persistent (metastable) regime affiliations and regime switches from high-dimensional non-stationary and noisy time series, i.e., when the size of the data statistics is small compared to their dimensionality and when the noise variance is larger than the variance in the signal. We demonstrate its performance on a set of toy learning problems, comparing eSPA-Markov to state-of-the-art techniques, including deep learning and random forests. We show how this technique can be used for the analysis of noisy time series from DNA and RNA Nanopore sequencing.\",\"PeriodicalId\":11694,\"journal\":{\"name\":\"Entropy\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2024-06-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Entropy\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.3390/e26070553\",\"RegionNum\":3,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PHYSICS, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Entropy","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.3390/e26070553","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
摘要
在这项工作中,我们提出了一种新方法,用于对有时间顺序的噪声数据进行监督分类;我们称这种方法为带马尔科夫正则化的熵稀疏概率逼近法(eSPA-Markov)。它是熵学习方法的扩展,允许同时学习分割模式、熵优化特征空间离散化和贝叶斯分类规则。我们证明了学习问题解的存在性和唯一性条件,并提出了一种单次数值学习算法,该算法在前序维度上呈线性扩展。我们展示了这种技术如何用于从高维非平稳和高噪声时间序列中,即当数据统计量的大小与维度相比较小时,以及当噪声方差大于信号方差时,以可计算扩展的方式识别持续(可转移)的制度隶属关系和制度转换。我们在一组玩具学习问题上展示了 eSPA-Markov 的性能,并将其与深度学习和随机森林等最先进的技术进行了比较。我们展示了该技术如何用于分析来自 DNA 和 RNA Nanopore 测序的噪声时间序列。
On Entropic Learning from Noisy Time Series in the Small Data Regime
In this work, we present a novel methodology for performing the supervised classification of time-ordered noisy data; we call this methodology Entropic Sparse Probabilistic Approximation with Markov regularization (eSPA-Markov). It is an extension of entropic learning methodologies, allowing the simultaneous learning of segmentation patterns, entropy-optimal feature space discretizations, and Bayesian classification rules. We prove the conditions for the existence and uniqueness of the learning problem solution and propose a one-shot numerical learning algorithm that—in the leading order—scales linearly in dimension. We show how this technique can be used for the computationally scalable identification of persistent (metastable) regime affiliations and regime switches from high-dimensional non-stationary and noisy time series, i.e., when the size of the data statistics is small compared to their dimensionality and when the noise variance is larger than the variance in the signal. We demonstrate its performance on a set of toy learning problems, comparing eSPA-Markov to state-of-the-art techniques, including deep learning and random forests. We show how this technique can be used for the analysis of noisy time series from DNA and RNA Nanopore sequencing.
期刊介绍:
Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. Our aim is to encourage scientists to publish as much as possible their theoretical and experimental details. There is no restriction on the length of the papers. If there are computation and the experiment, the details must be provided so that the results can be reproduced.