Sketched Stochastic Dictionary Learning for large‐scale data and application to high‐throughput mass spectrometry

Statistical Analysis and Data Mining: The ASA Data Science Journal. Pub Date: 2021-08-20. DOI: 10.1002/sam.11542

O. Permiakova, T. Burger

Citations: 2

Abstract

Factorization of large data corpora has emerged as an essential technique for extracting dictionaries (sets of patterns that are meaningful for sparse encoding). Following this line of work, we present a novel algorithm based on compressive learning theory. In this framework, the (arbitrarily large) dataset of interest is replaced by a fixed‐size sketch obtained by randomly sampling the characteristic function of the data distribution. We apply our algorithm to the extraction of chromatographic elution profiles from mass spectrometry data, where it proves efficient and useful compared with related algorithms.
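
The sketching step mentioned in the abstract can be illustrated with a minimal example. This is only a sketch under stated assumptions, not the authors' implementation: the function names `compute_sketch` and `streaming_sketch`, the Gaussian frequency distribution, and the sizes `n`, `d`, `m` are all hypothetical choices made for illustration. It shows how a dataset of any size can be summarized by a fixed-size vector, namely the empirical characteristic function evaluated at `m` random frequencies, and how that vector can be accumulated batch by batch.

```python
import numpy as np

def compute_sketch(X, Omega):
    """Empirical characteristic function of the rows of X, sampled at Omega.

    X:     (n, d) data matrix, one sample per row.
    Omega: (m, d) random frequency vectors (assumed Gaussian in this example).
    Returns a complex vector of length m, whatever the value of n.
    """
    return np.exp(1j * X @ Omega.T).mean(axis=0)

def streaming_sketch(batches, Omega):
    """Same sketch, accumulated over batches so the data never sits in memory at once."""
    total = np.zeros(Omega.shape[0], dtype=complex)
    count = 0
    for batch in batches:
        total += np.exp(1j * batch @ Omega.T).sum(axis=0)
        count += batch.shape[0]
    return total / count

rng = np.random.default_rng(0)
n, d, m = 10_000, 5, 64              # illustrative sizes only
X = rng.normal(size=(n, d))          # stand-in for a large dataset
Omega = rng.normal(size=(m, d))      # assumption: Gaussian random frequencies

z_full = compute_sketch(X, Omega)
z_stream = streaming_sketch(np.array_split(X, 10), Omega)
```

The sketch length depends only on `m`, not on `n`, which is what makes the summary fixed-size. Because it is a plain average, it can be computed in a single pass or merged across data partitions.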
