Scalable Tensor Decompositions for Multi-aspect Data Mining

2008 Eighth IEEE International Conference on Data Mining. Pub Date: 2008-12-15. DOI: 10.1109/ICDM.2008.89

T. Kolda, Jimeng Sun

Citations: 371

Abstract

Modern applications such as Internet traffic, telecommunication records, and large-scale social networks generate massive amounts of data with multiple aspects and high dimensionality. Tensors (i.e., multi-way arrays) provide a natural representation for such data. Consequently, tensor decompositions such as Tucker become important tools for summarization and analysis. One major challenge is how to deal with high-dimensional, sparse data. In other words, how do we compute decompositions of tensors where most of the entries are zero? Specialized techniques are needed for computing the Tucker decomposition of sparse tensors because standard algorithms do not account for the sparsity of the data. As a result, practitioners observe a surprising phenomenon: even though there is enough memory to store both the input tensors and the factorized output tensors, memory overflows occur during the tensor factorization process. To address this intermediate blowup problem, we propose Memory-Efficient Tucker (MET). Based on the available memory, MET adaptively selects the right execution strategy during the decomposition. We provide quantitative and qualitative evaluation of MET on real tensors. It achieves over 1000X space reduction without sacrificing speed; it also allows us to work with much larger tensors that were previously too big to handle. Finally, we demonstrate a data mining case study using MET.
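The intermediate blowup described in the abstract can be illustrated with a small sketch. This is not the authors' MET implementation; it is a minimal NumPy example, under assumed sizes and with hypothetical function names, of the tensor-times-matrix chain Y = X ×₂ Bᵀ ×₃ Cᵀ that appears in standard Tucker algorithms. Applying the matrices one mode at a time materializes a dense intermediate far larger than the sparse input or the small output, while accumulating directly from the nonzeros avoids it.

```python
import numpy as np

def ttm_chain_mode_by_mode(coords, vals, shape, B, C):
    """Compute Y[i,r,s] = sum_{j,k} X[i,j,k] * B[j,r] * C[k,s] one mode at a time.

    The mode-2 product creates a dense intermediate Z of shape (I, R2, K),
    even though X is sparse. Returns Y and the number of entries in Z.
    """
    I, J, K = shape
    R2 = B.shape[1]
    Z = np.zeros((I, R2, K))               # dense intermediate: I * R2 * K entries
    for (i, j, k), v in zip(coords, vals):
        Z[i, :, k] += v * B[j, :]          # X x_2 B^T
    Y = np.einsum('irk,ks->irs', Z, C)     # Z x_3 C^T
    return Y, Z.size

def ttm_chain_direct(coords, vals, shape, B, C):
    """Compute the same Y by accumulating each nonzero's contribution directly.

    No (I, R2, K) intermediate is formed; extra storage is only Y itself.
    """
    I = shape[0]
    Y = np.zeros((I, B.shape[1], C.shape[1]))
    for (i, j, k), v in zip(coords, vals):
        Y[i] += v * np.outer(B[j], C[k])
    return Y

# Illustrative sizes only (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
shape, nnz, R2, R3 = (50, 60, 70), 100, 2, 2
coords = np.column_stack([rng.integers(0, n, nnz) for n in shape])
vals = rng.standard_normal(nnz)
B = rng.standard_normal((shape[1], R2))
C = rng.standard_normal((shape[2], R3))

Y_dense, intermediate_entries = ttm_chain_mode_by_mode(coords, vals, shape, B, C)
Y_direct = ttm_chain_direct(coords, vals, shape, B, C)

print("nonzeros stored in X:     ", nnz)
print("entries in intermediate Z:", intermediate_entries)
print("entries in output Y:      ", Y_direct.size)
print("results agree:            ", np.allclose(Y_dense, Y_direct))
```

The direct variant trades the large intermediate for more arithmetic per nonzero. The abstract states only that MET chooses an execution strategy based on available memory, so this sketch should be read as one example of such a space/time trade-off rather than as the method itself.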
