Accelerated SGD for Tensor Decomposition of Sparse Count Data

Huan He, Yuanzhe Xi, Joyce C. Ho
2020 International Conference on Data Mining Workshops (ICDMW), November 2020. DOI: 10.1109/ICDMW51313.2020.00047

Abstract

The rapid growth in the collection of high-dimensional data has driven the emergence of tensor decomposition as a powerful analysis method for exploring multidimensional data. Because tensor decomposition can extract hidden structures and capture underlying relationships between variables, it has been applied successfully across a broad range of domains. However, tensor decomposition is computationally expensive, and existing methods for decomposing large sparse tensors of count data are not efficient enough when run with limited computing resources. We therefore propose AS-CP, a novel algorithm that accelerates the convergence of the stochastic gradient descent (SGD) based CANDECOMP/PARAFAC (CP) decomposition model through an extrapolation method. The proposed framework can be easily parallelized asynchronously. Our empirical results on three real-world datasets demonstrate that AS-CP decreases the total computation time and scales readily to large datasets without requiring a high-performance computing platform or environment. The advantage of AS-CP over several state-of-the-art methods is also shown through a machine learning task: the factors computed by AS-CP help identify better clinical characteristics from electronic health record (EHR) data.
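The abstract does not give AS-CP's update rules. The sketch below is only a rough illustration of the ingredients it names: SGD on a CP model of sparse count data, plus an extrapolation step. It makes several assumptions, none of which come from the paper: a Poisson (KL-type) loss, minibatches drawn from the nonzeros only, and a periodic linear extrapolation x + beta * (x - x_anchor) that is kept only if the full loss does not increase. The function names (`poisson_cp_loss`, `sgd_step`, `extrapolate`, `accelerated_sgd_cp`) and all parameter values are hypothetical, and the sketch should not be read as the authors' algorithm.

```python
import numpy as np

# Illustrative sketch only: SGD for nonnegative CP under a Poisson loss with a
# periodic, safeguarded linear extrapolation. This is NOT the AS-CP algorithm;
# the loss, sampling scheme, and extrapolation rule are assumptions.


def poisson_cp_loss(factors, idx, vals, eps=1e-10):
    """Poisson (KL-type) loss: sum of all model entries minus sum_nnz x * log(m)."""
    # Sum over every model entry, computed from factor column sums (no dense tensor).
    total = np.prod([f.sum(axis=0) for f in factors], axis=0).sum()
    # Model values at the observed nonzeros only.
    m = np.prod([f[idx[:, n]] for n, f in enumerate(factors)], axis=0).sum(axis=1)
    return total - np.sum(vals * np.log(m + eps))


def sgd_step(factors, idx, vals, batch, lr, eps=1e-10):
    """One projected SGD step; the log term is estimated from a minibatch of nonzeros."""
    n_modes, nnz = len(factors), len(vals)
    col_sums = [f.sum(axis=0) for f in factors]
    rows = [f[idx[batch, n]] for n, f in enumerate(factors)]  # each (batch, rank)
    m = np.prod(rows, axis=0).sum(axis=1)
    # Scale by nnz / batch size so the minibatch gradient is unbiased.
    w = vals[batch] / (m + eps) * (nnz / len(batch))
    updated = []
    for n in range(n_modes):
        # Exact gradient of the sum-of-all-entries term (same for every row).
        others = np.prod([col_sums[k] for k in range(n_modes) if k != n], axis=0)
        grad = np.tile(others, (factors[n].shape[0], 1))
        # Stochastic gradient of -x * log(m); subtract.at accumulates repeated rows.
        partial = np.prod([rows[k] for k in range(n_modes) if k != n], axis=0)
        np.subtract.at(grad, idx[batch, n], w[:, None] * partial)
        # Project onto the positive orthant to keep the model nonnegative.
        updated.append(np.maximum(factors[n] - lr * grad, eps))
    return updated


def extrapolate(current, anchor, beta, idx, vals, eps=1e-10):
    """Try x + beta * (x - anchor); keep it only if the full loss does not increase."""
    trial = [np.maximum(c + beta * (c - a), eps) for c, a in zip(current, anchor)]
    if poisson_cp_loss(trial, idx, vals) <= poisson_cp_loss(current, idx, vals):
        return trial
    return current


def accelerated_sgd_cp(idx, vals, shape, rank, n_iters=500, batch_size=32,
                       lr=1e-3, period=20, beta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    factors = [rng.uniform(0.5, 1.5, size=(dim, rank)) for dim in shape]
    anchor = [f.copy() for f in factors]
    for it in range(1, n_iters + 1):
        batch = rng.integers(0, len(vals), size=batch_size)
        factors = sgd_step(factors, idx, vals, batch, lr)
        if it % period == 0:
            # Extrapolate along the progress made since the last anchor point.
            factors = extrapolate(factors, anchor, beta, idx, vals)
            anchor = [f.copy() for f in factors]
    return factors


if __name__ == "__main__":
    # Toy usage: a small synthetic 6 x 5 x 4 count tensor stored as COO nonzeros.
    rng = np.random.default_rng(1)
    shape = (6, 5, 4)
    true = [rng.uniform(0, 2, size=(d, 2)) for d in shape]
    counts = rng.poisson(np.einsum("ir,jr,kr->ijk", *true))
    idx = np.argwhere(counts > 0)
    vals = counts[counts > 0].astype(float)
    init = accelerated_sgd_cp(idx, vals, shape, rank=2, n_iters=0)
    fit = accelerated_sgd_cp(idx, vals, shape, rank=2)
    print(poisson_cp_loss(init, idx, vals), poisson_cp_loss(fit, idx, vals))
```

The sum over all model entries factors into a product of column sums. Because of this, each step costs time proportional to the minibatch size and the factor sizes, not to the size of the dense tensor. The asynchronous parallelization mentioned in the abstract is not shown here.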