一个优化的张量补全库为多个gpu

Ming Dun, Yunchun Li, Hailong Yang, Qingxiao Sun, Zhongzhi Luan, D. Qian
{"title":"一个优化的张量补全库为多个gpu","authors":"Ming Dun, Yunchun Li, Hailong Yang, Qingxiao Sun, Zhongzhi Luan, D. Qian","doi":"10.1145/3447818.3460692","DOIUrl":null,"url":null,"abstract":"Tensor computations are gaining wide adoption in big data analysis and artificial intelligence. Among them, tensor completion is used to predict the missing or unobserved value in tensors. The decomposition-based tensor completion algorithms have attracted significant research attention since they exhibit better parallelization and scalability. However, existing optimization techniques for tensor completion cannot sustain the increasing demand for applying tensor completion on ever larger tensor data. To address the above limitations, we develop the first tensor completion library cuTC on multiple Graphics Processing Units (GPUs) with three widely used optimization algorithms such as alternating least squares (ALS), stochastic gradient descent (SGD) and coordinate descent (CCD+). We propose a novel TB-COO format that leverages warp shuffle and shared memory on GPU to enable efficient reduction. In addition, we adopt the auto-tuning method to determine the optimal parameters for better convergence and performance. We compare cuTC with state-of-the-art tensor completion libraries on real-world datasets, and the results show cuTC achieves significant speedup with similar or even better accuracy.","PeriodicalId":73273,"journal":{"name":"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"An optimized tensor completion library for multiple GPUs\",\"authors\":\"Ming Dun, Yunchun Li, Hailong Yang, Qingxiao Sun, Zhongzhi Luan, D. Qian\",\"doi\":\"10.1145/3447818.3460692\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Tensor computations are gaining wide adoption in big data analysis and artificial intelligence. Among them, tensor completion is used to predict the missing or unobserved value in tensors. The decomposition-based tensor completion algorithms have attracted significant research attention since they exhibit better parallelization and scalability. However, existing optimization techniques for tensor completion cannot sustain the increasing demand for applying tensor completion on ever larger tensor data. To address the above limitations, we develop the first tensor completion library cuTC on multiple Graphics Processing Units (GPUs) with three widely used optimization algorithms such as alternating least squares (ALS), stochastic gradient descent (SGD) and coordinate descent (CCD+). We propose a novel TB-COO format that leverages warp shuffle and shared memory on GPU to enable efficient reduction. In addition, we adopt the auto-tuning method to determine the optimal parameters for better convergence and performance. We compare cuTC with state-of-the-art tensor completion libraries on real-world datasets, and the results show cuTC achieves significant speedup with similar or even better accuracy.\",\"PeriodicalId\":73273,\"journal\":{\"name\":\"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3447818.3460692\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3447818.3460692","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

张量计算在大数据分析和人工智能中得到了广泛的应用。其中,张量补全用于预测张量中的缺失值或未观测值。基于分解的张量补全算法由于具有更好的并行性和可扩展性而引起了广泛的研究关注。然而,现有的张量补全优化技术无法满足在越来越大的张量数据上应用张量补全的需求。为了解决上述限制,我们在多个图形处理单元(gpu)上开发了第一个张量补全库cuTC,其中包括三种广泛使用的优化算法,如交替最小二乘(ALS),随机梯度下降(SGD)和坐标下降(CCD+)。我们提出了一种新的TB-COO格式,它利用GPU上的warp shuffle和共享内存来实现有效的缩减。此外,我们采用自调谐方法来确定最优参数,以获得更好的收敛性和性能。我们将cuTC与现实世界数据集上最先进的张量补全库进行了比较,结果表明cuTC在相似甚至更好的精度下实现了显著的加速。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
An optimized tensor completion library for multiple GPUs
Tensor computations are gaining wide adoption in big data analysis and artificial intelligence. Among them, tensor completion is used to predict the missing or unobserved value in tensors. The decomposition-based tensor completion algorithms have attracted significant research attention since they exhibit better parallelization and scalability. However, existing optimization techniques for tensor completion cannot sustain the increasing demand for applying tensor completion on ever larger tensor data. To address the above limitations, we develop the first tensor completion library cuTC on multiple Graphics Processing Units (GPUs) with three widely used optimization algorithms such as alternating least squares (ALS), stochastic gradient descent (SGD) and coordinate descent (CCD+). We propose a novel TB-COO format that leverages warp shuffle and shared memory on GPU to enable efficient reduction. In addition, we adopt the auto-tuning method to determine the optimal parameters for better convergence and performance. We compare cuTC with state-of-the-art tensor completion libraries on real-world datasets, and the results show cuTC achieves significant speedup with similar or even better accuracy.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信