协同过滤中的简洁表示:在 1,000 个内核上使用小波树的案例研究

Xiangjun Peng, Qingfeng Wang, Xu Sun, Chunye Gong, Yaohua Wang
{"title":"协同过滤中的简洁表示:在 1,000 个内核上使用小波树的案例研究","authors":"Xiangjun Peng, Qingfeng Wang, Xu Sun, Chunye Gong, Yaohua Wang","doi":"10.1109/PDCAT46702.2019.00083","DOIUrl":null,"url":null,"abstract":"User-Item (U-I) matrix has been used as the dominant data infrastructure of Collaborative Filtering (CF). To reduce space consumption in runtime and storage, caused by data sparsity and growing need to accommodate side information in CF design, one needs to go beyond the U-I Matrix. In this paper, we took a case study of Succinct Representations in Collaborative Filtering, rather than using a U-I Matrix. Our key insight is to introduce Succinct Data Structures as a new infrastructure of CF. Towards this, we implemented a User-based K-Nearest-Neighbor CF prototype via Wavelet Tree, by first designing a Accessible Compressed Documents (ACD) to compress U-I data in Wavelet Tree, which is efficient in both storage and runtime. Then, we showed that ACD can be applied to develop an efficient intersection algorithm without decompression, by taking advantage of ACD's characteristics. We evaluated our design on 1,000 cores of Tianhe-II supercomputer, with one of the largest public data set ml-20m. The results showed that our prototype could achieve 3.7 minutes on average to deliver the results.","PeriodicalId":166126,"journal":{"name":"2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Succinct Representations in Collaborative Filtering: A Case Study using Wavelet Tree on 1,000 Cores\",\"authors\":\"Xiangjun Peng, Qingfeng Wang, Xu Sun, Chunye Gong, Yaohua Wang\",\"doi\":\"10.1109/PDCAT46702.2019.00083\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"User-Item (U-I) matrix has been used as the dominant data infrastructure of Collaborative Filtering (CF). To reduce space consumption in runtime and storage, caused by data sparsity and growing need to accommodate side information in CF design, one needs to go beyond the U-I Matrix. In this paper, we took a case study of Succinct Representations in Collaborative Filtering, rather than using a U-I Matrix. Our key insight is to introduce Succinct Data Structures as a new infrastructure of CF. Towards this, we implemented a User-based K-Nearest-Neighbor CF prototype via Wavelet Tree, by first designing a Accessible Compressed Documents (ACD) to compress U-I data in Wavelet Tree, which is efficient in both storage and runtime. Then, we showed that ACD can be applied to develop an efficient intersection algorithm without decompression, by taking advantage of ACD's characteristics. We evaluated our design on 1,000 cores of Tianhe-II supercomputer, with one of the largest public data set ml-20m. The results showed that our prototype could achieve 3.7 minutes on average to deliver the results.\",\"PeriodicalId\":166126,\"journal\":{\"name\":\"2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-03-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PDCAT46702.2019.00083\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDCAT46702.2019.00083","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

用户-项目(U-I)矩阵一直被用作协同过滤(CF)的主要数据基础结构。为了减少运行时和存储时的空间消耗(这是由数据稀疏性和协同过滤设计中日益增长的容纳边信息的需求造成的),我们需要超越 U-I 矩阵。在本文中,我们对协同过滤中的简洁表示法进行了案例研究,而不是使用 U-I 矩阵。我们的主要观点是引入简洁数据结构作为协同过滤的新基础架构。为此,我们通过小波树实现了一个基于用户的 K 近邻 CF 原型,首先设计了一个可访问压缩文件(ACD)来压缩小波树上的 U-I 数据,它在存储和运行时都很高效。然后,我们利用 ACD 的特点,证明了 ACD 可用于开发无需解压缩的高效交叉算法。我们在天河二号超级计算机的 1000 个内核上评估了我们的设计,并使用了最大的公共数据集之一 ml-20m。结果表明,我们的原型平均可以在 3.7 分钟内得出结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Succinct Representations in Collaborative Filtering: A Case Study using Wavelet Tree on 1,000 Cores
User-Item (U-I) matrix has been used as the dominant data infrastructure of Collaborative Filtering (CF). To reduce space consumption in runtime and storage, caused by data sparsity and growing need to accommodate side information in CF design, one needs to go beyond the U-I Matrix. In this paper, we took a case study of Succinct Representations in Collaborative Filtering, rather than using a U-I Matrix. Our key insight is to introduce Succinct Data Structures as a new infrastructure of CF. Towards this, we implemented a User-based K-Nearest-Neighbor CF prototype via Wavelet Tree, by first designing a Accessible Compressed Documents (ACD) to compress U-I data in Wavelet Tree, which is efficient in both storage and runtime. Then, we showed that ACD can be applied to develop an efficient intersection algorithm without decompression, by taking advantage of ACD's characteristics. We evaluated our design on 1,000 cores of Tianhe-II supercomputer, with one of the largest public data set ml-20m. The results showed that our prototype could achieve 3.7 minutes on average to deliver the results.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信