BiPS: Hotness-aware Bi-tier Parameter Synchronization for Recommendation Models

Qiming Zheng, Quan Chen, Kaihao Bai, Huifeng Guo, Yong Gao, Xiuqiang He, M. Guo

2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021. DOI: 10.1109/IPDPS49936.2021.00069
Current deep learning frameworks are optimized mainly for dense-accessed models, so they show low throughput and poor scalability when training sparse-accessed recommendation models. Our investigation shows that the poor performance stems from a parameter synchronization bottleneck. We therefore propose BiPS, a bi-tier parameter synchronization system that alleviates both the parameter update bottleneck and the communication bottleneck of sparse-accessed parameters. BiPS comprises a bi-tier parameter server that accelerates the traditional CPU-based parameter update process, and a hotness-aware parameter placement and communication policy that balances the workload between CPU and GPU and optimizes the communication of sparse-accessed parameters. BiPS also overlaps worker computation with the synchronization stage so that parameters can be updated in advance. We implement BiPS and integrate it into mainstream deep learning frameworks, including TensorFlow, MXNet, and PyTorch. Experiments across these frameworks show that BiPS greatly speeds up recommender training (by 5-9×) as the model scale increases, without degrading accuracy.
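To make the hotness-aware placement idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation): it partitions embedding rows between a GPU-resident "hot" tier and a CPU-resident "cold" tier according to observed access frequency. The function name `partition_by_hotness`, the `gpu_capacity` parameter, and the toy access counts are all illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration (not BiPS's actual code): split embedding rows
# between a GPU-resident hot tier and a CPU-resident cold tier based on how
# often each row is accessed, as a hotness-aware placement policy might do.

def partition_by_hotness(access_counts, gpu_capacity):
    """Return (hot_ids, cold_ids): the gpu_capacity most frequently accessed
    embedding rows go to the GPU tier; the remainder stay on the CPU tier."""
    order = np.argsort(access_counts)[::-1]   # row indices, most-accessed first
    hot_ids = order[:gpu_capacity]
    cold_ids = order[gpu_capacity:]
    return hot_ids, cold_ids

# Toy example: 10 embedding rows with skewed (power-law-like) access counts.
counts = np.array([950, 3, 700, 1, 2, 480, 5, 4, 2, 1])
hot, cold = partition_by_hotness(counts, gpu_capacity=3)
print("GPU (hot) rows:", hot)    # e.g. [0 2 5]
print("CPU (cold) rows:", cold)
```

In a real training system this split would be recomputed or adjusted as access statistics evolve, so that the few heavily reused (hot) embedding rows are served from the fast GPU tier while the long tail of rarely touched rows remains in larger CPU memory.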