BiPS: Hotness-aware Bi-tier Parameter Synchronization for Recommendation Models

Qiming Zheng, Quan Chen, Kaihao Bai, Huifeng Guo, Yong Gao, Xiuqiang He, M. Guo

2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2021. DOI: 10.1109/IPDPS49936.2021.00069
Current deep learning frameworks are optimized mainly for dense-accessed models, so they show low throughput and poor scalability when training sparse-accessed recommendation models. Our investigation shows that the poor performance stems from a parameter synchronization bottleneck. We therefore propose BiPS, a bi-tier parameter synchronization system that alleviates both the parameter update bottleneck and the communication bottleneck of sparse-accessed parameters. BiPS comprises a bi-tier parameter server that accelerates the traditional CPU-based parameter update process, and a hotness-aware parameter placement and communication policy that balances the workload between CPU and GPU and optimizes the communication of sparse-accessed parameters. BiPS also overlaps worker computation with the synchronization stage so that parameters can be updated in advance. We implement BiPS and integrate it into mainstream deep learning frameworks, including TensorFlow, MXNet, and PyTorch. Experiments across these frameworks show that BiPS greatly speeds up recommender training (by 5-9×) as the model scale increases, without degrading accuracy.
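To make the hotness-aware placement idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation): it partitions embedding rows between a GPU-resident "hot" tier and a CPU-resident "cold" tier according to observed access frequency. The function name `partition_by_hotness`, the `gpu_capacity` parameter, and the toy access counts are all illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration (not BiPS's actual code): split embedding rows
# between a GPU-resident hot tier and a CPU-resident cold tier based on how
# often each row is accessed, as a hotness-aware placement policy might do.

def partition_by_hotness(access_counts, gpu_capacity):
    """Return (hot_ids, cold_ids): the gpu_capacity most frequently accessed
    embedding rows go to the GPU tier; the remainder stay on the CPU tier."""
    order = np.argsort(access_counts)[::-1]   # row indices, most-accessed first
    hot_ids = order[:gpu_capacity]
    cold_ids = order[gpu_capacity:]
    return hot_ids, cold_ids

# Toy example: 10 embedding rows with skewed (power-law-like) access counts.
counts = np.array([950, 3, 700, 1, 2, 480, 5, 4, 2, 1])
hot, cold = partition_by_hotness(counts, gpu_capacity=3)
print("GPU (hot) rows:", hot)    # e.g. [0 2 5]
print("CPU (cold) rows:", cold)
```

In a real training system this split would be recomputed or adjusted as access statistics evolve, so that the few heavily reused (hot) embedding rows are served from the fast GPU tier while the long tail of rarely touched rows remains in larger CPU memory.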