The Architectural Implications of Facebook's DNN-Based Personalized Recommendation

2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). Pub Date: 2019-06-06. DOI: 10.1109/HPCA47549.2020.00047

Udit Gupta, Xiaodong Wang, M. Naumov, Carole-Jean Wu, Brandon Reagen, D. Brooks, Bradford Cottel, K. Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Andrey Malevich, Dheevatsa Mudigere, M. Smelyanskiy, Liang Xiong, Xuan Zhang


Citations: 216

Abstract

The widespread application of deep learning has changed the landscape of computation in data centers. In particular, personalized recommendation for content ranking is now largely accomplished using deep neural networks. However, despite their importance and the amount of compute cycles they consume, relatively little research attention has been devoted to recommendation systems. To facilitate research and advance the understanding of these workloads, this paper presents a set of real-world, production-scale DNNs for personalized recommendation coupled with relevant performance metrics for evaluation. In addition to releasing a set of open-source workloads, we conduct in-depth analysis that underpins future system design and optimization for at-scale recommendation: Inference latency varies by 60% across three Intel server generations, batching and co-location of inference jobs can drastically improve latency-bounded throughput, and diversity across recommendation models leads to different optimization strategies.

