{"title":"DL Inference and Training Optimization Towards Speed and Scale","authors":"Minjia Zhang","doi":"10.1145/3442442.3452297","DOIUrl":null,"url":null,"abstract":"The application of deep learning models presents significant improvement to many services and products in Microsoft. However, it is challenging to provide efficient computation and memory capabilities for both DNN workload inference and training given that the model size and complexities keep increasing. From the serving aspect, many DL models suffer from long inference latency and high cost, preventing their deployment in production. On the training side, large-scale model training often requires complex refactoring of models and access to prohibitively expensive GPU clusters, which are not always accessible to many practitioners. We want to deliver solid solutions and systems while exploring the cutting-edge techniques to address these issues. In this talk, I will introduce our experience and lessons from designing and implementing optimizations for both DNN serving and training at large scale with remarkable compute and memory efficiency improvement and infrastructure cost reduction.","PeriodicalId":129420,"journal":{"name":"Companion Proceedings of the Web Conference 2021","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion Proceedings of the Web Conference 2021","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3442442.3452297","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The application of deep learning models has brought significant improvements to many services and products at Microsoft. However, it is challenging to provide efficient computation and memory capabilities for both DNN inference and training workloads as model sizes and complexity keep increasing. On the serving side, many DL models suffer from long inference latency and high cost, preventing their deployment in production. On the training side, large-scale model training often requires complex refactoring of models and access to prohibitively expensive GPU clusters, which many practitioners cannot obtain. We aim to deliver solid solutions and systems while exploring cutting-edge techniques to address these issues. In this talk, I will present our experience and lessons from designing and implementing optimizations for both DNN serving and training at large scale, achieving remarkable compute and memory efficiency improvements and infrastructure cost reductions.